Large Alphabet Source Coding using Independent Component Analysis

Amichai Painsky, Member, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE

Abstract - Large alphabet source coding is a basic and well-studied problem in data compression. It has many applications, such as compression of natural language text, speech and images. The classic perception of most commonly used methods is that a source is best described over an alphabet which is at least as large as the observed alphabet. In this work we challenge this approach and introduce a conceptual framework in which a large alphabet source is decomposed into components that are as statistically independent as possible. This decomposition allows us to apply entropy coding to each component separately, while benefiting from the reduced alphabet size of each component. We show that in many cases such a decomposition results in a sum of marginal entropies which is only slightly greater than the entropy of the source. Our suggested algorithm, based on a generalization of Binary Independent Component Analysis, is applicable to a variety of large alphabet source coding setups, including classical lossless compression, universal compression and high-dimensional vector quantization. In each of these setups our suggested approach outperforms the most commonly used methods, and it is also significantly easier to implement in most of these cases.

I. INTRODUCTION

Assume a source over an alphabet of size m, from which a sequence of n independent samples is drawn. The classical source coding problem is concerned with finding a sample-to-codeword mapping such that the average codeword length is minimal and the samples are uniquely decodable. This problem has been studied since the early days of information theory, and a variety of algorithms [], [] and theoretical bounds [] have been introduced over the years. The classical source coding problem usually assumes an alphabet size m which is small compared with n. Here we focus on a more difficult (and common) scenario in which the source's alphabet is considered large (for example, word-wise compression of natural language text). In this setup m takes values which are comparable to, or even larger than, the length of the sequence n. The main challenge in large alphabet source coding is that the redundancy of the code, formally defined as the excess number of bits used over the source's entropy, typically increases with the alphabet size [].

In this work we propose a conceptual framework for large alphabet source coding in which we reduce the alphabet size by decomposing the source into multiple components which are as statistically independent as possible. This allows us to encode each component separately, while benefiting from the reduced redundancy of the smaller alphabet. To utilize this concept we introduce a framework based on a generalization of the Binary Independent Component Analysis (BICA) method [5]. This framework efficiently searches for an invertible transformation which minimizes the difference between the sum of marginal entropies (after the transformation is applied) and the joint entropy of the source. Hence, it minimizes the (attainable) lower bound on the average codeword length when marginal entropy coding is applied. We show that while there exist sources which cannot be efficiently decomposed, their portion (of all possible sources over a given alphabet size) is small.
Moreover, we show that the difference between the sum of marginal entropies (after our transformation is applied) and the joint entropy is bounded, on average, by a small constant for every m, when the average is taken uniformly over all possible sources of the same alphabet size m. This implies that our suggested approach is suitable for many sources of increasing alphabet size. We demonstrate our method in a variety of large alphabet source coding setups: classical lossless coding, where the probability distribution of the source is known to both the encoder and the decoder; universal lossless coding, where the decoder is not familiar with the distribution of the source; and lossy coding in the form of vector quantization. We show that our approach outperforms currently known methods in all of these setups for a variety of typical sources.

The rest of this manuscript is organized as follows. After a short notation section, we review previous work on large alphabet source coding in Section III. Section IV presents the generalized BICA problem, proposes two different algorithms, and analyzes their behavior on average and in the worst case. In Section V we apply our suggested framework to the classical lossless coding problem over large alphabets. We then extend the discussion to universal compression in Section VI. In Section VII we further demonstrate our approach on vector quantization, with special attention to high-dimensional sources.

A. Painsky and S. Rosset are with the Statistics Department, Tel Aviv University, Tel Aviv, Israel. Contact: amichaip@eng.tau.ac.il. M. Feder is with the Department of Electrical Engineering, Tel Aviv University, Tel Aviv, Israel. The material in this paper was presented in part at the International Symposium on Information Theory (ISIT) and the Data Compression Conference (DCC).

II. NOTATION

Throughout this paper we use the following standard notation: underlines denote vector quantities, whose components are written without underlines but with an index. For example, the components of the d-dimensional vector X are X_1, X_2, ..., X_d. Random variables are denoted with capital letters, while their realizations are denoted with the respective lower-case letters. P_X(x) = P(X_1 = x_1, X_2 = x_2, ...) is the probability function of X, while H(X) is the entropy of X. This means H(X) = -Σ_x P_X(x) log P_X(x), where the log function denotes a logarithm of base 2 and lim_{x→0} x log(x) = 0. We refer to the binary entropy of a Bernoulli distributed random variable X ~ Ber(p) as H_b(X), while we denote the binary entropy function as h_b(p) = -p log p - (1-p) log(1-p).

III. PREVIOUS WORK

In the classical lossless data compression framework one usually assumes that both the encoder and the decoder are familiar with the probability distribution of the encoded source, X. Encoding a sequence of n memoryless samples drawn from this source then takes, on average, at least n times its entropy H(X) bits, for sufficiently large n []. In other words, if n is large enough that the joint empirical entropy of the samples, Ĥ(X), is close to the true joint entropy of the source, H(X), then H(X) is the minimal average number of bits required to encode a source symbol. Moreover, it can be shown [] that the minimum average codeword length, l_min, of a uniquely decodable code satisfies

H(X) ≤ l_min ≤ H(X) + 1.    (1)

Entropy coding is a lossless data compression scheme that strives to achieve the lower bound l_min = H(X). Two of the most common entropy coding techniques are Huffman coding [] and arithmetic coding []. The Huffman algorithm iteratively constructs a variable-length code table for the source symbols, derived from their probabilities of occurrence. If these probabilities are dyadic (i.e., -log P(x) is an integer for every symbol x ∈ X), the Huffman code achieves l_min = H(X). When the probabilities are not dyadic, the Huffman code does not achieve the lower bound of (1) and may result in an average codeword length of up to H(X) + 1. Moreover, although a Huffman code is theoretically easy to construct (linear in the number of symbols, assuming they are sorted by probability), it becomes a practical challenge to implement as the number of symbols increases [6]. Huffman codes achieve the minimum average codeword length among all uniquely decodable codes that assign a separate codeword to each symbol. However, if the probability of one of the symbols is close to 1, a Huffman code with an average codeword length close to the entropy can only be constructed if a large number of symbols is coded jointly.
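To make the bound (1) concrete, the following sketch (our illustration, not taken from the paper; it uses the standard textbook construction) builds a binary Huffman code with Python's heapq and checks that the resulting average codeword length lies within one bit of the entropy.

```python
import heapq
import math

def huffman_code(pmf):
    """Minimal binary Huffman construction (textbook version, for illustration only).

    pmf: dict symbol -> probability. Returns dict symbol -> codeword string."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    code = {s: "" for s in pmf}
    counter = len(heap)                 # tie-breaker so tuples never compare lists
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1:                 # prepend one bit to every symbol in each merged group
            code[s] = "0" + code[s]
        for s in syms2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return code

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

if __name__ == "__main__":
    pmf = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # dyadic example
    code = huffman_code(pmf)
    avg_len = sum(pmf[s] * len(code[s]) for s in pmf)
    # For dyadic probabilities avg_len equals H(X); in general H(X) <= avg_len < H(X) + 1.
    print(code, avg_len, entropy(pmf))
```

For the dyadic example above the average length equals the entropy exactly; perturbing the probabilities away from dyadic values shows the gap of up to one bit discussed in the text.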
The popular method of arithmetic coding is designed to overcome this problem. In arithmetic coding, instead of using a sequence of bits to represent a symbol, we represent it by a subinterval of the unit interval []. This means that the code for a sequence of symbols is an interval whose length decreases as more symbols are added to the sequence. This property allows the coding scheme to be incremental: the code for an extension of a sequence can be calculated directly from the code for the original sequence. Moreover, the codeword lengths are not restricted to be integral. The arithmetic coding procedure achieves an average length for the block that is within 2 bits of its entropy. Although this is not necessarily optimal for any fixed block length (as we showed for the Huffman code), the procedure is incremental, can be used for any block length, and does not require the source probabilities to be dyadic. However, arithmetic codes are more complicated to implement and are less likely to practically achieve the entropy of the source as the number of symbols increases. More specifically, due to the well-known underflow and overflow problems, finite-precision implementations of traditional adaptive arithmetic coding cannot work if the alphabet size exceeds a certain limit [7]. For example, the widely used arithmetic coder of Witten et al. [] cannot work when the alphabet size is greater than roughly 2^15, and the improved arithmetic coder of Moffat et al. [8] extends the supported alphabet size only by resorting to low-precision arithmetic, at the expense of compression performance.

Notice that a large number of symbols does not only make entropy codes difficult to implement: as the alphabet size increases, an exponentially larger number of samples is required for the empirical entropy to converge to the true entropy. Therefore, when dealing with sources over large alphabets we usually turn to a universal compression framework. Here we assume that the empirical probability distribution is not necessarily equal to the true distribution, and is unknown to the decoder. This means that a compressed representation of the samples now involves two parts: the compressed samples and an overhead redundancy (where the redundancy is defined as the number of bits used to transmit a message minus the number of bits of actual information in the message). As mentioned above, encoding a sequence of n samples drawn from a memoryless source X requires at least n times the empirical entropy Ĥ(X) bits; this is attained by entropy coding with respect to the source's empirical distribution. The redundancy, on the other hand, may be quantified in several ways. One common way of measuring coding redundancy is the minimax criterion []. Here, the worst-case redundancy is the lowest number of extra bits (over the empirical entropy) required in the worst case (that is, over all sequences) by any possible encoder. Many worst-case redundancy results are known when the source's alphabet is finite.

A succession of papers initiated by [9] shows that for the collection I_n^m of i.i.d. distributions over length-n sequences drawn from an alphabet of fixed size m, the worst-case redundancy behaves asymptotically as ((m-1)/2) log n as n grows. Orlitsky and Santhanam [] extended this result to cases where m varies with n. The standard compression scheme they introduce distinguishes three regimes, m = o(n), n = o(m) and m = Θ(n), and provides leading-term asymptotics and bounds for the worst-case minimax redundancy in each of these ranges of the alphabet size. Szpankowski and Weinberger [] completed this study, providing the precise asymptotics for these ranges. For the purpose of our work we adopt the leading terms of their results, showing that the worst-case minimax redundancy R̂(I_n^m) behaves as follows as n grows:

(i) for m = o(n):  R̂(I_n^m) ≈ ((m-1)/2) log(n/m) + (m/2) log e + ...    (2)

(ii) for n = o(m):  R̂(I_n^m) ≈ n log(m/n) + n log e - ...    (3)

(iii) for m = αn + l(n):  R̂(I_n^m) ≈ n log B_α + l(n) log C_α - log A_α    (4)

where α is a positive constant, l(n) = o(n), and A_α, B_α and C_α are explicit constants depending only on α (with C_α = 1/2 + sqrt(1/4 + 1/α), A_α = C_α + 2/α and B_α = α C_α^{α+2} e^{-1/C_α}); the lower-order terms, omitted here, can be found in [].

In a landmark paper, Orlitsky et al. [] presented a novel framework for universal compression of memoryless sources over unknown and possibly infinite alphabets. According to their framework, the description of any string, over any alphabet, can be viewed as consisting of two parts: the symbols appearing in the string and the pattern that they form. For example, the string "abracadabra" can be described by conveying the pattern "12314151231" together with the dictionary

index:  1 2 3 4 5
letter: a b r c d

Together, the pattern and dictionary specify that the string "abracadabra" consists of the first letter to appear (a), followed by the second letter to appear (b), then the third to appear (r), then the first that appeared (a again), the fourth (c), and so on. A compressed string therefore consists of a compression of the pattern and of its corresponding dictionary. Orlitsky et al. derived bounds for pattern compression, showing that the redundancy of pattern compression under i.i.d. distributions over potentially infinite alphabets grows only sub-linearly in n (on the order of n^{1/3} bits). Therefore, assuming the alphabet size is m and the number of uniquely observed symbols is n_0, the dictionary can be described in n_0 log m bits, leading to an overall lower bound of roughly n_0 log m plus a sub-linear term on the compression redundancy.

An additional (and very common) universal compression scheme is canonical Huffman coding []. A canonical Huffman code is a particular type of Huffman code with unique properties which allow it to be described in a very compact manner. The advantage of a canonical Huffman tree is that it can be encoded in fewer bits than a fully described tree. Since a canonical Huffman codebook can be stored especially efficiently, most compressors start by generating a non-canonical Huffman codebook and then convert it to canonical form before use. In canonical Huffman coding the bit length of each symbol is the same as in the traditional Huffman code; however, each codeword is replaced with a new codeword (of the same length), such that subsequent symbols are assigned consecutive binary numbers. For example, consider a Huffman code for four symbols, A to D (the codeword values below are an illustrative instance consistent with the code lengths, since the original values are not recoverable from this transcription):

symbol:   A   B   C    D
codeword: 11  0   101  100

Applying canonical Huffman coding to it we obtain

symbol:   B   A   C    D
codeword: 0   10  110  111

This way we do not need to store the entire Huffman mapping, but only a list of all symbols in increasing order of their bit lengths together with the number of symbols for each bit length. This allows a more compact representation of the code, and hence lower redundancy.
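The canonical assignment just described is easy to state in code. The sketch below (an illustration we add here, not taken from the paper) assigns consecutive binary codewords given only the per-symbol bit lengths of a valid prefix code.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codewords from bit lengths.

    lengths: dict symbol -> codeword length (assumed to satisfy Kraft's inequality).
    Symbols are processed by increasing length (ties broken alphabetically);
    each symbol gets the next binary number, left-shifted whenever the length grows."""
    code = 0
    prev_len = 0
    out = {}
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)   # append zeros when moving to a longer length
        out[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return out

# Example with the lengths used above: only the symbol order (by length) and the
# number of symbols per length need to be stored to reproduce these codewords.
print(canonical_codes({"A": 2, "B": 1, "C": 3, "D": 3}))
# {'B': '0', 'A': '10', 'C': '110', 'D': '111'}
```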
An additional class of data encoding methods which we refer to in this work is lossy compression. In the lossy compression setup one applies inexact approximations for representing the encoded content. In this work we focus on vector quantization, in which a high-dimensional vector X ∈ R^d is to be represented by a finite number of points. Vector quantization works by clustering the observed samples of the vector X into groups, where each group is represented by its centroid point, as in k-means and other clustering algorithms. The centroid points that represent the observed samples are then compressed in a lossless manner.

In the lossy compression setup one is usually interested in minimizing the number of bits which represent the data for a given distortion (or, equivalently, minimizing the distortion for a given compressed data size). The rate-distortion function defines the lower bound on this objective. It is defined as

R(D) = min_{P(Y|X)} I(X; Y)  subject to  E{D(X, Y)} ≤ D    (5)

where X is the source, Y is the recovered version of X and D(X, Y) is some distortion measure between X and Y. Notice that since the quantization is a deterministic mapping between X and Y, we have I(X; Y) = H(Y).

Entropy Constrained Vector Quantization (ECVQ) is an iterative method for clustering the observed samples of X into centroid points which are then represented by a minimal average codeword length. The ECVQ algorithm minimizes the Lagrangian

L = E{D(X, Y)} + λ E{l(X)}    (6)

where λ is the Lagrange multiplier and E{l(X)} is the average codeword length for each symbol in X. The ECVQ algorithm performs an iterative local minimization similar to the generalized Lloyd algorithm []: for a given clustering of samples it constructs an entropy code that minimizes the average codeword length of the centroids; then, for a given coding of centroids, it clusters the observed samples such that the average distortion, biased by the length of the codeword, is minimized. This process continues until local convergence. The ECVQ algorithm performs local optimization (as a variant of the k-means algorithm) and is not very scalable in the number of samples. This means that in the presence of a large number of samples, or when the alphabet size of the samples is large enough, the clustering phase of ECVQ becomes impractical. In these cases one usually uses a predefined lattice quantizer and only constructs a corresponding codebook for its centroids.

It is quite evident that large alphabet sources entail a variety of difficulties in all the compression setups mentioned above: it is more complicated to construct an entropy code for them, they result in a large redundancy when universally encoded, and it is much more challenging to design a vector quantizer for them. In the following sections we introduce a framework intended to overcome these drawbacks.

IV. GENERALIZED BINARY INDEPENDENT COMPONENT ANALYSIS

A common implicit assumption of most compression schemes is that the source is best represented over its observed alphabet. We would like to challenge this assumption, suggesting that in some cases there exists a transformation which decomposes a source into multiple, as independent as possible, components whose alphabet size is much smaller.

A. Problem Formulation

Suppose we are given a binary random vector X ~ p of dimension d. We are interested in an invertible transformation Y = g(X) such that Y is of the same dimension and alphabet size, g : {0,1}^d → {0,1}^d, and the components Y_j of Y are as statistically independent as possible. Notice that an invertible transformation of a vector X is actually a one-to-one mapping (i.e., a permutation) of its m = 2^d alphabet symbols; therefore, there exist 2^d! possible invertible transformations. To quantify the statistical independence among the components of the vector Y we use the well-known total correlation measure, a multivariate generalization of the mutual information,

C(Y) = Σ_{j=1}^{d} H_b(Y_j) - H(Y).    (7)

This measure can also be viewed as the cost of encoding the vector Y component-wise, as if its components were statistically independent, compared to its true entropy.
Notice that the total correlation is non-negative and equals zero if and only if the components of Y are mutually independent. Therefore, "as statistically independent as possible" may be quantified by minimizing C(Y). The total correlation measure was first considered as an objective for minimal redundancy representation by Barlow [5]. It is also not new to finite field ICA problems, as demonstrated in [6]. Since we define Y to be an invertible transformation of X we have H(Y) = H(X), and our minimization objective becomes

min Σ_{j=1}^{d} H_b(Y_j).    (8)

We notice that P(Y_j = 1) is the sum of the probabilities of all words whose j-th bit equals 1. We further notice that the optimal transformation is not unique: for example, we can always invert the j-th bit of all words, or even shuffle the bits, and achieve the same minimum. In the following sections we review and introduce several methods for solving (8). As a first step towards this goal we briefly describe the generalized BICA method; a complete derivation of this framework appears in [7]. Sections IV-C, IV-D and IV-E then provide a simplified novel method for (8) and discuss its theoretical properties.
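To make the objective (8) concrete, the following sketch (ours, with hypothetical variable names) evaluates the sum of marginal bit entropies for a candidate relabeling of the symbols of X; minimizing this quantity over all relabelings is exactly the problem defined above, and subtracting the joint entropy gives the total correlation (7).

```python
import math

def h_b(p):
    """Binary entropy function h_b(p), in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def sum_marginal_entropies(pmf, perm):
    """Sum of marginal bit entropies of Y = g(X), i.e. the objective (8).

    pmf:  list of 2^d symbol probabilities, pmf[i] = P(X = i).
    perm: list of length 2^d; perm[i] is the symbol that X = i is mapped to."""
    m = len(pmf)
    d = m.bit_length() - 1            # m is assumed to be a power of two
    total = 0.0
    for j in range(d):
        # P(Y_j = 1) is the total probability of all words whose j-th bit is 1
        p_one = sum(p for i, p in enumerate(pmf) if (perm[i] >> j) & 1)
        total += h_b(p_one)
    return total

def joint_entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Example: the identity relabeling; C(Y) is the gap between the two quantities.
pmf = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.0078125]
identity = list(range(8))
print(sum_marginal_entropies(pmf, identity) - joint_entropy(pmf))
```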

B. Piecewise Linear Relaxation Algorithm

In this section we briefly review our suggested method, which appears in detail in [7]. Let us first notice that the problem we are dealing with, (8), is a concave minimization problem over a discrete permutation set, which is a hard problem. However, assume for the moment that instead of our true objective (8) we have a simpler linear objective function,

L(Y) = Σ_{j=1}^{d} (a_j π_j + b_j) = Σ_{i=1}^{m} c_i P(Y = y(i)) + d_0    (9)

where π_j = P(Y_j = 1) and the last equality changes the summation over the d components into a summation over all m = 2^d symbols. To minimize this objective function over the given probabilities p, we simply sort these probabilities in descending order and allocate them such that the largest probability goes with the smallest coefficient c_i, and so on. Assuming both the coefficients and the probabilities are known and sorted in advance, the complexity of this procedure is linear in m.

We now turn to the generalized BICA problem defined in (8). Since our objective is concave, we would first like to bound it from above with a piecewise linear function consisting of k pieces, as shown in Figure 1. We show that solving the piecewise linear problem approximates the solution to (8) as closely as we want.

Fig. 1: A piecewise linear (k = 4) relaxation of the binary entropy h(p), with the unit interval partitioned into regions R_1, ..., R_4.

First, we notice that all π_j's are exchangeable (in the sense that we can always interchange them and achieve the same result). This means we can find the optimal solution to the piecewise linear problem by going over all possible combinations of placing the d variables π_j in the k different regions of the piecewise linear function. For each of these combinations we need to solve a linear problem of the form (9), where the minimization is with respect to the allocation of the given probabilities p, with additional constraints on the range of each π_j. For example, assume d = 3 and that the optimal solution is such that two of the π_j's (say π_1 and π_2) are in the first region R_1 and π_3 is in the second region R_2; then we need to solve the constrained linear problem

minimize  a_1(π_1 + π_2) + 2b_1 + a_2 π_3 + b_2
subject to  π_1, π_2 ∈ R_1,  π_3 ∈ R_2    (10)

where the minimization is over the allocation of the given {p_i}, which determines the corresponding π_j's, as demonstrated in (9). This problem again seems hard. However, if we attempt to solve it without the constraints we notice the following:

1) If the collection of π_j's which defines the optimal solution to the unconstrained linear problem happens to meet the constraints, then it is obviously also the optimal solution of the constrained problem.

2) If the collection of π_j's of the optimal solution does not meet the constraints (say, π_3 ∈ R_1), then, due to the concavity of the entropy function, there exists a different combination, with a different constrained linear problem,

minimize  a_1 π_1 + b_1 + a_2(π_2 + π_3) + 2b_2
subject to  π_1 ∈ R_1,  π_2, π_3 ∈ R_2

in which this set of π_j's necessarily achieves a lower minimum (since a_1 x + b_1 < a_2 x + b_2 for all x ∈ R_1).

Therefore, in order to find the optimal solution to the piecewise linear problem, all we need to do is go over all possible combinations of placing the π_j's in the k regions and, for each combination, solve an unconstrained linear problem (which is solved in time linear in m). If the solution does not meet the constraints, then the assumption that the optimal π_j's reside within this combination's regions is false. Otherwise, if the solution does meet the constraints, it is a candidate for the global optimal solution.
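The unconstrained linear sub-problem solved for each combination is just a rearrangement step: match the largest probabilities with the smallest coefficients. A minimal sketch of that step (ours; the coefficient values are made up for illustration):

```python
def min_linear_assignment(probs, coeffs):
    """Minimize sum_i c_i * P(Y = y(i)) over relabelings of the symbols, as in (9).

    By the rearrangement inequality the minimum is attained by pairing the largest
    probability with the smallest coefficient, the second largest with the second
    smallest, and so on. Returns the minimal value and the assignment."""
    order_p = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    order_c = sorted(range(len(coeffs)), key=lambda i: coeffs[i])
    assignment = {}
    value = 0.0
    for ip, ic in zip(order_p, order_c):
        assignment[ic] = probs[ip]       # symbol index ic receives probability probs[ip]
        value += coeffs[ic] * probs[ip]
    return value, assignment

# Hypothetical coefficients c_i obtained from some piecewise linear upper bound:
probs = [0.4, 0.3, 0.2, 0.1]
coeffs = [2.0, 0.5, 1.0, 3.0]
print(min_linear_assignment(probs, coeffs))
```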

The number of combinations we need to go through equals the number of ways of placing d identical balls in k boxes, which is (for a fixed k)

binom(d + k - 1, d) = O(d^k).    (11)

Assuming the coefficients are all known and sorted in advance, the overall complexity of our suggested algorithm, as d → ∞, is therefore O(d^k · 2^d) = O(m log^k m). Notice that any approach which exploits the full statistical description of X must go over the probabilities of all of its symbols at least once, so a computational load of at least O(m) = O(2^d) seems inevitable. Still, this is significantly smaller than the O(m!) = O(2^d!) required by a brute-force search over all possible permutations. It is also important to notice that even though the asymptotic complexity of our approximation algorithm is O(m log^k m), it takes only a few seconds to run an entire experiment on a standard personal computer (with k = 8, for example). The reason is that the m factor comes from the complexity of sorting a vector and multiplying two vectors, operations which are computationally efficient in most available software. Moreover, if the coefficients of the linear problems (9) are calculated, sorted and stored in advance, we can place them in matrix form and multiply the matrix by the (sorted) vector p; the minimum of this product is exactly the solution to the linear approximation problem. Therefore, the practical asymptotic complexity of the approximation algorithm drops to a single multiplication of a (log^k(m) × m) matrix with an (m × 1) vector. Even though the complexity of this method is significantly lower than full enumeration, it may still be computationally infeasible as m increases. Therefore, we suggest a simpler (greedy) solution which is much easier to implement and apply.

C. Order Algorithm

As mentioned above, the minimization problem (8) is combinatorial in its essence and is consequently considered hard. We therefore suggest a simplified greedy algorithm which strives to sequentially minimize each term of the summation in (8), H_b(Y_j), for j = 1, ..., d. Without loss of generality, let us start by minimizing H_b(Y_1), which corresponds to the marginal entropy of the most significant bit (msb). Since the binary entropy is monotonically increasing in the range [0, 1/2], we would like to find a permutation of p that minimizes the sum of half of its values. This means we should order the p_i's so that the half of the p_i's with the smallest values is assigned to the event {Y_1 = 0}, while the other half (with the largest values) is assigned to {Y_1 = 1}. For example, assuming m = 8 and p_1 ≤ p_2 ≤ ... ≤ p_8, a permutation which minimizes H_b(Y_1) is

codeword:    000 001 010 011 100 101 110 111
probability: p_1 p_2 p_3 p_4 p_8 p_5 p_6 p_7

We now proceed to minimize the marginal entropy of the second most significant bit, H_b(Y_2). Again, we would like to assign to {Y_2 = 0} the smallest possible values of the p_i's. However, since we have already determined which p_i's are assigned to each value of the msb, all we can do is reorder the p_i's without changing the msb. This means we again sort the p_i's so that the smallest possible values are assigned to {Y_2 = 0} without changing the msb. In our example this leads to

codeword:    000 001 010 011 100 101 110 111
probability: p_1 p_2 p_3 p_4 p_6 p_5 p_8 p_7

Continuing in the same manner, we reorder the p_i's to minimize H_b(Y_3) without changing the previous bits, which results in

codeword:    000 001 010 011 100 101 110 111
probability: p_1 p_2 p_3 p_4 p_5 p_6 p_7 p_8

Therefore, a greedy solution to (8) which sequentially minimizes H_b(Y_j) is attained by simply ordering the joint distribution p in ascending (or, equivalently, descending) order: the order permutation sorts the probabilities p_1, ..., p_m in ascending order and maps the i-th symbol (in its binary representation) to the i-th smallest probability.
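A short sketch of the order permutation (our illustration; the function names are ours): it sorts the probability vector, assigns the i-th binary word the i-th smallest probability, and reports the resulting sum of marginal bit entropies.

```python
import math

def h_b(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def order_permutation_cost(pmf):
    """Apply the order permutation and return sum_j H_b(Y_j), the objective (8).

    pmf: list of 2^d probabilities. After sorting in ascending order, the i-th
    binary word (i = 0, ..., 2^d - 1) is assigned the i-th smallest probability."""
    sorted_p = sorted(pmf)
    d = len(pmf).bit_length() - 1
    total = 0.0
    for j in range(d):
        p_one = sum(p for i, p in enumerate(sorted_p) if (i >> j) & 1)
        total += h_b(p_one)
    return total

def joint_entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

pmf = [0.02, 0.40, 0.05, 0.03, 0.20, 0.10, 0.12, 0.08]
# The cost is always at least the joint entropy; the gap is the residual redundancy C(p, g_ord).
print(order_permutation_cost(pmf), joint_entropy(pmf))
```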
At this point it is unclear how well the order permutation performs, compared both with the relaxed BICA discussed above and with the optimal permutation which minimizes (8). In the following sections we introduce theoretical properties which demonstrate its effectiveness.

D. Worst-case Independent Representation

We now introduce the theoretical properties of our suggested algorithms. Naturally, we would like to quantify how much we lose by representing a given random vector X as if its components were statistically independent. Therefore, for any given random vector X ~ p and an invertible transformation Y = g(X), we denote the cost function C(p, g) = Σ_{j=1}^{d} H_b(Y_j) - H(X), as appears in (8).

Since both our methods strongly depend on the given probability distribution p, we focus on the worst case and the average case of C(p, g) with respect to p. Let us denote the order permutation as g_ord and the permutation found by the piecewise linear relaxation as g_lin. We further define g_bst as the permutation that attains the lower value of C(p, g) between g_lin and g_ord, that is, g_bst = argmin_{g ∈ {g_lin, g_ord}} C(p, g). In addition, we define g_opt as the optimal permutation, the one that minimizes (8) over all possible permutations. Therefore, for any given p we have C(p, g_opt) ≤ C(p, g_bst) ≤ C(p, g_ord).

In this section we examine the worst-case performance of both of our suggested algorithms. Specifically, we would like to quantify the maximum of C(p, g) over all joint probability distributions p of a given alphabet size m.

Theorem 1: For any random vector X ~ p over an alphabet of size m we have

max_p C(p, g_opt) = Θ(log m).

Proof: We first notice that Σ_{j=1}^{d} H_b(Y_j) ≤ d = log m, and that H(X) ≥ 0. Therefore C(p, g_opt) is bounded from above by log m. Let us also show that this bound is tight, in the sense that there exists a joint probability distribution p̄ for which C(p̄, g_opt) is linear in log m. Let p_1 = p_2 = ... = p_{m-1} = 1/(6(m-1)) and p_m = 5/6. Then p̄ is ordered, and assigning the symbols to p̄ in decreasing order (as mentioned in Section IV-C) results in an optimal permutation, for which every component satisfies P(Y_j = 1) = (m/2) · 1/(6(m-1)) = m/(12(m-1)); this is the minimal possible value of P(Y_j = 1) that can be achieved by summing any m/2 elements of p̄. Hence

C(p̄, g_opt) = Σ_{j=1}^{d} H_b(Y_j) - H(X) = log(m) · h_b(m/(12(m-1))) - h_b(1/6) - (1/6) log(m-1),

which grows linearly in log m, since h_b(m/(12(m-1))) → h_b(1/12) > 1/6 as m grows. Therefore max_p C(p, g_opt) = Θ(log m).

Theorem 1 shows that, in the worst case, even the optimal permutation achieves a sum of marginal entropies which is Θ(log m) bits greater than the joint entropy. This means that there exists at least one source X with a joint probability distribution which is impossible to encode as if its components were independent without losing Θ(log m) bits. However, we now show that such sources are very rare.

E. Average-case Independent Representation

In this section we show that the expected value of C(p, g_opt) is bounded by a small constant when averaging uniformly over all possible distributions p over an alphabet of size m. To prove this, we recall that C(p, g_opt) ≤ C(p, g_ord) for any given probability distribution p. Therefore, we would like to find the expectation of C(p, g_ord), where the random variables p_1, ..., p_m are distributed uniformly over the simplex.

Proposition 1: Let X ~ p be a random vector of alphabet size m and joint probability distribution p. The expected joint entropy of X, where the expectation is taken over a uniform simplex of joint probability distributions p, is

E_p{H(X)} = (ψ(m+1) - ψ(2)) log e,

where ψ is the digamma function. The proof of this proposition is left to the Appendix.

We now turn to examine the expected sum of marginal entropies, Σ_{j=1}^{d} H_b(Y_j), under the order permutation. As described above, the order permutation sorts the probability distribution p_1, ..., p_m in ascending order and maps the i-th symbol (in its binary representation) to the i-th smallest probability. Let us denote by p_(1) ≤ ... ≤ p_(m) the ascending ordered probabilities. Bairamov et al. [8] show that the expected value of p_(i) is

E{p_(i)} = (1/m) Σ_{k=m-i+1}^{m} 1/k = (1/m)(K_m - K_{m-i}),    (13)

where K_m = Σ_{k=1}^{m} 1/k is the m-th harmonic number.
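Equation (13) is easy to sanity-check numerically: drawing p uniformly from the simplex (a Dirichlet distribution with all parameters equal to 1) and averaging the sorted components should reproduce (K_m - K_{m-i})/m. The following sketch (ours) does exactly that.

```python
import random

def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

def expected_order_stat(m, i):
    """E[p_(i)] for p uniform on the m-simplex, per (13): (K_m - K_{m-i}) / m."""
    return (harmonic(m) - harmonic(m - i)) / m

def sample_uniform_simplex(m, rng):
    """A uniform point on the simplex via normalized exponential variates."""
    e = [rng.expovariate(1.0) for _ in range(m)]
    s = sum(e)
    return sorted(x / s for x in e)

if __name__ == "__main__":
    m, trials = 8, 20000
    rng = random.Random(0)
    acc = [0.0] * m
    for _ in range(trials):
        for i, v in enumerate(sample_uniform_simplex(m, rng)):
            acc[i] += v / trials
    for i in range(m):
        # Monte Carlo average versus the closed form of (13)
        print(i + 1, round(acc[i], 4), round(expected_order_stat(m, i + 1), 4))
```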

Denote by A ∈ {0,1}^{m×d} the matrix whose i-th row is the binary representation of the i-th symbol, when the symbols are listed in ascending order; that is, entry A_ij is the j-th bit of the i-th symbol. The expected sum of the marginal entropies of Y, where the expectation is over a uniform simplex of joint probability distributions p, then satisfies

E_p{ Σ_{j=1}^{d} H_b(Y_j) }  ≤(a)  Σ_{j=1}^{d} h_b( E_p{Y_j} )  =(b)  Σ_{j=1}^{d} h_b( (1/m) Σ_{i=1}^{m} A_ij (K_m - K_{m-i}) )  =(c)  Σ_{j=1}^{d} h_b( K_m/2 - (1/m) Σ_{i=1}^{m} A_ij K_{m-i} ),    (14)

where (a) follows from Jensen's inequality, (b) follows from (13), and (c) follows since Σ_{i=1}^{m} A_ij = m/2 for all j = 1, ..., d.

We now derive asymptotic bounds on the expected difference between the sum of Y's marginal entropies and the joint entropy of X, as it appears in (8).

Theorem 2: Let X ~ p be a random vector of alphabet size m and joint probability distribution p, and let Y = g_ord(X) be the result of the order permutation. For d → ∞, the expected value of C(p, g_ord) over a uniform simplex of joint probability distributions p satisfies

E_p{C(p, g_ord)} = E_p{ Σ_{j=1}^{d} H_b(Y_j) - H(X) } < c_0 + O(1/m),

where c_0 is an explicit constant smaller than 0.1 bit.

Proof: Let us first derive the expected marginal entropy of the most significant bit, j = 1, according to (14). The set {i : A_i1 = 1} is the upper half of the ordered symbols, so

E_p{H_b(Y_1)} ≤ h_b( (1/m) Σ_{i=m/2+1}^{m} (K_m - K_{m-i}) ) = h_b( K_m/2 - (1/m) Σ_{i=0}^{m/2-1} K_i ) = h_b( 1/2 + 1/(2 log e) + O(1/m) ),    (15)

where the last step uses the harmonic-number identity Σ_{i=1}^{m} K_i = (m+1)K_m - m and the bound ln(m) + γ < K_m ≤ ln(m) + γ + 1/(2m), with γ the Euler-Mascheroni constant [9], together with the concavity of the binary entropy. By the symmetry of h_b, (15) equals h_b(1/2 - 1/(2 log e)) + O(1/m), which is roughly 0.62 bits. Repeating the same derivation for the next most significant bits j = 2, 3, ... yields analogous bounds of the form

E_p{H_b(Y_j)} ≤ h_b(1/2 - δ_j) + O(1/m),    (16)

where the δ_j are explicit positive constants that decrease with j (the less significant the bit, the closer its marginal is to a fair coin).

We may now evaluate the sum of expected marginal entropies of Y. For simplicity of derivation, let us bound E_p{H_b(Y_j)} according to (16) only for the first few most significant bits, and upper bound E_p{H_b(Y_j)} for all remaining components by h_b(1/2) = 1. This means that for d → ∞ we have

E_p{ Σ_{j=1}^{d} H_b(Y_j) } < d - c_1 + O(1/m)    (17)

for an explicit constant c_1 > 1/2. The expected joint entropy may also be expressed in a more compact manner: Proposition 1 shows that E_p{H(X)} = (ψ(m+1) - ψ(2)) log e, and following the inequality in [9], the digamma function is bounded from below by ψ(m+1) = K_m - γ > ln(m) + 1/(2(m+1)). Therefore we conclude that for d → ∞,

E_p{ Σ_{j=1}^{d} H_b(Y_j) - H(X) } < (d - c_1) - log(m) + (ψ(2) - 1/(2(m+1))) log e + O(1/m) < ψ(2) log e - c_1 + O(1/m) = c_0 + O(1/m),    (18)

where the last step uses d = log m; recall that ψ(2) log e ≈ 0.61 bits and c_1 > 1/2, so c_0 is indeed a small constant.

In addition, we would like to evaluate the expected difference between the sum of marginal entropies and the joint entropy of X when no permutation is applied at all. This will serve as a reference for the upper bound achieved in Theorem 2.

Theorem 3: Let X ~ p be a random vector of alphabet size m and joint probability distribution p. The expected difference between the sum of marginal entropies and the joint entropy of X, when the expectation is taken over a uniform simplex of joint probability distributions p, satisfies

E_p{ Σ_{j=1}^{d} H_b(X_j) - H(X) } < ψ(2) log e ≈ 0.61.

Proof: We first notice that P(X_j = 1) equals the sum of one half of the probabilities p_i, i = 1, ..., m, for every j = 1, ..., d. Assume the p_i's are randomly (and uniformly) assigned to the symbols; then E{P(X_j = 1)} = 1/2 for every j = 1, ..., d. Hence

E_p{ Σ_{j=1}^{d} H_b(X_j) - H(X) } = Σ_{j=1}^{d} E_p{H_b(X_j)} - E_p{H(X)} < d - log(m) + (ψ(2) - 1/(2(m+1))) log e < ψ(2) log e.

To conclude, we have shown that for a random vector X over an alphabet of size m,

E_p{C(p, g_opt)} ≤ E_p{C(p, g_bst)} ≤ E_p{C(p, g_ord)} < c_0 + O(1/m)

for d → ∞, where the expectation is over a uniform simplex of joint probability distributions p. This means that when the alphabet size is large enough, even the simple order permutation achieves, on average, a sum of marginal entropies which is only a small fraction of a bit greater than the joint entropy, when all possible probability distributions p are equally likely to appear. Moreover, comparing Theorems 2 and 3 shows that the simple order permutation reduces the expected difference between the sum of the marginal entropies and the joint entropy of X by more than half a bit, for sufficiently large m.

V. LARGE ALPHABET SOURCE CODING

Assume a classic compression setup in which both the encoder and the decoder are familiar with the joint probability distribution of the source X ~ p, and the number of observations n is sufficiently large in the sense that Ĥ(X) ≈ H(X). As discussed above, both Huffman and arithmetic coding entail a growing redundancy and a quite involved implementation as the alphabet size increases. The Huffman code guarantees a redundancy of at most a single bit for every alphabet size, depending on the dyadic structure of p. Arithmetic coding, on the other hand, does not require a dyadic p, but only guarantees a redundancy of up to two bits, and is practically limited to smaller alphabet sizes [], [7]. In other words, both Huffman and arithmetic coding are quite likely to have an average codeword length which is greater than H(X), and become complicated (or sometimes even impossible) to implement as m increases.

To overcome these drawbacks, we suggest a simple solution: first apply an invertible transformation that makes the components of X as statistically independent as possible, and then apply entropy coding to each of the components separately. This scheme results in a redundancy which we previously defined as C(p, g) = Σ_{j=1}^{d} H_b(Y_j) - H(X); in return, it allows us to apply Huffman or arithmetic encoding to each component separately, hence over a binary alphabet. Moreover, notice that we can group several components Y_j into blocks, so that the joint entropy of each block is necessarily no greater than the sum of the marginal entropies of its components. Specifically, denote by b the number of components in each block and by B the number of blocks, so that b · B = d. Then for each block v = 1, ..., B we have

H(Y^(v)) ≤ Σ_{u=1}^{b} H_b(Y_u^(v)),    (19)

where H(Y^(v)) is the entropy of block v and H_b(Y_u^(v)) is the marginal entropy of the u-th component of block v. Summing over all B blocks we have

Σ_{v=1}^{B} H(Y^(v)) ≤ Σ_{v=1}^{B} Σ_{u=1}^{b} H_b(Y_u^(v)) = Σ_{j=1}^{d} H_b(Y_j).    (20)

This means we can always apply our suggested invertible transformation, which minimizes Σ_{j=1}^{d} H_b(Y_j), then group the components into B blocks and encode each block separately; this results in Σ_{v=1}^{B} H(Y^(v)) ≤ Σ_{j=1}^{d} H_b(Y_j). By doing so, we increase the alphabet size of each block (to a point which is still not problematic for Huffman or arithmetic coding) while at the same time decreasing the redundancy. We discuss different considerations in choosing the number of blocks B in the following sections.

A more direct approach to minimizing the sum of block entropies Σ_{v=1}^{B} H(Y^(v)) is to regard each block as a symbol over a larger alphabet of size 2^b. This allows us to seek an invertible transformation which minimizes the sum of marginal entropies, where each marginal entropy corresponds to a marginal probability distribution over an alphabet of size 2^b. This minimization problem is referred to as generalized ICA over finite alphabets and is discussed in detail in [7]. However, notice that both the piecewise linear relaxation algorithm (Section IV-B) and the solutions discussed in [7] require an extensive computational effort to find a minimizer for (8) as the alphabet size increases. Therefore, we suggest applying the greedy order permutation as m grows. This solution may result in quite a large redundancy for some joint probability distributions p (as shown in Section IV-D); however, when averaging uniformly over all possible p's, the redundancy is bounded by a small constant as the alphabet size increases (Section IV-E). Moreover, the ordering approach simply requires sorting the values of p, which is significantly faster than constructing a Huffman dictionary or an arithmetic encoder.

To illustrate our suggested scheme, consider a source X ~ p over an alphabet of size m which follows the Zipf's law distribution,

P(k; s, m) = k^{-s} / Σ_{l=1}^{m} l^{-s},

where m is the alphabet size and s is the skewness parameter. The Zipf's law distribution is a commonly used heavy-tailed distribution, mostly in modeling of natural (real-world) quantities; it is widely used in the physical and social sciences, linguistics, economics and many other fields. We would like to design an entropy code for X with m = 2^16 and different values of s. We first apply a standard Huffman code as an example of a common entropy coding scheme (notice that we are not able to construct an arithmetic encoder, as the alphabet size is too large [7]). We then apply our suggested order permutation scheme (Section IV-C), in which we sort p in descending order, followed by arithmetic encoding of each of the components separately. We further group these components into two separate blocks (as discussed above) and apply an arithmetic encoder to each of the blocks. We repeat this experiment for a range of values of the parameter s; Figure 2 summarizes the results, which are discussed below (see also the sketch that follows).
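The experiment just described is easy to reproduce in outline. The following sketch (ours; the parameter values and function names are illustrative, it works with the exact Zipf distribution rather than drawn samples, and it uses a smaller alphabet than 2^16 so it runs quickly) applies the order permutation and compares the joint entropy with the sum of bit-marginal entropies and with the entropies of two blocks.

```python
import math

def zipf_pmf(m, s):
    w = [k ** (-s) for k in range(1, m + 1)]
    z = sum(w)
    return [x / z for x in w]

def h(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def order_permutation_entropies(pmf, block_split):
    """Sort pmf ascending (order permutation) and return
    (joint entropy, sum of bit-marginal entropies, sum of two block entropies).

    block_split: number of most significant bits placed in the first block."""
    p = sorted(pmf)
    d = len(p).bit_length() - 1
    bits = []
    for j in range(d):
        q = sum(pi for i, pi in enumerate(p) if (i >> j) & 1)
        bits.append(-q * math.log2(q) - (1 - q) * math.log2(1 - q))
    lo_bits = d - block_split
    block_hi = [0.0] * (1 << block_split)      # marginal over the top bits
    block_lo = [0.0] * (1 << lo_bits)          # marginal over the remaining bits
    for i, pi in enumerate(p):
        block_hi[i >> lo_bits] += pi
        block_lo[i & ((1 << lo_bits) - 1)] += pi
    return h(p), sum(bits), h(block_hi) + h(block_lo)

for s in (1.01, 1.2, 1.4):
    pmf = zipf_pmf(2 ** 10, s)
    # joint entropy <= two-block sum <= bit-wise sum; the gaps are the redundancies
    print(s, [round(x, 3) for x in order_permutation_entropies(pmf, 5)])
```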
Our results show that the Huffman code attains an average codeword length which is very close to the entropy of the source for lower values of s. However, as s increases and the distribution of the source becomes more skewed, the Huffman code diverges from the entropy. Our suggested method, on the other hand, succeeds in attaining an average codeword length which is very close to the entropy of the source for every s, especially as s increases and when each of the two blocks is encoded independently.

VI. UNIVERSAL SOURCE CODING

The classical source coding problem is typically concerned with a source whose alphabet size is much smaller than the length of the sequence; in this case one usually assumes that Ĥ(X) ≈ H(X). However, in many real-world applications such an assumption is not valid. A paradigmatic example is word-wise compression of natural language texts. In this setup we draw a memoryless sequence of words, so that the alphabet size is often comparable to, or even larger than, the length of the source sequence. As discussed above, the main challenge in large alphabet source coding is the redundancy of the code, formally defined as the excess number of bits used over the source's entropy. The redundancy may be quantified as the expected number of extra bits required to code a memoryless sequence drawn from X ~ p when using a code that was constructed for p, rather than the true code optimized for the empirical distribution p̂. Another way to quantify these extra bits is to directly design a code for p̂ and transmit the encoded sequence together with a description of this code.

Here again, we claim that in some cases applying a transformation which decomposes the observed sequence into multiple, as independent as possible, components results in a better compression rate. However, notice that now we also need to consider the number of bits required to describe the transformation itself. In other words, our redundancy involves not only (7) and the designated code for the observed sequence, but also the description of the invertible transformation we applied to the sequence. This means that even the simple order permutation (Section IV-C) requires at most n log m bits to describe, where m is the alphabet size and n is the length of the sequence.

Fig. 2: Zipf's law simulation results. Left: the curve with the squares is the average codeword length using a Huffman code, the curve with the crosses corresponds to the average codeword length using our suggested method when encoding each component separately, and the curve with the asterisks is our suggested method when encoding each of the two blocks separately; the black curve, which tightly lower-bounds all the others, is the entropy of the source. Right: the difference between each encoding method and the entropy of the source, as a function of the skewness parameter s.

This redundancy alone is not competitive with the worst-case redundancy results of Szpankowski and Weinberger [] described in (2)-(4). Therefore, we require a different approach, one which minimizes the sum of marginal entropies (8) but at the same time is simpler to describe. One possible solution is to seek invertible, yet linear, transformations; describing such a transformation requires only log^2 m bits. However, this generalized linear BICA problem is also quite involved. In their work, Attux et al. [6] describe the difficulties in minimizing (8) over the XOR field (linear transformations) and suggest an immune-inspired algorithm for it. Their algorithm, which is heuristic in its essence, demonstrates some promising results; however, it is not very scalable (with respect to the number of components log m) and is not guaranteed to converge to the global optimal solution. Therefore, we would like to modify our suggested approach (Section IV-B) so that the transformation we obtain requires fewer bits to describe.

As in the previous section, we argue that in some setups it is better to split the components of the data into blocks, with b components in each block, and encode the blocks separately. Notice that we may set the value of b so that the blocks are no longer considered as being over a large alphabet (n ≫ 2^b). This way the redundancy of encoding each block separately is again negligible, at the cost of a longer average codeword length. For simplicity of notation we define the number of blocks as B and assume B = d/b is a natural number. Therefore, encoding the d components all together takes n·Ĥ(X) bits for the data itself, plus a redundancy term according to the relevant regime of (2)-(4), while the block-wise compression takes about

n Σ_{v=1}^{B} Ĥ(X^(v)) + B · (2^b/2) · log(n/2^b)    (21)

bits, where the first term is n times the sum of the B empirical block entropies and the second term is B times the per-block redundancy when 2^b = o(n). Two subsequent questions arise from this setup:

1) What is the optimal value of b that minimizes (21)? (A sketch of this comparison appears after this list.)

2) Given a fixed value of b, how can we rearrange the d components into B blocks so that the average codeword length (which is bounded from below by the empirical entropy), together with the redundancy, is as small as possible?

Let us start by fixing b and focusing on the second question. A naive shuffling approach is to exhaustively or randomly search over all possible combinations of clustering the d components into B blocks. Assuming d is quite large, an exhaustive search is practically infeasible. Moreover, the shuffling search space is quite limited and results in a very large value of (7), as shown below. Therefore, a different method is required.
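To illustrate the first question, the following sketch (ours; the per-block redundancy expression (2^b/2)·log(n/2^b) is the hedged approximation used in (21), and the empirical block entropies are stand-ins supplied by the caller) compares the total cost of (21) for several block sizes b.

```python
import math

def blockwise_cost_bits(n, d, block_entropies_by_b):
    """Approximate total cost (21) of block-wise universal coding, per block size b.

    n: sequence length, d: number of binary components.
    block_entropies_by_b: dict b -> list of B = d/b empirical block entropies (bits),
    assumed to be measured from the data by the caller.
    Returns dict b -> estimated total number of bits."""
    costs = {}
    for b, entropies in block_entropies_by_b.items():
        B = d // b
        if B * b != d or 2 ** b >= n:
            continue                   # require an integer number of small-alphabet blocks
        data_bits = n * sum(entropies)
        redundancy = B * (2 ** b / 2.0) * math.log2(n / 2 ** b)
        costs[b] = data_bits + redundancy
    return costs

# Hypothetical empirical block entropies for d = 16 components and n = 10**6 samples:
example = {
    4: [3.1, 2.8, 2.9, 3.0],     # four blocks of 4 bits each
    8: [5.2, 5.6],               # two blocks of 8 bits each
    16: [9.9],                   # a single 16-bit block
}
print(blockwise_cost_bits(10 ** 6, 16, example))
```

As the printout shows, smaller blocks cut the redundancy term dramatically but pay in a larger sum of block entropies; the trade-off between the two is exactly question 1 above.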

We suggest applying our generalized BICA tool as an upper-bound search method for efficiently searching for the minimal possible average codeword length. As in previous sections we define Y = g(X), where g is some invertible transformation of X. Every block of the vector Y satisfies (19), where the entropy terms are now replaced with empirical entropies. In the same manner as in Section V, summing over all B blocks results in (20), where again the entropy terms are replaced with empirical entropies. This means that the sum of the empirical block entropies is bounded from above by the sum of the empirical marginal entropies of the components of Y (with equality iff the components are independently distributed),

Σ_{v=1}^{B} Ĥ(Y^(v)) ≤ Σ_{j=1}^{d} Ĥ_b(Y_j).    (22)

Our suggested scheme works as follows. We first randomly partition the d components into B blocks, estimate the joint probability of each block and apply the generalized BICA on it. The sum of the empirical marginal entropies of each block is an upper bound on the empirical entropy of that block, as described in the previous paragraph. Now, let us randomly shuffle the d components of the vector Y; by shuffle we refer to an exchange of positions of Y's components. Notice that by doing so, the sum of the empirical marginal entropies of the entire vector, Σ_{j=1}^{d} Ĥ_b(Y_j), is maintained. We now apply the generalized BICA on each of the (new) blocks. This minimizes (or at least does not increase) the sum of the empirical marginal entropies of the (new) blocks, which obviously results in a lower sum of empirical marginal entropies of the entire vector Y. It also means that we minimize the left-hand side of (22), which upper bounds the sum of the empirical block entropies, as the inequality in (22) suggests. In other words, in each iteration we decrease (or at least do not increase) an upper bound on our objective. We terminate once a maximal number of iterations is reached or we can no longer decrease the sum of empirical marginal entropies. Therefore, assuming we terminate at iteration I, encoding the data takes about

n Σ_{v=1}^{B} Ĥ^[I](Y^(v)) + B · (2^b/2) · log(n/2^b) + I · B · 2^b · b + I · d · log d    (23)

bits, where the first term refers to the sum of the empirical block entropies at iteration I, the second term is the block redundancy as in (21), the third term refers to the representation of the I·B invertible transformations applied to the blocks during the process, and the fourth term refers to the bit permutations applied at the beginning of each iteration. Hence, to minimize (23) we need to find the optimal trade-off between a low value of Σ_{v=1}^{B} Ĥ^[I](Y^(v)) and a low iteration number I. We may apply this technique with different values of b to find the best compression scheme over all block sizes; a sketch of the iterative procedure appears at the end of this section.

A. Synthetic experiments

In order to demonstrate our suggested method we first generate a dataset according to the Zipf law distribution described earlier. We draw n = 10^6 realizations from this distribution. We encounter n_0 unique words and attain an empirical entropy of about 8.8 bits (while the true entropy is about 8.65 bits). Therefore, compressing the drawn realizations over their given alphabet takes a total of about 9.6 × 10^6 bits, according to the redundancy bounds above. Using the patterns method [], the redundancy we achieve is the redundancy of the pattern plus the size of the dictionary; hence, the compressed size of the data set according to this method is lower bounded by the compressed samples plus roughly n_0 log m bits for the dictionary. In addition to these asymptotic schemes we would also like to compare our method with a common practical approach. For this purpose we apply the canonical version of the Huffman code.
Through the canonical Huffman code we are able to achieve a compression rate of about 9.7 bits per symbol, leading to a total compressed size of close to 10^7 bits.

Let us now apply a block-wise compression. We first demonstrate the behavior of our suggested approach with four blocks (B = 4), as shown in Figure 3. To have a good starting point, we initialize our algorithm with the naive shuffling search method described above; this way we apply our optimization process to the best representation a random bit shuffle could attain (at a negligible d log d redundancy cost). As Figure 3.B shows, we minimize (23) at iteration I = 6, where Σ_{v=1}^{B} Ĥ(Y^(v)) = 9.9 bits, achieving a total of just over 9 × 10^6 bits for the entire dataset.

Table I summarizes the results we achieve for different numbers of blocks B. We see that the lowest compression size is achieved with B = 2, i.e., two blocks. The reason is that, for a fixed n, the per-block redundancy is approximately exponential in the block size b; the redundancy therefore drops exponentially with the number of blocks, while the minimum of Σ_{v=1}^{B} Ĥ(Y^(v)) keeps increasing. In other words, in this example we gain a great redundancy reduction when moving to a two-block representation while not losing too much in terms of the average codeword length we can achieve. We further notice that the optimal number of iterations grows with the number of blocks; this results from the cost of describing the optimal transformation of each block at each iteration, I·B·2^b·b, which increases exponentially with the block size b. Comparing our results with the three methods described above, we reduce the total compression size by several hundred thousand bits compared to the minimum among all our competitors.
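To summarize the procedure used in these experiments, here is a high-level sketch (ours, not the authors' code). The per-block minimizer is taken to be the order permutation of Section IV-C applied to the block's empirical sub-alphabet, standing in for the full generalized BICA, and a candidate shuffle is kept only when it lowers the upper bound (22).

```python
import math
import random

def bit_entropy(samples, pos):
    """Empirical marginal entropy (bits) of bit position pos."""
    n = len(samples)
    p = sum((x >> pos) & 1 for x in samples) / n
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def order_permute_block(samples, bits):
    """Relabel the sub-alphabet spanned by `bits` with the order permutation of
    Section IV-C: the i-th least frequent pattern is mapped to the binary word i."""
    def pattern(x):
        return sum(((x >> pos) & 1) << k for k, pos in enumerate(bits))
    counts = {}
    for x in samples:
        counts[pattern(x)] = counts.get(pattern(x), 0) + 1
    relabel = {pat: i for i, pat in enumerate(sorted(counts, key=counts.get))}
    out = []
    for x in samples:
        new_pat = relabel[pattern(x)]
        for k, pos in enumerate(bits):
            x = (x & ~(1 << pos)) | (((new_pat >> k) & 1) << pos)
        out.append(x)
    return out

def iterative_block_minimization(samples, d, b, max_iters=20, seed=0):
    """Shuffle bit positions into blocks of size b, order-permute each block, and keep
    the result only when the sum of marginal bit entropies (the bound in (22)) drops."""
    rng = random.Random(seed)
    best = sum(bit_entropy(samples, j) for j in range(d))
    for _ in range(max_iters):
        order = list(range(d))
        rng.shuffle(order)                       # shuffling alone keeps the bound unchanged
        candidate = samples
        for i in range(0, d, b):
            candidate = order_permute_block(candidate, order[i:i + b])
        bound = sum(bit_entropy(candidate, j) for j in range(d))
        if bound < best:
            best, samples = bound, candidate
    return best, samples
```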


More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks 1 Optial Resource Allocation in Multicast Device-to-Device Counications Underlaying LTE Networks Hadi Meshgi 1, Dongei Zhao 1 and Rong Zheng 2 1 Departent of Electrical and Coputer Engineering, McMaster

More information

List Scheduling and LPT Oliver Braun (09/05/2017)

List Scheduling and LPT Oliver Braun (09/05/2017) List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)

More information

An Algorithm for Quantization of Discrete Probability Distributions

An Algorithm for Quantization of Discrete Probability Distributions An Algorith for Quantization of Discrete Probability Distributions Yuriy A. Reznik Qualco Inc., San Diego, CA Eail: yreznik@ieee.org Abstract We study the proble of quantization of discrete probability

More information

Multi-Scale/Multi-Resolution: Wavelet Transform

Multi-Scale/Multi-Resolution: Wavelet Transform Multi-Scale/Multi-Resolution: Wavelet Transfor Proble with Fourier Fourier analysis -- breaks down a signal into constituent sinusoids of different frequencies. A serious drawback in transforing to the

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

Low-complexity, Low-memory EMS algorithm for non-binary LDPC codes

Low-complexity, Low-memory EMS algorithm for non-binary LDPC codes Low-coplexity, Low-eory EMS algorith for non-binary LDPC codes Adrian Voicila,David Declercq, François Verdier ETIS ENSEA/CP/CNRS MR-85 954 Cergy-Pontoise, (France) Marc Fossorier Dept. Electrical Engineering

More information

Generalized Queries on Probabilistic Context-Free Grammars

Generalized Queries on Probabilistic Context-Free Grammars IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, NO. 1, JANUARY 1998 1 Generalized Queries on Probabilistic Context-Free Graars David V. Pynadath and Michael P. Wellan Abstract

More information

Hamming Compressed Sensing

Hamming Compressed Sensing Haing Copressed Sensing Tianyi Zhou, and Dacheng Tao, Meber, IEEE Abstract arxiv:.73v2 [cs.it] Oct 2 Copressed sensing CS and -bit CS cannot directly recover quantized signals and require tie consuing

More information

Fundamentals of Image Compression

Fundamentals of Image Compression Fundaentals of Iage Copression Iage Copression reduce the size of iage data file while retaining necessary inforation Original uncopressed Iage Copression (encoding) 01101 Decopression (decoding) Copressed

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words) 1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

Homework 3 Solutions CSE 101 Summer 2017

Homework 3 Solutions CSE 101 Summer 2017 Hoework 3 Solutions CSE 0 Suer 207. Scheduling algoriths The following n = 2 jobs with given processing ties have to be scheduled on = 3 parallel and identical processors with the objective of iniizing

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

Optimal Jamming Over Additive Noise: Vector Source-Channel Case

Optimal Jamming Over Additive Noise: Vector Source-Channel Case Fifty-first Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 2-3, 2013 Optial Jaing Over Additive Noise: Vector Source-Channel Case Erah Akyol and Kenneth Rose Abstract This paper

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Error Exponents in Asynchronous Communication

Error Exponents in Asynchronous Communication IEEE International Syposiu on Inforation Theory Proceedings Error Exponents in Asynchronous Counication Da Wang EECS Dept., MIT Cabridge, MA, USA Eail: dawang@it.edu Venkat Chandar Lincoln Laboratory,

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

Testing Properties of Collections of Distributions

Testing Properties of Collections of Distributions Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the

More information

Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space

Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space Journal of Machine Learning Research 3 (2003) 1333-1356 Subitted 5/02; Published 3/03 Grafting: Fast, Increental Feature Selection by Gradient Descent in Function Space Sion Perkins Space and Reote Sensing

More information

OBJECTIVES INTRODUCTION

OBJECTIVES INTRODUCTION M7 Chapter 3 Section 1 OBJECTIVES Suarize data using easures of central tendency, such as the ean, edian, ode, and idrange. Describe data using the easures of variation, such as the range, variance, and

More information

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials Fast Montgoery-like Square Root Coputation over GF( ) for All Trinoials Yin Li a, Yu Zhang a, a Departent of Coputer Science and Technology, Xinyang Noral University, Henan, P.R.China Abstract This letter

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

The Transactional Nature of Quantum Information

The Transactional Nature of Quantum Information The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

Ph 20.3 Numerical Solution of Ordinary Differential Equations

Ph 20.3 Numerical Solution of Ordinary Differential Equations Ph 20.3 Nuerical Solution of Ordinary Differential Equations Due: Week 5 -v20170314- This Assignent So far, your assignents have tried to failiarize you with the hardware and software in the Physics Coputing

More information

On the Analysis of the Quantum-inspired Evolutionary Algorithm with a Single Individual

On the Analysis of the Quantum-inspired Evolutionary Algorithm with a Single Individual 6 IEEE Congress on Evolutionary Coputation Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-1, 6 On the Analysis of the Quantu-inspired Evolutionary Algorith with a Single Individual

More information

Weighted- 1 minimization with multiple weighting sets

Weighted- 1 minimization with multiple weighting sets Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University

More information

Convex Programming for Scheduling Unrelated Parallel Machines

Convex Programming for Scheduling Unrelated Parallel Machines Convex Prograing for Scheduling Unrelated Parallel Machines Yossi Azar Air Epstein Abstract We consider the classical proble of scheduling parallel unrelated achines. Each job is to be processed by exactly

More information

Birthday Paradox Calculations and Approximation

Birthday Paradox Calculations and Approximation Birthday Paradox Calculations and Approxiation Joshua E. Hill InfoGard Laboratories -March- v. Birthday Proble In the birthday proble, we have a group of n randoly selected people. If we assue that birthdays

More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Understanding Machine Learning Solution Manual

Understanding Machine Learning Solution Manual Understanding Machine Learning Solution Manual Written by Alon Gonen Edited by Dana Rubinstein Noveber 17, 2014 2 Gentle Start 1. Given S = ((x i, y i )), define the ultivariate polynoial p S (x) = i []:y

More information

Multicollision Attacks on Some Generalized Sequential Hash Functions

Multicollision Attacks on Some Generalized Sequential Hash Functions Multicollision Attacks on Soe Generalized Sequential Hash Functions M. Nandi David R. Cheriton School of Coputer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada 2nandi@uwaterloo.ca D.

More information

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS Jochen Till, Sebastian Engell, Sebastian Panek, and Olaf Stursberg Process Control Lab (CT-AST), University of Dortund,

More information

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm Acta Polytechnica Hungarica Vol., No., 04 Sybolic Analysis as Universal Tool for Deriving Properties of Non-linear Algoriths Case study of EM Algorith Vladiir Mladenović, Miroslav Lutovac, Dana Porrat

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

Reed-Muller codes for random erasures and errors

Reed-Muller codes for random erasures and errors Reed-Muller codes for rando erasures and errors Eanuel Abbe Air Shpilka Avi Wigderson Abstract This paper studies the paraeters for which Reed-Muller (RM) codes over GF (2) can correct rando erasures and

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

On the Use of A Priori Information for Sparse Signal Approximations

On the Use of A Priori Information for Sparse Signal Approximations ITS TECHNICAL REPORT NO. 3/4 On the Use of A Priori Inforation for Sparse Signal Approxiations Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst Signal Processing Institute ITS) Ecole Polytechnique

More information

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks Bounds on the Miniax Rate for Estiating a Prior over a VC Class fro Independent Learning Tasks Liu Yang Steve Hanneke Jaie Carbonell Deceber 01 CMU-ML-1-11 School of Coputer Science Carnegie Mellon University

More information

Statistical clustering and Mineral Spectral Unmixing in Aviris Hyperspectral Image of Cuprite, NV

Statistical clustering and Mineral Spectral Unmixing in Aviris Hyperspectral Image of Cuprite, NV CS229 REPORT, DECEMBER 05 1 Statistical clustering and Mineral Spectral Unixing in Aviris Hyperspectral Iage of Cuprite, NV Mario Parente, Argyris Zynis I. INTRODUCTION Hyperspectral Iaging is a technique

More information

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material Consistent Multiclass Algoriths for Coplex Perforance Measures Suppleentary Material Notations. Let λ be the base easure over n given by the unifor rando variable (say U over n. Hence, for all easurable

More information

Deflation of the I-O Series Some Technical Aspects. Giorgio Rampa University of Genoa April 2007

Deflation of the I-O Series Some Technical Aspects. Giorgio Rampa University of Genoa April 2007 Deflation of the I-O Series 1959-2. Soe Technical Aspects Giorgio Rapa University of Genoa g.rapa@unige.it April 27 1. Introduction The nuber of sectors is 42 for the period 1965-2 and 38 for the initial

More information

When Short Runs Beat Long Runs

When Short Runs Beat Long Runs When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs On the Inapproxiability of Vertex Cover on k-partite k-unifor Hypergraphs Venkatesan Guruswai and Rishi Saket Coputer Science Departent Carnegie Mellon University Pittsburgh, PA 1513. Abstract. Coputing

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

A Probabilistic and RIPless Theory of Compressed Sensing

A Probabilistic and RIPless Theory of Compressed Sensing A Probabilistic and RIPless Theory of Copressed Sensing Eanuel J Candès and Yaniv Plan 2 Departents of Matheatics and of Statistics, Stanford University, Stanford, CA 94305 2 Applied and Coputational Matheatics,

More information

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions Tight Inforation-Theoretic Lower Bounds for Welfare Maxiization in Cobinatorial Auctions Vahab Mirrokni Jan Vondrák Theory Group, Microsoft Dept of Matheatics Research Princeton University Redond, WA 9805

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

A remark on a success rate model for DPA and CPA

A remark on a success rate model for DPA and CPA A reark on a success rate odel for DPA and CPA A. Wieers, BSI Version 0.5 andreas.wieers@bsi.bund.de Septeber 5, 2018 Abstract The success rate is the ost coon evaluation etric for easuring the perforance

More information

Approximation in Stochastic Scheduling: The Power of LP-Based Priority Policies

Approximation in Stochastic Scheduling: The Power of LP-Based Priority Policies Approxiation in Stochastic Scheduling: The Power of -Based Priority Policies Rolf Möhring, Andreas Schulz, Marc Uetz Setting (A P p stoch, r E( w and (B P p stoch E( w We will assue that the processing

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING. Emmanuel J. Candès Yaniv Plan. Technical Report No November 2010

A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING. Emmanuel J. Candès Yaniv Plan. Technical Report No November 2010 A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING By Eanuel J Candès Yaniv Plan Technical Report No 200-0 Noveber 200 Departent of Statistics STANFORD UNIVERSITY Stanford, California 94305-4065

More information