IEEE Transactions on Information Theory, vol. 39, no. 5, September 1993

Block Arithmetic Coding for Source Compression

Charles G. Boncelet Jr., Member, IEEE

Abstract - We introduce Block Arithmetic Coding (BAC), a technique for entropy coding that combines many of the advantages of ordinary stream arithmetic coding with the simplicity of block codes. The code is variable length in to fixed out (V to F), unlike Huffman coding which is fixed in to variable out (F to V). We develop two versions of the coder: 1) an optimal encoder based on dynamic programming arguments, and 2) a suboptimal heuristic based on arithmetic coding. The optimal coder is optimal over all V to F complete and proper block codes. We show that the suboptimal coder achieves compression that is within a constant of a perfect entropy coder for independent and identically distributed inputs. BAC is easily implemented, even with large codebooks, because the algorithms for coding and decoding are regular. For instance, codebooks with 2^32 entries are feasible. BAC also does not suffer catastrophic failure in the presence of channel errors. Decoding errors are confined to the block in question. The encoding is in practice reasonably efficient. With i.i.d. binary inputs with P(1) = 0.95 and 16 bit codes, entropy arguments indicate at most 55.8 bits can be encoded per codeword; the BAC heuristic achieves 53.0 and the optimal BAC does slightly better. Finally, BAC appears to be much faster than ordinary arithmetic coding.

Index Terms - Arithmetic codes, block codes, variable to fixed codes, entropy compression.

(Manuscript received September 9, 1991. This paper was presented in part at the 1992 Conference on Information Sciences and Systems, Princeton, NJ, March 1992. This work was performed in part while the author was visiting the University of Michigan. The author is with the Department of Electrical Engineering, University of Delaware, Newark, DE.)

I. INTRODUCTION

We introduce a method of source coding called Block Arithmetic Coding (BAC). BAC is a variable in to fixed out block coder (V to F), unlike Huffman codes which are fixed in to variable out (F to V). BAC is simple to implement, efficient, and usable in a wide variety of applications. Furthermore, BAC codes do not suffer catastrophic failure in the presence of channel errors. Decoding errors are confined to the block in question.

Consider the input, X = x_1 x_2 x_3 ..., to be a sequence of independent and identically distributed input symbols taken from some input alphabet A = {a_1, a_2, ..., a_m}. The symbols obey probabilities p_j = Pr(x_i = a_j). Assume that p_j > 0 for all j. The entropy of the source is

    H(X) = - \sum_{j=1}^{m} p_j \log p_j.    (1)

(We will measure all entropies in bits.) The compression efficiency of entropy encoders is bounded below by the entropy. For F to V encoders, each input symbol requires on average at least H(X) output bits; for V to F encoders, each output bit can on average represent at most 1/H(X) input symbols.

The most important entropy compressor is Huffman coding [9]. Huffman codes are F to V block codes. They block the input into strings of q letters and encode these strings with variable length output strings. It is well known that the number of output bits of a Huffman code is bounded above by qH(X) + 1 for all block sizes. In special cases, this bound can be tightened [5], [17]. As a result the relative efficiency of Huffman codes can be made as high as desired by taking the block size to be large enough. In practice, however, large block sizes are difficult to implement. Codebooks are usually stored and must be searched for each input string. This storage and searching limits block sizes to small values. (However, see Jakobsson [11].)
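As a quick check of the numbers quoted in the abstract, the sketch below evaluates (1) in bits for a binary source with P(1) = 0.95 and divides the 16 bit codeword length by the result; the 55.8 figure is just 16/H(X), up to rounding. The code is purely illustrative and is not taken from the paper's own software [4].

#include <math.h>
#include <stdio.h>

/* Entropy in bits of an m-ary source with probabilities p[0..m-1]. */
static double entropy_bits(const double *p, int m)
{
    double h = 0.0;
    for (int j = 0; j < m; j++)
        h -= p[j] * log2(p[j]);
    return h;
}

int main(void)
{
    double p[2] = { 0.05, 0.95 };   /* P(0) = 0.05, P(1) = 0.95 */
    double h = entropy_bits(p, 2);
    /* A V to F coder with 16 bit codewords can represent at most
       16/H(X) input symbols per codeword, on average. */
    printf("H(X) = %.4f bits, 16/H(X) = %.2f input bits per codeword\n",
           h, 16.0 / h);            /* roughly 0.2864 and 55.87 */
    return 0;
}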
Huffman codes can be implemented adaptively, but it is difficult to do so. The process of generating Huffman code trees is sufficiently cumbersome that adapting the tree on a symbol by symbol basis is rarely computationally efficient.

In recent years, a class of entropy coders called arithmetic coders has been studied and developed [8], [13], [18]-[21], [25]. Arithmetic coders are stream coders. They take in an arbitrarily long input and output a corresponding output stream. Arithmetic coders have many advantages. On average, the ratio of output length to input length can be very close to the source entropy. The input probabilities can be changed on a symbol by symbol basis and can be estimated adaptively. However, arithmetic codes have certain disadvantages. In some applications, the encoding and decoding steps are too complicated to be done in real time.

The entropy coder most similar to BAC is due to Tunstall [23], attributed in [12]. Tunstall's encoder is a V to F block coder that operates as follows: Given a codebook with K output symbols, a codebook with K + m - 1 output symbols is generated by splitting the most probable input sequence into m successors, each formed by appending one of the a_j. Unfortunately, the encoding of input sequences appears to require searching a codebook. Thus, large block sizes appear to be prohibitive, both in creating the code and in searching it. This has the practical effect of limiting block sizes to small values.

Recently, a V to F block arithmetic coder has been proposed by Teuhola and Raita [22]. While similar in spirit to this work, their coder, named FIXAR, differs in many details from BAC. FIXAR parses the input differently from BAC and encodes each string differently. The analysis of FIXAR is entirely different from the analysis of BAC presented in Section III. FIXAR does appear to be fast.
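For concreteness, here is a small sketch of the Tunstall construction just described, specialized to a binary alphabet: it repeatedly splits the most probable leaf of the parsing tree until the codebook is full. The flat arrays, the linear search for the most probable leaf, and the function name are illustrative choices, not details taken from [23].

#include <stdio.h>

#define MAXLEAVES 4096

/* Build a binary Tunstall parse with K leaves (codewords), K <= MAXLEAVES.
   prob[i] is the probability of leaf i (an input string); len[i] its length.
   Returns the number of leaves actually built. */
static int tunstall_binary(double p1, int K, double *prob, int *len)
{
    double p0 = 1.0 - p1;
    int n = 2;
    prob[0] = p0; len[0] = 1;            /* leaf "0" */
    prob[1] = p1; len[1] = 1;            /* leaf "1" */
    while (n < K && n < MAXLEAVES) {     /* each split adds one codeword */
        int best = 0;
        for (int i = 1; i < n; i++)      /* most probable leaf (linear scan) */
            if (prob[i] > prob[best]) best = i;
        /* replace it by its two one-symbol extensions */
        prob[n] = prob[best] * p1; len[n] = len[best] + 1;
        prob[best] *= p0;          len[best] += 1;
        n++;
    }
    return n;
}

int main(void)
{
    static double prob[MAXLEAVES];
    static int len[MAXLEAVES];
    int n = tunstall_binary(0.95, 256, prob, len);     /* 8 bit codewords */
    double avg = 0.0;                    /* expected input symbols per codeword */
    for (int i = 0; i < n; i++) avg += prob[i] * len[i];
    printf("%d codewords, %.2f input symbols per codeword on average\n", n, avg);
    return 0;
}

The quadratic search over the leaves is exactly the kind of codebook manipulation that, as the text notes, becomes burdensome for large block sizes.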

Other V to F entropy coders are runlength [7] and Ziv-Lempel [14], [24], [27]. Runlength codes count the runs of single symbols and encode the runs by transmitting the number in each run. They work well when the probabilities are highly skewed or when there is significant positive correlation between symbols. Runlength codes are often used for small alphabets, especially binary inputs. Then the runlength code is particularly simple since, by necessity, the runs alternate between 1's and 0's. Runlength codes are occasionally combined with Huffman coding to form a variable in to variable out (V to V) scheme. For example, consider the CCITT facsimile code [10] or the more general V to V runlength codes of Elias [6].

Ziv-Lempel codes work on different principles than BAC codes or the other codes considered so far. They find substrings of input symbols that appear frequently and encode each with a fixed length codeword. Ziv-Lempel codes work particularly well on English text, as they can discover complex dependencies between the symbols.

One advantage of most fixed output length block coders, including BAC, is that the effects of transmission errors (symbols received incorrectly) are confined to the block in question. The decoder does not lose synchronism with the incoming bit stream. While the corrupted block may be decoded incorrectly, remaining blocks will be decoded correctly. However, there may be insertions or deletions in the decoded stream. These insertions and deletions can usually be dealt with at the application level. In contrast, Huffman codes and arithmetic codes can lose synchronism and suffer catastrophic failure in the presence of channel errors. (It is often argued that Huffman codes are self-synchronizing, but this is not guaranteed. Various techniques have been proposed to combat this problem. See, e.g., [16].)

One subtle point is that fixed output length codes can still suffer catastrophic failure if implemented in an adaptive fashion. If the probability estimates depend on previously transmitted symbols, then a single error can affect other blocks. In particular, Ziv-Lempel codes are usually implemented in an adaptive fashion and are susceptible to catastrophic failure if their dictionaries get corrupted. BAC codes will be vulnerable to the same catastrophic failure if the probabilities are estimated adaptively based on prior symbols.

Finally, there is a growing recognition that V to F codes can outperform F to V codes when there are strong dependencies between symbols [15], [26]. The idea is that, for a fixed number of symbols, V to F codes can match long input strings. For example, a runlength coder with K codewords can match a run of up to K - 1 symbols. This run can be encoded with approximately log K bits, resulting in an exponential gain. In contrast, Huffman codes only give arithmetic gains. For K large enough, the V to F coder is more efficient.

This paper is organized as follows: Section II develops the optimal and heuristic BAC's. In Section III, we present the principal theoretical result of this paper: that BAC's are within an additive constant of the entropy for all code sizes. Section IV presents computations illustrating the efficiency of BAC's. Section V discusses extensions to time varying and Markov symbols, Section VI discusses the EOF problem and proposes two new solutions, and Section VII presents conclusions and discussion.
II. DEVELOPMENT OF BLOCK ARITHMETIC CODING

The BAC encoder parses the input into variable length input strings and encodes each with a single fixed length output codeword. The following definitions are taken from Jelinek and Schneider [12]: Let W(K) = {w_i : 1 <= i <= K} be a K element collection of words, w_i, with each word a substring of X. W(K) is proper if, for any two words w_i and w_j with i != j, w_i is not a prefix of w_j. W(K) is complete if every infinite length input string has a prefix in W(K). The characterization in [15] is succinct: a code is complete and proper if every infinite length input string has one and only one prefix in W(K).

It is useful to think of complete and proper from the viewpoint of parsing trees. Proper means that the codewords are all leafs of the tree; complete means that all the nodes in the tree are either leafs or internal nodes with m children. The number of codewords in complete and proper sets is 1 + L(m - 1) for L = 1, 2, ... [12]. This result is also easy to derive from the viewpoint of trees. The root has m children (L = 1). Each time a leaf node is replaced by m children, a net gain of m - 1 leafs results. Below, we will allow L = 0 so that one codeword can be allowed in appropriate sets.

We assume that to each output codeword is assigned an integer index i with i = 1, 2, ..., K_S. Let S be the set of output codewords. Consider a subset of codewords with contiguous indices. Denote the first index by A and the last by B. The number of codewords in the subset is K = B - A + 1. BAC proceeds by recursively splitting S into disjoint subsets. With each input symbol, the current subset is split into m nonempty, disjoint subsets, one for each possible input letter. The new subset corresponding to the actual letter is chosen and the process continues recursively. When the subset has fewer than m codewords, BAC stops splitting, outputs any of the codewords in the subset, reinitializes, and continues. The ideal situation is that each final subset contain only 1 codeword; otherwise, the extra codewords are wasted.

Denote the number of codewords in each of the m disjoint subsets by K_l(K), where K is the number of codewords in the current set. Usually, we will suppress the dependence on K and denote the number simply by K_l. The question of how the K_l should be selected is central to the remainder of this paper. We can identify five criteria that are necessary or desirable:

1) K_l > 0.
2) \sum_{l=1}^{m} K_l <= K.
3) \sum_{l=1}^{m} K_l = K.
4) K_l = 1 + L_l(m - 1) for some L_l = 0, 1, 2, ....
5) If p_j > p_i, then K_j >= K_i.

Criteria Q1 and Q2 are necessary to ensure that the parsing is complete and proper and that the encoding can be decoded. Each subset must have at least one codeword and the sum of all the subsets cannot exceed the total available. Criteria Q3, Q4, and Q5 are desirable in that meeting each results in a more efficient encoding. In effect, Q3 says that one ought to use all the codewords that are available. Q4 assures that the number of codewords in each subset is equal to one (a leaf) or is the number in a complete and proper set. For example, consider m = 3 and K = 5. The subsets should be divided into some permutation of 1, 1, 3, and not 1, 2, 2. In the former case, the set with 3 elements can be divided one more time; in the latter case, none of the subsets can. Note, Q3 and Q4 can both be satisfied if the initial K satisfies K = 1 + L(m - 1) and \sum_{l=1}^{m} L_l = L - 1. Q5 merely reflects the intuitive goal that more probable letters should get more codewords. In theory, Q5 can always be met by sorting the K_l's in the same order as the p_l's. Sometimes in practice, e.g., in adaptive models, it may be computationally difficult to know the sorted order and meeting Q5 may be problematic.
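Criteria Q1-Q4 are mechanical enough to be checked in a few lines of code; the following sketch, whose interface is purely illustrative, tests a proposed split {K_l} of K codewords.

#include <stdio.h>

/* Check a proposed split Kl[0..m-1] of K codewords over an m-letter alphabet.
   Returns 1 if criteria Q1-Q4 all hold, 0 otherwise (m >= 2 assumed). */
static int split_ok(const int *Kl, int m, int K)
{
    int sum = 0;
    for (int l = 0; l < m; l++) {
        if (Kl[l] <= 0) return 0;                 /* Q1: every subset nonempty   */
        if ((Kl[l] - 1) % (m - 1) != 0) return 0; /* Q4: K_l = 1 + L_l(m - 1)    */
        sum += Kl[l];
    }
    if (sum > K) return 0;                        /* Q2: do not exceed the total */
    return sum == K;                              /* Q3: use all the codewords   */
}

int main(void)
{
    int good[3] = { 1, 1, 3 };   /* m = 3, K = 5: a permissible split         */
    int bad[3]  = { 1, 2, 2 };   /* violates Q4: 2 != 1 + L(m - 1) for m = 3  */
    printf("%d %d\n", split_ok(good, 3, 5), split_ok(bad, 3, 5));   /* 1 0 */
    return 0;
}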

The BAC encoding algorithm is given in Fig. 1 and the BAC decoding algorithm in Fig. 2. We assume that the various functions behave as follows:

[getinput()] Returns the index of the next input symbol to be coded. If a_r is the next letter, then it returns r.
[code(A)] Returns the codeword corresponding to codeword index A.
[doeof(A, K)] Handles input end-of-file (EOF). Discussed below in Section VI.
[getindexcode()] Returns the index of the next codeword taken from the channel.
[undoeof()] Undoes the effect of doeof().

/* BAC Encoder */
K = K_S; A = 1;
while ((l = getinput()) != EOF) {
    Compute K_1, K_2, ..., K_m;
    A = A + sum_{j<l} K_j;
    K = K_l;
    if (K < m) {
        output code(A);
        A = 1; K = K_S;
    }
}
doeof(A, K);

Fig. 1. Basic BAC encoder.

/* BAC Decoder */
while ((C = getindexcode()) != EOF) {
    K = K_S; A = 1;
    while (K >= m) {
        Compute K_1, K_2, ..., K_m;
        Find l such that A + sum_{j<l} K_j <= C < A + sum_{j<=l} K_j;
        output a_l;
        A = A + sum_{j<l} K_j;
        K = K_l;
    }
    undoeof(A, K);
}

Fig. 2. Basic BAC decoder.
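To make Figs. 1 and 2 concrete, here is a self-contained sketch of the encoder and decoder loops for binary inputs. The proportional split used here anticipates the arithmetic-coding heuristic developed below; the 16 bit codeword size, buffering the data in memory, and handling EOF by passing the symbol count (the simplest scheme of Section VI) are choices of this sketch, not requirements of BAC.

#include <stdio.h>
#include <math.h>

#define KS 65536                     /* 16 bit codewords */

/* Proportional split for a binary alphabet: K1 codewords go to symbol 0 and
   K - K1 to symbol 1, clamped so that both parts are nonempty (criterion Q1). */
static int split(int K, double p0)
{
    int K1 = (int)floor(p0 * K + 0.5);
    if (K1 < 1) K1 = 1;
    if (K1 > K - 1) K1 = K - 1;
    return K1;
}

/* Encode n bits; returns the number of codeword indices written to out[]. */
static int bac_encode(const int *bits, int n, double p0, int *out)
{
    int A = 1, K = KS, nc = 0, pending = 0;
    for (int i = 0; i < n; i++) {
        int K1 = split(K, p0);
        if (bits[i] == 1) { A += K1; K -= K1; } else { K = K1; }
        pending = 1;
        if (K < 2) {                 /* fewer than m codewords: emit and restart */
            out[nc++] = A;
            A = 1; K = KS; pending = 0;
        }
    }
    if (pending) out[nc++] = A;      /* flush a partially selected final block */
    return nc;
}

/* Decode until n bits have been produced; the known symbol count plays the
   role of the simplest EOF scheme discussed in Section VI. */
static void bac_decode(const int *codes, int nc, int n, double p0, int *bits)
{
    int produced = 0;
    for (int c = 0; c < nc && produced < n; c++) {
        int A = 1, K = KS, C = codes[c];
        while (K >= 2 && produced < n) {
            int K1 = split(K, p0);
            if (C < A + K1) { bits[produced++] = 0; K = K1; }
            else            { bits[produced++] = 1; A += K1; K -= K1; }
        }
    }
}

int main(void)
{
    enum { N = 1000 };
    static int x[N], y[N], codes[N];
    double p0 = 0.05;                          /* P(1) = 0.95, as in the abstract */
    unsigned s = 1;
    for (int i = 0; i < N; i++) {              /* crude pseudo-random test input  */
        s = s * 1103515245u + 12345u;
        x[i] = ((s >> 16) % 100) < 95;
    }
    int nc = bac_encode(x, N, p0, codes);
    bac_decode(codes, nc, N, p0, y);
    int ok = 1;
    for (int i = 0; i < N; i++) if (x[i] != y[i]) ok = 0;
    printf("%d bits -> %d 16 bit codewords, round trip %s\n",
           N, nc, ok ? "ok" : "FAILED");
    return 0;
}

In a real coder the indices in codes[] would be written to the channel as 16 bit integers; they are kept in memory here only so the round trip can be checked in one program.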
As examples of the kinds of parsings available, consider the following for binary inputs (a_1 = 0, a_2 = 1):

Let K_1 = K - 1 and K_2 = 1. These are runlength codes for runs of 0's. Similarly, if K_1 = 1 and K_2 = K - 1, we get runlength codes for runs of 1's. The parsing trees are as much left or right as possible.

Let K_1(K_S) = K_2(K_S) = K_S/2. If the first symbol is a 1, then let K_1 = K - 1 and K_2 = 1 for all other K's; else, let K_1 = 1 and K_2 = K - 1. These are symmetric runlength codes, useful when the starting value is unknown.

Let K_1 = K/2 and K_2 = K/2. These codes map the input to output in an identity-like fashion.

With the conditions above, we can state the following theorem:

Theorem 1: Block Arithmetic Codes are complete and proper. Furthermore, all complete and proper parsings can be developed as BAC's.

Proof: It is easiest to develop this equivalence with parsing trees. BAC codes are complete and proper from their construction. They are proper because no outputs are generated at nonleaf nodes; they are complete because each terminal node has m - 1 siblings. For the converse, consider any complete and proper parsing as a tree. At each node of the tree, a BAC coder can reproduce the same tree by appropriately choosing the subset sizes, K_l. For instance, at the root node, count the number of leafs in each branch. Use each of these numbers for K_l, respectively. The process can clearly continue at each node.

In the case considered so far of independent and identically distributed input symbols, it is straightforward to derive an expression for the efficiency of a code. Let N(K) denote the expected number of input symbols encoded with K output symbols. Due to the recursive decomposition, one obtains the following recursive formula:

    N(K) = 1 + \sum_{l=1}^{m} p_l N(K_l).    (2)

The 1 is for the current symbol. The sum follows from considering all possible choices for that symbol. With probability p_l, the input is a_l and the subset chosen has K_l codewords. The expected number of input symbols encoded from here onward is N(K_l).

Since this equation is crucial to the rest of this development, we also present an alternate derivation. N(K), as the expected number of input symbols encoded with K output symbols, can be written as

    N(K) = \sum_{i=1}^{K} Pr(w_i) n_i,

where n_i is the number of input symbols in w_i. Parse w_i = a_{1,i} w'_i, where a_{1,i} is the first letter in w_i. Then n'_i = n_i - 1 and, by independence, Pr(w_i) = Pr(a_{1,i}) Pr(w'_i). Thus,

    N(K) = \sum_{i=1}^{K} Pr(a_{1,i}) Pr(w'_i)(n'_i + 1)
         = \sum_{l=1}^{m} p_l \sum_{i : a_{1,i} = a_l} Pr(w'_i)(n'_i + 1)
         = 1 + \sum_{l=1}^{m} p_l N(K_l),    (3)

where the last step uses the fact that, for each l, the K_l words w'_i with a_{1,i} = a_l form a complete and proper set (so their probabilities sum to one); (3) follows because K_l codewords start with a_l.

The boundary conditions are easy and follow from the simple requirement that at least m codewords are needed to encode a single symbol:

    N(K) = 0,    K = 0, 1, ..., m - 1.    (4)

Note, one application of (2) yields N(m) = 1. While N(K) is hard to compute in closed form, except in special cases, (2) and (4) are easy to program.

For an example of an easy case, consider ordinary runlength coding for binary inputs. Letting N_r(K) refer to the expected number of input symbols encoded with K output codewords using runlength encoding, the recursion simplifies to

    N_r(K) = 1 + p N_r(K - 1) + q N_r(1) = 1 + p N_r(K - 1),    (5)

where p is the probability of a symbol in the current run and q = 1 - p. (5) follows since N_r(1) = 0. The solution to this linear difference equation is

    N_r(K) = (1 - p^{K-1}) / (1 - p).    (6)

Interestingly, this solution asymptotically approaches 1/(1 - p). We arrive at the well known conclusion that V to F runlength codes do not work well for large block sizes.

We propose two specific rules for determining the size of each subset. The first is an optimal dynamic programming approach. Letting N_o(K) be the optimal expected number of input symbols encoded using K output codewords,

    N_o(K) = 1 + \max_{K_1, ..., K_m} \sum_{l=1}^{m} p_l N_o(K_l),    (7)

subject to the constraints K_l >= 1 and \sum_{l=1}^{m} K_l = K. This optimization is readily solved by dynamic programming since all the K_l obey K_l < K. However, if m is large, a full search may be computationally prohibitive. The optimal solution can be stored in tabular form and implemented with a table lookup. Note, the optimal codebook does not have to be stored, only the optimal subset sizes.
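For the binary case, the dynamic program behind (7) and (4) fits in a few lines; the plain O(K^2) table fill below and the function names are illustrative choices, not the paper's implementation [4].

#include <stdio.h>
#include <stdlib.h>

/* Tabulate the optimal expected parse length No[2..K] of (7) for a binary
   source, together with the optimal split K1 at each K.  The boundary
   condition (4) gives No[0] = No[1] = 0. */
static void optimal_bac_binary(int K, double p1, double *No, int *bestK1)
{
    double p0 = 1.0 - p1;
    No[0] = No[1] = 0.0;
    for (int k = 2; k <= K; k++) {
        double best = -1.0;
        int arg = 1;
        for (int k1 = 1; k1 <= k - 1; k1++) {   /* K1 >= 1 and K2 = k - k1 >= 1 */
            double v = 1.0 + p0 * No[k1] + p1 * No[k - k1];
            if (v > best) { best = v; arg = k1; }
        }
        No[k] = best;
        bestK1[k] = arg;
    }
}

int main(void)
{
    int K = 4096;                                /* 12 bit codewords */
    double *No = malloc((K + 1) * sizeof *No);
    int *bestK1 = malloc((K + 1) * sizeof *bestK1);
    optimal_bac_binary(K, 0.95, No, bestK1);
    printf("No(%d) = %.2f input symbols, optimal first split K1 = %d\n",
           K, No[K], bestK1[K]);
    free(No); free(bestK1);
    return 0;
}

Only the table of optimal split sizes needs to be kept at coding time, which is the point made in the text about not having to store the optimal codebook itself.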
The second is a heuristic based on ordinary arithmetic coding. In arithmetic coding, an interval is subdivided into disjoint subintervals with the size of each subinterval proportional to the probability of the corresponding symbol. We propose to do the same for BAC. K_l should be chosen as close to p_l K as possible. Let K_l = [p_l K], where [s] denotes the quantization of s, with the proviso that each K_l >= 1. One way to do the quantization consistent with Q1-Q4 above is described in Fig. 3. The idea is to process the input letters from least probable to most probable and, for each letter, to keep track of the number of codewords remaining and the total probability remaining. The algorithm actually computes L_l, not K_l, as the former is somewhat easier.

/* Good heuristic to determine the L_l's (K_l = 1 + L_l(m - 1)) */
Sort the symbols so that p_1 <= p_2 <= ... <= p_m.
q = 1.0;  L_r = L;
for (l = 1; l <= m; l++) {
    f_l = p_l / q;
    L_l = [ f_l L_r + (f_l (m - l + 1) - 1) / (m - 1) + 0.5 ];
    if (L_l < 0) L_l = 0;
    L_r = L_r - L_l;
    q = q - p_l;
}

Fig. 3. Good probability quantizer. Computes L_l for l = 1, 2, ..., m given L. Note, all quantities not involving L can be precomputed.

Note, Q5 may not be satisfied due to quantization effects. However, our computations and simulations indicate that it almost always is. Furthermore, Q5 can be satisfied if desired by sorting the L_l's.

One drawback of the Good quantizer in Fig. 3 is that the L_l's are computed one at a time, from the least probable letter up to the input letter. Potentially, this loop may execute m times for each input symbol. For large m, this may be too slow. One way to circumvent this problem is the Fast quantizer described below. The idea behind the Fast quantizer is to compute a cumulative L function, R_l, or, equivalently, a cumulative K function, Q_l, and form either L_l or K_l = 1 + (m - 1)L_l by taking a difference. The equations for R and L take the form:

    R_0 = 0,    (8)
    R_l = [ (L - 1) \sum_{j=1}^{l} p_j ],    (9)
    L_l = R_l - R_{l-1}.    (10)

Those for Q and K are as follows:

    Q_0 = 0,    (11)
    Q_l = l + (m - 1) R_l,    (12)
    K_l = Q_l - Q_{l-1}.    (13)

The Fast quantizer satisfies Q1-Q4, though not necessarily Q5. For large alphabets it is slightly less efficient than the Good quantizer, but is much faster because there is no loop over all the symbols.
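A sketch of the Fast quantizer of (8)-(13) for a general alphabet; the cumulative probabilities are computed once up front, so each requested K_l costs a constant amount of work. The nearest-integer rounding and the interface are assumptions of this sketch.

#include <math.h>
#include <stdio.h>

/* Fast quantizer, (8)-(13): given the current set size K = 1 + L(m - 1),
   return K_l for letter l (1-based).  cum[l] = p_1 + ... + p_l, cum[0] = 0. */
static int fast_Kl(int l, int L, int m, const double *cum)
{
    long Rl  = lround((double)(L - 1) * cum[l]);       /* (9)  */
    long Rl1 = lround((double)(L - 1) * cum[l - 1]);
    long Ql  = l + (long)(m - 1) * Rl;                 /* (12) */
    long Ql1 = (l - 1) + (long)(m - 1) * Rl1;
    return (int)(Ql - Ql1);                            /* (13) */
}

int main(void)
{
    double p[] = { 0.1, 0.2, 0.3, 0.4 };               /* a 4-letter example */
    int m = 4, L = 5000, K = 1 + L * (m - 1);          /* K = 15001          */
    double cum[5] = { 0.0 };
    for (int l = 1; l <= m; l++) cum[l] = cum[l - 1] + p[l - 1];

    int sum = 0;
    for (int l = 1; l <= m; l++) {
        int Kl = fast_Kl(l, L, m, cum);
        printf("K_%d = %d\n", l, Kl);                  /* each is 1 mod (m - 1) */
        sum += Kl;
    }
    printf("sum = %d (equals K = %d)\n", sum, K);      /* Q3 holds by construction */
    return 0;
}

Because R_l is a nondecreasing function of l, every K_l computed this way is at least 1 and of the form 1 + L_l(m - 1), and the K_l sum to K, so Q1-Q4 are met without any per-symbol loop over the alphabet.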

The algorithmic complexity of BAC can be summarized as follows: Using the Good heuristic, the encoder can require up to m - 1 multiplications per input symbol, and a similar number of additions; using the Fast heuristic, it only requires two multiplications and a fixed number of additions and comparisons per input symbol. The decoder does the same computations of K_l (or L_l) as the encoder, but also has to search for the proper interval. This searching can be done in O(log m) operations. Thus, the encoder can be implemented in O(1) or O(m) operations per input symbol and the decoder in O(log m) or O(m) operations, depending on whether the Fast or the Good heuristic is used. In the special case of binary inputs, there is little difference, both in complexity and performance, between the two heuristics. After the K_l's have been computed, the optimal BAC can be implemented with no multiplications and a number of additions similar to the Fast heuristic. However, the one time cost of computing the optimal K_l's may be high (a brute force search requires on the order of L_S^m operations if m is large, where K_S = 1 + L_S(m - 1)).

In some applications, it is convenient to combine the optimal and heuristic approaches in one hybrid approach: use the optimal sizes for small values of K and use the heuristic for large values. As we argue in Section IV, both encoders incur their greatest inefficiency for small values of K. For instance, an encoder can begin with the heuristic until only a small number of codewords are left and then switch to the optimal.

Letting N_h(K) represent the expected number of input symbols encoded with K output codewords under the heuristic, the expression for N(K) becomes

    N_h(K) = 1 + \sum_{l=1}^{m} p_l N_h([p_l K]).    (14)

For the moment, if we ignore the quantizing above, relax the boundary conditions, and consider N(.) to be a function of a continuous variable, say s, we get the following equation:

    N_e(s) = 1 + \sum_{l=1}^{m} p_l N_e(p_l s),    (15)

where the subscript e refers to entropy. The entropy rate for V to F codes, N_e(s) = log s / H(X), satisfies this equation. (We believe this entropy solution uniquely solves (15), but have been unable to prove uniqueness.) Thus we see that, in the absence of quantization effects, BAC codes are entropy codes. As we argued above, for large K the quantization effects are small. Furthermore, log K is a slowly varying function of K. We might expect the heuristic to perform close to the optimal, entropy rate. In fact, we prove in Section III below that N_h(K) is within an additive constant (which depends on the p_l's and m) of the entropy rate for all K.
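That the entropy rate satisfies (15) can be verified in one line, using \sum_l p_l = 1 and \sum_l p_l \log p_l = -H(X):

    1 + \sum_{l=1}^{m} p_l \frac{\log(p_l s)}{H(X)}
      = 1 + \frac{\sum_l p_l \log p_l + \log s}{H(X)}
      = 1 + \frac{-H(X) + \log s}{H(X)}
      = \frac{\log s}{H(X)} = N_e(s).

(Here the logarithm and H(X) are taken in the same base, as in Section III.)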
III. ASYMPTOTIC RESULTS

Consider the heuristic encoder discussed above. Assume the quantization is done so that, for K large enough, [p_l K] = p_l K + \delta_l, where the quantization errors \delta_l remain bounded as K grows. Assume further that \sum_{l=1}^{m} \delta_l = 0. (Such quantization can always be achieved for K large enough. See, e.g., Fig. 3.) In this section, we will take all logs to be natural logs to the base e.

Theorem 2: For independent and identically distributed inputs,

    \log K - H(X) N_h(K) <= C,    (17)

for all K and some constant C. (C depends on the probabilities, the quantization, and on the size of the alphabet, but not on K.)

Proof: We will actually show a stronger intermediate result, namely that

    \log K - H(X) N_h(K) <= G(K) = C - \frac{D}{K},    (18)

where D > 0. Clearly C - D/K <= C. The proof is inductive. There are three constants to be determined: C and D, and a splitting constant, K_C. First, select any K_C such that K_C > 2/p_1, where p_1 > 0 is the minimum of the p_l's. The basis is as follows: For all K <= K_C, and for any D > 0, we can choose C so that

    C >= \log K - H(X) N_h(K) + \frac{D}{K};    (19)

since N_h(K) >= 0 and D/K <= D, it suffices to take C >= \log K_C + D. At this point, K_C has been determined and C has been specified in terms of D, and the proposition holds for all K <= K_C.

For the inductive step, we now assume that (18) applies for all values up to and including K - 1, for some K > K_C, and find a condition on D so that (18) applies for K as well. Using (14), the identity \log K = \sum_l p_l \log(p_l K) + H(X), and the inductive hypothesis (each [p_l K] <= K - 1),

    \log K - H(X) N_h(K)
      = \sum_{l=1}^{m} p_l ( \log(p_l K) - \log([p_l K]) )
        + \sum_{l=1}^{m} p_l ( \log([p_l K]) - H(X) N_h([p_l K]) )
      <= \sum_{l=1}^{m} p_l ( \log(p_l K) - \log([p_l K]) ) + \sum_{l=1}^{m} p_l G([p_l K]).    (20)

The last term above is easily bounded since G(.) is nondecreasing:

If \delta_l >= 0, then G([p_l K]) = G(p_l K + \delta_l) <= G(K - 1), since p_l K + \delta_l <= K - 1 for K > K_C; similarly, if \delta_l < 0, then G(p_l K + \delta_l) <= G(p_l K) <= G(K - 1). (By construction, p_l K + \delta_l > 0.) Hence

    \sum_{l=1}^{m} p_l G([p_l K]) <= G(K - 1).    (21)

The first term of (20) can be bounded as follows:

    \sum_{l=1}^{m} p_l ( \log(p_l K) - \log([p_l K]) ) = - \sum_{l=1}^{m} p_l \log( 1 + \frac{\delta_l}{p_l K} ).    (25)

Expanding the logarithms in powers of K^{-1} and using \sum_l \delta_l = 0, which makes the K^{-1} term vanish, gives

    - \sum_{l=1}^{m} p_l \log( 1 + \frac{\delta_l}{p_l K} ) = \sum_{j >= 2} \beta_j K^{-j},    (26)

where the coefficients \beta_j are polynomial functions of the \delta_l's and do not otherwise depend on K. By construction, 1 + \delta_l/(p_l K) > 1/2, so the expansion is valid and its tail can be bounded:

    | \sum_{j >= 2} \beta_j K^{-j} | <= (m - 1) 2^m \max_j |\beta_j| K^{-2} = D' K^{-2}.    (27)-(28)

Combining (21), (27), and (28), we get the following:

    \log K - H(X) N_h(K) <= G(K - 1) + \frac{D'}{K^2}.    (29)

Substituting in for G(.), we need to select D so that the inequality

    C - \frac{D}{K - 1} + \frac{D'}{K^2} <= C - \frac{D}{K}    (30)

is valid or, equivalently,

    D' \frac{K - 1}{K} <= D,    (31)

which is valid for all D >= D'. This completes the proof.

Clearly, the optimal approach is better than the heuristic and worse than entropy. So, combining with the theorem above, we have the following ordering:

    N_e(K) = \frac{\log K}{H(X)} >= N_o(K) >= N_h(K) >= \frac{\log K - C}{H(X)}.    (32)

IV. COMPUTATIONAL RESULTS

In this section, we compute BAC efficiencies for the heuristic and optimal implementations and compare these with entropy values. We also compare an implementation of BAC and the arithmetic coder of Witten, Neal, and Cleary [25], both operating on a binary input, and show that BAC is much faster. Source code for the tests presented here can be found in [4].

[Fig. 4. Calculated N(K) versus log K for binary inputs with p = 0.95. Curves shown: entropy, optimal, heuristic, and runlength.]

In Fig. 4, we present calculations of N(K) for a hypothetical entropy coder, the optimal BAC, the good heuristic BAC, and a simple V to F runlength coder versus log K = 0, 1, ..., 16. The input is binary with p = Pr(x = 1) = 0.95. Note that the optimal and heuristic curves closely follow the entropy line. The maximum difference between both curves and entropy is 4.4 and occurs at log K = 3. In contrast, for log K = 16, the optimal is within 2.3 input symbols of the entropy and the heuristic within 2.8. This behavior seems to be consistent over a wide range of values for p. The maximum difference occurs for a small value of log K. The coders improve slightly in an absolute sense as log K grows. At some point the difference seems to level off. The relative efficiency increases as the number of codewords grows. At log K = 3, it is 58% for both and, at log K = 16, it is 96% for the optimal BAC and 95% for the heuristic BAC.
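The heuristic curve of Fig. 4 is easy to regenerate from (14) and (4); the sketch below does so for the binary case and prints N_h(2^16) together with the gap log2 K - H(X) N_h(K), the quantity C' discussed with Table I below. The nearest-integer rounding used for [p K] is an assumption of the sketch.

#include <stdio.h>
#include <math.h>

/* Fill Nh[0..K] with the expected parse length of the binary heuristic BAC,
   from recursion (14) with boundary condition (4): Nh[0] = Nh[1] = 0.
   K1 = [p0 * k], rounded to the nearest integer and clamped to [1, k-1]. */
static void heuristic_table(int K, double p1, double *Nh)
{
    double p0 = 1.0 - p1;
    Nh[0] = Nh[1] = 0.0;
    for (int k = 2; k <= K; k++) {
        int K1 = (int)floor(p0 * k + 0.5);
        if (K1 < 1) K1 = 1;
        if (K1 > k - 1) K1 = k - 1;
        Nh[k] = 1.0 + p0 * Nh[K1] + p1 * Nh[k - K1];
    }
}

int main(void)
{
    const int K = 1 << 16;                      /* 16 bit codewords */
    const double p1 = 0.95;
    static double Nh[(1 << 16) + 1];
    heuristic_table(K, p1, Nh);

    double H = -(p1 * log2(p1) + (1 - p1) * log2(1 - p1));   /* bits per symbol */
    printf("Nh(2^16) = %.1f symbols, entropy bound = %.1f, gap = %.2f bits\n",
           Nh[K], log2((double)K) / H, log2((double)K) - H * Nh[K]);
    return 0;
}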

[Fig. 5. Calculated N(K) versus log K for a 27 letter English alphabet. Curves shown: entropy, optimal, good heuristic, and fast heuristic.]

In Fig. 5, we present calculations of N(K) versus log K for a 27 letter English alphabet taken from Blahut [1]. Plotted are N(K) for the entropy, optimal, and good and fast heuristic BAC's. On this example, the curves for the optimal and the good heuristic are almost indistinguishable.

[Table I. Calculated N_e(K), N_o(K), N_h(K), and N_R(K) for binary inputs at selected values of p, with 16 bit codewords (log K = 16), together with computed values of C'.]

In Table I, we present calculated N(K) for binary inputs and selected p's with log K = 16, i.e., 16 bit codewords. One can see that both the heuristic and optimal BAC's are efficient across the whole range. For instance, with p = 0.80, both exceed 98.7% of entropy; with p = 0.95, both exceed 95.0%; and with p = 0.99, both exceed 91.5%.

The computational results support an approximation. For all K large enough, H(X) N_h(K) is approximately log K - C'. In general, it seems that C' < C, where C is taken from Theorem 2. Also given in Table I are computed values of C' = log K - H(X) N_h(K) for K = 2^16.

As another experiment, we implemented BAC and the coder of Witten, Neal, and Cleary (referred to as WNC) [25] to assess the relative speeds. Source code for WNC appears in their paper, so it makes a good comparison. Speed comparisons are admittedly tricky. To make the best comparison, both BAC and WNC were optimized for a binary input; the encoders for both share the same input routine and the decoders the same output routine. There are two versions of WNC, a straightforward C version and an optimized C version. We implemented both and found that, after optimizing for binary inputs, the straightforward version was actually slightly faster than the optimized version. Both were tested on the same input file of 2^20 independent and identically distributed bits. The execution times on a SPARC IPC are listed in Table II. The times are repeatable to within 0.1 seconds. Also listed are the times for input and output (I/O Speed), i.e., to read in the input one bit at a time and write out the appropriate number of bytes, and to read in bytes and write out the appropriate number of bits. The I/O Speed numbers do not reflect any computation, just the input and output necessary to both BAC and WNC.

TABLE II
EXECUTION TIMES FOR BAC AND WNC FOR A 1 MILLION BIT FILE

    Version               Time (secs)
    BAC Encoder            4.2
    BAC Decoder            3.7
    WNC Encoder           21.6
    WNC Decoder           29.5
    WNC Encoder (fast)    22.2
    WNC Decoder (fast)    29.8
    I/O Speed (encoder)    2.0
    I/O Speed (decoder)    1.7

We see that in this test, the BAC encoder is approximately 5 times faster than the WNC encoder and the BAC decoder is approximately 8 times faster than the WNC decoder. Indeed, the BAC times are not much longer than the input/output operations alone.

V. EXTENSIONS

BAC's can be extended to work in more complicated environments than that of i.i.d. symbols considered so far. For instance, consider the symbols to be independent, but with probabilities that are time varying:

    p(l, j) = Pr(x_j = a_l).    (33)

Denote the number of input symbols encoded with K output symbols starting at time j by N(K, j). Then, in analogy to (2), we get the following:

    N(K(j), j) = 1 + \sum_{l=1}^{m} p(l, j) N(K_l(j + 1), j + 1),    (34)

where K_l(j) is the number of codewords assigned to a_l at time j. The heuristic is easy: Choose K_l(j + 1) = [p(l, j) K(j)]. We can also present an optimal dynamic programming solution:

    N_o(K(j), j) = 1 + \max_{K_1, ..., K_m} \sum_{l=1}^{m} p(l, j) N_o(K_l(j + 1), j + 1).    (35)

This problem can be solved backward in time, from a maximum j down to a minimum j = 1. As another example, consider a time invariant, first order Markov source.
Let p(i|l) = Pr(x_j = a_i | x_{j-1} = a_l). Then the recursion for N(K) splits into two parts. The first is for the first input symbol; the second is for all other input symbols:

    N(K) = 1 + \sum_{i=1}^{m} p_i N(K_i | i),    (36)

    N(K | l) = 1 + \sum_{i=1}^{m} p(i|l) N(K_i | i),    (37)

where N(K | i) is the expected number of input symbols encoded using K codewords given that the current input symbol is a_i.

The heuristic again is easy: Choose K_i = [p(i|l) K]. The optimal solution is harder, although it can be done with dynamic programming. For every l, the optimal N_o(K | l) can be found because each K_i < K. In [3], we show a similar optimality result to Theorem 2 for a class of first order Markov sources.

In some applications it is desirable to adaptively estimate the probabilities. As with stream arithmetic coding, BAC encodes the input in a first-in first-out fashion. The only requirement is that the adaptive formula depend only on previous symbols, not on present or future symbols.

VI. THE EOF PROBLEM

One practical problem that BAC and other arithmetic coders have is denoting the end of the input sequence. In particular, the last codeword may be only partially selected. The simplest solution to this problem is to count the number of symbols to be encoded and to send this count before encoding any. It is an easy matter for the decoder to count the number of symbols decoded and stop when the appropriate number is reached. However, this scheme requires that the encoder process the data twice and incurs a transmission overhead to send the count.

The EOF solution proposed in the stream arithmetic coder of Witten, Neal, and Cleary [25] and used in FIXAR [22] is to create an artificial letter with minimal probability. This letter is only transmitted once, denoting the end of the input. When the decoder decodes this special letter, it stops. We computed BAC's lost efficiency under this scheme for binary inputs for a variety of probabilities and codebook sizes and found that it averages about 7%. For the English text example, the loss averaged about 3.5% over a wide range of codebook sizes.

We propose two alternatives to the simple schemes above (see also [2]). The first assumes the channel can tell the decoder that no more codewords remain and that the decoder is able to look ahead a modest number of bits. The idea is for the encoder to append to the input sequence as many least probable symbols as necessary (possibly zero) to flush out the last codeword. After transmitting the last codeword, the encoder transmits the number of appended symbols to the decoder. Even with 2^32 codewords, at most 31 extra input symbols are needed. This number can be encoded with 5 bits. The decoder looks ahead 5 bits until it detects that no more symbols are present. It then discards the appropriate number of appended symbols. The overhead caused by this scheme is modest: 5 bits at the end and one possibly wasteful codeword.

The second scheme assumes the channel cannot tell the decoder that no more codewords remain. We suggest devoting one codeword to specify the end of the codeword sequence. (Note, we are not suggesting an extra input letter, but one of the K codewords.) Then append the extra number of bits as above. The inefficiency here includes the 5 bits above and the loss due to one less codeword. Using the approximation discussed in Section IV above, one computes the relative inefficiency as follows:

    \frac{(\log K - C') - (\log(K - 1) - C')}{\log K - C'} \approx \frac{1}{K (\log K - C')}.    (38)

For K large enough, this overhead is negligible.
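The first scheme is simple enough to sketch in code. Below, the partially selected block [A, A + K - 1] of the earlier binary encoder sketch is flushed by pretending the remaining input is a run of the less probable symbol (symbol 0 when P(1) > 1/2), and the number of padded symbols is returned so it can be sent in the 5 bit field described above. The proportional split, the example encoder state, and the function name are illustrative assumptions.

#include <stdio.h>
#include <math.h>

/* Flush a partially selected block by padding with symbol 0 (assumed to be
   the less probable symbol).  Returns the pad count, at most 31 even for
   2^32 codewords, and writes the flushed codeword index through *codeword. */
static int flush_with_padding(int A, int K, double p0, int *codeword)
{
    int pads = 0;
    while (K >= 2) {
        int K1 = (int)floor(p0 * K + 0.5);     /* proportional split, clamped */
        if (K1 < 1) K1 = 1;
        if (K1 > K - 1) K1 = K - 1;
        K = K1;                                /* take symbol 0's branch; A is
                                                  unchanged because that branch
                                                  starts at index A            */
        pads++;
    }
    *codeword = A;
    return pads;                               /* transmitted in 5 bits */
}

int main(void)
{
    int codeword;
    /* Suppose the input ran out with the encoder state A = 40000, K = 20000. */
    int pads = flush_with_padding(40000, 20000, 0.05, &codeword);
    printf("padded %d symbols, flushed codeword %d\n", pads, codeword);
    return 0;
}

On the decoding side, undoeof() would read the 5 bit count after the last codeword and drop that many symbols from the end of the decoded sequence.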
VII. CONCLUSIONS AND DISCUSSION

We believe that BAC represents an interesting alternative in entropy coding. BAC is simple to implement, even for large block sizes, because the algorithm is top-down and regular. In contrast, Huffman's and Tunstall's algorithms are not top-down and generally require storing and searching a codebook.

BAC is efficient, though probably slightly less efficient than ordinary arithmetic coding. The comparison with Huffman is a little more complicated. The asymptotic result for BAC is, on the surface, weaker than for Huffman. The BAC bound uses C, which must be computed on a case by case basis, while the Huffman bound is 1. Both coders can be made as relatively efficient as desired by selecting the block size large enough. However, BAC can use much larger block sizes than is practical for Huffman.

The BAC asymptotic result is much stronger than that for Tunstall's coder. For BAC, the difference between entropy and the heuristic rates is bounded for all K. Tunstall shows only that the ratio of the rates of entropy and his coder approaches 1 as K grows.

In the proof of Theorem 2, we have shown that a constant, C, exists that bounds the difference between entropy and the heuristic BAC for all K. We have made no effort to actually evaluate the constant from the conditions given in the proof. This is because such an evaluation would be worthless in evaluating the performance of BAC. As shown in Section IV, the constant in practice is quite small. One important need for future research is to provide tight bounds for C and, perhaps, to characterize the difference between the entropy rate and BAC more accurately.

One advantage of BAC compared to Huffman and stream arithmetic coding is that BAC uses fixed length output codewords. In the presence of channel errors, BAC will not suffer catastrophic failure. The other two might. We have also argued that BAC can accommodate more complicated situations. Certainly the heuristic can handle time varying and Markov probabilities. It can estimate the probabilities adaptively. It remains for future work to prove optimality results for these more complicated situations.

REFERENCES

[1] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.
[2] C. G. Boncelet, Jr., "Extensions to block arithmetic coding," in Proc. 1992 Conf. Inform. Sci. and Syst., Princeton, NJ, Mar. 1992.
[3] C. G. Boncelet, Jr., "Block arithmetic coding for Markov sources," in Proc. IEEE Int. Symp. Inform. Theory, San Antonio, TX, Jan. 1993.
[4] C. G. Boncelet, Jr., "Experiments in block arithmetic coding," Univ. Delaware, Dep. Elec. Eng., Tech. Rep., 1993.
[5] R. M. Capocelli and A. De Santis, "New bounds on the redundancy of Huffman codes," IEEE Trans. Inform. Theory, vol. 37, July 1991.
[6] P. Elias, "Universal codeword sets and representations of the integers," IEEE Trans. Inform. Theory, vol. IT-21, Mar. 1975.
[7] S. W. Golomb, "Run-length encodings," IEEE Trans. Inform. Theory, vol. IT-12, July 1966.

[8] M. Guazzo, "A general minimum-redundancy source-coding algorithm," IEEE Trans. Inform. Theory, vol. IT-26, 1980.
[9] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proc. IRE, vol. 40, Sept. 1952.
[10] R. Hunter and A. H. Robinson, "International digital facsimile coding standards," Proc. IEEE, vol. 68, July 1980.
[11] M. Jakobsson, "Huffman coding in bit-vector compression," Inform. Process. Lett., vol. 7, no. 6, Oct. 1978.
[12] F. Jelinek and K. Schneider, "On variable-length-to-block coding," IEEE Trans. Inform. Theory, vol. IT-18, Nov. 1972.
[13] G. G. Langdon, "An introduction to arithmetic coding," IBM J. Res. Develop., vol. 28, Mar. 1984.
[14] A. Lempel and J. Ziv, "On the complexity of finite sequences," IEEE Trans. Inform. Theory, vol. IT-22, Jan. 1976.
[15] N. Merhav and D. L. Neuhoff, "Variable-to-fixed length codes provide better large deviations performance than fixed-to-variable length codes," IEEE Trans. Inform. Theory, vol. 38, Jan. 1992.
[16] B. L. Montgomery and J. Abrahams, "Synchronization of binary source codes," IEEE Trans. Inform. Theory, vol. IT-32, Nov. 1986.
[17] B. L. Montgomery and J. Abrahams, "On the redundancy of optimal binary prefix-condition codes for finite and infinite sources," IEEE Trans. Inform. Theory, vol. IT-33, Jan. 1987.
[18] W. B. Pennebaker, J. L. Mitchell, G. G. Langdon, Jr., and R. B. Arps, "An overview of the basic principles of the Q-coder adaptive binary arithmetic coder," IBM J. Res. Develop., vol. 32, no. 6, Nov. 1988.
[19] J. Rissanen, "Generalized Kraft inequality and arithmetic coding," IBM J. Res. Develop., vol. 20, May 1976.
[20] J. Rissanen and G. G. Langdon, "Arithmetic coding," IBM J. Res. Develop., vol. 23, no. 2, Mar. 1979.
[21] J. Rissanen and K. M. Mohiuddin, "A multiplication-free multialphabet arithmetic code," IEEE Trans. Commun., vol. 37, Feb. 1989.
[22] J. Teuhola and T. Raita, "Piecewise arithmetic coding," in Proc. Data Compression Conf., J. A. Storer and J. H. Reif, Eds., Snow Bird, UT, Apr. 1991, pp. 33-42.
[23] B. P. Tunstall, "Synthesis of noiseless compression codes," Ph.D. dissertation, Georgia Inst. Technol., 1967.
[24] T. A. Welch, "A technique for high-performance data compression," IEEE Computer, June 1984.
[25] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Commun. ACM, vol. 30, June 1987.
[26] J. Ziv, "Variable-to-fixed length codes are better than fixed-to-variable length codes for Markov sources," IEEE Trans. Inform. Theory, vol. 36, July 1990.
[27] J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inform. Theory, vol. IT-24, Sept. 1978.


More information

Kinematics and dynamics, a computational approach

Kinematics and dynamics, a computational approach Kineatics and dynaics, a coputational approach We begin the discussion of nuerical approaches to echanics with the definition for the velocity r r ( t t) r ( t) v( t) li li or r( t t) r( t) v( t) t for

More information

KONINKL. NEDERL. AKADEMIE VAN WETENSCHAPPEN AMSTERDAM Reprinted from Proceedings, Series A, 61, No. 1 and Indag. Math., 20, No.

KONINKL. NEDERL. AKADEMIE VAN WETENSCHAPPEN AMSTERDAM Reprinted from Proceedings, Series A, 61, No. 1 and Indag. Math., 20, No. KONINKL. NEDERL. AKADEMIE VAN WETENSCHAPPEN AMSTERDAM Reprinted fro Proceedings, Series A, 6, No. and Indag. Math., 20, No., 95 8 MATHEMATIC S ON SEQUENCES OF INTEGERS GENERATED BY A SIEVIN G PROCES S

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

time time δ jobs jobs

time time δ jobs jobs Approxiating Total Flow Tie on Parallel Machines Stefano Leonardi Danny Raz y Abstract We consider the proble of optiizing the total ow tie of a strea of jobs that are released over tie in a ultiprocessor

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

The Wilson Model of Cortical Neurons Richard B. Wells

The Wilson Model of Cortical Neurons Richard B. Wells The Wilson Model of Cortical Neurons Richard B. Wells I. Refineents on the odgkin-uxley Model The years since odgkin s and uxley s pioneering work have produced a nuber of derivative odgkin-uxley-like

More information

Low complexity bit parallel multiplier for GF(2 m ) generated by equally-spaced trinomials

Low complexity bit parallel multiplier for GF(2 m ) generated by equally-spaced trinomials Inforation Processing Letters 107 008 11 15 www.elsevier.co/locate/ipl Low coplexity bit parallel ultiplier for GF generated by equally-spaced trinoials Haibin Shen a,, Yier Jin a,b a Institute of VLSI

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

Birthday Paradox Calculations and Approximation

Birthday Paradox Calculations and Approximation Birthday Paradox Calculations and Approxiation Joshua E. Hill InfoGard Laboratories -March- v. Birthday Proble In the birthday proble, we have a group of n randoly selected people. If we assue that birthdays

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

The Euler-Maclaurin Formula and Sums of Powers

The Euler-Maclaurin Formula and Sums of Powers DRAFT VOL 79, NO 1, FEBRUARY 26 1 The Euler-Maclaurin Forula and Sus of Powers Michael Z Spivey University of Puget Sound Tacoa, WA 98416 spivey@upsedu Matheaticians have long been intrigued by the su

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding IEEE TRANSACTIONS ON INFORMATION THEORY (SUBMITTED PAPER) 1 Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding Lai Wei, Student Meber, IEEE, David G. M. Mitchell, Meber, IEEE, Thoas

More information

1 Generalization bounds based on Rademacher complexity

1 Generalization bounds based on Rademacher complexity COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges

More information

NBN Algorithm Introduction Computational Fundamentals. Bogdan M. Wilamoswki Auburn University. Hao Yu Auburn University

NBN Algorithm Introduction Computational Fundamentals. Bogdan M. Wilamoswki Auburn University. Hao Yu Auburn University NBN Algorith Bogdan M. Wilaoswki Auburn University Hao Yu Auburn University Nicholas Cotton Auburn University. Introduction. -. Coputational Fundaentals - Definition of Basic Concepts in Neural Network

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction TWO-STGE SMPLE DESIGN WITH SMLL CLUSTERS Robert G. Clark and David G. Steel School of Matheatics and pplied Statistics, University of Wollongong, NSW 5 ustralia. (robert.clark@abs.gov.au) Key Words: saple

More information

PHY 171. Lecture 14. (February 16, 2012)

PHY 171. Lecture 14. (February 16, 2012) PHY 171 Lecture 14 (February 16, 212) In the last lecture, we looked at a quantitative connection between acroscopic and icroscopic quantities by deriving an expression for pressure based on the assuptions

More information

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network 565 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 59, 07 Guest Editors: Zhuo Yang, Junie Ba, Jing Pan Copyright 07, AIDIC Servizi S.r.l. ISBN 978-88-95608-49-5; ISSN 83-96 The Italian Association

More information

arxiv: v3 [cs.ds] 22 Mar 2016

arxiv: v3 [cs.ds] 22 Mar 2016 A Shifting Bloo Filter Fraewor for Set Queries arxiv:1510.03019v3 [cs.ds] Mar 01 ABSTRACT Tong Yang Peing University, China yangtongeail@gail.co Yuanun Zhong Nanjing University, China un@sail.nju.edu.cn

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks 1 Optial Resource Allocation in Multicast Device-to-Device Counications Underlaying LTE Networks Hadi Meshgi 1, Dongei Zhao 1 and Rong Zheng 2 1 Departent of Electrical and Coputer Engineering, McMaster

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm Acta Polytechnica Hungarica Vol., No., 04 Sybolic Analysis as Universal Tool for Deriving Properties of Non-linear Algoriths Case study of EM Algorith Vladiir Mladenović, Miroslav Lutovac, Dana Porrat

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Constant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool

Constant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool Constant-Space String-Matching in Sublinear Average Tie (Extended Abstract) Maxie Crocheore Universite de Marne-la-Vallee Leszek Gasieniec y Max-Planck Institut fur Inforatik Wojciech Rytter z Warsaw University

More information

N-Point. DFTs of Two Length-N Real Sequences

N-Point. DFTs of Two Length-N Real Sequences Coputation of the DFT of In ost practical applications, sequences of interest are real In such cases, the syetry properties of the DFT given in Table 5. can be exploited to ake the DFT coputations ore

More information

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China 6th International Conference on Machinery, Materials, Environent, Biotechnology and Coputer (MMEBC 06) Solving Multi-Sensor Multi-Target Assignent Proble Based on Copositive Cobat Efficiency and QPSO Algorith

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

Necessity of low effective dimension

Necessity of low effective dimension Necessity of low effective diension Art B. Owen Stanford University October 2002, Orig: July 2002 Abstract Practitioners have long noticed that quasi-monte Carlo ethods work very well on functions that

More information