COMP9319 Web Data Compression and Search. Lecture 2: Adaptive Huffman, BWT
- Tamsin Porter
- 6 years ago
Transcription
1 COMP9319 Web Data Compression and Search. Lecture 2: Adaptive Huffman, BWT
2 Original readings: log in to your CSE account and run: cd ~cs9319/papers. The original readings for each lecture will be placed there.
3 Course schedule: Data compression; Search; Data compression + Search; Web data compression + Search; Optional topics
4 Huffman coding S Freq Huffman a b 6 10 c d e k a b c d e
5 Huffman not optimal H = log log / log L = (100000*1 + )/
6 Problems of Huffman coding: Huffman codes use an integral number of bits per symbol. E.g., log2(3) ≈ 1.585, while Huffman may need 2 bits. The non-optimality is noticeable when the probability of a symbol is high. => Arithmetic coding
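A small numeric sketch of the point above. The skewed two-symbol source (probabilities 0.99 and 0.01) is an assumption chosen for illustration and is not taken from the slides; it only shows how far a whole-bit code can sit from the entropy when one symbol dominates.

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Three equally likely symbols: the ideal cost is log2(3) bits each,
# but a prefix code must spend a whole number of bits per symbol.
ideal_bits = math.log2(3)

# Hypothetical skewed source: one symbol has probability 0.99.
skewed = [0.99, 0.01]
h = entropy(skewed)
huffman_bits = 1.0  # any prefix code over two symbols spends 1 bit on each

print(f"log2(3) = {ideal_bits:.3f} bits")
print(f"entropy = {h:.3f} bits/symbol, Huffman = {huffman_bits:.0f} bit/symbol")
```

With these assumed probabilities the Huffman code spends more than ten times the entropy per symbol, which is the gap arithmetic coding is meant to close.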
7 Problems of static coding: It needs statistics, and they are static: e.g., a single pass over the data just to collect the statistics, which then stay unchanged during encoding. To decode, the statistics table needs to be transmitted, and the table size can be significant for a small message. => Adaptive compression, e.g., adaptive Huffman
8 Adaptive compression
Encoder: Initialize the model; repeat for each input char ( Encode char; Update the model )
Decoder: Initialize the model; repeat for each input char ( Decode char; Update the model )
Make sure both sides use the same model initialization and model update algorithms.
9 Adaptive Huffman Coding (dummy)
Encoder: Reset the stat; repeat for each input char ( Encode char; Update the stat; Rebuild Huffman tree )
Decoder: Reset the stat; repeat for each input char ( Decode char; Update the stat; Rebuild Huffman tree )
10 Adaptive Huffman Coding (dummy), same encoder and decoder as on slide 9: this works, but rebuilding the Huffman tree after every character is too slow!
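The "dummy" scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference code: the names build_codes, encode and decode are hypothetical, every symbol of a known alphabet (at least two symbols) starts with count 1 instead of using an NYT node, and the decoder is told how many symbols to expect.

```python
import heapq
from itertools import count

def build_codes(freq):
    """Build a Huffman code table {symbol: bitstring} from {symbol: count}."""
    tie = count()  # unique tie-breaker so heap entries never compare dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in sorted(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]

def encode(text, alphabet):
    freq = {sym: 1 for sym in alphabet}       # reset the stat
    bits = []
    for ch in text:
        bits.append(build_codes(freq)[ch])    # rebuild tree, encode char
        freq[ch] += 1                         # update the stat
    return "".join(bits)

def decode(bits, alphabet, n_symbols):
    freq = {sym: 1 for sym in alphabet}       # same reset as the encoder
    out, pos = [], 0
    for _ in range(n_symbols):
        inverse = {b: s for s, b in build_codes(freq).items()}
        end = pos + 1
        while bits[pos:end] not in inverse:   # prefix code: first match wins
            end += 1
            if end > len(bits):
                raise ValueError("bitstream ended inside a codeword")
        sym = inverse[bits[pos:end]]
        out.append(sym)
        freq[sym] += 1                        # same update as the encoder
        pos = end
    return "".join(out)

bits = encode("abbbbba", "abcd")
print(bits, decode(bits, "abcd", 7))
```

Both sides rebuild the whole tree for every character, which is exactly the cost the FGK and Vitter update procedures avoid.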
11 Adaptive Huffman (Algorithm outline)
1. If the current symbol is NYT (not yet transmitted), add two child nodes to the NYT node: one is a new NYT node, the other is a leaf node for the symbol. Increase the weight of the new leaf node and of the old NYT node, then go to step 4. Otherwise, go to the symbol's leaf node.
2. If this node does not have the highest number in its block (the nodes of the same weight), swap it with the node that has the highest number.
3. Increase the weight of the current node.
4. If this is not the root node, go to the parent node and then go to step 2. If this is the root, end.
12 The update procedure: from Introduction to Data Compression by Khalid Sayood. Also, Wikipedia provides a good summary, example and explanation (i.e., en.wikipedia.org/wiki/Adaptive_huffman_coding)
13 Adaptive Huffman abbbbba: abbbbba: a: b: Modified from Wikipedia
14 More example: 256: W=17; 252: W=3; 254: W=7; 253: W=4; e 255: W=10; a 248: W=1; b 249: W=2; c 250: W=2; d 251: W=2. More aaaa. coming
15 More example: 256: W=18; 252: W=4; 254: W=8; 253: W=4; e 255: W=10; a 248: W=2; b 249: W=2; c 250: W=2; d 251: W=2
16 More example: 256: W=19; 252: W=4; 254: W=9; 253: W=5; e 255: W=10; d 248: W=2; b 249: W=2; c 250: W=2; a 251: W=3
17 More example: 256: W=20; 252: W=4; 254: W=10; 253: W=6; e 255: W=10; d 248: W=2; b 249: W=2; c 250: W=2; a 251: W=4
18 More example: 256: W=20; 252: W=4; 254: W=10; 253: W=6; e 255: W=10; d 248: W=2; b 249: W=2; c 250: W=2; a 251: W=4
19 More example: 256: W=20; a 252: W=5; 254: W=10; 253: W=6; e 255: W=10; 251: W=4; c 250: W=2; d 248: W=2; b 249: W=2
20 More example: 256: W=21; e 254: W=10; a 252: W=5; 255: W=11; 253: W=6; 251: W=4; c 250: W=2; d 248: W=2; b 249: W=2
21 Adaptive Huffman (FGK)
22 Adaptive Huffman (FGK): when f is inserted
23 Adaptive Huffman (FGK vs Vitter) 1. FGK: (explicit) node numbering; Vitter: implicit numbering. 2. Vitter's Invariant:
24 aa bbb c (Huffman). Frequencies: a 2, b 3, c 1, sp 2
25 aa bbb c (Huffman). Frequencies: a 2, b 3, c 1, sp 2. Codes from one tree: a 01, b 1, c 000, sp 001. Total 16 bits. Codes from another tree: a 10, b 11, c 00, sp 01
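The 16-bit total can be checked with a short static Huffman sketch. The function name huffman_code_lengths is hypothetical, and the tie-breaking order is an arbitrary choice, so the tree it builds may differ from either tree on the slide; any Huffman tree for these frequencies should give the same total.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code_lengths(text):
    """Return (frequencies, code length per symbol) for a static Huffman code."""
    freq = Counter(text)
    tie = count()  # unique tie-breaker so heap entries never compare dicts
    heap = [(w, next(tie), {sym: 0}) for sym, w in sorted(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, d0 = heapq.heappop(heap)
        w1, _, d1 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d0, **d1}.items()}
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return freq, heap[0][2]

freq, lengths = huffman_code_lengths("aa bbb c")
total_bits = sum(freq[s] * lengths[s] for s in freq)

# Code lengths read off the first code table on the slide (a 01, b 1, c 000, sp 001).
slide_lengths = {"a": 2, "b": 1, "c": 3, " ": 3}
slide_total = sum(freq[s] * slide_lengths[s] for s in freq)

print(total_bits, slide_total)
```

The second code table on the slide (two bits for every symbol) also costs 8 x 2 = 16 bits, so both trees are equally good for this input.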
26 Adaptive Huffman (Vitter's Invariant)
27 Adaptive Huffman (Vitter 1987) abbbbba: abbbbba: a: b: Modified from Wikipedia
28 Adaptive Huffman (Vitter 1987) abbbbba: abbbbba: a a: b: You can correct the Wikipedia article. Modified from Wikipedia
29 Adaptive Huffman (Vitter 87)
30 Adaptive Huffman. Question: Adaptive Huffman vs Static Huffman
31 Compared with Static Huffman: It is dynamic and can offer better compression (cf. Vitter's experiments next), i.e., the tree can be smaller (hence the codes shorter) before the whole bitstream is received. It works when prior statistics are unavailable. It saves the symbol table overhead (cf. Vitter's experiments next).
32 Vitter's experiments: Include overheads such as symbol tables / leaf node code etc. 95 ASCII chars + <end-of-line>. Exclude overheads such as symbol tables / leaf node code etc. From Vitter's paper. You know where it is.
33 More experiments
34 Next: BWT. BWT: Burrows-Wheeler Transform. It is a transform, not a compression method, but it usually helps compression (especially text compression). Excerpted from Wikipedia
35 Recall from Lecture 1's RLE and BWT example: rabcabcababaabacabcabcabcababaa$ -> (BWT) aabbbbccacccrcbaaaaaaaaaabbbbba$ -> (RLE) aab4ccac3rcba10b5a$
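The RLE step of this example can be reproduced with a few lines. The rule used here is inferred from the example and is an assumption: runs of three or more symbols become the symbol followed by the run length, and shorter runs are copied unchanged. The function name rle_encode is hypothetical, and the sketch assumes the input contains no digits, otherwise the output would be ambiguous.

```python
from itertools import groupby

def rle_encode(text, min_run=3):
    """Replace each run of at least min_run equal symbols by symbol + count."""
    out = []
    for ch, group in groupby(text):
        n = len(list(group))
        out.append(f"{ch}{n}" if n >= min_run else ch * n)
    return "".join(out)

# The BWT output from the example above.
print(rle_encode("aabbbbccacccrcbaaaaaaaaaabbbbba$"))
```

The long runs produced by the BWT are what make this simple scheme pay off; on the untransformed input there is no run of length three or more to shorten.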
36 A simple example. Input: #BANANAS. Excerpted from Wikipedia
37 All rotations: #BANANAS, S#BANANA, AS#BANAN, NAS#BANA, ANAS#BAN, NANAS#BA, ANANAS#B, BANANAS#
38 Sort the rows: #BANANAS, ANANAS#B, ANAS#BAN, AS#BANAN, BANANAS#, NANAS#BA, NAS#BANA, S#BANANA
39 Output (the last column of the sorted rows): #BANANAS, ANANAS#B, ANAS#BAN, AS#BANAN, BANANAS#, NANAS#BA, NAS#BANA, S#BANANA -> SBNN#AAA
40 Exercise: you can try the example rabcabcababaabacabcabcabcababaa$ -> aabbbbccacccrcbaaaaaaaaaabbbbba$
41 Now the inverse. Input: S B N N # A A A
42 First add: S, B, N, N, #, A, A, A
43 Then sort: #, A, A, A, B, N, N, S
44 Add again: S#, BA, NA, NA, #B, AN, AN, AS
45 Then sort: #B, AN, AN, AS, BA, NA, NA, S#
46 Then add: S#B, BAN, NAN, NAS, #BA, ANA, ANA, AS#
47 Then sort: #BA, ANA, ANA, AS#, BAN, NAN, NAS, S#B
48 Then add: S#BA, BANA, NANA, NAS#, #BAN, ANAN, ANAS, AS#B
49 Then sort: #BAN, ANAN, ANAS, AS#B, BANA, NANA, NAS#, S#BA
50 Then add: S#BAN, BANAN, NANAS, NAS#B, #BANA, ANANA, ANAS#, AS#BA
51 Then sort: #BANA, ANANA, ANAS#, AS#BA, BANAN, NANAS, NAS#B, S#BAN
52 Then add: S#BANA, BANANA, NANAS#, NAS#BA, #BANAN, ANANAS, ANAS#B, AS#BAN
53 Then sort: #BANAN, ANANAS, ANAS#B, AS#BAN, BANANA, NANAS#, NAS#BA, S#BANA
54 Then add: S#BANAN, BANANAS, NANAS#B, NAS#BAN, #BANANA, ANANAS#, ANAS#BA, AS#BANA
55 Then sort: #BANANA, ANANAS#, ANAS#BA, AS#BANA, BANANAS, NANAS#B, NAS#BAN, S#BANAN
56 Then add: S#BANANA, BANANAS#, NANAS#BA, NAS#BANA, #BANANAS, ANANAS#B, ANAS#BAN, AS#BANAN
57 Then sort (?): #BANANAS, ANANAS#B, ANAS#BAN, AS#BANAN, BANANAS#, NANAS#BA, NAS#BANA, S#BANANA
58 Implementation: Do we need to represent the table in the encoder? No, a single pointer for each row is needed.
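One way to read "a single pointer for each row" is to sort the start offsets of the rotations instead of building the rotated strings. The sketch below does that; the name bwt_by_pointers is hypothetical, and the character-by-character comparison is the simplest possible choice rather than an efficient suffix sorting method.

```python
from functools import cmp_to_key

def bwt_by_pointers(s):
    """BWT that stores one start offset per rotation instead of the full table."""
    n = len(s)

    def compare(i, j):
        # Compare the rotations starting at i and j, reading s cyclically.
        for k in range(n):
            a, b = s[(i + k) % n], s[(j + k) % n]
            if a != b:
                return -1 if a < b else 1
        return 0

    starts = sorted(range(n), key=cmp_to_key(compare))
    # The last character of the rotation starting at i is s[i - 1].
    return "".join(s[i - 1] for i in starts)

print(bwt_by_pointers("banana$"))  # illustrative input, not from the slides
```

Only n integers are kept in memory besides the input, compared with n strings of length n for the explicit table.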
59 BWT(S)
function BWT (string s)
    create a table, rows are all possible rotations of s
    sort rows alphabetically
    return (last column of the table)
60 InverseBWT(S)
function inverseBWT (string s)
    create empty table
    repeat length(s) times
        insert s as a column of table before first column of the table // first insert creates first column
        sort rows of the table alphabetically
    return (row that ends with the 'EOF' character)
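The two pseudocode functions translate almost line by line into Python. This is a direct, unoptimized sketch: "alphabetically" is taken to mean Python's default code-point order, and the EOF marker is assumed to be the last character of the original string, as in the pseudocode. The input "banana$" is illustrative and does not come from the slides.

```python
def bwt(s):
    """Table-based BWT: sort all rotations, return the last column."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)

def inverse_bwt(last, eof):
    """Table-based inverse: repeatedly prepend the BWT column and sort."""
    table = [""] * len(last)
    for _ in range(len(last)):
        # Insert `last` as a new first column, then sort the rows.
        table = sorted(ch + row for ch, row in zip(last, table))
    # The original text is the row that ends with the EOF marker.
    return next(row for row in table if row.endswith(eof))

transformed = bwt("banana$")
print(transformed, inverse_bwt(transformed, "$"))
```

If the marker is placed at the front of the text instead, the row to return is the one that starts with the marker. The inverse sorts n times over n rows, which motivates the counting-based methods later in the lecture.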
61 Move to Front (MTF): Reduces entropy based on local frequency correlation. Usually applied to the BWT output before an entropy-encoding step. Author and detail: original paper at cs9319/papers
62 Example: abaabacad
Symbol  Code  List
a       0     abcde..
b       1     bacde..
a       1     abcde..
a       0     abcde..
b       1     bacde..
a       1     abcde..
c       2     cabde..
a       1     acbde..
d       3     dacbe..
To transform a general file, the list has 256 ASCII symbols.
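The table can be reproduced with a short sketch. The function names mtf_encode and mtf_decode are hypothetical, and the five-letter starting list "abcde" is an assumption matching the example; for a general file the starting list would hold all 256 byte values.

```python
def mtf_encode(text, alphabet):
    """Emit each symbol's current list position, then move it to the front."""
    symbols = list(alphabet)
    codes = []
    for ch in text:
        idx = symbols.index(ch)
        codes.append(idx)
        symbols.insert(0, symbols.pop(idx))
    return codes

def mtf_decode(codes, alphabet):
    """Invert mtf_encode by replaying the same list updates."""
    symbols = list(alphabet)
    out = []
    for idx in codes:
        ch = symbols.pop(idx)
        out.append(ch)
        symbols.insert(0, ch)
    return "".join(out)

codes = mtf_encode("abaabacad", "abcde")
print(codes)
print(mtf_decode(codes, "abcde"))
```

Recently seen symbols get small codes, so the runs and local clusters produced by the BWT turn into many small numbers that an entropy coder handles well.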
63 BWT compressor vs ZIP: ZIP (i.e., LZW based) vs BWT+RLE+MTF+AC. From
64 Other ways to reverse BWT: Consider L=BWT(S), composed of the symbols V_0 ... V_{N-1}. The transformed string may be parsed to obtain: (1) for each position i, the number of symbols in the substring V_0 ... V_{i-1} that are identical to V_i; (2) for each unique symbol V_i in L, the number of symbols that are lexicographically less than that symbol.
65 Example (L = BNN[AA]A, the BWT of [BANANA])
Position  Symbol  # Matching
0         B       0
1         N       0
2         N       1
3         [       0
4         A       0
5         A       1
6         ]       0
7         A       2
Symbol  # LessThan
A       0
B       3
N       4
[       6
]       7
66 ???????]
67 ??????A]
68 ?????NA]
69 ????ANA]
70 ???NANA]
71 ??ANANA]
72 ?BANANA]
73 [BANANA]
74 [BANANA] The # Matching table is Occ / Rank; the # LessThan table is C[ ].
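The right-to-left reconstruction can be sketched with the two tables alone. This assumes the slide's transformed string is BNN[AA]A with ']' as the final symbol of the text; the name inverse_bwt_counts is hypothetical. From position i, the next position is LessThan[symbol] + Matching[i].

```python
def inverse_bwt_counts(last, eof):
    """Invert a BWT using only the # Matching (Occ/Rank) and # LessThan (C) tables."""
    # matching[i]: occurrences of last[i] within last[0..i-1].
    seen = {}
    matching = []
    for ch in last:
        matching.append(seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1

    # less_than[c]: number of symbols in `last` that sort before c.
    less_than = {}
    total = 0
    for ch in sorted(seen):
        less_than[ch] = total
        total += seen[ch]

    # Start at the end-of-text symbol and rebuild the text backwards.
    i = last.index(eof)
    out = []
    for _ in range(len(last)):
        ch = last[i]
        out.append(ch)
        i = less_than[ch] + matching[i]
    return "".join(reversed(out))

print(inverse_bwt_counts("BNN[AA]A", "]"))
```

No table of rotations and no sorting of rows is needed: one pass builds the counts, and one pass of N jumps recovers the text.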
75 An illustration
First  Last
A      B
A      N
A      N
B      [
N      A
N      A
[      ]
]      A
76 A]
77 NA]
78 ANA]
79 NANA]
80 ANANA]
81 BANANA]
82 [BANANA]
83 Dynamic BWT? Instead of reconstructing the BWT, apply local reordering to the original BWT. Details: Salson M, Lecroq T, Léonard M and Mouchard L (2009). "A Four-Stage Algorithm for Updating a Burrows-Wheeler Transform". Theoretical Computer Science 410 (43):
More information10-704: Information Processing and Learning Fall Lecture 10: Oct 3
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 0: Oct 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of
More informationImage Data Compression
Image Data Compression Image data compression is important for - image archiving e.g. satellite data - image transmission e.g. web data - multimedia applications e.g. desk-top editing Image data compression
More informationSuccincter text indexing with wildcards
University of British Columbia CPM 2011 June 27, 2011 Problem overview Problem overview Problem overview Problem overview Problem overview Problem overview Problem overview Problem overview Problem overview
More informationLec 03 Entropy and Coding II Hoffman and Golomb Coding
CS/EE 5590 / ENG 40 Special Topics Multimedia Communication, Spring 207 Lec 03 Entropy and Coding II Hoffman and Golomb Coding Zhu Li Z. Li Multimedia Communciation, 207 Spring p. Outline Lecture 02 ReCap
More informationarxiv: v1 [cs.ds] 15 Feb 2012
Linear-Space Substring Range Counting over Polylogarithmic Alphabets Travis Gagie 1 and Pawe l Gawrychowski 2 1 Aalto University, Finland travis.gagie@aalto.fi 2 Max Planck Institute, Germany gawry@cs.uni.wroc.pl
More informationLecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments
Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2006 jzhang@cse.unsw.edu.au
More informationData Compression Techniques (Spring 2012) Model Solutions for Exercise 2
582487 Data Compression Techniques (Spring 22) Model Solutions for Exercise 2 If you have any feedback or corrections, please contact nvalimak at cs.helsinki.fi.. Problem: Construct a canonical prefix
More information4. Quantization and Data Compression. ECE 302 Spring 2012 Purdue University, School of ECE Prof. Ilya Pollak
4. Quantization and Data Compression ECE 32 Spring 22 Purdue University, School of ECE Prof. What is data compression? Reducing the file size without compromising the quality of the data stored in the
More informationLec 04 Variable Length Coding (VLC) in JPEG
ECE 5578 Multimedia Communication Lec 04 Variable Length Coding (VLC) in JPEG Zhu Li Dept of CSEE, UMKC Z. Li Multimedia Communciation, 2018 p.1 Outline Lecture 03 ReCap VLC JPEG Image Coding Framework
More informationNoisy channel communication
Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 6 Communication channels and Information Some notes on the noisy channel setup: Iain Murray, 2012 School of Informatics, University
More information1 Introduction to information theory
1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through
More information