Static Huffman. Wrong probabilities. Adaptive Huffman. Canonical Huffman Trees. Algorithm for canonical trees. Example of a canonical tree

Klein S. T. and Wiseman Y.


Static Huffman

A known tree is used for compressing a file. A different tree can be used for each type of file, for example one tree for an English text and another tree for a Hebrew text. Two passes over the file are needed: one pass for building the tree and one for the compression itself.

Wrong probabilities

What is different in this text? (The slide's sample text, showing a file whose character distribution does not fit the assumed tree, was lost in transcription.)

Adaptive Huffman

x_t is encoded with the tree of x_1,...,x_{t-1}. The tree is changed during the compression, so only one pass is needed and there is no need to transmit the tree. Two possibilities:
- At the beginning, assume each item appeared once. The probabilities are wrong at the beginning, but after a large amount of data the error is negligible.
- When a new character appears, send an escape character before it.

Canonical Huffman Trees

Algorithm for canonical trees

Find a Huffman tree with lengths L_1,...,L_n for the items. Sort the items according to their lengths. Assign to each item the first L_i bits after the binary point of $\sum_{j=1}^{i-1} 2^{-L_j}$.

Example of a canonical tree

Suppose the lengths of the items A,...,G are given and the list is sorted by length (the concrete lengths on the slide were lost in transcription). The slide's table lists, for each item i, its length L_i and the sum $\sum_{j=1}^{i-1} 2^{-L_j}$, whose first L_i bits form the codeword.
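To make the construction concrete, here is a minimal Python sketch of the algorithm above (my own code and naming; the demo lengths are illustrative, since the digits of the slide's example were lost). Doubling an integer code value when moving to the next item is exactly the rule "take the first L_i bits after the binary point of $\sum_{j=1}^{i-1} 2^{-L_j}$", computed without floating point.

    # Canonical Huffman code construction: after sorting by length,
    # item i receives the first L_i bits after the binary point of
    # sum_{j<i} 2^{-L_j}, here maintained as an integer code value.
    def canonical_code(lengths):
        """lengths: dict item -> codeword length L_i (must satisfy Kraft)."""
        items = sorted(lengths, key=lambda it: (lengths[it], it))
        code, value, prev_len = {}, 0, 0
        for it in items:
            L = lengths[it]
            value <<= L - prev_len        # extend the binary fraction to L bits
            code[it] = format(value, '0{}b'.format(L))
            value += 1                    # adding 2^{-L} to the running sum
            prev_len = L
        return code

    if __name__ == '__main__':
        # Illustrative lengths for items A..G (not the slide's lost values).
        print(canonical_code({'A': 2, 'B': 2, 'C': 3, 'D': 4,
                              'E': 4, 'F': 4, 'G': 4}))
        # {'A': '00', 'B': '01', 'C': '100', 'D': '1010', 'E': '1011',
        #  'F': '1100', 'G': '1101'}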

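Looking back at the Adaptive Huffman slide, the first possibility can be illustrated with a deliberately naive Python sketch (my own code): it starts as if each item appeared once and rebuilds an optimal code from the running counts before every symbol. A real adaptive Huffman coder (e.g., the FGK algorithm) updates the tree incrementally instead of rebuilding it, but the principle is the same: the decoder maintains exactly the same counts, so no tree has to be transmitted.

    import heapq
    from itertools import count

    def huffman_code(freqs):
        """Huffman code (dict symbol -> bitstring) from a dict of counts."""
        tiebreak = count()                 # avoids ever comparing the dicts
        heap = [(f, next(tiebreak), {s: ''}) for s, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: '0' + w for s, w in c1.items()}
            merged.update({s: '1' + w for s, w in c2.items()})
            heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
        return heap[0][2]

    def adaptive_encode(text, alphabet):
        freqs = {s: 1 for s in alphabet}   # "assume each item appeared once"
        bits = []
        for ch in text:                    # x_t is coded with the counts of
            bits.append(huffman_code(freqs)[ch])   # x_1,...,x_{t-1} only
            freqs[ch] += 1
        return ''.join(bits)

    print(adaptive_encode('abracadabra', sorted(set('abracadabra'))))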
Errors in Huffman coded files

In the beginning God created the heaven and the earth. And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters. And God said, Let there be light: and there was light. And God saw the light, that it was good: and God divided the light from the darkness. And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day. And God said, Let there be a firmament in the midst of the waters, and let it divide the waters from the waters.

What will happen if the compressed file is read from arbitrary points? Decoding from such points yields, for example:

ac darkness was upon the face csoaraters. And God said, Let there be light d lnrathat it was good: and God divided.aauya dy, and the darkness he called Night c y. And God said, Let there be a firmament in the midst

Why canonical trees?

A canonical tree can be transferred easily: send the number of items for every length, then send the order of the items. Canonical codes synchronize faster after errors, and canonical codes can be decoded faster.

Definitions

Let P_1,...,P_n be the probabilities of the items (the leaves of the Huffman tree). Let L_1,...,L_n be the lengths of the codewords. Let X be the set of internal nodes of the Huffman tree. For every x in X, let I_x be the set of leaves in the subtree rooted at x.

Synchronization after error

If the code is not an affix code, decoding that starts at a wrong point produces erroneous codewords up to a synchronization point, after which the decoding is correct again. (The slide's diagram, with stretches labeled erroneous, synchronization and correct, was lost in transcription.)

Formulas

The average codeword length is $W = \sum_{i=1}^{n} P_i L_i$. The probability that an arbitrary point of the file falls in node x is $P(x) = \frac{\sum_{y \in I_x} P_y}{W}$. For x in X and y in I_x define $Q(x,y) = 1$ if the path from x to y corresponds to a sequence of one or more codewords of the code, and $Q(x,y) = 0$ otherwise.

Synchronization's probability

Let S denote the event that synchronization is regained at the end of the codeword containing the starting point. Then

$P(S) = \sum_{x \in X} P(x) \frac{\sum_{y \in I_x} P_y Q(x,y)}{\sum_{y \in I_x} P_y} = \frac{1}{W} \sum_{x \in X} \sum_{y \in I_x} P_y Q(x,y)$
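As a sanity check of these formulas, here is a small Python sketch (my own; the toy code and probabilities are illustrative) that computes W and P(S) directly. It takes the internal nodes x to be the proper prefixes of the codewords and tests Q(x,y) by trying to parse the path from x down to the leaf y as a concatenation of codewords.

    from functools import lru_cache

    # Illustrative prefix code and probabilities (any prefix-free code works).
    code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
    prob = {'a': 0.5, 'b': 0.25, 'c': 0.15, 'd': 0.1}
    words = frozenset(code.values())

    @lru_cache(maxsize=None)
    def parses(s):
        """True iff s is a concatenation of one or more codewords: Q(x,y)."""
        if s == '':
            return False
        return any(s == w or (s.startswith(w) and parses(s[len(w):]))
                   for w in words)

    W = sum(prob[s] * len(code[s]) for s in code)        # average length

    # Internal nodes = proper prefixes of codewords (the root is '').
    internal = {w[:i] for w in words for i in range(len(w))}

    P_S = 0.0
    for x in internal:
        leaves = [s for s in code if code[s].startswith(x)]       # I_x
        weight = sum(prob[s] for s in leaves)
        Px = weight / W                                           # P(x)
        hit = sum(prob[s] for s in leaves if parses(code[s][len(x):]))
        P_S += Px * hit / weight

    print('W =', W, ' P(S) =', P_S)  # (1/W) * sum_x sum_y P_y * Q(x,y)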

Synchronization

Canonical trees synchronize better, since every subtree of a canonical tree is a canonical tree itself. The expected number of bits until synchronization is $\frac{W}{P(S)}$.

Probability for a canonical tree

(Bar chart, lost in transcription: P(S) for Nelson's trees vs. canonical trees on the test files bib, paper, obj and several French, English and Hebrew texts.) Nelson's trees are the trees described in the reference cited on the slide (citation lost in transcription), with some other features.

Expected synchronization

(Bar chart, lost in transcription: average and maximal number of bits until synchronization, on the same test files.)

Skeleton Trees

There is no need to save the whole tree: e.g., if a codeword starts with a certain bit pattern, it ought to be of a certain length in bits, so we can read the following bits as one block. (The concrete pattern and length of the slide's example were lost in transcription.)

Definition

Let m = min{ l : n_l > 0 }, where n_l is the number of codewords of length l. Define base(l) by base(m) = 0 and base(l) = 2(base(l-1) + n_{l-1}). Define seq(l) by seq(m) = 0 and seq(l) = seq(l-1) + n_{l-1}.

Illustration of a Skeleton Tree

This is the skeleton tree for the code on the previous slide. It has far fewer nodes than the original tree. (The drawing and the exact node counts were lost in transcription.)
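These recurrences are easy to tabulate. The following Python sketch (my own naming, with the zero-based seq assumed above) computes base(l), seq(l) and diff(l) = base(l) - seq(l) from the counts n_l; on the lengths 2,2,3,4,4,4,4 of the earlier canonical-code sketch it yields diff = {2: 0, 3: 2, 4: 7}.

    def skeleton_tables(n):
        """n: dict length l -> number n_l of codewords of that length."""
        m, top = min(n), max(n)
        base = {m: 0}    # integer value of the first codeword of length l
        seq = {m: 0}     # zero-based index of that codeword in the sorted list
        for l in range(m + 1, top + 1):
            base[l] = 2 * (base[l - 1] + n.get(l - 1, 0))
            seq[l] = seq[l - 1] + n.get(l - 1, 0)
        diff = {l: base[l] - seq[l] for l in base}
        return base, seq, diff

    print(skeleton_tables({2: 2, 3: 1, 4: 4}))
    # ({2: 0, 3: 4, 4: 10}, {2: 0, 3: 2, 4: 3}, {2: 0, 3: 2, 4: 7})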

An example of the values

(Table lost in transcription: for the code depicted on the previous slides it listed, per length l, the values n_l, base(l), seq(l) and diff(l).)

Definition (cont.)

Let B_s(k) denote the s-bit binary representation of the integer k, with leading zeros if necessary. Let I(w) be the integer value of the binary string w, i.e., if w is of length l, then w = B_l(I(w)). I(w) - base(l) is the relative index of codeword w within the block of codewords of length l, so seq(l) + I(w) - base(l) is the index of w within the full list of codewords. This can be rewritten as I(w) - diff(l), for diff(l) = base(l) - seq(l). Thus all one needs to store is the list of the diff(l) values.

Decoding Algorithm

    i ← start ← 0
    tree_pointer ← root
    while i < length_of_string
        if string[i] = 0 then tree_pointer ← left(tree_pointer)
        else tree_pointer ← right(tree_pointer)
        if value(tree_pointer) > 0 then
            codeword ← string[start ... (start + value(tree_pointer) - 1)]
            output table[I(codeword) - diff[value(tree_pointer)]]
            start ← start + value(tree_pointer)
            i ← start
            tree_pointer ← root
        else i++

A leaf of the skeleton tree stores in value the common length of all codewords beginning with the bits read so far, so the rest of the codeword can be read as one block; after each output the walk restarts at the root.

Reduced Skeleton Trees

Define for each node v of the skeleton tree:
- if v is a leaf: lower(v) = upper(v) = value(v);
- if v is an internal node: lower(v) = lower(left(v)) and upper(v) = upper(right(v)).
A reduced skeleton tree is the smallest subtree of the original skeleton tree in which every leaf w satisfies upper(w) ≤ lower(w) + 1.

Illustration of a reduced tree

This is the reduced tree of the previously depicted tree. After reading a certain prefix we already know that the length of the codeword is one of two consecutive values, whereas the full tree would perform four more comparisons and still could not decide between the two lengths. (The concrete bits and lengths of the slide were lost in transcription.)

Another example of a reduced tree

This is a reduced skeleton tree for bigrams of the Hebrew Bible; only lengths up to a bound are listed. (The drawing and the bound were lost in transcription.)
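Here is a runnable Python version of the decoding algorithm above (a sketch under my own representation: the skeleton tree is built of nested dicts, value = 0 marks an internal node, and table/diff are the zero-based values computed earlier for the canonical code A:00, B:01, C:100, D:1010, E:1011, F:1100, G:1101).

    def leaf(length):
        return {'value': length}

    def node(left, right):
        return {'value': 0, 'left': left, 'right': right}

    # Skeleton tree of the code above: after reading '0' the length is 2,
    # after '100' it is 3, and after '101' or '11' it is 4.
    tree = node(leaf(2), node(node(leaf(3), leaf(4)), leaf(4)))

    table = ['A', 'B', 'C', 'D', 'E', 'F', 'G']   # items sorted by length
    diff = {2: 0, 3: 2, 4: 7}                     # base(l) - seq(l)

    def decode(bits):
        out, i, start, tp = [], 0, 0, tree
        while i < len(bits):
            tp = tp['left'] if bits[i] == '0' else tp['right']
            if tp['value'] > 0:                   # codeword length is known
                L = tp['value']
                codeword = bits[start:start + L]  # read the rest as a block
                out.append(table[int(codeword, 2) - diff[L]])
                start += L
                i, tp = start, tree               # restart at the root
            else:
                i += 1
        return ''.join(out)

    print(decode('0100101011011011'))   # 01|00|1010|1101|1011 -> 'BADGE'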

Algorithm of reduced trees

    i ← start ← 0
    tree_pointer ← root
    while i < length_of_string
        if string[i] = 0 then tree_pointer ← left(tree_pointer)
        else tree_pointer ← right(tree_pointer)
        if value(tree_pointer) > 0 then
            len ← value(tree_pointer)
            codeword ← string[start ... (start + len - 1)]
            if flag(tree_pointer) = 1 and 2·I(codeword) ≥ base(len + 1) then
                codeword ← string[start ... (start + len)]
                len++
            output table[I(codeword) - diff[len]]
            i ← start ← start + len
            tree_pointer ← root
        else i++

In the reduced tree a leaf may stand for two consecutive codeword lengths; flag marks such leaves, and the comparison with base(len + 1) decides whether the codeword is one bit longer.

Affix codes

Affix codes never synchronize, but they can be decoded backwards. This is useful in practice: PL/1 allows files on magnetic tapes to be accessed in reverse order, and Information Retrieval systems use a concordance that points to the locations of the words in the text; when a word is retrieved, typically a context of some words around it is displayed.

Non-Trivial Affix codes

Fixed length codes are called trivial affix codes. Theorem: there are infinitely many non-trivial complete affix codes. Proof: one non-trivial code is shown on the slide, and from an affix code {a_1,...,a_n} a larger set {b_1,...} can be defined whose elements are built from the a_i; the new set is obviously an affix code. (The slide's example code and the exact definition of the b's were lost in transcription.)

Markov chains

A sequence of events, each of which depends only on the n events before it, is called an n-th order Markov chain. In a first order Markov chain, event t depends just on event t-1; in a 0th order Markov chain the events are independent. Examples: the Fibonacci sequence is a second order chain; an arithmetic sequence is a first order chain.

Markov chain of Huffman trees

Use a different Huffman tree for each item in the set: the tree for an item x holds the probabilities of each item to appear after x. Examples: u will have a much shorter codeword in q's tree than in the other trees; ג will have a much longer codeword after ט. This method implements a first order Markov chain.

Clustering

Markov chains of Huffman trees can be expanded to n-th order chains, but the overhead of saving so many trees can be very high. Similar trees can be clustered into one tree, which will be the average of the original trees. Example: the trees of v and b may be similar, since they have a similar sound.
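Finally, here is a minimal Python sketch of the first order Markov chain of Huffman trees described above (my own code; add-one smoothing of the bigram counts stands in for an escape mechanism, so every symbol stays encodable in every context). One Huffman code table is kept per preceding symbol, and the first symbol is coded with a uniform-count table.

    import heapq
    from collections import defaultdict
    from itertools import count

    def huffman_code(freqs):
        """Huffman code (dict symbol -> bitstring) from a dict of counts."""
        tiebreak = count()
        heap = [(f, next(tiebreak), {s: ''}) for s, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: '0' + w for s, w in c1.items()}
            merged.update({s: '1' + w for s, w in c2.items()})
            heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
        return heap[0][2]

    def markov_tables(text):
        """One code table per preceding symbol, from smoothed bigram counts."""
        alphabet = sorted(set(text))
        bigrams = defaultdict(lambda: {s: 1 for s in alphabet})
        for prev, cur in zip(text, text[1:]):
            bigrams[prev][cur] += 1
        return {p: huffman_code(f) for p, f in bigrams.items()}, alphabet

    def encode(text, tables, alphabet):
        first = huffman_code({s: 1 for s in alphabet})  # tree for x_1
        bits, prev = [first[text[0]]], text[0]
        for ch in text[1:]:
            bits.append(tables[prev][ch])   # the tree depends on the context
            prev = ch
        return ''.join(bits)

    tables, alphabet = markov_tables('abracadabra')
    print(encode('abracadabra', tables, alphabet))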