ASYMMETRIC NUMERAL SYSTEMS: ADDING FRACTIONAL BITS TO HUFFMAN CODER

1  ASYMMETRIC NUMERAL SYSTEMS: ADDING FRACTIONAL BITS TO HUFFMAN CODER

Huffman coding: fast, but operates on an integer number of bits, approximating probabilities with powers of ½ and so getting an inferior compression rate.
Arithmetic coding: accurate, but many times slower (computationally more demanding).
Asymmetric Numeral Systems (ANS): accurate and faster than Huffman; we construct a low-state automaton optimized for the given probability distribution.

Jarek Duda, Kraków

2  We need n bits of information to choose one of 2^n possibilities. For length-n 0/1 sequences with pn ones, how many bits do we need to choose one?

Entropy coder: encodes a sequence with probability distribution (p_i)_{i=1..m} using asymptotically at least
    H = Σ_{i=1..m} p_i lg(1/p_i)  bits/symbol    (H ≤ lg(m)).
Seen as a weighted average: a symbol of probability p contains lg(1/p) bits.

Encoding a (p_i) distribution with an entropy coder optimal for a (q_i) distribution costs
    ΔH = Σ_i p_i lg(1/q_i) − Σ_i p_i lg(1/p_i) = Σ_i p_i lg(p_i/q_i) ≈ (1/ln 4) Σ_i (p_i − q_i)²/p_i
more bits/symbol, the so-called Kullback-Leibler distance (not symmetric).
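
As a concrete check of these formulas, here is a small C sketch (the 4-symbol distribution and its power-of-½ approximation are only illustrative) computing H, the exact ΔH, and the quadratic approximation:

    #include <math.h>
    #include <stdio.h>

    /* Entropy H = sum_i p_i * lg(1/p_i), in bits per symbol. */
    static double entropy(const double *p, int m) {
        double H = 0;
        for (int i = 0; i < m; i++) H += p[i] * log2(1.0 / p[i]);
        return H;
    }

    /* Kullback-Leibler cost of using a coder optimal for q on source p:
       dH = sum_i p_i * lg(p_i / q_i)  extra bits per symbol.           */
    static double kl(const double *p, const double *q, int m) {
        double d = 0;
        for (int i = 0; i < m; i++) d += p[i] * log2(p[i] / q[i]);
        return d;
    }

    /* Quadratic approximation: dH ~ (1/ln 4) * sum_i (p_i - q_i)^2 / p_i. */
    static double kl_approx(const double *p, const double *q, int m) {
        double d = 0;
        for (int i = 0; i < m; i++) d += (p[i] - q[i]) * (p[i] - q[i]) / p[i];
        return d / log(4.0);
    }

    int main(void) {
        /* illustrative source distribution and a Huffman-like power-of-2 approximation of it */
        double p[4] = {0.45, 0.30, 0.15, 0.10};
        double q[4] = {0.50, 0.25, 0.125, 0.125};
        printf("H        = %.4f bits/symbol\n", entropy(p, 4));
        printf("dH exact = %.4f bits/symbol\n", kl(p, q, 4));
        printf("dH quad  = %.4f bits/symbol\n", kl_approx(p, q, 4));
        return 0;
    }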

3  Symbol frequencies from a version of the British National Corpus containing 67 million words:
    Uniform: lg(27) bits/symbol  (x → 27x + s)
    Huffman uses: H' = Σ_i p_i r_i bits/symbol  (r_i = codeword length)
    Shannon: H = Σ_i p_i lg(1/p_i) bits/symbol
We can theoretically improve by about 1% (ΔH) here.
Order-1 Markov: ~3.3 bits/symbol; order 2: ~3.1 bits/symbol; word level: ~2.1 bits/symbol.
Currently the best text compression (durilca kingsize) turns 10^9 bytes of text from Wikipedia (enwik9) into about 1 bit/symbol; (lossy) video compression reaches a ~1000× reduction.

4  Huffman codes encode symbols as bit sequences. They are perfect if the symbol probabilities are powers of 2, e.g. p(A) = 1/2, p(B) = 1/4, p(C) = 1/4 gives the codewords A → 0, B → 10, C → 11.

We can reduce ΔH by grouping m symbols together (alphabet size 2^m or 3^m); a numerical check follows below. Generally the maximal depth R = max_i r_i grows proportionally to m while ΔH drops approximately proportionally to 1/m, so ΔH ~ 1/R (= 1/lg(L)).
For ANS: ΔH ~ 4^(−R) (= 1/L², where L = 2^R is the number of states).
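
The effect of grouping can be verified numerically. The sketch below (an illustration, not from the slides) builds Huffman codes for blocks of m symbols of a binary source with Pr(1) = 0.3, using the fact that the expected codeword length equals the sum of the merged node weights, and compares the cost per original symbol with the entropy H:

    #include <math.h>
    #include <stdio.h>

    /* Expected Huffman codeword length for weights w[0..n-1]: repeatedly merge the two
       smallest active weights; the expected length equals the sum of the merged weights. */
    static double huffman_avg_len(double *w, int n) {
        double total = 0;
        for (int active = n; active > 1; active--) {
            int a = -1, b = -1;                       /* indices of the two smallest weights */
            for (int i = 0; i < n; i++) {
                if (w[i] < 0) continue;               /* already merged away */
                if (a < 0 || w[i] < w[a]) { b = a; a = i; }
                else if (b < 0 || w[i] < w[b]) b = i;
            }
            total += w[a] + w[b];
            w[a] += w[b];
            w[b] = -1;
        }
        return total;
    }

    int main(void) {
        double p1 = 0.3;                              /* Pr(1) of the binary source */
        double H = -p1 * log2(p1) - (1 - p1) * log2(1 - p1);
        printf("H = %.4f bits/symbol\n", H);
        for (int m = 1; m <= 4; m++) {                /* group m symbols: alphabet size 2^m */
            int n = 1 << m;
            double w[16];
            for (int i = 0; i < n; i++) {             /* block probability = product over bits */
                int ones = 0;
                for (int b = i; b; b >>= 1) ones += b & 1;
                w[i] = pow(p1, ones) * pow(1 - p1, m - ones);
            }
            double perSymbol = huffman_avg_len(w, n) / m;
            printf("m = %d: Huffman %.4f bits/symbol, dH = %.4f\n", m, perSymbol, perSymbol - H);
        }
        return 0;
    }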

5  Fast Huffman decoding step for maximal depth R: let X contain the last R bits (requires a decodingTable of size proportional to L = 2^R):

    t = decodingTable[X];              // X ∈ {0, ..., 2^R − 1} is the current state
    useSymbol(t.symbol);               // use or store the decoded symbol
    X = t.newX + readBits(t.nbBits);   // state transition

where for Huffman t.newX is just the unused bits of the state, shifted to the oldest position:
    t.newX = (X << nbBits) & (2^R − 1)        (mod(a, 2^R) = a & (2^R − 1)).

tANS uses the same decoder with a different t.newX: it not only shifts the unused bits, but also modifies them according to the remaining fractional bits (lg(1/p)): x = X + L ∈ {L, ..., 2L − 1} is a buffer containing lg(x) ∈ [R, R + 1) bits.
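
A runnable sketch of this table-driven decoder for the A/B/C code above (R = 2, so L = 4 states); the bit input is simplified to an array of 0/1 values, and the names follow the slide's pseudocode:

    #include <stdio.h>

    #define R 2                      /* maximal code length -> L = 2^R = 4 states */

    typedef struct { char symbol; int nbBits; int newX; } Entry;

    static const int *g_bits;        /* simplified bit stream: array of 0/1 values */
    static int g_pos, g_len;

    static int readBits(int n) {     /* read n bits, oldest first, into the low positions */
        int v = 0;
        for (int i = 0; i < n; i++) v = (v << 1) | (g_pos < g_len ? g_bits[g_pos++] : 0);
        return v;
    }

    int main(void) {
        /* Huffman code: A -> 0, B -> 10, C -> 11  (p(A)=1/2, p(B)=p(C)=1/4) */
        Entry decodingTable[1 << R];
        for (int X = 0; X < (1 << R); X++) {
            Entry t;
            if ((X >> 1) == 0) { t.symbol = 'A'; t.nbBits = 1; }   /* oldest bit 0 */
            else if (X == 2)   { t.symbol = 'B'; t.nbBits = 2; }   /* bits 10      */
            else               { t.symbol = 'C'; t.nbBits = 2; }   /* bits 11      */
            t.newX = (X << t.nbBits) & ((1 << R) - 1);   /* unused bits to oldest position */
            decodingTable[X] = t;
        }

        /* "ABCAB" encoded: 0 10 11 0 10 */
        static const int bits[] = {0, 1, 0, 1, 1, 0, 1, 0};
        g_bits = bits; g_len = 8; g_pos = 0;

        int X = readBits(R);                        /* state = next R bits of the stream */
        for (int i = 0; i < 5; i++) {
            Entry t = decodingTable[X];
            putchar(t.symbol);                      /* use or store the decoded symbol */
            X = t.newX + readBits(t.nbBits);        /* state transition */
        }
        putchar('\n');                              /* prints ABCAB */
        return 0;
    }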

6  Operating on a fractional number of bits:
    Arithmetic coding: restricting a range of length L to a subrange of length l contains lg(L/l) bits; adding a symbol of probability p, which carries lg(1/p) bits: lg(L/l) + lg(1/p) ≈ lg(L/l') bits for l' ≈ p·l.
    ANS: a number x contains lg(x) bits; adding a symbol of probability p: lg(x) + lg(1/p) = lg(x') for x' ≈ x/p.

7  Asymmetric numeral systems redefine even/odd numbers.

Symbol distribution s: ℕ → A (A is the alphabet, e.g. {0,1}); for the standard base-b numeral system s(x) = mod(x, b) and C(s, x) = bx + s.
The symbols should still be uniformly distributed, but with density p_s:
    #{0 ≤ y < x : s(y) = s} ≈ x·p_s,
then x' becomes the x-th appearance of the given symbol:
    D(x') = (s(x'), #{0 ≤ y < x' : s(y) = s(x')}),
    C(D(x')) = x',   D(C(s, x)) = (s, x),   x' ≈ x/p_s.

Example: range asymmetric binary system (rABS) with Pr(0) = 1/4, Pr(1) = 3/4.
Take the base-4 system and merge the three "1" digits: the cyclic symbol distribution (0123) becomes cyclic (0111): s(x) = 0 if mod(x, 4) = 0, else 1.
To decode or encode, localize the quadruple (⌊x/4⌋ or ⌊x/3⌋):
    if s(x) = 0:  D(x) = (0, ⌊x/4⌋),   else  D(x) = (1, 3⌊x/4⌋ + mod(x, 4) − 1)
    C(0, x) = 4x,   C(1, x) = 4⌊x/3⌋ + mod(x, 3) + 1
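
The rABS example transcribed into C, checking over many states that the coding and decoding functions are mutual inverses:

    #include <assert.h>
    #include <stdio.h>

    /* rABS for Pr(0) = 1/4, Pr(1) = 3/4: base-4 system with the three "1" digits merged,
       so the cyclic symbol distribution (0123) becomes (0111).                          */

    static int  s_of(long x) { return (x % 4) ? 1 : 0; }            /* symbol distribution */
    static long C(int s, long x) {                                   /* coding function     */
        return s ? 4 * (x / 3) + x % 3 + 1 : 4 * x;
    }
    static void D(long x, int *s, long *xs) {                        /* decoding function   */
        *s  = s_of(x);
        *xs = *s ? 3 * (x / 4) + x % 4 - 1 : x / 4;
    }

    int main(void) {
        for (long x = 0; x < 100000; x++) {
            int s; long xs;
            D(x, &s, &xs);
            assert(C(s, xs) == x);                                   /* C(D(x)) = x         */
            D(C(0, x), &s, &xs);  assert(s == 0 && xs == x);         /* D(C(s,x)) = (s,x)   */
            D(C(1, x), &s, &xs);  assert(s == 1 && xs == x);
        }
        int s; long xs;
        D(14, &s, &xs);
        printf("rABS ok; e.g. C(1,10) = %ld and D(14) = (%d, %ld)\n", C(1, 10), s, xs);
        return 0;
    }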

8  rANS: range variant for a large alphabet A = {0, ..., m − 1}.
Assume Pr(s) = f_s / 2^n and c_s := f_0 + f_1 + ... + f_{s−1}.
Start with the base-2^n numeral system and merge ranges of length f_s:
    for x ∈ {0, 1, ..., 2^n − 1},   s(x) = max{s : c_s ≤ x}.
Encoding:   C(s, x) = ⌊x/f_s⌋·2^n + mod(x, f_s) + c_s
Decoding:   s = s(x & (2^n − 1))   (e.g. tabled, or via the alias method)
            D(x) = (s, f_s·(x >> n) + (x & (2^n − 1)) − c_s)

Similar to range coding, but decoding has 1 multiplication (instead of 2) and the state is 1 number (instead of 2), making it convenient for SIMD vectorization. Additionally, we can use the alias method to store s for very precise probabilities; it is also convenient for dynamical updating.
Alias method: rearrange the probability distribution into m buckets, each containing its primary symbol and possibly a single alias symbol.
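
A single-step rANS sketch of these formulas (no renormalization yet; the n = 4 frequency table is just an example). The three-symbol lookup is hardcoded here; for large alphabets the slide suggests a table or the alias method:

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define N_BITS 4                                /* probabilities quantized as f_s / 2^4 */
    #define MASK   ((1u << N_BITS) - 1)

    static const uint64_t f[3] = {2, 5, 9};         /* example frequencies, sum = 2^4 = 16 */
    static uint64_t c[3];                           /* cumulative: c_s = f_0 + ... + f_{s-1} */

    static uint64_t C(int s, uint64_t x) {          /* C(s,x) = floor(x/f_s)*2^n + mod(x,f_s) + c_s */
        return ((x / f[s]) << N_BITS) + (x % f[s]) + c[s];
    }

    static uint64_t D(uint64_t x, int *s) {         /* D(x) = (s, f_s*(x>>n) + (x & mask) - c_s) */
        uint64_t low = x & MASK;
        *s = (low < c[1]) ? 0 : (low < c[2]) ? 1 : 2;     /* s(low) = max{ s : c_s <= low } */
        return f[*s] * (x >> N_BITS) + low - c[*s];
    }

    int main(void) {
        c[0] = 0; c[1] = f[0]; c[2] = f[0] + f[1];

        for (uint64_t x = 0; x < 100000; x++)       /* check D(C(s,x)) = (s,x) */
            for (int s = 0; s < 3; s++) {
                int s2;
                assert(D(C(s, x), &s2) == x && s2 == s);
            }

        /* encode a few symbols into one big number, then decode them back (reverse order) */
        int msg[4] = {2, 1, 0, 2};
        uint64_t x = 1;
        for (int i = 0; i < 4; i++) x = C(msg[i], x);
        printf("state after encoding: %llu\ndecoded (last symbol first):",
               (unsigned long long)x);
        for (int i = 0; i < 4; i++) { int s; x = D(x, &s); printf(" %d", s); }
        printf("\n");
        return 0;
    }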

9  uABS: uniform binary variant (A = {0,1}), extremely accurate.
Assume a binary alphabet, p ≡ Pr(1), and denote x_s = #{y < x : s(y) = s} ≈ x·p_s.
For a uniform symbol distribution we can choose:
    x_1 = ⌈x·p⌉,   x_0 = x − x_1 = x − ⌈x·p⌉,
    s(x) = 1 if there is a jump on the next position:  s = s(x) = ⌈(x+1)·p⌉ − ⌈x·p⌉.
Decoding function: D(x) = (s, x_s). Its inverse coding function:
    C(0, x) = ⌈(x+1)/(1−p)⌉ − 1,   C(1, x) = ⌊x/p⌋.
Example: p = Pr(1) = 1 − Pr(0) = 0.3 (illustrated in the sketch below).
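
A sketch of uABS for p = 3/10 in exact integer arithmetic (so ⌈x·p⌉ = ⌊(3x + 9)/10⌋ etc.), using the closed-form coding functions stated above, and verifying the roundtrip:

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* uABS with p = Pr(1) = 3/10, done in exact integer arithmetic:
       ceil(x*p)  = (3*x + 9) / 10,   floor(x/p) = 10*x / 3,
       ceil(a/b)  = (a + b - 1) / b  for positive integers.          */

    static uint64_t ceil_xp(uint64_t x)  { return (3 * x + 9) / 10; }

    static int s_of(uint64_t x)          { return (int)(ceil_xp(x + 1) - ceil_xp(x)); }

    static uint64_t D(uint64_t x, int *s) {           /* D(x) = (s, x_s) */
        *s = s_of(x);
        return *s ? ceil_xp(x) : x - ceil_xp(x);
    }

    static uint64_t C(int s, uint64_t x) {            /* coding: inverse of D */
        return s ? (10 * x) / 3                       /* C(1,x) = floor(x / p)           */
                 : (10 * (x + 1) + 6) / 7 - 1;        /* C(0,x) = ceil((x+1)/(1-p)) - 1  */
    }

    int main(void) {
        uint64_t ones = 0;
        for (uint64_t x = 0; x < 1000000; x++) {
            int s;
            uint64_t xs = D(x, &s);
            assert(C(s, xs) == x);                    /* C(D(x)) = x */
            if (s) ones++;
        }
        /* the symbol distribution is uniform with density p: about 30% of states carry a 1 */
        printf("fraction of states with s(x) = 1: %.4f (p = 0.3)\n", ones / 1e6);
        return 0;
    }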

10  Stream version: renormalization

So far we encode with successive C functions into a huge number x, then decode (in reverse direction!) with successive D. Like in arithmetic coding, we need renormalization to limit the working precision: enforce x ∈ I = {L, ..., bL − 1} by transferring the base-b youngest digits.

ANS decoding step from state x:
    (s, x) = D(x);  useSymbol(s);
    while (x < L)  x = b·x + readDigit();

Encoding step for symbol s from state x:
    while (x > maxX[s])                  // maxX[s] = b·L_s − 1
        { writeDigit(mod(x, b));  x = ⌊x/b⌋; }
    x = C(s, x);

For unique decoding we need to ensure that there is a single way to perform the above loops:
    I = {L, ..., bL − 1},   I_s = {L_s, ..., bL_s − 1}   where   I_s = {x : C(s, x) ∈ I}.
This is fulfilled e.g. for:
- rABS/rANS when p_s = f_s/2^n has 1/L accuracy: 2^n divides L,
- uABS when p has 1/L accuracy: b⌈Lp⌉ = ⌈bLp⌉,
- the tabled variants (tABS/tANS), where it holds by construction.
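
Putting the pieces together, a minimal stream rANS coder with byte renormalization (b = 256, L = 2^16, probabilities as f_s/2^8; all parameters here are illustrative). The encoder walks the message backward and pushes renormalization bytes; the decoder walks forward and pops them:

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define PROB_BITS 8                       /* Pr(s) = f[s] / 2^8 */
    #define PROB_MASK ((1u << PROB_BITS) - 1)
    #define L (1u << 16)                      /* state x in I = {L, ..., 256*L - 1} */

    enum { M = 3 };                           /* alphabet size */
    static const uint32_t f[M] = {32, 80, 144};          /* frequencies, sum = 256 */
    static uint32_t c[M + 1];                             /* cumulative frequencies */

    int main(void) {
        c[0] = 0;
        for (int s = 0; s < M; s++) c[s + 1] = c[s] + f[s];

        enum { N = 20000 };                               /* illustrative message */
        static uint8_t msg[N], out[N * 2];
        srand(42);
        for (int i = 0; i < N; i++) {
            uint32_t r = rand() & PROB_MASK;              /* sample roughly from f[] */
            msg[i] = (r < c[1]) ? 0 : (r < c[2]) ? 1 : 2;
        }

        /* ---- encode: process symbols in reverse, push renormalization bytes ---- */
        uint32_t x = L;
        int nbytes = 0;
        for (int i = N - 1; i >= 0; i--) {
            int s = msg[i];
            uint32_t x_max = 256u * (L >> PROB_BITS) * f[s];   /* b * L_s, L_s = (L/2^n)*f_s */
            while (x >= x_max) {                               /* renormalize: emit youngest byte */
                out[nbytes++] = (uint8_t)(x & 0xFF);
                x >>= 8;
            }
            x = ((x / f[s]) << PROB_BITS) + (x % f[s]) + c[s]; /* x = C(s, x) */
        }
        printf("%d symbols -> %d bytes + 4-byte final state\n", N, nbytes);

        /* ---- decode: start from the stored final state, pop bytes from the end ---- */
        int pos = nbytes;
        uint32_t y = x;                                        /* final encoder state */
        for (int i = 0; i < N; i++) {
            uint32_t low = y & PROB_MASK;
            int s = (low < c[1]) ? 0 : (low < c[2]) ? 1 : 2;   /* s(low) = max{s : c_s <= low} */
            assert(s == msg[i]);                               /* decoded forward, in order */
            y = f[s] * (y >> PROB_BITS) + low - c[s];          /* (s, y) = D(y) */
            while (y < L) y = (y << 8) | out[--pos];           /* renormalize: read bytes back */
        }
        assert(y == L && pos == 0);                            /* back at the initial state */
        printf("roundtrip OK\n");
        return 0;
    }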

11  Single step of the stream version: to get from x ∈ I into I_s = {L_s, ..., bL_s − 1} we need to transfer k digits:
    x →(s)→ (C(s, ⌊x/b^k⌋), mod(x, b^k))   where   k = ⌊log_b(x/L_s)⌋.
For a given s, k can take only one of two values: k = k_s or k = k_s − 1, where k_s = ⌈log_b(1/p_s)⌉ = ⌈log_b(L/L_s)⌉.
Example: b = 2, k_s = 3, L_s = 13, L = 66, p_s = 13/66.

12  General picture: the encoder prepares (renormalizes) before consuming the next symbol; the decoder produces a symbol, then consumes the next digits.

Decoding runs in the reversed direction, so we effectively have a stack of symbols (LIFO):
- the final state has to be stored, but we can write information into the initial state,
- encoding should be done in the backward direction, but using context from the perspective of the future forward decoding.

13  In a single step (I = {L, ..., bL − 1}):  lg(x) → ≈ lg(x) + lg(1/p)   (modulo lg(b)).

Three sources of unpredictability/chaoticity:
1) Asymmetry: behavior strongly depends on the chosen symbol; a small difference changes the decoded symbol and so the entire further behavior.
2) Ergodicity: usually log_b(1/p) is irrational, so succeeding iterations cover the entire range.
3) Diffusivity: C(s, x) is close to, but not exactly, x/p_s, so there is additional diffusion around the expected value.

Hence lg(x) ∈ [lg(L), lg(L) + lg(b)) has a nearly uniform distribution and x has approximately a Pr(x) ∝ 1/x probability distribution: state x contains lg(1/Pr(x)) ≈ lg(x) + const bits of information.

14  Redundancy ΔH: instead of p_s we effectively use the probabilities q_s = x/C(s, x):
    ΔH ≈ (1/ln 4) Σ_s (p_s − q_s)²/p_s,   i.e.   ΔH ≈ (1/ln 4) Σ_{s,x} Pr(x) (ε_s(x))²/p_s,
where the inaccuracy ε_s(x) = p_s − x/C(s, x) drops like 1/L:
    ΔH ≈ 1/(ln(4)·L²) · (1/p + 1/(1−p))   for uABS,
and a corresponding 1/(ln(4)·L²) sum over the alphabet for rANS.
Denoting L = 2^R:  ΔH ∝ 4^(−R), and it grows proportionally to (alphabet size)².
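
The 1/L² behavior can also be checked directly. The sketch below (illustrative frequencies; b = 2 stream rANS as in the stream version above) power-iterates the coder's state chain to its stationary distribution, computes the exact expected renormalization bits per symbol, and prints ΔH together with 4^R·ΔH, which should settle near a constant as R grows:

    #include <math.h>
    #include <stdio.h>
    #include <string.h>

    /* Exact redundancy of a b = 2 stream rANS coder: build the state automaton on
       I = {L, ..., 2L-1}, power-iterate to its stationary distribution, and compare
       the expected renormalization bits per symbol with the entropy H.              */

    #define N_BITS 4
    enum { M = 3, RMAX = 12 };
    static const unsigned f[M] = {2, 5, 9}, cum[M] = {0, 2, 7};   /* f_s / 2^4, sum = 16 */
    static double pi[1 << RMAX], nxt[1 << RMAX];

    int main(void) {
        double H = 0;
        for (int s = 0; s < M; s++) H += (f[s] / 16.0) * log2(16.0 / f[s]);

        for (int R = 6; R <= RMAX; R += 2) {
            unsigned L = 1u << R;
            for (unsigned i = 0; i < L; i++) pi[i] = 1.0 / L;

            for (int it = 0; it < 3000; it++) {             /* lazy power iteration */
                memset(nxt, 0, L * sizeof(double));
                for (unsigned i = 0; i < L; i++)
                    for (int s = 0; s < M; s++) {
                        unsigned Ls = (L >> N_BITS) * f[s], x = L + i;
                        while (x >= 2 * Ls) x >>= 1;                   /* renormalize */
                        unsigned y = ((x / f[s]) << N_BITS) + x % f[s] + cum[s]; /* C(s,x) */
                        nxt[y - L] += pi[i] * f[s] / 16.0;
                    }
                for (unsigned i = 0; i < L; i++) pi[i] = 0.5 * (pi[i] + nxt[i]);
            }

            double bits = 0;                 /* expected renormalization bits per symbol */
            for (unsigned i = 0; i < L; i++)
                for (int s = 0; s < M; s++) {
                    unsigned Ls = (L >> N_BITS) * f[s], x = L + i;
                    int k = 0;
                    while (x >= 2 * Ls) { x >>= 1; k++; }
                    bits += pi[i] * (f[s] / 16.0) * k;
                }
            double dH = bits - H;
            printf("R = %2d:  dH = %.3e bits/symbol,  4^R * dH = %.3f\n", R, dH, dH * pow(4, R));
        }
        return 0;
    }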

15  Tabled variants: tANS, tABS (choose L = 2^R).
I = {L, ..., 2L − 1},   I_s = {L_s, ..., 2L_s − 1}   where   I_s = {x : C(s, x) ∈ I}.
We choose the symbol distribution (symbol spread) so that #{x ∈ I : s(x) = s} = L_s, approximating the probabilities: p_s ≈ L_s/L.

Fast pseudorandom spread:

    step = 5/8 L + 3;                       // step chosen to cover the whole range
    X = 0;                                  // X = x − L ∈ {0, ..., L − 1} for table handling
    for s = 0 to m − 1 do
        for i = 1 to L_s do
            { symbol[X] = s;  X = mod(X + step, L); }      // symbol[X] = s(X + L)

Then filling the decoding table for the { t = decodingTable[X]; useSymbol(t.symbol); X = t.newX + readBits(t.nbBits) } decoding step:

    for s = 0 to m − 1 do next[s] = L_s;    // next appearance number of each symbol
    for X = 0 to L − 1 do                   // fill all positions
        { t.symbol = symbol[X];             // from the symbol spread
          x = next[t.symbol]++;             // use the symbol and advance its appearance
          t.nbBits = R − ⌊lg(x)⌋;           // L = 2^R
          t.newX = (x << t.nbBits) − L;
          decodingTable[X] = t; }
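
A runnable version of this construction (pseudorandom spread plus decoding-table fill) for a small illustrative case: L = 16 states, alphabet {a, b, c} with (L_a, L_b, L_c) = (2, 5, 9):

    #include <stdio.h>

    #define R 4
    #define L (1 << R)                        /* number of states */
    enum { M = 3 };

    typedef struct { int symbol, nbBits, newX; } Entry;

    static int floor_lg(int x) { int r = 0; while (x >>= 1) r++; return r; }

    int main(void) {
        int Ls[M] = {2, 5, 9};                /* L_s, sum = L; p_s ~ L_s / L */
        int symbol[L];                        /* symbol spread s(x + L) */
        Entry decodingTable[L];

        /* fast pseudorandom spread */
        int step = (5 * L) / 8 + 3;           /* step chosen to cover the whole range */
        int X = 0;
        for (int s = 0; s < M; s++)
            for (int i = 0; i < Ls[s]; i++) { symbol[X] = s; X = (X + step) % L; }

        /* fill the decoding table */
        int next[M];
        for (int s = 0; s < M; s++) next[s] = Ls[s];      /* next appearance number */
        for (X = 0; X < L; X++) {
            Entry t;
            t.symbol = symbol[X];
            int x = next[t.symbol]++;                     /* x in {L_s, ..., 2 L_s - 1} */
            t.nbBits = R - floor_lg(x);
            t.newX = (x << t.nbBits) - L;
            decodingTable[X] = t;
        }

        /* the decoder is then:  t = decodingTable[X]; useSymbol(t.symbol);
                                 X = t.newX + readBits(t.nbBits);            */
        for (X = 0; X < L; X++)
            printf("X=%2d  symbol=%c  nbBits=%d  newX=%2d\n",
                   X, "abc"[decodingTable[X].symbol],
                   decodingTable[X].nbBits, decodingTable[X].newX);
        return 0;
    }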

16  Precise initialization (heuristic)
The values N_s = {(0.5 + i)/p_s : i = 0, ..., L_s − 1} are uniformly spread; we need to shift them to natural numbers. Using a priority queue with put and getMin:

    for s = 0 to m − 1 do put((0.5/p_s, s));
    for X = 0 to L − 1 do
        { (v, s) = getMin;  put((v + 1/p_s, s));  symbol[X] = s; }

ΔH then drops like 1/L² (plot over L for different alphabet sizes).
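
The same spread computed with the precise initialization; for a small example a linear scan for the minimum stands in for the priority queue (an illustrative simplification), and the comparison is done in exact integer arithmetic:

    #include <stdio.h>

    #define R 4
    #define L (1 << R)                        /* number of states */
    enum { M = 3 };

    int main(void) {
        int Ls[M] = {2, 5, 9};                /* p_s ~ L_s / L, sum = L */
        int cnt[M] = {0};                     /* how many positions symbol s already received */
        int symbol[L];

        /* put symbol s at the position where its next value (0.5 + cnt_s)/p_s is minimal;
           (0.5 + cnt_s)/p_s is proportional to (1 + 2*cnt_s)/L_s, so we can compare exactly
           by cross-multiplying integers (this linear scan stands in for the priority queue). */
        for (int X = 0; X < L; X++) {
            int s = 0;
            for (int t = 1; t < M; t++)
                if ((1 + 2 * cnt[t]) * Ls[s] < (1 + 2 * cnt[s]) * Ls[t]) s = t;
            symbol[X] = s;
            cnt[s]++;
        }

        for (int X = 0; X < L; X++) putchar("abc"[symbol[X]]);
        putchar('\n');                        /* prints cbcacbcbccbcacbc for this example */
        return 0;
    }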

17  tABS and tuning
For a binary alphabet we can test all possible symbol distributions and store tables for quantized probabilities p, e.g. 256 bytes for 16 states.
Compared with the H.264 M decoder (arithmetic coding), the tABS decoding step is

    t = decodingTable[p][X];
    X = t.newX + readBits(t.nbBits);
    useSymbol(t.symbol);

no branches, no bit-by-bit renormalization, and the state is a single number (e.g. for SIMD).

18  Additional tANS advantage: simultaneous encryption.
We can use the huge freedom while initializing (choosing the symbol distribution): slightly disturb s(x) using a PRNG initialized with a cryptographic key.

Advantages compared to standard (symmetric) cryptography (standard, e.g. DES/AES, vs. ANS-based, initialized with the key):
- based on XOR and permutations  vs.  highly nonlinear operations,
- fixed-length bit blocks  vs.  pseudorandomly varying lengths,
- brute-force or QC attacks can just start decoding to test a cryptokey  vs.  initialization must be performed first for each new cryptokey, and can be fixed to need e.g. 0.1 s,
- speed: online calculation  vs.  most calculations done during initialization,
- entropy: operates on bits  vs.  operates on any input distribution.

19  Summary: we can construct accurate and extremely fast entropy coders:
- an accurate (and faster) replacement for Huffman coding,
- a many times faster replacement for Range coding,
- faster decoding in the adaptive binary case (but more complex encoding),
- can simultaneously encrypt the message.
Perfect for: DCT/wavelet coefficients, lossless image compression. Perfect with: Lempel-Ziv, Burrows-Wheeler Transform... and many others.

Further research:
- finding the symbol distribution s(x) with minimal ΔH (and quickly),
- tuning according to the p_s ≈ L_s/L approximation (e.g. very-low-probability symbols should have a single appearance at the end of the table),
- optimal compression of the used probability distribution, to reduce headers,
- maybe finding other low-state entropy coding families, e.g. forward-decoded,
- applying the cryptographic capabilities (also without entropy coding),
- ...?
