ASYMMETRIC NUMERAL SYSTEMS: ADDING FRACTIONAL BITS TO HUFFMAN CODER

ASYMMETRIC NUMERAL SYSTEMS: ADDING FRACTIONAL BITS TO HUFFMAN CODER

Huffman coding: fast, but operates on an integer number of bits per symbol - it approximates probabilities with powers of 1/2, giving an inferior compression rate.
Arithmetic coding: accurate, but many times slower (computationally more demanding).
Asymmetric Numeral Systems (ANS): accurate and faster than Huffman - we construct a low-state automaton optimized for a given probability distribution.

Jarek Duda, Kraków, 28-04-2014

We need n bits of information to choose one of 2^n possibilities.
For length-n 0/1 sequences with pn ones, how many bits do we need to choose one of them?

Entropy coder: encodes a sequence with probability distribution (p_i)_{i=1..m} using asymptotically at least
    H = Σ_{i=1..m} p_i lg(1/p_i)   bits/symbol   (H ≤ lg(m)).
Seen as a weighted average: a symbol of probability p contains lg(1/p) bits.

Encoding a (p_i) distribution with an entropy coder optimal for a (q_i) distribution costs
    ΔH = Σ_i p_i lg(1/q_i) − Σ_i p_i lg(1/p_i) = Σ_i p_i lg(p_i/q_i) ≈ (1/ln 4) Σ_i (p_i − q_i)²/p_i
more bits/symbol - the so-called Kullback-Leibler distance (not symmetric).
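
As a quick numeric check of these formulas, a minimal C++ sketch computing H and the Kullback-Leibler cost ΔH; the two distributions are made-up examples, not taken from the slides:

    // Sketch: entropy H and Kullback-Leibler redundancy dH for toy distributions.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> p = {0.50, 0.25, 0.25};   // true distribution
        std::vector<double> q = {0.50, 0.30, 0.20};   // distribution assumed by the coder
        double H = 0, dH = 0;
        for (size_t i = 0; i < p.size(); ++i) {
            H  += p[i] * std::log2(1.0 / p[i]);       // H = sum p_i lg(1/p_i)
            dH += p[i] * std::log2(p[i] / q[i]);      // Kullback-Leibler cost in bits/symbol
        }
        std::printf("H = %.4f bits/symbol, dH = %.4f bits/symbol\n", H, dH);
        return 0;
    }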

Symbol frequencies from a version of the British National Corpus containing 67 million words (http://www.yorku.ca/mack/uist2011.html#f1):

Uniform: lg(27) ≈ 4.75 bits/symbol (x → 27x + s)
Huffman uses: H' = Σ_i p_i r_i ≈ 4.12 bits/symbol
Shannon: H = Σ_i p_i lg(1/p_i) ≈ 4.08 bits/symbol
We can theoretically improve by about 1% here: ΔH ≈ 0.04 bits/symbol.

Order-1 Markov: ~3.3 bits/symbol; order-2: ~3.1 bits/symbol; word-level: ~2.1 bits/symbol.
Currently the best text compression: durilca kingsize (http://mattmahoney.net/dc/text.html) compresses 10^9 bytes of text from Wikipedia (enwik9) into 127,784,888 bytes: ~1 bit/symbol.
(Lossy) video compression: ~1000x reduction.

Huffman codes encode symbols as bit sequences.
Perfect if symbol probabilities are powers of 2, e.g. p(A) = 1/2, p(B) = 1/4, p(C) = 1/4:
    A → 0,  B → 10,  C → 11

We can reduce ΔH by grouping m symbols together (alphabet size 2^m or 3^m).
Generally, the maximal depth (R = max_i r_i) grows proportionally to m; ΔH drops approximately proportionally to 1/m, so ΔH ∝ 1/R (= 1/lg(L)).
For ANS: ΔH ∝ 4^(−R) (= 1/L², where L = 2^R is the number of states).

Fast Huffman decoding step for maximal depth R: let X contain the last R bits (requires a decodingTable of size proportional to L = 2^R):

    t = decodingTable[X];              // X ∈ {0, ..., 2^R − 1} is the current state
    useSymbol(t.symbol);               // use or store the decoded symbol
    X = t.newX + readBits(t.nbBits);   // state transition

where t.newX for Huffman are the unused bits of the state, shifted to the oldest position:
    t.newX = (X << nbBits) & (2^R − 1)        (mod(a, 2^R) = a & (2^R − 1))

tANS: the same decoder, but a different t.newX: it not only shifts the unused bits, but also modifies them according to the remaining fractional bits (lg(1/p)):
    x = X + L ∈ {L, ..., 2L − 1} is a buffer containing lg(x) ∈ [R, R + 1) bits.
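
A minimal self-contained sketch of this table-driven decoding step, with the table filled by hand for the Huffman code from the previous slide (A → 0, B → 10, C → 11, so R = 2); the bitstream, struct layout and helper names are illustrative assumptions:

    // Sketch: table-driven Huffman decoding step (same loop body as tANS).
    #include <cstdint>
    #include <cstdio>

    struct Entry { char symbol; uint8_t nbBits; uint8_t newX; };

    struct BitReader {
        const uint8_t* data; unsigned bitPos = 0;
        unsigned readBits(unsigned n) {                       // MSB-first bit reader
            unsigned v = 0;
            for (unsigned i = 0; i < n; ++i, ++bitPos)
                v = (v << 1) | ((data[bitPos >> 3] >> (7 - (bitPos & 7))) & 1);
            return v;
        }
    };

    int main() {
        // decodingTable[X] for X = 00, 01, 10, 11 (next R bits of the stream)
        const Entry decodingTable[4] = {
            {'A', 1, 0}, {'A', 1, 2},                         // codes starting with 0 -> A
            {'B', 2, 0}, {'C', 2, 0}                          // 10 -> B, 11 -> C
        };
        const uint8_t stream[] = {0x58};                      // bits 0 10 11 0 .. = "ABCA"
        BitReader in{stream};
        unsigned X = in.readBits(2);                          // state = next R bits
        for (int i = 0; i < 4; ++i) {
            const Entry& t = decodingTable[X];                // table lookup
            std::printf("%c", t.symbol);                      // use decoded symbol
            X = t.newX + in.readBits(t.nbBits);               // state transition
        }
        std::printf("\n");
        return 0;
    }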

Operating on a fractional number of bits:

- restricting a range of size L to a subrange of length l (as in arithmetic coding): it contains lg(L/l) bits; adding a symbol of probability p, containing lg(1/p) bits:
      lg(L/l) + lg(1/p) ≈ lg(L/l')  bits,  for  l' ≈ p·l
- a natural number x (as in ANS): it contains lg(x) bits; adding a symbol of probability p, containing lg(1/p) bits:
      lg(x) + lg(1/p) = lg(x')  for  x' ≈ x/p

Asymmetric numeral systems - redefine even/odd numbers:

symbol distribution s: N → A   (A - alphabet, e.g. {0,1})
(for a base-b numeral system: s(x) = mod(x, b), C(s, x) = bx + s)

Symbols should still be uniformly distributed, but with density p_s:
    #{0 ≤ y < x: s(y) = s} ≈ x·p_s
then x' becomes the x-th appearance of the given symbol:
    D(x') = (s(x'), #{0 ≤ y < x': s(y) = s(x')})
    C(D(x')) = x',   D(C(s, x)) = (s, x),   x' ≈ x/p_s

Example: range asymmetric binary systems (rABS), Pr(0) = 1/4, Pr(1) = 3/4 - take the base-4 system and merge the 3 digits for symbol 1: the cyclic digit pattern (0123) becomes the cyclic symbol pattern (0111):
    s(x) = 0 if mod(x, 4) = 0, else 1
To decode or encode, localize the quadruple (x/4 or x/3):
    if s(x) = 0:  D(x) = (0, x/4)      else:  D(x) = (1, 3⌊x/4⌋ + mod(x, 4) − 1)
    C(0, x) = 4x                        C(1, x) = 4⌊x/3⌋ + mod(x, 3) + 1
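
A minimal sketch of this rABS example on an unbounded (here 64-bit) state, without stream renormalization; the test message and initial state are arbitrary choices:

    // Sketch of the rABS example: Pr(0) = 1/4, Pr(1) = 3/4, single huge state x.
    #include <cassert>
    #include <cstdint>
    #include <cstdio>
    #include <utility>

    static uint64_t C(int s, uint64_t x) {             // coding: x-th appearance of s -> state
        return s == 0 ? 4 * x : 4 * (x / 3) + x % 3 + 1;
    }
    static std::pair<int, uint64_t> D(uint64_t x) {    // decoding: state -> (symbol, new x)
        if (x % 4 == 0) return {0, x / 4};
        return {1, 3 * (x / 4) + x % 4 - 1};
    }

    int main() {
        const int msg[] = {1, 1, 0, 1, 1, 1, 0, 1};    // made-up test message
        uint64_t x = 1;                                 // initial state
        for (int s : msg) x = C(s, x);                  // encode forward
        std::printf("final state: %llu\n", (unsigned long long)x);
        for (int i = 7; i >= 0; --i) {                  // decode in reverse (LIFO)
            auto [s, nx] = D(x);
            assert(s == msg[i]);
            x = nx;
        }
        std::printf("decoded OK, back to the initial state\n");
        return 0;
    }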

rANS - range variant for a large alphabet A = {0, ..., m − 1}:

assume Pr(s) = f_s / 2^n,   c_s := f_0 + f_1 + ... + f_{s−1}
start with the base-2^n numeral system and merge ranges of length f_s:
    for x ∈ {0, 1, ..., 2^n − 1}:  s(x) = max{s: c_s ≤ x}
encoding:  C(s, x) = (⌊x/f_s⌋ << n) + mod(x, f_s) + c_s
decoding:  s = s(x & (2^n − 1))   (e.g. tabled, or via the alias method)
           D(x) = (s, f_s·(x >> n) + (x & (2^n − 1)) − c_s)

Similar to Range Coding, but decoding uses 1 multiplication (instead of 2), and the state is 1 number (instead of 2), making it convenient for SIMD vectorization (https://github.com/rygorous/ryg_rans).
Additionally, we can use the alias method to store s(x) for very precise probabilities; it is also convenient for dynamical updating.
Alias method: rearrange the probability distribution into m buckets, each containing its primary symbol and possibly a single alias symbol.
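
A sketch of the rANS coding/decoding functions above, again on a huge state with no renormalization; the frequency table (n = 4, frequencies summing to 16) and the message are made-up examples:

    // Sketch of rANS encode/decode steps C and D (no renormalization).
    #include <cassert>
    #include <cstdint>
    #include <cstdio>
    #include <utility>

    static const unsigned n = 4;                    // probability precision: Pr(s) = f[s] / 2^n
    static const uint32_t f[3] = {8, 6, 2};         // example frequencies
    static const uint32_t c[3] = {0, 8, 14};        // cumulative frequencies c_s

    static int symbolOf(uint32_t low) {             // s(x) = max{s: c_s <= x} for x in [0, 2^n)
        return low < c[1] ? 0 : (low < c[2] ? 1 : 2);
    }
    static uint64_t C(int s, uint64_t x) {          // encoding step
        return ((x / f[s]) << n) + (x % f[s]) + c[s];
    }
    static std::pair<int, uint64_t> D(uint64_t x) { // decoding step
        uint32_t low = x & ((1u << n) - 1);
        int s = symbolOf(low);
        return {s, f[s] * (x >> n) + low - c[s]};
    }

    int main() {
        const int msg[] = {0, 1, 0, 2, 1, 0};        // made-up test message
        uint64_t x = 1;
        for (int s : msg) x = C(s, x);               // encode forward
        for (int i = 5; i >= 0; --i) {               // decode in reverse
            auto [s, nx] = D(x);
            assert(s == msg[i]);
            x = nx;
        }
        std::printf("rANS round-trip OK\n");
        return 0;
    }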

uABS - uniform binary variant (A = {0,1}) - extremely accurate

Assume a binary alphabet, p := Pr(1); denote x_s = #{y < x: s(y) = s} ≈ x·p_s.
For a uniform symbol distribution we can choose:
    x_1 = ⌈x·p⌉,   x_0 = x − x_1 = x − ⌈x·p⌉
    s(x) = 1 if there is a jump on the next position:  s = s(x) = ⌈(x + 1)·p⌉ − ⌈x·p⌉
decoding function:  D(x) = (s, x_s)
its inverse coding function:  C(0, x) = ⌈(x + 1)/(1 − p)⌉ − 1,   C(1, x) = ⌊x/p⌋

Example: p = Pr(1) = 1 − Pr(0) = 0.3.
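
A sketch of uABS for the p = 0.3 example; p is kept as the exact fraction 3/10 so that the ceilings and floors are computed in integer arithmetic (avoiding floating-point rounding), and the test message is arbitrary:

    // Sketch of uABS with p = Pr(1) = 3/10, exact integer arithmetic.
    #include <cassert>
    #include <cstdint>
    #include <cstdio>
    #include <utility>

    static const uint64_t pNum = 3, pDen = 10;                 // p = pNum/pDen = 0.3

    static uint64_t ceilMul(uint64_t x) { return (x * pNum + pDen - 1) / pDen; }  // ceil(x*p)

    static std::pair<int, uint64_t> D(uint64_t x) {            // decoding step
        int s = (int)(ceilMul(x + 1) - ceilMul(x));            // s(x) = ceil((x+1)p) - ceil(xp)
        uint64_t xs = s ? ceilMul(x) : x - ceilMul(x);         // x_1 = ceil(xp), x_0 = x - x_1
        return {s, xs};
    }
    static uint64_t C(int s, uint64_t x) {                     // coding step, inverse of D
        if (s == 1) return (x * pDen) / pNum;                  // floor(x / p)
        return ((x + 1) * pDen + (pDen - pNum) - 1) / (pDen - pNum) - 1;  // ceil((x+1)/(1-p)) - 1
    }

    int main() {
        const int msg[] = {1, 0, 0, 1, 0, 0, 0, 1};            // made-up message, ~30% ones
        uint64_t x = 1;
        for (int s : msg) x = C(s, x);
        for (int i = 7; i >= 0; --i) { auto [s, nx] = D(x); assert(s == msg[i]); x = nx; }
        std::printf("uABS round-trip OK\n");
        return 0;
    }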

Stream version - renormalization

Currently: we encode using succeeding C functions into a huge number x, then decode (in the reverse direction!) using succeeding D.
Like in arithmetic coding, we need renormalization to limit the working precision - enforce x ∈ I = {L, ..., bL − 1} by transferring the base-b youngest digits:

ANS decoding step from state x:
    (s, x) = D(x);  useSymbol(s);
    while x < L:  x = b·x + readDigit();

encoding step for symbol s from state x:
    while x > maxX[s]:                 // maxX[s] = b·L_s − 1
        { writeDigit(mod(x, b));  x = ⌊x/b⌋; }
    x = C(s, x);

For unique decoding, we need to ensure that there is a single way to perform the above loops:
    I = {L, ..., bL − 1},   I_s = {L_s, ..., b·L_s − 1}   where I_s = {x: C(s, x) ∈ I}
Fulfilled e.g. for:
- rABS/rANS when p_s = f_s/2^n has 1/L accuracy: 2^n divides L,
- uABS when p has 1/L accuracy: b⌈Lp⌉ = ⌈bLp⌉,
- the tabled variants (tABS/tANS), where it will be fulfilled by construction.
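
A compact sketch of this stream version for the rANS variant with b = 2 (bit-by-bit renormalization); the frequencies, L and the message are illustrative choices, and the emitted digits are kept on a stack to reflect the LIFO decoding order discussed on the following slides:

    // Sketch: streaming rANS with b = 2, state kept in I = {L, ..., 2L-1}.
    #include <cassert>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    static const unsigned n = 4;
    static const uint32_t f[3] = {8, 6, 2}, c[3] = {0, 8, 14};   // sum f = 2^n
    static const uint32_t L = 1u << 16;                          // 2^n divides L

    static int symbolOf(uint32_t low) { return low < c[1] ? 0 : (low < c[2] ? 1 : 2); }

    int main() {
        const std::vector<int> msg = {0, 1, 0, 0, 2, 1, 0, 1, 0, 0};    // test message
        std::vector<int> bits;                                           // emitted digits (b = 2)

        // encoding (forward over the message)
        uint32_t x = L;                                                  // initial state in I
        for (int s : msg) {
            uint32_t Ls = (L >> n) * f[s];                               // I_s = {Ls, ..., 2Ls-1}
            while (x > 2 * Ls - 1) { bits.push_back(x & 1); x >>= 1; }   // renormalize down
            x = ((x / f[s]) << n) + (x % f[s]) + c[s];                   // C(s, x)
        }

        // decoding (reverse: the last encoded symbol comes out first)
        std::vector<int> out;
        for (size_t i = 0; i < msg.size(); ++i) {
            uint32_t low = x & ((1u << n) - 1);
            int s = symbolOf(low);
            x = f[s] * (x >> n) + low - c[s];                            // D(x)
            out.push_back(s);
            while (x < L) { x = 2 * x + bits.back(); bits.pop_back(); }  // renormalize up
        }
        for (size_t i = 0; i < msg.size(); ++i) assert(out[i] == msg[msg.size() - 1 - i]);
        std::printf("streaming rANS round-trip OK\n");
        return 0;
    }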

Single step of the stream version: to get from x ∈ I to I_s = {L_s, ..., b·L_s − 1}, we need to transfer k digits:
    x  →  (C(s, ⌊x/b^k⌋), mod(x, b^k))   where   k = ⌊log_b(x/L_s)⌋
For a given s, observe that k can take only one of two values: k = k_s or k = k_s − 1, for
    k_s = ⌈log_b(1/p_s)⌉ = ⌈log_b(L/L_s)⌉
e.g. b = 2, k_s = 3, L_s = 13, L = 66, p_s = 13/66: for x = 115 = 2³·14 + 3 we transfer k = 3 digits, leaving ⌊x/2³⌋ = 14 ∈ I_s.
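
A quick numeric check of this example (b = 2, L = 66, L_s = 13), confirming that k takes only two values over I and that x = 115 transfers 3 digits:

    // Count transferred digits k for every x in I = {66, ..., 131}, I_s = {13, ..., 25}.
    #include <cstdio>

    int main() {
        const unsigned L = 66, Ls = 13;
        unsigned kMin = 99, kMax = 0;
        for (unsigned x = L; x <= 2 * L - 1; ++x) {
            unsigned k = 0, y = x;
            while (y > 2 * Ls - 1) { y >>= 1; ++k; }       // transfer youngest base-2 digits
            if (k < kMin) kMin = k;
            if (k > kMax) kMax = k;
            if (x == 115) std::printf("x = 115 -> transfer k = %u digits, leaving %u\n", k, y);
        }
        std::printf("k ranges over {%u, %u}\n", kMin, kMax);  // expected: {2, 3}
        return 0;
    }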

General picture:
- the encoder prepares (transfers digits) before consuming the succeeding symbol,
- the decoder produces a symbol, then consumes succeeding digits.

Decoding is in the reversed direction - we have a stack of symbols (LIFO):
- the final state has to be stored, but we can write information into the initial state,
- encoding should be made in the backward direction, but use context from the perspective of the future forward decoding.

In a single step (I = {L, ..., bL − 1}):  lg(x) → lg(x) + lg(1/p)

Three sources of unpredictability/chaoticity:
1) Asymmetry: behavior strongly dependent on the chosen symbol - a small difference changes the decoded symbol and so the entire further behavior.
2) Ergodicity: usually log_b(1/p) is irrational - succeeding iterations cover the entire range.
3) Diffusivity: C(s, x) is close to, but not exactly, x/p_s - there is additional diffusion around the expected value.

So lg(x), taken modulo lg(b), i.e. lg(x) ∈ [lg(L), lg(L) + lg(b)), has a nearly uniform distribution; hence x approximately follows the Pr(x) ∝ 1/x probability distribution and contains lg(1/Pr(x)) ≈ lg(x) + const bits of information.

ΔH - redundancy: instead of p_s we effectively use the q_s = x/C(s, x) probabilities:
    ΔH ≈ (1/ln 4) Σ_s (p_s − q_s)²/p_s
    ΔH ≈ (1/ln 4) Σ_{s,x} Pr(x) (ε_s(x))²/p_s
where the inaccuracy ε_s(x) = p_s − x/C(s, x) drops like 1/L, so ΔH drops like 1/L²:
    ΔH ≈ 1/(ln(4)·L²) · (1/p + 1/(1 − p))   for uABS,
and an analogous 1/(ln(4)·L²) bound holds for rANS.
Denoting L = 2^R: ΔH ∝ 4^(−R), and it grows proportionally to (alphabet size)².

Tabled variants: tANS, tABS (choose L = 2^R)
    I = {L, ..., 2L − 1},   I_s = {L_s, ..., 2L_s − 1}   where I_s = {x: C(s, x) ∈ I}
We choose the symbol distribution s(x) (the symbol spread): #{x ∈ I: s(x) = s} = L_s, approximating the probabilities: p_s ≈ L_s/L.

Fast pseudorandom spread (https://github.com/cyan4973/finitestateentropy):

    step = (5/8)·L + 3;                        // step chosen to cover the whole range
    X = 0;                                     // X = x − L ∈ {0, ..., L − 1} for table handling
    for s = 0 to m − 1 do
        for i = 1 to L_s do
            { symbol[X] = s;  X = mod(X + step, L); }     // symbol[X] = s(X + L)

Then finding the table for the decoder { t = decodingTable[X]; useSymbol(t.symbol); X = t.newX + readBits(t.nbBits); }:

    for s = 0 to m − 1 do next[s] = L_s;       // next appearance number of symbol s
    for X = 0 to L − 1 do                      // fill all positions
        { t.symbol = symbol[X];                // from the symbol spread
          x = next[t.symbol]++;                // use symbol and shift its appearance
          t.nbBits = R − ⌊lg(x)⌋;              // L = 2^R
          t.newX = (x << t.nbBits) − L;
          decodingTable[X] = t; }
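
A runnable sketch of this construction: it performs the fast pseudorandom spread and fills the tANS decoding table; L, R and the counts L_s are illustrative (they must sum to L), and the assert checks that read bits keep the next state inside the table:

    // Sketch: build the tANS decoding table with the fast pseudorandom spread.
    #include <cassert>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct Entry { uint8_t symbol; uint8_t nbBits; uint32_t newX; };

    int main() {
        const unsigned R = 4, L = 1u << R;          // 16 states
        const std::vector<uint32_t> Ls = {8, 5, 3}; // L_s per symbol, sum = L (p_s ~ Ls/L)
        const unsigned m = Ls.size();

        // fast pseudorandom symbol spread
        std::vector<uint8_t> symbol(L);
        uint32_t step = 5 * L / 8 + 3, X = 0;
        for (unsigned s = 0; s < m; ++s)
            for (uint32_t i = 0; i < Ls[s]; ++i) { symbol[X] = (uint8_t)s; X = (X + step) % L; }

        // fill the decoding table
        std::vector<Entry> decodingTable(L);
        std::vector<uint32_t> next(Ls);             // next appearance number, starts at L_s
        for (X = 0; X < L; ++X) {
            Entry t;
            t.symbol = symbol[X];
            uint32_t x = next[t.symbol]++;          // x runs over I_s = {L_s, ..., 2*L_s - 1}
            unsigned lgx = 0;                        // floor(lg(x))
            while ((x >> (lgx + 1)) != 0) ++lgx;
            t.nbBits = (uint8_t)(R - lgx);
            t.newX = (x << t.nbBits) - L;           // next state base (before adding read bits)
            assert(t.newX + (1u << t.nbBits) <= L); // read bits keep the state inside the table
            decodingTable[X] = t;
        }
        for (X = 0; X < L; ++X)
            std::printf("X=%2u: s=%u nbBits=%u newX=%2u\n", (unsigned)X,
                        (unsigned)decodingTable[X].symbol, (unsigned)decodingTable[X].nbBits,
                        (unsigned)decodingTable[X].newX);
        return 0;
    }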

Precise initialization (heuristic):
The values N_s = {(0.5 + i)/p_s : i = 0, ..., L_s − 1} are uniformly spread - we need to shift them to natural numbers (priority queue with put, getMin by value):

    for s = 0 to m − 1 do put((0.5/p_s, s));
    for X = 0 to L − 1 do
        { (v, s) = getMin;  put((v + 1/p_s, s));  symbol[X] = s; }

ΔH drops approximately like 1/L² (as a function of L / alphabet size).
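
A sketch of this precise-initialization spread using a min-heap; L and the probabilities are made-up example values:

    // Sketch: "precise initialization" symbol spread via a min-heap.
    #include <cstdio>
    #include <queue>
    #include <vector>

    int main() {
        const unsigned L = 16;
        const std::vector<double> p = {0.50, 0.30, 0.20};    // p_s, should sum to 1

        // min-heap of (value, symbol), ordered by value
        using Item = std::pair<double, int>;
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> q;
        for (int s = 0; s < (int)p.size(); ++s) q.push({0.5 / p[s], s});

        std::vector<int> symbol(L);
        for (unsigned X = 0; X < L; ++X) {
            auto [v, s] = q.top(); q.pop();
            q.push({v + 1.0 / p[s], s});                      // next target position for s
            symbol[X] = s;                                    // spread: symbol[X] = s(X + L)
        }
        for (unsigned X = 0; X < L; ++X) std::printf("%d ", symbol[X]);
        std::printf("\n");                                    // symbol counts approximate L*p_s
        return 0;
    }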

tABS and tuning:
- test all possible symbol distributions for the binary alphabet,
- store tables for quantized probabilities p: e.g. 16 × 16 = 256 bytes for 16 states, ΔH ≈ 0.001.

Compared with the H.264 M decoder (arithmetic coding), the tABS decoding step is:

    t = decodingTable[p][X];
    X = t.newX + readBits(t.nbBits);
    useSymbol(t.symbol);

no branches, no bit-by-bit renormalization; the state is a single number (e.g. SIMD-friendly).

Additional tANS advantage - simultaneous encryption.
We can use the huge freedom in the initialization (choosing the symbol distribution): slightly disturb s(x) using a PRNG initialized with a cryptographic key.

ADVANTAGES compared to standard (symmetric) cryptography (standard, e.g. DES, AES, vs. ANS-based cryptography, initialized):
- based on: XOR and permutations vs. highly nonlinear operations,
- bit blocks: fixed length vs. pseudorandomly varying lengths,
- brute force or QC attacks: just start decoding to test a cryptokey vs. having to perform initialization first for each new cryptokey, which can be fixed to need e.g. 0.1 s,
- speed: online calculation vs. most calculations done during initialization,
- entropy: operates on bits vs. operates on any input distribution.

Summary - we can construct accurate and extremely fast entropy coders:
- an accurate (and faster) replacement for Huffman coding,
- a many times faster replacement for Range coding,
- faster decoding in the adaptive binary case (but more complex encoding),
- can simultaneously encrypt the message.

Perfect for: DCT/wavelet coefficients, lossless image compression.
Perfect with: Lempel-Ziv, Burrows-Wheeler Transform... and many others...

Further research:
- finding the symbol distribution s(x) with minimal ΔH (and quickly),
- tuning according to the p_s ≈ L_s/L approximation (e.g. very low-probability symbols should have a single appearance at the end of the table),
- optimal compression of the used probability distribution to reduce headers,
- maybe finding other low-state entropy coding families, like forward-decoded ones,
- applying the cryptographic capabilities (also without entropy coding),
- ...?