Coding for Discrete Sources


EGR 544 Communication Theory
3. Coding for Discrete Sources
Z. Aliyazicioglu
Electrical and Computer Engineering Department, Cal Poly Pomona

Coding for Discrete Sources

Source coding represents source data efficiently in digital form for transmission or storage. A measure of the efficiency of a source-encoding method is obtained by comparing the average number of binary digits per output letter from the source to the entropy H(X).

Two types of source coding:
Lossless (Huffman coding algorithm, Lempel-Ziv algorithm, ...)
Lossy (rate-distortion, quantization, waveform coding, ...)

[Block diagram: source output X -> source encoding -> bits -> channel transmission -> bits -> source decoding -> reconstruction X̃; for lossless coding X̃ = X.]

Coding for a Discrete Memoryless Source (DMS)

A DMS produces an output letter every τ_s seconds. The source has a finite alphabet of L symbols x_i, i = 1, ..., L, with probabilities P(x_i). The entropy of the DMS in bits per source symbol is

H(X) = -Σ_{i=1}^{L} P(x_i) log_2 P(x_i) ≤ log_2 L

If the symbols are equally probable,

H(X) = -Σ_{i=1}^{L} (1/L) log_2 (1/L) = log_2 L

The source rate in bits/s is H(X)/τ_s.

Fixed-length code words

Let's assign a unique set of R binary digits to each symbol. Since there are L possible symbols, the code rate in bits per symbol is

R = log_2 L            when L is a power of 2
R = ⌊log_2 L⌋ + 1      when L is not a power of 2

where ⌊log_2 L⌋ denotes the largest integer less than log_2 L. Since H(X) ≤ log_2 L, we have R ≥ H(X). The ratio H(X)/R shows the efficiency of the encoding for the DMS.
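As a quick numerical check of these definitions, the short Python sketch below computes H(X), the fixed-length rate R = ⌈log_2 L⌉, and the resulting efficiency H(X)/R for an assumed example alphabet (the five probabilities are illustrative, not from the slides).

import math

def entropy(probs):
    # Entropy in bits per source symbol: H(X) = -sum p*log2(p)
    return -sum(p * math.log2(p) for p in probs if p > 0)

def fixed_length_rate(L):
    # Bits per symbol for a fixed-length code over L symbols: ceil(log2 L)
    return math.ceil(math.log2(L))

# Assumed example: a 5-symbol source (L = 5 is not a power of 2)
probs = [0.4, 0.3, 0.15, 0.1, 0.05]
H = entropy(probs)
R = fixed_length_rate(len(probs))
print(f"H(X) = {H:.3f} bits/symbol, R = {R} bits/symbol, efficiency = {H / R:.3f}")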

When L is a power of 2 and the source letters are equally probable, a fixed-length code of R bits per symbol attains 100 percent efficiency, R = H(X). When L is not a power of 2 and the source letters are equally probable, R differs from H(X) by at most 1 bit.

Shannon's source coding theorem: by encoding sufficiently long sequences of source letters, lossless coding exists as long as R ≥ H(X); a lossless code does not exist for any R < H(X).

Variable-length code words

When the source symbols are not equally probable, a more efficient encoding method is variable-length code words: the probabilities of occurrence of the source letters are used in the selection of the code words. This is called entropy coding.

Example:

Letter   P(a_k)   Code I   Code II   Code III
a_1      1/2      1        0         0
a_2      1/4      00       10        01
a_3      1/8      01       110       011
a_4      1/8      10       111       111
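A small sketch that computes the average codeword length R̄ = Σ n_k P(a_k) of each code in the table above and compares it with the source entropy (probabilities and codewords are those listed above).

import math

probs = [0.5, 0.25, 0.125, 0.125]
codes = {"Code I": ["1", "00", "01", "10"],
         "Code II": ["0", "10", "110", "111"],
         "Code III": ["0", "01", "011", "111"]}

H = -sum(p * math.log2(p) for p in probs)          # 1.75 bits/letter
for name, words in codes.items():
    avg = sum(p * len(w) for p, w in zip(probs, words))
    print(f"{name}: average length = {avg:.3f} bits (entropy = {H:.2f} bits)")

Note that Code I appears to beat the entropy (1.5 bits) only because it is not uniquely decodable; Codes II and III both achieve exactly H(X) = 1.75 bits for this source.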

Variable-length code words

Code II is uniquely decodable and instantaneously decodable (it satisfies the prefix condition).
[Figure: code tree for Code II, with a_1, a_2, a_3, a_4 at the leaves.]
Code III is uniquely decodable but not instantaneously decodable.
[Figure: code tree for Code III.]

The goal is a procedure for constructing uniquely decodable variable-length codes that are efficient in the sense of the average number of bits per source letter,

R̄ = Σ_{k=1}^{L} n_k P(a_k)

Kraft inequality

The codeword lengths n_1 ≤ n_2 ≤ ... ≤ n_L of a uniquely decodable code for the discrete variable X must satisfy the Kraft inequality

Σ_{k=1}^{L} 2^{-n_k} ≤ 1

Conversely, given a set of codeword lengths that satisfies this inequality, a uniquely decodable (prefix) code with these lengths can be constructed.
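A minimal sketch that evaluates the Kraft sum and checks the prefix condition for Codes II and III from the example above: both satisfy the Kraft inequality with equality (both are uniquely decodable), but only Code II is prefix-free.

def kraft_sum(lengths):
    # Left-hand side of the Kraft inequality for binary codeword lengths
    return sum(2.0 ** (-n) for n in lengths)

def is_prefix_free(words):
    # True if no codeword is a prefix of another (instantaneous code)
    return not any(a != b and b.startswith(a) for a in words for b in words)

code_II = ["0", "10", "110", "111"]
code_III = ["0", "01", "011", "111"]
print(kraft_sum(len(w) for w in code_II), is_prefix_free(code_II))    # 1.0 True
print(kraft_sum(len(w) for w in code_III), is_prefix_free(code_III))  # 1.0 False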

Source coding theorem

Let X be a DMS with finite entropy H(X), output letters x_k, 1 ≤ k ≤ L, and corresponding probabilities p_k, 1 ≤ k ≤ L. It is possible to construct a code that satisfies the prefix condition and has average length R̄ satisfying

H(X) ≤ R̄ < H(X) + 1

Huffman coding

Huffman coding is a variable-length encoding algorithm. It is optimal in the sense that it provides the minimum average number of binary digits per symbol among prefix codes. It is based on the source letter probabilities P(x_i), i = 1, 2, ..., L.

Example:

Letter            x_1     x_2     x_3     x_4     x_5     x_6      x_7
Probability       0.35    0.30    0.20    0.10    0.04    0.005    0.005
Self-information  1.5146  1.7370  2.3219  3.3219  4.6439  7.6439   7.6439
Code              00      01      10      110     1110    11110    11111

H(X) = 2.11 bits, R̄ = 2.21 bits.
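A minimal sketch of the Huffman procedure (repeatedly merge the two least probable nodes), applied to the probabilities in the example above. The exact codewords produced depend on how ties are broken, but the average length comes out the same.

import heapq
from itertools import count

def huffman_code(probs):
    # Build a binary Huffman code; probs maps symbol -> probability.
    tiebreak = count()  # avoids comparing dicts when probabilities are equal
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"x1": 0.35, "x2": 0.30, "x3": 0.20, "x4": 0.10,
         "x5": 0.04, "x6": 0.005, "x7": 0.005}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(code)
print(f"average length = {avg_len:.2f} bits/symbol")  # 2.21 for these probabilities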

Huffman coding

The efficiency is H(X)/R̄ = 2.11/2.21 ≈ 0.95.
[Figure: the Huffman tree for this example, an example of variable-length source encoding for a DMS.]

Huffman coding

A different assignment in the Huffman procedure gives an alternative code for the same DMS, again with R̄ = 2.21 and efficiency ≈ 0.95:

Letter   x_1   x_2   x_3   x_4    x_5     x_6      x_7
Code     0     10    110   1110   11110   111110   111111
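The short check below (using the two codeword sets listed above) confirms that both codes have the same average length; they differ in the variance of the codeword lengths, and the code with smaller variance is usually preferred in practice.

probs = [0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005]
code_a = ["00", "01", "10", "110", "1110", "11110", "11111"]
code_b = ["0", "10", "110", "1110", "11110", "111110", "111111"]

for name, words in [("first code", code_a), ("alternative code", code_b)]:
    avg = sum(p * len(w) for p, w in zip(probs, words))
    var = sum(p * (len(w) - avg) ** 2 for p, w in zip(probs, words))
    print(f"{name}: average length = {avg:.2f}, variance of lengths = {var:.2f}")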

The Lempel-Ziv algorithm

Huffman coding gives the minimum average code length and satisfies the prefix condition, but to design a Huffman code we need to know the probabilities of occurrence of all the source letters. In practice, the statistics of a source output are often unknown, which generally makes Huffman coding impractical. The Lempel-Ziv source coding algorithm is designed to be independent of the source statistics.

A given string of source symbols is parsed into variable-length blocks, called phrases.
The phrases are listed in a dictionary.
Each new phrase is the shortest string that has not appeared before.
The algorithm does not work well for short strings.
It is often used in practice: the compress and uncompress utilities, LZ77, ZIP.

Example

Let's look at a binary sequence and parse it into phrases as described above; in this example the parsing produces 15 phrases, so together with the empty string there are 16 dictionary entries.

To code a phrase we use the same 0s and 1s that also occur as characters in the string, and we need the coded strings to have a fixed length: since we have 16 strings, we need 4 bits for the position number. Starting from the first non-empty string (see the position numbers in the table below), we also check what the prefix is (the piece of the string before its last digit) and the position number of that prefix. The coded string is then constructed by taking the position number of the prefix, followed by the last bit of the string that we are considering.

[Table: the dictionary built from the parsed phrases. For each phrase the table lists the string, the position number of this string, its dictionary location (4 bits), its prefix, the position number of the prefix, and the coded string, i.e. the 4-bit position number of the prefix followed by the last bit of the phrase.]
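A minimal sketch of this parsing and encoding procedure. The bit string used as input here is only an illustrative example (an assumption, not the sequence from the original slide); any trailing incomplete phrase is simply ignored in this sketch.

import math

def lz78_encode(bits):
    # Parse a binary string into phrases (each new phrase = a previous phrase
    # plus one new bit) and encode each phrase as the dictionary position of
    # its prefix followed by the phrase's last bit.
    dictionary = {"": 0}              # position 0 is the empty phrase
    phrases, codewords = [], []
    current = ""
    for b in bits:
        if current + b in dictionary:
            current += b              # keep extending until the phrase is new
        else:
            phrase = current + b
            dictionary[phrase] = len(dictionary)
            phrases.append(phrase)
            codewords.append((dictionary[current], b))  # (prefix position, last bit)
            current = ""
    return phrases, codewords

bits = "010010011101011011000110100001"   # assumed illustrative sequence
phrases, codewords = lz78_encode(bits)
width = math.ceil(math.log2(len(phrases) + 1))   # +1 for the empty phrase at position 0
for phrase, (pos, last) in zip(phrases, codewords):
    print(f"{phrase:>8s}  ->  {pos:0{width}b}{last}")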

Rate-Distortion function Quantization of an amplitude of sampled signal is a ind of data compression. It introduces some distortion of the waveform. Idea is to have minimum distortion et s defined the distortion Some Measure of the difference between the actual source sample value { x } and the corresponding quantized value { x } d( x, x ) Commonly used distortion function is the squared-error distortion d( x, x ) = ( x x ) Cal Poly Pomona Electrical & Computer Engineering Dept. EGR 544-7 Rate-Distortion function The average distortion between a sequence of n samples, and quantized value is the n output samples X n n d(x,x ) = d( x, x ) n n n n n = The source is random process and X n will be random process. Therefore d(x,x ) is random variable n n The expected value of the distortion value is D n D = E d( X, X ) = E d( x, x ) = E d( x, x ) n n n n n = [ ] [ ] X n Cal Poly Pomona Electrical & Computer Engineering Dept. EGR 544-8

Rate-Distortion Function et s have memoryless continuous source signal output x X with a PDF p(x) The quantized output signal x X The distortion per sample d( x, x ) The minimum rate in bits per sample to represent X of memorlyless source with a distortion less than or equal to D is called the rate-distortion function R(D) RD ( ) = min I(X;X) p( xx ): E[ d(x,x)] D Where I(X;X) is the mutual information between X and X R(D) increases as D increases Cal Poly Pomona Electrical & Computer Engineering Dept. EGR 544-9 Rate-Distortion Function for a Memorless Gaussian Source The minimum information rate necessary to represent the output of the discrete-time, continues-memoryless Gaussian source based on a mean-square-error distortion measure per symbol is given(shannon, 959) σ log ( x ) (0 D σ Rg ( D) = D 0 ( D > σ x ) WE can represent D in terms of R as D ( R) = R σ Is called distortion-rate function g x x 0log D ( R) = 6R+ 0log σ in db 0 g 0 Cal Poly Pomona Electrical & Computer Engineering Dept. EGR 544-0 x

Scalar quantization

If we know the PDF of the source signal amplitude, the quantizer can be optimized. The goal is to design the optimum scalar quantizer that minimizes some function of the quantization error q = x̃ - x. The distortion can be written as

D = ∫ f(x̃ - x) p(x) dx

where f(x̃ - x) is the desired function of the error. The optimum quantizer is the one that minimizes D by optimally selecting the output levels and the corresponding input range of each output level.

Scalar quantization

We can treat the quantized source values as letters of a discrete alphabet X̃ = {x̃_k, 1 ≤ k ≤ L} with probabilities {p_k}. If the signal amplitudes are statistically independent, the entropy of the quantized output is

H(X̃) = -Σ_{k=1}^{L} p_k log_2 p_k
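A minimal sketch of evaluating the distortion integral numerically for a given 4-level quantizer of a unit-variance Gaussian source. The thresholds and output levels used here are assumed to approximate the published minimum-MSE (Lloyd-Max) values; the result should come out close to the -9.30 dB figure quoted in the example that follows.

import math

def gauss_pdf(x):
    # Standard (zero-mean, unit-variance) Gaussian PDF
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Assumed 4-level quantizer: thresholds and levels approximate Lloyd-Max values
thresholds = [-0.9816, 0.0, 0.9816]
levels = [-1.510, -0.4528, 0.4528, 1.510]

def quantize(x):
    # Map x to the output level of the cell it falls in
    k = sum(1 for t in thresholds if x >= t)
    return levels[k]

# Numerically evaluate D = integral of (x - Q(x))^2 p(x) dx over a wide range
step, D, x = 1e-3, 0.0, -8.0
while x < 8.0:
    D += (x - quantize(x)) ** 2 * gauss_pdf(x) * step
    x += step
print(f"D ≈ {D:.4f} ({10 * math.log10(D):.2f} dB relative to the signal variance)")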

Scalar quantization

Example: a 4-level nonuniform (optimum) quantizer for a Gaussian-distributed signal has output-level probabilities

p_1 = p_4 = 0.1635 for the outer levels
p_2 = p_3 = 0.3365 for the inner levels

The entropy of the discrete quantizer output is

H(X̃) = -Σ_{k=1}^{4} p_k log_2 p_k = 1.911 bits/sample

So with entropy coding the rate can be reduced to about 1.9 bits per sample, while the distortion remains -9.30 dB relative to the signal variance.

Vector quantization

Joint quantization of a block of signal samples or a block of signal parameters is called block or vector quantization. It is used in speech coding for digital cellular phones. In the rate-distortion sense its performance is better than that of scalar quantization.

Formulation of vector quantization: let X = [x_1, x_2, ..., x_n] be an n-dimensional vector with real-valued, continuous-amplitude components {x_k, 1 ≤ k ≤ n} and joint PDF p(x_1, x_2, ..., x_n). Let X̃ be the quantized value of X, an n-dimensional vector with components {x̃_k, 1 ≤ k ≤ n}.
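The short sketch below derives the level probabilities from the same assumed Lloyd-Max threshold (±0.9816) and evaluates the entropy; the values come out close to the 0.1635 / 0.3365 and 1.911 bits quoted above, with small differences due to rounding of the threshold.

import math

def phi(x):
    # Standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

t = 0.9816                     # assumed decision threshold of the 4-level quantizer
p_outer = phi(-t)              # probability of each outer level
p_inner = 0.5 - p_outer        # probability of each inner level
probs = [p_outer, p_inner, p_inner, p_outer]

H = -sum(p * math.log2(p) for p in probs)
print(f"p_outer = {p_outer:.4f}, p_inner = {p_inner:.4f}, H = {H:.3f} bits/sample")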

Vector quantization

Quantization takes the form X̃ = Q(X). In vector quantization, blocks of data are classified into a discrete number of categories or cells in a way that minimizes the error distortion.

[Figure: quantization of a two-dimensional vector X = [x_1, x_2]. The two-dimensional space is partitioned into hexagonal cells {C_k}; all input vectors that fall in cell C_k are quantized into the vector X̃_k, shown as the center of the hexagon.]

Vector quantization

For quantization of the n-dimensional vector X into the vector X̃, the quantization error or distortion is d(X, X̃). The average distortion over the set of input vectors X is

D = Σ_k P(X ∈ C_k) E[d(X, X̃_k) | X ∈ C_k] = Σ_k ∫_{X ∈ C_k} d(X, X̃_k) p(X) dX

where P(X ∈ C_k) is the probability that the vector X falls in the cell C_k and p(X) is the joint PDF of the n random variables. To minimize D, we select the cells {C_k} (and the corresponding output vectors X̃_k) optimally for a given PDF p(X).

Vector quantization

The common distortion measure for vector quantization is the mean square error

d(X, X̃) = (1/n)(X - X̃)'(X - X̃) = (1/n) Σ_{k=1}^{n} (x_k - x̃_k)²

The vectors can be transmitted at an average bit rate of

R = H(X̃)/n   bits per sample

where H(X̃) is the entropy of the quantized source output,

H(X̃) = -Σ_i p(X̃_i) log_2 p(X̃_i)

The minimum distortion at rate R is

D_n(R) = min_{Q(X)} E[d(X, X̃)]
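A minimal sketch of nearest-neighbor vector quantization with an assumed codebook (random, not optimized; a practical design would optimize it, for example with the LBG / k-means algorithm). It quantizes 2-dimensional Gaussian vectors, then reports the fixed-length rate log_2(K)/n and the per-sample mean square distortion.

import random, math

def nearest(codebook, x):
    # Nearest-neighbor rule: map the vector x to the closest codeword (cell center)
    return min(codebook, key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

random.seed(1)
n, K = 2, 16                               # 2-dimensional vectors, 16 cells
codebook = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(K)]   # assumed codebook
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20_000)]

mse = 0.0
for x in data:
    q = nearest(codebook, x)
    mse += sum((xi - qi) ** 2 for xi, qi in zip(x, q)) / n
mse /= len(data)
rate = math.log2(K) / n                    # fixed-length indexing of the K cells
print(f"rate = {rate:.1f} bits/sample, per-sample MSE = {mse:.3f}")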