CODING SAMPLE DIFFERENCES ATTEMPT 1: NAIVE DIFFERENTIAL CODING

DPCM (Differential Pulse Code Modulation)

Making scalar quantization work for a correlated source -- a sequential approach.

Consider quantizing a slowly varying source (AR, Gaussian, ρ = .95, σ² = 3.2).

Direct scalar quantization: R = 2, D = 1.2.

[plot: 50 samples of the source and its scalar-quantized reproduction]

However, a typical sample is quite similar to its predecessor. Consider quantizing successive sample differences:
E(X_i - X_{i-1})² = σ²/10 = 1.03 < 3.2

[plot: 50 samples of the difference sequence]

Scalar quantization of (X_i - X_{i-1}): R = 2, D = 1.2 × (1.03/3.2) = .39

DPCM-1

CODING SAMPLE DIFFERENCES ATTEMPT 1: NAIVE DIFFERENTIAL CODING

[block diagram: the encoder forms U_i = X_i - X_{i-1} via a delay, an M-level scalar quantizer produces V_i = Q(U_i) and the bits; the decoder forms Y_i = Y_{i-1} + V_i via a delay]

Assume X_0 = Y_0 = 0.

Key equation: Y_i = Y_{i-1} + Q(X_i - X_{i-1})

Consider the errors in the reproductions:
X_1 - Y_1 = X_1 - Q(X_1)
X_2 - Y_2 = X_1 + U_2 - (Y_1 + V_2) = (X_1 - Y_1) + (U_2 - V_2)
X_3 - Y_3 = X_2 + U_3 - (Y_2 + V_3) = (X_2 - Y_2) + (U_3 - V_3) = (X_1 - Y_1) + (U_2 - V_2) + (U_3 - V_3)
...
X_i - Y_i = X_{i-1} + U_i - (Y_{i-1} + V_i) = (X_{i-1} - Y_{i-1}) + (U_i - V_i) = (X_1 - Y_1) + (U_2 - V_2) + (U_3 - V_3) + ... + (U_i - V_i)

We see that the errors accumulate. This method doesn't work.

DPCM-2
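As a quick numerical check of the error accumulation, here is a minimal Python sketch of Attempt 1; the AR(1) parameters, quantizer step and run lengths are illustrative assumptions, not values from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n, runs, step = 0.95, 400, 200, 0.5

def quantize(u):
    """Uniform mid-tread scalar quantizer (step size chosen for illustration)."""
    return step * np.round(u / step)

# AR(1) Gaussian sources, one realization per row
x = np.zeros((runs, n))
for i in range(1, n):
    x[:, i] = rho * x[:, i - 1] + rng.normal(size=runs)

# Attempt 1: naive differential coding, Y_i = Y_{i-1} + Q(X_i - X_{i-1})
y = np.zeros((runs, n))
for i in range(1, n):
    y[:, i] = y[:, i - 1] + quantize(x[:, i] - x[:, i - 1])

mse_per_time = np.mean((x - y) ** 2, axis=0)   # E(X_i - Y_i)^2 estimated over runs
print("mean squared error at i =  10 :", round(mse_per_time[10], 3))
print("mean squared error at i = 100 :", round(mse_per_time[100], 3))
print("mean squared error at i = 399 :", round(mse_per_time[399], 3))
# The error grows roughly linearly with i: the quantization errors of the
# differences add up, as in X_i - Y_i = (X_1-Y_1) + (U_2-V_2) + ... + (U_i-V_i).
```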

ATTEMPT 2: DIFFERENTIAL PULSE CODE MODULATION (DPCM)

[block diagram: the encoder forms U_i = X_i - Y_{i-1}, an M-level scalar quantizer produces V_i = Q(U_i), and a binary encoder produces the bits; the binary decoder recovers V_i and the decoder forms Y_i = Y_{i-1} + V_i; both encoder and decoder contain a delay holding Y_{i-1}]

Assume Y_0 = 0.

Key equations:

Y_i = Y_{i-1} + Q(X_i - Y_{i-1})
i.e. new reproduction equals old reproduction plus quantized prediction error

X_i - Y_i = U_i - Q(U_i)
i.e. overall error = error introduced by the scalar quantizer
(X_i - Y_i = (Y_{i-1} + U_i) - (Y_{i-1} + Q(U_i)) = U_i - Q(U_i))

Notes:
This method works well.
U_i is no longer X_i - X_{i-1}. However, U_i = X_i - Y_{i-1} ≈ X_i - X_{i-1}, so the U_i's usually have smaller variances than the X_i's.
The predictive, differential style in DPCM is embedded in many other source coding techniques.

DPCM-3

DPCM: ADAPTIVE QUANTIZATION VIEWPOINT

Recall: Y_i = Y_{i-1} + Q(X_i - Y_{i-1})

X_i is quantized with a shifted quantizer: Y_i = b + Q(X_i - b), where b = Y_{i-1}.

[figure: the levels of quantizer Q, centered at 0, and of the shifted quantizer b + Q(x - b), centered at b]

DPCM is a kind of backward adaptive quantizer. The quantizer to use on X_i is determined by the reconstructions of past X's. It is critical that the rule for determining how X_i is to be quantized depends only on what the decoder knows. Backward adaptation is a common theme in quantization.

Another example of backward adaptation: quantize X_i with the quantizer Q scaled by Y_{i-1}, i.e. with Y_i = Y_{i-1} Q(X_i / Y_{i-1}).

DPCM-4
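A minimal sketch contrasting Attempt 1 with the closed-loop DPCM recursion; the source model, quantizer step and seed are illustrative assumptions. With the loop closed around the reconstruction, the overall error equals the quantizer error U_i - Q(U_i) and stays bounded.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, step = 0.95, 2000, 0.5

# AR(1) Gaussian source
x = np.zeros(n)
for i in range(1, n):
    x[i] = rho * x[i - 1] + rng.normal()

def quantize(u):
    return step * np.round(u / step)   # uniform mid-tread quantizer (unbounded, for simplicity)

y_naive = np.zeros(n)   # Attempt 1: predict from the true previous sample
y_dpcm = np.zeros(n)    # Attempt 2: predict from the previous reconstruction
for i in range(1, n):
    y_naive[i] = y_naive[i - 1] + quantize(x[i] - x[i - 1])
    u = x[i] - y_dpcm[i - 1]           # prediction error seen by the quantizer
    y_dpcm[i] = y_dpcm[i - 1] + quantize(u)
    # In the DPCM loop, x[i] - y_dpcm[i] == u - quantize(u) at every step.

print("naive MSE:", np.mean((x - y_naive) ** 2))
print("DPCM  MSE:", np.mean((x - y_dpcm) ** 2))
print("bound on |quantizer error|:", step / 2)
```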

FORWARD ADAPTATION

Example of forward adaptation: Given X_1, ..., X_N (e.g. N = 100 or 1000) and a collection of scalar quantizers {Q_1, Q_2, ..., Q_S}, each with rate R_o, choose the Q_s that gives least distortion on X_1, ..., X_N, i.e. that minimizes Σ_{i=1}^N (X_i - Q_s(X_i))² (this is not avg. distortion!).

Binary output: log_2 S bits describing the index s of the chosen quantizer, followed by e_s(X_1), ..., e_s(X_N).

Rate: R = R_o + (log_2 S)/N bits/sample

Implementation:
(a) Try each quantizer to see which is best, or
(b) Use a selection rule, e.g. if the quantizers are shifted versions of some basic quantizer, choose the quantizer whose "middle" is closest to (1/N) Σ_{i=1}^N X_i.

There are many possibilities for the Q_s's: different shifts, different scales, different point densities. One might even redesign the quantizer based on X_1, ..., X_N, and then lossily encode the quantizer.

Choosing N large reduces (log_2 S)/N, but decreases the benefits of forward adaptation.

Which is better, backward or forward adaptation? No conclusive answer.

DPCM-5

DPCM: THIRD VIEWPOINT -- ENCODER CONTROLS DECODER

Recall: Y_i = Y_{i-1} + V_j, where V_j ∈ C = {w_1, ..., w_M} are scalar levels. Y_0 = 0.

Knowing X_i and Y_{i-1}, the encoder chooses V_j to be the level w_j that makes Y_{i-1} + w_j closest to X_i; i.e. the encoder causes (controls) the decoder to make the best possible output.

This is a greedy encoding algorithm. Looking ahead might do better. Consider the tree structure of DPCM decoder outputs, i.e. of reconstruction sequences.

[figure: tree of output levels at time 1, time 2, time 3, etc.]

Tree encoding: Given N, search the tree for the sequence of N quantization levels in the tree that is closest to the source sequence. Send the indices of the levels in the chosen sequence. Use the usual DPCM decoder.

This method has never worked its way into practical use. Instead people use the usual DPCM greedy encoder.

DPCM-6
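The forward-adaptive block scheme can be sketched in a few lines; the basic quantizer, the set of shifts, the block length and the source statistics below are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def basic_quantizer(u, step=1.0, levels=4):
    """Uniform quantizer with `levels` levels centered on zero."""
    half = (levels - 1) / 2.0
    idx = np.clip(np.round(u / step + half), 0, levels - 1)
    return (idx - half) * step

# S candidate quantizers = shifted copies of the basic one
shifts = np.array([-3.0, -1.0, 1.0, 3.0])          # S = 4 shifts (illustrative)
block = rng.normal(loc=2.0, scale=1.0, size=100)   # N = 100 samples, nonzero mean

# (a) try each quantizer and keep the one with least total squared error
costs = [np.sum((block - (b + basic_quantizer(block - b))) ** 2) for b in shifts]
best = int(np.argmin(costs))

# (b) alternative selection rule: pick the shift closest to the block mean
rule = int(np.argmin(np.abs(shifts - block.mean())))

R0, N, S = 2, len(block), len(shifts)
print("chosen shift (search):", shifts[best], " (mean rule):", shifts[rule])
print("rate = R_o + log2(S)/N =", R0 + np.log2(S) / N, "bits/sample")
```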

DPCM WITH IMPROVED PREDICTION

[block diagram: as before, but the delay is replaced by a predictor; the encoder forms U_i = X_i - X̂_i, the M-level scalar quantizer produces V_i = Q(U_i), and both encoder and decoder form X̂_i from past reconstructions and set Y_i = X̂_i + V_i]

Example: X̂_i = g(Y_{i-1}, Y_{i-2}, ...), e.g. X̂_i = a Y_{i-1}

Most common: Nth-order linear prediction -- X̂_i = Σ_{j=1}^N a_j Y_{i-j}, where N = order of the predictor and a_1, ..., a_N are the prediction coefficients.

Ideally... choose the a_j's to minimize E U_i² = E (X_i - Σ_{j=1}^N a_j Y_{i-j})².
But... this requires knowing the correlations E X_i Y_{i-1}, ..., E X_i Y_{i-N}, which in turn depend on the a_j's.

Therefore, the a_j's are usually chosen to minimize E (X_i - Σ_{j=1}^N a_j X_{i-j})² = MSPE = mean squared prediction error. That is, to predict X_i from Y_{i-1}, ..., Y_{i-N}, use the predictor that would minimize the MSPE if we were predicting X_i from X_{i-1}, ..., X_{i-N}.

This is reasonable because if the DPCM system is working well, then X_j ≈ Y_j for j < i, and so
E (X_i - Σ_{j=1}^N a_j Y_{i-j})² ≈ E (X_i - Σ_{j=1}^N a_j X_{i-j})²

DPCM-7

MINIMUM MEAN-SQUARED ERROR LINEAR PREDICTION

Assume X is a zero-mean, wide-sense stationary random process with autocorrelation function R_X(k) = E X_i X_{i+k}.

Linear prediction theory shows that the Nth-order linear predictor of X_i based on X_{i-1}, ..., X_{i-N} that minimizes the MSPE has coefficients a = (a_1, ..., a_N)^t given by

a = K^{-1} r

where K is the N × N correlation matrix of X_1, ..., X_N, i.e. K_{ij} = E X_i X_j = R_X(|i-j|), K^{-1} is its inverse, and r = (R_X(1), ..., R_X(N))^t.

It can be shown that if K is not invertible, then the random process X is deterministic in the sense that there are coefficients b_1, ..., b_{N-1} such that X_i = b_1 X_{i-1} + ... + b_{N-1} X_{i-N+1}.

The resulting minimum mean-squared prediction error is
MMSPE_N = σ² - a^t r = σ² - r^t K^{-1} r,
and the prediction gain is
G_N = 10 log_10 (σ² / MMSPE_N)

DPCM-8
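The normal equations are easy to evaluate for a concrete autocorrelation. Below is an illustrative Python sketch for an AR(1) source with R_X(k) = σ² ρ^|k| (parameter values assumed); for such a source the solution is a_1 = ρ with all other coefficients zero, and MMSPE_N = σ²(1 - ρ²).

```python
import numpy as np

sigma2, rho, N = 1.0, 0.95, 4          # illustrative AR(1) parameters

# Autocorrelation R_X(k) = sigma^2 * rho^|k|
idx = np.arange(N)
K = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])   # K_ij = R_X(|i-j|)
r = sigma2 * rho ** np.arange(1, N + 1)                    # (R_X(1), ..., R_X(N))^t

a = np.linalg.solve(K, r)              # normal equations: a = K^{-1} r
mmspe = sigma2 - r @ a                 # MMSPE_N = sigma^2 - r^t K^{-1} r
gain_db = 10 * np.log10(sigma2 / mmspe)

print("a     =", np.round(a, 4))       # ~ [0.95, 0, 0, 0] for an AR(1) source
print("MMSPE =", round(mmspe, 4))      # ~ sigma^2 * (1 - rho^2) = 0.0975
print("G_N   =", round(gain_db, 2), "dB")
```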

It can also be shown that

MMSPE_N = |K^(N+1)| / |K^(N)|

where K^(N) and K^(N+1) denote, respectively, the N × N and (N+1) × (N+1) covariance matrices of X, and |K| denotes the determinant of the matrix K.

For a typical random process, MMSPE_N decreases with N to some limit, as illustrated below.

[plot: mean squared error of the best Nth-order linear predictor versus N = 1, ..., 10]

And the prediction gains G_N increase with N to some limit, as illustrated in the example below.

DPCM-9

TYPICAL PREDICTION GAINS FOR SPEECH

(Jayant & Noll, p. 271)

For each of several speakers, R_X(k) was empirically estimated and the prediction gains were computed for various N's. The range of G_N values found is shown below. The speech was lowpass filtered, probably to 3 kHz.

[plot: range of prediction gains G_N versus predictor order N]

DPCM-10
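The determinant-ratio formula can be checked numerically against σ² - r^t K^{-1} r. A small sketch, again assuming an AR(1) autocorrelation purely for illustration:

```python
import numpy as np

sigma2, rho = 1.0, 0.9                           # illustrative AR(1) parameters

def corr_matrix(n):
    """n x n correlation matrix with R_X(k) = sigma^2 * rho^|k|."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

for N in (1, 2, 3, 5):
    K, K1 = corr_matrix(N), corr_matrix(N + 1)
    r = sigma2 * rho ** np.arange(1, N + 1)
    mmspe_normal_eq = sigma2 - r @ np.linalg.solve(K, r)    # sigma^2 - r^t K^{-1} r
    mmspe_det_ratio = np.linalg.det(K1) / np.linalg.det(K)  # |K^(N+1)| / |K^(N)|
    print(N, round(mmspe_normal_eq, 6), round(mmspe_det_ratio, 6))
# Both columns agree, and for this first-order AR source they equal
# sigma^2 * (1 - rho^2) for every N >= 1.
```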

TYPICAL PREDICTION GAINS FOR IMAGES

Prediction gains in interframe image coding, from Jayant and Noll, Fig. 6.8, p. 272.

B -- prediction from the pixel immediately to the left: X̂ = .965 X_B
C -- prediction from the corresponding pixel in the previous frame: X̂ = .977 X_C
BC -- prediction from both pixels: X̂ = .379 X_B + .617 X_C
BCD -- prediction from the two aforementioned pixels plus the pixel to the left of the corresponding pixel in the previous frame: X̂ = .746 X_B + .825 X_C - .594 X_D

For larger values of N, the predictor is based on N pixels from the present and the previous frame.

DPCM-11

ASYMPTOTIC LIMIT OF MMSPE_N

For a wide-sense stationary random process, it can be shown that as N → ∞, MMSPE_N decreases to

MMSPE_∞ = "one-step prediction error" = exp( (1/2π) ∫_{-π}^{π} ln S_X(ω) dω )

where S_X(ω) is the power spectral density of X, i.e.

S_X(ω) = discrete-time Fourier transform of R_X(k) = Σ_{k=-∞}^{∞} R_X(k) e^{-jkω}

For an Nth-order AR process, MMSPE_∞ = MMSPE_N.

DPCM-12
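A sketch checking the spectral formula numerically for an AR(1) source (parameters assumed); the integral is approximated by averaging ln S_X(ω) over a dense frequency grid.

```python
import numpy as np

rho, sigma_w2 = 0.95, 1.0                     # illustrative AR(1): X_i = rho X_{i-1} + W_i
sigma2 = sigma_w2 / (1 - rho ** 2)            # process variance

# Power spectral density S_X(omega) of the AR(1) process on a dense grid over [-pi, pi)
omega = np.linspace(-np.pi, np.pi, 100000, endpoint=False)
S = sigma_w2 / np.abs(1 - rho * np.exp(-1j * omega)) ** 2

# One-step prediction error exp( (1/(2*pi)) * integral of ln S_X(omega) d omega ),
# i.e. the mean of ln S over an interval of length 2*pi
mmspe_inf = np.exp(np.log(S).mean())

mmspe_1 = sigma2 * (1 - rho ** 2)             # MMSPE_1 from the normal equations
print("MMSPE_inf (spectral formula):", round(float(mmspe_inf), 4))
print("MMSPE_1   (1st-order pred.) :", round(mmspe_1, 4))
# For a first-order AR process the two coincide (both equal sigma_w^2 = 1),
# matching the remark that MMSPE_inf = MMSPE_N for an Nth-order AR process.
```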

COMPLEXITY OF DPCM

Arithmetic complexity (storage is usually small), in operations per sample:

             scalar quantization    prediction
encoding     R                      2N+1
decoding     0                      2N+1

DPCM-13

THE ORIGINS OF THE "DPCM" NAME

The letters "PCM" come from "pulse code modulation", which is a modulation technique for transmitting analog sources such as speech. In PCM, an analog source, such as speech, is sampled; the samples are quantized; fixed-length binary codewords are produced; and each bit of each binary codeword determines which of two specific pulses is sent (e.g. one pulse might be the negative of the other, or the pulses might be sinusoids with different frequencies). Thus PCM is a modulation technique for transmitting analog sources that competes with AM, FM and various other forms of pulse modulation. Invented in the 1940's, "PCM" is now viewed as three systems: sampler, quantizer and digital modulator. However, the quantizer by itself is often referred to as PCM.

DPCM was originally proposed as an improved PCM-type modulator consisting of a sampler, a DPCM quantizer as we know it, and a digital modulator. Nowadays DPCM usually refers just to the quantization part of the system.

Predictive coding: DPCM is also considered to be a kind of predictive coding.

DPCM-14

DELTA MODULATION

The special case of DPCM where the quantizer has just two levels: +Δ, -Δ. Delta modulation is used when the source is highly correlated, for example speech coding with a high sampling rate.

SPEECH CODING

Adaptive versions of DPCM and delta modulation have been standardized for use in the telephone industry for digitizing telephone speech. The speech is bandlimited to 3 kHz before sampling at 8 kHz. However, this kind of speech coding is no longer considered state-of-the-art.

DPCM-15

EXAMPLES OF DPCM IMAGE CODING

Example: prediction for the present pixel X_{i,j} from the three neighboring reconstructions (left, above, above-left):

X̂_{i,j} = a_1 Y_{i,j-1} + a_2 Y_{i-1,j} + a_3 Y_{i-1,j-1}

[figure: the neighborhood Y_{i-1,j-1}, Y_{i-1,j} above and Y_{i,j-1} to the left of X_{i,j}; rate-distortion curves from R. Jain, Fund'ls of Image Proc., p. 493]

All but the second curve from the top represent the performance predicted with a predictor matched to the stated source. For the second curve from the top, the values of a_1, a_2 and a_3 were designed from the measured correlations of a test set of images.

DPCM has been extensively studied for image coding, but it is not often used today. Transform coding is more common.

DPCM-16
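For the image case, a minimal 2-D DPCM sketch follows. It uses the planar predictor a_1 = 1, a_2 = 1, a_3 = -1 and a synthetic smooth image, both chosen purely for illustration (the notes' coefficients come from measured image correlations).

```python
import numpy as np

rng = np.random.default_rng(4)

def dpcm_image(img, step=4.0):
    """2-D DPCM with the planar predictor Xhat[i,j] = Y[i,j-1] + Y[i-1,j] - Y[i-1,j-1]
    (i.e. a_1 = 1, a_2 = 1, a_3 = -1; coefficients chosen for illustration only)."""
    h, w = img.shape
    y = np.zeros((h, w))            # reconstruction
    u = np.zeros((h, w))            # prediction errors fed to the quantizer
    for i in range(h):
        for j in range(w):
            left = y[i, j - 1] if j > 0 else 0.0
            up = y[i - 1, j] if i > 0 else 0.0
            upleft = y[i - 1, j - 1] if (i > 0 and j > 0) else 0.0
            pred = left + up - upleft
            u[i, j] = img[i, j] - pred
            y[i, j] = pred + step * np.round(u[i, j] / step)   # quantize inside the loop
    return y, u

# Synthetic smooth "image", correlated in both directions
img = np.cumsum(np.cumsum(rng.normal(size=(64, 64)), axis=0), axis=1)
img = 10 * (img - img.mean()) / img.std()

rec, u = dpcm_image(img)
print("pixel variance            :", round(float(img.var()), 2))
print("prediction-error variance :", round(float(u.var()), 2))
print("reconstruction MSE        :", round(float(np.mean((img - rec) ** 2)), 2))
# The prediction error has much smaller variance than the pixels themselves,
# which is what makes it cheaper to quantize at a given distortion.
```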

DPCM IN VIDEO CODING

Though not usually called DPCM, the most commonly used methods of video coding use a form of DPCM, e.g. MPEG, H.26X, HDTV, satellite "Direct TV", DVD. Prediction is done on a frame-by-frame basis, with the prediction of a frame being the decoded reconstruction of the previous frame, or a "motion-compensated" version thereof. In the latter case, the coder has a forward-adaptive component in addition to the backward adaptation.

Example (Netravali & Haskell, Digital Pictures, p. 331):

Prediction coefficients
left pixel, curr. frame   left pixel, prev. frame   same pixel, prev. frame   MSPE   Pred. gain(1)
1                         -1                        1                         53.1   12.7 dB
1                         -1/2                      1/2                       29.8   15.3 dB
3/4                       -1/2                      3/4                       27.9   15.5 dB
7/8                       -5/8                      3/4                       26.3   15.8 dB

(1) Based on the educated guess that σ² = 1000. Each gain is 10 log_10(σ²/MSPE); e.g. 10 log_10(1000/53.1) = 12.7 dB.

DPCM-17

DESIGN AND PERFORMANCE -- HIGH RESOLUTION ANALYSIS

Warning: Though this analysis is almost universally accepted, it has never been satisfactorily proven to be correct.

Assume the source is a stationary, zero-mean random process. It can be shown that under ordinary conditions {(X_i, U_i, V_i, Y_i)} is asymptotically stationary, so we assume it is stationary. Therefore,

D = E (X_i - Y_i)² = E (U_i - Q(U_i))²   (same for all i)

Key assumption: When R is large and the quantizer is well designed,

D ≈ E (Ũ_i - Q(Ũ_i))²   where Ũ_i = X_i - g(X_{i-1}, X_{i-2}, ...).

Ũ_i ≈ U_i but Ũ_i ≠ U_i. Note: Ũ is a stationary random process.

The key assumption implies:

least possible distortion of DPCM with rate R ≈ least possible distortion of SQ with rate R encoding Ũ, minimized over the choice of predictor g.

That is,

DPCM-18

δ_dpcm(R) ≈ min_g δ_{Ũ,sq}(R)   (true for FLC or VLC)
          ≈ min_g (1/12) σ_Ũ² α_Ũ 2^{-2R}

where α_Ũ = β_Ũ for FLC, and α_Ũ = η_Ũ for VLC.

DPCM-19

Conclusions: A good (but not necessarily optimal) way to design DPCM is to
(a) Choose g to minimize σ_Ũ² α_Ũ
(b) Choose Q to achieve δ_{Ũ,sq}(R)

Common situation: The pdf of Ũ is similar to that of X, which implies β_Ũ ≈ β_X and η_Ũ ≈ η_X. Example: exact equalities hold if X is a Gaussian random process. In such cases, one need only
(a') Choose g to minimize σ_Ũ²
(b) Choose Q to achieve δ_{Ũ,sq}(R)

It follows that

δ_dpcm(R) ≈ min_g (1/12) σ_Ũ² α_X 2^{-2R} = (min_g σ_Ũ²) (1/12) α_X 2^{-2R} ≈ (MMSPE/σ²) δ_sq(R)

Equivalently, in dB,

S_dpcm(R) ≈ S_sq(R) + G_N dB,   where G_N = 10 log_10(σ²/MMSPE) dB = prediction gain.

That is, the gain of DPCM over SQ approximately equals the prediction gain.

DPCM-20
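A simulation sketch of this conclusion (source model, rate, quantizer loading and predictor are all illustrative assumptions): first-order DPCM and plain scalar quantization of an AR(1) source at the same rate, with the SNR difference compared against the prediction gain.

```python
import numpy as np

rng = np.random.default_rng(5)
rho, n, levels = 0.95, 50_000, 8           # AR(1) source, R = 3 bits/sample

# Generate the source (unit innovation variance)
x = np.zeros(n)
for i in range(1, n):
    x[i] = rho * x[i - 1] + rng.normal()
sigma2 = x.var()

def uniform_quant(u, sigma, levels, loading=4.0):
    """Uniform quantizer with `levels` levels spanning +/- loading*sigma."""
    step = 2 * loading * sigma / levels
    idx = np.clip(np.floor(u / step) + levels // 2, 0, levels - 1)
    return (idx - levels // 2 + 0.5) * step

# Plain scalar quantization of X
y_sq = uniform_quant(x, np.sqrt(sigma2), levels)

# First-order DPCM: predictor coefficient set to rho, quantizer sized for the prediction error
sigma_u = np.sqrt(sigma2 * (1 - rho ** 2))        # ~ sqrt(MMSPE_1) for this source
y = np.zeros(n)
for i in range(1, n):
    pred = rho * y[i - 1]
    y[i] = pred + uniform_quant(x[i] - pred, sigma_u, levels)

snr_sq = 10 * np.log10(sigma2 / np.mean((x - y_sq) ** 2))
snr_dpcm = 10 * np.log10(sigma2 / np.mean((x - y) ** 2))
gain_pred = 10 * np.log10(1 / (1 - rho ** 2))     # 10 log10(sigma^2 / MMSPE_1)
print("SNR  SQ   :", round(snr_sq, 2), "dB")
print("SNR  DPCM :", round(snr_dpcm, 2), "dB")
print("SNR gap   :", round(snr_dpcm - snr_sq, 2),
      "dB   vs prediction gain:", round(gain_pred, 2), "dB")
```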