EGR 544 Communication Theory
3. Coding for Discrete Sources
Z. Aliyazicioglu
Electrical and Computer Engineering Department, Cal Poly Pomona

Coding for Discrete Sources

Source coding: represent source data efficiently in digital form for transmission or storage.

A measure of the efficiency of a source-encoding method is obtained by comparing the average number of binary digits per output letter of the source to the entropy H(X).

Two types of source coding:
Lossless (Huffman coding algorithm, Lempel-Ziv algorithm, ...)
Lossy (rate-distortion theory, quantization, waveform coding, ...)

[Figure: block diagram -- source output X -> source encoding -> bits -> channel transmission -> bits -> source decoding -> reconstruction \tilde{X}; lossless coding requires \tilde{X} = X.]

Cal Poly Pomona Electrical & Computer Engineering Dept., EGR 544
Coding for a Discrete Memoryless Source

A DMS produces an output letter every \tau_s seconds. The source has a finite alphabet of symbols x_i, i = 1, 2, ..., L, with probabilities P(x_i).

The entropy of the DMS in bits per source symbol is

    H(X) = -\sum_{i=1}^{L} P(x_i) \log_2 P(x_i) \le \log_2 L

If the symbols are equally probable,

    H(X) = -\sum_{i=1}^{L} \frac{1}{L} \log_2 \frac{1}{L} = \log_2 L

The source rate in bits/s is H(X)/\tau_s.

Fixed-Length Code Words

Let's assign a unique set of R binary digits to each symbol. Since there are L possible symbols, the code rate in bits per symbol is

    R = \log_2 L

when L is a power of 2, and

    R = \lfloor \log_2 L \rfloor + 1

when L is not a power of 2, where \lfloor \log_2 L \rfloor denotes the largest integer less than \log_2 L.

Since H(X) \le \log_2 L \le R, the ratio H(X)/R \le 1 measures the efficiency of the encoding for the DMS.
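As a quick numerical check of the two formulas above, here is a minimal Python sketch (the function names are ours, chosen for illustration):

```python
import math

def entropy(probs):
    """H(X) = -sum P(x_i) log2 P(x_i), in bits per source symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def fixed_length_rate(L):
    """Bits per symbol for a fixed-length code over an alphabet of size L:
    log2(L) when L is a power of 2, floor(log2(L)) + 1 otherwise
    (both cases equal ceil(log2(L)))."""
    return math.ceil(math.log2(L))

# L = 4 equally probable symbols: H(X) = log2(4) = 2 = R, 100% efficiency
print(entropy([0.25] * 4), fixed_length_rate(4))  # 2.0 2
# L = 5: R = 3 bits but H(X) = log2(5) ~ 2.32 bits, so efficiency < 1
print(entropy([0.2] * 5), fixed_length_rate(5))
```

Note that for L equally probable symbols the efficiency H(X)/R is 1 exactly when L is a power of 2, as the next slide states.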
When L is a power of 2 and the source letters are equally probable, a fixed-length code of R bits per symbol attains 100 percent efficiency, since R = H(X).

When L is not a power of 2 and the source letters are equally probable, R differs from H(X) by at most 1 bit.

Shannon source coding theorem: by encoding sequences of source letters, lossless coding exists as long as R \ge H(X). A lossless code does not exist for any R < H(X).

Variable-Length Code Words

When the source symbols are not equally probable, a more efficient encoding method uses variable-length code words, where the probability of occurrence of each source letter guides the selection of the code words. This is called entropy coding.

Example:

    Letter  P(a_k)  Code I  Code II  Code III
    a_1     1/2     1       0        0
    a_2     1/4     00      10       01
    a_3     1/8     01      110      011
    a_4     1/8     10      111      111
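To make the table concrete, a short Python check (taking Code II to be the prefix code 0, 10, 110, 111, consistent with the decodability claims on the next slide) shows that with these probabilities the average length of Code II equals the entropy exactly:

```python
import math

probs   = [1/2, 1/4, 1/8, 1/8]   # P(a_1) ... P(a_4)
lengths = [1, 2, 3, 3]           # codeword lengths of Code II: 0, 10, 110, 111

R_bar = sum(n * p for n, p in zip(lengths, probs))  # average bits per letter
H = -sum(p * math.log2(p) for p in probs)           # source entropy

print(R_bar, H)  # 1.75 1.75 -- this code achieves 100% efficiency
```

This happens because every probability here is a negative power of 2, so codeword lengths can match the self-information of each letter exactly.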
Variable-Length Code Words

Code II is uniquely decodable and instantaneously decodable (it satisfies the prefix condition).

[Figure: code tree for Code II, with leaves a_1, a_2, a_3, a_4]

Code III is uniquely decodable but not instantaneously decodable.

[Figure: code tree for Code III]

The goal is a procedure for constructing uniquely decodable variable-length codes that are efficient in the sense of minimizing the average number of bits per source letter,

    \bar{R} = \sum_{k=1}^{L} n_k P(a_k)

Kraft Inequality

The codeword lengths n_1 \le n_2 \le ... \le n_L of a uniquely decodable code for a discrete variable X must satisfy the Kraft inequality

    \sum_{k=1}^{L} 2^{-n_k} \le 1

Conversely, given any set of codeword lengths satisfying this inequality, a uniquely decodable (prefix) code with those lengths exists.
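The Kraft condition is a one-line computation; here is a small sketch (the function name is ours) applied to one admissible and one inadmissible set of lengths:

```python
def kraft_sum(lengths):
    """sum_k 2^(-n_k); a uniquely decodable binary code requires this <= 1."""
    return sum(2.0 ** -n for n in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> admissible (Code II's lengths)
print(kraft_sum([1, 1, 2]))     # 1.25 -> no uniquely decodable code exists
```

A sum strictly below 1 means the code is admissible but wastes rate; equality, as for Code II, means the code tree is fully used.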
Source Coding Theorem

Let X be a DMS random variable with finite entropy H(X), output letters x_k, 1 \le k \le L, and corresponding probabilities p_k, 1 \le k \le L. It is possible to construct a code that satisfies the prefix condition and has average length \bar{R} bounded by

    H(X) \le \bar{R} < H(X) + 1

Huffman Coding

Huffman coding is a variable-length encoding algorithm. It is optimal in the sense that it minimizes the average number of binary digits per symbol. It is based on the source letter probabilities P(x_i), i = 1, 2, ..., L.

Example:

    Letter  Probability  Self-Information  Code
    x_1     0.35         1.5146            00
    x_2     0.30         1.7370            01
    x_3     0.20         2.3219            10
    x_4     0.10         3.3219            110
    x_5     0.04         4.6439            1110
    x_6     0.005        7.6439            11110
    x_7     0.005        7.6439            11111

    H(X) = 2.11 bits/letter,  \bar{R} = 2.21 bits/letter
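The figures in the example can be reproduced by running the Huffman procedure itself. Below is a compact Python sketch (our own implementation, tracking only codeword lengths, which is all that \bar{R} requires; different tie-breaking can yield a different but equally optimal set of lengths):

```python
import heapq
import math

def huffman_lengths(probs):
    """Return Huffman codeword lengths by repeatedly merging the two
    least-probable nodes; every symbol in a merged subtree gains one bit."""
    # heap entries: (probability, unique tiebreak, list of symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    count = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, count, s1 + s2))
        count += 1
    return lengths

probs = [0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005]
n = huffman_lengths(probs)
R_bar = sum(l * p for l, p in zip(n, probs))
H = -sum(p * math.log2(p) for p in probs)
print(round(R_bar, 2), round(H, 2), round(H / R_bar, 2))  # 2.21 2.11 0.95
```

Any Huffman code for these probabilities achieves \bar{R} = 2.21, even when tie-breaking changes the individual codeword lengths.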
Huffman Coding

The efficiency of this code is H(X)/\bar{R} = 2.11/2.21 = 0.95.

[Figure: an example of variable-length source encoding for a DMS -- the Huffman code tree formed by repeatedly merging the two least probable nodes]

An alternative Huffman code for the same DMS also gives \bar{R} = 2.21 bits/letter and efficiency 0.95:

    Letter  Code
    x_1     0
    x_2     10
    x_3     110
    x_4     1110
    x_5     11110
    x_6     111110
    x_7     111111
The Lempel-Ziv Algorithm

Huffman coding minimizes the average code length and satisfies the prefix condition, but designing a Huffman code requires knowing the probabilities of occurrence of all the source letters. In practice, the statistics of a source output are often unknown, so the Huffman coding method is generally impractical.

The Lempel-Ziv source coding algorithm is designed to be independent of the source statistics:
A given string of source symbols is parsed into variable-length blocks, called phrases.
The phrases are listed in a dictionary.
Each new phrase is the shortest phrase that has not appeared before.
It does not work well for short strings.
It is often used in practice (the compress and uncompress utilities, LZ77, ZIP).

Example: Consider a binary sequence of 0s and 1s, parsed into the 16 phrases listed in the table on the next slide. To code each phrase:
We code the prefix by its position number in the dictionary, using the same 0s and 1s that occur as characters in the string.
We need the coded strings to have a fixed length. Since we have 16 strings, the position number requires 4 bits.
Starting from the first non-empty string (see Position Number in the table below), we also identify its prefix (the piece of the string before its last digit) and the position number of that prefix.
The coded string is then constructed by taking the position number of the prefix, followed by the last bit of the string being considered.
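The parsing rule just described can be sketched in a few lines of Python (our own code, applied here to a short made-up input rather than the slide's sequence); each phrase is emitted as (position number of its prefix, last bit):

```python
def lz78_parse(bits):
    """Parse a binary string into Lempel-Ziv phrases: each new phrase is the
    shortest prefix of the remaining input not already in the dictionary."""
    dictionary = {"": 0}      # the empty phrase occupies position 0
    phrases = []
    current = ""
    for b in bits:
        current += b
        if current not in dictionary:
            dictionary[current] = len(dictionary)
            # codeword = (position number of the prefix, last bit)
            phrases.append((dictionary[current[:-1]], current[-1]))
            current = ""
    return phrases

# hypothetical short input, not the sequence from the slides
print(lz78_parse("0010111010"))
# [(0, '0'), (1, '1'), (2, '1'), (0, '1'), (2, '0')]
```

Here the phrases are 0, 01, 011, 1, 010; with 5 phrases, a fixed-length encoding would use 3 bits for the prefix position plus 1 bit for the new character.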
[Table: the Lempel-Ziv dictionary for the example -- for each parsed phrase: the string, its position number, its dictionary location, its prefix, the position number of the prefix, and the coded string (the 4-bit position number of the prefix followed by the last bit of the phrase).]

Coding for Analog Sources

The source is a band-limited stochastic process signal X(t). Sampling X(t) at the Nyquist rate converts X(t) to a discrete-time sequence. Then we can quantize and encode the discrete-time sequence.

A simple encoding is to represent each discrete amplitude level by a sequence of binary digits. With L levels, we need R = \log_2 L bits if L is a power of 2, and R = \lfloor \log_2 L \rfloor + 1 bits if L is not a power of 2.

If the levels are not equally probable, and the probabilities of the output levels are known, we can use Huffman coding to improve the efficiency.
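The sample-then-quantize-then-encode chain can be sketched as follows (a minimal illustration with a uniform quantizer of our own devising; the slides do not specify a particular quantizer here):

```python
import math

def uniform_quantize(x, levels, lo, hi):
    """Map sample x to the nearest of `levels` uniformly spaced output
    values on [lo, hi]; return (level index, quantized value)."""
    step = (hi - lo) / levels
    idx = min(levels - 1, max(0, int((x - lo) / step)))
    return idx, lo + (idx + 0.5) * step

# L = 8 levels -> R = log2(8) = 3 bits per sample
levels = 8
R = math.ceil(math.log2(levels))
idx, xq = uniform_quantize(0.3, levels, -1.0, 1.0)
print(R, idx, xq)  # 3 5 0.375
```

Each index is then transmitted as an R-bit binary word; Huffman coding of the indices helps only when the levels are not equally probable.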
Rate-Distortion Function

Quantization of the amplitude of a sampled signal is a kind of data compression; it introduces some distortion of the waveform. The idea is to achieve minimum distortion.

Let's define the distortion as some measure of the difference between the actual source sample values \{x_k\} and the corresponding quantized values \{\tilde{x}_k\}, denoted d(x_k, \tilde{x}_k). A commonly used distortion function is the squared-error distortion

    d(x_k, \tilde{x}_k) = (x_k - \tilde{x}_k)^2

The average distortion between a sequence of n samples X_n and the n quantized output samples \tilde{X}_n is

    d(X_n, \tilde{X}_n) = \frac{1}{n} \sum_{k=1}^{n} d(x_k, \tilde{x}_k)

The source output is a random process, so X_n and therefore d(X_n, \tilde{X}_n) are random. The expected value of the distortion is

    D = E[d(X_n, \tilde{X}_n)] = \frac{1}{n} \sum_{k=1}^{n} E[d(x_k, \tilde{x}_k)] = E[d(x, \tilde{x})]

where the last equality holds for a stationary source.
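The average squared-error distortion over a block of samples is a direct translation of the formula above (function name and sample values are ours, for illustration):

```python
def avg_squared_error(x, xq):
    """d(X_n, X~_n) = (1/n) * sum_k (x_k - x~_k)^2 over n samples."""
    n = len(x)
    return sum((a - b) ** 2 for a, b in zip(x, xq)) / n

# three samples and their quantized values
d = avg_squared_error([0.1, 0.5, -0.3], [0.0, 0.5, -0.25])
print(d)
```

Averaging this quantity over many independent blocks estimates the expected distortion D.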
Rate-Distortion Function

Let a memoryless continuous source have output x \in X with PDF p(x), quantized output \tilde{x} \in \tilde{X}, and distortion per sample d(x, \tilde{x}).

The minimum rate in bits per sample required to represent the output X of the memoryless source with distortion less than or equal to D is called the rate-distortion function R(D):

    R(D) = \min_{p(\tilde{x}|x) : E[d(X, \tilde{X})] \le D} I(X; \tilde{X})

where I(X; \tilde{X}) is the mutual information between X and \tilde{X}. R(D) decreases as the allowed distortion D increases.

Rate-Distortion Function for a Memoryless Gaussian Source

The minimum information rate necessary to represent the output of a discrete-time, continuous-amplitude memoryless Gaussian source, based on a mean-square-error distortion measure per symbol, is given by (Shannon, 1959)

    R_g(D) = \begin{cases} \frac{1}{2} \log_2(\sigma_x^2 / D) & 0 \le D \le \sigma_x^2 \\ 0 & D > \sigma_x^2 \end{cases}

We can represent D in terms of R as

    D_g(R) = 2^{-2R} \sigma_x^2

which is called the distortion-rate function. In dB,

    10 \log_{10} D_g(R) = -6R + 10 \log_{10} \sigma_x^2

so the distortion drops by about 6 dB for each additional bit per sample.
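The Gaussian rate-distortion pair is easy to evaluate numerically (function names are ours):

```python
import math

def rate_distortion_gaussian(D, var):
    """R_g(D) = 0.5 * log2(var / D) for 0 < D <= var, else 0 (bits/sample)."""
    return 0.5 * math.log2(var / D) if D < var else 0.0

def distortion_rate_gaussian(R, var):
    """Inverse relation: D_g(R) = 2^(-2R) * var."""
    return 2 ** (-2 * R) * var

var = 1.0  # unit-variance source
print(rate_distortion_gaussian(0.25, var))  # 1.0 bit/sample
print(distortion_rate_gaussian(1.0, var))   # 0.25
# ~6 dB improvement per extra bit:
print(10 * math.log10(distortion_rate_gaussian(2.0, var)))
```

Note that R_g(D) = 0 for D >= sigma_x^2: sending nothing and reconstructing with the mean already achieves distortion sigma_x^2.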
Scalar Quantization

If we know the PDF of the source signal amplitude, the quantizer can be optimized. We design the optimum scalar quantizer to minimize some function of the quantization error q = \tilde{x} - x. The distortion can be written

    D = \int_{-\infty}^{\infty} f(\tilde{x} - x) \, p(x) \, dx

where f(\tilde{x} - x) is the desired function of the error. The optimum quantizer minimizes D by optimally selecting the output levels and the corresponding input range of each output level.

We can treat the quantized source values \tilde{X} = \{\tilde{x}_k, 1 \le k \le L\} as letters with probabilities \{p_k\}, since the amplitudes are discrete. If successive signal amplitudes are statistically independent, the entropy of the quantizer output is

    H(\tilde{X}) = -\sum_{k=1}^{L} p_k \log_2 p_k
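The distortion integral above can be approximated numerically for any given quantizer. The sketch below (our own code; the boundaries and output levels are a hypothetical uniform 4-level quantizer, not the optimum one) uses the squared-error choice f(q) = q^2 and a midpoint-rule integration over a standard Gaussian PDF:

```python
import math

def quantizer_mse(boundaries, outputs, pdf, grid=20000):
    """Approximate D = integral of (x - x~)^2 p(x) dx for a scalar quantizer
    with decision boundaries b_0 < ... < b_L and output levels x~_1..x~_L."""
    lo, hi = boundaries[0], boundaries[-1]
    dx = (hi - lo) / grid
    D = 0.0
    for i in range(grid):
        x = lo + (i + 0.5) * dx
        for b, xq in zip(boundaries[1:], outputs):
            if x <= b:                       # x falls in this cell
                D += (x - xq) ** 2 * pdf(x) * dx
                break
    return D

gauss = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
# hypothetical uniform 4-level quantizer on [-4, 4]
b = [-4.0, -2.0, 0.0, 2.0, 4.0]
out = [-3.0, -1.0, 1.0, 3.0]
print(quantizer_mse(b, out, gauss))
```

Searching over boundaries and output levels to minimize this quantity is exactly the optimum (Lloyd-Max) quantizer design problem.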
Scalar Quantization

Example: a 4-level nonuniform quantizer for a Gaussian-distributed signal.

Probabilities: p_1 = p_4 = 0.1635 for the outer levels, p_2 = p_3 = 0.3365 for the inner levels.

The entropy of the discrete source of quantizer outputs is

    H(\tilde{X}) = -\sum_{k=1}^{4} p_k \log_2 p_k = 1.911 bits/sample

With entropy coding, the rate can be reduced to 1.911 bits/sample, and the distortion is about -9.30 dB.

Vector Quantization

Joint quantization of a block of signal samples, or of a block of signal parameters, is called block or vector quantization. It is used in speech coding for digital cellular phones. Its performance, in the rate-distortion sense, is better than that of scalar quantization.

Formulation of vector quantization: let X = [x_1, x_2, ..., x_n] be an n-dimensional vector with real-valued, continuous-amplitude components \{x_k, 1 \le k \le n\} and joint PDF p(x_1, x_2, ..., x_n). Let \tilde{X} be the quantized value of X, an n-dimensional vector with components \{\tilde{x}_k, 1 \le k \le n\}.
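The entropy figure in the example is a one-line computation (taking the outer-level probability to be 0.1635 so that the four probabilities sum to one):

```python
import math

# output-level probabilities of the 4-level quantizer from the example
p = [0.1635, 0.3365, 0.3365, 0.1635]

H = -sum(pk * math.log2(pk) for pk in p)
print(round(H, 2))  # 1.91 bits/sample
```

Note that H is below the fixed-rate value log2(4) = 2 bits/sample precisely because the four levels are not equally probable.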
Vector Quantization

The quantization takes the form \tilde{X} = Q(X). Vector quantization classifies blocks of data into a discrete number of categories, or cells, chosen to minimize the error distortion.

[Figure: quantization of a two-dimensional vector X = [x_1, x_2]. The two-dimensional space is partitioned into cells, here hexagonal-shaped cells \{C_k\}. All input vectors that fall in cell C_k are quantized into the vector \tilde{X}_k, shown as the center of the hexagon.]

For the quantization of the n-dimensional vector X into the vector \tilde{X}_k, the quantization error (distortion) is d(X, \tilde{X}_k). The average distortion over the set of input vectors X is

    D = \sum_{k=1}^{L} P(X \in C_k) \, E[d(X, \tilde{X}_k) \mid X \in C_k]
      = \sum_{k=1}^{L} \int_{X \in C_k} d(X, \tilde{X}_k) \, p(X) \, dX

where P(X \in C_k) is the probability that the vector X falls in cell C_k and p(X) is the joint PDF of the n random variables. To minimize D, we select the cells \{C_k, 1 \le k \le L\} optimally for the given PDF p(X).
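For a squared-error measure, the optimal cell assignment is the nearest-neighbor rule: quantize X to the closest codebook vector. A minimal sketch (our own code, with a hypothetical four-vector 2-D codebook) is:

```python
def quantize_vector(x, codebook):
    """Nearest-neighbor rule: return the index k of the codebook entry
    X~_k whose cell C_k contains x (minimum squared Euclidean distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: dist2(x, codebook[k]))

# hypothetical 2-D codebook defining four cells
codebook = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
print(quantize_vector((0.3, -0.7), codebook))  # 2 -> quantized to (1.0, -1.0)
```

With this rule the cells are the Voronoi regions of the codebook vectors, which for a well-designed 2-D codebook take the hexagonal-like shapes shown in the figure.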
Vector Quantization

The most common distortion measure for vector quantization is the mean square error

    d(X, \tilde{X}) = \frac{1}{n} (X - \tilde{X})'(X - \tilde{X}) = \frac{1}{n} \sum_{k=1}^{n} (x_k - \tilde{x}_k)^2

The vectors can be transmitted at an average bit rate of

    R = \frac{H(\tilde{X})}{n}  bits per sample

where H(\tilde{X}) is the entropy of the quantized source output,

    H(\tilde{X}) = -\sum_{i=1}^{L} p(\tilde{X}_i) \log_2 p(\tilde{X}_i)

The minimum distortion at rate R is the distortion-rate function

    D_n(R) = \min_{Q(X)} E[d(X, \tilde{X})]
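The rate formula R = H(\tilde{X})/n is straightforward to evaluate once the codebook-vector probabilities are known (function name and probabilities below are ours, for illustration):

```python
import math

def vq_rate(level_probs, n):
    """Average bit rate R = H(X~)/n when the codebook vectors occur with the
    given probabilities and each vector covers n source samples."""
    H = -sum(p * math.log2(p) for p in level_probs if p > 0)
    return H / n

# four equiprobable 2-D codebook vectors: H = 2 bits, so R = 1 bit/sample
print(vq_rate([0.25, 0.25, 0.25, 0.25], n=2))  # 1.0
```

This is the sense in which vector quantization amortizes the codebook entropy over the n samples in each block.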