Digital Communication Systems ECS 452

1 Digital Communication Systems ECS 452. Asst. Prof. Dr. Prapun Suksompong. 2. Source Coding. Office Hours: BKD, 6th floor of Sirindhralai building; Tuesday 14:20-15:20, Wednesday 14:20-15:20, Friday 9:15-10:15

2 Elements of a digital communication system. Information Source (Message) → Source Encoder (remove redundancy) → Channel Encoder (add systematic redundancy) → Digital Modulator → Channel (Transmitted Signal → Received Signal, corrupted by Noise & Interference) → Digital Demodulator → Channel Decoder → Source Decoder → Destination (Recovered Message). The source encoder, channel encoder, and digital modulator make up the Transmitter; the digital demodulator, channel decoder, and source decoder make up the Receiver.

3 Main Reference: Elements of Information Theory, 2nd Edition, 2006; Chapters 2, 4 and 5. "The jewel in Stanford's crown"; "one of the greatest information theorists since Claude Shannon (and the one most like Shannon in approach, clarity, and taste)."

4 The ASCII Coded Character Set (American Standard Code for Information Interchange) [The ARRL Handbook for Radio Communications 2013]

5 Introduction to Data Compression [ ]

6 English Redundancy: Ex. 1 J-st tr- t- r--d th-s s-nt-nc-. 6

7 English Redundancy: Ex. 2 yxx cxn xndxrstxnd whxt x xm wrxtxng xvxn xf x rxplxcx xll thx vxwxls wxth xn 'x' (t gts lttl hrdr f y dn't vn kn whr th vwls r). 7

8 English Redundancy: Ex. 3 To be, or xxx xx xx, xxxx xx xxx xxxxxxxx 8

9 Entropy Rate of Thai Text

10 Ex. Source alphabet of size = 4

11 Ex. DMS (1). Source alphabet S_X = {a, b, c, d, e}; pmf p_X(x) = 1/5 for x in {a, b, c, d, e}, and 0 otherwise. Sample output of the information source: a c a c e c d b c e d a e e d a b b b d b b a a b e b e d c c e d b c e c a a c a a e a c c a a d c d e e a a c a a a b b c a e b b e d b c d e b c a e e d d c d a b c a b c d d e d c e a b a a c a d. Approximately 20% are the letter a. [GenRV_Discrete_datasample_Ex1.m]

12 Ex. DMS (1)
clear all; close all;
S_X = 'abcde'; p_X = [1/5 1/5 1/5 1/5 1/5];
n = 100;
MessageSequence = datasample(S_X,n,'Weights',p_X)
MessageSequence = reshape(MessageSequence,10,10)
>> GenRV_Discrete_datasample_Ex1
MessageSequence = eebbedddeceacdbcbedeecacaecedcaedabecccabbcccebdbbbeccbadeaaaecceccdaccedadabceddaceadacdaededcdcade
MessageSequence =
eeeabbacde
eacebeeead
bcadcccdce
bdcacccaed
ebabcbedac
dceeeacadd
dbccbdcbac
deecdedcca
eddcbaaedd
cecabacdae
[GenRV_Discrete_datasample_Ex1.m]

13 Ex. DMS (2). Source alphabet S_X = {1, 2, 3, 4}; pmf p_X(x) = 1/2 for x = 1; 1/4 for x = 2; 1/8 for x = 3, 4; and 0 otherwise. Approximately 50% are the number 1. [GenRV_Discrete_datasample_Ex2.m]

14 Ex. DMS (2)
clear all; close all;
S_X = [1 2 3 4]; p_X = [1/2 1/4 1/8 1/8];
n = 20;
MessageSequence = randsrc(1,n,[S_X;p_X]);
%MessageSequence = datasample(S_X,n,'Weights',p_X);
rf = hist(MessageSequence,S_X)/n;       % Rel. freq. calc.
stem(S_X,rf,'rx','LineWidth',2)         % Plot rel. freq.
hold on
stem(S_X,p_X,'bo','LineWidth',2)        % Plot pmf
xlim([min(S_X)-1,max(S_X)+1])
legend('Rel. freq. from sim.','pmf p_X(x)')
xlabel('x')
grid on
[GenRV_Discrete_datasample_Ex2.m]

15 DMS in MATLAB
clear all; close all;
S_X = [1 2 3 4]; p_X = [1/2 1/4 1/8 1/8];
n = 1e6;
SourceString = randsrc(1,n,[S_X;p_X]);
% Alternatively, we can also use
% SourceString = datasample(S_X,n,'Weights',p_X);
rf = hist(SourceString,S_X)/n;          % Rel. freq. calc.
stem(S_X,rf,'rx','LineWidth',2)         % Plot rel. freq.
hold on
stem(S_X,p_X,'bo','LineWidth',2)        % Plot pmf
xlim([min(S_X)-1,max(S_X)+1])
legend('Rel. freq. from sim.','pmf p_X(x)')
xlabel('x')
grid on
[GenRV_Discrete_datasample_Ex.m]

16 A more realistic example of a pmf: relative frequencies of letters in the English language.

17 A more realistic example of a pmf: relative frequencies of letters in the English language, ordered by frequency.

18 Example: ASCII Encoder. Codebook: Character x → Codeword c(x): E → 1000101, L → 1001100, O → 1001111, V → 1010110.
MATLAB:
>> M = 'LOVE';
>> X = dec2bin(M,7);
>> X = reshape(X',1,numel(X))
X = 1001100100111110101101000101
Remark: numel(A) = prod(size(A)) (the number of elements in matrix A).
Information Source → LOVE → Source Encoder → c(L) c(O) c(V) c(E)
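
The encoding above can also be inverted. Below is a minimal sketch (not from the slides; it assumes standard 7-bit ASCII) that regroups the serialized bit string back into 7-bit codewords and recovers the original text:
% Minimal sketch: invert the ASCII encoder above (assumes 7-bit ASCII).
M = 'LOVE';
X = dec2bin(M,7);                    % 4x7 char array of ASCII codewords
bitString = reshape(X',1,numel(X));  % serialized bit string, as on the slide
B = reshape(bitString,7,[])';        % regroup the bits into 7-bit codewords
recovered = char(bin2dec(B))'        % gives back 'LOVE'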

19 The ASCII Coded Character Set [The ARRL Handbook for Radio Communications 2013]

20 A Byte (8 bits) vs. 7 bits
>> dec2bin('I Love ECS452',7)
ans =
>> dec2bin('I Love ECS452',8)
ans =

21 Geeky ways to express your love
>> dec2bin('I Love You',8)
ans =
>> dec2bin('i love you',8)
ans =

22 Morse code (wired and wireless). Telegraph network; Samuel Morse, 1838. A sequence of on-off tones (or lights, or clicks).

23 Example

24 Example

25 Morse code: Key Idea. Frequently-used characters are mapped to short codewords. Relative frequencies of letters in the English language.

26 Morse code: Key Idea. Frequently-used characters (e, t) are mapped to short codewords. Relative frequencies of letters in the English language.

27 Morse code: Key Idea Frequently-used characters (e,t) are mapped to short codewords. Basic form of compression. 27
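
As a small illustration of this idea (my own toy example, not from the slides), the sketch below encodes a word using a handful of Morse codewords; the frequent letters e and t get the shortest codes:
% Toy illustration: frequent letters get short Morse codewords.
% Only the letters needed for this example are included in the map.
morse = containers.Map({'e','t','a','n','o','s'}, ...
                       {'.','-','.-','-.','---','...'});
word = 'notes';
code = '';
for k = 1:length(word)
    code = [code morse(word(k)) ' '];   % look up and append each codeword
end
disp(code)                              % -. --- - . ...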

28 Morse code for the Thai language (รหัสมอร์สภาษาไทย)

29 Example: ASCII Encoder. Character → Codeword: E → 1000101, L → 1001100, O → 1001111, V → 1010110.
MATLAB:
>> M = 'LOVE';
>> X = dec2bin(M,7);
>> X = reshape(X',1,numel(X))
Information Source → LOVE → Source Encoder

30 Another example of a non-UD code.
x → c(x): A → 1, B → 011, C → 01110, D → 1110, E → 10011.
Consider the string 011101110011. It can be interpreted as
CDB: 01110 1110 011
BABE: 011 1 011 10011
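
A quick way to see the ambiguity (a minimal sketch, assuming the codewords listed above) is to build both parses and compare the resulting bit strings:
% Check that the parses C D B and B A B E produce the same bit string,
% so the code cannot be uniquely decodable.
c = containers.Map({'A','B','C','D','E'}, {'1','011','01110','1110','10011'});
s1 = [c('C') c('D') c('B')];          % parse as C D B
s2 = [c('B') c('A') c('B') c('E')];   % parse as B A B E
isequal(s1,s2)                        % true: both equal 011101110011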

31 Game: 20 Questions 20 Questions is a classic game that has been played since the 19th century. One person thinks of something (an object, a person, an animal, etc.) The others playing can ask 20 questions in an effort to guess what it is. 31

32 20 Questions: Example

33 Shannon-Fano coding. Prof. Robert Fano ( ), Shannon Award (1976). Proposed in Shannon's "A Mathematical Theory of Communication" in 1948. The method was attributed to Fano, who later published it as a technical report: Fano, R.M. (1949). The transmission of information. Technical Report No. 65. Cambridge (Mass.), USA: Research Laboratory of Electronics at MIT. Should not be confused with Shannon coding, the coding method used to prove Shannon's noiseless coding theorem, or with Shannon-Fano-Elias coding (also known as Elias coding), the precursor to arithmetic coding.

34 Huffman Code. David Huffman ( ), Hamming Medal (1999). MIT, 1951: information theory class taught by Professor Fano. Huffman and his classmates were given the choice of a term paper on the problem of finding the most efficient binary code, or a final exam. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient. Huffman avoided the major flaw of the suboptimal Shannon-Fano coding by building the tree from the bottom up instead of from the top down.

35 Huffman's paper (1952). [D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, Sept. 1952] [ ]

36 Huffman coding [ ]

37 Ex. Huffman Coding in MATLAB [Ex. 2.31]. Observe that MATLAB automatically gives the expected length of the codewords.
pX = [ ];                        % pmf of X
SX = [1:length(pX)];             % source alphabet
[dict,EL] = huffmandict(SX,pX);  % create codebook
%% Pretty print the codebook.
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook
%% Try to encode some random source string
n = 10;                                        % number of source symbols to be generated
sourceString = randsrc(1,n,[SX; pX])           % create data using pX
encodedString = huffmanenco(sourceString,dict) % encode the data
[Huffman_Demo_Ex1]

38 Ex. Huffman Coding in MATLAB
codebook =
    [1]    '0'
    [2]    '1 0'
    [3]    '1 1 1'
    [4]    '1 1 0'
sourceString =
encodedString =
[Huffman_Demo_Ex1]

39 Ex. Huffman Coding in MATLAB [Ex. 2.32]
pX = [ ];                        % pmf of X
SX = [1:length(pX)];             % source alphabet
[dict,EL] = huffmandict(SX,pX);  % create codebook
%% Pretty print the codebook.
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook
EL
The codewords can be different from our answers found earlier; the expected length is the same.
>> Huffman_Demo_Ex2
codebook =
    [1]    '1'
    [2]    '0 1'
    [3]    ' '
    [4]    '0 0 1'
    [5]    ' '
    [6]    ' '
EL =
[Huffman_Demo_Ex2]

40 Ex. Huffman Coding in MATLAB [Exercise]
pX = [1/8, 5/24, 7/24, 3/8];     % pmf of X
SX = [1:length(pX)];             % source alphabet
[dict,EL] = huffmandict(SX,pX);  % create codebook
%% Pretty print the codebook.
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook
EL
>> -pX*(log2(pX)).'
ans = 1.8956
codebook =
    [1]    '0 0 1'
    [2]    '0 0 0'
    [3]    '0 1'
    [4]    '1'
EL = 1.9583

41 Let's talk about a TV series on HBO

42 Silicon Valley (TV series). The show focuses on six young men who found a startup company in Silicon Valley. In the first season, the company develops a revolutionary data compression algorithm: the middle-out algorithm.

43 Behind the Scenes. When Mike Judge set out to write Silicon Valley, he wanted to conceive a simple, believable widget for his characters to invent. He teamed with Stanford electrical engineering professor Tsachy Weissman and a PhD student, Vinith Misra. They came up with a fictional breakthrough compression algorithm. "We had to come up with an approach that isn't possible today, but it isn't immediately obvious that it isn't possible," says Misra. The writers also coined a metric, the Weissman Score, for characters to use when comparing compression codes. [May 2014 issue of Popular Science]

44 Middle-Out Algorithm: Into the Real World. Something like it can be found in Lepton, a new lossless image compressor created by Dropbox. Lepton reduces the file size of JPEG-encoded images by as much as 22 percent, yet without losing a single bit of the original. How is this possible? Middle-out. Well, it's much more complicated than that, actually; the middle-out part comes toward the end of decompression. Lepton is open source, and Dropbox has put the code for it on GitHub. [ ]

45 Weissman Score: Into the Real World. It's hard to convey to a lay audience that one compression algorithm is better than another. Created by Misra (Prof. Weissman's PhD student) for the show. Metrics for compression algorithms that rate not only the amount of compression but also the processing speed are hard to find.

46 Summary. A good code must be uniquely decodable (UD); this is difficult to check directly. A special family of codes called prefix(-free) codes is always UD; such codes are also instantaneous. Huffman's recipe: repeatedly combine the two least-likely (combined) symbols; this automatically gives a prefix code. For a given source's pmf, Huffman codes are optimal among all UD codes for that source.
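
The recipe in the summary can be written out directly. The sketch below is illustrative only (huffmandict does this for real); the pmf is taken from the earlier exercise, and the bit labels may differ from MATLAB's.
% Minimal sketch of Huffman's recipe: repeatedly combine the two
% least-likely (combined) symbols, prepending one code bit each time.
p = [1/8 5/24 7/24 3/8];             % example pmf (from an earlier slide)
code   = repmat({''}, 1, length(p)); % codeword built up for each symbol
groups = num2cell(1:length(p));      % symbols currently merged into each group
q = p;                               % probability of each group
while length(q) > 1
    [q, idx] = sort(q);              % two least-likely groups come first
    groups = groups(idx);
    for s = groups{1}, code{s} = ['0' code{s}]; end   % bit 0 for one group
    for s = groups{2}, code{s} = ['1' code{s}]; end   % bit 1 for the other
    groups = [{[groups{1} groups{2}]} groups(3:end)]; % merge the two groups
    q = [q(1)+q(2) q(3:end)];
end
code                                   % codeword lengths 3,3,2,1 for this pmf
EL = sum(p .* cellfun(@length, code))  % expected length = 47/24, about 1.96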

48 Entropy and Description of RV [ ]

49 Entropy and Description of RV [ ]

50 Kronecker Product. An operation on two matrices of arbitrary size, named after the German mathematician Leopold Kronecker. If A is an m-by-n matrix and B is a p-by-q matrix, then the Kronecker product A ⊗ B is the mp-by-nq block matrix
A ⊗ B = [ a_11*B ... a_1n*B ; ... ; a_m1*B ... a_mn*B ]
Use kron(A,B) in MATLAB.

51 Kronecker Product
>> p = [ ]
p =
>> p2 = kron(p,p)
p2 =
>> p3 = kron(p2,p)
p3 =
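
Since the numbers on this slide are not reproduced above, here is the same computation as a sketch with an assumed pmf (the one from Ex. DMS (2)):
% Kronecker product of a pmf with itself gives the joint pmf of
% independent repetitions (p assumed from Ex. DMS (2)).
p  = [1/2 1/4 1/8 1/8];
p2 = kron(p,p);      % pmf of a block of two i.i.d. symbols (16 values)
p3 = kron(p2,p);     % pmf of a block of three i.i.d. symbols (64 values)
[sum(p2) sum(p3)]    % both sums equal 1, as required of a pmf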

52 [Ex. 2.40] Huffman Coding: Source Extension. X_k i.i.d. Bernoulli(p). L/n = expected codeword length per source symbol, where n is the order of the extension.

53 [Ex. 2.40] Huffman Coding: Source Extension. X_k i.i.d. Bernoulli(p). Plot of L/n versus the order of source extension n: as n increases, L/n decreases toward the entropy H(X).
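
A sketch of this experiment (with an assumed value of p, since the slide's parameter is not reproduced here): build the pmf of the n-th extension with kron, run huffmandict on it, and compare L/n with H(X).
% Source extension experiment; illustrative sketch with an assumed p.
p  = 0.1;                          % assumed Bernoulli parameter
pX = [1-p p];                      % pmf of one source symbol
H  = -pX*log2(pX).';               % entropy H(X) in bits per symbol
for n = 1:8                        % order of the extension
    pXn = 1;
    for k = 1:n
        pXn = kron(pXn, pX);       % pmf of a block of n i.i.d. symbols
    end
    [dict, L] = huffmandict(1:length(pXn), pXn);   % Huffman code for blocks
    fprintf('n = %d:  L/n = %.4f   H(X) = %.4f\n', n, L/n, H);
end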

54 Final Remarks about the Huffman Code. Huffman coding compresses an i.i.d. source with a known pmf p_X(x) to its entropy limit H(X). It is sensitive to the assumed distribution: if the code is designed for some incorrect distribution, a penalty is incurred. What compression can be achieved if the true pmf p_X(x) is unknown? One may assume a uniform pmf, but this is inefficient if the actual pmf is not uniform. An alternative is the Lempel-Ziv algorithm.
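
To see the penalty numerically, here is a small sketch (my own example, not from the slides) that designs a Huffman code for a uniform pmf and then evaluates its expected length under a non-uniform true pmf:
% Penalty for designing the code with an incorrect (uniform) pmf.
pTrue = [1/2 1/4 1/8 1/8];             % actual source pmf
q     = [1/4 1/4 1/4 1/4];             % assumed (uniform) design pmf
dict  = huffmandict(1:4, q);           % code designed for q: all lengths 2
len   = cellfun(@length, dict(:,2));   % codeword lengths (column vector)
ELmismatched = pTrue*len               % 2 bits/symbol under the true pmf
H = -pTrue*log2(pTrue).'               % entropy limit: 1.75 bits/symbol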

55 Lempel-Ziv algorithm. Often used in practice to compress data that cannot be modeled simply. Could also be described as an adaptive dictionary compression algorithm. Ziv and Lempel wrote their papers in 1977 and 1978; the two papers describe two distinct versions of the algorithm: LZ77 (sliding-window Lempel-Ziv) and LZ78 (tree-structured Lempel-Ziv).
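
To make the tree-structured (LZ78) idea concrete, here is a toy sketch (not the full algorithm; the input string is made up): the input is split into phrases, each one a previously seen phrase extended by one new character. A real coder would also output the (dictionary index, new character) pairs.
% Toy LZ78-style parsing: each phrase = a previously seen phrase + 1 char.
s = 'abababbabaabab';     % made-up input string
phrasebook = {};          % phrases seen so far
phrases = {};             % the resulting parsing of s
k = 1;
while k <= length(s)
    phrase = '';
    % grow the phrase while it is still one we have already seen
    while k <= length(s) && any(strcmp([phrase s(k)], phrasebook))
        phrase = [phrase s(k)];  k = k + 1;
    end
    if k <= length(s)
        phrase = [phrase s(k)];  k = k + 1;   % append one new character
    end
    phrasebook{end+1} = phrase;
    phrases{end+1} = phrase;
end
phrases                   % {'a','b','ab','abb','aba','abab'} for this input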

56 Arithmetic Coding. The Huffman coding procedure is optimal for encoding a random variable with a known pmf that has to be encoded symbol by symbol. Coding inefficiency of the Huffman code: the codeword lengths are restricted to be integer-valued, so there can be a loss of up to 1 bit per symbol in coding efficiency. We could alleviate this loss by encoding blocks of source symbols, but the complexity of this approach increases exponentially with the block length n. In arithmetic coding, instead of using a sequence of bits to represent a symbol, we represent it by a subinterval of the unit interval.
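
The subinterval idea can be shown in a few lines. The sketch below is illustrative only (a real arithmetic coder also handles renormalization, finite precision, and termination; the pmf and the sequence are assumptions for the example): the interval [0,1) is narrowed once per source symbol, the final width equals the probability of the whole sequence, and roughly -log2(width) bits suffice to describe it.
% Interval narrowing in arithmetic coding (illustrative sketch).
pX  = [1/2 1/4 1/8 1/8];       % assumed pmf of symbols 1..4
cdf = [0 cumsum(pX)];          % cumulative boundaries on [0,1]
x   = [1 3 2 1];               % assumed source sequence
lo = 0; hi = 1;
for k = 1:length(x)
    w  = hi - lo;              % current interval width
    hi = lo + w*cdf(x(k)+1);   % keep only the subinterval assigned
    lo = lo + w*cdf(x(k));     %   to symbol x(k)
end
[lo hi]                        % any number in [lo,hi) identifies the sequence
-log2(hi-lo)                   % = -log2(prod(pX(x))) = 7 bits here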
