Lecture 6: Coding theory


Biology 429, Carl Bergstrom, February 4, 2008

Sources: This lecture loosely follows Cover and Thomas Chapter 5 and Yeung Chapter 3. As usual, some of the text and equations are taken directly from those sources.

Coding theory is the study of how information can be packaged for transport. Let us begin with an example. Suppose that we have a sequence of symbols

A, C, E, B, B, D, E, A, C, D, D, A, B, A, E, A, B, D, C, A, ...

drawn from an alphabet $\mathcal{X} = \{A, B, C, D, E\}$, and we want to send someone a message telling them this sequence. Our transmission channel allows only binary coding, so our message has to take the form of a string of zeros and ones. How can we code this message? One way to do it is to simply use a block code that maps each letter into a binary codeword:

A  000
B  001
C  010
D  011
E  100

Thus the message above would look like

000010100001001011100000010...
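As a minimal Python sketch of this block code (the dictionary and function names are illustrative, not from the source), encoding concatenates codewords, and because every codeword is exactly three bits long, decoding can simply cut the bit string into groups of three:

```python
# Hypothetical names: BLOCK_CODE, encode and decode_block are illustrative only.
BLOCK_CODE = {"A": "000", "B": "001", "C": "010", "D": "011", "E": "100"}


def encode(message: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of each source symbol."""
    return "".join(code[symbol] for symbol in message)


def decode_block(bits: str, code: dict[str, str], width: int = 3) -> str:
    """Decode a fixed-length code by cutting the bit string into `width`-bit groups."""
    inverse = {word: symbol for symbol, word in code.items()}
    groups = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "".join(inverse[group] for group in groups)


bits = encode("ACEBBDEAC", BLOCK_CODE)  # first nine symbols of the example
print(bits)                              # 000010100001001011100000010
print(decode_block(bits, BLOCK_CODE))    # ACEBBDEAC
```

Encoding the first nine symbols reproduces the 27-bit prefix of the message shown above.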

Since the codewords are of constant length, we can easily go in and group them before decoding:

000 010 100 001 001 011 100 000 010...

One can see right away that this coding is somewhat inefficient, because we are not making use of the codewords 101, 110, and 111. Indeed, we are using three bits to transmit at most $\log_2 5 \approx 2.32$ bits of information per symbol.

Another coding approach would be to build a set of variable-length codewords. One might want to use a code such as the following:

A  0
B  1
C  00
D  01
E  11

But in this case, there is no way to uniquely decode a message. For example, 00101... could be AABAB or CBAB or ADD or any number of other possibilities.

We can use variable-length codewords so long as we allow the receiver to uniquely decode the message. One way to do this, for example, is to have a unique end-of-codeword symbol. We could let the number of 1s indicate the letter and use 0s to mark the spaces between codewords:

A  0
B  10
C  110
D  1110
E  11110

Then our message would look like

01101111010101110111100110...

Here, we can use the 0s to group the symbols and decode:

0 110 11110 10 10 1110 11110 0 110...

In general, will this latter approach be more efficient, or less efficient, than the former approach? We need a definition of efficiency. The obvious definition is the expected codeword length per source symbol. Where $l(x)$ is the length of the codeword necessary to encode symbol $x$, we are interested in the expected codeword length

$$L(C) = \sum_{x \in \mathcal{X}} p(x)\, l(x).$$

For our example source above, the expected codeword length of the first code is simply 3, since in all cases the codewords are of length three. For the terminating-zero code, the expected codeword length depends on the relative frequency of the symbols in the original message. If a message has lots of As, then the latter coding will be very efficient, because many of the codewords in the message will be 0, i.e., one bit long instead of three as in the former code. If instead a message has lots of Es, the latter code will be very inefficient, because then many of the codewords in the coded message will be 11110, i.e., five bits long instead of three. Thus we see that the efficiency of a system for encoding information depends on the statistical properties of the information to be encoded. Coding theory allows us to explore this relationship, often with a focus on designing codes that will be optimal or near-optimal for any given type of source data.

Notice the close relationship between the following two problems:

- Given a source of random variables $X$ drawn from an alphabet $\mathcal{X}$ with probabilities $p(x)$, find an efficient way of coding the source using a $D$-symbol code alphabet.
- Given a data file, find a way of efficiently compressing this data.

With this out of the way, we begin by looking at prefix codes. Prefix codes are those codes for which one can decode the message string uniquely without having to look forward in the string, because you can always figure out when a codeword has ended without needing to look at what codeword comes next. Our code with terminating zeros is a very straightforward example of a prefix code: we always know when we have reached the end of a codeword, because every codeword ends with a zero and every zero signals the end of a codeword. Note that, of course, all prefix codes are uniquely decodable, though not all uniquely decodable codes are prefix codes (see Cover and Thomas Table 5.1 for an example).

Theorem 1. A code is a prefix code if and only if no codeword is a prefix of (the first part of) any other codeword.

Proof: If codeword $i$ is a prefix of codeword $j$, then after receiving the symbols for codeword $i$, one needs to look forward at the subsequent symbols to determine whether the full codeword is $i$ or $j$, and thus the code is not a prefix code. This proves the "only if" direction. To prove the "if" direction, we note that when we use a code where no codeword is a prefix of any other, we will know with certainty that we have received the full codeword when we receive a string of symbols that adds up to this codeword, because no other codeword starts that same way. Thus we have a prefix code.

Now that we have this simple test for a prefix code, we can prove one of the main theorems in coding theory.

Theorem 2. For any prefix code using an alphabet of size $D$, the codeword lengths $l_1, l_2, \ldots, l_m$ satisfy

$$\sum_i D^{-l_i} \le 1.$$

Proof: Take a code with an alphabet of size $D$, and suppose that the longest codeword is of length $w$. Then for any non-singular code (where each symbol gets a unique codeword) we can have at most $D^w + D^{w-1} + \cdots + D$ codewords.

[Figure: a D-ary code tree, with levels labeled D, D^2, D^3.]

But we are looking at prefix codes here, and thus no codeword can be the prefix of any other codeword. The largest number of codewords our prefix code can have is then $D^w$, i.e., when all codewords have the maximal length. If any of our codewords is shorter, say of length $l < w$, it rules out all $D^{w-l}$ strings of length $w$ that begin with it. So suppose that we have codewords $1, 2, \ldots, m$ with lengths $l_1, l_2, \ldots, l_m$. Codeword $i$ accounts for $D^{w-l_i}$ of the length-$w$ strings (just itself, if $l_i = w$), and by our no-prefixes rule no length-$w$ string is counted twice. Thus the number of length-$w$ strings left unused is

$$D^w - \sum_i D^{w-l_i} \ge 0,$$

giving us

$$D^w \ge \sum_i D^{w-l_i}.$$

Dividing through by $D^w$ (which is positive) we get the Kraft inequality:

$$\sum_i D^{-l_i} \le 1.$$

This puts a powerful lower bound on the average codeword length $L(C)$. Next, we will see how close we can come to achieving this bound, and we will formalize the relationship between this bound and the entropy rate of the source.

We can also prove the converse: for any set of lengths satisfying the Kraft inequality, we can construct a prefix code with codewords of those lengths.

Theorem 3. Given any set of codeword lengths $l_1, l_2, \ldots, l_m$ that satisfy the Kraft inequality

$$\sum_i D^{-l_i} \le 1,$$

there exists a prefix code with $m$ codewords with precisely those lengths.

The proof is by construction. To construct such a set of codewords, create a tree as we did in the proof of the Kraft inequality. Order the codeword lengths from shortest to longest, then start with the first codeword length $l_1$. Assign the first symbol to the first node at depth $l_1$, and remove all of its descendants. Assign the second symbol to the first node remaining on the tree at depth $l_2$. Again remove all descendants. Continue until all symbols are assigned a codeword. By the counting in the proof of the Kraft inequality, there will always be enough remaining branches to assign a codeword of each required length.

We can then prove a relation between the expected length $L(C)$ of a prefix code and the entropy of the source, $H_D(X) = -\sum_{x} p(x) \log_D p(x)$. Here we simply state the theorem.

Theorem 4. The expected length $L(C)$ of a prefix code using a $D$-symbol alphabet is greater than or equal to the base-$D$ entropy of the source:

$$L(C) \ge H_D(X).$$

So we cannot do better than the entropy rate when coding a source. We can get quite close to the entropy rate, though, using a very straightforward coding procedure suggested by Shannon. This procedure, which we will explore in the next lecture, gives an expected code length below $H_D(X) + 1$.
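The four theorems can be checked numerically on the example codes. The Python sketch below is illustrative only: all helper names (`is_prefix_free`, `kraft_sum`, `code_from_lengths`, `shannon_length`, and so on) are hypothetical. `code_from_lengths` follows the leftmost-free-node construction from the proof of Theorem 3, assuming $D \le 10$ so each digit is one character. `shannon_length` uses the lengths $\lceil \log_D (1/p(x)) \rceil$, one standard way to reach the $H_D(X) + 1$ bound; since the lecture defers Shannon's procedure, treat this as an assumption. As a stand-in source distribution the sketch uses the symbol frequencies of the 20-symbol example sequence, which is likewise an assumption rather than part of the lecture.

```python
import math
from fractions import Fraction
from itertools import combinations


def is_prefix_free(code):
    """Theorem 1 test: no codeword is a prefix of any other codeword."""
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(code.values(), 2))


def kraft_sum(lengths, D=2):
    """Exact value of sum_i D^(-l_i); Theorem 2 says it is <= 1 for a prefix code."""
    return sum(Fraction(1, D ** l) for l in lengths)


def to_base(value, D, width):
    """Write `value` as exactly `width` base-D digits (assumes D <= 10)."""
    digits = []
    for _ in range(width):
        value, r = divmod(value, D)
        digits.append(str(r))
    return "".join(reversed(digits))


def code_from_lengths(lengths, D=2):
    """Theorem 3 construction: each symbol takes the leftmost free node at its depth.

    `lengths` maps symbol -> codeword length. Symbols are processed from
    shortest to longest; taking a node implicitly removes all its descendants.
    """
    if kraft_sum(lengths.values(), D) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    code, value, depth = {}, 0, 0
    for symbol, l in sorted(lengths.items(), key=lambda item: item[1]):
        value *= D ** (l - depth)   # leftmost free node, moved down to depth l
        depth = l
        code[symbol] = to_base(value, D, l)
        value += 1                  # next free node at this depth
    return code


def expected_length(p, code):
    """L(C) = sum_x p(x) l(x)."""
    return sum(p[x] * len(code[x]) for x in p)


def entropy(p, D=2):
    """H_D(X) = -sum_x p(x) log_D p(x)."""
    return -sum(float(q) * math.log(float(q), D) for q in p.values() if q > 0)


def shannon_length(q, D=2):
    """Smallest integer l with D^(-l) <= q, i.e. ceil(log_D(1/q)), computed exactly."""
    l = 0
    while Fraction(D) ** l * q < 1:
        l += 1
    return l


# Assumption: the empirical frequencies of the 20-symbol example stand in for p(x).
sample = "ACEBBDEACDDABAEABDCA"
p = {x: Fraction(sample.count(x), len(sample)) for x in "ABCDE"}

block = {"A": "000", "B": "001", "C": "010", "D": "011", "E": "100"}
comma = {"A": "0", "B": "10", "C": "110", "D": "1110", "E": "11110"}
shannon = code_from_lengths({x: shannon_length(q) for x, q in p.items()})

print(is_prefix_free(comma), kraft_sum(len(w) for w in comma.values()))  # True 31/32
print(shannon)  # {'A': '00', 'B': '010', 'C': '011', 'D': '100', 'E': '101'}

H = entropy(p)
print(f"H_2(X) = {H:.3f}")  # H_2(X) = 2.271
for name, c in [("block", block), ("terminating-zero", comma), ("shannon", shannon)]:
    L = expected_length(p, c)
    print(name, float(L), L >= H)  # Theorem 4: L(C) >= H_2(X)
print(expected_length(p, shannon) < H + 1)  # Shannon-length bound: L(C) < H_2(X) + 1
```

With these frequencies the block code comes out at 3 bits per symbol, and the terminating-zero code and the constructed code both come out at 2.7 bits. All three lie above the entropy of about 2.271 bits, as Theorem 4 requires. The non-prefix code A 0, B 1, C 00, D 01, E 11 fails `is_prefix_free` and has a Kraft sum of 7/4, consistent with Theorem 2.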