1 Autumn: Coping with NP-completeness (Conclusion); Introduction to Data Compression
2 Simulated Annealing (Kirkpatrick et al., 1983). Analogy from thermodynamics: the best crystals are found by annealing. First heat up the material to let it bounce from state to state; then slowly cool down the material to allow it to achieve its minimum energy state.
3-10 [Figure, animated across eight slides: a one-dimensional energy landscape with a local minimum and the global minimum; the search must sometimes move uphill to escape the local minimum.]
11 Ingredients: a solution space S, where each x in S is a solution; E(x), the energy of x; a neighborhood of nearby states for each x; and a temperature T. A cooling schedule gives T as a function of the step t of the algorithm, for example fast (exponential) cooling T = a·e^(-bt), or slower cooling T = a/t.
12 Simulated annealing:
   initialize T to be hot;
   choose a starting state;
   repeat
       generate a random move
       evaluate the change in energy dE
       if dE < 0 then accept the move
       else accept the move with probability e^(-dE/T)
       update T
   until T is very small (frozen)
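A minimal sketch of this loop in Python, assuming caller-supplied neighbor() and energy() functions (both hypothetical) and geometric cooling:

    import math
    import random

    def simulated_anneal(start, neighbor, energy, t_hot=100.0, t_frozen=1e-3, alpha=0.99):
        # Generic simulated annealing: neighbor(x) proposes a random move,
        # energy(x) scores a state, and T cools geometrically by alpha.
        x, t = start, t_hot
        while t > t_frozen:                      # until T is very small (frozen)
            y = neighbor(x)                      # generate a random move
            de = energy(y) - energy(x)           # change in energy
            if de < 0 or random.random() < math.exp(-de / t):
                x = y                            # downhill always; uphill sometimes
            t *= alpha                           # update T
        return x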
13 Example: spanning trees. A state is a spanning tree; T' is a neighbor of T if it can be obtained by deleting one edge of T and inserting one edge not in T. The energy of a spanning tree T is its cost c(T). For a neighbor T' of T with dE = c(T') - c(T) > 0, the probability of moving to the higher energy state is e^(-(c(T') - c(T))/T), where the T in the exponent is the temperature. This probability is high when c(T') - c(T) is small or the temperature is large, and low when c(T') - c(T) is large or the temperature is small.
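For concreteness (numbers chosen only for illustration): if a proposed tree costs 5 more than the current one, the move is accepted with probability e^(-5/100) ≈ 0.95 at temperature 100, but only e^(-5/1) ≈ 0.007 at temperature 1. Early in the run almost any move is accepted; near the end, essentially only downhill moves are.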
14 Not a black box algorithm: it requires tuning the cooling parameters. It has been shown to be very effective in finding good solutions for some optimization problems. It is known to converge to an optimal solution, but the time to convergence is very large; most likely it converges to a local optimum. Very little is known about its effectiveness in general.
15 Genetic algorithms: mate solutions to find better ones. Hopfield nets: behave like a neural system settling into a minimum energy state. DNA algorithms: use DNA chemical processes to execute a large search. Quantum algorithms: it is still an open question whether quantum algorithms can solve NP-complete problems in polynomial time.
16 Genetic algorithms: maintain a population of solutions; good solutions mate to obtain even better ones as children; bad mutations are killed. For spanning trees, mating can include all edges of both parents and then break cycles by removing edges that do not belong to both parents.
17 Approximation: a major theoretical effort over the past 30 years. Some results are practical, many are not; there are both positive and negative results. Approximation complexity classes: APX: polynomial time approximation within a constant factor of optimal. PTAS: for each a > 0, a polynomial time (in n) approximation within a factor of (1+a) of optimal. FPTAS: an approximation within a factor of (1+a) of optimal, in time polynomial in n and 1/a.
18 Project: proposal due next Wednesday, Nov. 7th (if by e-mail, send to ladner). Final paper due Friday, December 14th. Purpose: compare different algorithms that solve the same problem, and delve into the literature. Resources: the library, the web, the instructors.
19 The encoder compresses the original x into y; the decoder decompresses y into x̂. Lossless compression: x = x̂; also called entropy coding or reversible coding. Lossy compression: x ≠ x̂; also called irreversible coding. Compression ratio = |x| / |y|, where |x| is the number of bits in x.
20 Why compress? Conserve storage space. Reduce time for transmission: it can be faster to encode, send, then decode than to send the original. Progressive transmission: some compression techniques allow us to send the most important bits first, so we can get a low resolution version of some data before getting the high fidelity version. Reduce computation: use less data to achieve an approximate answer.
21 Lossless compression: data is not lost, because the original is really needed. Examples: text compression; compression of computer binaries to fit on a floppy. The compression ratio is typically no better than 4:1 for lossless compression. Major techniques include Huffman coding, arithmetic coding, dictionary techniques (Ziv, Lempel 1977, 1978), and Sequitur (Nevill-Manning, Witten 1996). Standards: Morse code, Braille, Unix compress, gzip, zip, GIF, JBIG, JPEG.
22 Lossy compression: data is lost, but not too much. Examples: audio; video; still images, medical images, photographs. Compression ratios of 10:1 often yield quite high fidelity results. Major techniques include vector quantization, wavelets, and transforms. Standards: JPEG, MPEG.
23 Information theory: developed by Shannon in the 1940s and 50s; attempts to explain the limits of communication using probability theory. Example: suppose English text is being sent and a 't' has just been received. Given English, the next symbol being a 'z' has very low probability; the next symbol being an 'h' has much higher probability. Receiving a 'z' therefore carries much more information than receiving an 'h': we already knew it was more likely we would receive an 'h'.
24 Suppose we are given symbols {a_1, a_2, ..., a_m}. P(a_i) is the probability of symbol a_i occurring in the absence of any other information, with P(a_1) + P(a_2) + ... + P(a_m) = 1. inf(a_i) = -log_2 P(a_i) bits is the information, in bits, of a_i. [Plot: y = -log_2(x) against x.]
25 Example: {a, b, c} with P(a) = 1/8, P(b) = 1/4, P(c) = 5/8. inf(a) = -log_2(1/8) = 3; inf(b) = -log_2(1/4) = 2; inf(c) = -log_2(5/8) = .678. Receiving an a has more information than receiving a b or c.
26 The entropy is defined for a probability distribution over symbols {a_1, a_2, ..., a_m}: H = -Σ_{i=1}^{m} P(a_i) log_2(P(a_i)). H is the average number of bits required to code up a symbol, given that all we know is the probability distribution of the symbols. H is the Shannon lower bound on the average number of bits to code a symbol in this source model. Stronger models of entropy include context.
27 Examples: {a, b, c} with P(a) = 1/8, P(b) = 1/4, P(c) = 5/8: H = 1/8 × 3 + 1/4 × 2 + 5/8 × .678 = 1.3 bits/symbol. {a, b, c} with P(a) = P(b) = P(c) = 1/3 (worst case): H = -3 × (1/3) × log_2(1/3) = 1.6 bits/symbol. {a, b, c} with P(a) = 1, P(b) = P(c) = 0 (best case): H = -1 × log_2(1) = 0. Note that the standard coding of 3 symbols takes 2 bits.
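A small Python helper (a sketch) reproduces these numbers:

    import math

    def entropy(probs):
        # First order entropy in bits/symbol; the 0 * log(0) terms are dropped.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([1/8, 1/4, 5/8]))   # about 1.30 bits/symbol
    print(entropy([1/3, 1/3, 1/3]))   # 1.58, rounded to 1.6 on the slide
    print(entropy([1, 0, 0]))         # 0.0 bits/symbol (best case)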
28 Suppose we have two symbols with probabilities x and 1-x, respectively. [Plot: the binary entropy -(x log_2 x + (1-x) log_2(1-x)) against the probability x of the first symbol; it peaks at 1 bit when x = 1/2.]
29 Suppose we are given a string x_1 x_2 ... x_n over an alphabet {a_1, a_2, ..., a_m}, where P(a_i) is the probability of symbol a_i. The first-order entropy of x_1 x_2 ... x_n is H(x_1 x_2 ... x_n) = Σ_{i=1}^{n} inf(x_i) = -Σ_{i=1}^{n} log_2(P(x_i)). H(x_1 x_2 ... x_n) is a lower bound on the number of bits to code the string x_1 x_2 ... x_n given only the probabilities of the symbols. This is the Shannon lower bound.
30 Suppose we are given an algorithm that compresses a string x of length n, and the algorithm uses only the frequencies of the symbols {a_1, a_2, ..., a_m} in the string as input. Let c(x) be the length in bits of the compressed result. Then c(x) ≥ H(x) = nH, where H = -Σ_{i=1}^{m} (n_i/n) log_2(n_i/n) and n_i is the frequency of a_i in x.
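The bound is easy to compute from the string itself; a sketch (the example string is hypothetical, chosen to have the two 0's and ten 1's of the next slide):

    import math
    from collections import Counter

    def shannon_bound(x):
        # nH: lower bound in bits for any coder that sees only symbol frequencies.
        n = len(x)
        return -sum(c * math.log2(c / n) for c in Counter(x).values())

    print(shannon_bound("110111110111"))   # about 7.8 bits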
31 Example: x is a binary string of length 12 containing ten 1's and two 0's. P(0) = 2/12 and P(1) = 10/12 (from frequencies). H = -((2/12) log_2(2/12) + (10/12) log_2(10/12)) = .65, for a lower bound of 12 × .65 = 7.8 bits. The standard coding for a two symbol alphabet is 1 bit per symbol, or 12 bits, so there is a potential gain for some algorithm.
32 Example: x is a string of length 12 over the alphabet {1, ..., 8} with P(1) = P(2) = P(3) = P(6) = 1/12 and P(4) = P(5) = P(7) = P(8) = 2/12 (from frequencies). H = -((4/12) log_2(1/12) + (8/12) log_2(2/12)) = 2.92, for a lower bound of 12 × 2.92 = 35 bits. The standard coding for an 8 symbol alphabet is 3 bits per symbol, or 36 bits. No compression algorithm will give us much.
33 Example: x is a string such as 1 2 3 4 5 4 5 6 7 8 7 8, in which consecutive symbols differ by 1. Define x_{k+1} = x_k + r_k, where r_k = +1 or -1 (the residual). Compression algorithm: represent x as x_1, r_1, r_2, ..., r_11 and compress this sequence: 3 bits for x_1 and 1 bit for each residual gives about 14 bits instead of 36 bits. This algorithm does not use just the frequencies of the symbols; it uses the correlation between adjacent symbols.
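A sketch of this residual (delta) transform; the string x below is a hypothetical reconstruction consistent with the slide:

    def delta_encode(xs):
        # Represent xs as (first value, successive differences).
        return xs[0], [b - a for a, b in zip(xs, xs[1:])]

    def delta_decode(x1, residuals):
        xs = [x1]
        for r in residuals:
            xs.append(xs[-1] + r)   # x_{k+1} = x_k + r_k
        return xs

    x = [1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 7, 8]
    x1, rs = delta_encode(x)        # eleven residuals, each +1 or -1
    assert delta_decode(x1, rs) == x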
34 Huffman Coding (Huffman, 1951). Uses frequencies of symbols in a string to build a variable rate prefix code: each symbol is mapped to a binary string; more frequent symbols have shorter codes; no code is a prefix of another. Example: a 0, b 100, c 101, d 11.
35 Example: a 0, b 100, c 101, d 11. Coding aabddcaa with a standard 2-bit code takes 16 bits; with the prefix code, 0 0 100 11 11 101 0 0 takes 14 bits. The prefix property ensures unique decodability: scanning left to right, the bits parse unambiguously as a a b d d c a a. Morse code is an example of a variable rate code: E = '.' and Z = '--..'.
36 Example: a 0, b 100, c 101, d 11 as a binary tree. Leaves are labeled with symbols; each edge is labeled 0 (left) or 1 (right). The code of a symbol is the sequence of labels on the path from the root to the symbol.
37 Encoding: send the code, then the encoded data; for x = aabddcaa, c(x) is the Huffman code (a 0, b 100, c 101, d 11) followed by the code of x, 0 0 100 11 11 101 0 0. In some cases (e.g. the H.263 video coder) the code is fixed and need not be sent. Decoding: build the Huffman tree, then use it to decode:
   repeat
       start at root of tree
       while node is not a leaf
           if read bit = 1 then go right else go left
       report leaf
   until input is exhausted
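A sketch of the decoding loop, with internal nodes as (left, right) tuples and leaves as symbols (a hypothetical representation, not the slide's):

    def huffman_decode(bits, tree):
        out, node = [], tree
        for b in bits:
            node = node[int(b)]            # 0 = go left, 1 = go right
            if not isinstance(node, tuple):
                out.append(node)           # reached a leaf: report it
                node = tree                # restart at the root
        return "".join(out)

    tree = ("a", (("b", "c"), "d"))        # the code a=0, b=100, c=101, d=11
    print(huffman_decode("00100111110100", tree))   # aabddcaa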
38 Let p_1, p_2, ..., p_m be the probabilities for the symbols a_1, a_2, ..., a_m, respectively. Define the cost of a Huffman tree T to be C(T) = Σ_{i=1}^{m} p_i r_i, where r_i is the length of the path from the root to a_i. C(T) is the expected length of the code of a symbol coded by the tree T.
39 Example: a 1/2, b 1/8, c 1/8, d 1/4. [Tree: a at depth 1, d at depth 2, b and c at depth 3.] C(T) = 1 × 1/2 + 3 × 1/8 + 3 × 1/8 + 2 × 1/4 = 1.75.
40 Input: probabilities p_1, p_2, ..., p_m for symbols a_1, a_2, ..., a_m, respectively. Output: a Huffman tree that minimizes the average number of bits (the bit rate) to code a symbol, that is, minimizes HC(T) = Σ_{i=1}^{m} p_i r_i, where r_i is the length of the path from the root to a_i.
41 In an optimal Huffman tree, a lowest probability symbol has maximum distance from the root. If not, exchanging a lowest probability symbol with one at maximum distance will lower the cost: let p be a smallest probability, at depth k, and q a probability at maximum depth h, with p < q and k < h. Swapping the two symbols gives C(T') = C(T) + hp - hq + kq - kp = C(T) - (h-k)(q-p) < C(T).
42 The second lowest probability symbol is a sibling of the smallest in some optimal Huffman tree. If not, we can move it there without raising the cost: let q be the second smallest probability, at depth k, and r the probability of the current sibling of the smallest, at maximum depth h, with q ≤ r and k ≤ h. Swapping gives C(T') = C(T) + hq - hr + kr - kq = C(T) - (h-k)(r-q) ≤ C(T).
43 Given an optimal Huffman tree T whose two lowest probability symbols (p smallest, q second smallest) are siblings at maximum depth h, they can be replaced by a new symbol whose probability is the sum of their probabilities. The resulting tree T' is optimal for the new symbol set: C(T') = C(T) + (h-1)(p+q) - hp - hq = C(T) - (p+q).
44 If T' were not optimal for the new symbol set, then we could find a lower cost tree T''. Expanding the merged symbol of T'' back into the two siblings p and q would give a lower cost tree T''' for the original alphabet: C(T''') = C(T'') + p + q < C(T') + p + q = C(T), which is a contradiction.
45 Huffman algorithm:
   form a node for each symbol a_i with weight p_i;
   insert the nodes in a min priority queue ordered by weight;
   while the priority queue has more than one element do
       min1 := delete-min;
       min2 := delete-min;
       create a new node n;
       n.weight := min1.weight + min2.weight;
       n.left := min1;
       n.right := min2;
       insert(n);
   return the last node in the priority queue.
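The same construction in Python, using heapq as the min priority queue (a sketch; the counter only breaks ties so that equal weights never compare subtrees):

    import heapq
    from itertools import count

    def huffman_code(probs):
        # probs: symbol -> probability. Returns symbol -> codeword.
        tick = count()
        heap = [(p, next(tick), sym) for sym, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, left = heapq.heappop(heap)    # min1 := delete-min
            p2, _, right = heapq.heappop(heap)   # min2 := delete-min
            heapq.heappush(heap, (p1 + p2, next(tick), (left, right)))
        code = {}
        def walk(node, prefix):
            if isinstance(node, tuple):
                walk(node[0], prefix + "0")      # left edge labeled 0
                walk(node[1], prefix + "1")      # right edge labeled 1
            else:
                code[node] = prefix or "0"
        walk(heap[0][2], "")
        return code

    # Code lengths 1, 2, 3, 4, 4 (bit rate 2.1), as on the next slides;
    # ties may assign b, d, e to different codewords of the same length.
    print(huffman_code({"a": .4, "b": .1, "c": .3, "d": .1, "e": .1}))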
46-49 Example: P(a) = .4, P(b) = .1, P(c) = .3, P(d) = .1, P(e) = .1. [Figures, four slides, showing the construction: merge b and e (.1 + .1 = .2); merge d with that node (.1 + .2 = .3); merge c with that node (.3 + .3 = .6); finally merge a (.4 + .6 = 1).]
50 Resulting code: a 0, c 10, d 110, b 1110, e 1111. The average number of bits per symbol is .4 × 1 + .1 × 4 + .3 × 2 + .1 × 3 + .1 × 4 = 2.1.
51 P(a) = .4, P(b) = .1, P(c) = .3, P(d) = .1, P(e) = .1. Entropy: H = -(.4 × log_2(.4) + .1 × log_2(.1) + .3 × log_2(.3) + .1 × log_2(.1) + .1 × log_2(.1)) = 2.05 bits per symbol. Huffman code: HC = .4 × 1 + .1 × 4 + .3 × 2 + .1 × 3 + .1 × 4 = 2.1 bits per symbol. Pretty good!
52 Exercise: P(a) = 1/2, P(b) = 1/4, P(c) = 1/8, P(d) = 1/16, P(e) = 1/16. Compute the Huffman tree and its bit rate; compute the entropy; compare.
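One way to check your answer: these probabilities are all powers of 1/2, so the Huffman code lengths come out to exactly -log_2(p_i), namely 1, 2, 3, 4, 4, and the bit rate equals the entropy: H = HC = 1/2 × 1 + 1/4 × 2 + 1/8 × 3 + 1/16 × 4 + 1/16 × 4 = 1.875 bits per symbol.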
53 The Huffman code is within one bit of the entropy lower bound: H ≤ HC ≤ H + 1. Huffman coding does not work well with a two symbol alphabet. Example: P(0) = 1/100, P(1) = 99/100. HC = 1 bit/symbol, while H = -((1/100) log_2(1/100) + (99/100) log_2(99/100)) = .08 bits/symbol, so HC is far above H.
54 Assuming independence, P(ab) = P(a)P(b), so we can lump symbols together. Example: P(0) = 1/100, P(1) = 99/100 gives P(00) = 1/10000, P(01) = P(10) = 99/10000, P(11) = 9801/10000. HC = 1.03 bits per 2-bit symbol = .515 bits/bit. Still not that close to H = .08 bits/bit.
55 Suppose we add a one symbol context: in compressing a string x_1 x_2 ... x_n, we take x_{k-1} into account when encoding x_k. This is a new model, so the entropy based on just the independent probabilities of the symbols no longer applies. The new entropy model (second order entropy) has, for each symbol, a probability for each other symbol following it. Example for {a, b, c}, giving P(next | prev) (the row for context a is recovered from the next slide):

   prev \ next    a    b    c
   a             .4   .2   .4
   b             .1   .9    0
   c             .1   .1   .8
56 Huffman coding with context: a code for the first symbol, plus a separate Huffman code for each context, built from the conditional probabilities above (context a: a .4, b .2, c .4; context b: a .1, b .9, c 0; context c: a .1, b .1, c .8). [Figure: the first-symbol code and the three per-context Huffman trees.]
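A sketch of what the context buys, computing the entropy of each conditional distribution in the table (values as reconstructed above):

    import math

    def entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    ctx = {"a": [.4, .2, .4], "b": [.1, .9, 0], "c": [.1, .1, .8]}
    for prev, dist in ctx.items():
        print(prev, round(entropy(dist), 2))   # a 1.52, b 0.47, c 0.92
    # In contexts b and c the next symbol costs far less than the
    # 1.58 bits/symbol of three equally likely symbols.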
57 The time to design a Huffman code is O(n log n), where n is the number of symbols. Probabilities may come from the frequencies in the string to be compressed, or from the frequencies in a training set for a fixed codebook. Huffman works better for larger alphabets. There are adaptive (one pass) Huffman coding algorithms, with no need to transmit the code. Huffman is still popular: it is simple and it works.
58 Golomb coding: for a binary source with 0's much more frequent than 1's. A variable-to-variable length prefix code that encodes runs of 0's. Golomb code of order 4 (one standard bit assignment):

   input   output
   0000    0
   1       100
   01      101
   001     110
   0001    111
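A sketch of this scheme for any order m that is a power of 2 (so a run remainder fits in log2(m) bits); the function and variable names are mine:

    def golomb_encode(bits, m=4):
        # Runs of 0's: each complete block of m zeros costs one '0';
        # a run of j < m zeros ended by a 1 costs '1' + j in log2(m) bits.
        k = m.bit_length() - 1
        out, run = [], 0
        for b in bits:
            if b == "0":
                run += 1
                if run == m:
                    out.append("0")
                    run = 0
            else:
                out.append("1" + format(run, "0%db" % k))
                run = 0
        return "".join(out)         # (a trailing partial run is left unflushed)

    print(golomb_encode("0000"))    # 0
    print(golomb_encode("1"))       # 100
    print(golomb_encode("0001"))    # 111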
60 Let p be the probability of 0. Choose the order k such that p^k + p^(k+1) ≤ 1 < p^(k-1) + p^k. [Figure: the input patterns ordered from 0^k (most likely) down to the runs ended by a 1 (least likely).]
61 Example: p = .89. p^7 ≈ .44, p^6 ≈ .50, p^5 ≈ .56, so p^6 + p^7 ≤ 1 < p^5 + p^6, and the order is k = 6. [Table: the order 6 Golomb code.]
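The order can be computed mechanically from the condition on the previous slide (a sketch):

    def golomb_order(p):
        # Smallest k with p^k + p^(k+1) <= 1 < p^(k-1) + p^k.
        k = 1
        while p**k + p**(k + 1) > 1:
            k += 1
        return k

    print(golomb_order(0.89))   # 6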
62 [Plot: bits per symbol of the Golomb code, by order, versus the entropy, as a function of p.]
63 Golomb coding is very effective as a run-length code. As p goes to 1, its effectiveness approaches the entropy. There are adaptive Golomb codes, in which the order changes according to the frequency of 0's.