Introduction to Information Theory
B. Škorić, Physical Aspects of Digital Security, Chapter 2
Information theory
What is it?
- a formal way of counting information bits
Why do we need it?
- often used in crypto
- the best way to talk about physical sources of randomness
Measuring ignorance
Experiment X with unpredictable outcome:
- known possible outcomes xi
- known probabilities pi
We want to measure the unpredictability of the experiment:
- should increase with #outcomes
- should be maximal for uniform pi
Measuring ignorance
[Diagram: plaintext → Encrypt → ciphertext → Decrypt → plaintext, with Eve observing the ciphertext]
How much does Eve know about the plaintext, given that she has seen the ciphertext?
Measuring common knowledge
[Diagram: Alice's data sent to Bob over a noisy channel]
How much of the info that Alice sends actually reaches Bob? (channel capacity)
Data compression
Yawn... Bla bla blather bla dinner bla bla bla bla blep bla bla bla bla bla bla bla tonight bla blabla bla blather bla bla blather bla bla blab bla dinner tonight
How far can a body of data be compressed?
[Photo: Claude Shannon (1916-2001)]
Groundbreaking work in 1948
Shannon entropy
Notation: H(X) or H({p1, p2, ...})
Formal requirements for counting information bits:
- Sub-additivity: H(X,Y) ≤ H(X) + H(Y), with equality only when X and Y are independent
- Expansibility: an extra outcome xj with pj = 0 has no effect
- Normalization: entropy of (1/2, 1/2) is 1 bit; of (1, 0, 0, ...) is zero
Unique expression satisfying all requirements: H(X) = -Σx px log2 px
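As a concrete illustration of the formula, a minimal Python sketch (the function name is just for illustration):

    import math

    def shannon_entropy(probs):
        # H(X) = -sum_x p_x log2 p_x, in bits; outcomes with p = 0 contribute nothing
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(shannon_entropy([0.5, 0.5]))       # 1.0 bit  (normalization requirement)
    print(shannon_entropy([1.0, 0.0, 0.0]))  # 0.0 bits (certainty; expansibility: zero-probability outcomes have no effect)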
Examples of Shannon entropy
H(X) = -Σx px log2 px
Uniform distribution: X ~ (1/n, ..., 1/n). H(X) = log2 n.
Binary entropy function: X ~ (p, 1-p). h(p) = h(1-p) = -p log2 p - (1-p) log2(1-p)
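A quick numeric check of both examples, sketched in Python:

    import math

    def h(p):
        # binary entropy function in bits
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    print(math.log2(8))      # uniform distribution over n = 8 outcomes: H = log2 n = 3 bits
    print(h(0.11), h(0.89))  # symmetry: h(p) = h(1-p)
    print(h(0.5))            # maximum of 1 bit at p = 1/2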
Mini quiz
H(X) = -Σx px log2 px
X ~ (1/2, 1/4, 1/4). Compute H(X).
Which yes/no questions would you ask to quickly determine the outcome of the experiment?
How many questions do you need on average?
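To verify your answer to the first question afterwards, a one-line numeric sketch:

    import math
    print(-sum(p * math.log2(p) for p in [0.5, 0.25, 0.25]))   # 1.5 bits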
Interpretation of Shannon entropy
1) Average number of binary questions needed to decide on the outcome of the experiment (lower bound)
2) Lower bound on compression size
3) Theory of types: given pmf S = (s1, ..., sq); N independent experiments; Type T = sequences with exactly ni = N si occurrences of outcome i. For t ∈ T:
Pr[X = t] = ∏i=1..q si^ni = ∏i=1..q si^(N si) = 2^(Σi N si log2 si) = 2^(-N H(S))
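A small Python sketch checking the theory-of-types identity numerically (the pmf and N are chosen only for illustration):

    import math

    S = [0.5, 0.25, 0.25]            # pmf of a single experiment
    N = 40                           # number of independent repetitions
    n = [int(N * s) for s in S]      # type T: n_i = N * s_i occurrences of outcome i

    # log2-probability of any single sequence t with exactly these counts
    log2_prob = sum(ni * math.log2(si) for ni, si in zip(n, S))
    H = -sum(s * math.log2(s) for s in S)

    print(log2_prob)   # -60.0
    print(-N * H)      # -60.0, i.e. Pr[X = t] = 2^(-N H(S))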
Relative entropy
Measure of distance between distributions:
- asymmetric
- non-negative
- zero only when the distributions are identical
D(P||Q) = Σx P(x) log2( P(x) / Q(x) )
Interpretation: when you think the distribution is Q, but actually it is P, #questions ≈ H(P) + D(P||Q).
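A minimal Python sketch of D(P||Q), assuming Q(x) > 0 wherever P(x) > 0:

    import math

    def kl_divergence(P, Q):
        # relative entropy D(P||Q) in bits
        return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

    P = [0.5, 0.25, 0.25]
    Q = [1/3, 1/3, 1/3]
    print(kl_divergence(P, Q))   # positive
    print(kl_divergence(P, P))   # 0.0: zero only for identical distributions
    print(kl_divergence(Q, P))   # a different value: D is asymmetric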
Time for an exercise
Prove that relative entropy is non-negative: D(P||Q) ≥ 0.
Prove the sub-additivity property: H(X,Y) ≤ H(X) + H(Y).
Conditional entropy
Joint distribution (X,Y) ~ P. Conditional probability: py|x = pxy / px.
Entropy of Y for given X=x: H(Y|X=x) = -Σy py|x log2 py|x
Conditional entropy of Y given X: H(Y|X) = Ex[H(Y|X=x)] = -Σxy pxy log2 py|x
Conditional entropy
Watch out: Y|X is not a RV, but a set of RVs. Y|X=x is a RV.
Chain rule: pxy = px py|x = py px|y.
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).
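A small Python sketch of H(Y|X) and the chain rule on a toy joint distribution (the numbers are just an example):

    import math

    pxy = {('a', 0): 0.4, ('a', 1): 0.1,    # toy joint pmf of (X,Y)
           ('b', 0): 0.2, ('b', 1): 0.3}
    px = {'a': 0.5, 'b': 0.5}               # marginal of X

    def H(dist):
        return -sum(p * math.log2(p) for p in dist if p > 0)

    # H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y) / p(x)
    HYgX = -sum(p * math.log2(p / px[x]) for (x, y), p in pxy.items())

    print(H(pxy.values()))         # H(X,Y)
    print(H(px.values()) + HYgX)   # H(X) + H(Y|X): same value, as the chain rule says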
Some properties
Conditioning reduces entropy: H(X|Y) ≤ H(X)
Proof: the chain rule gives H(X|Y) = H(X,Y) - H(Y), and the sub-additivity property gives H(X,Y) ≤ H(X) + H(Y).
Mutual information
[Venn diagram: circles for H(X) and H(Y); regions H(X|Y), I(X;Y), H(Y|X); total area is H(X,Y)]
Mutual information I(X;Y):
- overlap between X and Y
- non-negative
- # binary questions in common
I(X;X) = H(X)
Mutual information
I(X;Y) = H(X) - H(X|Y)
       = H(Y) - H(Y|X)
       = H(X,Y) - H(X|Y) - H(Y|X)
       = H(X) + H(Y) - H(X,Y)
       = D( PXY || PX PY )
I(X;Y) = Σxy pxy log2( pxy / (px py) )
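The same toy joint distribution as above can be used to check I(X;Y) against H(X) + H(Y) - H(X,Y); a Python sketch:

    import math

    pxy = {('a', 0): 0.4, ('a', 1): 0.1,
           ('b', 0): 0.2, ('b', 1): 0.3}
    px = {'a': 0.5, 'b': 0.5}
    py = {0: 0.6, 1: 0.4}

    def H(dist):
        return -sum(p * math.log2(p) for p in dist if p > 0)

    # I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )
    I = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

    print(I)                                                  # mutual information in bits
    print(H(px.values()) + H(py.values()) - H(pxy.values()))  # same value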
Uses of mutual information
[Diagram: data X sent through a channel with noise N, output Y = X + N]
Channel capacity: I(X;Y)
[Diagram: noisy broadcast of X, observed as Y and Z]
Secret key generation capacity: I(X;Y|Z)
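Channel capacity is the maximum of I(X;Y) over input distributions; for a binary symmetric channel with crossover probability eps, the maximum is reached at uniform input and equals 1 - h(eps). A numeric sketch (eps chosen arbitrarily):

    import math

    def h(p):
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    eps = 0.1                 # crossover probability of the binary symmetric channel
    px = [0.5, 0.5]           # uniform input

    # joint pmf: Y equals X with probability 1 - eps, is flipped with probability eps
    pxy = {(x, y): px[x] * ((1 - eps) if x == y else eps) for x in (0, 1) for y in (0, 1)}
    py = [sum(pxy[(x, y)] for x in (0, 1)) for y in (0, 1)]

    I = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())
    print(I)                  # about 0.531 bits
    print(1 - h(eps))         # matches: BSC capacity 1 - h(eps)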
Uses of mutual information / conditional entropies
- Communication theory (error correction, ...)
- Crypto
- Biometrics
- Statistics (classification, hypothesis testing, ...)
- Image processing
- Statistical physics
- Econometrics
- ...
Other information measures
Min-entropy: Hmin(X) = -log2 pmax
- also called guessing entropy
- crypto: pmax is Prob[correct guess in one go]
Rényi entropy (α > 1): Hα(X) = 1/(1-α) log2 Σx∈X px^α
- we will encounter α = 2 (collision entropy)
- lim α→∞ Hα(X) = Hmin(X)
- lim α→1 Hα(X) = H(X)
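A Python sketch of these measures, including a numeric check of both limits (the α values and pmf are chosen for illustration):

    import math

    def renyi_entropy(probs, alpha):
        # H_alpha(X) = 1/(1-alpha) * log2( sum_x p_x^alpha ), alpha != 1
        return (1 / (1 - alpha)) * math.log2(sum(p ** alpha for p in probs if p > 0))

    def min_entropy(probs):
        return -math.log2(max(probs))

    def shannon_entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    P = [0.5, 0.25, 0.125, 0.125]
    print(min_entropy(P))             # 1.0 (pmax = 1/2)
    print(renyi_entropy(P, 2))        # collision entropy (alpha = 2)
    print(renyi_entropy(P, 100))      # close to Hmin(X) for large alpha
    print(renyi_entropy(P, 1.0001))   # close to H(X) as alpha -> 1
    print(shannon_entropy(P))         # 1.75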
...