INTRODUCTION TO INFORMATION THEORY

KRISTOFFER P. NIMARK

Date: April 24, 2018.

These notes introduce the machinery of information theory, which is a field within applied mathematics. The material can be found in most textbooks on information theory. Simple introductions are Luenberger (2006) and Pierce (1980). More advanced material can be found in Cover and Thomas (2006) and Kullback (1968).

1. Information and entropy

In the everyday meaning of the word, information means many things. To treat the topic formally, we need to be more precise.

1.1. Defining and measuring information. Claude Shannon proposed to define the amount of information I contained in a message with probability p as

I = log(1/p) = -log p    (1.1)

Does this definition make sense? Yes: being told that something unusual has occurred is in some sense more informative than being told that something that often happens has occurred, since it changes our beliefs about the state more.

Example: The message "It snowed in Ithaca in July" contains more information than "It snowed in Ithaca in January."

While the logarithm in the definition (1.1) can be taken with respect to any base, it is common in information theory to use base 2 logarithms. The amount of information is then measured in bits. Because of this convention, whenever the base of the logarithm is left unspecified, it is taken to be 2.
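As a quick illustration of definition (1.1), the following Python sketch (not part of the original notes) computes I = -log2(p) for the two snow messages; the probabilities are hypothetical values chosen only to make the point.

```python
from math import log2

def information(p: float) -> float:
    """Amount of information I = log2(1/p) = -log2(p), measured in bits."""
    return -log2(p)

# Hypothetical probabilities, chosen only to illustrate the comparison:
p_snow_july = 0.001     # very unusual event
p_snow_january = 0.5    # common event

print(information(p_snow_july))     # ~9.97 bits: surprising, hence very informative
print(information(p_snow_january))  # 1.0 bit: unsurprising, hence less informative
```

Under these made-up probabilities, the rare message carries roughly ten times as much information, matching the intuition above.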

1.2. Additivity of information. Information about independent events is additive: a message about two independent events, A and B, contains

I_AB = -log(p_A p_B) = -log p_A - log p_B = I_A + I_B    (1.2)

1.3. Entropy. Entropy is a measure of the expected amount of information. For two events with probabilities p and 1 - p, entropy is given by

H(p) = -p log p - (1 - p) log(1 - p)    (1.3)

1.3.1. Example: The weather in California. The weather in California is sunny with probability 7/8 and cloudy with probability 1/8. The average information on sunny and cloudy days is

H = -(7/8) log(7/8) - (1/8) log(1/8) = (1/8)[-7 log 7 + 7 log 8 + log 8] ≈ (1/8)[-7 · 2.8 + 7 · 3 + 3] ≈ 0.54 bits

Consider the general expression for a two-event source

H(p) = -p log p - (1 - p) log(1 - p)    (1.4)

If p equals either 0 or 1, the event is completely determined and entropy is then 0. Entropy is maximized for p = 1/2, where H = 1.

1.3.2. Example. The game of 20 questions: What questions do you start with? Which questions do you expect to allow you to reject the largest number of candidates?
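As a check of the two-event formula (1.3)-(1.4) and of the California weather example, here is a minimal sketch (illustrative only, not from the original notes) that evaluates H(p):

```python
from math import log2

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(7 / 8))  # ~0.54 bits: the California weather source
print(binary_entropy(0.5))    # 1.0 bit: entropy is maximized at p = 1/2
print(binary_entropy(1.0))    # 0.0 bits: a completely determined event
```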

1.4. Information sources. An information source is defined to consist of the possible mutually exclusive outcomes (events) and their associated probabilities. The entropy of an n-event source is

H(p_1, p_2, ..., p_n) = -p_1 log p_1 - p_2 log p_2 - ... - p_n log p_n

1.4.1. Example. A three-event source with probabilities 1/2, 1/4, 1/4 has entropy

H(1/2, 1/4, 1/4) = (1/2) log 2 + (1/4) log 4 + (1/4) log 4 = 3/2

1.5. Two properties of entropy.

(1) Non-negativity: H ≥ 0
(2) H(p_1, p_2, ..., p_n) ≤ log n, with equality if p_i = p_j for all i, j

1.6. Additivity. Entropy is additive in the same way information is. If two events S and T are independent, then the probability of the joint event (S, T) is p_S p_T and we then have that

H(S, T) = H(S) + H(T)

We call (S, T) the product of two sources. If the two sources are independent, the entropy of the product is given by

H(S, T) = -Σ_{s∈S, t∈T} p_s p_t log(p_s p_t)
        = -Σ_{s∈S, t∈T} p_s p_t [log p_s + log p_t]
        = -[Σ_{t∈T} p_t] Σ_{s∈S} p_s log p_s - [Σ_{s∈S} p_s] Σ_{t∈T} p_t log p_t
        = H(S) + H(T)

since Σ_{t∈T} p_t = 1 and Σ_{s∈S} p_s = 1.
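The n-event formula and the additivity of entropy can be verified numerically. The sketch below is illustrative only; it reuses the three-event source above and takes a fair die and a fair coin as the two independent sources S and T.

```python
from itertools import product
from math import log2

def entropy(probs):
    """H(p_1, ..., p_n) = -sum_i p_i * log2(p_i), over the nonzero probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1 / 2, 1 / 4, 1 / 4]))   # 1.5 = 3/2, as in the three-event example

die = [1 / 6] * 6        # fair die as source S
coin = [1 / 2, 1 / 2]    # fair coin as source T

# Product source (S, T): joint probabilities p_s * p_t since S and T are independent.
joint = [ps * pt for ps, pt in product(die, coin)]
print(entropy(joint))                    # ~3.585
print(entropy(die) + entropy(coin))      # ~3.585, i.e. H(S, T) = H(S) + H(T)
```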

1.7. Mixture of sources. Example: A source can with probability p be generated by a die and with probability 1 - p be generated by a flip of a coin. The event space is then {1, 2, 3, 4, 5, 6, heads, tails}. The entropy of the source is

H = p H(die) + (1 - p) H(coin) + H(p)

1.8. Bits as a measure. Bits measure the length of binary strings. The entropy of a string of binary numbers only coincides with the number of bits if all events are equally likely. Otherwise, the entropy is lower than the number of bits.

2. Differential entropy

Entropy is only defined for discrete random variables. The corresponding concept for continuously distributed random variables is called differential entropy. Not much changes in terms of results and interpretations, apart from the fact that integrals replace summations.

Definition 1. The differential entropy of X, when X is a continuous random variable with density f, is defined as

h(X) = -∫_S f(x) log f(x) dx

where S is the support of f.

Example 1. Uniform distribution on (0, a), so that f(x) = 1/a:

h(X) = -∫_0^a (1/a) log(1/a) dx = log a

Differential entropy is thus increasing in the length of the interval, which is intuitive.
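To connect Definition 1 to the uniform example, here is a rough numerical check (a sketch, not from the notes): a midpoint Riemann sum of -∫ f(x) log f(x) dx for the Uniform(0, a) density, with a = 4 chosen arbitrarily, should reproduce log a.

```python
from math import log2

def differential_entropy(density, support, n: int = 100_000) -> float:
    """Riemann-sum approximation of h(X) = -integral of f(x) * log2 f(x) dx."""
    lo, hi = support
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx          # midpoint of the i-th slice
        fx = density(x)
        if fx > 0:
            total += fx * log2(fx) * dx
    return -total

a = 4.0

def uniform_pdf(x):
    return 1 / a                         # density of the Uniform(0, a) distribution

print(differential_entropy(uniform_pdf, (0, a)))  # ~2.0
print(log2(a))                                    # 2.0 = log2(a), the analytical value
```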

Example 2. Normal distribution N(0, σ²), so that f(x) = (1/√(2πσ²)) e^{-x²/(2σ²)}:

h(X) = -∫ f(x) log [(1/√(2πσ²)) e^{-x²/(2σ²)}] dx = (1/2) log(2πeσ²)

The entropy of a normally distributed variable is thus increasing in its variance, which is also intuitive, since entropy is a natural measure of uncertainty.

3. Channels

An information channel transmits information about the random variable X (input) to the random variable Y (output). A discrete channel is defined by transition probabilities between states of the input X and the output Y, or equivalently, by the conditional distribution of outputs (signals) given the inputs (states).

Example 3. Binary symmetric channel: X, Y ∈ {0, 1} with

p(Y = X) = p,  p(Y ≠ X) = 1 - p

Example 4. Binary asymmetric channel: X, Y ∈ {0, 1} with

p(Y = 1 | X = 1) = p,  p(Y = 0 | X = 0) = q, and p ≠ q.
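To illustrate Example 3, the sketch below simulates a binary symmetric channel under the convention used here, p(Y = X) = p; the uniform input source, the value p = 0.9, and the function name are choices made only for this illustration.

```python
import random

def binary_symmetric_channel(x: int, p: float) -> int:
    """Transmit bit x: the output equals x with probability p and is flipped otherwise."""
    return x if random.random() < p else 1 - x

p = 0.9
n = 100_000
inputs = [random.randint(0, 1) for _ in range(n)]        # arbitrary uniform input source
outputs = [binary_symmetric_channel(x, p) for x in inputs]

agree = sum(x == y for x, y in zip(inputs, outputs)) / n
print(agree)  # ~0.9, the empirical estimate of p(Y = X)
```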

3.1. Conditional and joint entropies. Conditional entropy is denoted H(Y | X). It can be derived by starting with the entropy of Y conditional on a particular realization of X:

H(Y | x_i) = -Σ_{y_j} p(y_j | x_i) log p(y_j | x_i)

Definition 2. Conditional entropy H(Y | X):

H(Y | X) = Σ_{x_i} H(Y | x_i) p(x_i)

Example 5. The toss of a die. Let Y be the outcome of a six-sided die and let X be the events High (5 or 6) and Not High (1, 2, 3, or 4). The conditional entropy is

H(Y | X) = (1/3) log 2 + (2/3) log 4 = 5/3 < log 6 ≈ 2.58 = H(Y)

This is a general result:

Lemma 1. Conditioning on a random variable cannot increase entropy:

0 ≤ H(Y | X) ≤ H(Y)

However, it is not generally true that H(Y | x_i) ≤ H(Y).

Lemma 2. Joint entropy:

H(X, Y) = H(Y | X) + H(X) = H(X | Y) + H(Y)
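Example 5 and Lemma 2 can be checked from the joint distribution of (X, Y). The sketch below (illustrative, not from the notes) builds p(x, y) for the six-sided die and the High/Not High coarsening, computes H(Y | X) from Definition 2, and confirms the chain rule of Lemma 2.

```python
from math import log2

def entropy(probs):
    """H = -sum p * log2(p) over the nonzero probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Joint distribution p(x, y): y is the die outcome 1..6, x = High iff y is 5 or 6.
joint = {}
for y in range(1, 7):
    x = "High" if y >= 5 else "Not High"
    joint[(x, y)] = 1 / 6

p_x = {}                                   # marginal distribution of X
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p

H_XY = entropy(joint.values())             # joint entropy H(X, Y)
H_X = entropy(p_x.values())                # H(X) = H(1/3) ~ 0.918
H_Y = entropy([1 / 6] * 6)                 # H(Y) = log2(6) ~ 2.585

# H(Y | X) from Definition 2: sum over x of p(x) * H(Y | X = x).
H_Y_given_X = 0.0
for x, px in p_x.items():
    cond = [pxy / px for (xi, _), pxy in joint.items() if xi == x]
    H_Y_given_X += px * entropy(cond)

print(H_Y_given_X)              # ~1.667 = 5/3, as in Example 5
print(H_Y)                      # ~2.585, so conditioning on X lowered the entropy
print(H_XY, H_Y_given_X + H_X)  # both ~2.585, consistent with Lemma 2
```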

3.2. Mutual Information.

Definition 3. The mutual information of a random variable X given a random variable Y is

I(X; Y) = H(X) - H(X | Y)

Interpretation: Mutual information is the reduction in uncertainty about X that occurs when Y is observed.

Properties of mutual information:

(1) I(X; Y) = H(X, Y) - H(X | Y) - H(Y | X)
(2) I(X; Y) = H(X) - H(X | Y)
(3) I(X; Y) = H(Y) - H(Y | X)
(4) I(X; Y) = Σ_x Σ_y p(x, y) log [p(x, y) / (p(x) p(y))]
(5) I(X; Y) ≥ 0
(6) I(X; X) = H(X)

References

[1] Cover, T.M. and J.A. Thomas, Elements of Information Theory, Wiley, 2006.
[2] Kullback, S., Information Theory and Statistics, 2nd ed., Dover, New York, 1968.
[3] Luenberger, D.G., Information Science, Princeton University Press, 2006.
[4] Pierce, J.R., An Introduction to Information Theory: Symbols, Signals and Noise, Dover Books, 1980.
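As a closing numerical illustration (not part of the original notes), the sketch below computes I(X; Y) for the binary symmetric channel of Example 3 with a uniform input and p(Y = X) = 0.9, both arbitrary choices, and checks properties (2), (3), and (5) above.

```python
from math import log2

def entropy(probs):
    """H = -sum p * log2(p) over the nonzero probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Joint distribution of (X, Y) for a binary symmetric channel with p(Y = X) = 0.9
# and a uniform input; both values are chosen only for this illustration.
p = 0.9
joint = {(x, y): 0.5 * (p if x == y else 1 - p) for x in (0, 1) for y in (0, 1)}

p_x = {x: sum(v for (xi, _), v in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(v for (_, yi), v in joint.items() if yi == y) for y in (0, 1)}

H_X, H_Y = entropy(p_x.values()), entropy(p_y.values())
H_XY = entropy(joint.values())

# Property (4): I(X; Y) = sum_xy p(x, y) * log2[ p(x, y) / (p(x) p(y)) ]
I = sum(v * log2(v / (p_x[x] * p_y[y])) for (x, y), v in joint.items())

print(I)                    # ~0.531 bits
print(H_X - (H_XY - H_Y))   # property (2): H(X) - H(X | Y), same value
print(H_Y - (H_XY - H_X))   # property (3): H(Y) - H(Y | X), same value
print(I >= 0)               # property (5): mutual information is non-negative
```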