Computing and Communications 2. Information Theory - Entropy


Computing and Communications 2. Information Theory - Entropy. Ying Cui, Department of Electronic Engineering, Shanghai Jiao Tong University, China. 2017, Autumn

Outline: Entropy. Joint entropy and conditional entropy. Relative entropy and mutual information. Relationship between entropy and mutual information. Chain rules for entropy, relative entropy and mutual information. Jensen's inequality and its consequences.

Reference: Elements of Information Theory, T. M. Cover and J. A. Thomas, Wiley.

OVERVIEW

Information Theory. Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? (the entropy H.) What is the ultimate transmission rate of communication? (the channel capacity C.) For this reason, information theory is often considered a subset of communication theory.

Information Theory. Information theory has also made fundamental contributions to many other fields.

A Mathematical Theory of Communication. In 1948, Shannon published "A Mathematical Theory of Communication", founding information theory. Shannon made two major modifications that have had a huge impact on communication design: the source and the channel are modeled probabilistically, and bits became the common currency of communication.

A Mathematical Theory of Communication. Shannon proved the following three theorems. Theorem 1: the minimum compression rate of a source is its entropy rate H. Theorem 2: the maximum reliable rate over a channel is its mutual information I. Theorem 3: end-to-end reliable communication is possible if and only if H < I, i.e., there is no loss in performance from using a digital interface between source and channel coding. Impact of Shannon's results: after almost 70 years, all communication systems are designed based on the principles of information theory; the limits not only serve as benchmarks for evaluating communication schemes, but also provide insight into designing good ones; the basic information-theoretic limits in Shannon's theorems have now been achieved using efficient algorithms and codes.

ENTROPY

Definition. Entropy is a measure of the uncertainty of a r.v. Consider a discrete r.v. X with alphabet 𝒳 and p.m.f. p(x) = Pr[X = x], x ∈ 𝒳. The entropy of X is H(X) = -∑_{x∈𝒳} p(x) log p(x). The log is to the base 2, and entropy is expressed in bits; e.g., the entropy of a fair coin toss is 1 bit. We define 0 log 0 = 0, since x log x → 0 as x → 0, so adding terms of zero probability does not change the entropy.
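As an illustration (not on the original slides), the definition can be checked numerically. A minimal Python sketch; the function name entropy and the example distributions below are my own choices:

    # Entropy H(X) = -sum_x p(x) log2 p(x) of a discrete p.m.f., in bits,
    # with the convention 0 log 0 = 0.
    import math

    def entropy(pmf):
        # pmf is a list of probabilities; terms with p = 0 contribute nothing
        return sum(-p * math.log2(p) for p in pmf if p > 0)

    print(entropy([0.5, 0.5]))   # fair coin toss -> 1.0 bit
    print(entropy([0.25] * 4))   # uniform over 4 outcomes -> 2.0 bits
    print(entropy([1.0, 0.0]))   # deterministic outcome -> 0 bits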

Properties. Entropy is nonnegative: H(X) ≥ 0. The base of the log can be changed: H_b(X) = (log_b a) H_a(X).

Example (binary entropy). Let X be binary with Pr[X = 1] = p. Then H(X) = -p log p - (1-p) log(1-p). H(X) = 1 bit when p = 0.5 (maximum uncertainty); H(X) = 0 bits when p = 0 or 1 (minimum uncertainty); H(X) is a concave function of p.
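A small sketch (my own illustration, not from the slides) tabulating the binary entropy function to show the maximum at p = 0.5 and the zeros at p = 0 and p = 1:

    # Binary entropy function H(p) = -p log2 p - (1 - p) log2 (1 - p), in bits.
    import math

    def binary_entropy(p):
        if p in (0.0, 1.0):
            return 0.0                     # convention 0 log 0 = 0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
        print(f"p = {p:.1f}  H = {binary_entropy(p):.3f} bits")
    # The values rise to 1 bit at p = 0.5 and fall back to 0 at p = 0 and p = 1,
    # consistent with H being a concave function of p.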

Example

JOINT ENTROPY AND CONDITIONAL ENTROPY

Joint Entropy. Joint entropy is a measure of the uncertainty of a pair of r.v.s. Consider a pair of discrete r.v.s (X, Y) with alphabets 𝒳, 𝒴 and p.m.f.s p(x) = Pr[X = x], x ∈ 𝒳, and p(y) = Pr[Y = y], y ∈ 𝒴. The joint entropy is H(X, Y) = -∑_{x,y} p(x, y) log p(x, y).
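As a hedged illustration (not from the slides), the joint entropy of a small, arbitrarily chosen joint p.m.f. can be computed directly from the definition:

    # Joint entropy H(X, Y) = -sum_{x,y} p(x, y) log2 p(x, y).
    import math

    joint = {                     # illustrative p(x, y) for X, Y in {0, 1}
        (0, 0): 0.5,   (0, 1): 0.25,
        (1, 0): 0.125, (1, 1): 0.125,
    }

    H_XY = sum(-p * math.log2(p) for p in joint.values() if p > 0)
    print(H_XY)                   # 1.75 bits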

Conditional Entropy. The conditional entropy of a r.v. Y given another r.v. X is the expected value of the entropies of the conditional distributions, averaged over the conditioning r.v.: H(Y|X) = ∑_x p(x) H(Y | X = x) = -∑_{x,y} p(x, y) log p(y|x).
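Continuing the same illustrative p.m.f. (again my own example, not from the slides), the conditional entropy can be computed as the weighted average of the entropies of the conditional distributions, and the result agrees with the chain rule stated on the next slide:

    # Conditional entropy H(Y|X) = sum_x p(x) H(Y | X = x), and the check
    # H(X, Y) = H(X) + H(Y|X).
    import math

    def H(probs):
        return sum(-p * math.log2(p) for p in probs if p > 0)

    joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}

    H_Y_given_X = sum(
        p_x[x] * H([joint[(x, y)] / p_x[x] for y in (0, 1)])
        for x in (0, 1)
    )

    print(H_Y_given_X)                                          # about 0.94 bits
    print(H(joint.values()) - (H(p_x.values()) + H_Y_given_X))  # ~0: chain rule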

Chain Rule: H(X, Y) = H(X) + H(Y|X).

Chain Rule (continued)

Example

Example (continued)

RELATIVE ENTROPY AND MUTUAL INFORMATION

Relative Entropy. Relative entropy is a measure of the distance between two distributions: D(p||q) = ∑_x p(x) log( p(x) / q(x) ). Conventions: 0 log(0/0) = 0, 0 log(0/q) = 0, and p log(p/0) = ∞; in particular, if there is any x such that p(x) > 0 and q(x) = 0, then D(p||q) = ∞.
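A short hedged sketch (my own example distributions, not from the slides) computing D(p||q) from the definition, including the convention for zero probabilities:

    # Relative entropy D(p || q) = sum_x p(x) log2( p(x) / q(x) ), in bits.
    import math

    def kl_divergence(p, q):
        d = 0.0
        for pi, qi in zip(p, q):
            if pi == 0:
                continue              # 0 log(0/q) = 0
            if qi == 0:
                return math.inf       # p(x) > 0 while q(x) = 0
            d += pi * math.log2(pi / qi)
        return d

    p = [0.5, 0.5]
    q = [0.75, 0.25]
    print(kl_divergence(p, q))   # about 0.208 bits
    print(kl_divergence(q, p))   # about 0.189 bits: D is not symmetric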

Example

Mutual Information. Mutual information is a measure of the amount of information that one r.v. contains about another r.v.: I(X; Y) = ∑_{x,y} p(x, y) log( p(x, y) / (p(x) p(y)) ) = D( p(x, y) || p(x) p(y) ).
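As an illustration (my own example, not from the slides), mutual information can be computed as the relative entropy between the joint p.m.f. and the product of its marginals:

    # Mutual information I(X; Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ).
    import math

    joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
    p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

    I = sum(
        p * math.log2(p / (p_x[x] * p_y[y]))
        for (x, y), p in joint.items() if p > 0
    )
    print(I)   # about 0.016 bits; always >= 0, and equals H(X) + H(Y) - H(X, Y)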

RELATIONSHIP BETWEEN ENTROPY AND MUTUAL INFORMATION

Relation: I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X, Y); in particular, I(X; X) = H(X).

Proof

Illustration

CHAIN RULES FOR ENTROPY, RELATIVE ENTROPY, AND MUTUAL INFORMATION

Chain Rule for Entropy: H(X_1, X_2, ..., X_n) = ∑_{i=1}^{n} H(X_i | X_{i-1}, ..., X_1). (A numerical check follows the proofs below.)

Proof

Alternative Proof
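As a hedged numerical check (my own construction, not from the slides), the chain rule can be verified on a randomly drawn three-variable joint p.m.f. by computing each conditional entropy directly from the conditional distributions:

    # Check H(X1, X2, X3) = H(X1) + H(X2 | X1) + H(X3 | X1, X2) numerically.
    import itertools, math, random

    random.seed(0)
    alphabet = (0, 1)
    w = {k: random.random() for k in itertools.product(alphabet, repeat=3)}
    total = sum(w.values())
    p = {k: v / total for k, v in w.items()}        # random joint p.m.f. p(x1, x2, x3)

    def H(probs):
        return sum(-v * math.log2(v) for v in probs if v > 0)

    def marginal(dist, idx):
        out = {}
        for k, v in dist.items():
            key = tuple(k[i] for i in idx)
            out[key] = out.get(key, 0.0) + v
        return out

    p1, p12 = marginal(p, [0]), marginal(p, [0, 1])

    H2_given_1 = sum(
        p1[(x1,)] * H([p12[(x1, x2)] / p1[(x1,)] for x2 in alphabet])
        for x1 in alphabet
    )
    H3_given_12 = sum(
        p12[(x1, x2)] * H([p[(x1, x2, x3)] / p12[(x1, x2)] for x3 in alphabet])
        for x1 in alphabet for x2 in alphabet
    )

    lhs = H(p.values())
    rhs = H(p1.values()) + H2_given_1 + H3_given_12
    print(abs(lhs - rhs))   # ~0 up to floating-point error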

Chain Rule for Information: I(X_1, X_2, ..., X_n; Y) = ∑_{i=1}^{n} I(X_i; Y | X_{i-1}, ..., X_1).

Proof

Chain Rule for Relative Entropy: D(p(x, y) || q(x, y)) = D(p(x) || q(x)) + D(p(y|x) || q(y|x)).

Proof

JENSEN'S INEQUALITY AND ITS CONSEQUENCES

Convex & Concave Functions. Examples: convex functions: x², |x|, e^x, and x log x (for x ≥ 0); concave functions: log x and √x (for x ≥ 0); linear functions ax + b are both convex and concave.

Convex & Concave Functions

Jensen's Inequality: if f is a convex function and X is a r.v., then E[f(X)] ≥ f(E[X]). (A numerical check follows below.)

Information Inequality: D(p||q) ≥ 0, with equality if and only if p(x) = q(x) for all x.

Proof
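As a hedged sketch (my own examples), both inequalities can be spot-checked numerically: Jensen's inequality for the convex function f(x) = x², and the information inequality on randomly drawn distributions:

    import math, random

    random.seed(1)

    # Jensen's inequality: E[f(X)] >= f(E[X]) for convex f, here f(x) = x**2.
    xs = [1, 2, 5, 10]
    probs = [0.4, 0.3, 0.2, 0.1]
    E_fX = sum(p * x ** 2 for p, x in zip(probs, xs))
    f_EX = sum(p * x for p, x in zip(probs, xs)) ** 2
    print(E_fX >= f_EX)   # True

    # Information inequality: D(p || q) >= 0.
    def kl(p, q):
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    def random_pmf(n):
        w = [random.random() for _ in range(n)]
        s = sum(w)
        return [wi / s for wi in w]

    for _ in range(5):
        p, q = random_pmf(4), random_pmf(4)
        assert kl(p, q) >= 0    # nonnegativity follows from Jensen's inequality
    print("D(p || q) >= 0 held in all random trials")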

Nonnegativity of Mutual Information: I(X; Y) ≥ 0, with equality if and only if X and Y are independent.

Maximum Entropy Distribution: the Uniform Distribution. H(X) ≤ log|𝒳|, with equality if and only if X is uniform over 𝒳. (Numerical checks of this and the next two bounds follow below.)

Conditioning Reduces Entropy: H(X|Y) ≤ H(X), with equality if and only if X and Y are independent.

Independence Bound on Entropy: H(X_1, ..., X_n) ≤ ∑_{i=1}^{n} H(X_i), with equality if and only if the X_i are independent.
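A hedged numerical sketch (using my own illustrative joint p.m.f.) checking all three bounds at once:

    # (1) H(X) <= log2 |alphabet|, with equality for the uniform distribution
    # (2) H(X | Y) <= H(X)            (conditioning reduces entropy)
    # (3) H(X, Y) <= H(X) + H(Y)      (independence bound)
    import math

    def H(probs):
        return sum(-p * math.log2(p) for p in probs if p > 0)

    joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
    p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

    print(H(p_x.values()) <= math.log2(2), H([0.5, 0.5]) == 1.0)   # (1)

    H_X_given_Y = H(joint.values()) - H(p_y.values())   # H(X|Y) = H(X,Y) - H(Y)
    print(H_X_given_Y <= H(p_x.values()))                          # (2)

    print(H(joint.values()) <= H(p_x.values()) + H(p_y.values()))  # (3)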

Summary

Summary (continued)

Summary (continued)

cuiying@sjtu.edu.cn | iwct.sjtu.edu.cn/personal/yingcui