Introduction to Information Theory

Introduction to Information Theory. Radu Trîmbiţaş (UBB), October 2012.

Transmission of information. A digital transmission scheme generally involves (see figure) input and output that are considered digital (analogue signals are sampled and quantised).

Introduction. Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? Answer: the entropy H. What is the ultimate transmission rate of communication? Answer: the channel capacity C. For this reason information theory is considered to be a subset of communication theory. It is much more, for it has made fundamental contributions in statistical physics (thermodynamics); computer science (Kolmogorov complexity, or algorithmic complexity); statistical inference (Occam's Razor: "the simplest explanation is best"); and probability and statistics (error exponents for optimal hypothesis testing and estimation).
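
As a concrete illustration of the entropy limit H, here is a minimal sketch (not from the original slides; the example distributions are arbitrary) that computes the entropy of a discrete distribution in Python:

```python
import math

def entropy(p, base=2):
    """Shannon entropy H(p) = -sum_x p(x) log p(x), in bits by default."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

# A biased coin with P(heads) = 0.9 carries far less than 1 bit per flip,
# so long sequences of such flips are highly compressible.
print(entropy([0.9, 0.1]))   # ~0.469 bits per flip
print(entropy([0.5, 0.5]))   # 1.0 bit per flip (maximum for a binary source)
```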

Claude Shannon (1916-2001). Claude Elwood Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, 1948. Almost all important topics in information theory were initiated by Shannon.

Origin of Information Theory. Common wisdom in the 1940s: it is impossible to send information error-free at a positive rate. Error control by retransmission: the rate goes to 0 if error-free transmission is required; this approach is still in use today as ARQ (automatic repeat request) in TCP/IP computer networking. Shannon showed that reliable communication is possible at all rates below channel capacity: as long as the source entropy is less than the channel capacity, asymptotically error-free communication can be achieved. And since anything can be represented in bits, this led to the rise of digital information technology.

Relationship of information theory to other fields (figure).

IT and limits of communication theory. Information theory today represents the extreme points of the set of all possible communication schemes. The data compression minimum I(X; X̂) lies at one extreme of the set of communication ideas: all data compression schemes require description rates at least equal to this minimum. At the other extreme is the data transmission maximum I(X; Y), known as the channel capacity. All modulation schemes and data compression schemes lie between these limits.
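
To make the quantity I(X; Y) concrete, the following sketch (illustrative only; the joint distribution is made up) computes mutual information from a joint probability table via I(X; Y) = Σ_{x,y} p(x, y) log [p(x, y) / (p(x) p(y))]:

```python
import math

def mutual_information(joint, base=2):
    """I(X;Y) from a joint pmf given as a dict {(x, y): p(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal p(x)
        py[y] = py.get(y, 0.0) + p   # marginal p(y)
    return sum(p * math.log(p / (px[x] * py[y]), base)
               for (x, y), p in joint.items() if p > 0)

# Hypothetical channel whose input and output agree 90% of the time.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(mutual_information(joint))  # ~0.531 bits
```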

Relationship of information theory to other fields I. Computer Science (Kolmogorov Complexity). Kolmogorov, Chaitin, and Solomonoff put forth the idea that the complexity of a string of data can be defined by the length of the shortest binary computer program for computing the string. Thus, the complexity is the minimal description length. This definition of complexity turns out to be universal, that is, computer independent, and is of fundamental importance. Thus, Kolmogorov complexity lays the foundation for the theory of descriptive complexity. Gratifyingly, the Kolmogorov complexity K is approximately equal to the Shannon entropy H if the sequence is drawn at random from a distribution that has entropy H. So the tie-in between information theory and Kolmogorov complexity is perfect. Indeed, we consider Kolmogorov complexity to be more fundamental than Shannon entropy. It is the ultimate data compression and leads to a logically consistent procedure for inference.

Relationship of information theory to other fields II. Physics (Thermodynamics). Statistical mechanics is the birthplace of entropy and the second law of thermodynamics. Entropy always increases. Among other things, the second law allows one to dismiss any claims to perpetual motion machines. Mathematics (Probability Theory and Statistics). The fundamental quantities of information theory (entropy, relative entropy, and mutual information) are defined as functionals of probability distributions. In turn, they characterize the behavior of long sequences of random variables and allow us to estimate the probabilities of rare events (large deviation theory) and to find the best error exponent in hypothesis tests. Economics (Investment). Repeated investment in a stationary stock market results in an exponential growth of wealth. The growth rate of the wealth is a dual of the entropy rate of the stock market. The parallels between the theory of optimal investment in the stock market and information theory are striking.

Relationship of information theory to other fields III. Computation vs. Communication. As we build larger computers out of smaller components, we encounter both a computation limit and a communication limit. Computation is communication limited and communication is computation limited. These become intertwined, and thus all of the developments in communication theory via information theory should have a direct impact on the theory of computation. Philosophy of Science (Occam's Razor). William of Occam said "Causes shall not be multiplied beyond necessity", or to paraphrase it, "The simplest explanation is best". Solomonoff and Chaitin argued persuasively that one gets a universally good prediction procedure if one takes a weighted combination of all programs that explain the data and observes what they print next. Moreover, this inference will work in many problems not handled by statistics. For example, this procedure will eventually predict the subsequent digits of π. When this procedure is applied to coin flips that come up heads with probability 0.7, this too will be inferred. When applied to the stock market, the procedure should essentially find all the laws of the stock market and extrapolate them optimally. In principle, such a procedure would have found Newton's laws of physics. Of course, such inference is highly impractical, because weeding out all computer programs that fail to generate the existing data would take impossibly long. We would predict what happens tomorrow a hundred years from now.

Course Objectives. In this course (focusing on communication theory) we will: Define what we mean by information. Show how we can compress the information in a source to its theoretically minimum value, and show the tradeoff between data compression and distortion. Prove the channel coding theorem and derive the information capacity of different channels. Generalize from point-to-point to network information theory.

Course Contents.
1. Review of basic notions of probability theory
2. Entropy, informational divergence, joint entropy, conditional entropy, mutual information, conditional information
3. Fano's inequality and the data processing lemma
4. Information sources, entropy rate, Markov chains
5. Data compression, optimal codes, Huffman codes, Shannon-Fano-Elias codes, Shannon codes
6. Channel capacity, the Channel Coding Theorem
7. Maximum entropy distributions, universal source coding

Basic rules of probability theory I. S is the sample space; the elements of S are elementary events; A ⊆ S is a random event. A collection K of events from S is a σ-field (σ-algebra) if:
1. K ≠ ∅
2. A ∈ K ⟹ Ā ∈ K
3. if A_n ∈ K for all n ∈ ℕ, then ⋃_{n=1}^∞ A_n ∈ K
(S, K) is a measurable space; (S, K, P) is a probability space, where a random event A ⊆ S satisfies A ∈ K. Axioms of probability, P : K → ℝ:
P1) P(S) = 1
P2) P(A) ≥ 0 for all A ∈ K

Basic rules of probability theory II.
P3) If A_i ∩ A_j = ∅ for i ≠ j, i, j ∈ I ⊆ ℕ, with A_i ∈ K, then P(⋃_{i∈I} A_i) = Σ_{i∈I} P(A_i).
Conditional probability: P(A|B) = P(A ∩ B) / P(B), if P(B) > 0.
A and B are independent events if P(A ∩ B) = P(A)P(B).
Properties: P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).

Basic rules of probability theory III.
Multiplication rule: P(A_1 ∩ … ∩ A_n) = P(A_1) P(A_2|A_1) P(A_3|A_1 ∩ A_2) … P(A_n|A_1 ∩ … ∩ A_{n-1}).
Total probability formula: if {A_i}_{i∈I} is a partition of S (i.e. ⋃_{i∈I} A_i = S and A_i ∩ A_j = ∅ for all i ≠ j), then P(A) = Σ_{i∈I} P(A_i) P(A|A_i).
Bayes formula: if {A_i}_{i∈I} is a partition of S, then P(A_k|A) = P(A|A_k) P(A_k) / Σ_{i∈I} P(A_i) P(A|A_i).
In particular, P(A) = P(A|B)P(B) + P(A|B̄)P(B̄).
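
The total probability and Bayes formulas translate directly into code; the sketch below uses an invented two-hypothesis example (the priors and likelihoods are arbitrary illustration values):

```python
def bayes_posterior(priors, likelihoods):
    """P(A_k | A) for a partition {A_k}, given priors P(A_k) and likelihoods P(A | A_k)."""
    # Total probability formula: P(A) = sum_i P(A_i) P(A | A_i)
    p_a = sum(priors[k] * likelihoods[k] for k in priors)
    # Bayes formula for each event of the partition
    return {k: priors[k] * likelihoods[k] / p_a for k in priors}

# Illustrative numbers: a condition with 1% prior probability and a test that
# reports positive with probability 0.99 (condition) or 0.05 (no condition).
priors = {"condition": 0.01, "no_condition": 0.99}
likelihoods = {"condition": 0.99, "no_condition": 0.05}
print(bayes_posterior(priors, likelihoods))  # posterior for "condition" is about 0.167
```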

Discrete Random Variables I.
𝒳 denotes the alphabet of X (the set of all values of X). X is a discrete random variable taking values x ∈ 𝒳 with probabilities p(x).
p is the probability mass function of X: p(x) ≥ 0, Σ_{x∈𝒳} p(x) = 1, and p(x) = P(X = x).
Similarly, Y takes values y ∈ 𝒴 with probabilities p(y).

Discrete Random Variables II.
The joint probability distribution of X and Y: p(x, y) = P(X = x ∩ Y = y) = p(y, x), with p(x, y) ≥ 0 and Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) = 1.
Marginal probability distributions of X and Y from the joint distribution:
p(x) = P(X = x) = Σ_{y∈𝒴} p(x, y),
p(y) = P(Y = y) = Σ_{x∈𝒳} p(x, y).

Discrete Random Variables III.
Conditional probabilities:
p(x|y) = P(X = x | Y = y) = p(x, y) / p(y),
p(y|x) = P(Y = y | X = x) = p(x, y) / p(x),
p(x, y) = p(x|y) p(y) = p(y|x) p(x), hence p(y|x) = p(x|y) p(y) / p(x).
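
These marginal and conditional formulas are easy to check numerically; a small sketch follows, using an arbitrary joint pmf purely for illustration:

```python
def marginals_and_conditional(joint):
    """Given a joint pmf {(x, y): p(x, y)}, return p(x), p(y) and p(x|y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p            # p(x) = sum_y p(x, y)
        py[y] = py.get(y, 0.0) + p            # p(y) = sum_x p(x, y)
    cond = {(x, y): p / py[y] for (x, y), p in joint.items()}   # p(x|y) = p(x, y) / p(y)
    return px, py, cond

# Arbitrary example over X in {a, b} and Y in {0, 1}.
joint = {("a", 0): 0.3, ("a", 1): 0.2, ("b", 0): 0.1, ("b", 1): 0.4}
px, py, cond = marginals_and_conditional(joint)
print(px)               # {'a': 0.5, 'b': 0.5}
print(cond[("a", 0)])   # p(X=a | Y=0) = 0.3 / 0.4 = 0.75
```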