Information Theory

Outline
Background
Basic Definitions
Information and Entropy
Independent Messages
Codings

Background: Information Theory versus Information Systems
Information theory originated in electrical engineering communications; it is as essential for modern communications as the transistor.
For many years it was an estranged cousin of information systems; only recently have substantial connections between information theory and systems been discovered.

Background: Motivation
Why study information theory in an information systems class?
To see the only quantified model of information: much of philosophy, linguistics, and cognitive science asks questions about information.
To think about information and to avoid confusion of terminology.
To provide an informative example of mathematical modeling.
To employ information measures in database design, data warehousing, and data mining.

Basic Definitions: Common Usage for Information
"Lots of information" implies either 1. meaningful or 2. surprising.
Information theory deals with surprise and leaves meaning to the philosophers; it measures the quantity, not the value, of information.

Basic Definitions: Fundamental Assumption
Information is something that resolves uncertainty.
If I tell you something you already know, there is no information in that message.
If I tell you something that is very likely, there is little information in that message.
If I tell you something surprising, that message has a lot of information.
Redundant data contains no information.

Basic Definitions: Channel
Information is transmitted from a source to a destination through a channel.
[Diagram: source -> transmitter -> channel -> receiver -> destination]
The transmitter does encoding; the receiver does decoding.
In databases, a query approximates this model.

Basic Definitions: Noise
Noise is injected by the channel; it is included when modeling communication.
[Diagram: source -> transmitter -> channel (noise added) -> receiver -> destination]
The receiver filters the message from the noise.
In databases and other software applications, assume the network handles noise.
The ability to model and cope with noise is a major strength of information theory: the specific impact of errors on bandwidth leads to a robust theory of error-correcting codes (Pioneer 10, Ethernet).

Basic Definitions: Messages
The source can transmit a fixed set of n messages, e.g. for a weather report, messages = { sunny, rain, snow, ... }.
Identify messages with the integers { 1, 2, ..., n }.
The only important aspect of message i is the probability p_i that it occurs. Hence, characterize the messages by P = { p_1, p_2, ..., p_n }. Of course, Σ_{i=1..n} p_i = 1.
e.g. for the weather report, P depends upon the season: P_January = { 0.3, 0.05, 0.2, ... }, P_July = { 0.25, 0.25, 10^-6, ... }.
Intuitively, the less likely a message, the more information it contains: "It is snowing" elicits a ho-hum in January but a big response in July.

Information and Entropy: Information Value
Goals: formalize "the less likely a message, the more information it contains" with a function that converts multiplication to addition. The logarithm is the way to do this: log(x y) = log(x) + log(y). All logarithms are to base 2 in the following.
Since ε log(1/ε) -> 0 as ε -> 0, assume 0 log(1/0) = 0.
So the information value of receiving message i is log(1/p_i), or equivalently -log(p_i).

Information and Entropy: Entropy
The entropy of messages with probability set P, denoted H(P), is
H(P) = Σ_{i=1..n} p_i log(1/p_i)
or, equivalently,
H(P) = -Σ_{i=1..n} p_i log(p_i)
Entropy is the expected value of the information gained by receiving one message.
"Entropy" is a metaphor based on the mathematical form of the summation; do not confuse the metaphor with an actual physical property. In physics, entropy is associated with one state and changes over time; in communications, entropy is associated with a set of messages.

Information and Entropy: Examples of Entropy
Two equally likely messages: H({ 0.5, 0.5 }) = 0.5 log(2) + 0.5 log(2) = 0.5 + 0.5 = 1, the origin of the bit (Binary Information Term).
Answer is already known: H({ 1, 0 }) = 1 log(1) + 0 log(1/0) = 0 + 0 = 0, so no new information.
All messages equally likely, so p_i = 1/n for n messages: H({ 1/n, ..., 1/n }) = Σ_{i=1..n} (1/n) log(n) = log(n); when n = 2^k, this is k bits of information.

Information and Entropy: Further Examples
Three messages, P = { 0.5, 0.25, 0.25 }:
H(P) = 0.5 log(2) + 0.25 log(4) + 0.25 log(4) = 0.5 + 0.5 + 0.5 = 1.5
But what is half a bit?
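These values are easy to check numerically. Below is a minimal Python sketch; the function name entropy and the printed checks are illustrative additions, not part of the slides, and the code uses the convention 0 log(1/0) = 0 from the earlier slide.

```python
import math

def entropy(probs):
    """H(P) = sum_i p_i * log2(1/p_i), with the convention 0*log(1/0) = 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))          # two equally likely messages: 1.0 bit
print(entropy([1.0, 0.0]))          # answer already known: 0.0 bits
print(entropy([1/8] * 8))           # n = 2^k equally likely messages: k bits (here 3.0)
print(entropy([0.5, 0.25, 0.25]))   # the three-message example: 1.5 bits
```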

Independent Messages: Independence
Two sets of messages A and B are independent if knowing a message from one has no influence on the likelihood of a message from the other.
e.g. the weather in Bloomington and the weather in Perth are independent;
the weather in Bloomington and the weather in Indianapolis are NOT independent;
the weather now and the weather 5 minutes ago are NOT independent;
the weather today and the weather a year ago are independent, provided that P is seasonally adjusted;
the weather today and the weather yesterday...? This is Indiana, after all.

Independent Messages: Messages and Signals
A second weather report contains no information, because it has high-level meaning which doesn't change rapidly.
If the message is just a symbol (or signal), we presume that each one is independent of the preceding one.

Independent Messages: Independence of Probabilities
Independence of message sets A and B is just the notion of independence from probability theory.
Say A has probabilities P = { p_1, ..., p_m } and B has probabilities Q = { q_1, ..., q_n }; then A and B are independent iff the probability that message i from A is followed by message j from B is p_i q_j.
A independent from B implies that the joint probability set is P × Q (× is the Cartesian product here, with element-wise products p_i q_j).

Independent Messages: Additivity of Entropy
Fundamental theorem of information theory: if P and Q are the probabilities of independent sets of messages, then
H(P × Q) = H(P) + H(Q)
That is, if messages are truly independent, then their information adds up.

Independent Messages: Proof
P = { p_1, ..., p_m }, Q = { q_1, ..., q_n }, P × Q = { p_i q_j }; here and below i ranges over 1..m and j over 1..n.
H(P × Q) = Σ_{i,j} p_i q_j log(1/(p_i q_j))
         = Σ_{i,j} p_i q_j [ log(1/p_i) + log(1/q_j) ]
         = Σ_i p_i log(1/p_i) (Σ_j q_j) + Σ_j q_j log(1/q_j) (Σ_i p_i)
         = Σ_i p_i log(1/p_i) + Σ_j q_j log(1/q_j)          since Σ_i p_i = Σ_j q_j = 1
         = H(P) + H(Q)
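A quick numerical sanity check of additivity, reusing the entropy() sketch above; the particular P and Q here are made-up examples.

```python
P = [0.5, 0.25, 0.25]
Q = [0.7, 0.2, 0.1]

# Joint probabilities of independent message sets: all products p_i * q_j
PQ = [p * q for p in P for q in Q]

print(entropy(PQ))              # equals ...
print(entropy(P) + entropy(Q))  # ... entropy(P) + entropy(Q)
```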

Codings: Variable Length Encodings
Return to the question: what is half a bit? P = { 0.5, 0.25, 0.25 }.
Use longer codes for less probable messages: { 0, 10, 11 }.

msg   prob   code   size   cost
A     0.5    0      1      0.5
B     0.25   10     2      0.5
C     0.25   11     2      0.5
sum                        1.5

So 0 1 0 0 encodes A B A. Useful?
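A two-line check that the table's expected cost matches the entropy computed earlier; the dict below is just the table restated, with illustrative variable names.

```python
# code and probability for each message, taken from the table above
code = {"A": ("0", 0.5), "B": ("10", 0.25), "C": ("11", 0.25)}
avg_bits = sum(prob * len(bits) for bits, prob in code.values())
print(avg_bits)   # 1.5, matching H({0.5, 0.25, 0.25})
```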

Codings: Huffman's Algorithm
Specification: given n messages with probabilities P, assign binary codes to minimize the expected bit length.
Input: P
Output: a binary tree with n leaves, one for each message. Actual codes are given by assigning 0 to each left branch and 1 to each right branch.
Data structures: nodes of the form <tree, prob>; a set S of nodes; tree ::= message | (tree, tree)

Codings: Code

/* initializations */
S := { <i, p[i]> | 1 <= i <= n }

/* body */
while S has > 1 items do
    pick the two items <ta, pa> and <tb, pb> in S with the lowest probabilities
    remove <ta, pa> and <tb, pb> from S
    add <(ta, tb), pa+pb> to S
endwhile
output t, where S = { <t, 1> }
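The pseudocode translates directly into a short runnable sketch. This is one possible rendering, not the slides' implementation: the names huffman_tree and codes_from_tree, the min-heap standing in for the set S, and the tie-breaking counter are my additions.

```python
import heapq

def huffman_tree(messages):
    """messages: list of (name, probability) pairs.
    Returns a nested-tuple tree, mirroring tree ::= message | (tree, tree)."""
    # S becomes a min-heap of <probability, tiebreak, tree> items; the tiebreak
    # counter keeps heapq from ever comparing two trees directly.
    heap = [(p, i, name) for i, (name, p) in enumerate(messages)]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        pa, _, ta = heapq.heappop(heap)   # the two items with lowest probability
        pb, _, tb = heapq.heappop(heap)
        heapq.heappush(heap, (pa + pb, counter, (ta, tb)))
        counter += 1
    return heap[0][2]

def codes_from_tree(tree, prefix=""):
    """Assign 0 to each left branch and 1 to each right branch."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    left, right = tree
    codes = codes_from_tree(left, prefix + "0")
    codes.update(codes_from_tree(right, prefix + "1"))
    return codes
```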

Codings: Example
Messages A, B, C, D, E, F, G with probabilities .25, .125, .0625, .0625, .25, .125, .125:

A, B, (C, D), E, F, G                new node (C, D) has probability .125
A, (B, (C, D)), E, F, G              new node (B, (C, D)) has probability .25
A, (B, (C, D)), E, (F, G)            new node (F, G) has probability .25
(A, (B, (C, D))), E, (F, G)          new node (A, (B, (C, D))) has probability .5
(A, (B, (C, D))), (E, (F, G))        new node (E, (F, G)) has probability .5
((A, (B, (C, D))), (E, (F, G)))      root has probability 1.0

Hence the code 0110 maps to C: 0 goes left into (A, (B, (C, D))), 1 goes right into (B, (C, D)), 1 goes right into (C, D), 0 goes left to C.
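As a usage check of the sketch above on this slide's seven messages: ties may be grouped differently than in the hand-worked tree, but the code lengths and the average cost come out the same.

```python
msgs = [("A", 0.25), ("B", 0.125), ("C", 0.0625), ("D", 0.0625),
        ("E", 0.25), ("F", 0.125), ("G", 0.125)]
codes = codes_from_tree(huffman_tree(msgs))
print(codes)   # codes["C"] has length 4; with the slide's tie-breaking it is "0110"
avg = sum(p * len(codes[name]) for name, p in msgs)
print(avg)     # 2.625 bits, equal to H(P) here because every p_i is a power of 2
```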

Codings: Analysis
Message i with probability p_i has a Huffman code of k bits, with -log(p_i) ≤ k < -log(p_i) + 1.
The average cost c of a Huffman code based on P satisfies H(P) ≤ c < H(P) + 1.
This is easiest to see when each p_i = 2^-k for some k.

Codings: Other Applications
To merge files F_1, F_2, ..., F_n with sizes s_1, s_2, ..., s_n:
1. associate the pseudo-probability p_i = s_i / Σ_j s_j with F_i
2. apply Huffman's algorithm
3. the tree indicates the merge order
General message: combine smallest first.
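A sketch of this recipe, reusing huffman_tree from the earlier block; the file names and sizes below are invented for illustration.

```python
sizes = [("F1", 40), ("F2", 10), ("F3", 25), ("F4", 25)]   # hypothetical file sizes
total = sum(s for _, s in sizes)
pseudo = [(name, s / total) for name, s in sizes]          # step 1: pseudo-probabilities
tree = huffman_tree(pseudo)                                # step 2: Huffman's algorithm
print(tree)   # step 3: nesting gives the merge order; innermost (smallest) pairs first
```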