Coding, Information Theory (and Advanced Modulation) Prof. Jay Weitzen Ball 411

16.548 Coding, Information Theory (and Advanced Modulation) Prof. Jay Weitzen Ball 411 Jay_weitzen@uml.edu 1

Notes Coverage: Course introduction; definition of information and entropy; review of conditional probability. 2

Class Coverage: Fundamentals of information theory (4 weeks); block coding (3 weeks); advanced coding and modulation as a way of approaching the Shannon capacity bound: convolutional coding, trellis modulation, turbo coding, and space-time coding (7 weeks). 3

Course Web Site: http://faculty.uml.edu/jweitzen/16.548. Class notes, assignments, and other materials are on the web site; please check at least twice per week. Lectures will be streamed; see the course website. 4

Prerequisites (what you need to know to thrive in this class): 16.363 or 16.584 (a probability class); some programming (C, VB, Matlab); digital communication theory. 5

Grading Policy: 4 mini-projects (25% each): Lempel-Ziv compressor; cyclic redundancy check; convolutional coder/decoder with soft-decision decoding; trellis modulator/demodulator. 6

Course Information and Textbooks: Coding and Information Theory by Wells, plus his notes from the University of Idaho; Digital Communications by Sklar or by Proakis; Shannon's original paper (1948); other material on the web site. 7

Claude Shannon founds the science of information theory in 1948. In his 1948 paper, "A Mathematical Theory of Communication," Claude E. Shannon formulated the theory of data compression. Shannon established that there is a fundamental limit to lossless data compression. This limit, called the entropy rate, is denoted by H. The exact value of H depends on the information source, more specifically on the statistical nature of the source. It is possible to compress the source losslessly at a rate arbitrarily close to H; it is mathematically impossible to do better than H. 8
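As a rough illustration of this limit (not part of the original slides), the sketch below draws symbols from a biased binary source, computes its entropy H, and compares against the bits per symbol that a general-purpose compressor actually achieves. The source probabilities, sample size, and the use of zlib are illustrative assumptions.

```python
import math
import random
import zlib

# Illustrative biased binary source: P(0) = 0.9, P(1) = 0.1 (assumed values).
p = [0.9, 0.1]
H = -sum(pi * math.log2(pi) for pi in p)          # entropy in bits per symbol

n = 100_000
random.seed(1)
symbols = random.choices("01", weights=p, k=n)    # i.i.d. sample from the source
data = "".join(symbols).encode("ascii")

compressed_bits = 8 * len(zlib.compress(data, 9))
print(f"Entropy H     : {H:.3f} bits/symbol")
print(f"zlib achieves : {compressed_bits / n:.3f} bits/symbol")
# No lossless code can do better than H bits/symbol on average.
```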

This is Important 10

Source Modeling 11

Zero-order models: it has been said that if you get enough monkeys and sit them down at enough typewriters, eventually they will type out the complete works of Shakespeare. 12

First Order Model 13

Higher Order Models 14
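The slides illustrating these models survive only as images; as a stand-in, here is a small sketch (not from the course notes) of what zero- and first-order character models look like in code. The alphabet and the sample sentence are placeholder assumptions.

```python
import random
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # 26 letters plus space (assumed alphabet)
random.seed(0)

def zero_order(n):
    """Zero-order model: every symbol equally likely (the 'monkeys at typewriters' model)."""
    return "".join(random.choice(ALPHABET) for _ in range(n))

def first_order(sample_text, n):
    """First-order model: symbols drawn i.i.d. with the frequencies observed in sample_text."""
    counts = Counter(c for c in sample_text.lower() if c in ALPHABET)
    symbols = list(counts)
    weights = [counts[s] for s in symbols]
    return "".join(random.choices(symbols, weights=weights, k=n))

print(zero_order(60))
print(first_order("to be or not to be that is the question", 60))
# Higher-order models condition each character on the preceding one(s) instead of drawing i.i.d.
```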

Zeroth-Order Model 21

Definition of Entropy: Shannon used the ideas of randomness and entropy from thermodynamics to estimate the randomness (that is, the information content or entropy) of a process. Entropy is a measure of predictability or randomness. 23
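The formula itself is on slides that survive only as images here. In the standard form due to Shannon, a discrete source with symbol probabilities p_i has entropy H = sum_i p_i log2(1/p_i) = -sum_i p_i log2(p_i) bits per symbol. A minimal sketch:

```python
import math

def entropy(probs):
    """Shannon entropy H = sum p * log2(1/p), in bits; terms with p = 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))        # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))        # biased coin: about 0.469 bits
print(entropy([1.0]))             # certain outcome: 0 bits (no information)
```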

Entropy in a nutshell: Low entropy: the values (locations of soup) are sampled entirely from within the soup bowl. High entropy: the values (locations of soup) are unpredictable, almost uniformly sampled throughout our dining room. 24

Absolute certainty implies 0 information 25

Randomness has high information content 26

Quick Review: Working with Logarithms 27
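The logarithm review itself is on the slide images; the working fact used throughout the course is the change-of-base rule log2(x) = ln(x) / ln(2). A quick check (the value 26 is just an example):

```python
import math

x = 26.0
print(math.log(x) / math.log(2))   # change of base: ln(x) / ln(2)
print(math.log2(x))                # same value via the built-in base-2 log
print(math.log(x, 2))              # same value again, log with an explicit base
```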

Entropy of English Alphabet 30
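The table of letter probabilities is on the slide images. Two easy anchor points: if all 26 letters were equally likely, the entropy would be log2(26) ≈ 4.70 bits per letter, and any non-uniform letter distribution gives strictly less. An illustrative check follows; the skewed counts below are made up, not the English frequency table from the slides.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Upper bound: 26 equally likely letters.
print(entropy([1 / 26] * 26))          # log2(26) ≈ 4.700 bits per letter

# Hypothetical skewed letter counts (illustrative only, not real English data):
counts = [8, 2, 3, 4, 13, 2, 2, 6, 7, 1, 1, 4, 2, 7, 8, 2, 1, 6, 6, 9, 3, 1, 2, 1, 2, 1]
probs = [c / sum(counts) for c in counts]
print(entropy(probs))                  # strictly less than 4.700 bits per letter
```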

Kind of Intuitive, but hard to prove 36

Information content is bounded below by certainty (zero information) and above by complete uncertainty 37

Bounds on Entropy 39
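The bounds slide is an image; the standard statement is 0 <= H(X) <= log2(M) for a source with M symbols, with H = 0 for a certain outcome and H = log2(M) only for the uniform distribution. A quick numerical check (the middle distribution is an arbitrary illustrative choice):

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

M = 4
print(entropy([1.0, 0.0, 0.0, 0.0]))    # lower bound: 0 bits (complete certainty)
print(entropy([0.5, 0.25, 0.15, 0.1]))  # something in between
print(entropy([0.25] * M))              # upper bound: log2(4) = 2 bits (uniform)
```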

Math 495 Micro-Teaching Quick Review: JOINT DENSITY OF RANDOM VARIABLES David Sherman Bedrock, USA 40

In this presentation, we'll discuss the joint density of two random variables. This is a mathematical tool for representing the interdependence of two events. First, we need some random variables. Lots of those in Bedrock. 41

Let X be the number of days Fred Flintstone is late to work in a given week. Then X is a random variable; here is its density function:

N      1    2    3
F(N)  .5   .3   .2

Amazingly, another resident of Bedrock is late with exactly the same distribution. It's... Fred's boss, Mr. Slate! 42

N      1    2    3
F(N)  .5   .3   .2

Remember, this means that P(X=3) = .2. Let Y be the number of days when Slate is late. Suppose we want to record BOTH X and Y for a given week. How likely are different pairs? We're talking about the joint density of X and Y, and we record this information as a function of two variables, like this (columns are Fred's value X, rows are Slate's value Y):

        X=1   X=2   X=3
Y=1     .35   .10   .05
Y=2     .15   .10   .05
Y=3     .00   .10   .10

This means that P(X=3 and Y=2) = .05. We label it f(3,2). 43

N      1    2    3
F(N)  .5   .3   .2

        X=1   X=2   X=3
Y=1     .35   .10   .05
Y=2     .15   .10   .05
Y=3     .00   .10   .10
sum     .50   .30   .20

The first observation to make is that this joint probability function contains all the information from the density functions for X and Y (which are the same here). For example, to recover P(X=3), we can add f(3,1) + f(3,2) + f(3,3) = .05 + .05 + .10 = .2. The individual probability functions recovered in this way are called marginal. Another observation here is that Slate is never late three days in a week when Fred is only late once. 44
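A small sketch of the marginal computation just described, using the Fred/Slate joint table above (the code itself is illustrative and not part of the original presentation):

```python
# Joint density f(x, y) = P(X = x and Y = y) from the Fred/Slate table above.
joint = {
    (1, 1): .35, (2, 1): .10, (3, 1): .05,
    (1, 2): .15, (2, 2): .10, (3, 2): .05,
    (1, 3): .00, (2, 3): .10, (3, 3): .10,
}

# Recover the marginal P(X = 3) by summing over all values of Y.
p_x3 = sum(p for (x, y), p in joint.items() if x == 3)
print(round(p_x3, 2))   # 0.2, matching F(3) = .2 in Fred's own density
```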

N      1    2    3
F(N)  .5   .3   .2

Since he rides to work with Fred (at least until the directing career works out), Barney Rubble is late to work with the same probability function too. What do you think the joint probability function for Fred and Barney looks like?

           Fred=1   Fred=2   Fred=3
Barney=1     .50      .00      .00
Barney=2     .00      .30      .00
Barney=3     .00      .00      .20

It's diagonal! This should make sense, since in any week Fred and Barney are late the same number of days. This is, in some sense, a maximum amount of interaction: if you know one, you know the other. P(Barney late | Fred late) = 1. 45

N      1    2    3
F(N)  .5   .3   .2

A little-known fact: there is actually another famous person who is late to work like this. SPOCK! (Pretty embarrassing for a Vulcan.) Before you try to guess what the joint density function for Fred and Spock is, remember that Spock lives millions of miles (and years) from Fred, so we wouldn't expect these variables to influence each other at all. In fact, they're independent. 46

N      1    2    3
F(N)  .5   .3   .2

        X=1   X=2   X=3
Z=1     .25   .15   .10
Z=2     .15   .09   .06
Z=3     .10   .06   .04

Since we know the variables X and Z (for Spock) are independent, we can calculate each of the joint probabilities by multiplying. For example, f(2,3) = P(X=2 and Z=3) = P(X=2) P(Z=3) = (.3)(.2) = .06. This represents a minimal amount of interaction. P(Spock | Fred) = P(Spock). 47
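A quick sketch of the independence check (again illustrative code, not from the slides): build the Fred/Spock joint table by multiplying the marginals, and confirm that f(2,3) comes out to .06 as computed above.

```python
marginal = {1: .5, 2: .3, 3: .2}   # same density for Fred (X) and Spock (Z)

# For independent variables the joint density factors into the product of marginals.
joint = {(x, z): marginal[x] * marginal[z] for x in marginal for z in marginal}

print(round(joint[(2, 3)], 2))        # 0.06 = P(X=2) * P(Z=3) = (.3)(.2)
print(round(sum(joint.values()), 2))  # 1.0: the table is a valid joint density
```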

Dependence of two events means that knowledge of one gives information about the other. Now we've seen that the joint density of two variables is able to reveal that two events are independent (Fred and Spock), completely dependent (Fred and Barney), or somewhere in the middle (Fred and Slate). Later in the course we will learn ways to quantify dependence. Stay tuned. YABBA DABBA DOO! 48

Marginal Density Functions 53

Conditional Probability: the probability of event A given another event B is P(A | B) = P(A, B) / P(B). 54

Conditional Probability (cont'd): similarly, P(B | A) = P(A, B) / P(A). 55

Definition of conditional probability: If P(B) is not equal to zero, then the conditional probability of A relative to B, namely the probability of A given B, is
P(A | B) = P(A ∩ B) / P(B).
Equivalently, P(A ∩ B) = P(B) P(A | B), or P(A ∩ B) = P(A) P(B | A). 56

Conditional Probability example: suppose the region where only A occurs has probability 0.25, the overlap A ∩ B has probability 0.25, and the region where only B occurs has probability 0.45. Then:
P(A) = 0.25 + 0.25 = 0.50
P(B) = 0.45 + 0.25 = 0.70
P(A') = 1 - 0.50 = 0.50
P(B') = 1 - 0.70 = 0.30
P(A | B) = P(A ∩ B) / P(B) = 0.25 / 0.70 = 0.357
P(B | A) = P(A ∩ B) / P(A) = 0.25 / 0.50 = 0.50 57
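The same numbers can be verified in a few lines (illustrative code, not part of the slides):

```python
p_a_only, p_ab, p_b_only = 0.25, 0.25, 0.45   # regions of the Venn diagram above

p_a = p_a_only + p_ab          # 0.50
p_b = p_b_only + p_ab          # 0.70

print(round(p_ab / p_b, 3))    # P(A|B) = 0.25 / 0.70 ≈ 0.357
print(round(p_ab / p_a, 3))    # P(B|A) = 0.25 / 0.50 = 0.5
```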

Some observations: in the previous slides, the events "Fred late" and "Spock late" were independent; therefore P(Fred | Spock) = P(Fred) P(Spock) / P(Spock) = P(Fred). The events "Fred late" and "Barney late" are totally dependent: P(Fred | Barney) = 1. 58

Law of Total Probability: If B1, B2, ..., Bk are mutually exclusive events of which one must occur, then for any event A,
P(A) = P(B1) P(A | B1) + P(B2) P(A | B2) + ... + P(Bk) P(A | Bk).
Special case of the rule of total probability:
P(A) = P(B) P(A | B) + P(B') P(A | B'). 59
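A small numerical sketch of the two-event special case; the scenario and probabilities below are made-up illustrations, not taken from the slides.

```python
# Illustrative numbers: a bit is transmitted as 0 with probability 0.6 (event B)
# and as 1 with probability 0.4 (event B'); A is the event "receiver outputs 0".
p_b = 0.6
p_a_given_b = 0.9        # P(A | B): a transmitted 0 is received as 0 (assumed)
p_a_given_not_b = 0.2    # P(A | B'): a transmitted 1 is flipped into 0 (assumed)

p_a = p_b * p_a_given_b + (1 - p_b) * p_a_given_not_b
print(round(p_a, 2))     # P(A) = 0.6*0.9 + 0.4*0.2 = 0.62
```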

Bayes Theorem 60

Generalized Bayes theorem: If B1, B2, ..., Bk are mutually exclusive events of which one must occur, then
P(Bi | A) = P(Bi) P(A | Bi) / [ P(B1) P(A | B1) + P(B2) P(A | B2) + ... + P(Bk) P(A | Bk) ]
for i = 1, 2, ..., k. 62

Urn Problems: applications of Bayes' theorem. Begin to think about the concepts of maximum likelihood (ML) and MAP detection, which we will use throughout coding theory. 63
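The urn examples themselves are on the slide images; here is a hedged sketch of the kind of computation they lead to, with made-up urn contents. It applies the generalized Bayes theorem above and contrasts the ML choice (largest P(observation | urn)) with the MAP choice (largest posterior P(urn | observation)).

```python
# Hypothetical urns (illustrative numbers, not the ones from the lecture):
# (prior probability of choosing the urn, fraction of white balls inside).
urns = {
    "urn 1": (0.9, 0.3),   # chosen 90% of the time, 30% white balls
    "urn 2": (0.1, 0.6),   # chosen 10% of the time, 60% white balls
}

# Observation: a single white ball is drawn from the chosen urn.
likelihood = {name: white for name, (prior, white) in urns.items()}        # P(white | urn)
evidence = sum(prior * white for prior, white in urns.values())            # P(white), total probability
posterior = {name: urns[name][0] * likelihood[name] / evidence for name in urns}  # Bayes' theorem

print({name: round(p, 3) for name, p in posterior.items()})   # P(urn | white)
print("ML  picks:", max(likelihood, key=likelihood.get))      # ignores the priors -> urn 2
print("MAP picks:", max(posterior, key=posterior.get))        # weighs the priors  -> urn 1
```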

End of Notes 1 67