Information. = more information was provided by the outcome in #2
|
|
- Francine Gray
- 5 years ago
- Views:
Transcription
1 Outline First part based very loosely on [Abramson 63]. Information theory usually formulated in terms of information channels and coding will not discuss those here.. Information 2. Entropy 3. Mutual Information 4. Cross Entropy and Learning Mellon 2 IT tutorial, Roni Rosenfeld, 999 Definition of Information (After [Abramson 63]) Let E be some event which occurs with probability P(E). If we are told that E has occurred, then we say that we have received bits of information. I(E) = log 2 P(E) Base of log is unimportant will only change the units We ll stick with bits, and always assume base 2 Can also think of information as amount of surprise in E (e.g. P(E) =, P(E) = 0) Example: result of a fair coin flip (log 2 2 = bit) Example: result of a fair die roll (log bits) Mellon 4 IT tutorial, Roni Rosenfeld, 999 A Gentle Tutorial on Information Theory and Learning Roni Rosenfeld Information information knowledge Concerned with abstract possibilities, not their meaning information: reduction in uncertainty Mellon University Mellon Imagine: # you re about to observe the outcome of a coin flip #2 you re about to observe the outcome of a die roll There is more uncertainty in #2 Next:. You observed outcome of # uncertainty reduced to zero. 2. You observed outcome of #2 uncertainty reduced to zero. = more information was provided by the outcome in #2 Mellon 3 IT tutorial, Roni Rosenfeld, 999
2 Entropy Entropy as a Function of a Probability Distribution A Zero-memory information source S is a source that emits symbols from an alphabet {s, s 2,..., s k } with probabilities {p, p 2,..., p k }, respectively, where the symbols emitted are statistically independent. What is the average amount of information in observing the output of the source S? Since the source S is fully characterized byp = {p,... p k } (we don t care what the symbols s i actually are, or what they stand for), entropy can also be thought of as a property of a probability distribution function P: the avg uncertainty in the distribution. So we may also write: Call this Entropy: H(S) = H(P) = H(p, p 2,..., p k ) = i p i log p i H(S) = i p i I(s i ) = i p i log p i = E P [ log p(s) ] (Can be generalized to continuous distributions.) * Mellon 6 IT tutorial, Roni Rosenfeld, 999 Mellon 8 IT tutorial, Roni Rosenfeld, 999 Information is Additive I(k fair coin tosses) = log /2 k = k bits So: random word from a 00,000 word vocabulary: I(word) = log 00, 000 = 6.6 bits A 000 word document from same source: I(document) = 6,60 bits A 480x640 pixel, 6-greyscale video picture: I(picture) = 307,200 log6 =,228,800 bits = A (VGA) picture is worth (a lot more than) a 000 words! (In reality, both are gross overestimates.) Alternative Explanations of Entropy H(S) = i p i log p i. avg amt of info provided per symbol 2. avg amount of surprise when observing a symbol 3. uncertainty an observer has before seeing the symbol 4. avg # of bits needed to communicate each symbol (Shannon: there are codes that will communicate these symbols with efficiency arbitrarily close to H(S) bits/symbol; there are no codes that will do it with efficiency < H(S) bits/symbol) Mellon 5 IT tutorial, Roni Rosenfeld, 999 Mellon 7 IT tutorial, Roni Rosenfeld, 999
3 Special Case: k = 2 The Entropy of English Flipping a coin with P( head )=p, P( tail )=-p 27 characters (A-Z, space). H(p) = p log p + ( p) log p 00,000 words (avg 5.5 characters each) Assuming independence between successive characters: uniform character distribution: log 27 = 4.75 bits/character true character distribution: 4.03 bits/character Assuming independence between successive words: unifrom word distribution: log 00, 000/ bits/character true word distribution: 9.45/ bits/character Notice: True Entropy of English is much lower! zero uncertainty/information/surprise at edges maximum info at 0.5 ( bit) drops off quickly Mellon 0 IT tutorial, Roni Rosenfeld, 999 Mellon 2 IT tutorial, Roni Rosenfeld, 999 Properties of Entropy Special Case: k = 2 (cont.) Relates to: 20 questions game strategy (halving the space). H(P) = i. Non-negative: H(P) 0 p i log p i 2. Invariant wrt permutation of its inputs: H(p, p 2,..., p k ) = H(p τ(), p τ(2),..., p τ(k) ) So a sequence of (independent) 0 s-and- s can provide up to bit of information per digit, provided the 0 s and s are equally likely at any point. If they are not equally likely, the sequence provides less information and can be compressed. 3. For any other probability distribution {q, q 2,..., q k }: H(P) = i p i log p i < i p i log q i 4. H(P) log k, with equality iff p i = /k i 5. The further P is from uniform, the lower the entropy. Mellon 9 IT tutorial, Roni Rosenfeld, 999 Mellon IT tutorial, Roni Rosenfeld, 999
4 Joint Probability, Joint Entropy cold mild hot low high H(T) = H(0.3,0.5,0.2) = H(M) = H(0.6,0.4) = H(T) + H(M) = Joint Entropy: consider the space of (t, m) events H(T, M) = t,m P(T = t, M = m) log P(T=t,M=m) H(0.,0.4,0.,0.2,0.,0.) = Notice that H(T, M) < H(T) + H(M)!!! Mellon 4 IT tutorial, Roni Rosenfeld, 999 Conditional Probability, Conditional Entropy P(M = m T = t) cold mild hot low /3 4/5 /2 high 2/3 /5 / Conditional Entropy: H(M T = cold) = H(/3,2/3) = H(M T = mild) = H(4/5,/5) = H(M T = hot) = H(/2,/2) =.0 Average Conditional Entropy (aka Equivocation): H(M/T) = t P(T = t) H(M T = t) = 0.3 H(M T = cold) H(M T = mild) H(M T = hot) = How much is T telling us on average about M? H(M) H(M T) = bits Mellon 6 IT tutorial, Roni Rosenfeld, 999 Two Sources Temperature T: a random variable taking on values t P(T=hot)=0.3 P(T=mild)=0.5 P(T=cold)=0.2 = H(T)=H(0.3, 0.5, 0.2) = humidity M: a random variable, taking on values m P(M=low)=0.6 P(M=high)=0.4 = H(M) = H(0.6,0.4) = T, M not independent: P(T = t, M = m) P(T = t) P(M = m) Mellon 3 IT tutorial, Roni Rosenfeld, 999 Conditional Probability, Conditional Entropy Conditional Entropy: P(T = t M = m) cold mild hot low /6 4/6 /6.0 high 2/4 /4 /4.0 H(T M = low) = H(/6,4/6,/6) =.2563 H(T M = high) = H(2/4,/4,/4) =.5 Average Conditional Entropy (aka equivocation): H(T/M) = m P(M = m) H(T M = m) = 0.6 H(T M = low) H(T M = high) = How much is M telling us on average about T? H(T) H(T M) = bits Mellon 5 IT tutorial, Roni Rosenfeld, 999
5 Mutual Information Visualized A Markov Source Order-k Markov Source: A source that remembers the last k symbols emitted. Ie, the probability of emitting any symbol depends on the last k emitted symbols: P(s T=t s T=t, s T=t 2,..., s T=t k ) So the last k emitted symbols define a state, and there are q k states. First-order markov source: defined by qxq matrix: P(s i s j ) Example: S T=t is position after t random steps H(X, Y ) = H(X) + H(Y ) I(X; Y ) Mellon 8 IT tutorial, Roni Rosenfeld, 999 Mellon 20 IT tutorial, Roni Rosenfeld, 999 Average Mutual Information Three Sources I(X; Y ) = H(X) H(X/Y ) = P(x) log x P(x) P(x, y) log x,y P(x y) = P(x, y) log P(x y) x,y P(x) = P(x, y) P(x, y) log x,y P(x)P(y) Properties of Average Mutual Information: Symmetric (but H(X) H(Y ) and H(X/Y ) H(Y/X)) Non-negative (but H(X) H(X/y) may be negative!) Zero iff X, Y independent Additive (see next slide) Mellon 7 IT tutorial, Roni Rosenfeld, 999 From Blachman: ( / means given. ; means between., means and.) H(X, Y/Z) = H({X, Y } / Z) H(X/Y, Z) = H(X / {Y, Z}) I(X; Y/Z) = H(X/Z) H(X/Y, Z) I(X; Y ; Z) = I(X; Y ) I(X; Y/Z) = Can be negative! = H(X, Y, Z) H(X, Y ) H(X, Z) H(Y, Z) + H(X) + H(Y ) + H I(X; Y, Z) = I(X; Y ) + I(X; Z/Y ) (additivity) But: I(X; Y ) = 0,I(X; Z) = 0 doesn t mean I(X; Y, Z) = 0!!! Mellon 9 IT tutorial, Roni Rosenfeld, 999
6 Modeling an Arbitrary Source Source D(Y ) with unknown distribution P D (Y ) (recall H(P D ) = E P D [log P D (Y )] ) Goal: Model (approximate) with learned distribution P M (Y ) What s a good model P M (Y )?. RMS error over D s parameters but D is unknown! 2. Predictive Probability: Maximize the expected log-likelihood the model assigns to future data from D A Distance Measure Between Distributions Kullback-Liebler distance: KL(P D ; P M ) = CH(P D ; P M ) H(P D ) = E P D [log P D (Y ) P M (Y ) ] Properties of KL distance:. Non-negative. KL(P D ; P M ) = 0 P D = P M 2. Generally non-symmetric Mellon 22 IT tutorial, Roni Rosenfeld, 999 The following are equivalent:. Maximize Predictive Probability of P M for distribution D 2. Minimize Cross Entropy CH(P D ; P M ) 3. Minimize the distance KL(P D ; P M ) Mellon 24 IT tutorial, Roni Rosenfeld, 999 Approximating with a Markov Source Cross Entropy A non-markovian source can still be approximated by one. Examples: English characters: C = {c, c 2,...}. Uniform: H(C) = log27 = 4.75 bits/char 2. Assuming 0 memory: H(C) = H(0.86,0.064,0.027,...) = 4.03 bits/char 3. Assuming st order: H(C) = H(c i /c i ) = 3.32 bits/char 4. Assuming 2nd order: H(C) = H(c i /c i, c i 2 ) = 3. bits/char 5. Assuming large order: Shannon got down to bit/char M The following are equivalent: = argmax E D [log P M (Y )] M = argmin E D [log M P M (Y ) ] = CH(P D ; P M ) = Cross Entropy. Maximize Predictive Probability of P M 2. Minimize Cross Entropy CH(P D ; P M ) 3. Minimize the difference between P D and P M (in what sense?) Mellon 2 IT tutorial, Roni Rosenfeld, 999 Mellon 23 IT tutorial, Roni Rosenfeld, 999
A Gentle Tutorial on Information Theory and Learning. Roni Rosenfeld. Carnegie Mellon University
A Gentle Tutorial on Information Theory and Learning Roni Rosenfeld Mellon University Mellon Outline First part based very loosely on [Abramson 63]. Information theory usually formulated in terms of information
More informationIntroduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.
L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate
More informationDept. of Linguistics, Indiana University Fall 2015
L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission
More informationKotebe Metropolitan University Department of Computer Science and Technology Multimedia (CoSc 4151)
Kotebe Metropolitan University Department of Computer Science and Technology Multimedia (CoSc 4151) Chapter Three Multimedia Data Compression Part I: Entropy in ordinary words Claude Shannon developed
More informationLecture 7: DecisionTrees
Lecture 7: DecisionTrees What are decision trees? Brief interlude on information theory Decision tree construction Overfitting avoidance Regression trees COMP-652, Lecture 7 - September 28, 2009 1 Recall:
More informationClassification & Information Theory Lecture #8
Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing
More informationCOMPSCI 650 Applied Information Theory Jan 21, Lecture 2
COMPSCI 650 Applied Information Theory Jan 21, 2016 Lecture 2 Instructor: Arya Mazumdar Scribe: Gayane Vardoyan, Jong-Chyi Su 1 Entropy Definition: Entropy is a measure of uncertainty of a random variable.
More informationChapter 14. From Randomness to Probability. Copyright 2012, 2008, 2005 Pearson Education, Inc.
Chapter 14 From Randomness to Probability Copyright 2012, 2008, 2005 Pearson Education, Inc. Dealing with Random Phenomena A random phenomenon is a situation in which we know what outcomes could happen,
More informationCS 630 Basic Probability and Information Theory. Tim Campbell
CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)
More informationINTRODUCTION TO INFORMATION THEORY
INTRODUCTION TO INFORMATION THEORY KRISTOFFER P. NIMARK These notes introduce the machinery of information theory which is a eld within applied mathematics. The material can be found in most textbooks
More information3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H.
Appendix A Information Theory A.1 Entropy Shannon (Shanon, 1948) developed the concept of entropy to measure the uncertainty of a discrete random variable. Suppose X is a discrete random variable that
More informationInformation in Biology
Information in Biology CRI - Centre de Recherches Interdisciplinaires, Paris May 2012 Information processing is an essential part of Life. Thinking about it in quantitative terms may is useful. 1 Living
More informationChapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye
Chapter 2: Entropy and Mutual Information Chapter 2 outline Definitions Entropy Joint entropy, conditional entropy Relative entropy, mutual information Chain rules Jensen s inequality Log-sum inequality
More informationLecture 5 - Information theory
Lecture 5 - Information theory Jan Bouda FI MU May 18, 2012 Jan Bouda (FI MU) Lecture 5 - Information theory May 18, 2012 1 / 42 Part I Uncertainty and entropy Jan Bouda (FI MU) Lecture 5 - Information
More informationInformation Theory. Coding and Information Theory. Information Theory Textbooks. Entropy
Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is
More informationQuantitative Biology II Lecture 4: Variational Methods
10 th March 2015 Quantitative Biology II Lecture 4: Variational Methods Gurinder Singh Mickey Atwal Center for Quantitative Biology Cold Spring Harbor Laboratory Image credit: Mike West Summary Approximate
More informationEE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018
Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code
More informationInformation in Biology
Lecture 3: Information in Biology Tsvi Tlusty, tsvi@unist.ac.kr Living information is carried by molecular channels Living systems I. Self-replicating information processors Environment II. III. Evolve
More information6.02 Fall 2012 Lecture #1
6.02 Fall 2012 Lecture #1 Digital vs. analog communication The birth of modern digital communication Information and entropy Codes, Huffman coding 6.02 Fall 2012 Lecture 1, Slide #1 6.02 Fall 2012 Lecture
More informationInformation Theory Primer:
Information Theory Primer: Entropy, KL Divergence, Mutual Information, Jensen s inequality Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationSTA Module 4 Probability Concepts. Rev.F08 1
STA 2023 Module 4 Probability Concepts Rev.F08 1 Learning Objectives Upon completing this module, you should be able to: 1. Compute probabilities for experiments having equally likely outcomes. 2. Interpret
More informationOutline. 1. Define likelihood 2. Interpretations of likelihoods 3. Likelihood plots 4. Maximum likelihood 5. Likelihood ratio benchmarks
Outline 1. Define likelihood 2. Interpretations of likelihoods 3. Likelihood plots 4. Maximum likelihood 5. Likelihood ratio benchmarks Likelihood A common and fruitful approach to statistics is to assume
More informationInformation Theory and Hypothesis Testing
Summer School on Game Theory and Telecommunications Campione, 7-12 September, 2014 Information Theory and Hypothesis Testing Mauro Barni University of Siena September 8 Review of some basic results linking
More informationAn instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1
Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,
More informationLecture 1: Introduction, Entropy and ML estimation
0-704: Information Processing and Learning Spring 202 Lecture : Introduction, Entropy and ML estimation Lecturer: Aarti Singh Scribes: Min Xu Disclaimer: These notes have not been subjected to the usual
More information3F1 Information Theory, Lecture 1
3F1 Information Theory, Lecture 1 Jossy Sayir Department of Engineering Michaelmas 2013, 22 November 2013 Organisation History Entropy Mutual Information 2 / 18 Course Organisation 4 lectures Course material:
More informationLecture 11: Information theory THURSDAY, FEBRUARY 21, 2019
Lecture 11: Information theory DANIEL WELLER THURSDAY, FEBRUARY 21, 2019 Agenda Information and probability Entropy and coding Mutual information and capacity Both images contain the same fraction of black
More informationComputing and Communications 2. Information Theory -Entropy
1896 1920 1987 2006 Computing and Communications 2. Information Theory -Entropy Ying Cui Department of Electronic Engineering Shanghai Jiao Tong University, China 2017, Autumn 1 Outline Entropy Joint entropy
More informationExercises with solutions (Set D)
Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationChapter I: Fundamental Information Theory
ECE-S622/T62 Notes Chapter I: Fundamental Information Theory Ruifeng Zhang Dept. of Electrical & Computer Eng. Drexel University. Information Source Information is the outcome of some physical processes.
More informationNoisy channel communication
Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 6 Communication channels and Information Some notes on the noisy channel setup: Iain Murray, 2012 School of Informatics, University
More informationCh. 8 Math Preliminaries for Lossy Coding. 8.5 Rate-Distortion Theory
Ch. 8 Math Preliminaries for Lossy Coding 8.5 Rate-Distortion Theory 1 Introduction Theory provide insight into the trade between Rate & Distortion This theory is needed to answer: What do typical R-D
More informationMachine Learning. Lecture 02.2: Basics of Information Theory. Nevin L. Zhang
Machine Learning Lecture 02.2: Basics of Information Theory Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology Nevin L. Zhang
More informationInformation Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18
Information Theory David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 18 A Measure of Information? Consider a discrete random variable
More informationBioinformatics: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA Model Building/Checking, Reverse Engineering, Causality Outline 1 Bayesian Interpretation of Probabilities 2 Where (or of what)
More informationIntroduction to Information Theory
Introduction to Information Theory Gurinder Singh Mickey Atwal atwal@cshl.edu Center for Quantitative Biology Kullback-Leibler Divergence Summary Shannon s coding theorems Entropy Mutual Information Multi-information
More informationLecture 2: August 31
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy
More informationELEMENT OF INFORMATION THEORY
History Table of Content ELEMENT OF INFORMATION THEORY O. Le Meur olemeur@irisa.fr Univ. of Rennes 1 http://www.irisa.fr/temics/staff/lemeur/ October 2010 1 History Table of Content VERSION: 2009-2010:
More informationProbability Theory for Machine Learning. Chris Cremer September 2015
Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares
More informationInformation Theory, Statistics, and Decision Trees
Information Theory, Statistics, and Decision Trees Léon Bottou COS 424 4/6/2010 Summary 1. Basic information theory. 2. Decision trees. 3. Information theory and statistics. Léon Bottou 2/31 COS 424 4/6/2010
More informationWhy should you care?? Intellectual curiosity. Gambling. Mathematically the same as the ESP decision problem we discussed in Week 4.
I. Probability basics (Sections 4.1 and 4.2) Flip a fair (probability of HEADS is 1/2) coin ten times. What is the probability of getting exactly 5 HEADS? What is the probability of getting exactly 10
More informationBasic Probability. Introduction
Basic Probability Introduction The world is an uncertain place. Making predictions about something as seemingly mundane as tomorrow s weather, for example, is actually quite a difficult task. Even with
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training
More information1. Basics of Information
1. Basics of Information 6.004x Computation Structures Part 1 Digital Circuits Copyright 2015 MIT EECS 6.004 Computation Structures L1: Basics of Information, Slide #1 What is Information? Information,
More informationGambling and Information Theory
Gambling and Information Theory Giulio Bertoli UNIVERSITEIT VAN AMSTERDAM December 17, 2014 Overview Introduction Kelly Gambling Horse Races and Mutual Information Some Facts Shannon (1948): definitions/concepts
More informationCS4705. Probability Review and Naïve Bayes. Slides from Dragomir Radev
CS4705 Probability Review and Naïve Bayes Slides from Dragomir Radev Classification using a Generative Approach Previously on NLP discriminative models P C D here is a line with all the social media posts
More informationGod doesn t play dice. - Albert Einstein
ECE 450 Lecture 1 God doesn t play dice. - Albert Einstein As far as the laws of mathematics refer to reality, they are not certain; as far as they are certain, they do not refer to reality. Lecture Overview
More informationChapter Learning Objectives. Random Experiments Dfiii Definition: Dfiii Definition:
Chapter 2: Probability 2-1 Sample Spaces & Events 2-1.1 Random Experiments 2-1.2 Sample Spaces 2-1.3 Events 2-1 1.4 Counting Techniques 2-2 Interpretations & Axioms of Probability 2-3 Addition Rules 2-4
More informationX = X X n, + X 2
CS 70 Discrete Mathematics for CS Fall 2003 Wagner Lecture 22 Variance Question: At each time step, I flip a fair coin. If it comes up Heads, I walk one step to the right; if it comes up Tails, I walk
More informationDiscrete Structures for Computer Science
Discrete Structures for Computer Science William Garrison bill@cs.pitt.edu 6311 Sennott Square Lecture #24: Probability Theory Based on materials developed by Dr. Adam Lee Not all events are equally likely
More informationOutline. 1. Define likelihood 2. Interpretations of likelihoods 3. Likelihood plots 4. Maximum likelihood 5. Likelihood ratio benchmarks
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More information2. Probability. Chris Piech and Mehran Sahami. Oct 2017
2. Probability Chris Piech and Mehran Sahami Oct 2017 1 Introduction It is that time in the quarter (it is still week one) when we get to talk about probability. Again we are going to build up from first
More informationNoisy-Channel Coding
Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/05264298 Part II Noisy-Channel Coding Copyright Cambridge University Press 2003.
More informationCS 361: Probability & Statistics
February 19, 2018 CS 361: Probability & Statistics Random variables Markov s inequality This theorem says that for any random variable X and any value a, we have A random variable is unlikely to have an
More informationHidden Markov Models. Hosein Mohimani GHC7717
Hidden Markov Models Hosein Mohimani GHC7717 hoseinm@andrew.cmu.edu Fair et Casino Problem Dealer flips a coin and player bets on outcome Dealer use either a fair coin (head and tail equally likely) or
More informationQB LECTURE #4: Motif Finding
QB LECTURE #4: Motif Finding Adam Siepel Nov. 20, 2015 2 Plan for Today Probability models for binding sites Scoring and detecting binding sites De novo motif finding 3 Transcription Initiation Chromatin
More informationHidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:
Hidden Markov Models Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: www.ioalgorithms.info Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm
More informationHIDDEN MARKOV MODELS
HIDDEN MARKOV MODELS Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm
More informationDecision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag
Decision Trees Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Supervised Learning Input: labelled training data i.e., data plus desired output Assumption:
More informationExpectation Maximization
Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /
More informationWhat is a random variable
OKAN UNIVERSITY FACULTY OF ENGINEERING AND ARCHITECTURE MATH 256 Probability and Random Processes 04 Random Variables Fall 20 Yrd. Doç. Dr. Didem Kivanc Tureli didemk@ieee.org didem.kivanc@okan.edu.tr
More informationFundamentals of Probability CE 311S
Fundamentals of Probability CE 311S OUTLINE Review Elementary set theory Probability fundamentals: outcomes, sample spaces, events Outline ELEMENTARY SET THEORY Basic probability concepts can be cast in
More informationSome Basic Concepts of Probability and Information Theory: Pt. 2
Some Basic Concepts of Probability and Information Theory: Pt. 2 PHYS 476Q - Southern Illinois University January 22, 2018 PHYS 476Q - Southern Illinois University Some Basic Concepts of Probability and
More informationCS546:Learning and NLP Lec 4: Mathematical and Computational Paradigms
CS546:Learning and NLP Lec 4: Mathematical and Computational Paradigms Spring 2009 February 3, 2009 Lecture Some note on statistics Bayesian decision theory Concentration bounds Information theory Introduction
More informationInformation Theory. Week 4 Compressing streams. Iain Murray,
Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 4 Compressing streams Iain Murray, 2014 School of Informatics, University of Edinburgh Jensen s inequality For convex functions: E[f(x)]
More informationProbabilistic and Bayesian Machine Learning
Probabilistic and Bayesian Machine Learning Lecture 1: Introduction to Probabilistic Modelling Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Why a
More informationQuantitative Biology Lecture 3
23 nd Sep 2015 Quantitative Biology Lecture 3 Gurinder Singh Mickey Atwal Center for Quantitative Biology Summary Covariance, Correlation Confounding variables (Batch Effects) Information Theory Covariance
More informationCommunication Theory II
Communication Theory II Lecture 15: Information Theory (cont d) Ahmed Elnakib, PhD Assistant Professor, Mansoura University, Egypt March 29 th, 2015 1 Example: Channel Capacity of BSC o Let then: o For
More information4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information
4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information Ramji Venkataramanan Signal Processing and Communications Lab Department of Engineering ramji.v@eng.cam.ac.uk
More informationCSC 411 Lecture 3: Decision Trees
CSC 411 Lecture 3: Decision Trees Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 03-Decision Trees 1 / 33 Today Decision Trees Simple but powerful learning
More informationTopics. Probability Theory. Perfect Secrecy. Information Theory
Topics Probability Theory Perfect Secrecy Information Theory Some Terms (P,C,K,E,D) Computational Security Computational effort required to break cryptosystem Provable Security Relative to another, difficult
More information1 Introduction. Sept CS497:Learning and NLP Lec 4: Mathematical and Computational Paradigms Fall Consider the following examples:
Sept. 20-22 2005 1 CS497:Learning and NLP Lec 4: Mathematical and Computational Paradigms Fall 2005 In this lecture we introduce some of the mathematical tools that are at the base of the technical approaches
More information6.02 Fall 2011 Lecture #9
6.02 Fall 2011 Lecture #9 Claude E. Shannon Mutual information Channel capacity Transmission at rates up to channel capacity, and with asymptotically zero error 6.02 Fall 2011 Lecture 9, Slide #1 First
More informationApplication of Information Theory, Lecture 7. Relative Entropy. Handout Mode. Iftach Haitner. Tel Aviv University.
Application of Information Theory, Lecture 7 Relative Entropy Handout Mode Iftach Haitner Tel Aviv University. December 1, 2015 Iftach Haitner (TAU) Application of Information Theory, Lecture 7 December
More informationClassical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006
Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006 Fabio Grazioso... July 3, 2006 1 2 Contents 1 Lecture 1, Entropy 4 1.1 Random variable...............................
More informationComputational Systems Biology: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#8:(November-08-2010) Cancer and Signals Outline 1 Bayesian Interpretation of Probabilities Information Theory Outline Bayesian
More informationSection 13.3 Probability
288 Section 13.3 Probability Probability is a measure of how likely an event will occur. When the weather forecaster says that there will be a 50% chance of rain this afternoon, the probability that it
More informationLecture 14 February 28
EE/Stats 376A: Information Theory Winter 07 Lecture 4 February 8 Lecturer: David Tse Scribe: Sagnik M, Vivek B 4 Outline Gaussian channel and capacity Information measures for continuous random variables
More informationCoding of memoryless sources 1/35
Coding of memoryless sources 1/35 Outline 1. Morse coding ; 2. Definitions : encoding, encoding efficiency ; 3. fixed length codes, encoding integers ; 4. prefix condition ; 5. Kraft and Mac Millan theorems
More informationLecture 11: Continuous-valued signals and differential entropy
Lecture 11: Continuous-valued signals and differential entropy Biology 429 Carl Bergstrom September 20, 2008 Sources: Parts of today s lecture follow Chapter 8 from Cover and Thomas (2007). Some components
More informationLecture 18: Quantum Information Theory and Holevo s Bound
Quantum Computation (CMU 1-59BB, Fall 2015) Lecture 1: Quantum Information Theory and Holevo s Bound November 10, 2015 Lecturer: John Wright Scribe: Nicolas Resch 1 Question In today s lecture, we will
More informationMonty Hall Puzzle. Draw a tree diagram of possible choices (a possibility tree ) One for each strategy switch or no-switch
Monty Hall Puzzle Example: You are asked to select one of the three doors to open. There is a large prize behind one of the doors and if you select that door, you win the prize. After you select a door,
More informationMAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006)
MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK SATELLITE COMMUNICATION DEPT./SEM.:ECE/VIII UNIT V PART-A 1. What is binary symmetric channel (AUC DEC 2006) 2. Define information rate? (AUC DEC 2007)
More informationSDS 321: Introduction to Probability and Statistics
SDS 321: Introduction to Probability and Statistics Lecture 2: Conditional probability Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin www.cs.cmu.edu/ psarkar/teaching
More informationShannon s Noisy-Channel Coding Theorem
Shannon s Noisy-Channel Coding Theorem Lucas Slot Sebastian Zur February 2015 Abstract In information theory, Shannon s Noisy-Channel Coding Theorem states that it is possible to communicate over a noisy
More informationNatural Image Statistics and Neural Representations
Natural Image Statistics and Neural Representations Michael Lewicki Center for the Neural Basis of Cognition & Department of Computer Science Carnegie Mellon University? 1 Outline 1. Information theory
More informationUniversity of California, Berkeley, Statistics 134: Concepts of Probability. Michael Lugo, Spring Exam 1
University of California, Berkeley, Statistics 134: Concepts of Probability Michael Lugo, Spring 2011 Exam 1 February 16, 2011, 11:10 am - 12:00 noon Name: Solutions Student ID: This exam consists of seven
More informationProbability Review Lecturer: Ji Liu Thank Jerry Zhu for sharing his slides
Probability Review Lecturer: Ji Liu Thank Jerry Zhu for sharing his slides slide 1 Inference with Bayes rule: Example In a bag there are two envelopes one has a red ball (worth $100) and a black ball one
More informationSeries 7, May 22, 2018 (EM Convergence)
Exercises Introduction to Machine Learning SS 2018 Series 7, May 22, 2018 (EM Convergence) Institute for Machine Learning Dept. of Computer Science, ETH Zürich Prof. Dr. Andreas Krause Web: https://las.inf.ethz.ch/teaching/introml-s18
More informationLecture 6: Entropy Rate
Lecture 6: Entropy Rate Entropy rate H(X) Random walk on graph Dr. Yao Xie, ECE587, Information Theory, Duke University Coin tossing versus poker Toss a fair coin and see and sequence Head, Tail, Tail,
More informationMAT Mathematics in Today's World
MAT 1000 Mathematics in Today's World Last Time We discussed the four rules that govern probabilities: 1. Probabilities are numbers between 0 and 1 2. The probability an event does not occur is 1 minus
More informationSequences and Information
Sequences and Information Rahul Siddharthan The Institute of Mathematical Sciences, Chennai, India http://www.imsc.res.in/ rsidd/ Facets 16, 04/07/2016 This box says something By looking at the symbols
More informationThe enumeration of all possible outcomes of an experiment is called the sample space, denoted S. E.g.: S={head, tail}
Random Experiment In random experiments, the result is unpredictable, unknown prior to its conduct, and can be one of several choices. Examples: The Experiment of tossing a coin (head, tail) The Experiment
More informationAn introduction to basic information theory. Hampus Wessman
An introduction to basic information theory Hampus Wessman Abstract We give a short and simple introduction to basic information theory, by stripping away all the non-essentials. Theoretical bounds on
More informationMAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A
MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK DEPARTMENT: ECE SEMESTER: IV SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A 1. What is binary symmetric channel (AUC DEC
More informationlog 2 N I m m log 2 N + 1 m.
SOPHOMORE COLLEGE MATHEMATICS OF THE INFORMATION AGE SHANNON S THEOREMS Let s recall the fundamental notions of information and entropy. To repeat, Shannon s emphasis is on selecting a given message from
More informationDecision Trees. Tirgul 5
Decision Trees Tirgul 5 Using Decision Trees It could be difficult to decide which pet is right for you. We ll find a nice algorithm to help us decide what to choose without having to think about it. 2
More informationProblems from Probability and Statistical Inference (9th ed.) by Hogg, Tanis and Zimmerman.
Math 224 Fall 2017 Homework 1 Drew Armstrong Problems from Probability and Statistical Inference (9th ed.) by Hogg, Tanis and Zimmerman. Section 1.1, Exercises 4,5,6,7,9,12. Solutions to Book Problems.
More informationCh. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited
Ch. 8 Math Preliminaries for Lossy Coding 8.4 Info Theory Revisited 1 Info Theory Goals for Lossy Coding Again just as for the lossless case Info Theory provides: Basis for Algorithms & Bounds on Performance
More information