Information Theory Primer:
|
|
- Britton Warren
- 5 years ago
- Views:
Transcription
1 Information Theory Primer: Entropy, KL Divergence, Mutual Information, Jensen s inequality Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr seungjin 1 / 27
2 Outline Entropy (Shannon entropy, differential entropy, and conditional entropy) Kullback-Leibler (KL) divergence Mutual information Jensen s inequality and Gibb s inequality 2 / 27
3 Information Theory Information theory answers two fundamental questions in communication theory What is the ultimate data compression? entropy H. What is the ultimate transmission rate of communication? channel capacity C. In the early 1940 s, it was thought that increasing the transmission rate of information over a communication channel increased the probability of error This is not true. Shannon surprised the communication theory community by proving that this was not true as long as the communication rate was below the channel capacity. Although information theory was developed for communications, it is also important to explain ecological theory of sensory processing. Information theory plays a key role in elucidating the goal of unsupervised learning. 3 / 27
4 Shannon Entopy 4 / 27
5 Information and Entropy Information can be thought of as surprise, uncertainty, or unexpectedness. Mathematically it is defined by I = log p i, where p i is the probability that the event labelled i occurs. The rare event gives large information and frequent event produces small information. Entropy is average information, i.e., N H = E [I ] = p i log p i. i=1 5 / 27
6 Example: Horse Race Suppose we have a horse race with eight horses taking part. Assume that the probabilities of winning for the eight horse are ( 1 2, 1 4, 1 8, 1 16, 1 64, 1 64, 1 64, 1 64 ). Suppose that we wish to send a message to another person indicating which horse won the race. How many bits are required to describe this for each of the horses? 3 bits for any of the horses? 6 / 27
7 No! The win probabilities are not uniform. It makes sense to use shorter descriptions for the more probable horses and longer descriptions for the less probable ones so that we achieve a lower average description length. For example, we can use the following strings to represent the eight horses: 0, 10, 110, 1110, , , , The average description length in this case is 2 bits as opposed to 3 bits for the uniform code. We calculate the entropy: H = 1 2 log 2 = 2 bits log log log log 2 The entropy of a random variable is a lower bound on the average number of bits required to represent the random variables and also on the average number of questions needed to identify the variable in a game of twenty questions / 27
8 Shannon Entropy Given a discrete random variable X with image X, (Shannon) Entropy is the average information (a measure of uncertainty) that is defined by H(X ) = x X p(x) log p(x) = E p [ log p(x)]. Properties H(X ) 0. (since each term in the summation is nonnegative) H(X ) = 0 if and only if P[X = x] = 1 for some x X. The entropy is maximal if all the outcomes are equally likely. 8 / 27
9 Differential Entropy Given a continuous random variable X, Differential entropy is defined as H(p) = p(x) log p(x)dx = E p [log p(x)]. Properties It can be negative. Given a fixed variance, Gaussian distribution achieves the maximal differential entropy. For x N (µ, σ 2 ), H(x) = 1 2 log(2πeσ2 ). For x N (µ, Σ), H(x) = 1 log det (2πeΣ). 2 9 / 27
10 Conditional Entropy Given two discrete random variables X and Y with images X and Y, respectively, we expand the joint entropy H(X, Y ) = ( ) 1 p(x, y) log p(x, y) x X y Y = ( ) 1 p(x, y) log p(x)p(y x) x X y Y = ( ) 1 p(x, y) log + ( ) 1 p(x, y) log p(x) p(y x) x X y Y x X y Y = ( ) 1 p(x) log + p(x) ( ) 1 p(y x) log p(x) p(y x) x X x X y Y = H(X ) + p(x)h(y X = x). x X }{{} H(Y X ) Chain rule: H(X, Y ) = H(X ) + H(Y X ). 10 / 27
11 H(X, Y ) H(X ) + H(Y ) H(Y X ) H(Y ) Try to prove these by yourself! 11 / 27
12 KL Divergence 12 / 27
13 Relative Entropy (Kullback-Leibler Divergence) Introduced by Solomon Kullback and Richard Leibler in A measure of how one probability distribution q diverges from the other p D KL [p q] (KL divergence of q(x) from p(x)) Discrete probability distributions p and q: D KL [p q] = x p(x) log p(x) q(x). Probability distributions p and q of continuous random variables: D KL [p q] = p(x) log p(x) q(x) dx. Properties of KL divergence Divergence is not symmetric: DKL [p q] D KL [q p]. Divergence is always nonnegative: DKL [p q] 0 (Gibb s inequality). Divergence is a convex function on the domain of probability distributions. 13 / 27
14 Theorem (Convexity of divergence) Let p 1, q 1, and p 2, q 2, be probability distributions over a random variable X and λ (0, 1) define p = λp 1 + (1 λ)p 2, q = λq 1 + (1 λ)q 2. Then, D KL [p q] λd KL [p 1 q 1 ] + (1 λ)d KL [p 2 q 2 ]. Proof. It is deferred to the end of this lecture. 14 / 27
15 Entropy and Divergence The entropy of a random variable X with a probability distribution p(x) is related to how much p(x) diverges from the uniform distribution on the support of X. H(X ) = ( ) 1 p(x) log p(x) x X = ( ) X p(x) log p(x) X x X = log X ( ) p(x) p(x) log 1 x X X = log X D KL [p unif]. The more p(x) diverges the lesser its entropy and vice versa. 15 / 27
16 Recall D KL [p q] = p(x) log p(x) q(x) dx. Characterizing KL divergence If p and q are high, we are happy If p is high but q isn t, we pay a price If p is low, we do not care D KL = 0, then distributions are equal 16 / 27
17 17 / 27
18 18 / 27
19 KL Divergence of Two Gaussians Two univariate Gaussians (x R) p(x) = N (µ1, σ1) 2 and q(x) = N (µ 2, σ2) 2 Calculated as D KL [p q] = p(x) log p(x) q(x) dx = p(x) log p(x)dx p(x) log q(x)dx = 1 σ (µ 2 µ 1) 2 + log σ1 1 2 σ2 2 2 σ2 2 σ 2 2. Two multivariate Gaussians (x R D ) p(x) = N (µ1, Σ 1) and q(x) = N (µ 2, Σ 2) Calculated as D KL [p q] = 1 [ ( ) tr Σ 1 2 Σ 1 + (µ 2 2 µ 1 ) Σ 1 2 (µ 2 µ 1 ) D + log Σ2 ]. Σ 1 19 / 27
20 Mutual Information 20 / 27
21 Mutual Information Mutual information is the relative entropy between the joint distribution and the product of marginal distributions, I (x, y) = [ ] p(x, y) p(x, y) log p(x)p(y) x X y Y = D KL [p(x, y) p(x)p(y)] [ ]} p(x, y) = E p(x,y) {log. p(x)p(y) Mutual information can be interpreted as the reduction in the uncertainty of x due to the knowledge of y, i.e., I (x, y) = H(x) H(x y), where H(x y) = E p(x,y) [log p(x y)] is the conditional entropy 21 / 27
22 Convexity, Jensen s inequality, and Gibb s inequality 22 / 27
23 Convex Sets and Functions Definition (Convex Sets) Let C be a subset of R m. C is called a convex set if αx + (1 α)y C, x, y C, α [0, 1] Definition (Convex Function) Let C be a convex subset of R m. A function f : C R is called a convex function if f (αx + (1 α)y) αf (x) + (1 α)f (y) x, y C, α [0, 1] 23 / 27
24 Jensen s Inequality Theorem (Jensen s Inequality) If f (x) is a convex function and x is a random vector, then E [f (x)] f (E [x]). Note: Jensen s inequality can also be rewritten for a concave function, with the direction of the inequality reversed. 24 / 27
25 Proof of Jensen s Inequality ( N Need to show that N i=1 p if (x i ) f i=1 p ix i ). The proof is based on the recursion, working from the right-hand side of this equation. ( N ) ( ) N f p i x i = f p 1x 1 + p i x i i=1 and so forth. i=2 [ N ] ( N ) ( ) i=2 p 1f (x 1) + p i f p ix i N i=2 i=2 p choose α = p1 N i i=1 p i [ N ] ( N )} i=3 p 1f (x 1) + p i {αf (x 2) + (1 α)f p ix i N i=2 i=3 p i ( ) choose α = p2 N i=2 p i ( N ) N i=3 = p 1f (x 1) + p 2f (x 2) + p i f p ix i N i=3 p, i i=3 25 / 27
26 Gibb s Inequality Theorem D KL [p q] 0 with equality iff p = q. Proof: Consider the Kullback-Leibler divergence for discrete distributions: D KL [p q] = i p i log p i q i = p i log q i p i i [ ] q i log p i p i i [ ] = log q i = 0. i (by Jensen s inequality) 26 / 27
27 More on Gibb s Inequality In order to find the distribution p which minimizes D KL [p q], we consider a Lagrangian ( E = D KL [p q] + λ 1 ) p i = ( p i log p i + λ 1 ) p i. q i i i i Compute the partial derivative E p k and set to zero, E p k = log p k log q k + 1 λ = 0, which leads to p k = q k e λ 1. It follows from i p i = 1 that i q ie λ 1 = 1, which leads to λ = 1. Therefore p i = q i. The Hessian, 2 E = 1 p i 2 p i, 2 E p i p j = 0, is positive definite, which shows that p i = q i is a genuine minimum. 27 / 27
Expectation Maximization
Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /
More informationIntroduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.
L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate
More informationDept. of Linguistics, Indiana University Fall 2015
L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission
More informationLecture 5 - Information theory
Lecture 5 - Information theory Jan Bouda FI MU May 18, 2012 Jan Bouda (FI MU) Lecture 5 - Information theory May 18, 2012 1 / 42 Part I Uncertainty and entropy Jan Bouda (FI MU) Lecture 5 - Information
More informationMachine Learning Srihari. Information Theory. Sargur N. Srihari
Information Theory Sargur N. Srihari 1 Topics 1. Entropy as an Information Measure 1. Discrete variable definition Relationship to Code Length 2. Continuous Variable Differential Entropy 2. Maximum Entropy
More informationLecture 2: August 31
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy
More informationLecture 1: Introduction, Entropy and ML estimation
0-704: Information Processing and Learning Spring 202 Lecture : Introduction, Entropy and ML estimation Lecturer: Aarti Singh Scribes: Min Xu Disclaimer: These notes have not been subjected to the usual
More information3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H.
Appendix A Information Theory A.1 Entropy Shannon (Shanon, 1948) developed the concept of entropy to measure the uncertainty of a discrete random variable. Suppose X is a discrete random variable that
More informationCOMPSCI 650 Applied Information Theory Jan 21, Lecture 2
COMPSCI 650 Applied Information Theory Jan 21, 2016 Lecture 2 Instructor: Arya Mazumdar Scribe: Gayane Vardoyan, Jong-Chyi Su 1 Entropy Definition: Entropy is a measure of uncertainty of a random variable.
More informationMachine Learning. Lecture 02.2: Basics of Information Theory. Nevin L. Zhang
Machine Learning Lecture 02.2: Basics of Information Theory Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology Nevin L. Zhang
More information5 Mutual Information and Channel Capacity
5 Mutual Information and Channel Capacity In Section 2, we have seen the use of a quantity called entropy to measure the amount of randomness in a random variable. In this section, we introduce several
More informationChapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye
Chapter 2: Entropy and Mutual Information Chapter 2 outline Definitions Entropy Joint entropy, conditional entropy Relative entropy, mutual information Chain rules Jensen s inequality Log-sum inequality
More informationComputing and Communications 2. Information Theory -Entropy
1896 1920 1987 2006 Computing and Communications 2. Information Theory -Entropy Ying Cui Department of Electronic Engineering Shanghai Jiao Tong University, China 2017, Autumn 1 Outline Entropy Joint entropy
More informationECE 4400:693 - Information Theory
ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential
More informationCS 630 Basic Probability and Information Theory. Tim Campbell
CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)
More informationThe binary entropy function
ECE 7680 Lecture 2 Definitions and Basic Facts Objective: To learn a bunch of definitions about entropy and information measures that will be useful through the quarter, and to present some simple but
More informationHomework 1 Due: Thursday 2/5/2015. Instructions: Turn in your homework in class on Thursday 2/5/2015
10-704 Homework 1 Due: Thursday 2/5/2015 Instructions: Turn in your homework in class on Thursday 2/5/2015 1. Information Theory Basics and Inequalities C&T 2.47, 2.29 (a) A deck of n cards in order 1,
More informationHands-On Learning Theory Fall 2016, Lecture 3
Hands-On Learning Theory Fall 016, Lecture 3 Jean Honorio jhonorio@purdue.edu 1 Information Theory First, we provide some information theory background. Definition 3.1 (Entropy). The entropy of a discrete
More informationEntropies & Information Theory
Entropies & Information Theory LECTURE I Nilanjana Datta University of Cambridge,U.K. See lecture notes on: http://www.qi.damtp.cam.ac.uk/node/223 Quantum Information Theory Born out of Classical Information
More informationInformation Theory. Coding and Information Theory. Information Theory Textbooks. Entropy
Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is
More informationInformation Theory and Communication
Information Theory and Communication Ritwik Banerjee rbanerjee@cs.stonybrook.edu c Ritwik Banerjee Information Theory and Communication 1/8 General Chain Rules Definition Conditional mutual information
More informationComputational Systems Biology: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#8:(November-08-2010) Cancer and Signals Outline 1 Bayesian Interpretation of Probabilities Information Theory Outline Bayesian
More informationCS 591, Lecture 2 Data Analytics: Theory and Applications Boston University
CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University Charalampos E. Tsourakakis January 25rd, 2017 Probability Theory The theory of probability is a system for making better guesses.
More information4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information
4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information Ramji Venkataramanan Signal Processing and Communications Lab Department of Engineering ramji.v@eng.cam.ac.uk
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationInformation Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18
Information Theory David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 18 A Measure of Information? Consider a discrete random variable
More informationLecture 22: Final Review
Lecture 22: Final Review Nuts and bolts Fundamental questions and limits Tools Practical algorithms Future topics Dr Yao Xie, ECE587, Information Theory, Duke University Basics Dr Yao Xie, ECE587, Information
More information1.6. Information Theory
48. INTRODUCTION Section 5.6 Exercise.7 (b) First solve the inference problem of determining the conditional density p(t x), and then subsequently marginalize to find the conditional mean given by (.89).
More informationInformation in Biology
Lecture 3: Information in Biology Tsvi Tlusty, tsvi@unist.ac.kr Living information is carried by molecular channels Living systems I. Self-replicating information processors Environment II. III. Evolve
More informationLogistic Regression. Seungjin Choi
Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationLecture 14 February 28
EE/Stats 376A: Information Theory Winter 07 Lecture 4 February 8 Lecturer: David Tse Scribe: Sagnik M, Vivek B 4 Outline Gaussian channel and capacity Information measures for continuous random variables
More informationBayes Decision Theory
Bayes Decision Theory Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 16
More informationLECTURE 2. Convexity and related notions. Last time: mutual information: definitions and properties. Lecture outline
LECTURE 2 Convexity and related notions Last time: Goals and mechanics of the class notation entropy: definitions and properties mutual information: definitions and properties Lecture outline Convexity
More informationInformation in Biology
Information in Biology CRI - Centre de Recherches Interdisciplinaires, Paris May 2012 Information processing is an essential part of Life. Thinking about it in quantitative terms may is useful. 1 Living
More informationChapter 8: Differential entropy. University of Illinois at Chicago ECE 534, Natasha Devroye
Chapter 8: Differential entropy Chapter 8 outline Motivation Definitions Relation to discrete entropy Joint and conditional differential entropy Relative entropy and mutual information Properties AEP for
More informationIntroduction to Machine Learning
Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationConditional Independence and Factorization
Conditional Independence and Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationDEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY
DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain
More informationLecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157
Lecture 6: Gaussian Channels Copyright G. Caire (Sample Lectures) 157 Differential entropy (1) Definition 18. The (joint) differential entropy of a continuous random vector X n p X n(x) over R is: Z h(x
More informationFinite Automata. Seungjin Choi
Finite Automata Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 28 Outline
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationChapter 2 Review of Classical Information Theory
Chapter 2 Review of Classical Information Theory Abstract This chapter presents a review of the classical information theory which plays a crucial role in this thesis. We introduce the various types of
More informationECE Information theory Final
ECE 776 - Information theory Final Q1 (1 point) We would like to compress a Gaussian source with zero mean and variance 1 We consider two strategies In the first, we quantize with a step size so that the
More information3F1 Information Theory, Lecture 1
3F1 Information Theory, Lecture 1 Jossy Sayir Department of Engineering Michaelmas 2013, 22 November 2013 Organisation History Entropy Mutual Information 2 / 18 Course Organisation 4 lectures Course material:
More informationMachine learning - HT Maximum Likelihood
Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce
More informationINTRODUCTION TO INFORMATION THEORY
INTRODUCTION TO INFORMATION THEORY KRISTOFFER P. NIMARK These notes introduce the machinery of information theory which is a eld within applied mathematics. The material can be found in most textbooks
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationNeural coding Ecological approach to sensory coding: efficient adaptation to the natural environment
Neural coding Ecological approach to sensory coding: efficient adaptation to the natural environment Jean-Pierre Nadal CNRS & EHESS Laboratoire de Physique Statistique (LPS, UMR 8550 CNRS - ENS UPMC Univ.
More informationNoisy-Channel Coding
Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/05264298 Part II Noisy-Channel Coding Copyright Cambridge University Press 2003.
More informationLecture 5 Channel Coding over Continuous Channels
Lecture 5 Channel Coding over Continuous Channels I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw November 14, 2014 1 / 34 I-Hsiang Wang NIT Lecture 5 From
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationInformation Theory in Intelligent Decision Making
Information Theory in Intelligent Decision Making Adaptive Systems and Algorithms Research Groups School of Computer Science University of Hertfordshire, United Kingdom June 7, 2015 Information Theory
More informationECE598: Information-theoretic methods in high-dimensional statistics Spring 2016
ECE598: Information-theoretic methods in high-dimensional statistics Spring 06 Lecture : Mutual Information Method Lecturer: Yihong Wu Scribe: Jaeho Lee, Mar, 06 Ed. Mar 9 Quick review: Assouad s lemma
More informationPart I. Entropy. Information Theory and Networks. Section 1. Entropy: definitions. Lecture 5: Entropy
and Networks Lecture 5: Matthew Roughan http://www.maths.adelaide.edu.au/matthew.roughan/ Lecture_notes/InformationTheory/ Part I School of Mathematical Sciences, University
More informationApplication of Information Theory, Lecture 7. Relative Entropy. Handout Mode. Iftach Haitner. Tel Aviv University.
Application of Information Theory, Lecture 7 Relative Entropy Handout Mode Iftach Haitner Tel Aviv University. December 1, 2015 Iftach Haitner (TAU) Application of Information Theory, Lecture 7 December
More informationIndependent Component Analysis (ICA)
Independent Component Analysis (ICA) Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationthe Information Bottleneck
the Information Bottleneck Daniel Moyer December 10, 2017 Imaging Genetics Center/Information Science Institute University of Southern California Sorry, no Neuroimaging! (at least not presented) 0 Instead,
More informationNonnegative Matrix Factorization
Nonnegative Matrix Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationClassification & Information Theory Lecture #8
Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing
More informationMathematical Preliminaries
Mathematical Preliminaries Economics 3307 - Intermediate Macroeconomics Aaron Hedlund Baylor University Fall 2013 Econ 3307 (Baylor University) Mathematical Preliminaries Fall 2013 1 / 25 Outline I: Sequences
More informationInformation. = more information was provided by the outcome in #2
Outline First part based very loosely on [Abramson 63]. Information theory usually formulated in terms of information channels and coding will not discuss those here.. Information 2. Entropy 3. Mutual
More informationA Gentle Tutorial on Information Theory and Learning. Roni Rosenfeld. Carnegie Mellon University
A Gentle Tutorial on Information Theory and Learning Roni Rosenfeld Mellon University Mellon Outline First part based very loosely on [Abramson 63]. Information theory usually formulated in terms of information
More informationComplex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity
Complex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity Eckehard Olbrich MPI MiS Leipzig Potsdam WS 2007/08 Olbrich (Leipzig) 26.10.2007 1 / 18 Overview 1 Summary
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More informationInformation Theory. Week 4 Compressing streams. Iain Murray,
Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 4 Compressing streams Iain Murray, 2014 School of Informatics, University of Edinburgh Jensen s inequality For convex functions: E[f(x)]
More informationIntroduction to Statistical Learning Theory
Introduction to Statistical Learning Theory In the last unit we looked at regularization - adding a w 2 penalty. We add a bias - we prefer classifiers with low norm. How to incorporate more complicated
More informationNoisy channel communication
Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 6 Communication channels and Information Some notes on the noisy channel setup: Iain Murray, 2012 School of Informatics, University
More informationx log x, which is strictly convex, and use Jensen s Inequality:
2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and
More informationInformation Theory. M1 Informatique (parcours recherche et innovation) Aline Roumy. January INRIA Rennes 1/ 73
1/ 73 Information Theory M1 Informatique (parcours recherche et innovation) Aline Roumy INRIA Rennes January 2018 Outline 2/ 73 1 Non mathematical introduction 2 Mathematical introduction: definitions
More informationMGMT 69000: Topics in High-dimensional Data Analysis Falll 2016
MGMT 69000: Topics in High-dimensional Data Analysis Falll 2016 Lecture 14: Information Theoretic Methods Lecturer: Jiaming Xu Scribe: Hilda Ibriga, Adarsh Barik, December 02, 2016 Outline f-divergence
More informationQuantitative Biology II Lecture 4: Variational Methods
10 th March 2015 Quantitative Biology II Lecture 4: Variational Methods Gurinder Singh Mickey Atwal Center for Quantitative Biology Cold Spring Harbor Laboratory Image credit: Mike West Summary Approximate
More informationOutline of the Lecture. Background and Motivation. Basics of Information Theory: 1. Introduction. Markku Juntti. Course Overview
: Markku Juntti Overview The basic ideas and concepts of information theory are introduced. Some historical notes are made and the overview of the course is given. Source The material is mainly based on
More informationStatistical Machine Learning Lectures 4: Variational Bayes
1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference
More informationQuiz 2 Date: Monday, November 21, 2016
10-704 Information Processing and Learning Fall 2016 Quiz 2 Date: Monday, November 21, 2016 Name: Andrew ID: Department: Guidelines: 1. PLEASE DO NOT TURN THIS PAGE UNTIL INSTRUCTED. 2. Write your name,
More informationBioinformatics: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA Model Building/Checking, Reverse Engineering, Causality Outline 1 Bayesian Interpretation of Probabilities 2 Where (or of what)
More informationLecture 17: Differential Entropy
Lecture 17: Differential Entropy Differential entropy AEP for differential entropy Quantization Maximum differential entropy Estimation counterpart of Fano s inequality Dr. Yao Xie, ECE587, Information
More informationIntroduction to Machine Learning
What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes
More informationExample: Letter Frequencies
Example: Letter Frequencies i a i p i 1 a 0.0575 2 b 0.0128 3 c 0.0263 4 d 0.0285 5 e 0.0913 6 f 0.0173 7 g 0.0133 8 h 0.0313 9 i 0.0599 10 j 0.0006 11 k 0.0084 12 l 0.0335 13 m 0.0235 14 n 0.0596 15 o
More informationExample: Letter Frequencies
Example: Letter Frequencies i a i p i 1 a 0.0575 2 b 0.0128 3 c 0.0263 4 d 0.0285 5 e 0.0913 6 f 0.0173 7 g 0.0133 8 h 0.0313 9 i 0.0599 10 j 0.0006 11 k 0.0084 12 l 0.0335 13 m 0.0235 14 n 0.0596 15 o
More informationInformation Theory, Statistics, and Decision Trees
Information Theory, Statistics, and Decision Trees Léon Bottou COS 424 4/6/2010 Summary 1. Basic information theory. 2. Decision trees. 3. Information theory and statistics. Léon Bottou 2/31 COS 424 4/6/2010
More informationLecture 18: Quantum Information Theory and Holevo s Bound
Quantum Computation (CMU 1-59BB, Fall 2015) Lecture 1: Quantum Information Theory and Holevo s Bound November 10, 2015 Lecturer: John Wright Scribe: Nicolas Resch 1 Question In today s lecture, we will
More informationQB LECTURE #4: Motif Finding
QB LECTURE #4: Motif Finding Adam Siepel Nov. 20, 2015 2 Plan for Today Probability models for binding sites Scoring and detecting binding sites De novo motif finding 3 Transcription Initiation Chromatin
More informationECE 587 / STA 563: Lecture 2 Measures of Information Information Theory Duke University, Fall 2017
ECE 587 / STA 563: Lecture 2 Measures of Information Information Theory Duke University, Fall 207 Author: Galen Reeves Last Modified: August 3, 207 Outline of lecture: 2. Quantifying Information..................................
More informationLecture 1: September 25, A quick reminder about random variables and convexity
Information and Coding Theory Autumn 207 Lecturer: Madhur Tulsiani Lecture : September 25, 207 Administrivia This course will cover some basic concepts in information and coding theory, and their applications
More informationInformation theory and decision tree
Information theory and decision tree Jianxin Wu LAMDA Group National Key Lab for Novel Software Technology Nanjing University, China wujx2001@gmail.com June 14, 2018 Contents 1 Prefix code and Huffman
More informationLECTURE 3. Last time:
LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate
More informationSeries 7, May 22, 2018 (EM Convergence)
Exercises Introduction to Machine Learning SS 2018 Series 7, May 22, 2018 (EM Convergence) Institute for Machine Learning Dept. of Computer Science, ETH Zürich Prof. Dr. Andreas Krause Web: https://las.inf.ethz.ch/teaching/introml-s18
More informationInformation Theory: Entropy, Markov Chains, and Huffman Coding
The University of Notre Dame A senior thesis submitted to the Department of Mathematics and the Glynn Family Honors Program Information Theory: Entropy, Markov Chains, and Huffman Coding Patrick LeBlanc
More informationEntropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information
Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information 1 Conditional entropy Let (Ω, F, P) be a probability space, let X be a RV taking values in some finite set A. In this lecture
More informationMedical Imaging. Norbert Schuff, Ph.D. Center for Imaging of Neurodegenerative Diseases
Uses of Information Theory in Medical Imaging Norbert Schuff, Ph.D. Center for Imaging of Neurodegenerative Diseases Norbert.schuff@ucsf.edu With contributions from Dr. Wang Zhang Medical Imaging Informatics,
More informationRevision of Lecture 5
Revision of Lecture 5 Information transferring across channels Channel characteristics and binary symmetric channel Average mutual information Average mutual information tells us what happens to information
More informationH(X) = plog 1 p +(1 p)log 1 1 p. With a slight abuse of notation, we denote this quantity by H(p) and refer to it as the binary entropy function.
LECTURE 2 Information Measures 2. ENTROPY LetXbeadiscreterandomvariableonanalphabetX drawnaccordingtotheprobability mass function (pmf) p() = P(X = ), X, denoted in short as X p(). The uncertainty about
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More information1 Introduction to information theory
1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through
More informationWeek 3: The EM algorithm
Week 3: The EM algorithm Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2005 Mixtures of Gaussians Data: Y = {y 1... y N } Latent
More informationLecture 8: Channel Capacity, Continuous Random Variables
EE376A/STATS376A Information Theory Lecture 8-02/0/208 Lecture 8: Channel Capacity, Continuous Random Variables Lecturer: Tsachy Weissman Scribe: Augustine Chemparathy, Adithya Ganesh, Philip Hwang Channel
More informationVariable selection and feature construction using methods related to information theory
Outline Variable selection and feature construction using methods related to information theory Kari 1 1 Intelligent Systems Lab, Motorola, Tempe, AZ IJCNN 2007 Outline Outline 1 Information Theory and
More information