The Information Bottleneck
Transcription
1 The Information Bottleneck. Daniel Moyer, December 10, 2017. Imaging Genetics Center / Information Sciences Institute, University of Southern California
2 Sorry, no Neuroimaging! (at least not presented)
3 Instead, a deep dive into a relevant topic.
4 Instead, a medium dive into a relevant topic.
5 Rate-Distortion Theory
6 Sending a signal
7 Sending a signal. Send: X = NEURON
8 Sending a signal over a binary wire. Send: X = NEURON
9 Sending a signal. Naïve encoding: Len = 6 · log₂(alphabet size) symbols. Send: X = NEURON, Rec: X̂ = NEURON
10 Sending a signal. Lossless (Huffman) encoding: Len = 6 × 3.75 = 22.5 symbols. Send: X = NEURON, Rec: X̂ = NEURON
11 Sending a signal. (Example) Lossy encoding: Len = 17.5 symbols. Send: X = NEURON, Rec: X̂ = NERON
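To make the lossless case concrete, here is a minimal Python sketch that builds a Huffman code for an assumed toy letter distribution (the probabilities below are illustrative, not the slide's exact numbers) and compares the resulting average codeword length to the entropy bound:

```python
import heapq
from math import log2

def huffman_code(probs):
    """Build a binary Huffman code for a {symbol: probability} dict."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Prepend the branch bit to every codeword in each merged subtree.
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

# Toy source distribution (illustrative only).
probs = {"N": 0.3, "E": 0.25, "U": 0.2, "R": 0.15, "O": 0.1}
code = huffman_code(probs)
entropy = -sum(p * log2(p) for p in probs.values())
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(code)
print(f"entropy = {entropy:.3f} bits/symbol, Huffman average = {avg_len:.3f} bits/symbol")
```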
12 Choosing a lossy code. What makes a good encoding? (According to Claude Shannon) Low rate: we want short messages. Low distortion: we want our messages to still make sense. Objective: learn the best encoding $p : X \to \hat{X}$. Clearly this also depends on a measure of distortion $d : X \times \hat{X} \to \mathbb{R}_+$.
13 Preliminaries. Entropy (description length): $H(X) = E_{p(x)}[-\log p(x)] = -\sum_x p(x)\log p(x)$. Mutual information (transmission rate; technically a bound): $I(X;Y) = E_{p(x,y)}\!\left[\log\frac{p(x,y)}{p(x)p(y)}\right] = H(X) - H(X\mid Y) = H(Y) - H(Y\mid X)$, i.e. the change in description length.
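A small numerical illustration of these two definitions, computed from an assumed 2×2 joint distribution (the table below is made up for illustration):

```python
import numpy as np

def entropy(p):
    """H = -sum p log2 p, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(pxy):
    """I(X;Y) from a joint probability table pxy[i, j] = p(x_i, y_j)."""
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

# Hypothetical joint distribution over a 2x2 alphabet.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
print("H(X)   =", entropy(pxy.sum(axis=1)))
print("I(X;Y) =", mutual_information(pxy))
# Sanity check of I(X;Y) = H(X) - H(X|Y), with H(X|Y) = H(X,Y) - H(Y).
print("H(X) - H(X|Y) =",
      entropy(pxy.sum(axis=1)) - (entropy(pxy.ravel()) - entropy(pxy.sum(axis=0))))
```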
14 Rate Distortion Theory. What makes a good encoding? (According to Claude Shannon) Low rate: we want short messages. Low distortion: we want our messages to still make sense. Objective: $\min_{p(\hat{x}\mid x)} I(X;\hat{X})$ subject to $E[d(X,\hat{X})] \le D$, given $d : X \times \hat{X} \to \mathbb{R}_+$ and $D$.
15 Rate Distortion Theory. Rate Distortion Theorem (Shannon and Kolmogorov). Define the function $R(D) = \min_{p(\hat{x}\mid x)\,:\,E[d(x,\hat{x})]\le D} I(X;\hat{X})$ as the minimum achievable rate under distortion constraint $D$. Then an encoding that achieves this rate is $p(\hat{x}\mid x) = \dfrac{p(\hat{x})\exp[-\beta d(x,\hat{x})]}{Z(x,\beta)}$.
16–17 Rate Distortion Theory. Sketch: how did we get there?
(1) $R(D) = \min_{p(\hat{x}\mid x)\,:\,E[d(x,\hat{x})]\le D} I(X;\hat{X})$ (original problem)
(2) $F[p(\hat{x}\mid x)] = I(X;\hat{X}) + \beta\, E_{p(\hat{x},x)}[d(x,\hat{x})]$ (Lagrange multipliers)
(3) $\dfrac{\delta F}{\delta p(\hat{x}\mid x)} = 0$ (optimality condition)
(4) $0 = p(x)\left[\log\dfrac{p(\hat{x}\mid x)}{p(\hat{x})} + \beta d(x,\hat{x}) + \dfrac{\lambda(x)}{p(x)}\right]$
$\Rightarrow \log\dfrac{p(\hat{x}\mid x)}{p(\hat{x})} = -\beta d(x,\hat{x}) - \dfrac{\lambda(x)}{p(x)}$
$\Rightarrow p(\hat{x}\mid x) = p(\hat{x})\exp(-\beta d(x,\hat{x}))\exp\!\left(-\dfrac{\lambda(x)}{p(x)}\right)$
18 Rate Distortion Theory. Sketch: how did we get there? 1. We have one unknown, $p(\hat{x}\mid x)$, for our desideratum: minimize $I(X;\hat{X})$ over $p(\hat{x}\mid x)$. 2. We have two constraints: $E[d(x,\hat{x})] \le D$ and $\sum_{\hat{x}} p(\hat{x}\mid x) = 1$. 3. We can relax the first constraint and form a functional $F[p(\hat{x}\mid x)] = I(X;\hat{X}) + \beta\, E_{p(\hat{x},x)}[d(x,\hat{x})]$. 4. We know about Lagrange multipliers.
19 Computational Solutions. Blahut–Arimoto Algorithm: iterate these two updates,
$p_{t+1}(\hat{x}) = \sum_x p(x)\, p_t(\hat{x}\mid x)$
$p_{t+1}(\hat{x}\mid x) = \dfrac{p_t(\hat{x})\exp[-\beta d(x,\hat{x})]}{Z(x,\beta)}$
Further, we are guaranteed convergence, as $F$ is convex.
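A minimal sketch of the Blahut–Arimoto iteration above, applied to an assumed binary source with Hamming distortion (the source, distortion matrix, and β values are illustrative, not from the slides):

```python
import numpy as np

def blahut_arimoto(px, d, beta, n_iters=200):
    """Blahut-Arimoto for rate-distortion: px is p(x), d[i, j] = d(x_i, xhat_j)."""
    n, m = d.shape
    p_xhat = np.full(m, 1.0 / m)                 # p_t(xhat), start uniform
    for _ in range(n_iters):
        # p_{t+1}(xhat | x) proportional to p_t(xhat) * exp(-beta * d(x, xhat))
        q = p_xhat[None, :] * np.exp(-beta * d)  # shape (n, m)
        q /= q.sum(axis=1, keepdims=True)        # normalize by Z(x, beta)
        # p_{t+1}(xhat) = sum_x p(x) p_{t+1}(xhat | x)
        p_xhat = px @ q
    rate = np.sum(px[:, None] * q * np.log2(q / p_xhat[None, :]))
    distortion = np.sum(px[:, None] * q * d)
    return q, rate, distortion

# Toy example (assumed): uniform binary source, Hamming distortion.
px = np.array([0.5, 0.5])
d = np.array([[0.0, 1.0],
              [1.0, 0.0]])
for beta in (0.5, 2.0, 8.0):
    _, R, D = blahut_arimoto(px, d, beta)
    print(f"beta={beta:4.1f}  rate={R:.3f} bits  distortion={D:.3f}")
```

Sweeping β traces out the rate–distortion curve: small β favors low rate and high distortion, large β the reverse.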
21 Information Bottleneck
22 Bottleneck Slides. X = Weather in Florida, Y = Price of Oranges.
23 Bottleneck Slides. X̂ = the most relevant parts of X. X = Weather in Florida, Y = Price of Oranges.
24 More Preliminaries. Entropy (description length) under $X \sim p(x)$: $H(X) = E_{p(x)}[-\log p(x)] = -\sum_x p(x)\log p(x)$. Mutual information (transmission-rate bound): $I(X;Y) = E_{p(x,y)}\!\left[\log\frac{p(x,y)}{p(x)p(y)}\right]$. Kullback–Leibler divergence (misspecified description length): $D_{KL}[p\,\|\,q] = E_{p(x)}[\log p(x) - \log q(x)] = \underbrace{E_p[-\log q(x)]}_{\text{cross-entropy}} - H(X)$.
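The cross-entropy identity for the KL divergence is easy to check numerically; the distributions p and q below are illustrative:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL[p || q] = sum_x p(x) (log p(x) - log q(x)), in bits."""
    mask = p > 0
    return float(np.sum(p[mask] * (np.log2(p[mask]) - np.log2(q[mask]))))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
cross_entropy = -np.sum(p * np.log2(q))   # E_p[-log q(x)]
entropy_p = -np.sum(p * np.log2(p))       # H(p)
print("D_KL[p||q]           =", kl_divergence(p, q))
print("cross-entropy - H(p) =", cross_entropy - entropy_p)
```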
25 Information Bottleneck. What makes a good encoding? (With respect to Y) Low rate: we want short messages. High relevance: we want our messages to be relevant to some outside variable Y. Objective: given $L$, minimize $I(X;\hat{X})$ over $p(\hat{x}\mid x)$ subject to $I(\hat{X};Y) > L$.
26 Bottleneck Slides. The Information Bottleneck (Tishby, Pereira, and Bialek). Define the function $R(L) = \min_{p(\hat{x}\mid x)\,:\,I(\hat{X};Y)\ge L} I(X;\hat{X})$ as the minimum achievable rate while preserving $L$ bits of mutual information. Then an encoding that achieves this rate has the form $p(\hat{x}\mid x) = \dfrac{p(\hat{x})}{Z(x,\beta)}\exp\!\big(-\beta\, D_{KL}[\,p(y\mid x)\,\|\,p(y\mid\hat{x})\,]\big)$.
27 Information Bottleneck. Sketch: how did we get there? 1. We have one unknown, $p(\hat{x}\mid x)$, for our desideratum: minimize $I(X;\hat{X})$ over $p(\hat{x}\mid x)$. 2. We have three constraints: $I(\hat{X};Y) \ge L$, $\sum_{\hat{x}} p(\hat{x}\mid x) = 1$, and $\sum_y p(y\mid\hat{x}) = 1$. 3. We can relax the first constraint and form a functional $F[p(\hat{x}\mid x)] = I(X;\hat{X}) - \beta\, I(\hat{X};Y)$. 4. We know about Lagrange multipliers.
28 Bottleneck Computation. Information Bottleneck Algorithm: iterate these three updates,
$p_{t+1}(\hat{x}\mid x) = \dfrac{p_t(\hat{x})}{Z(x,\beta)}\exp\!\big(-\beta\, D_{KL}[\,p(y\mid x)\,\|\,p_t(y\mid\hat{x})\,]\big)$
$p_{t+1}(\hat{x}) = \sum_x p(x)\, p_t(\hat{x}\mid x)$
$p_{t+1}(y\mid\hat{x}) = \sum_x p(y\mid x)\, p_t(x\mid\hat{x})$
In general this will only converge locally, but we have a bound on the amount of information about Y still in X but not in $\hat{X}$, given by $I(X;Y\mid\hat{X})$.
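A sketch of these three self-consistent updates on an assumed tiny joint table p(x, y) (the table, cluster count, and β below are illustrative; a more careful implementation would also guard against zero probabilities):

```python
import numpy as np

def iterative_ib(pxy, beta, n_clusters, n_iters=300, seed=0):
    """Self-consistent IB iterations for a joint table pxy[i, j] = p(x_i, y_j)."""
    rng = np.random.default_rng(seed)
    n_x, n_y = pxy.shape
    px = pxy.sum(axis=1)
    py_given_x = pxy / px[:, None]                      # p(y | x)
    # Random soft initialization of the encoder p(xhat | x).
    q = rng.random((n_x, n_clusters))
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        p_xhat = np.maximum(px @ q, 1e-12)              # p(xhat), floored to avoid 0/0
        # p(y | xhat) = sum_x p(y | x) p(x | xhat)
        p_x_given_xhat = (q * px[:, None]) / p_xhat[None, :]
        py_given_xhat = p_x_given_xhat.T @ py_given_x   # shape (n_clusters, n_y)
        # D_KL[p(y|x) || p(y|xhat)] for every (x, xhat) pair.
        log_ratio = np.log(py_given_x[:, None, :] / py_given_xhat[None, :, :])
        dkl = np.sum(py_given_x[:, None, :] * log_ratio, axis=2)
        # p(xhat | x) proportional to p(xhat) * exp(-beta * D_KL)
        q = p_xhat[None, :] * np.exp(-beta * dkl)
        q /= q.sum(axis=1, keepdims=True)
    return q, py_given_xhat

# Tiny synthetic joint distribution (assumed): 4 x-values, 2 y-values, two natural groups.
pxy = np.array([[0.20, 0.05],
                [0.18, 0.07],
                [0.05, 0.20],
                [0.07, 0.18]])
q, py_xh = iterative_ib(pxy, beta=5.0, n_clusters=2)
print(np.round(q, 3))   # encoder: rows of x map softly onto the 2 clusters
```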
29 Bottleneck parameter (figure). Not shown: optimization trajectories.
30 Similar in effect to clustering (recovering a partition of X), but: no guarantee of disentangled representations; we optimize over $p(\hat{x}\mid x)$; relevance! It has similar problems to solve (e.g. choosing the size of the encoding $\hat{X}$).
31 Multivariate Bottleneck
32 Multivariate IB. Preliminaries: $I(X_1,\dots,X_n) = D_{KL}[P(X_1,\dots,X_n)\,\|\,P(X_1)\cdots P(X_n)] = E_P\!\left[\log\frac{P(X_1,\dots,X_n)}{P(X_1)\cdots P(X_n)}\right]$. "P consistent with a DAG G" ($P \models G$) means $P(X_1,\dots,X_n) = \prod_i P(X_i \mid \mathrm{Parents}_G(X_i))$.
33 Multivariate IB. If $P \models G$ then $I(X_1,\dots,X_n) = \sum_i I(X_i;\mathrm{Parents}_G(X_i))$. In general, $D_{KL}(P\,\|\,P^G) = \sum_i I(X_i;\mathrm{NotParents}_G(X_i)\mid\mathrm{Parents}_G(X_i)) = I(X_1,\dots,X_n) - I^G$.
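The decomposition above can be verified numerically for a small assumed chain X1 → X2 → X3 (all tables below are made up for illustration; the joint is built to be consistent with the DAG, so the two quantities agree exactly):

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mi(pab):
    """Mutual information from a 2-D joint table."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab.ravel())

# Hypothetical chain X1 -> X2 -> X3 with binary variables.
p1 = np.array([0.6, 0.4])                     # p(x1)
p2_1 = np.array([[0.8, 0.2], [0.3, 0.7]])     # p(x2 | x1)
p3_2 = np.array([[0.9, 0.1], [0.2, 0.8]])     # p(x3 | x2)

# Full joint p(x1, x2, x3) = p(x1) p(x2|x1) p(x3|x2), consistent with G by construction.
joint = p1[:, None, None] * p2_1[:, :, None] * p3_2[None, :, :]

# Multi-information: sum of marginal entropies minus the joint entropy.
multi_info = H(joint.sum(axis=(1, 2))) + H(joint.sum(axis=(0, 2))) \
           + H(joint.sum(axis=(0, 1))) - H(joint.ravel())
# Sum over nodes of I(X_i; Parents(X_i)): here I(X2;X1) + I(X3;X2).
decomp = mi(joint.sum(axis=2)) + mi(joint.sum(axis=0))
print(f"I(X1,X2,X3) = {multi_info:.6f}   sum_i I(Xi; Parents) = {decomp:.6f}")
```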
34 Multivariate IB (figure): the input graph $G_{in}$ and the output graph $G_{out}$, both over X, Y, and $\hat{X}$. Using $L = (1+\gamma)\,I(G_{in}) - \gamma\,I(G_{out})$, this reproduces the regular IB.
35 Multivariate IB (figure): input and output graphs over X, Y, and two bottleneck variables $T_1$, $T_2$.
36 Scaling this might be hard.
37 Modern Bottleneck
38 Deep Variational IB. Alemi et al. 2016; very similar to Achille & Soatto.
39–42 Deep Variational IB. Main ideas of Alemi et al. / Achille & Soatto 2016: 1. Deep networks are great function approximators. 2. We want to optimize $p(\hat{x}\mid x)$, but propagating error past the stochastic loss $L(x,y,p) = p(y\mid\hat{x})\,p(\hat{x}\mid x)$ is hard. 3. Using technology from variational autoencoders, we can propagate derivatives from $p(y\mid\hat{x})$ back to $p(\hat{x}\mid x)$ (the re-parameterization trick of Kingma & Welling 2014). 4. Problems: calculating mutual information is actually quite difficult; using an independent KDE estimator here is perhaps not optimal.
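A heavily simplified sketch of point 3, the re-parameterized VIB objective: assumed random linear maps stand in for the deep encoder and decoder of Alemi et al., and the β value is arbitrary. Nothing here is trained; it only shows the sampling step and the two loss terms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and parameters; a real VIB uses learned deep networks.
d_x, d_z, n_classes = 10, 2, 3
W_mu, W_sig = rng.normal(size=(d_z, d_x)), rng.normal(size=(d_z, d_x))
W_dec = rng.normal(size=(n_classes, d_z))
beta = 1e-2

def vib_forward(x, y):
    """One stochastic forward pass of a VIB-style objective (sketch, not a trained model)."""
    # Encoder p(z | x): diagonal Gaussian with parameters produced from x.
    mu = W_mu @ x
    sigma = np.exp(0.5 * (W_sig @ x))            # positive std via a log-variance head
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so gradients could flow through mu and sigma despite the sampling step.
    eps = rng.normal(size=d_z)
    z = mu + sigma * eps
    # Decoder p(y | z): softmax classifier on the bottleneck code.
    logits = W_dec @ z
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    nll = -log_probs[y]
    # Closed-form KL between N(mu, diag(sigma^2)) and the N(0, I) prior.
    kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))
    return nll + beta * kl, nll, kl

loss, nll, kl = vib_forward(rng.normal(size=d_x), y=1)
print(f"loss={loss:.3f}  nll={nll:.3f}  kl={kl:.3f}")
```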
43 Other recent developments
44 Connections to Deep Learning Stolen directly from Tishby and Zaslavsky
45 Main Points of Tishby. Claims: 1. To learn is to forget irrelevant information. 2. The layers of a deep network are (iteratively) applying a bottleneck principle. 3. The final layers should hopefully have only relevant information. 4. (Not in the paper) Backprop training produces a learning pattern w.r.t. the bottleneck objectives.
46 Counter-arguments (under review): abstract from an anonymous ICLR submission this year.
47 Shannon's Warning
48 Other recent developments
Reference: Tishby, N., Pereira, F. C., and Bialek, W. (2000). The information bottleneck method. arXiv:physics/0004057 [physics.data-an].