Achievable rates for pattern recognition

1 Achievable rates for pattern recognition. M. Brandon Westover and Joseph A. O'Sullivan, Washington University in Saint Louis, Departments of Physics and Electrical Engineering

2 Goals of information theory: models, bounds, design?

3 The pattern recognition problem. Model 1: the naïve view

4 [Diagram: Recognition environment. A selection process picks one pattern from the training set $X_1, X_2, \ldots, X_{M_c}$ and presents it as the test pattern $x_h$.]

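5 [Diagram: Recognition environment: training set $X_1, X_2, \ldots, X_{M_c}$, a selection process that picks one, and test pattern $x_h$. Recognition system: a memory holding $X_1, X_2, \ldots, X_{M_c}$ and a recognition module $g$ that outputs the estimate $\hat{h}$.] Objective: $g(x_h) = h$.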

6 [Diagram: $X_1, X_2, \ldots, X_{M_c}$; select one; $x_h$; recognizer $g$ with memory $X_1, X_2, \ldots, X_{M_c}$; output $\hat{h}$.] Objective: $g(x_h) = h$.

7 Mumford, 2002

8 Mumford, 2002

9 "Easy things are hard." (M. Minsky) What makes recognition problems hard? Problem-intrinsic challenges: data ambiguity, data complexity.

10 Mumford, 1995

11 The infinity of signature variation (Kersten, 1998)

12

13 Yuille and Kersten, 2003

14 The pattern recognition problem. Model 2: the probabilistic view

15 [Diagram: $X_1, X_2, \ldots, X_{M_c}$; select one; $x_h$; recognizer $g$ with memory $X_1, X_2, \ldots, X_{M_c}$; output $\hat{h}$.] Objective: $g(x_h) = h$.

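16 [Diagram: Data model: patterns $X_1, X_2, \ldots, X_{M_c}$ drawn from $p(x)$; one is selected according to $p(h)$, giving $x_h$. Imaging model: $x_h$ is observed through $p(y \mid x)$, giving $y$. The recognizer $g$, with memory $X_1, X_2, \ldots, X_{M_c}$, outputs $\hat{h}$.] Objective: $\Pr\{g(x_h) = h\} > 1 - \varepsilon$.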
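A minimal Monte Carlo sketch of the probabilistic view, under assumptions that are ours rather than the slides': each pattern is an $n$-bit string with i.i.d. uniform bits (standing in for $p(x)$), $h$ is uniform (standing in for $p(h)$), and $p(y \mid x)$ is a binary symmetric channel with crossover probability `p_flip`. The hypothetical function `simulate_recognition` uses a minimum-Hamming-distance recognizer and estimates the probability of correct recognition.

```python
import random


def simulate_recognition(m_c, n, p_flip, trials, seed=0):
    """Monte Carlo estimate of the probability of correct recognition.

    Illustrative assumptions (not from the slides): each pattern is an n-bit
    string with i.i.d. uniform bits, h is uniform over the m_c patterns, and
    p(y|x) is a binary symmetric channel with crossover probability p_flip.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # Data model: draw the m_c stored patterns.
        patterns = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m_c)]
        # Selection process: pick the index h uniformly.
        h = rng.randrange(m_c)
        # Imaging model: flip each bit of x_h independently w.p. p_flip.
        y = [bit ^ int(rng.random() < p_flip) for bit in patterns[h]]
        # Recognizer g: minimum Hamming distance (maximum likelihood when
        # p_flip < 0.5, hence MAP under uniform h); ties go to the lowest index.
        dists = [sum(a != b for a, b in zip(x, y)) for x in patterns]
        h_hat = dists.index(min(dists))
        correct += int(h_hat == h)
    return correct / trials


print(simulate_recognition(m_c=16, n=40, p_flip=0.1, trials=1000))
```

Raising `p_flip` toward 0.5 makes $y$ independent of $x_h$, so the estimate drifts toward chance, $1/M_c$.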

17

18 "Easy things are hard." (M. Minsky) What makes pattern recognition hard? Problem-intrinsic challenges: data ambiguity, data complexity. Problem-solver-intrinsic challenges: faulty components, data storage limitations, data processing limitations.

19 The simplest recognition circuit (Barlow, 1959)

20 Redundancy: absolute, perceptual. Data compression. Data volume: ~$10^8$ bits/s. Data complexity. Data cost: ~$10^9$ ATP/bit. Cortical energy budget: ~$10^{20}$ ATP/s. Storage capacity, accessibility, simplicity, stability. Modeling, prediction, inference.

21 The pattern recognition problem. Model 3: the information-theoretic view

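22 [Diagram: Data model: patterns $X_1, X_2, \ldots, X_{M_c}$ drawn from $p(x)$; one is selected according to $p(h)$, giving $x_h$. Imaging model: $x_h$ is observed through $p(y \mid x)$, giving $y$. The recognizer $g$, with memory $X_1, X_2, \ldots, X_{M_c}$, outputs $\hat{h}$.] Objective: $\Pr\{g(x_h) = h\} > 1 - \varepsilon$.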

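23 [Diagram: patterns $X_1, X_2, \ldots, X_{M_c}$ drawn from $p(x)$; one is selected according to $p(h)$, giving $x_h$, which is observed through $p(y \mid x)$ as $y$. A memory encoder $f$ produces the memory representation $U_1, U_2, \ldots, U_{M_x}$; a sensory encoder $\varphi$ maps $y$ to the sensory representation $V$; the decoder $g$ outputs $\hat{h}$.] Objective: $\Pr\{g(V) = h\} > 1 - \varepsilon$.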

24 [Diagram as on slide 23, with the memory encoder $f$ operating at rate $R_x$ and the sensory encoder $\varphi$ at rate $R_y$.] Objective: $\Pr\{g(V) = h\} > 1 - \varepsilon$, subject to the rate triple $R = (R_c, R_x, R_y)$.

25 Goals of information theory: models, bounds, design?

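26 Problem statement: determine the admissible rates $(R_c, R_x, R_y)$ for reliable recognition. [Diagram: $X_1, X_2, \ldots, X_{M_c}$ drawn from $p(x)$; one is selected according to $p(h)$, giving $x_h$, observed through $p(y \mid x)$ as $y$; memory encoder $f$ at rate $R_x$ produces $U_1, U_2, \ldots, U_{M_x}$; sensory encoder $\varphi$ at rate $R_y$ produces $v$; decoder $g$ outputs $\hat{h}$.]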
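The slides do not spell out how rates map to code sizes. A common information-theoretic convention, assumed here, is that a rate $R$ (bits per symbol) at block length $n$ corresponds to $2^{nR}$ items. Under that assumption, the hypothetical helper `code_sizes` converts $(R_c, R_x, R_y)$ into the number of patterns $M_c$ and index-set sizes for the two encoders (our reading of $M_x$ and $M_y$).

```python
def code_sizes(n, r_c, r_x, r_y):
    """Map rates (bits per symbol) to code sizes at block length n.

    Assumes the convention that rate R corresponds to 2^(nR) items; the
    names m_c, m_x, m_y are our labels for the number of patterns and the
    index-set sizes of the memory and sensory encoders.
    """
    m_c = round(2 ** (n * r_c))  # patterns that must be distinguished
    m_x = round(2 ** (n * r_x))  # memory-encoder indices (one stored per pattern, by assumption)
    m_y = round(2 ** (n * r_y))  # sensory-encoder indices
    return m_c, m_x, m_y


# Block length 10 with R_c = 0.5, R_x = 0.8, R_y = 0.3.
print(code_sizes(10, 0.5, 0.8, 0.3))  # (32, 256, 8)
```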

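27 Pattern recognition codes: definitions. [Diagram: patterns $x_1, x_2, x_3, \ldots, x_{M_c}$ drawn from $p(x)$; select pattern $h$ according to $p(h)$, giving $x_h$, observed through $p(y \mid x)$ as $y$. Code components $f$, $\varphi$, $g_1$, $g_2$; memory codewords $u(i)$ and sensory codewords $v(i)$, with index maps $\gamma(i)$ and index-set sizes $M_x$ and $M_y$.]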

28 Achievable rates

29 Characterizing

30 Characterizing

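31 Outer bound: proof strategy. Given a code $(f, \varphi, g)$, construct $p^{**}(x,y,u,v)$. [Figure: the set of all $p(x,y,u,v)$; the subset satisfying the Markov chains $U - X - Y$, $X - Y - V$, etc.; the region $\mathcal{R}$.]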

32 Characterizing

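33 Inner bound: proof strategy. Given $p^{*}(x,y,u,v)$, construct a code $(f, \varphi, g)$. [Figure: the set of all $p(x,y,u,v)$; the subset satisfying the Markov chain $U - X - Y - V$, etc.; the region $\mathcal{R}$.]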

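34 Rate conditions: $R_x > I(X;U)$, $R_y > I(Y;V)$, $R_c < I(U;V) - I(U;V \mid X,Y)$. [Plot: horizontal axis $I(X;U)$ from 0 to $H(X)$; vertical axis $I(Y;V)$ from 0 to $H(Y)$.]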
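To make the rate conditions concrete, the following sketch evaluates $I(X;U)$, $I(Y;V)$ and $I(U;V) - I(U;V \mid X,Y)$ for a finite joint pmf $p(x,y,u,v)$, using $I(A;B \mid C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)$. The dictionary representation and the function names are illustrative choices, not the authors' code.

```python
import math
from collections import defaultdict

# Coordinate positions inside each outcome tuple (x, y, u, v).
X, Y, U, V = 0, 1, 2, 3


def entropy(pmf, idx):
    """Entropy (bits) of the marginal of `pmf` on the coordinates in `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_mi(pmf, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), in bits."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def rate_expressions(pmf):
    """The three quantities appearing in the rate conditions."""
    return {
        "I(X;U)": cond_mi(pmf, (X,), (U,)),
        "I(Y;V)": cond_mi(pmf, (Y,), (V,)),
        "I(U;V)-I(U;V|X,Y)": cond_mi(pmf, (U,), (V,))
                             - cond_mi(pmf, (U,), (V,), (X, Y)),
    }


# Example: a uniform bit X observed noiselessly (Y = X), stored and sensed
# without compression (U = X, V = Y); each expression equals 1 bit.
example = {(0, 0, 0, 0): 0.5, (1, 1, 1, 1): 0.5}
print(rate_expressions(example))
```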

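35 Rate conditions: $R_x > I(X;U)$, $R_y > I(Y;V)$, $R_c < I(U;V) - I(U;V \mid X,Y)$. [Plot: the edge $I(X;U) = H(X)$ is labeled $U = X$ and the edge $I(Y;V) = H(Y)$ is labeled $V = Y$.] On the border, $U - X - Y - V$ holds, so $\mathcal{R}^{*} = \mathcal{R}^{**} = \mathcal{R}$.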
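Under the chain $U - X - Y - V$, $U$ and $V$ are conditionally independent given $(X, Y)$, so the term $I(U;V \mid X,Y)$ in the $R_c$ bound vanishes. A quick numerical check, assuming binary alphabets and binary symmetric channels for $p(y \mid x)$, $p(u \mid x)$ and $p(v \mid y)$ (all illustrative choices), builds $p(x,y,u,v) = p(x)\,p(y \mid x)\,p(u \mid x)\,p(v \mid y)$ and evaluates that term.

```python
import itertools
import math
from collections import defaultdict


def cond_mi(pmf, a, b, c=()):
    """I(A;B|C) in bits; a, b, c are tuples of coordinate positions."""
    def h(idx):
        marg = defaultdict(float)
        for outcome, p in pmf.items():
            marg[tuple(outcome[i] for i in idx)] += p
        return -sum(p * math.log2(p) for p in marg.values() if p > 0)
    return h(a + c) + h(b + c) - h(a + b + c) - h(c)


def bsc(eps):
    """Binary symmetric channel as ch[input][output]."""
    return [[1 - eps, eps], [eps, 1 - eps]]


def markov_joint(p_x, p_y_x, p_u_x, p_v_y):
    """p(x,y,u,v) = p(x) p(y|x) p(u|x) p(v|y) on binary alphabets,
    so that U - X - Y - V is a Markov chain."""
    pmf = {}
    for x, y, u, v in itertools.product(range(2), repeat=4):
        p = p_x[x] * p_y_x[x][y] * p_u_x[x][u] * p_v_y[y][v]
        if p > 0:
            pmf[(x, y, u, v)] = p
    return pmf


X, Y, U, V = 0, 1, 2, 3
pmf = markov_joint([0.5, 0.5], bsc(0.1), bsc(0.2), bsc(0.3))
print(cond_mi(pmf, (U,), (V,), (X, Y)))  # ~0, up to floating-point rounding
print(cond_mi(pmf, (U,), (V,)))          # strictly positive
```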

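36 Rate conditions on the border: $R_x > I(X;U)$, $R_y > I(Y;V)$, $R_c < I(U;V)$, since $I(U;V \mid X,Y) = 0$ when $U - X - Y - V$ holds. [Plot: the edge $I(X;U) = H(X)$ is labeled $U = X$ and the edge $I(Y;V) = H(Y)$ is labeled $V = Y$.]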

37 Unlimited $U$, $V$ capacity: with $U = X$ and $V = Y$, $R_c < I(X;Y)$. Channel coding! [Plot: the corner $U = X$, $V = Y$ is labeled $I(X;Y)$.]
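With $U = X$ and $V = Y$ the recognition bound reduces to $I(X;Y)$. As an illustration (our assumption, not a case worked on the slides), take uniform binary $X$ and a binary symmetric channel with crossover probability $p$; then $I(X;Y) = 1 - h_2(p)$ bits, where $h_2$ is the binary entropy function. Because the uniform input achieves the capacity of this channel, the value also equals the BSC capacity.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_recognition_bound(p_flip):
    """I(X;Y) = 1 - h2(p_flip) bits for uniform binary X through a BSC.

    Illustrative assumption: with U = X and V = Y this is the bound on R_c.
    """
    return 1.0 - h2(p_flip)


for p in (0.0, 0.11, 0.5):
    print(p, bsc_recognition_bound(p))
```

For $p = 0.11$ the bound is close to 0.5 bits per symbol; at $p = 0.5$ it is 0, and no positive recognition rate is possible.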

38 Poor memory: $U = 0$ gives $R_c < I(0;V) = 0$. Poor senses: $V = 0$ gives $R_c < I(U;0) = 0$. [Plot: the bound is $I(X;Y)$ at the corner $U = X$, $V = Y$ and 0 along the axes $I(X;U) = 0$ and $I(Y;V) = 0$.]
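Both degenerate cases follow because a constant random variable shares no information with anything. A short sketch computing $I(A;B)$ from a two-dimensional joint pmf (rows indexed by $A$, columns by $B$; a representation chosen for illustration) shows a constant $U$ (one row) or a constant $V$ (one column) giving zero, with a fully informative pair included for contrast.

```python
import math


def mutual_information(joint):
    """I(A;B) in bits; joint[i][j] = Pr{A = i, B = j}."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (p_a[i] * p_b[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)


poor_memory = [[0.3, 0.7]]          # U constant: a single row
poor_senses = [[0.5], [0.5]]        # V constant: a single column
perfect = [[0.5, 0.0], [0.0, 0.5]]  # contrast case: U = V, one uniform bit
for name, joint in [("poor memory", poor_memory),
                    ("poor senses", poor_senses),
                    ("perfect", perfect)]:
    print(name, mutual_information(joint))
```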

39 Unlimited $V$ capacity: with $V = Y$, $R_c < I(U;Y) = I(X;Y) - I(X;Y \mid U)$. [Plot: this bound applies along the edge $V = Y$; the corner $U = X$, $V = Y$ gives $I(X;Y)$; the axes give 0.]

40 Unlimited $U$ capacity: with $U = X$, $R_c < I(X;V) = I(X;Y) - I(X;Y \mid V)$. [Plot: this bound applies along the edge $U = X$; the corner $U = X$, $V = Y$ gives $I(X;Y)$; the axes give 0.]
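Both edge bounds follow from the chain rule for mutual information: under $U - X - Y$, $I(X;Y) = I(U,X;Y) = I(U;Y) + I(X;Y \mid U)$, and symmetrically under $X - Y - V$. The sketch below checks the two identities numerically for a Markov chain of binary variables built from binary symmetric channels; the parameters are arbitrary illustrative values, not taken from the slides.

```python
import itertools
import math
from collections import defaultdict


def cond_mi(pmf, a, b, c=()):
    """I(A;B|C) in bits; a, b, c are tuples of coordinate positions."""
    def h(idx):
        marg = defaultdict(float)
        for outcome, p in pmf.items():
            marg[tuple(outcome[i] for i in idx)] += p
        return -sum(p * math.log2(p) for p in marg.values() if p > 0)
    return h(a + c) + h(b + c) - h(a + b + c) - h(c)


def bsc(eps):
    """Binary symmetric channel as ch[input][output]."""
    return [[1 - eps, eps], [eps, 1 - eps]]


def markov_joint(p_x, p_y_x, p_u_x, p_v_y):
    """p(x,y,u,v) = p(x) p(y|x) p(u|x) p(v|y) on binary alphabets."""
    pmf = {}
    for x, y, u, v in itertools.product(range(2), repeat=4):
        p = p_x[x] * p_y_x[x][y] * p_u_x[x][u] * p_v_y[y][v]
        if p > 0:
            pmf[(x, y, u, v)] = p
    return pmf


X, Y, U, V = 0, 1, 2, 3
pmf = markov_joint([0.3, 0.7], bsc(0.1), bsc(0.25), bsc(0.15))
i_xy = cond_mi(pmf, (X,), (Y,))

# Unlimited V capacity (V = Y): the bound I(U;Y) equals I(X;Y) - I(X;Y|U).
bound_v_edge = cond_mi(pmf, (U,), (Y,))
check_v_edge = i_xy - cond_mi(pmf, (X,), (Y,), (U,))

# Unlimited U capacity (U = X): the bound I(X;V) equals I(X;Y) - I(X;Y|V).
bound_u_edge = cond_mi(pmf, (X,), (V,))
check_u_edge = i_xy - cond_mi(pmf, (X,), (Y,), (V,))

print(bound_v_edge, check_v_edge)
print(bound_u_edge, check_u_edge)
```

Both edge bounds are at most $I(X;Y)$, consistent with the data-processing inequality and with the corner value on slide 37.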

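41 [Summary plot over $I(X;U) \in [0, H(X)]$ and $I(Y;V) \in [0, H(Y)]$: the bound on $R_c$ is $I(X;Y)$ at $U = X$, $V = Y$; $I(X;Y) - I(X;Y \mid U)$ along $V = Y$; $I(X;Y) - I(X;Y \mid V)$ along $U = X$; and 0 along the axes.]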

42 Revisiting the gap

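43 The gap. [Figure: the set of all $p(x,y,u,v)$; the outer bound uses distributions satisfying $U - X - Y$, $X - Y - V$, etc., while the inner bound uses those satisfying $U - X - Y - V$, etc.; the region $\mathcal{R}$.]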

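44 A related gap: the distributed source coding problem. [Diagram: sources $(X, Y)$ drawn from $p(x,y)$ are encoded separately by $f$ and $\varphi$; the decoder $g$ outputs $(U, V)$.] Problem: characterize the achievable $(R_x, R_y, D_x, D_y)$. Posed in the early 1970s; only partial solutions so far.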
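The lossless special case of this problem is fully solved: for discrete sources, Slepian and Wolf (1973) showed that the achievable region for separate encoding of $X$ and $Y$ with joint decoding is $R_x \ge H(X \mid Y)$, $R_y \ge H(Y \mid X)$, $R_x + R_y \ge H(X,Y)$. The sketch below tests a rate pair against that region. It does not address the general lossy $(R_x, R_y, D_x, D_y)$ question on the slide, which is where only partial solutions exist; the function name and list-of-rows pmf format are our own.

```python
import math


def slepian_wolf_achievable(joint, r_x, r_y):
    """Check (r_x, r_y) against the Slepian-Wolf region for lossless
    distributed coding of (X, Y); joint[i][j] = Pr{X = i, Y = j}."""
    def H(ps):
        return -sum(p * math.log2(p) for p in ps if p > 0)
    h_xy = H([p for row in joint for p in row])
    h_x = H([sum(row) for row in joint])
    h_y = H([sum(col) for col in zip(*joint)])
    return (r_x >= h_xy - h_y          # R_x >= H(X|Y)
            and r_y >= h_xy - h_x      # R_y >= H(Y|X)
            and r_x + r_y >= h_xy)     # R_x + R_y >= H(X,Y)


# Doubly symmetric binary source: X uniform, Y = X flipped w.p. 0.11.
dsbs = [[0.445, 0.055], [0.055, 0.445]]
print(slepian_wolf_achievable(dsbs, 0.5, 1.0))
print(slepian_wolf_achievable(dsbs, 0.4, 1.0))
```

For this source $H(X \mid Y) \approx 0.5$ bits and $H(X,Y) \approx 1.5$ bits, so $(0.5, 1.0)$ lies in the region while $(0.4, 1.0)$ and $(0.8, 0.6)$ do not.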

45 Comparison of the gaps: pattern recognition vs. distributed source coding

46 Closing comments: an objective framework for normalizing recognition system performance and guiding system design; closing the gap; extensions for finite $n$?; learning codebooks for real examples; connections to the information bottleneck framework, which shares a similar philosophy: distortion should be defined by the task!

47

48 References for borrowed images
David Mumford, 1995, "Neuronal Architectures for Pattern-theoretic Problems," in Large Scale Neuronal Theories of the Brain, C. Koch and J. L. Davis, Eds.
Dan Kersten, 1998, slide from the NIPS tutorial "Computational Vision: Principles of Perceptual Inference." Available at:
D. Kersten, P. Mamassian, and A. Yuille, 2003 (in press), "Object Perception as Bayesian Inference," Annual Review of Psychology.
David Mumford and Agnes Desolneux, 2002, introductory chapter to Pattern Theory Through Examples. Available from


ORF 245 Fundamentals of Statistics Joint Distributions

ORF 245 Fundamentals of Statistics Joint Distributions ORF 245 Fundamentals of Statistics Joint Distributions Robert Vanderbei Fall 2015 Slides last edited on November 11, 2015 http://www.princeton.edu/ rvdb Introduction Joint Cumulative Distribution Function

More information

Probabilistic Reasoning in Deep Learning

Probabilistic Reasoning in Deep Learning Probabilistic Reasoning in Deep Learning Dr Konstantina Palla, PhD palla@stats.ox.ac.uk September 2017 Deep Learning Indaba, Johannesburgh Konstantina Palla 1 / 39 OVERVIEW OF THE TALK Basics of Bayesian

More information

CSC321 Lecture 20: Autoencoders

CSC321 Lecture 20: Autoencoders CSC321 Lecture 20: Autoencoders Roger Grosse Roger Grosse CSC321 Lecture 20: Autoencoders 1 / 16 Overview Latent variable models so far: mixture models Boltzmann machines Both of these involve discrete

More information

Bayesian Inference Course, WTCN, UCL, March 2013

Bayesian Inference Course, WTCN, UCL, March 2013 Bayesian Course, WTCN, UCL, March 2013 Shannon (1948) asked how much information is received when we observe a specific value of the variable x? If an unlikely event occurs then one would expect the information

More information

ORIE 4741: Learning with Big Messy Data. Generalization

ORIE 4741: Learning with Big Messy Data. Generalization ORIE 4741: Learning with Big Messy Data Generalization Professor Udell Operations Research and Information Engineering Cornell September 23, 2017 1 / 21 Announcements midterm 10/5 makeup exam 10/2, by

More information

1 Background on Information Theory

1 Background on Information Theory Review of the book Information Theory: Coding Theorems for Discrete Memoryless Systems by Imre Csiszár and János Körner Second Edition Cambridge University Press, 2011 ISBN:978-0-521-19681-9 Review by

More information

An Introduction to Independent Components Analysis (ICA)

An Introduction to Independent Components Analysis (ICA) An Introduction to Independent Components Analysis (ICA) Anish R. Shah, CFA Northfield Information Services Anish@northinfo.com Newport Jun 6, 2008 1 Overview of Talk Review principal components Introduce

More information

Network coding for multicast relation to compression and generalization of Slepian-Wolf

Network coding for multicast relation to compression and generalization of Slepian-Wolf Network coding for multicast relation to compression and generalization of Slepian-Wolf 1 Overview Review of Slepian-Wolf Distributed network compression Error exponents Source-channel separation issues

More information

Machine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang Machine Learning Basics Lecture 7: Multiclass Classification Princeton University COS 495 Instructor: Yingyu Liang Example: image classification indoor Indoor outdoor Example: image classification (multiclass)

More information

4 An Introduction to Channel Coding and Decoding over BSC

4 An Introduction to Channel Coding and Decoding over BSC 4 An Introduction to Channel Coding and Decoding over BSC 4.1. Recall that channel coding introduces, in a controlled manner, some redundancy in the (binary information sequence that can be used at the

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Conditional Random Field

Conditional Random Field Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions

More information

Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade

Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for

More information

Last Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression

Last Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer

More information

Dynamical Systems and Deep Learning: Overview. Abbas Edalat

Dynamical Systems and Deep Learning: Overview. Abbas Edalat Dynamical Systems and Deep Learning: Overview Abbas Edalat Dynamical Systems The notion of a dynamical system includes the following: A phase or state space, which may be continuous, e.g. the real line,

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II 1 Non-linear regression techniques Part - II Regression Algorithms in this Course Support Vector Machine Relevance Vector Machine Support vector regression Boosting random projections Relevance vector

More information