Natural Image Statistics and Neural Representations

Natural Image Statistics and Neural Representations
Michael Lewicki
Center for the Neural Basis of Cognition & Department of Computer Science, Carnegie Mellon University

Outline
1. Information theory review, coding of 1D signals
2. Sparse coding, ICA, coding of natural images and motion
3. Representing images with noisy neural populations
4. Learning higher-order image structure

Visual Coding
What are the computational problems of visual coding? What signal should be sent out of the retina? How do we approach this theoretically?

After the retina...
There are at least 23 distinct neural pathways out of the retina. Some receive input from a single type of retinal cell, some from many; some from one eye, some from both. There is no simple functional division.

Why is it like this?
Evolutionary viewpoint: success depends on the whole organism and on the cooperation of areas and cell types. There is no opportunity to redesign; functions simply pile up, layers and layers of goo. It is not engineering, but tinkering. There are few clean functional divisions, i.e. there are no distinct channels for color or motion.

Types of optical systems
Suprachiasmatic nucleus: generates the circadian rhythm.
Accessory optic system: helps stabilize the retinal image during head movement.
Superior colliculus: integrates visual and auditory information together with head movements; directs the eyes to regions of interest.
Pretectum: plays a role in adjusting pupil size to changes in light intensity and in tracking large moving objects.
Pregeniculate: function unknown, but cells are responsive to ambient light level.
Lateral geniculate nucleus: main relay to visual cortex; contains 6 distinct layers, each with 2 sublayers. The organization is very complex, and cells have a wide range of sensitivities, including contrast, color, and motion.

Where is this headed?

A theoretical approach
Look at the system from a functional perspective: What problems does it need to solve? Abstract away from the details and make predictions from theoretical principles. You can only have data after you have a theory. Models are bottom-up; theories are top-down. What are the relevant principles?

Information theory: a short introduction
Entropy: a measure of irreducible signal complexity, and a lower bound on how much a signal can be compressed without loss.
Information of symbol w: $I(w) = -\log_2 P(w)$
For a random variable X with probability P(x), the entropy is the average amount of information obtained by observing x:
$H(X) = \sum_x P(x) I(x) = -\sum_x P(x) \log_2 P(x)$
H depends only on the probabilities, not on the values. It gives a lower bound on the average number of bits per code word. The average coding cost for a message of length L (assuming independent symbols) is $L H(X)$ bits.

Example
A single binary random variable X, with X = 1 with probability p and X = 0 with probability 1 - p. The entropy is $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$, which reaches 1 bit when p = 1/2.
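
To make the entropy definition concrete, here is a minimal Python sketch (NumPy only; the function name entropy_bits is ours, not from the lecture) that evaluates H(X) for any discrete distribution and confirms that the binary example above peaks at 1 bit for p = 1/2.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy H(X) = -sum_x P(x) log2 P(x), in bits.
    Zero-probability outcomes contribute nothing to the sum."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Binary variable: X = 1 with probability p, X = 0 with probability 1 - p.
for p in [0.1, 0.25, 0.5, 0.9]:
    print(f"p = {p:4.2f}  H = {entropy_bits([p, 1 - p]):.3f} bits")
# H peaks at exactly 1 bit when p = 0.5, as stated above.
```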

Capacity
Capacity is the maximum amount of information per symbol: $C = \log_2 N$. The maximum is achieved when all N symbols have equal probability.
English: $C = \log_2 27 \approx 4.75$ bits/letter.
Image: $8 \times 256 \times 256$ bits for an 8-bit, $256^2$-pixel image.
The actual entropy, i.e. the irreducible part, is much less. Why?

Redundancy
Redundancy measures (in)efficiency, i.e. actual entropy relative to capacity: $R = 1 - H(X)/C$. Capacity is reached when code words (symbols) have equal frequency and there is no inter-symbol redundancy.
Examples:
English: letter probabilities are not equal, and letters are not independent.
Images: pixel value probabilities are not equal, and pixels are not independent.
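
As a rough illustration of the redundancy formula, the sketch below estimates the first-order entropy from single-symbol frequencies of a toy text and compares it with the capacity $C = \log_2 N$. The 27-symbol alphabet and the toy corpus are illustrative choices; inter-symbol dependencies are ignored, so this captures only part of the redundancy.

```python
import numpy as np
from collections import Counter

def first_order_redundancy(text, alphabet):
    """R = 1 - H(X)/C, using single-symbol frequencies only."""
    counts = Counter(ch for ch in text if ch in alphabet)
    p = np.array([counts[ch] for ch in alphabet], dtype=float)
    p = p[p > 0] / p.sum()
    H = -np.sum(p * np.log2(p))       # first-order entropy, bits/symbol
    C = np.log2(len(alphabet))        # capacity: equal-frequency symbols
    return H, C, 1.0 - H / C

alphabet = "abcdefghijklmnopqrstuvwxyz "                 # 27 symbols: a-z plus space
sample = "the quick brown fox jumps over the lazy dog " * 50  # toy corpus
H, C, R = first_order_redundancy(sample, alphabet)
print(f"H = {H:.2f} bits, C = {C:.2f} bits, redundancy R = {R:.2f}")
```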

Example of symbols in English: A-Z and space

A fourth-order approximation
Note that as the order is increased:
the entropy decreases: $H_0 = 4.76$ bits, $H_1 = 4.03$ bits, and $H_4 = 2.8$ bits/char;
the variables, i.e. $P(c_i \mid c_{i-1}, c_{i-2}, \ldots, c_{i-k})$, specify more specific structure;
generated samples look more like real English.
This is an example of the relationship between efficient coding and representation of signal structure.
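
The order-k entropies quoted above can be estimated from data. The sketch below (our own helper; the corpus file name is a placeholder) computes the conditional entropy of a character given its k predecessors from n-gram counts; k = 0 reproduces the single-letter entropy, and increasing k should show the same downward trend, given a large enough corpus.

```python
import numpy as np
from collections import Counter

def conditional_entropy_bits(text, k):
    """Estimate H(c_i | previous k characters), in bits per character."""
    ctx_counts = Counter(text[i:i + k] for i in range(len(text) - k))
    full_counts = Counter(text[i:i + k + 1] for i in range(len(text) - k))
    n = sum(full_counts.values())
    H = 0.0
    for gram, c in full_counts.items():
        p_joint = c / n
        p_cond = c / ctx_counts[gram[:k]]   # P(c_i | k-character context)
        H -= p_joint * np.log2(p_cond)
    return H

text = open("corpus.txt").read().lower()    # placeholder: any large plain-text sample
for k in [0, 1, 3]:
    print(f"conditioning on {k} chars: {conditional_entropy_bits(text, k):.2f} bits/char")
```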

The same model can also be applied to words
Specifying higher-order word models is problematic because the number of variables increases as $N^k$, where N is the number of words (e.g. 50,000 in English) and k is the order of the model.
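
A quick calculation of that growth, with illustrative values:

```python
# Number of contexts in a k-th order word model with N = 50,000 words (illustrative).
N = 50_000
for k in [1, 2, 3]:
    print(f"k = {k}: N**k = {N**k:.2e} contexts")
```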

A general approach to coding: redundancy reduction
[Figure: correlation of adjacent pixel values in natural images.]
Why reduce redundancy? This is equivalent to efficient coding.
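
The kind of curve shown in the figure can be reproduced approximately with the sketch below (NumPy only; the synthetic image is a stand-in for a real photograph): it computes the Pearson correlation between pixels separated by a given horizontal offset.

```python
import numpy as np

def neighbor_correlation(image, offset=1):
    """Pearson correlation between pixels separated horizontally by `offset`."""
    a = image[:, :-offset].ravel().astype(float)
    b = image[:, offset:].ravel().astype(float)
    return np.corrcoef(a, b)[0, 1]

# Synthetic image with strong spatial correlation (replace with a real photo):
rng = np.random.default_rng(0)
image = np.cumsum(rng.standard_normal((256, 256)), axis=1)
for d in [1, 2, 4, 8, 16]:
    print(f"offset {d:2d}: correlation = {neighbor_correlation(image, d):.2f}")
# For natural images, nearby pixels are highly correlated and the correlation
# falls off with distance -- the redundancy that efficient coding exploits.
```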

Why code efficiently?
Information bottleneck: a restriction on the information flow rate (channel capacity); computational bottleneck: $5 \times 10^6 \rightarrow 40$-$50$ bits/sec.
Even probabilities are needed for associative learning, and joint probabilities are easy to calculate for independent variables.
Facilitates pattern recognition: independent features are more informative.
Better sensory codes could simplify further processing.

The bottleneck in vision
The eyes must move, so they connect to the brain through a small, thin cord: 100 million photoreceptors feed into 1 million optic nerve fibers. The fovea already provides a great reduction in the amount of information. How do we reliably transmit the important visual information?

A little more information theory
H(X) is a measure of how much information it takes, on average, to describe the random variable X. If we know p(x), we can calculate the entropy or optimize the model for the data. But what if we don't know p(x) and can only approximate it, e.g. with q(x)? How many bits does this inaccuracy cost us?

Relative Entropy
The relative entropy $D(p \| q)$ is a measure of the inefficiency of assuming distribution q when the true distribution is p. If we knew p, we could construct a code with average code word length H(p). If we assume q, the best average code length we can achieve is $H(p) + D(p \| q)$:
$D(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$, with $D(p \| q) = 0 \iff p = q$.
This is also called the Kullback-Leibler divergence. It is not called a distance because it is not symmetric and does not satisfy the triangle inequality.

Information-theoretic viewpoint
Use Shannon's source coding theorem. The expected code length when coding with the model q is
$L = E[\ell(X)] = \sum_x p(x) \log \frac{1}{q(x)} = \sum_x p(x) \log \frac{p(x)}{q(x)} + \sum_x p(x) \log \frac{1}{p(x)} = D_{KL}(p \| q) + H(p)$
where $D_{KL}$ is the Kullback-Leibler divergence. If the model density q(x) equals the true density p(x), then $D_{KL} = 0$, and H(p) is the lower bound on the average code length. Greater coding efficiency means more learned structure.
Principle: good codes capture the statistical distribution of sensory patterns. How do we describe the distribution?
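
The decomposition $L = D_{KL}(p \| q) + H(p)$ is easy to verify numerically. The sketch below uses two small, arbitrary example distributions and confirms that the expected code length under the mismatched model q exceeds H(p) by exactly the KL divergence.

```python
import numpy as np

def H(p):
    return -np.sum(p * np.log2(p))

def D_kl(p, q):
    return np.sum(p * np.log2(p / q))

p = np.array([0.5, 0.25, 0.125, 0.125])   # true source distribution
q = np.array([0.25, 0.25, 0.25, 0.25])    # mismatched model distribution

L = np.sum(p * np.log2(1.0 / q))          # expected code length under q
print(f"H(p)      = {H(p):.3f} bits")
print(f"D(p||q)   = {D_kl(p, q):.3f} bits")
print(f"L = H + D = {L:.3f} bits")        # extra bits paid for using q instead of p
```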

Contrast response function in the fly eye (Laughlin, 1981)
The fly LMC (large monopolar cell) is an interneuron in the compound eye; its output is a graded potential. How should its sensitivity be set? Too high, and the response saturates; too low, and the range is under-utilized. Idea: predict the contrast response function using information theory.

Maximizing information transfer with limited channel capacity
The inputs follow a given distribution. Transform them so that the output levels are used with equal frequency: each response state covers an equal area (equal probability) under the input distribution. In the continuum limit, the optimal transform is the cumulative pdf of the input distribution.

Another example with different statistics
The mathematical form is the cumulative probability. For the response $y = g(c)$:
$\frac{y}{y_{max}} = \int_{c_{min}}^{c} P(c')\, dc'$
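
A minimal sketch of this idea, assuming we only have samples from the input (contrast) distribution: the optimal response function in the continuum limit is the empirical CDF scaled by y_max, and applying it makes the output levels roughly equally used. All names and the Gaussian-like input are illustrative.

```python
import numpy as np

def optimal_response(contrast_samples, y_max=1.0):
    """Return y = g(c): the empirical CDF of the input, scaled by y_max.
    Each output level is then used with (approximately) equal frequency."""
    c_sorted = np.sort(contrast_samples)
    cdf = np.arange(1, len(c_sorted) + 1) / len(c_sorted)
    return lambda c: y_max * np.interp(c, c_sorted, cdf)

# Illustrative input distribution of contrasts (stand-in for measured scenes):
rng = np.random.default_rng(1)
contrasts = rng.normal(loc=0.0, scale=0.4, size=10_000)
g = optimal_response(contrasts)

# The transformed responses are approximately uniform, i.e. equal-frequency use:
y = g(contrasts)
hist, _ = np.histogram(y, bins=10, range=(0, 1))
print(hist)   # roughly equal counts per response bin
```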

Testing the theory
Laughlin (1981): collect natural scenes to get the stimulus pdf: 15,000 readings, taken with linear scans (10, 25, or 50), computing the contrast within each scan as $\Delta I / I$. Then measure the actual response of the LMC to varying contrasts. Conclusion: the fly LMC transmits information efficiently.

Coding a natural intensity time series
van Hateren and Snippe (2001) recorded light intensity with a photometer while walking around outdoors. The dynamic range of intensity is much larger than that of photoreceptors, and large changes can occur on short time scales. Most, if not all, species can quickly adjust their gain to changing light levels.
Questions: How should the signal be transformed? What gain control model should be used? How should the optimality of the system (the fly in this case) be evaluated?

An evaluation method for non-linear encoding models

Measuring the capacity of the system
The noise is given by $N = S - S_{est}$. The signal-to-noise ratio is
$\mathrm{SNR} = \langle S_{est} S_{est}^* \rangle / \langle N N^* \rangle$
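
One way to compute this frequency-resolved SNR in practice is to estimate the power spectra of the reconstructed signal and of the residual noise N = S - S_est and take their ratio; a Gaussian-channel capacity bound then follows from the standard $\int \log_2(1 + \mathrm{SNR}(f))\, df$ formula. The sketch below uses SciPy's Welch estimator; the signals and sampling rate are placeholders, not the data from the study.

```python
import numpy as np
from scipy.signal import welch

def snr_spectrum(s, s_est, fs, nperseg=1024):
    """SNR(f) = P_est(f) / P_noise(f), with noise N = S - S_est."""
    noise = s - s_est
    f, p_est = welch(s_est, fs=fs, nperseg=nperseg)
    _, p_noise = welch(noise, fs=fs, nperseg=nperseg)
    return f, p_est / p_noise

# Placeholder example: a noisy, slightly attenuated estimate of a random signal.
rng = np.random.default_rng(2)
fs = 1000.0
s = rng.standard_normal(10_000)
s_est = 0.9 * s + 0.3 * rng.standard_normal(10_000)
f, snr = snr_spectrum(s, s_est, fs)
capacity = np.trapz(np.log2(1 + snr), f)   # Gaussian-channel capacity bound, bits/s
print(f"estimated capacity bound: {capacity:.0f} bits/s")
```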

A linear model
The prediction of the neural response by a linear model is poor. For the linear model, the coherence is sub-optimal at all frequencies.

Gain model with a static non-linearity: log
Coherence at low frequencies is improved, but the coding is not perfect.

Gain model with a static non-linearity: sqrt
The sqrt non-linearity is slightly worse than log.

Gain model with a dynamic non-linearity (three figure slides)

The best model requires several stages

Coherence rates of the different models
The upper bound on the capacity of the fly photoreceptor is measured by estimating the variability in the response to the same stimulus.
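
A sketch of that repeat-based bound, assuming a trials-by-time array of responses to the identical stimulus (placeholder data below): the trial average is treated as the signal, each trial's deviation from it as noise, and the resulting SNR spectrum bounds what any encoding model could achieve.

```python
import numpy as np
from scipy.signal import welch

def response_snr_bound(trials, fs, nperseg=1024):
    """Upper-bound SNR(f) from repeated responses to the same stimulus.
    Signal = trial-averaged response; noise = per-trial deviation from it."""
    signal = trials.mean(axis=0)
    noise = trials - signal                      # shape: (n_trials, n_samples)
    f, p_sig = welch(signal, fs=fs, nperseg=nperseg)
    _, p_noise = welch(noise, fs=fs, nperseg=nperseg, axis=-1)
    return f, p_sig / p_noise.mean(axis=0)

# Placeholder: 20 repeats of a common smoothed 'response' plus independent noise.
rng = np.random.default_rng(3)
fs, n = 1000.0, 8192
common = np.convolve(rng.standard_normal(n), np.ones(20) / 20, mode="same")
trials = common + 0.2 * rng.standard_normal((20, n))
f, snr = response_snr_bound(trials, fs)
print(f"capacity bound: {np.trapz(np.log2(1 + snr), f):.0f} bits/s")
```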