Natural Image Statistics


Natural Image Statistics: a probabilistic approach to modelling early visual processing in the cortex. Dept of Computer Science

Early visual processing: from the eye to the primary visual cortex (V1). [Figure: the visual pathway from the retina via the LGN to V1.]

Simple and complex cells. Basic dichotomy of neurons in primary visual cortex (V1). Simple cells are modelled as linear functions of the input, so their receptive fields can simply be plotted. Complex cells are considered strongly nonlinear: they are invariant (tolerant) to the location (phase) of the input, and are modelled as a sum of squares of simple cell outputs. Simple cells: selective to the orientation and location of a bar. Complex cells: tolerant to its exact location.
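To make the dichotomy concrete, here is a minimal sketch of the standard linear (simple cell) and energy (complex cell) models; this is an illustration, not code from the talk, and the Gabor filter parameters are arbitrary:

# Simple cell: linear filter. Complex cell: sum of squares of a quadrature pair.
import numpy as np

def gabor(size, freq, phase, theta=0.0):
    """Oriented grating under a Gaussian envelope (a Gabor patch)."""
    r = np.arange(size) - size // 2
    X, Y = np.meshgrid(r, r)
    xr = X * np.cos(theta) + Y * np.sin(theta)
    envelope = np.exp(-(X**2 + Y**2) / (2.0 * (size / 4.0) ** 2))
    return envelope * np.cos(2.0 * np.pi * freq * xr + phase)

size = 16
w_even = gabor(size, freq=0.15, phase=0.0)         # simple cell RF
w_odd = gabor(size, freq=0.15, phase=np.pi / 2)    # quadrature partner

patch = np.random.randn(size, size)                # stand-in for an image patch

simple_response = np.sum(w_even * patch)           # linear: sign and phase matter
complex_response = (np.sum(w_even * patch) ** 2    # energy model: phase-invariant
                    + np.sum(w_odd * patch) ** 2)
print(simple_response, complex_response)

Because the two filters form a quadrature pair, the summed squares barely change when a grating stimulus is shifted, which is exactly the location/phase tolerance described above.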

Theories of response properties of visual neurons: edge detection, joint localization in space and frequency, texture classification. But the above give only vague predictions. Here: a statistical-ecological approach (Barlow, 1972). What is important in a real environment? Natural images have statistical regularities. Can we explain receptive fields by basic statistical properties of natural images? Emergence: a lot of precise predictions from only a couple of statistical assumptions. Extremely relevant to image processing / engineering.

Outline of this talk: statistical models that account for some properties of the (primary) visual cortex. Independent Component Analysis / sparse coding, and various extensions. Properties of the visual cortex explained: simple cells, complex cells, spatial organization (topography). A multi-layer approach can predict properties beyond V1.

Linear statistical models of images. Denote by I(x,y) the gray-scale values of the pixels. Model the image as a linear sum of basis vectors:

I(x,y) = Σ_i A_i(x,y) s_i    (1)

[Illustration: an image patch written as s_1 times a first basis patch, plus s_2 times a second, ..., plus s_k times a k-th.] What are the best basis vectors for natural images?
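Read numerically, equation (1) just says that a patch is a matrix-vector product of basis vectors and coefficients. A tiny sketch (the sizes and distributions are placeholders, not those used in the talk):

# Linear synthesis model I = sum_i A_i * s_i, with patches as flattened vectors.
import numpy as np

n_pixels, n_components = 16 * 16, 256
A = np.random.randn(n_pixels, n_components)    # columns of A are the basis vectors A_i
s = np.random.laplace(size=n_components)       # coefficients s_i (sparse-ish here)

patch_vector = A @ s                           # I(x,y) flattened into a vector
patch_image = patch_vector.reshape(16, 16)     # back to a 16 x 16 patch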

Independent Component Analysis (Jutten and Hérault, 1991). Linear model:

I(x,y) = Σ_i A_i(x,y) s_i    (2)

In ICA, we assume that the s_i are mutually statistically independent, and that the s_i are nongaussian, e.g. sparse. For simplicity, the number of A_i equals the number of pixels. Then the actual basis vectors A_i can be estimated, if the data is actually generated by the linear model (Comon, 1994). Thus we get the best basis vectors from one statistical viewpoint.

Sparsity: a form of nongaussianity often encountered in natural signals. A random variable is active only rarely. [Figure: sample traces of a gaussian signal and of a sparse signal.] Outputs of linear filters are usually sparse when the input is natural images.
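To make "active only rarely" concrete, here is a small sketch (mine, not from the slides) comparing a gaussian and a Laplacian variable of equal variance through their excess kurtosis:

# Sparse (Laplacian) vs gaussian samples of equal variance.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
gauss = rng.normal(0.0, 1.0, n)
sparse = rng.laplace(0.0, 1.0 / np.sqrt(2.0), n)   # scale chosen so the variance is also 1

def excess_kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

print(excess_kurtosis(gauss))    # ~ 0 for the gaussian
print(excess_kurtosis(sparse))   # ~ 3 for the Laplacian: mostly near zero, occasional large values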

Sparse coding and ICA. Sparse coding (Barlow, 1972): find a linear representation

I(x,y) = Σ_i A_i(x,y) s_i    (3)

so that the s_i are as sparse as possible. Important property: a given data point is represented using only a limited number of active (clearly non-zero) components s_i. In contrast to PCA, the set of active components changes from image patch to image patch. Deep result: for images, ICA is sparse coding. Vectorizing the whitened image as x and denoting the inverse system by w_i:

min over orthogonal w_1, ..., w_n of Σ_i Ê{ |w_i^T x| }    (4)

ICA / sparse coding of natural images (Olshausen and Field, 1996; Bell and Sejnowski, 1997) Using the FastICA algorithm (Hyvärinen, 1999)
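A minimal sketch of this kind of experiment with scikit-learn's FastICA (the patch size, number of components, and the `images` array are placeholders; they are not the settings of the original experiments):

# ICA / sparse coding of natural image patches with FastICA.
# Assumes `images` is an array of gray-scale natural images, shape (n_images, H, W).
import numpy as np
from sklearn.decomposition import FastICA

def sample_patches(images, patch_size=16, n_patches=50_000, seed=0):
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        r = rng.integers(img.shape[0] - patch_size)
        c = rng.integers(img.shape[1] - patch_size)
        patches.append(img[r:r + patch_size, c:c + patch_size].ravel())
    X = np.asarray(patches, dtype=float)
    return X - X.mean(axis=1, keepdims=True)       # remove the mean (DC) of each patch

X = sample_patches(images)                          # `images` must be supplied by the user
ica = FastICA(n_components=160, whiten="unit-variance", max_iter=500)
S = ica.fit_transform(X)                            # sparse components s_i for each patch
A = ica.mixing_                                     # columns = learned basis vectors A_i

Plotting the columns of A (reshaped to 16 x 16 patches) typically gives the Gabor-like, oriented and localized basis vectors shown on this slide.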

ICA of natural images with colour (Hoyer and Hyvärinen, 2000)

Model II: Independent subspace analysis. The components estimated from natural images are not really independent; the statistical structure is much more complicated (of course!). Truly independent components cannot be found for most kinds of data: there are not enough free parameters. The dominant form of dependency remaining after ICA is correlation of energies: signals which are uncorrelated but whose squares are correlated.

Using subspaces to model dependency (Hyvärinen and Hoyer, 2000). Assumption: the s_i can be divided into groups or subspaces (Cardoso, 1998), such that the s_i in the same group are dependent on each other, while dependencies between different groups are not allowed. We also need to specify the distributions inside the groups: use the classic complex cell model, the norm of the projection onto the subspace. This leads to correlation of squares. Maximize the independence / sparsity of the complex cell outputs.

Computation of features in independent subspace analysis. [Diagram: the input I is passed through linear filters giving <w_1, I>, ..., <w_8, I>; each output is squared, and the squares are summed within each subspace (filters 1-4 into one Σ, filters 5-8 into another), giving one pooled output per subspace.]
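Read as code, the diagram is just a matrix product, an elementwise square, and a within-group sum (a sketch; the filter matrix W and the subspace size are placeholders):

# ISA / complex-cell features: pooled squared outputs of linear filters.
import numpy as np

def isa_features(patch, W, subspace_size=4):
    """patch: flattened image patch; W: rows are the linear filters w_i."""
    linear = W @ patch                                  # simple-cell-like outputs <w_i, I>
    energies = linear.reshape(-1, subspace_size) ** 2   # square each output
    return energies.sum(axis=1)                         # pool within each subspace

n_filters, n_pixels = 8, 256
W = np.random.randn(n_filters, n_pixels)                # placeholder filters
patch = np.random.randn(n_pixels)
print(isa_features(patch, W))                           # one value per subspace (complex cell)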

Independent subspaces of natural image patches Each group of 4 basis vectors corresponds to one complex cell.

Model III: Spatial organization in V1. In the brain, response properties mostly change continuously when moving along the cortical surface. We introduced topographic ICA (Hyvärinen and Hoyer, 2001): the cells (components) are arranged on a two-dimensional lattice. Again, simple cell outputs are sparse, but not independent: the correlations of squares follow the topography. Learn by maximizing a likelihood which measures sparsity:

log p(x | W) = -Σ_j Ê{ sqrt( Σ_i h_ij (w_i^T x)^2 ) }    (5)

where the neighbourhood weights h_ij are determined by distance on the topographic grid.
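A sketch of evaluating this objective on whitened data (the lattice size, neighbourhood radius, square-root sparsity measure, and random filters below are all placeholder choices):

# Topographic ICA objective: sparsity of locally pooled energies on a 2-D grid.
import numpy as np

def neighbourhood_matrix(grid, radius=1):
    """h_ij = 1 if components i and j are within `radius` steps on a grid x grid lattice."""
    coords = np.array([(r, c) for r in range(grid) for c in range(grid)])
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=2)   # Chebyshev distance
    return (dist <= radius).astype(float)

def topo_ica_objective(X, W, h):
    """X: (n_samples, n_pixels) whitened data; W: (n_components, n_pixels) filters."""
    energies = (X @ W.T) ** 2                    # (w_i^T x)^2 for every sample and component
    pooled = energies @ h.T                      # sum_i h_ij (w_i^T x)^2, for each j
    return -np.sqrt(pooled + 1e-8).mean(axis=0).sum()   # -sum_j Ê{ sqrt(...) } as in eq. (5)

grid = 5                                          # 5 x 5 lattice of components
W = np.linalg.qr(np.random.randn(grid**2, grid**2))[0]   # an orthogonal set of filters
X = np.random.randn(1000, grid**2)                # stand-in for whitened patch data
h = neighbourhood_matrix(grid)
print(topo_ica_objective(X, W, h))                # to be maximized over orthogonal W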

Topographic ICA on natural image patches. Basis vectors (simple cell RFs) with spatial organization.

Digression: Generalizations leading to unnormalized models. Generalize ISA and topographic ICA by estimating two layers:

p(x; W, M) = (1 / Z(W,M)) exp[ Σ_i G( Σ_j m_ij (w_j^T x)^2 ) ]    (6)

This is an unnormalized model: the density function p is known only up to the multiplicative constant

Z(W, M) = ∫ exp[ Σ_i G( Σ_j m_ij (w_j^T x)^2 ) ] dx

which cannot be computed in reasonable computing time. This is a common problem in non-gaussian models, e.g. ICA with more components than observed variables:

p(x; W) = (1 / Z(W)) exp[ -Σ_i |w_i^T x| ]    (7)

Recent methods for unnormalized statistical models. Score matching (Hyvärinen, JMLR, 2005): take the derivative of the model log-density with respect to x, so that the partition function disappears; fit this derivative to the corresponding derivative of the data density; this is easy to compute thanks to an integration-by-parts trick. Noise-contrastive estimation (Gutmann and Hyvärinen, JMLR, 2012): learn to distinguish the data from artificially generated noise; logistic regression learns the ratio of the pdfs of data and noise; since the noise pdf is known, we have in fact learnt the data pdf.
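As a toy worked example of the score matching idea (mine, not from the talk): for the model p(x; λ) ∝ exp(-λ x² / 2), the score is ψ(x) = -λx, the objective Ĵ(λ) = Ê{ ½ ψ(x)² + ψ'(x) } = Ê{ ½ λ² x² - λ } never involves Z, and setting its derivative to zero gives λ* = 1 / Ê{x²}:

# Score matching for a zero-mean gaussian: estimate the precision without computing Z.
import numpy as np

rng = np.random.default_rng(0)
sigma_true = 2.0
x = rng.normal(0.0, sigma_true, 100_000)       # observed data

# Model: log p(x; lam) = -lam * x**2 / 2 - log Z(lam)
# Score: psi(x) = -lam * x   (log Z drops out when differentiating w.r.t. x)
# Objective: J(lam) = E[0.5 * lam**2 * x**2 - lam], minimized at lam = 1 / E[x**2].
lam_hat = 1.0 / np.mean(x**2)

print(lam_hat)                  # ~ 0.25
print(1.0 / sigma_true**2)      # true precision 0.25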

Another unnormalized model: Linear correlations between components. Many methods force the components to be strictly uncorrelated, so any remaining linear correlations cannot be observed. However, the true components could be correlated even linearly. In ongoing work (Sasaki et al., 2013, 2014) we learn correlated components:

log p(x | W) = Σ_i Ê{ |w_i^T x| } + Σ_{i,j} β_ij Ê{ |w_i^T x| |w_j^T x| } - log Z(β_ij)    (8)

Model IV: Three-layer model. Goal: alternating selectivity and invariance over many layers. Neurons are selective to certain properties of the stimulus: the response is strong when those properties take specific values. Neurons are tolerant (invariant) to other properties: the response does not change much when those properties change. Example: simple vs. complex cells in the primary visual cortex. Simple cells: selective to the orientation and location of a bar. Complex cells: tolerant to its exact location.

Nonlinearities for neural selectivity and tolerance. Selectivity has been modelled as an AND operation / MIN operation. Tolerance (invariance) has been modelled as an OR operation / MAX operation. Here, we use convex and concave nonlinearities, e.g. (Σ_i x_i^4)^(1/4), which behaves like a MAX, and (Σ_i x_i^(1/4))^4, which behaves like a MIN. The first two layers are similar to the squaring and square root in complex cells. Learning is by maximization of sparsity in each layer.
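A small sketch of how such power nonlinearities approximate MAX- and MIN-like pooling (I add a 1/n normalization here so the comparison with max and min is direct; the slide writes the plain sums):

# "Soft max" and "soft min" pooling via power means.
import numpy as np

def soft_max_like(x, p=4.0):
    """(mean(x**p))**(1/p) with large p: pulled toward the largest inputs (OR-like)."""
    return np.mean(x**p) ** (1.0 / p)

def soft_min_like(x, p=0.25):
    """(mean(x**p))**(1/p) with small p: pulled toward the smallest inputs (AND-like)."""
    return np.mean(x**p) ** (1.0 / p)

x = np.array([0.1, 0.2, 0.9, 1.0])          # nonnegative activations from a previous layer
print(soft_max_like(x), x.max())             # ~0.80 vs 1.0: dominated by the large inputs
print(soft_min_like(x), x.min())             # ~0.41 vs 0.1: dragged down by the small inputs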

Layer three responses in the three-layer model. [Figure: for selected third-layer units, the receptive field (RF), the inhibitory RF, the optimal second-layer activity (horizontal, vertical, diagonal), and the most strongly activating images.]

Neurophysiological modelling vs. deep learning. Deep learning means neural networks with many layers. The result is often a black box: interpretation is difficult. For neurophysiological modelling, we would prefer a network where the role of each unit is clear and all cell responses model biological responses. Instead of blindly stacking many layers on top of each other, we must think about what each layer is doing.

For more information on basic models

Conclusion. Properties of visual neurons can be quantitatively modelled by statistical properties of natural images. This answers the "why" question, which is important in neuroscience. Simple cell receptive fields can be learned by maximizing the sparsity / independence of linear filters. By modelling dependencies between simple cell outputs we can model complex cells and topography. Three-layer models can use alternating selectivity and invariance. There has been theoretical development on the estimation of unnormalized models. Many applications in image processing and computer vision.