Exercise Sheet 4: Covariance and Correlation, Bayes theorem, and Linear discriminant analysis

Younesse Kaddar

1. Covariance and Correlation

Assume that we have recorded two neurons in the two-alternative forced-choice task discussed in class. We denote the firing rate of neuron 1 in trial $i$ as $r_{1,i}$ and the firing rate of neuron 2 as $r_{2,i}$. We furthermore denote the averaged firing rate of neuron 1 as $\bar{r}_1$ and of neuron 2 as $\bar{r}_2$. Let us "center" the data by defining two data vectors:

$$x = (r_{1,1} - \bar{r}_1, \ldots, r_{1,N} - \bar{r}_1), \qquad y = (r_{2,1} - \bar{r}_2, \ldots, r_{2,N} - \bar{r}_2)$$

(a) Show that the variance of the firing rates of the first neuron is: $\mathrm{Var}(r_1) = \dfrac{x \cdot x}{N-1}$

The actual variance of $r_1$ is not known. To estimate it, we compute the average of the squared deviations over a sample $(r_{1,1}, \ldots, r_{1,N})$ of the population, which gives the biased sample variance:

$$\mathrm{Var}_{\text{biased}}(r_1) = \frac{1}{N} \sum_{i=1}^{N} (r_{1,i} - \bar{r}_1)^2$$

To what extent is this estimate biased? Denoting by $\mu_1$ (resp. $\sigma_1^2$) the actual mean (resp. variance) of $r_1$:

$$
\mathbb{E}\big[\mathrm{Var}_{\text{biased}}(r_1)\big]
= \mathbb{E}\left[\frac{1}{N}\sum_{i=1}^N \Big(r_{1,i} - \frac{1}{N}\sum_{j=1}^N r_{1,j}\Big)^2\right]
= \frac{1}{N}\sum_{i=1}^N \left( \mathbb{E}\big[r_{1,i}^2\big] - \frac{2}{N}\sum_{j=1}^N \mathbb{E}\big[r_{1,i}\, r_{1,j}\big] + \frac{1}{N^2}\sum_{j,k=1}^N \mathbb{E}\big[r_{1,j}\, r_{1,k}\big] \right)
$$

Since the trials are independent, $\mathbb{E}[r_{1,i}\, r_{1,j}] = \mu_1^2$ for $i \neq j$, while $\mathbb{E}[r_{1,i}^2] = \sigma_1^2 + \mu_1^2$. Therefore:

$$
\mathbb{E}\big[\mathrm{Var}_{\text{biased}}(r_1)\big]
= (\sigma_1^2 + \mu_1^2) - \frac{2}{N}\big(\sigma_1^2 + N\mu_1^2\big) + \frac{1}{N^2}\big(N\sigma_1^2 + N^2\mu_1^2\big)
= \frac{N-1}{N}\,\sigma_1^2
= \frac{N-1}{N}\,\mathrm{Var}(r_1)
$$

Thus we get the following unbiased sample variance:

$$
\mathrm{Var}(r_1) = \frac{N}{N-1}\,\mathrm{Var}_{\text{biased}}(r_1) = \frac{1}{N-1}\sum_{i=1}^N (r_{1,i} - \bar{r}_1)^2 = \frac{x \cdot x}{N-1}
\qquad \text{since } x \cdot x = \sum_{i=1}^N (r_{1,i} - \bar{r}_1)^2
$$

And we get the expected result:

$$\mathrm{Var}(r_1) = \frac{x \cdot x}{N-1}$$
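As a quick numerical sanity check of the bias factor $(N-1)/N$, here is a minimal sketch (the firing-rate statistics below are made-up values for illustration, not data from the task):

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, sigma1, N = 50.0, 10.0, 5        # hypothetical true mean/std of neuron 1, small sample size
n_experiments = 100_000               # number of simulated N-trial experiments

# Draw many N-trial samples and average the two variance estimators over experiments.
samples = rng.normal(mu1, sigma1, size=(n_experiments, N))
biased = samples.var(axis=1, ddof=0).mean()     # (1/N) * sum of squared deviations
unbiased = samples.var(axis=1, ddof=1).mean()   # (1/(N-1)) * sum of squared deviations

print("true variance    :", sigma1**2)   # 100
print("mean biased var  :", biased)      # close to (N-1)/N * sigma1**2 = 80
print("mean unbiased var:", unbiased)    # close to 100
```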

(b) Compute the cosine of the angle between $x$ and $y$. What do you get?

$$
\cos(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|}
= \frac{\sum_{i=1}^N (r_{1,i} - \bar{r}_1)(r_{2,i} - \bar{r}_2)}{\sqrt{(N-1)\,\mathrm{Var}(r_1)}\,\sqrt{(N-1)\,\mathrm{Var}(r_2)}}
= \frac{\frac{1}{N-1}\sum_{i=1}^N (r_{1,i} - \bar{r}_1)(r_{2,i} - \bar{r}_2)}{\sqrt{\mathrm{Var}(r_1)\,\mathrm{Var}(r_2)}}
$$

Analogously to what we did in the previous question, we can show that the unbiased sample covariance is:

$$\mathrm{Cov}(r_1, r_2) = \frac{1}{N-1}\sum_{i=1}^N (r_{1,i} - \bar{r}_1)(r_{2,i} - \bar{r}_2)$$

Therefore:

$$\cos(x, y) = \frac{\mathrm{Cov}(r_1, r_2)}{\sqrt{\mathrm{Var}(r_1)\,\mathrm{Var}(r_2)}}$$

which is the correlation coefficient $\rho(r_1, r_2)$.

(c) What are the maximum and minimum values that the correlation coefficient between $r_1$ and $r_2$ can take? Why?

By the Cauchy-Schwarz inequality:

$$\big|\mathrm{Cov}(r_1, r_2)\big| \leq \sqrt{\mathrm{Var}(r_1)}\,\sqrt{\mathrm{Var}(r_2)}$$

i.e.

$$-\sqrt{\mathrm{Var}(r_1)\,\mathrm{Var}(r_2)} \;\leq\; \mathrm{Cov}(r_1, r_2) \;\leq\; \sqrt{\mathrm{Var}(r_1)\,\mathrm{Var}(r_2)}$$

hence $-1 \leq \rho(r_1, r_2) \leq 1$: the maximum (resp. minimum) value of the correlation coefficient between $r_1$ and $r_2$ is $1$ (resp. $-1$).

By the equality case of Cauchy-Schwarz:

- a correlation coefficient of $1$ corresponds to colinear vectors: there exists $\alpha \geq 0$ such that for all $i$, $r_{1,i} - \bar{r}_1 = \alpha\,(r_{2,i} - \bar{r}_2)$
- a correlation coefficient of $0$ corresponds to uncorrelated (orthogonal) vectors
- a correlation coefficient of $-1$ corresponds to anti-colinear vectors: there exists $\alpha \leq 0$ such that for all $i$, $r_{1,i} - \bar{r}_1 = \alpha\,(r_{2,i} - \bar{r}_2)$

(d) What do you think the term "centered" refers to?

Centered means that the sample mean of the data is equal to $0$. We can easily check that it is the case here (let us do it for $x$; it is similar for $y$):

$$\frac{1}{N}\sum_{i=1}^N (r_{1,i} - \bar{r}_1) = \frac{1}{N}\sum_{i=1}^N r_{1,i} - \bar{r}_1 = 0$$
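To make the link between the geometry and the statistics concrete, here is a minimal sketch checking numerically that the cosine of the angle between the centered vectors equals the Pearson correlation coefficient (the two firing-rate vectors are simulated for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
r1 = rng.normal(50, 10, N)             # simulated firing rates of neuron 1 (Hz)
r2 = 0.5 * r1 + rng.normal(20, 5, N)   # neuron 2, partially correlated with neuron 1

x = r1 - r1.mean()                     # centered data vectors
y = r2 - r2.mean()

cos_xy = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
rho = np.corrcoef(r1, r2)[0, 1]        # Pearson correlation coefficient

print(cos_xy, rho)                     # the two values coincide
```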

2. Bayes theorem

Bayes' theorem summarizes all the knowledge we have about the stimulus from observing the responses of a set of neurons, independently of the specific decoding rule. To get a better intuition about this theorem, we will look at the motion discrimination task again and compute the probability $p(s \mid r)$ ($s \in \{\leftarrow, \rightarrow\}$) that the stimulus moved to the left or to the right. For a stimulus $s$ and a firing rate response $r$ of a single neuron, Bayes' theorem reads:

$$p(s \mid r) = \frac{p(r \mid s)\,p(s)}{p(r)}$$

Here, $p(r \mid s)$ is the probability that the firing rate is $r$ if the stimulus was $s$. The respective distribution can be measured, and we assume that it follows a Gaussian probability density with mean $\mu_s$ and standard deviation $\sigma$:

$$p(r \mid s) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\left(-\frac{(r - \mu_s)^2}{2\sigma^2}\right)$$

The relative frequency with which the stimuli (leftward or rightward motion, $\leftarrow$ or $\rightarrow$) appear is denoted by $p(s)$, often called the prior probability or, for short, the prior. The distribution $p(r)$ denotes the probability of observing a response $r$, independent of any knowledge about the stimulus.

(a) How can you calculate $p(r)$? What shape does it have?

As $\{\{\leftarrow\}, \{\rightarrow\}\}$ is a partition of the stimulus space:

$$p(r) = \sum_{s} p(r, s) = \sum_{s} p(r \mid s)\,p(s) = p(r \mid \leftarrow)\,p(\leftarrow) + p(r \mid \rightarrow)\,p(\rightarrow)$$

As a result:

$$p(r) = \frac{1}{\sqrt{2\pi}\,\sigma}\left(p(\leftarrow)\exp\!\left(-\frac{(r - \mu_\leftarrow)^2}{2\sigma^2}\right) + p(\rightarrow)\exp\!\left(-\frac{(r - \mu_\rightarrow)^2}{2\sigma^2}\right)\right)$$

so $p(r)$ is a mixture of two Gaussians. To use the same example as in the last exercise sheet: if the firing rates range from 0 to 100 Hz, and the means are set to $\mu_\rightarrow = 60$ Hz and $\mu_\leftarrow = 40$ Hz with the same standard deviation $\sigma$ for both stimuli, then we have:

[Figure 2.a.1 - Plot of the neuron's response probability $p(r \mid s)$ for left and right motion]

[Figure 2.a.2 - Plot of the previous neuron's response probability $p(r)$ (in green), for unequal priors]
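A minimal sketch of how this mixture can be evaluated numerically; the means match the example above, while $\sigma$ and the priors are assumed values chosen only for illustration:

```python
import numpy as np
from scipy.stats import norm

mu_right, mu_left = 60.0, 40.0        # Hz, as in the example above
sigma = 10.0                          # Hz, assumed value
p_left, p_right = 1/3, 2/3            # assumed priors

r = np.linspace(0, 100, 501)          # firing rates from 0 to 100 Hz

p_r_given_left = norm.pdf(r, mu_left, sigma)     # p(r | left)
p_r_given_right = norm.pdf(r, mu_right, sigma)   # p(r | right)
p_r = p_left * p_r_given_left + p_right * p_r_given_right   # marginal p(r): a two-Gaussian mixture
```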

(b) The distribution $p(s \mid r)$ is often called the posterior probability or, for short, the posterior. Calculate the posterior for $s = \leftarrow$ and sketch it as a function of $r$, assuming a prior $p(\leftarrow) = p(\rightarrow) = 1/2$. Draw the posterior into the same plot.

By Bayes' theorem:

$$
p(\leftarrow \mid r) = \frac{p(r \mid \leftarrow)\,p(\leftarrow)}{p(r)}
= \frac{p(r \mid \leftarrow)\,p(\leftarrow)}{p(r \mid \leftarrow)\,p(\leftarrow) + p(r \mid \rightarrow)\,p(\rightarrow)}
= \frac{1}{1 + \dfrac{p(r \mid \rightarrow)\,p(\rightarrow)}{p(r \mid \leftarrow)\,p(\leftarrow)}}
$$

With $p(\leftarrow) = p(\rightarrow) = 1/2$:

$$
p(\leftarrow \mid r)
= \frac{1}{1 + \exp\!\left(\dfrac{(r - \mu_\leftarrow)^2 - (r - \mu_\rightarrow)^2}{2\sigma^2}\right)}
= \frac{1}{1 + \exp\!\left(-\dfrac{\mu_\leftarrow - \mu_\rightarrow}{\sigma^2}\left(r - \dfrac{\mu_\leftarrow + \mu_\rightarrow}{2}\right)\right)}
$$

So $p(\leftarrow \mid r)$ is a sigmoid of $r$. As for $p(\rightarrow \mid r)$: since $p(\cdot \mid r)$ is a probability distribution,

$$p(\rightarrow \mid r) = 1 - p(\leftarrow \mid r)$$

[Figure 2.b.1 - Plot of the previous neuron's response probability $p(r)$ (in green), for $p(\leftarrow) = p(\rightarrow) = 1/2$]
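A minimal sketch of this sigmoid posterior for equal priors (same assumed numerical values as before):

```python
import numpy as np

mu_right, mu_left, sigma = 60.0, 40.0, 10.0   # Hz, assumed values

def posterior_left(r):
    """p(left | r) for equal priors, in the sigmoid form derived above."""
    z = (mu_left - mu_right) / sigma**2 * (r - (mu_left + mu_right) / 2)
    return 1.0 / (1.0 + np.exp(-z))

r = np.linspace(0, 100, 501)
p_left_given_r = posterior_left(r)        # decreasing sigmoid: this neuron fires more for rightward motion
p_right_given_r = 1.0 - p_left_given_r    # complementary posterior
```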

Figure.b. - Plot of the previous neuron's posteriors, if p( = p( = / p( ( c What happens if you change the prior? Investigate how the posterior changes if becomes much larger than and vice versa. Make a sketch similar to (b. p( p( If the prior changes, the ratio is no longer equal to and the posterior changes as well, insofar as the general expression of p( r is: p( p( r = p( (r + exp( μ μ ( μ μ p( σ = = p( r (r + exp( μ μ ( μ μ p( + ln( σ p( As a matter of fact: p( >> p( if (for a fixed r: p( << p( if (for a fixed r: p( (r μ p( r exp( μ ( μ μ p( σ p( p( r p( p( 0 + + p( 0 +

Figure.c - Plot of the previous neuron's posteriors, for various values of the prior (d Let us assume that we decide for leftward motion whenever μ μ. Interpret this decision rule in the plots above. How well does this rule do depending on the prior? What do you lose when you move from the full posterior to a simple decision (decoding rule? B: the decision rule mentioned in the problem statement suggests that μ μ, i.e. we are now considering a neuron whose tuning is such that is fires more strongly if the motion stimulus moves leftwards. As nothing was specified before, the example neuron we were focusing on so far was of opposite tuning. We will, from now on, switch to a neuron such that: μ > > μ r > ( + Figure.d. - Plot of the response probability of the neuron we are focusing on from now on, for left and right motion

Figure.d. - Plot of the new neuron posteriors (for various priors and r > ( + r > ( μ + μ decision rule for leftward motion The μ μ decision rule corresponds to inferring that the motion is to the left (resp. to the right in the red (resp. blue area of figure.d., which is perfectly suited to the case where (bold dash-dotted curves. p( = p( = / However, it does a poor job at predicting the motion direction in the other cases, since it does not take into account the changing prior. To infer a more sensible rule, let us reconsider the situation: we want to come up with a threshold rate such that an ideal observer - making the decision that the motion that caused the neuron to fire at r > r th was to the left (preferred direction of the neuron - would have a higher chance of being right than wrong. In other words, the decision rule we are looking for is: p( r > p( r r th As such, r th satisfies: p( r th = p( r th p( = / r th (r th μ exp( μ ( μ μ p( + ln( = σ p( σ = p( μ r th ln( + + μ μ μ p( A more sensible decoding rule would then be: B: we can check that, in case p( = p( σ r > p( ln( + μ μ p(, we do recover the previous rule. μ + μ However, moving from the full posterior to a simple decision rule makes us lose the information relating to how much we are sure about the decision p( r p( r p( r > p( r made. Indeed, the full content of the information is contained in the posterior values and. If, then there is a higher chance that the motion that caused the neuron to fire was to the left, but just stating that doesn t tell you how much higher. You need p( r and p( r p( r to do so: as it happens, the chance of leftward motion is times higher. p( r 3. Linear discriminant analysis Let us redo the calculations for the case of neurons. If we denote by r the vector of firing rates, Bayes theorem reads: p(s r = p(r sp(s p(r We assume that the distribution of firing rates again follows a Gaussian so that:

3. Linear discriminant analysis

Let us redo the calculations for the case of $n$ neurons. If we denote by $\mathbf{r}$ the vector of firing rates, Bayes' theorem reads:

$$p(s \mid \mathbf{r}) = \frac{p(\mathbf{r} \mid s)\,p(s)}{p(\mathbf{r})}$$

We assume that the distribution of firing rates again follows a Gaussian, so that:

$$p(\mathbf{r} \mid s) = \frac{1}{(2\pi)^{n/2}\sqrt{\det C}}\,\exp\!\left(-\frac{1}{2}\,(\mathbf{r} - \boldsymbol{\mu}_s)^T C^{-1} (\mathbf{r} - \boldsymbol{\mu}_s)\right)$$

where

- $\boldsymbol{\mu}_s$ denotes the mean of the density for stimulus $s \in \{\leftarrow, \rightarrow\}$
- $C$ is the covariance matrix, assumed identical for both stimuli

(a) Compute the log-likelihood ratio $l(\mathbf{r}) \equiv \ln\dfrac{p(\mathbf{r} \mid \leftarrow)}{p(\mathbf{r} \mid \rightarrow)}$

$$
\begin{aligned}
l(\mathbf{r})
&= \ln\frac{\frac{1}{(2\pi)^{n/2}\sqrt{\det C}}\exp\!\big(-\frac{1}{2}(\mathbf{r} - \boldsymbol{\mu}_\leftarrow)^T C^{-1}(\mathbf{r} - \boldsymbol{\mu}_\leftarrow)\big)}{\frac{1}{(2\pi)^{n/2}\sqrt{\det C}}\exp\!\big(-\frac{1}{2}(\mathbf{r} - \boldsymbol{\mu}_\rightarrow)^T C^{-1}(\mathbf{r} - \boldsymbol{\mu}_\rightarrow)\big)} \\
&= -\frac{1}{2}(\mathbf{r} - \boldsymbol{\mu}_\leftarrow)^T C^{-1}(\mathbf{r} - \boldsymbol{\mu}_\leftarrow) + \frac{1}{2}(\mathbf{r} - \boldsymbol{\mu}_\rightarrow)^T C^{-1}(\mathbf{r} - \boldsymbol{\mu}_\rightarrow) \\
&= \frac{1}{2}\Big(\boldsymbol{\mu}_\rightarrow^T C^{-1}\boldsymbol{\mu}_\rightarrow - \boldsymbol{\mu}_\leftarrow^T C^{-1}\boldsymbol{\mu}_\leftarrow\Big)
 + \frac{1}{2}\Big(\mathbf{r}^T C^{-1}\boldsymbol{\mu}_\leftarrow + \boldsymbol{\mu}_\leftarrow^T C^{-1}\mathbf{r} - \mathbf{r}^T C^{-1}\boldsymbol{\mu}_\rightarrow - \boldsymbol{\mu}_\rightarrow^T C^{-1}\mathbf{r}\Big)
\end{aligned}
$$

where the quadratic terms $\mathbf{r}^T C^{-1}\mathbf{r}$ have cancelled out. But since $C$ is a covariance matrix, it is symmetric, hence so is $C^{-1}$, and for any $s$:

$$\mathbf{r}^T C^{-1}\boldsymbol{\mu}_s = \big(\mathbf{r}^T C^{-1}\boldsymbol{\mu}_s\big)^T = \boldsymbol{\mu}_s^T (C^{-1})^T \mathbf{r} = \boldsymbol{\mu}_s^T C^{-1}\mathbf{r}$$

(the quantity is a scalar, so it equals its own transpose). Therefore:

$$l(\mathbf{r}) = \underbrace{\frac{1}{2}\Big(\boldsymbol{\mu}_\rightarrow^T C^{-1}\boldsymbol{\mu}_\rightarrow - \boldsymbol{\mu}_\leftarrow^T C^{-1}\boldsymbol{\mu}_\leftarrow\Big)}_{\textstyle \alpha} + \mathbf{r}^T \underbrace{C^{-1}(\boldsymbol{\mu}_\leftarrow - \boldsymbol{\mu}_\rightarrow)}_{\textstyle \beta}$$

(b) Assume that $l(\mathbf{r}) = 0$ is the decision boundary, so that any firing rate vector $\mathbf{r}$ giving a log-likelihood ratio larger than zero is classified as coming from the stimulus $\leftarrow$. Compute a formula for the decision boundary. What shape does this boundary have?

First, let us show how we could come up with the $l(\mathbf{r}) > 0$ decision rule. Deciding $\leftarrow$ whenever the posterior favours $\leftarrow$ amounts to:

$$
p(\leftarrow \mid \mathbf{r}) > p(\rightarrow \mid \mathbf{r})
\iff \frac{p(\mathbf{r} \mid \leftarrow)\,p(\leftarrow)}{p(\mathbf{r})} > \frac{p(\mathbf{r} \mid \rightarrow)\,p(\rightarrow)}{p(\mathbf{r})}
\iff \frac{p(\mathbf{r} \mid \leftarrow)}{p(\mathbf{r} \mid \rightarrow)} > \frac{p(\rightarrow)}{p(\leftarrow)}
\iff l(\mathbf{r}) > \ln\frac{p(\rightarrow)}{p(\leftarrow)}
$$

Since $\ln\dfrac{p(\rightarrow)}{p(\leftarrow)} = 0 \iff p(\rightarrow) = p(\leftarrow) = 1/2$, the $l(\mathbf{r}) > 0$ rule is the posterior-optimal one exactly when the priors are equal.

The formula for this decision boundary is the following one:

$$l(\mathbf{r}) = 0 \iff \underbrace{\frac{1}{2}\Big(\boldsymbol{\mu}_\rightarrow^T C^{-1}\boldsymbol{\mu}_\rightarrow - \boldsymbol{\mu}_\leftarrow^T C^{-1}\boldsymbol{\mu}_\leftarrow\Big)}_{\textstyle \alpha} + \mathbf{r}^T \underbrace{C^{-1}(\boldsymbol{\mu}_\leftarrow - \boldsymbol{\mu}_\rightarrow)}_{\textstyle \beta} = 0$$

It is an affine hyperplane $\{\mathbf{r} \in \mathbb{R}^n \mid \mathbf{r}^T \beta = -\alpha\}$, i.e. a subspace of codimension $1$ in the $n$-dimensional space of firing rates.

(c) Assume that $p(\leftarrow) = p(\rightarrow) = 1/2$. Assume we are analyzing two neurons with uncorrelated activities, so that the covariance matrix is:

$$C = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

Sketch the decision boundary for this case.

With

$$\boldsymbol{\mu}_\leftarrow = (60 \text{ Hz},\, 40 \text{ Hz}), \qquad \boldsymbol{\mu}_\rightarrow = (40 \text{ Hz},\, 60 \text{ Hz}), \qquad C = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

we get, by resorting to the LDA decision rule $l(\mathbf{r}) = \alpha + \mathbf{r}^T\beta > 0$, the boundary $\alpha + \mathbf{r}^T\beta = 0$ with

$$\beta = \left(\frac{20\text{ Hz}}{\sigma_1^2},\; -\frac{20\text{ Hz}}{\sigma_2^2}\right)$$

In particular, if $\sigma_1 = \sigma_2$, then $\alpha = 0$ and the boundary is simply the diagonal line $r_1 = r_2$: points below the diagonal (neuron 1 firing more) are classified as leftward motion, points above it as rightward motion.

[Figure 3.c - Plot of the LDA decision boundary for the two neurons with uncorrelated activities]
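A minimal sketch of this boundary for the two uncorrelated neurons; the means are those given above, while the standard deviations are assumed values chosen only for illustration:

```python
import numpy as np

mu_left = np.array([60.0, 40.0])     # Hz, mean responses to leftward motion
mu_right = np.array([40.0, 60.0])    # Hz, mean responses to rightward motion
sigma1, sigma2 = 15.0, 15.0          # Hz, assumed standard deviations of the two neurons
C = np.diag([sigma1**2, sigma2**2])  # uncorrelated activities: diagonal covariance matrix
C_inv = np.linalg.inv(C)

alpha = 0.5 * (mu_right @ C_inv @ mu_right - mu_left @ C_inv @ mu_left)
beta = C_inv @ (mu_left - mu_right)

def classify(r):
    """Decide 'left' when the log-likelihood ratio l(r) = alpha + r.beta is positive."""
    return "left" if alpha + r @ beta > 0 else "right"

# With sigma1 = sigma2 and these symmetric means, alpha = 0 and the boundary
# alpha + r.beta = 0 reduces to the diagonal line r1 = r2.
print(classify(np.array([70.0, 30.0])))   # left
print(classify(np.array([30.0, 70.0])))   # right
```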