Encoding or decoding


1 Encoding or decoding

2 Decoding: How well can we learn what the stimulus is by looking at the neural responses? We will discuss two approaches: (1) devise and evaluate explicit algorithms for extracting a stimulus estimate; (2) directly quantify the relationship between stimulus and response using information theory.

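3 The optimal linear estimator: Let's start with a rate response, $r(t)$, and a stimulus, $s(t)$. The optimal linear estimator is the kernel $K$ that comes closest to satisfying $r(t) = \int d\tau\, K(\tau)\, s(t-\tau)$. We want to solve for $K$. Multiply by $s(t-\tau')$ and integrate over $t$: $\int dt\, r(t)\, s(t-\tau') = \int d\tau\, K(\tau) \int dt\, s(t-\tau)\, s(t-\tau')$.

4 The optimal linear estimator: This produces terms that are simply correlation functions: $C_{rs}(\tau') = \int d\tau\, K(\tau)\, C_{ss}(\tau'-\tau)$. The right-hand side is a convolution, so take the Fourier transform: $\tilde{C}_{rs}(\omega) = \tilde{K}(\omega)\, \tilde{C}_{ss}(\omega)$. Now we have a straightforward algebraic equation for $\tilde{K}(\omega)$: $\tilde{K}(\omega) = \tilde{C}_{rs}(\omega) / \tilde{C}_{ss}(\omega)$. Solving for $K(t)$ then means taking the inverse Fourier transform of this ratio.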

5 The optimal linear estimator: For white noise, the correlation function is $C_{ss}(t) = \sigma^2 \delta(t)$, so $K(t)$ is simply $C_{rs}(t)$, up to the constant factor $1/\sigma^2$.
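The white-noise result can be checked numerically. The sketch below is an illustration only: the kernel shape, noise levels, and the helper name `estimate_kernel_white_noise` are assumptions, not taken from the slides. It drives a known kernel with discrete white noise and recovers it as the stimulus-response cross-correlation divided by the stimulus variance.

```python
import numpy as np

def estimate_kernel_white_noise(s, r, n_lags):
    """Estimate K(lag) = C_rs(lag) / sigma^2 for lag = 0 .. n_lags-1.

    Valid only when the stimulus s is (approximately) white noise.
    """
    s = s - s.mean()
    r = r - r.mean()
    sigma2 = s.var()
    n = len(s)
    # C_rs(lag) is the average of r(t) * s(t - lag) over t
    c_rs = np.array([np.dot(r[lag:], s[:n - lag]) / (n - lag)
                     for lag in range(n_lags)])
    return c_rs / sigma2

rng = np.random.default_rng(0)
lags = np.arange(20)
true_K = np.exp(-lags / 4.0) * np.sin(lags / 2.0)   # hypothetical kernel
s = rng.normal(0.0, 2.0, size=50_000)               # white-noise stimulus
# r(t) = sum_lag K(lag) s(t - lag) + response noise
r = np.convolve(s, true_K)[:len(s)] + rng.normal(0.0, 0.5, size=len(s))

K_hat = estimate_kernel_white_noise(s, r, n_lags=20)
max_err = float(np.max(np.abs(K_hat - true_K)))
print("largest kernel error:", max_err)
```

For a stimulus that is not white, the division by $\tilde{C}_{ss}(\omega)$ in the frequency domain would be needed instead of the single scale factor used here.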

6 Stimulus reconstruction [Figure: kernel $K$ plotted against time $t$]

7 Stimulus reconstruction

8 Stimulus reconstruction

9 Reading minds: the LGN Yang Dan, UC Berkeley

10 Other decoding approaches

11 Binary choice tasks Britten et al. 92: measured both behavior + neural responses

12 Behavioral performance

13 Predictable from neural activity? Discriminability: $d' = (\langle r \rangle_+ - \langle r \rangle_-) / \sigma_r$

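14 Signal detection theory: [Figure: the distributions $p[r|-]$ and $p[r|+]$, with means $\langle r \rangle_-$ and $\langle r \rangle_+$ and a threshold $z$.] Decoding corresponds to comparing the test statistic, $r$, to a threshold, $z$. $\alpha(z) = P[r \ge z \mid -]$ is the false alarm rate (size); $\beta(z) = P[r \ge z \mid +]$ is the hit rate (power). Find $z$ by maximizing $P[\text{correct}] = p[+]\,\beta(z) + p[-]\,(1 - \alpha(z))$.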

15 ROC curves: summarize the performance of the test for different thresholds $z$. Want $\beta \to 1$, $\alpha \to 0$.

16 ROC: two-alternative forced choice. The threshold $z$ is the response to the first presentation. The area under the ROC curve corresponds to $P[\text{correct}]$.
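The link between the ROC area and two-alternative forced choice can be illustrated with simulated responses. This is a minimal sketch under assumed Gaussian response distributions; the means, standard deviation, and function names are hypothetical. It sweeps the threshold $z$ to trace out $(\alpha, \beta)$, integrates the curve, and compares the area with the fraction of (+, -) response pairs in which the + response is larger.

```python
import numpy as np

def roc_curve(r_minus, r_plus):
    """False-alarm rate alpha(z) and hit rate beta(z) for every threshold z."""
    z = np.sort(np.concatenate([r_minus, r_plus, [-np.inf, np.inf]]))[::-1]
    alpha = np.array([(r_minus >= zi).mean() for zi in z])  # P[r >= z | -]
    beta = np.array([(r_plus >= zi).mean() for zi in z])    # P[r >= z | +]
    return alpha, beta

def area_under_curve(alpha, beta):
    """Trapezoidal area under beta as a function of alpha."""
    return float(np.sum(np.diff(alpha) * (beta[1:] + beta[:-1]) / 2.0))

rng = np.random.default_rng(0)
r_minus = rng.normal(10.0, 3.0, size=1000)   # responses to the - stimulus
r_plus = rng.normal(14.0, 3.0, size=1000)    # responses to the + stimulus

alpha, beta = roc_curve(r_minus, r_plus)
auc = area_under_curve(alpha, beta)

# Two-alternative forced choice: answer correctly whenever r_plus > r_minus
p_2afc = float((r_plus[:, None] > r_minus[None, :]).mean())
print("area under ROC:", auc, " 2AFC fraction correct:", p_2afc)
```

For continuous responses (no ties) the two numbers agree up to floating-point rounding, because each step of the empirical ROC curve corresponds to one sample being crossed by the threshold.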

17 Is there a better test to use than $r$? The optimal test function is the likelihood ratio, $l(r) = p[r|+] / p[r|-]$ (Neyman-Pearson lemma). Recall $\alpha(z) = P[r \ge z \mid -]$ (false alarm rate, size) and $\beta(z) = P[r \ge z \mid +]$ (hit rate, power). Then $l(z) = (d\beta/dz) / (d\alpha/dz) = d\beta/d\alpha$, i.e. the slope of the ROC curve.

18 The logistic function: If $p[r|+]$ and $p[r|-]$ are both Gaussian, one can show that $P[\text{correct}] = \tfrac{1}{2}\,\mathrm{erfc}(-d'/2)$. To interpret results as two-alternative forced choice, we need simultaneous responses from a "+ neuron" and from a "- neuron". Simulate the "- neuron" responses using the same neuron's responses to the - stimulus. Ideal observer: performs as the area under the ROC curve.

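19 More $d'$: Again, if $p[r|-]$ and $p[r|+]$ are Gaussian (with a common standard deviation $\sigma$) and $p[+]$ and $p[-]$ are equal, $P[+|r] = 1 / [1 + \exp(-d'(r - \langle r \rangle)/\sigma)]$, where $\langle r \rangle$ is the midpoint of the two means. $d'$ is the slope of the sigmoid fitted to $P[+|r]$.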
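Both Gaussian results can be verified directly. The sketch below assumes equal-variance Gaussian response distributions and equal priors; the function names and the numerical values are illustrative. It evaluates $\tfrac{1}{2}\,\mathrm{erfc}(-d'/2)$ and checks that the logistic form of $P[+|r]$ agrees with Bayes' rule applied to the two Gaussian likelihoods.

```python
import math
import numpy as np

def p_correct_2afc(d_prime):
    """Two-alternative forced-choice performance for equal-variance Gaussians."""
    return 0.5 * math.erfc(-d_prime / 2.0)

def posterior_plus_logistic(r, mean_minus, mean_plus, sigma):
    """P[+|r] written as a logistic function whose slope is set by d'."""
    d_prime = (mean_plus - mean_minus) / sigma
    r_mid = 0.5 * (mean_plus + mean_minus)
    return 1.0 / (1.0 + np.exp(-d_prime * (r - r_mid) / sigma))

def posterior_plus_bayes(r, mean_minus, mean_plus, sigma):
    """P[+|r] from Bayes' rule with equal priors p[+] = p[-]."""
    lik_plus = np.exp(-0.5 * ((r - mean_plus) / sigma) ** 2)
    lik_minus = np.exp(-0.5 * ((r - mean_minus) / sigma) ** 2)
    return lik_plus / (lik_plus + lik_minus)

r_values = np.linspace(0.0, 25.0, 101)
logistic = posterior_plus_logistic(r_values, 10.0, 14.0, 3.0)
bayes = posterior_plus_bayes(r_values, 10.0, 14.0, 3.0)
print("max difference:", float(np.max(np.abs(logistic - bayes))))
print("P[correct] at d' = 4/3:", p_correct_2afc(4.0 / 3.0))
```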

20 Neurons vs organisms: Close correspondence between neural and behavioural performance. Why so many neurons? Correlations limit performance.

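21 Priors: [Figure: the distributions $p[r|-]$ and $p[r|+]$, with means $\langle r \rangle_-$ and $\langle r \rangle_+$ and a threshold $z$.] Role of priors: find $z$ by maximizing $P[\text{correct}] = p[+]\,\beta(z) + p[-]\,(1 - \alpha(z))$.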
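How the prior shifts the best threshold can be seen in a small numerical search. This sketch assumes equal-variance Gaussian response distributions with made-up parameters; the function names are hypothetical. It maximizes $P[\text{correct}]$ over a grid of thresholds, first with equal priors and then with a rare + stimulus.

```python
import math
import numpy as np

def tail_prob(z, mean, sigma):
    """P[r >= z] for a Gaussian response with the given mean and sigma."""
    return 0.5 * math.erfc((z - mean) / (sigma * math.sqrt(2.0)))

def p_correct(z, p_plus, mean_minus, mean_plus, sigma):
    alpha = tail_prob(z, mean_minus, sigma)   # false alarm rate
    beta = tail_prob(z, mean_plus, sigma)     # hit rate
    return p_plus * beta + (1.0 - p_plus) * (1.0 - alpha)

def best_threshold(p_plus, mean_minus, mean_plus, sigma, n_grid=20001):
    """Grid search for the threshold z that maximizes P[correct]."""
    zs = np.linspace(mean_minus - 5.0 * sigma, mean_plus + 5.0 * sigma, n_grid)
    scores = [p_correct(z, p_plus, mean_minus, mean_plus, sigma) for z in zs]
    return float(zs[int(np.argmax(scores))])

z_equal = best_threshold(0.5, 10.0, 14.0, 3.0)       # equal priors
z_rare_plus = best_threshold(0.1, 10.0, 14.0, 3.0)   # + is rare
print("threshold, equal priors:", z_equal)
print("threshold, p[+] = 0.1:", z_rare_plus)
```

With equal priors the threshold sits midway between the two means; when + is rare the threshold moves up, so that more evidence is required before answering +.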

22 The wind or a tiger? Classification of noisy data: single photon responses Rieke

23 Nonlinear separation of signal and noise: Classification of noisy data: single photon responses. [Figure: $P(I|\text{noise})$ and $P(I|\text{signal})$ plotted against $I$. Credit: Rieke]

26 Nonlinear separation of signal and noise: Classification of noisy data: single photon responses. [Figure: $P(I|\text{noise})\,P(\text{noise})$ and $P(I|\text{signal})\,P(\text{signal})$ plotted against $I$. Credit: Rieke]

27 How about costs? [Figure: $P(I|\text{noise})\,P(\text{noise})$ and $P(I|\text{signal})\,P(\text{signal})$ plotted against $I$.]

28 Building in cost: Penalty for an incorrect answer: $L_+$, $L_-$. For an observation $r$, what is the expected loss? $\text{Loss}_- = L_-\, P[+|r]$ and $\text{Loss}_+ = L_+\, P[-|r]$. Cut your losses: answer + when $\text{Loss}_+ < \text{Loss}_-$, i.e. when $L_+\, P[-|r] < L_-\, P[+|r]$. Using Bayes, $P[+|r] = p[r|+]\,P[+]/p(r)$ and $P[-|r] = p[r|-]\,P[-]/p(r)$, so answer + when $l(r) = p[r|+]/p[r|-] > L_+ P[-] / (L_- P[+])$.
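The loss-weighted rule can be written out as a short decision function. The sketch below assumes Gaussian likelihoods with illustrative parameters; `loss_plus` is the penalty for wrongly answering + and `loss_minus` the penalty for wrongly answering -, and all names are hypothetical. It checks that thresholding the likelihood ratio gives the same answers as comparing the two expected losses directly.

```python
import math

def gaussian_pdf(r, mean, sigma):
    return math.exp(-0.5 * ((r - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def answer_plus(r, p_plus, loss_plus, loss_minus,
                mean_minus=10.0, mean_plus=14.0, sigma=3.0):
    """Answer + when l(r) exceeds L+ P[-] / (L- P[+])."""
    ratio = gaussian_pdf(r, mean_plus, sigma) / gaussian_pdf(r, mean_minus, sigma)
    return ratio > (loss_plus * (1.0 - p_plus)) / (loss_minus * p_plus)

def answer_plus_via_losses(r, p_plus, loss_plus, loss_minus,
                           mean_minus=10.0, mean_plus=14.0, sigma=3.0):
    """Same decision, made by comparing the two expected losses."""
    w_plus = gaussian_pdf(r, mean_plus, sigma) * p_plus
    w_minus = gaussian_pdf(r, mean_minus, sigma) * (1.0 - p_plus)
    post_plus = w_plus / (w_plus + w_minus)          # P[+|r]
    expected_loss_plus = loss_plus * (1.0 - post_plus)
    expected_loss_minus = loss_minus * post_plus
    return expected_loss_plus < expected_loss_minus

settings = [(0.5, 1.0, 1.0), (0.5, 10.0, 1.0), (0.2, 1.0, 3.0)]
agree = all(
    answer_plus(r, *cfg) == answer_plus_via_losses(r, *cfg)
    for cfg in settings
    for r in (5.0, 9.0, 12.5, 15.0, 20.0)
)
print("rules agree:", agree)
```

Raising `loss_plus` makes a false + more expensive, which raises the likelihood-ratio threshold and makes the observer more reluctant to answer +.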

29 Relationship of likelihood to tuning curves: For small stimulus differences, $s$ and $s + \delta s$: like comparing to a threshold.

30 Decoding from many neurons: population codes. Population code formulation. Methods for decoding: population vector; Bayesian inference; maximum likelihood; maximum a posteriori. Fisher information.

31 Cricket cercal cells Jacobs G A et al. J Exp Biol 2008;211: by The Company of Biologists Ltd

32 Cricket cercal cells

33 Population vector RMS error in estimate Theunissen & Miller, 1991

34 Population coding in M1: [Figure: cosine tuning curve of a motor cortical neuron; firing rate against hand reaching direction, with the baseline $r_0$ marked.]

35 Population coding in M1 Cosine tuning: Pop. vector: For sufficiently large N, the population vector is parallel to the direction of arm movement

36 Population coding in M1 Cosine tuning: Pop. vector: Difficulties with this coding scheme?
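The slide equations for cosine tuning and the population vector did not survive transcription, so the sketch below assumes one common form: mean rate $r_0 + r_{\max}\cos(\theta - \theta_a)$ for a cell with preferred direction $\theta_a$, and a population vector $\sum_a \frac{r_a - r_0}{r_{\max}}\,\mathbf{c}_a$, where $\mathbf{c}_a$ is the unit vector along $\theta_a$. The parameter values and function names are hypothetical.

```python
import numpy as np

def cosine_rates(theta, preferred, r0=30.0, r_max=20.0):
    """Assumed cosine tuning: r0 + r_max * cos(theta - preferred)."""
    return r0 + r_max * np.cos(theta - preferred)

def population_vector(rates, preferred, r0=30.0, r_max=20.0):
    """Sum of preferred-direction unit vectors weighted by (r - r0) / r_max."""
    weights = (rates - r0) / r_max
    return np.array([np.sum(weights * np.cos(preferred)),
                     np.sum(weights * np.sin(preferred))])

def decoded_angle(rates, preferred, r0=30.0, r_max=20.0):
    v = population_vector(rates, preferred, r0, r_max)
    return float(np.arctan2(v[1], v[0]))

n_cells = 100
preferred = np.linspace(0.0, 2.0 * np.pi, n_cells, endpoint=False)
theta_true = 1.2

# Noise-free rates: the population vector points exactly along theta_true
clean = decoded_angle(cosine_rates(theta_true, preferred), preferred)

# Poisson spike counts in a 1 s window: the estimate scatters around theta_true
rng = np.random.default_rng(0)
noisy_counts = rng.poisson(cosine_rates(theta_true, preferred))
noisy = decoded_angle(noisy_counts, preferred)
print("true:", theta_true, " clean:", clean, " noisy:", noisy)
```

The exact recovery in the noise-free case relies on the preferred directions being uniformly spaced; with unevenly distributed preferred directions the population vector is biased, which is one of the difficulties with this scheme.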

37 Is this the best one can do? The population vector is neither general nor optimal. "Optimal": make use of all the information in the stimulus/response distributions.

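38 Bayesian inference: Bayes' law: $p[s|r] = p[r|s]\,p[s] / p[r]$, where $p[r|s]$ is the likelihood function (conditional distribution), $p[s]$ is the prior distribution, $p[s|r]$ is the a posteriori distribution, and $p[r]$ is the marginal distribution.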

39 Bayesian estimation: Want an estimator $s_{\text{Bayes}}$. Introduce a cost function, $L(s, s_{\text{Bayes}})$; minimize the mean cost. For least squares cost, $L(s, s_{\text{Bayes}}) = (s - s_{\text{Bayes}})^2$; the solution is the conditional mean.

40 Bayesian inference: By Bayes' law, $p[s|r] = p[r|s]\,p[s] / p[r]$, which relates the a posteriori distribution $p[s|r]$ to the likelihood function $p[r|s]$.

41 Maximum likelihood: Find the maximum of $P[r|s]$ over $s$. More generally, the likelihood is the probability of the data given the model. Here the "model" is the stimulus; assume a parametric form for the tuning curve.

42 Bayesian inference: By Bayes' law, $p[s|r] = p[r|s]\,p[s] / p[r]$, which relates the a posteriori distribution $p[s|r]$ to the likelihood function $p[r|s]$.

43 Population vector RMS error in estimate Theunissen & Miller, 1991

44 MAP and ML: ML: the $s^*$ which maximizes $p[r|s]$. MAP: the $s^*$ which maximizes $p[s|r]$. The difference is the role of the prior: the two differ by the factor $p[s]/p[r]$. For cercal data:

45 Decoding an arbitrary continuous stimulus: Work through a specific example: assume independence; assume Poisson firing. Noise model: Poisson distribution, $P_T[k] = (\lambda T)^k \exp(-\lambda T) / k!$

46 Decoding an arbitrary continuous stimulus E.g. Gaussian tuning curves

47 Need to know the full $P[r|s]$: assume Poisson firing; assume independent neurons. [Figure: population response of 11 cells with Gaussian tuning curves.]

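48 ML: Apply ML: maximize $\ln P[r|s]$ with respect to $s$. Set the derivative to zero, and use the fact that the sum of the tuning curves is approximately constant. The result follows from the Gaussianity of the tuning curves, and simplifies if all the widths $\sigma$ are the same.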

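49 MAP: Apply MAP: maximise $\ln p[s|r]$ with respect to $s$. Set the derivative to zero, and use the fact that the sum of the tuning curves is approximately constant. The result again follows from the Gaussianity of the tuning curves.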

50 Given this data: Prior with mean -2, variance 1 MAP: Constant prior
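The ML and MAP recipes can be tried on simulated data. This is a minimal sketch under stated assumptions: 11 independent Poisson cells with Gaussian tuning curves of a common width, and a Gaussian prior with mean -2 and variance 1 as on the slide; the peak rate, counting window, cell spacing, and all function names are made up for illustration. The closed forms used for comparison are what one obtains by setting the derivative of the log likelihood (or log posterior) to zero when the summed tuning curves are constant: a spike-count-weighted average of the preferred stimuli, with the prior entering as one extra weighted term.

```python
import numpy as np

centers = np.linspace(-5.0, 5.0, 11)   # preferred stimuli of the 11 cells
width = 1.0                            # common tuning-curve width (assumed)
r_max, T = 50.0, 1.0                   # peak rate (Hz) and window (s), assumed

def tuning(s):
    """Mean firing rates of all cells, one row per stimulus value."""
    s = np.atleast_1d(s)[:, None]
    return r_max * np.exp(-0.5 * ((s - centers) / width) ** 2)

def log_likelihood(counts, s_grid):
    """ln P[counts | s] on a grid, dropping the s-independent ln(k!) terms."""
    f = tuning(s_grid) * T
    return (counts * np.log(f) - f).sum(axis=1)

def decode(counts, s_grid, prior_mean=None, prior_var=None):
    """ML estimate, or MAP estimate if a Gaussian prior is supplied."""
    score = log_likelihood(counts, s_grid)
    if prior_mean is not None:
        score = score - 0.5 * (s_grid - prior_mean) ** 2 / prior_var
    return float(s_grid[np.argmax(score)])

rng = np.random.default_rng(1)
s_true = 0.7
counts = rng.poisson(tuning(s_true)[0] * T)
s_grid = np.linspace(-3.0, 3.0, 6001)

s_ml = decode(counts, s_grid)
s_map = decode(counts, s_grid, prior_mean=-2.0, prior_var=1.0)

# Closed forms valid when the sum of tuning curves is constant
s_ml_closed = float((counts * centers).sum() / counts.sum())
s_map_closed = float(((counts * centers).sum() / width**2 + (-2.0) / 1.0)
                     / (counts.sum() / width**2 + 1.0 / 1.0))
print("ML:", s_ml, s_ml_closed, " MAP:", s_map, s_map_closed)
```

The MAP estimate is pulled from the ML estimate toward the prior mean; the pull is small here because the assumed population emits many spikes relative to the weight of the prior.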

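51 How good is our estimate? For stimulus $s$, we have an estimate $s_{\text{est}}$. Bias: $b_{\text{est}}(s) = \langle s_{\text{est}} \rangle - s$. Variance: $\sigma_{\text{est}}^2(s) = \langle (s_{\text{est}} - \langle s_{\text{est}} \rangle)^2 \rangle$. Mean square error: $\langle (s_{\text{est}} - s)^2 \rangle = \sigma_{\text{est}}^2(s) + b_{\text{est}}^2(s)$. Cramér-Rao bound: $\sigma_{\text{est}}^2(s) \ge (1 + b_{\text{est}}'(s))^2 / I_F(s)$, where $I_F(s)$ is the Fisher information. (ML is unbiased: $b = b' = 0$.)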

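52 Fisher information: $I_F(s) = \langle -\partial^2 \ln p[r|s] / \partial s^2 \rangle$. Alternatively: $I_F(s) = \langle (\partial \ln p[r|s] / \partial s)^2 \rangle$. Quantifies local stimulus discriminability.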

53 Fisher information for Gaussian tuning curves: For Gaussian tuning curves with Poisson statistics:

54 Are narrow or broad tuning curves better? Under an approximation to the Fisher information, narrow tuning curves are better. But not in higher dimensions! What happens in 2D?
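The dependence on tuning width can be explored numerically. The sketch below assumes independent Poisson cells, for which the Fisher information is $T \sum_a f_a'(s)^2 / f_a(s)$, and Gaussian tuning curves $f_a$ with evenly spaced centres; the peak rate, spacing, and function name are illustrative. Replacing the sum over cells by an integral gives $\sqrt{2\pi}\,\rho\, r_{\max} T / \sigma$ for cell density $\rho$, which the code compares against the exact sum.

```python
import numpy as np

def fisher_information(s, centers, width, r_max=50.0, T=1.0):
    """Fisher information of independent Poisson cells with Gaussian tuning.

    For Gaussian f_a, f_a'(s)^2 / f_a(s) = f_a(s) * ((s - s_a) / width^2)^2,
    which avoids dividing by very small rates.
    """
    f = r_max * np.exp(-0.5 * ((s - centers) / width) ** 2)
    return float(T * np.sum(f * ((s - centers) / width ** 2) ** 2))

centers = np.arange(-20.0, 21.0, 1.0)   # one cell per unit of stimulus
density = 1.0

i_narrow = fisher_information(0.0, centers, width=1.0)
i_broad = fisher_information(0.0, centers, width=2.0)
i_too_narrow = fisher_information(0.0, centers, width=0.1)

approx_narrow = np.sqrt(2.0 * np.pi) * density * 50.0 * 1.0 / 1.0
print("width 1:", i_narrow, " integral approximation:", approx_narrow)
print("width 2:", i_broad, " width 0.1:", i_too_narrow)
```

In this one-dimensional setting halving the width doubles the Fisher information, but only while neighbouring tuning curves still overlap: with a width far below the cell spacing the integral approximation breaks down and the information at this stimulus collapses.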

55 Fisher information and discrimination: Recall $d'$ = mean difference / standard deviation. Can also decode and discriminate using decoded values. Trying to discriminate $s$ and $s + \Delta s$: the difference in the ML estimates is $\Delta s$ (unbiased), and the variance of the estimate is $1/I_F(s)$, so $d' = \Delta s \sqrt{I_F(s)}$.

56 Limitations of these approaches Tuning curve/mean firing rate Correlations in the population

57 The importance of correlation Shadlen and Newsome, 98

58 The importance of correlation

59 The importance of correlation

60 Entropy and Shannon information Model-based vs model free

61 Entropy and Shannon information: For a random variable $X$ with distribution $p(x)$, the entropy is $H[X] = -\sum_x p(x) \log_2 p(x)$. Information is defined as $I[X] = -\log_2 p(x)$.

62 Mutual information Typically, information = mutual information: how much knowing the value of one random variable r (the response) reduces uncertainty about another random variable s (the stimulus). Variability in response is due both to different stimuli and to noise. How much response variability is useful, i.e. can represent different messages, depends on the noise. Noise can be specific to a given stimulus.

63 Mutual information: Information quantifies how far $r$ and $s$ are from being independent: $I(s;r) = D_{KL}[P(r,s), P(r)P(s)]$. Alternatively: $I(s;r) = H[P(r)] - \sum_s P(s)\, H[P(r|s)]$.

64 Mutual information: Mutual information is the difference between the total response entropy and the mean noise entropy: $I(s;r) = H[P(r)] - \sum_s P(s)\, H[P(r|s)]$. Need to know the conditional distribution $P(s|r)$ or $P(r|s)$. Take a particular stimulus $s = s_0$ and repeat it many times to obtain $P(r|s_0)$. Compute the variability due to noise: the noise entropy.
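The two expressions for mutual information can be compared on a small joint table. The sketch below is illustrative: the joint probabilities and function names are made up. It computes the response entropy minus the mean noise entropy, and separately the Kullback-Leibler divergence between the joint distribution and the product of its marginals.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution (zero entries are skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(joint):
    """H[P(r)] - sum_s P(s) H[P(r|s)], with joint[i, j] = P(s_i, r_j)."""
    joint = np.asarray(joint, dtype=float)
    p_s = joint.sum(axis=1)
    p_r = joint.sum(axis=0)
    noise_entropy = sum(p_s[i] * entropy(joint[i] / p_s[i])
                        for i in range(len(p_s)) if p_s[i] > 0)
    return entropy(p_r) - noise_entropy

def mutual_information_kl(joint):
    """D_KL between P(r, s) and P(r) P(s), in bits."""
    joint = np.asarray(joint, dtype=float)
    indep = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / indep[mask])).sum())

noisy = np.array([[0.4, 0.1],
                  [0.1, 0.4]])                    # response mostly follows stimulus
independent = np.outer([0.5, 0.5], [0.3, 0.7])    # response unrelated to stimulus
perfect = np.diag([0.5, 0.5])                     # response determined by stimulus

for name, table in [("noisy", noisy), ("independent", independent), ("perfect", perfect)]:
    print(name, mutual_information(table), mutual_information_kl(table))
```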

65 Mutual information: Information is symmetric in $r$ and $s$. Examples: the response is unrelated to the stimulus: $p[r|s] = ?$, MI $= ?$; the response is perfectly predicted by the stimulus: $p[r|s] = ?$

66 Simple example: $r_+$ encodes stimulus +, $r_-$ encodes stimulus -, but with a probability of error $p$: $P(r_+|+) = 1 - p$, $P(r_-|-) = 1 - p$. What is the response entropy $H[p]$? What is the noise entropy?

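67 Entropy and Shannon information: [Figure panels: Entropy; Information.] Response entropy: $H[p] = -p_+ \log p_+ - (1 - p_+) \log(1 - p_+)$. Noise entropy: $H[P(r|s)] = -p \log p - (1 - p) \log(1 - p)$. When $p_+ = \tfrac{1}{2}$, the response entropy takes its maximum value of 1 bit (using base-2 logarithms).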
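The simple example can be evaluated for any error probability. In this sketch `p_stim_plus` is the probability of presenting stimulus + and `p_err` is the error probability $p$; both names, and the reading of $p_+$ as the resulting probability of response $r_+$, are assumptions made for illustration.

```python
import math

def h2(q):
    """Binary entropy in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def channel_information(p_stim_plus, p_err):
    """Response entropy minus noise entropy for the two-response example."""
    # Probability of observing r+ : correct on +, or an error on -
    p_resp_plus = p_stim_plus * (1.0 - p_err) + (1.0 - p_stim_plus) * p_err
    response_entropy = h2(p_resp_plus)
    noise_entropy = h2(p_err)   # the same for both stimuli
    return response_entropy - noise_entropy

for p_err in (0.0, 0.1, 0.25, 0.5):
    print(p_err, channel_information(0.5, p_err))
```

With equally likely stimuli the information falls from 1 bit with no errors to 0 bits when the response is right only half the time.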


More information

Machine learning - HT Maximum Likelihood

Machine learning - HT Maximum Likelihood Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce

More information

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1 EEL 851: Biometrics An Overview of Statistical Pattern Recognition EEL 851 1 Outline Introduction Pattern Feature Noise Example Problem Analysis Segmentation Feature Extraction Classification Design Cycle

More information

Chapter 2 Signal Processing at Receivers: Detection Theory

Chapter 2 Signal Processing at Receivers: Detection Theory Chapter Signal Processing at Receivers: Detection Theory As an application of the statistical hypothesis testing, signal detection plays a key role in signal processing at receivers of wireless communication

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

An introduction to Bayesian inference and model comparison J. Daunizeau

An introduction to Bayesian inference and model comparison J. Daunizeau An introduction to Bayesian inference and model comparison J. Daunizeau ICM, Paris, France TNU, Zurich, Switzerland Overview of the talk An introduction to probabilistic modelling Bayesian model comparison

More information

Lecture 3: Pattern Classification

Lecture 3: Pattern Classification EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures

More information

Basics of Information Processing

Basics of Information Processing Abstract Don H. Johnson Computer & Information Technology Institute Department of Electrical & Computer Engineering Rice University, MS366 Houston, TX 77251 I describe the basics of probability theory,

More information

Application: Can we tell what people are looking at from their brain activity (in real time)? Gaussian Spatial Smooth

Application: Can we tell what people are looking at from their brain activity (in real time)? Gaussian Spatial Smooth Application: Can we tell what people are looking at from their brain activity (in real time? Gaussian Spatial Smooth 0 The Data Block Paradigm (six runs per subject Three Categories of Objects (counterbalanced

More information

Overview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation

Overview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Overview Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Probabilistic Interpretation: Linear Regression Assume output y is generated

More information

INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS. Michael A. Lexa and Don H. Johnson

INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS. Michael A. Lexa and Don H. Johnson INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS Michael A. Lexa and Don H. Johnson Rice University Department of Electrical and Computer Engineering Houston, TX 775-892 amlexa@rice.edu,

More information

Artificial Neural Networks Examination, June 2004

Artificial Neural Networks Examination, June 2004 Artificial Neural Networks Examination, June 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

If there exists a threshold k 0 such that. then we can take k = k 0 γ =0 and achieve a test of size α. c 2004 by Mark R. Bell,

If there exists a threshold k 0 such that. then we can take k = k 0 γ =0 and achieve a test of size α. c 2004 by Mark R. Bell, Recall The Neyman-Pearson Lemma Neyman-Pearson Lemma: Let Θ = {θ 0, θ }, and let F θ0 (x) be the cdf of the random vector X under hypothesis and F θ (x) be its cdf under hypothesis. Assume that the cdfs

More information

Digital Transmission Methods S

Digital Transmission Methods S Digital ransmission ethods S-7.5 Second Exercise Session Hypothesis esting Decision aking Gram-Schmidt method Detection.K.K. Communication Laboratory 5//6 Konstantinos.koufos@tkk.fi Exercise We assume

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30 Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain

More information

Advanced statistical methods for data analysis Lecture 1

Advanced statistical methods for data analysis Lecture 1 Advanced statistical methods for data analysis Lecture 1 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline

More information

Machine Learning

Machine Learning Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into

More information

Bayesian inference J. Daunizeau

Bayesian inference J. Daunizeau Bayesian inference J. Daunizeau Brain and Spine Institute, Paris, France Wellcome Trust Centre for Neuroimaging, London, UK Overview of the talk 1 Probabilistic modelling and representation of uncertainty

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Minimum Error-Rate Discriminant

Minimum Error-Rate Discriminant Discriminants Minimum Error-Rate Discriminant In the case of zero-one loss function, the Bayes Discriminant can be further simplified: g i (x) =P (ω i x). (29) J. Corso (SUNY at Buffalo) Bayesian Decision

More information

Machine Learning, Midterm Exam

Machine Learning, Midterm Exam 10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

More information

Neural Decoding. Mark van Rossum. School of Informatics, University of Edinburgh. January 2012

Neural Decoding. Mark van Rossum. School of Informatics, University of Edinburgh. January 2012 Neural Decoding Mark van Rossum School of Informatics, University of Edinburgh January 2012 0 Acknowledgements: Chris Williams and slides from Gatsby Liam Paninski. Version: January 31, 2018 1 / 63 2 /

More information

Lecture 9: PGM Learning

Lecture 9: PGM Learning 13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and

More information

Artificial Neural Networks Examination, March 2004

Artificial Neural Networks Examination, March 2004 Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information