HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing, Spring 2007

MIT OpenCourseWare, http://ocw.mit.edu
HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing, Spring 2007
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Harvard-MIT Division of Health Sciences and Technology
HST.582J: Biomedical Signal and Image Processing, Spring 2007
Course Director: Dr. Julie Greenberg

Automated Decision Making Systems: Probability, Classification, Model Estimation
John W. Fisher III, April 2007

Information and Statistics

On the use of statistics: "There are three kinds of lies: lies, damned lies, and statistics." - Benjamin Disraeli (popularized by Mark Twain)

On the value of information: "And when we were finished renovating our house, we had only $24.00 left in the bank, only because the plumber didn't know about it." - Mark Twain (from a speech paraphrasing one of his books)

Elements of Decision Making Systems
1. Probability: a quantitative way of modeling uncertainty.
2. Statistical Classification: the application of probability models to inference; incorporates a notion of optimality.
3. Model Estimation: we rarely (OK, never) know the model beforehand. Can we estimate the model from labeled observations?

Problem Setup

Concepts
In many experiments there is some element of randomness that we are unable to explain. Probability and statistics are mathematical tools for reasoning in the face of such uncertainty. They allow us to answer questions quantitatively, such as: Is the signal present or not? (binary: YES or NO) How certain am I? (continuous: degree of confidence) We can design systems for which single-use performance has an element of uncertainty, but average-case performance is predictable.

Anomalous Behavior (Example)
How do we quantify our belief that these are anomalies? How might we detect them automatically?

Detection of Signals in Noise
In which of these plots is the signal present? Why are we more certain in some cases than others? (Panels: signal; signal(?)+noise; signal(?)+noise; signal(?)+noise)

Coin Flipping
A fairly simple probability modeling problem: binary hypothesis testing. Many decision systems come down to making a decision on the basis of a biased coin flip (or an N-sided die).

Bayes Rule
Bayes rule plays an important role in classification, inference, and estimation. A useful thing to remember is that conditional probability relationships can be derived from a Venn diagram; Bayes rule then arises from straightforward algebraic manipulation.

Heads/Tails Conditioning Example
If I flip two coins and tell you at least one of them is heads, what is the probability that at least one of them is tails? The events of interest are the set of outcomes where at least one of the results is a head. The point of this example is two-fold: keep track of your sample space and events of interest, and note that Bayes rule tells us how to incorporate information in order to adjust probability. The sample space is:

             2nd flip
             H     T
1st flip  H  HH    HT
          T  TH    TT
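In its standard discrete form, the rule referred to on the Bayes Rule slide is

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},$$

which follows from $P(A \cap B) = P(A \mid B)P(B) = P(B \mid A)P(A)$. Applied to the two-coin question with A = "both coins are heads" and B = "at least one coin is heads":

$$P(A \mid B) = \frac{1 \cdot \tfrac{1}{4}}{\tfrac{3}{4}} = \frac{1}{3},$$

so the probability that at least one coin is tails, given that at least one is heads, is 2/3.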

Heads/Tails Conditioning Example (continued)
The probability that at least one of the results is heads is 3/4 by simple counting, and the probability that both coins are heads is 1/4. Conditioning on "at least one head" leaves three equally likely outcomes (HH, HT, TH), so the chance of "winning" (both heads) is 1 in 3; equivalently, the odds of winning are 1 to 2. The probability that at least one coin is tails is therefore 2/3.

Defining Probability (Frequentist vs. Axiomatic)
The probability of an event is the number of times we expect a specific outcome relative to the number of times we conduct the experiment. Define:
N: the number of trials
N_A, N_B: the number of times events A and B are observed
Events A and B are mutually exclusive (i.e., observing one precludes observing the other).
Empirical definition: probability is defined as a limit over observations.
Axiomatic definition: probability is derived from its properties.
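In standard form, the empirical (relative-frequency) definition and the additivity property for mutually exclusive events are:

$$P(A) = \lim_{N \to \infty} \frac{N_A}{N}, \qquad P(A \cup B) = \lim_{N \to \infty} \frac{N_A + N_B}{N} = P(A) + P(B) \quad \text{(A, B mutually exclusive)}.$$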

Estimating the Bias of a Coin (Bernoulli Process)

4 out of 5 Dentists
What does this statement mean? How can we attach meaning/significance to the claim? This is an example of a frequentist vs. Bayesian viewpoint. The difference (in this case) lies in the assumption regarding how the data are generated and the way in which we can express certainty about our answer. Asymptotically (as we get more observations) they both converge to the same answer (but at different rates).
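As a concrete illustration of the two viewpoints for the coin-bias (Bernoulli) problem, here is a minimal sketch in Python; the true bias, the sample size, and the Beta prior are assumptions chosen for illustration, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n flips of a coin with an assumed true bias (e.g. "4 out of 5").
p_true, n = 0.8, 50
flips = rng.random(n) < p_true        # True = heads

k = flips.sum()                       # number of heads observed

# Frequentist (maximum-likelihood) estimate: the relative frequency of heads.
p_mle = k / n

# Bayesian estimate: with a Beta(a, b) prior on the bias, the posterior after
# k heads in n flips is Beta(a + k, b + n - k); a uniform prior is assumed here.
a, b = 1.0, 1.0
posterior_mean = (a + k) / (a + b + n)

print(f"heads: {k}/{n}  MLE: {p_mle:.3f}  posterior mean: {posterior_mean:.3f}")
```

Both estimates converge to the true bias as the number of flips grows, but they differ for small samples, which is where the choice of viewpoint (and prior) matters.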

Sample without Replacement, Order Matters
Start with N empty boxes. Choose one from N choices, choose another from the N-1 remaining choices, and so on; the kth box is filled from the N-k+1 remaining choices. Each term represents the number of different choices we have at each stage, and the product can be re-written and simplified. (In the slide figure, color indicates the order in which we filled the boxes; any sample which fills the same boxes but has a different color in any box, and there will be at least two, is considered a different sample.)

Sample without Replacement, Order Doesn't Matter
The sampling procedure is the same as the previous one except that we don't keep track of the colors. The number of sample draws with the same filled boxes is equal to the number of ways we can re-order (permute) the colors; the result is to reduce the total number of draws by that factor.
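Written out, the two counts these slides build up are the standard permutation and combination formulas:

$$\underbrace{N(N-1)\cdots(N-k+1)}_{\text{order matters}} = \frac{N!}{(N-k)!}, \qquad \underbrace{\frac{N!}{k!\,(N-k)!}}_{\text{order doesn't matter}} = \binom{N}{k},$$

where dividing by $k!$ accounts for the number of ways the $k$ chosen items (the "colors") can be permuted.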

Cumulative Distribution Functions (CDFs)
The cumulative distribution function (CDF) divides a continuous sample space into two events. Its defining properties are summarized below.

Probability Density Functions (PDFs)
The probability density function (PDF) is defined in terms of the CDF; the properties which follow from that definition are also summarized below.
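In standard notation, the definitions and properties referred to on these two slides are:

$$F_X(x) = P(X \le x), \qquad 0 \le F_X(x) \le 1, \qquad F_X(-\infty) = 0, \quad F_X(\infty) = 1, \quad F_X \text{ non-decreasing},$$
$$P(a < X \le b) = F_X(b) - F_X(a),$$
$$p_X(x) = \frac{dF_X(x)}{dx}, \qquad p_X(x) \ge 0, \qquad \int_{-\infty}^{\infty} p_X(x)\,dx = 1, \qquad P(a < X \le b) = \int_a^b p_X(x)\,dx.$$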

Expectation
Given a function of a random variable (i.e., g(x)), we define its expected value as an integral against the density (see below). Expectation is linear (see the variance example once we've defined the joint density function and statistical independence). The mean, variance, and entropy are continuous examples. Expectation is with regard to ALL random variables within the arguments; this is important for multidimensional and joint random variables.

Multiple Random Variables (Joint Densities)
We can define a density over multiple random variables in a similar fashion as we did for a single random variable:
1. We define the probability of the event {X ≤ x AND Y ≤ y} as a function of x and y.
2. The density p_XY(u,v) is the function we integrate to compute the probability.
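In standard form, the expectation and the examples named above are:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,p_X(x)\,dx, \qquad \mu = E[X], \qquad \sigma^2 = E[(X-\mu)^2], \qquad h(X) = -E[\log p_X(X)],$$

and for two random variables the joint distribution and density satisfy

$$F_{XY}(x,y) = P(X \le x,\, Y \le y), \qquad P\big((X,Y) \in A\big) = \iint_A p_{XY}(u,v)\,du\,dv.$$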

Conditional Density
Given a joint density or mass function over two random variables, we can define the conditional density in a way similar to conditional probability from Venn diagrams. Note that p_XY(x, y_o) is the slice of p_XY(x, y) along the line y = y_o; that is, it is not of practical use (it is not an actual density) unless we condition on Y equal to a value rather than letting it remain a variable. We then get the relationship

$$p_{X|Y}(x \mid y_o) = \frac{p_{XY}(x, y_o)}{p_Y(y_o)}.$$

Bayes Rule (continuous random variables)
For continuous random variables, Bayes rule is essentially the same (again, just an algebraic manipulation of the definition of a conditional density):

$$p_{X|Y}(x \mid y) = \frac{p_{Y|X}(y \mid x)\, p_X(x)}{p_Y(y)}.$$

This relationship will be very useful when we start looking at classification and detection.

Binary Hypothesis Testing (Neyman-Pearson) and a Simplification of the Notation
2-class problems are equivalent to the binary hypothesis testing problem. The goal is to estimate which hypothesis is true (i.e., from which class our sample came). A minor change in notation will make the following discussion a little simpler: write p_0(x) and p_1(x) for the probability density models of the measurement depending on which hypothesis is in effect.

Decision Rules
Decision rules are functions which map measurements to choices. In the binary case we can write the rule as a mapping from each measurement x to a choice of H_0 or H_1 (see below), which partitions the measurement space into decision regions.
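In the notation used below, a binary decision rule and its decision regions can be written compactly as (standard form, assumed here):

$$\delta(x) = \begin{cases} H_1, & x \in R_1 \\ H_0, & x \in R_0 \end{cases} \qquad R_0 \cup R_1 = \mathcal{X}, \quad R_0 \cap R_1 = \emptyset.$$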

Error Types
There are 2 types of errors: a miss (deciding H_0 when H_1 is in effect) and a false alarm (deciding H_1 when H_0 is in effect).

Binary Hypothesis Testing (Bayesian)
2-class problems are equivalent to the binary hypothesis testing problem. The goal is to estimate which hypothesis is true (i.e., from which class our sample came). A minor change in notation will make the following discussion a little simpler; the quantities involved are the prior probabilities of each class, the class-conditional probability density models for the measurement, the marginal density of X, and the conditional probability of the hypothesis H_i given X.
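In terms of the priors P_i, the class-conditional densities p_i(x), and the decision regions R_i above, these quantities are related by the standard expressions

$$P(H_i \mid x) = \frac{P_i\, p_i(x)}{p_X(x)}, \qquad p_X(x) = P_0\, p_0(x) + P_1\, p_1(x),$$
$$P_{\text{miss}} = \int_{R_0} p_1(x)\,dx, \qquad P_{\text{false alarm}} = \int_{R_1} p_0(x)\,dx.$$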

A Notional 1-Dimensional Classification Example
(figure: p_0(x), p_1(x), and the marginal p_X(x)) So, given observations of x, how should we select our best guess of H_i? Specifically, what is a good criterion for making that assignment? And which H_i should we select before we observe x?

Bayes Classifier
A reasonable criterion for guessing values of H given observations of X is to minimize the probability of error. The classifier which achieves this minimization is the Bayes classifier.
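As a concrete version of this 1-D example, here is a minimal sketch in Python; the Gaussian class-conditional densities, their parameters, and the priors are assumptions chosen for illustration:

```python
import numpy as np
from scipy.stats import norm

# Assumed class-conditional densities and priors (for illustration only).
p0 = norm(loc=0.0, scale=1.0)     # p_0(x): density of the measurement under H_0
p1 = norm(loc=2.0, scale=1.0)     # p_1(x): density of the measurement under H_1
P0, P1 = 0.7, 0.3                 # prior probabilities of H_0 and H_1

x = np.linspace(-4.0, 6.0, 7)

# Marginal density of X and posterior probability of H_1 given x (Bayes rule).
pX = P0 * p0.pdf(x) + P1 * p1.pdf(x)
post1 = P1 * p1.pdf(x) / pX

# Bayes (minimum probability of error) classifier: pick the larger posterior.
decision = np.where(post1 > 0.5, 1, 0)
print(np.round(post1, 3))
print(decision)
```

Before any observation is made, the same criterion says to guess the hypothesis with the larger prior probability (H_0 in this sketch).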

Probability of Misclassification
Before we derive the Bayes classifier, consider the probability of misclassification for an arbitrary classifier (i.e., decision rule). The first step is to assign regions of X to each class. An error occurs if a sample of x falls in R_i while hypothesis H_j (j ≠ i) is actually in effect.

Probability of Misclassification (continued)
An error is comprised of two events: x falls in R_1 while H_0 is in effect, or x falls in R_0 while H_1 is in effect. These are mutually exclusive events, so the total error probability is the sum of their individual probabilities.
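Written out, this decomposition is

$$P_E = P(x \in R_1,\, H_0) + P(x \in R_0,\, H_1) = P_0 \int_{R_1} p_0(x)\,dx \;+\; P_1 \int_{R_0} p_1(x)\,dx.$$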

Minimum Probability of Misclassification
So now let's choose regions to minimize the probability of error. In the second step we just change the region over which we integrate for one of the terms (these are complementary events). In the third step we collect terms and note that all underbraced terms in the integrand are non-negative. If we want to choose regions (remember, choosing region 1 effectively chooses the other region) to minimize P_E, then we should set region 1 to be where the integrand is negative.

Minimum Probability of Misclassification (continued)
Consequently, for minimum probability of misclassification (which is the Bayes error), R_1 is defined by the condition below and R_0 is its complement; the decision boundary is where we have equality. Equivalently, we can write the condition as the point at which the likelihood ratio for H_1 vs. H_0 exceeds the PRIOR odds of H_0 vs. H_1.
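In the notation above, the resulting condition is the familiar likelihood-ratio test:

$$R_1 = \{\, x : P_1\, p_1(x) > P_0\, p_0(x) \,\} \quad\Longleftrightarrow\quad \frac{p_1(x)}{p_0(x)} > \frac{P_0}{P_1}.$$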

Risk Adjusted Classifiers
Suppose that making one type of error is more of a concern than making another; for example, it is worse to declare H_1 when H_0 is true than vice versa. This is captured by the notion of cost, which in the binary case leads to a cost matrix.

Derivation: We'll simplify by assuming that the cost of being correct is zero (C_00 = C_11 = 0) and that all other costs are positive. Think of cost as a piecewise constant function of X. If we divide X into decision regions, we can compute the expected cost as the cost of being wrong times the probability of a sample falling into that region. The Risk Adjusted Classifier tries to minimize the expected cost.

Risk Adjusted Classifiers (continued)
The expected cost is given below. As in the minimum probability of error classifier, we note that all terms in the integral are positive, so to minimize the expected cost we choose R_1 accordingly. If C_10 = C_01, then the risk adjusted classifier is equivalent to the minimum probability of error classifier. Another interpretation of costs is as an adjustment to the prior probabilities: the risk adjusted classifier is equivalent to the minimum probability of error classifier with prior probabilities equal to P_1^adj and P_0^adj, respectively.
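With the convention that C_ij is the cost of declaring H_i when H_j is actually in effect (an assumed but standard convention), the expected cost and the resulting rule can be written as

$$E[C] = C_{10}\, P_0 \int_{R_1} p_0(x)\,dx \;+\; C_{01}\, P_1 \int_{R_0} p_1(x)\,dx,$$
$$R_1 = \{\, x : C_{01} P_1\, p_1(x) > C_{10} P_0\, p_0(x) \,\}, \qquad P_1^{adj} = \frac{C_{01} P_1}{C_{01} P_1 + C_{10} P_0}, \quad P_0^{adj} = \frac{C_{10} P_0}{C_{01} P_1 + C_{10} P_0}.$$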

Error Probability as an Expectation
Equivalently, we can compute the error probability as the expectation of a function of X and H.

Bayes Classifier vs. Risk Adjusted Classifier
(figure: p_1(x), p_2(x), and decision regions R_1, R_2, R_1)
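One standard way to write that expectation, using an indicator of the decision rule disagreeing with the true hypothesis, is

$$P_E = E_{X,H}\!\left[\, \mathbf{1}\{\delta(X) \neq H\} \,\right].$$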

Okay, So What?
All of this is great: we now know what to do in a few classic cases if some nice person hands us all of the probability models. In general we aren't given the models. What do we do? Density estimation to the rescue. While we may not have the models, we often do have a collection of labeled measurements, that is, a set of {x, H_j}. From these we can estimate the class-conditional densities, as sketched below. Important issues will be:
How close will the estimate be to the true model?
How does closeness impact classification performance?
What types of estimators are appropriate (parametric vs. nonparametric)?
Can we avoid density estimation and go straight to estimating the decision rule directly? (generative approaches versus discriminative approaches)
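A minimal sketch of this plug-in approach, assuming Gaussian (parametric) class-conditional models and synthetic labeled data chosen purely for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Synthetic labeled measurements {x, H_j} (assumed for illustration).
x0 = rng.normal(0.0, 1.0, size=200)   # samples labeled H_0
x1 = rng.normal(2.0, 1.5, size=100)   # samples labeled H_1

# Parametric density estimation: fit a Gaussian to each class.
mu0, s0 = x0.mean(), x0.std(ddof=1)
mu1, s1 = x1.mean(), x1.std(ddof=1)

# Estimate the priors from the label counts.
P0 = len(x0) / (len(x0) + len(x1))
P1 = 1.0 - P0

def classify(x):
    """Plug-in Bayes classifier built from the estimated densities and priors."""
    return np.where(P1 * norm.pdf(x, mu1, s1) > P0 * norm.pdf(x, mu0, s0), 1, 0)

print(classify(np.array([-0.5, 1.0, 2.5])))
```

How well this plug-in rule performs depends on how close the estimated densities are to the true ones, which is exactly the set of issues listed above.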