Bayesian decision theory. Nuno Vasconcelos, ECE Department, UCSD

Notation. The notation in DHS is quite sloppy, e.g. "show that P(error) = ∫ P(error|z) p(z) dz", and it is really not clear what this means. We will use the following notation: P_{X,Y}(x_0, y_0), where the subscripts are the random variables (uppercase) and the arguments are the values of the random variables (lowercase). This is equivalent to P(X = x_0, Y = y_0).

Bayesian decision theory: a framework for computing optimal decisions on problems involving uncertainty (probabilities). Basic concepts. World: has states or classes, drawn from a state or class random variable Y; e.g. fish classification, Y ∈ {bass, salmon}; student grading, Y ∈ {A, B, C, D, F}; medical diagnosis, Y ∈ {disease A, disease B, ..., disease M}. Observer: measures observations (features) X, drawn from a random process; e.g. fish classification, X = (scale length, scale width) ∈ R^2; student grading, X = (HW_1, ..., HW_n) ∈ R^n; medical diagnosis, X = (symptom_1, ..., symptom_n) ∈ R^n.

Bayesian decision theory. Decision function: the observer uses the observations to make decisions about the state of the world y. If x ∈ Ω and y ∈ Ψ, the decision function is the mapping g: Ω → Ψ such that g(x) = y_0, where y_0 is a prediction of the state y. Loss function: L(y_0, y) is the cost of deciding for y_0 when the true state is y; usually this is zero if there is no error and positive otherwise. Goal: to determine the optimal decision function for the loss L(., .).

Classification. We will focus on classification problems, where the observer tries to infer the state of the world: g(x) = i, i ∈ {1, ..., M}. We will also mostly consider the 0-1 loss function, L[g(x), y] = 1 if g(x) ≠ y and 0 if g(x) = y. The regression case, where the observer tries to predict a continuous y with g(x) ∈ R, is basically the same for a suitable loss function, e.g. the squared error L[g(x), y] = (y - g(x))^2.
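
As a small illustration (not from the slides), here is a minimal Python sketch of a decision function evaluated under the 0-1 loss and the squared-error loss; the threshold 5.0 and the toy samples are assumptions made up for the example.

    import numpy as np

    def g(x):
        # hypothetical decision function g: R -> {0, 1}, thresholding a 1-D feature
        return 1 if x > 5.0 else 0           # 5.0 is an arbitrary illustrative threshold

    def zero_one_loss(y_pred, y_true):
        # L[g(x), y] = 1 if g(x) != y, 0 if g(x) = y
        return int(y_pred != y_true)

    def squared_error(y_pred, y_true):
        # L[g(x), y] = (y - g(x))^2, for the regression case
        return (y_true - y_pred) ** 2

    x_vals = np.array([2.0, 6.5, 4.9, 7.3])   # toy observations
    y_vals = np.array([0, 1, 1, 1])           # toy true states of the world

    losses = [zero_one_loss(g(x), y) for x, y in zip(x_vals, y_vals)]
    print("0-1 losses:", losses, "average:", np.mean(losses))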

Tools for solving the BDT problem. Probabilistic representations: joint distribution, class-conditional distributions, class probabilities. Properties of probabilistic inference: chain rule of probability, marginalization, independence, Bayes rule.

Tools for solving the BDT problem. In order to find the optimal decision function we need a probabilistic description of the problem. In its most general form this is the joint distribution P_{X,Y}(x, i), but we frequently decompose it into a combination of two terms, P_{X,Y}(x, i) = P_{X|Y}(x|i) P_Y(i). These are the class-conditional distribution and the class probability. Class probability: the prior probability of state i, before the observer actually measures anything; it reflects a prior belief that, if all else is equal, the world will be in state i with probability P_Y(i).

Tools for solving the BDT problem. Class-conditional distribution: the model for the observations given the class or state of the world. Consider the grading example. I know, from experience, that a% of the students will get A's, b% B's, c% C's, and so forth; hence, for any student, P_Y(A) = a/100, P_Y(B) = b/100, etc. These are the state probabilities, before I get to see any of the student's work. The class-conditional densities are the models for the grades themselves; let's assume that the grades are Gaussian, i.e. they are completely characterized by a mean and a variance.

Tools for solving the BDT problem. Knowledge of the class changes the mean grade, e.g. I expect A students to have an average HW grade of 90%, B students 75%, C students 60%, etc. This means that P_{X|Y}(x|i) = G(x, µ_i, σ), i.e. the distribution of class i is a Gaussian of mean µ_i and variance σ. Note that the decomposition P_{X,Y}(x, i) = P_{X|Y}(x|i) P_Y(i) is a special case of a very powerful tool in Bayesian inference.
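
A minimal sketch of this decomposition for the grading example, assuming invented class probabilities and a common spread (the 90%/75%/60% means above are used where given, the remaining numbers are made up); it evaluates P_{X,Y}(x, i) = P_{X|Y}(x|i) P_Y(i) using scipy.stats.norm for the Gaussian density.

    import numpy as np
    from scipy.stats import norm

    priors = {"A": 0.20, "B": 0.35, "C": 0.30, "D": 0.10, "F": 0.05}   # assumed a/100, b/100, ...
    means  = {"A": 90.0, "B": 75.0, "C": 60.0, "D": 50.0, "F": 35.0}   # mean HW grade per class
    sigma  = 8.0                                                       # assumed common spread

    def joint(x, i):
        # P_{X,Y}(x, i) = P_{X|Y}(x|i) * P_Y(i), with P_{X|Y}(x|i) = G(x, mu_i, sigma)
        return norm.pdf(x, loc=means[i], scale=sigma) * priors[i]

    x = 82.0   # an observed homework average
    print({i: round(joint(x, i), 4) for i in priors})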

Tools for solving the BDT problem. Probabilistic representations: joint distribution, class-conditional distributions, class probabilities. Properties of probabilistic inference: chain rule of probability, marginalization, independence, Bayes rule.

The chain rule of probability is an important consequence of the definition of conditional probability. Note that, by recursive application of P_{X,Y}(x, y) = P_{X|Y}(x|y) P_Y(y), we can write P_{X_1,...,X_n}(x_1, ..., x_n) = P_{X_1|X_2,...,X_n}(x_1|x_2, ..., x_n) P_{X_2|X_3,...,X_n}(x_2|x_3, ..., x_n) ... P_{X_{n-1}|X_n}(x_{n-1}|x_n) P_{X_n}(x_n). This is called the chain rule of probability; it allows us to modularize inference problems.
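
A quick numerical check of the chain rule on a small discrete joint (a sketch; the random table below is arbitrary): factoring P_{X_1,X_2,X_3} into conditionals and multiplying them back recovers the joint.

    import numpy as np

    rng = np.random.default_rng(0)
    P = rng.random((2, 2, 2))
    P /= P.sum()                       # an arbitrary valid joint P_{X1,X2,X3}, axes = (x1, x2, x3)

    P3    = P.sum(axis=(0, 1))         # P_{X3}(x3)
    P23   = P.sum(axis=0)              # P_{X2,X3}(x2, x3)
    P2_3  = P23 / P3                   # P_{X2|X3}(x2|x3)
    P1_23 = P / P23                    # P_{X1|X2,X3}(x1|x2, x3)

    # chain rule: P(x1, x2, x3) = P(x1|x2, x3) P(x2|x3) P(x3)
    print(np.allclose(P1_23 * P2_3 * P3, P))   # True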

The chain rule of probability. E.g., in the medical diagnosis scenario, what is the probability that you will be sick and have 104º of fever? P_{Y,X_1}(sick, 104) = P_{Y|X_1}(sick|104) P_{X_1}(104). This breaks down a hard question (probability of sick and 104) into two easier questions. P_{Y|X_1}(sick|104): everyone knows that this is close to one, P_{Y|X_1}(sick|104) ≈ 1. You have a cold!

The chain rule of probability. E.g., what is the probability that you will be sick and have 104º of fever? P_{Y,X_1}(sick, 104) = P_{Y|X_1}(sick|104) P_{X_1}(104). P_{X_1}(104): still hard, but easier than P_{Y,X_1}(sick, 104), since we now only have one random variable. The temperature does not depend on sickness; it is just the question "what is the probability that someone will have 104º?". Gather a number of people, measure their temperatures, and make a histogram that everyone can use after that.

Tools for solving the BDT problem. Probabilistic representations: joint distribution, class-conditional distributions, class probabilities. Properties of probabilistic inference: chain rule of probability, marginalization, independence, Bayes rule.

Tools for solving BDT problems. Frequently we have problems with multiple random variables; e.g. at the doctor, you are mostly a collection of random variables: X_1: temperature, X_2: blood pressure, X_3: weight, X_4: cough. We can summarize this as a vector X = (X_1, ..., X_n) of n random variables. P_X(x_1, ..., x_n) is the joint probability distribution, but frequently we only care about a subset of X.

Marginalization. What if I only want to know if the patient has a cold or not? This does not depend on blood pressure and weight; all that matters are fever and cough. That is, we need to know P_{X_1,X_4}(a, b) (cold?). We marginalize with respect to a subset of variables (in this case X_1 and X_4); this is done by summing (or integrating) the others out: P_{X_1,X_4}(x_1, x_4) = ∫∫ P_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4) dx_2 dx_3.
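
A small sketch of "summing the others out" on a discrete joint (the table and its sizes are arbitrary assumptions): summing over the axes for X_2 and X_3 leaves P_{X_1,X_4}.

    import numpy as np

    rng = np.random.default_rng(1)
    P = rng.random((2, 3, 3, 2))       # arbitrary joint P_{X1,X2,X3,X4}, axes = (x1, x2, x3, x4)
    P /= P.sum()

    P14 = P.sum(axis=(1, 2))           # marginalize over X2 (blood pressure) and X3 (weight)
    print(P14.shape, P14.sum())        # (2, 2) 1.0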

Marginalization. This is an important equation: it seems trivial, but for large models it is a major computational asset for probabilistic inference. For any question, there are lots of variables which are irrelevant, and direct evaluation is frequently intractable. Typically, we combine it with the chain rule to exploit independence relationships that allow us to reduce computation. Independence: X and Y are independent random variables if P_{X|Y}(x|y) = P_X(x).
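
A minimal check of this definition, with assumed marginals: for a joint built as an outer product of marginals, conditioning on Y does not change the distribution of X.

    import numpy as np

    PX = np.array([0.3, 0.7])               # assumed marginal P_X
    PY = np.array([0.6, 0.4])               # assumed marginal P_Y
    PXY = np.outer(PX, PY)                  # a joint constructed to make X and Y independent

    PX_given_Y = PXY / PXY.sum(axis=0)      # P_{X|Y}(x|y); columns are indexed by y
    print(np.allclose(PX_given_Y, PX[:, None]))   # True: P_{X|Y}(x|y) = P_X(x)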

Tools for solving the BDT problem. Probabilistic representations: joint distribution, class-conditional distributions, class probabilities. Properties of probabilistic inference: chain rule of probability, marginalization, independence, Bayes rule.

Independence is very useful in the design of intelligent systems. Frequently, knowing X makes Y independent of Z. E.g., consider the shivering symptom: if you have a temperature you sometimes shiver; it is a symptom of having a cold. But once you measure the temperature, the two become independent: P_{Y,X_1,S}(sick, 98, shiver) = P_{Y|X_1,S}(sick|98, shiver) P_{S|X_1}(shiver|98) P_{X_1}(98) = P_{Y|X_1}(sick|98) P_{S|X_1}(shiver|98) P_{X_1}(98). This simplifies considerably the estimation of the probabilities.
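
A sketch of this factorization with made-up tables (the temperature is discretized to normal/high and all numbers are assumptions): the joint over (Y, S, X_1) is assembled from P_{Y|X_1}, P_{S|X_1} and P_{X_1}.

    import numpy as np

    PX1 = np.array([0.7, 0.3])                  # P_{X1}: temp = normal, high (assumed)
    PY_given_X1 = np.array([[0.9, 0.2],         # rows: not sick / sick, columns: temp
                            [0.1, 0.8]])
    PS_given_X1 = np.array([[0.95, 0.4],        # rows: no shiver / shiver, columns: temp
                            [0.05, 0.6]])

    # conditional independence of Y and S given X1:
    # P_{Y,S,X1}(y, s, x) = P_{Y|X1}(y|x) P_{S|X1}(s|x) P_{X1}(x)
    joint = PY_given_X1[:, None, :] * PS_given_X1[None, :, :] * PX1[None, None, :]
    print(joint.shape, joint.sum())             # (2, 2, 2) 1.0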

Independence, combined with marginalization, enables efficient computation. E.g., to compute P_Y(sick): (1) marginalization: P_Y(sick) = Σ_s ∫ P_{Y,X_1,S}(sick, x, s) dx; (2) chain rule: = Σ_s ∫ P_{Y|X_1,S}(sick|x, s) P_{S|X_1}(s|x) P_{X_1}(x) dx; (3) independence: = ∫ P_{Y|X_1}(sick|x) P_{X_1}(x) [Σ_s P_{S|X_1}(s|x)] dx, and the bracketed sum equals 1. Dividing and grouping terms (divide and conquer) makes the integral simpler.
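
Continuing the same toy model (numbers assumed, temperature discretized so the integral becomes a sum), the divide-and-conquer computation of P_Y(sick) reduces to a single weighted sum because Σ_s P_{S|X_1}(s|x) = 1.

    import numpy as np

    PX1 = np.array([0.7, 0.3])                  # P_{X1}(x): temp = normal, high (assumed)
    P_sick_given_X1 = np.array([0.1, 0.8])      # P_{Y|X1}(sick|x) (assumed)

    # P_Y(sick) = sum_x P_{Y|X1}(sick|x) P_{X1}(x); the sum over s has already dropped out
    P_sick = float(np.sum(P_sick_given_X1 * PX1))
    print(P_sick)                               # 0.31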

Tools for solving the BDT problem. Probabilistic representations: joint distribution, class-conditional distributions, class probabilities. Properties of probabilistic inference: chain rule of probability, marginalization, independence, Bayes rule.

Tools for solving BDT problems. Bayes rule, P_{Y|X}(y|x) = P_{X|Y}(x|y) P_Y(y) / P_X(x), is the central equation of Bayesian inference. It allows us to switch the relation between the variables, which is extremely useful, e.g. for medical diagnosis. The doctor needs to know P_{Y|X}(disease y | symptom x); this is very complicated because it is not causal: we are asking for the probability of cause given consequence.

Tools for solving BDT problems. Bayes rule transforms it into the probability of consequence given cause (and some other stuff): P_{Y|X}(disease y | symptom x) = P_{X|Y}(symptom x | disease y) P_Y(disease y) / P_X(symptom x). Note that P_{X|Y}(symptom x | disease y) is easy: you can get it out of any medical textbook. What about the other stuff? P_Y(disease y) does not depend on the patient; you can get it by collecting statistics over the entire population. P_X(symptom x) is a combination of the two (marginalization): P_X(symptom x) = Σ_y P_{X|Y}(symptom x | disease y) P_Y(disease y).
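
A minimal sketch of this computation with invented numbers: a "textbook" likelihood P_{X|Y}(symptom|disease), a population prior P_Y(disease), and the evidence obtained by marginalization.

    import numpy as np

    diseases = ["disease A", "disease B", "disease M"]
    prior = np.array([0.01, 0.10, 0.89])          # P_Y(y), population statistics (assumed)
    likelihood = np.array([0.90, 0.30, 0.05])     # P_{X|Y}(symptom x | y), textbook values (assumed)

    evidence = np.sum(likelihood * prior)         # P_X(symptom x), by marginalization
    posterior = likelihood * prior / evidence     # Bayes rule: P_{Y|X}(y | symptom x)
    print(dict(zip(diseases, posterior.round(3))))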

Bayes rule allows us to combine textbook knowledge with prior knowledge to compute the probability of cause given consequence. E.g., if you heard on the radio that there is an outbreak of measles, you increase the prior probability for the measles disease (cause), P_Y(measles). Since the relation between cause and consequence, P_{X|Y}(patient symptoms | measles), does not change, Bayes rule will give you the updated P_{Y|X}(measles | patient symptoms) that accounts for the new information. This is hard if you work directly with the posterior probability.
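
A small sketch of this update for a two-hypothesis version of the example (measles vs. everything else), with hypothetical likelihoods: only the prior changes, and Bayes rule propagates it to the posterior.

    def posterior_measles(prior_measles, lik_measles=0.8, lik_other=0.05):
        # P_{Y|X}(measles | symptoms), with P_{X|Y}(symptoms | measles) held fixed;
        # the likelihood values are hypothetical
        evidence = lik_measles * prior_measles + lik_other * (1.0 - prior_measles)
        return lik_measles * prior_measles / evidence

    print(posterior_measles(0.001))   # posterior with the usual, low prior
    print(posterior_measles(0.05))    # posterior after the outbreak raises the prior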

Tools for solving the BDT problem. Probabilistic representations: joint distribution, class-conditional distributions, class probabilities. Properties of probabilistic inference: chain rule of probability, marginalization, independence, Bayes rule. We are now ready to make optimal decisions!
