A.I. in health informatics, lecture 2: clinical reasoning & probabilistic inference, I (Kevin Small & Byron Wallace)

today: a review of probability (random variables, maximum likelihood, etc.), crucial for clinical reasoning and AI, and also for AI clinical-reasoning systems; an intro to (review of?) 2x2 tables, basic probability, counting, etc.

probability (& statistics): quantify uncertainty, both from measurement and sampling errors. how likely is it that an event e will occur? at the end of the day: counting. let's consider an example

what are the chances? p(X) denotes the probability of event X occurring. usually interpreted as: the fraction of outcomes (out of all possible outcomes) in which X occurs. philosophically loaded, but that's not what this class is about

probability: some basics. sum rule: p(X) = Σ_Y p(X, Y). product rule: p(X, Y) = p(Y|X) p(X), where p(Y|X) is the probability of Y given X. Bayes' theorem: p(Y|X) = p(X|Y) p(Y) / p(X) - more on this later!
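As a quick sanity check (a minimal sketch over a small, made-up joint distribution of two binary events; none of these numbers come from the lecture), the sum rule, product rule, and Bayes' theorem can all be verified by counting:

```python
# A tiny, made-up joint distribution p(X, Y) over two binary events.
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.40,
}

# sum rule: p(X) = sum over Y of p(X, Y)  (and likewise for p(Y))
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

# product rule: p(X, Y) = p(Y | X) p(X), i.e. p(Y | X) = p(X, Y) / p(X)
p_y_given_x = {(y, x): joint[(x, y)] / p_x[x] for x in (0, 1) for y in (0, 1)}
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for x in (0, 1) for y in (0, 1)}

# Bayes' theorem: p(Y | X) = p(X | Y) p(Y) / p(X)
for x in (0, 1):
    for y in (0, 1):
        assert abs(p_y_given_x[(y, x)] - p_x_given_y[(x, y)] * p_y[y] / p_x[x]) < 1e-12

print(p_x, p_y_given_x[(1, 1)])
```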

random variables. random variables follow some latent distribution; we usually assume we know its form, but not its parameters. this is the generative story

expectations. E[X] denotes the expectation of a random variable X: it is the average of the values it can take, weighted by their respective probabilities of occurring. E[X] = Σ_x p(x) x (discrete, finite case); E[X] = ∫ p(x) x dx (continuous case)

expectations. expectations are vital for formal reasoning. should I play the lottery? - perhaps, if the expected value of a lottery ticket is > 0 - E[X] = (-$1)*(0.9999) + ... + (+$10,000)*(0.0001) - probably not
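A minimal sketch of the lottery expectation (the prize structure here is made up for illustration; the slide's "..." elides the smaller prizes of a real lottery):

```python
# Expected value of a hypothetical $1 lottery ticket.
outcomes = [(-1.0, 0.9999),       # lose the $1 stake
            (10_000.0, 0.0001)]   # win the jackpot
assert abs(sum(p for _, p in outcomes) - 1.0) < 1e-12

expected_value = sum(value * p for value, p in outcomes)
print(expected_value)  # close to zero here; real lotteries are typically negative
```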

probability: balls & bins. suppose we are (inexplicably!) given a jar filled with red and blue balls. we draw N balls*, of which m are blue. what proportion of the balls in the jar are blue? (* with replacement)

probability: balls & bins. m/N (the observed proportion) is the best (maximum likelihood) estimate p̂ of p - ML estimates usually wear hats. wait, why is this the best?

maximum likelihood. the maximum likelihood approach: given observed data D, pick the parameters w that maximize the probability of D, p(D|w)

balls & bins, ML style. let p denote the probability of drawing a blue ball (equivalently, the proportion of blue balls). the probability of a draw being blue = p; red = 1 - p. the probability of observing a particular sequence of N draws containing m blue balls: p^m (1 - p)^(N - m)

balls & bins, ML style. thus the likelihood function L is p^m (1 - p)^(N - m). if you take the derivative w.r.t. p and set it to zero (maximizing the likelihood function), you get p = m/N. note that this ignores prior/external information!
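A numerical sanity check (a brute-force grid search over candidate values of p, with made-up counts m = 3 and N = 10) that the likelihood p^m (1 - p)^(N - m) really is maximized at p = m/N:

```python
# Confirm that L(p) = p^m (1 - p)^(N - m) peaks at p = m / N.
m, N = 3, 10

def likelihood(p):
    return p ** m * (1 - p) ** (N - m)

grid = [i / 10_000 for i in range(10_001)]   # candidate values of p in [0, 1]
p_hat = max(grid, key=likelihood)
print(p_hat, m / N)   # both ~0.3
```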

a word on Bayes. suppose we flip a coin that we believe to be fair three times, and it comes up heads each time. the ML estimate will imply that the coin will always land on heads, but incorporating prior beliefs can avoid this. the Bayesian way: posterior ∝ likelihood × prior (this example is in Bishop, 2006)
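A minimal sketch of the Bayesian fix, assuming a Beta prior on p(heads) (the Beta(2, 2) pseudo-counts below are a choice made here for illustration, not from the lecture); the Beta prior is conjugate to the Bernoulli likelihood, so the posterior is again a Beta whose mean stays away from 1:

```python
# Three heads in three flips: the ML estimate of p(heads) is 1.0.
heads, tails = 3, 0
a, b = 2, 2   # Beta(a, b) prior pseudo-counts; an assumption, not from the slides

ml_estimate = heads / (heads + tails)                              # 1.0
posterior_mean = (a + heads) / (a + b + heads + tails)             # ~0.71
posterior_mode = (a + heads - 1) / (a + b + heads + tails - 2)     # MAP estimate, 0.8

print(ml_estimate, posterior_mean, posterior_mode)
```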

the Gaussian. the Bernoulli (for binary events) is what we used in the balls & jar example. the Gaussian (a/k/a normal) is the distribution parameterized by μ and σ². people typically just assume everything is normal, sometimes wrongly

many other distributions! Poisson: p(# of events occurring in an interval); Beta: often used as a prior in Bayesian analysis; Multinomial: a generalization of the binomial (which we saw before)

the 2x2 table (confusingly called the confusion matrix):
                        reference standard ("truth")
                        positive          negative
  test positive         true positive     false positive
  test negative         false negative    true negative

common metrics (computed from the 2x2 table above): sensitivity (a/k/a recall) = tp / (tp + fn); specificity = tn / (tn + fp); precision = tp / (tp + fp); accuracy = (tp + tn) / (tp + fp + tn + fn). people misinterpret these all the time! see homework
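A minimal sketch computing the four metrics from made-up 2x2 counts:

```python
# Sensitivity/recall, specificity, precision, and accuracy from the 2x2 table.
tp, fp, fn, tn = 80, 10, 20, 890   # made-up counts

sensitivity = tp / (tp + fn)              # a/k/a recall
specificity = tn / (tn + fp)
precision   = tp / (tp + fp)
accuracy    = (tp + tn) / (tp + fp + tn + fn)

print(sensitivity, specificity, precision, accuracy)
```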

choosing between tests. recall: sensitivity = tp / (tp + fn); specificity = tn / (tn + fp). but these depend on a cut-off! how can we assess test performance independent of this threshold?

ROC space. ROC = Receiver Operating Characteristic curve. ROC space is 2-d: sensitivity (y-axis) vs. 1 - specificity (x-axis). ideal tests occupy the upper-left region

ROC space [figure: example ROC curves; images from http://www.medcalc.org/manual/roc-curves.php]

AUC: the area under the ROC curve - a single-number summary of test performance across all thresholds
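A minimal sketch (with made-up scores and labels) of how ROC points arise from sweeping the decision threshold, and how AUC is the area under the resulting curve (trapezoidal rule):

```python
# ROC curve and AUC by sweeping the decision threshold over predicted scores.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
labels = [1,    1,   0,   1,   1,   0,   1,    0,   0,   0]

pos = sum(labels)
neg = len(labels) - pos

# one ROC point (FPR = 1 - specificity, TPR = sensitivity) per threshold
points = []
for thr in sorted(set(scores) | {float("inf")}, reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
    points.append((fp / neg, tp / pos))
points.sort()

# AUC via the trapezoidal rule
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(auc)
```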

uncommon metrics (from the same 2x2 table): negative predictive value (npv) = tn / (tn + fn); likelihood ratio (+) = sensitivity / (1 - specificity); likelihood ratio (-) = (1 - sensitivity) / specificity
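The same made-up counts as above give the "uncommon" metrics (a sketch; likelihood ratios as reconstructed above):

```python
# Negative predictive value and likelihood ratios from made-up 2x2 counts.
tp, fp, fn, tn = 80, 10, 20, 890

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

npv      = tn / (tn + fn)
lr_plus  = sensitivity / (1 - specificity)
lr_minus = (1 - sensitivity) / specificity

print(npv, lr_plus, lr_minus)
```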

humans are not good at probability. people are [Tversky & Kahneman]: insensitive to priors - in p(y|x) ∝ p(x|y) p(y), the prior p(y) is the term that gets forgotten ("whoops, forgot about this guy"); insensitive to sample size - the probability that the average in a sample is more extreme than the population average increases with small samples; prone to the gambler's fallacy

misleading intuitions:
                            true HIV status
                            positive          negative
  HIV test positive         true positive     false positive
  HIV test negative         false negative    true negative
assume the test has 98% sensitivity and 99% specificity - great test, right?

misleading intuitions. HIV prevalence in the USA ≈ 0.3%. suppose someone tests positive; what's the probability that they have HIV? let X = the event of a positive test result, Y = the event that the patient actually has HIV, ~Y = the event that they do not. P(Y|X) = P(X|Y)P(Y) / [P(X|Y)P(Y) + P(X|~Y)P(~Y)] = (0.98)(0.003) / [(0.98)(0.003) + (0.01)(0.997)] ≈ 0.23

misleading intuitions. so: what is the likelihood that a person with a positive test result has HIV? - 23% - it is more than 3x more likely that they don't have HIV, despite the positive test result! P(~Y|X) = P(X|~Y)P(~Y) / [P(X|~Y)P(~Y) + P(X|Y)P(Y)] = (0.01)(0.997) / [(0.01)(0.997) + (0.98)(0.003)] ≈ 0.77. a classic example of insensitivity to priors!
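A minimal sketch that reproduces the HIV calculation by plugging the stated test characteristics and prevalence into Bayes' rule:

```python
# Bayes' rule for the HIV example: 98% sensitivity, 99% specificity, ~0.3% prevalence.
sensitivity = 0.98    # P(test + | HIV)
specificity = 0.99    # P(test - | no HIV)
prevalence  = 0.003   # P(HIV)

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_hiv_given_positive = sensitivity * prevalence / p_positive

print(p_hiv_given_positive, 1 - p_hiv_given_positive)   # ~0.23 and ~0.77
```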

automatic diagnosis. assume we have a database of diseases, their prevalences, and test characteristics. a really naïve automated diagnosis system: diagnose(test result X): return argmax_Y P(X|Y) P(Y) // over all diseases Y. this is how the Leeds abdominal pain system worked!
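A sketch of the naïve diagnose-by-argmax rule; the disease "database" below (names, prevalences, test characteristics) is entirely made up for illustration and is not the Leeds system:

```python
# Naive diagnosis: return argmax over diseases Y of P(X | Y) P(Y).
diseases = {
    # name: (prevalence P(Y), P(positive test X | Y)) -- made-up numbers
    "appendicitis":      (0.010, 0.85),
    "gastroenteritis":   (0.050, 0.30),
    "non-specific pain": (0.200, 0.10),
}

def diagnose(test_positive=True):
    def score(name):
        prior, p_pos = diseases[name]
        p_x_given_y = p_pos if test_positive else 1 - p_pos
        return p_x_given_y * prior
    return max(diseases, key=score)

print(diagnose(test_positive=True))
```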

a calculus for decisions: we need to integrate utilities and costs into a probabilistic framework. next up: decision theory

expected-value decision making [decision-tree figure: for a patient with disease D, a decision node branches to chance nodes with expected outcomes A and B]. the outcomes are stochastic, so we depend on the expected outcome

expectations (review). E[X] denotes the expectation of a random variable X: it is the average of the values it can take, weighted by their respective probabilities of occurring. E[X] = Σ_x p(x) x (discrete, finite case); E[X] = ∫ p(x) x dx (continuous case)

decision making made easy 1. calculate E[x] over all decision alternatives x 2. select the decision x* that maximizes E[x]
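A minimal sketch of the two-step recipe over made-up decision alternatives, each leading to a chance node with (probability, outcome value) pairs:

```python
# Expected-value decision making: compute E[x] per alternative, pick the max.
alternatives = {
    "treat":        [(0.7, 0.9), (0.3, 0.4)],   # made-up outcomes
    "watch & wait": [(0.5, 1.0), (0.5, 0.3)],
}

def expected_value(outcomes):
    return sum(p * v for p, v in outcomes)

best = max(alternatives, key=lambda d: expected_value(alternatives[d]))
print({d: expected_value(o) for d, o in alternatives.items()}, best)
```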

but creating the decision tree is non-trivial - we'll come back to this. we've not taken into account the variance of the expectation, nor the values of outcomes, nor costs

utilities encode arbitrary (potentially subjective) pay-offs - by convention, between 0 and 1 - perfect health = 1; death = 0 where do utilities come from?

the standard gamble [von Neumann & Morgenstern]. the (morbid) gamble: we flip a coin with bias p; tails implies death (0), heads implies perfect health (1). let c denote some relevant clinical state (e.g., paralysis). the utility of c is the value p* at which the patient has no preference between: 1) c (being paralyzed), and 2) taking the gamble

the standard gamble [von Neumann & Morgenstern] [figure: the certain clinical state c vs. the gamble whose branches lead to death and perfect health]. at what value of p is the patient indifferent to which path is taken?

time trade-off technique [Sox et al.]: ask the patient what length of life t in perfect health they consider equivalent to a given length of life t_c with clinical state c - e.g., "I find 6 months of life with full mobility to be equivalent to 1 year of life paralyzed". utility = t / t_c - above: 6/12 = 0.5
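A one-line sketch of the time trade-off utility as stated above (t = equivalent time in perfect health, t_c = time in state c):

```python
# Time trade-off: t years in perfect health judged equivalent to t_c years in state c.
def tto_utility(t_perfect_health, t_with_condition):
    return t_perfect_health / t_with_condition

print(tto_utility(6, 12))   # the paralysis example: 6/12 = 0.5
```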

utility-maximization: associate chance nodes with utilities. select-treatment(): return argmax_T ExpectedUtility[T]. expected utility is defined recursively down the branches (until terminal nodes are reached)
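A minimal sketch of recursive expected utility over a toy decision tree (leaves carry utilities in [0, 1]; chance nodes are lists of (probability, subtree) pairs; the treatments and numbers are made up for illustration):

```python
# Recursive expected utility down the branches of a decision tree.
def expected_utility(node):
    if isinstance(node, (int, float)):   # terminal node: a utility in [0, 1]
        return node
    return sum(p * expected_utility(child) for p, child in node)   # chance node

treatments = {
    "surgery":    [(0.9, 1.0), (0.1, 0.0)],                         # cure vs. death
    "medication": [(0.6, 1.0), (0.4, [(0.5, 0.7), (0.5, 0.3)])],    # nested chance node
}

def select_treatment():
    return max(treatments, key=lambda t: expected_utility(treatments[t]))

print({t: expected_utility(n) for t, n in treatments.items()}, select_treatment())
```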