A.I. in health informatics lecture 3 clinical reasoning & probabilistic inference, II *

A.I. in health informatics lecture 3 clinical reasoning & probabilistic inference, II * kevin small & byron wallace * Slides borrow heavily from Andrew Moore, Weng-Keen Wong and Longin Jan Latecki

today: probabilistic reasoning with Bayesian networks. reasoning with uncertainty is a crucial building block for automated clinical reasoning systems. review of conditional independence and (a little) graph theory.

introduction: diagnosing inhalational anthrax. we observe the following symptoms: patient has difficulty breathing, patient has a cough, patient has a fever, patient has diarrhea, patient has an inflamed mediastinum.

introduction: diagnoses are often stated in probabilities (e.g. 30% chance of inhalational anthrax). additional evidence should change your degree of belief in the diagnosis. how much evidence until absolutely certain? Bayesian networks are a methodology for reasoning with uncertainty.

review: random variables. the basic element of probabilistic reasoning; a random variable refers to an event and is drawn from a distribution modeling the uncertain outcome of that event.

Boolean random variables take the values true or false, and can be thought of as "event occurred" or "event didn't occur". examples (with notation): patient has inhalational anthrax (A), patient has difficulty breathing (B), patient has a cough (C), patient has a fever (F), patient has diarrhea (D), patient has an inflamed mediastinum (M).

joint probability distribution: expresses the probability over an arbitrary number of variables; for each combination of values, it states how probable that combination is. the entries must sum to 1.

A      D      M      P(A,D,M)
false  false  false  0.65
false  false  true   0.03
false  true   false  0.10
false  true   true   0.04
true   false  false  0.02
true   false  true   0.06
true   true   false  0.03
true   true   true   0.07

reasoning with the joint: with the joint, you can compute anything, though you may need marginalization and/or Bayes rule to do so. using the joint table above:

p(d) = p(a, d, m) + p(a, d, ¬m) + p(¬a, d, m) + p(¬a, d, ¬m) = 0.15
p(a, m | d) = p(a, m, d) / p(d) = 0.467
p(a | m, d) = p(a, m, d) / p(m, d) = 0.636
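As an illustrative aside (the function and variable names below are mine, not the lecture's), queries like the ones above can be read off a joint table mechanically: marginalization is a sum over every assignment consistent with the specified values, and conditioning is a ratio of two such marginals.

```python
# Joint table from the slide, keyed by (A, D, M) truth assignments.
joint_adm = {
    (False, False, False): 0.65, (False, False, True): 0.03,
    (False, True,  False): 0.10, (False, True,  True): 0.04,
    (True,  False, False): 0.02, (True,  False, True): 0.06,
    (True,  True,  False): 0.03, (True,  True,  True): 0.07,
}

def marginal(a=None, d=None, m=None):
    """Sum the joint over every assignment consistent with the specified values."""
    return sum(p for (A, D, M), p in joint_adm.items()
               if (a is None or A == a) and (d is None or D == d) and (m is None or M == m))

p_d = marginal(d=True)                                    # marginalization: sum out A and M
p_am_given_d = marginal(a=True, d=True, m=True) / p_d     # conditioning: ratio of marginals
p_a_given_md = marginal(a=True, d=True, m=True) / marginal(d=True, m=True)
print(p_d, p_am_given_d, p_a_given_md)
```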

problems with the joint: it is not a compact representation; it requires 2^n − 1 parameters to express and a lot of data to estimate accurately. (conditional) independence to the rescue!

independence: random variables A and B are independent if p(a, b) = p(a) p(b), or equivalently p(a | b) = p(a) and p(b | a) = p(b). knowledge regarding the outcome of A provides no additional information about the outcome of B.

independence allows a compact representation: suppose n coin flips; the joint requires 2^n − 1 parameters, but if the flips are independent, only n parameters are needed.
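A quick illustrative aside (not from the slides) tabulating the two parameter counts for a few values of n:

```python
# Full joint over n Boolean variables vs. n independent Boolean variables.
for n in (3, 6, 10, 20):
    print(f"n={n:2d}: full joint needs {2**n - 1:>9,} parameters, independent model needs {n}")
```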

conditional independence: random variables A and B are conditionally independent given C if p(a, b | c) = p(a | c) p(b | c), or equivalently p(a | b, c) = p(a | c) and p(b | a, c) = p(b | c). once the outcome of C is known, knowledge regarding the outcome of A provides no additional information about the outcome of B.

Bayesian networks (finally!): a Bayesian network G = (V, E) is composed of a directed acyclic graph and a set of conditional probability tables (CPTs). example network: A → B, B → C, B → D, with CPTs:

A      P(A)
false  0.6
true   0.4

A      B      P(B|A)
false  false  0.01
false  true   0.99
true   false  0.7
true   true   0.3

B      C      P(C|B)
false  false  0.4
false  true   0.6
true   false  0.9
true   true   0.1

B      D      P(D|B)
false  false  0.02
false  true   0.98
true   false  0.05
true   true   0.95

semantics of structure: each vertex is a random variable. an edge points from parent to child: B is a parent of D, so D is conditioned on B. each vertex X_i has a CPT p(X_i | Parents(X_i)). (the figure repeats the CPTs P(A) and P(D | B) from the example above.)

conditional probability tables: a Boolean variable with n parents has 2^(n+1) CPT entries, of which 2^n must be stored. note what must sum to 1: for each setting of the parents, the entries over the child's values sum to 1. for example, B with the single parent A has the table P(B | A) above, while B with two parents A and E has:

A      B      E      P(B|A,E)
false  false  false  0.2
false  false  true   0.1
false  true   false  0.8
false  true   true   0.9
true   false  false  0.25
true   false  true   0.98
true   true   false  0.75
true   true   true   0.02

utility of Bayes nets: two important properties. it encodes conditional independence relationships between random variables in the graph, and it gives a compact representation of the joint. given its parents (P1, P2), a vertex X is conditionally independent of its non-descendants (ND1, ND2). (the figure shows X with parents P1 and P2, non-descendants ND1 and ND2, and children C1 and C2.)

calculating the joint: the joint can be computed using the Markov condition:

p(X_1 = x_1, ..., X_n = x_n) = ∏_{i=1}^{n} p(X_i = x_i | Parents(X_i))

for the example network A → B → {C, D}:

p(a, b, ¬c, d) = p(a) p(b | a) p(¬c | b) p(d | b) = 0.4 × 0.3 × 0.9 × 0.95 = 0.1026
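A minimal sketch of this factorization in code, using the CPT values of the example network above (function and variable names are illustrative, not from the lecture):

```python
# CPT values from the example network A -> B, B -> C, B -> D.
p_a_true = 0.4                                  # P(A = true)
p_b_true_given_a = {False: 0.99, True: 0.3}     # P(B = true | A)
p_c_true_given_b = {False: 0.6,  True: 0.1}     # P(C = true | B)
p_d_true_given_b = {False: 0.98, True: 0.95}    # P(D = true | B)

def bern(p_true, value):
    """P(X = value) for a Boolean variable X with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d):
    """Markov condition: the joint is the product of each vertex given its parents."""
    return (bern(p_a_true, a) * bern(p_b_true_given_a[a], b)
            * bern(p_c_true_given_b[b], c) * bern(p_d_true_given_b[b], d))

print(joint(True, True, False, True))   # 0.4 * 0.3 * 0.9 * 0.95 = 0.1026, matching the slide
```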

inference: computing probabilities specified by the model. queries are generally of the form p(X | E), where X is the query variable(s) and E is the evidence variable(s).

inference: computing probabilities specified by the model. let's try p(C | A) on the example network above. to the board!
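As a sketch of how the board calculation can be checked (my code, not the lecture's), exact inference by enumeration reuses the joint() function from the previous snippet: fix the evidence A = true, sum out the hidden variables B and D, then normalize over C.

```python
from itertools import product

def posterior_c_given_a(a_value=True):
    """p(C | A = a_value) by enumeration: sum out B and D, then normalize over C."""
    unnorm = {c: sum(joint(a_value, b, c, d) for b, d in product((True, False), repeat=2))
              for c in (True, False)}
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}

print(posterior_c_given_a(True))   # P(C=true | A=true) = 0.3*0.1 + 0.7*0.6 = 0.45 with these CPTs
```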

bad news: exact inference is feasible only in small to medium sized networks; exact inference in larger networks takes a long time. approximate inference can be used instead.

network structure: use domain expert knowledge to design it, or learn it from data (not trivial). the good news is that clinical expertise is high.

naïve Bayes: another option is to make strong (conditional) independence assumptions, which is often effective for classification models. for the anthrax example, the class A is the single parent of the features B, C, D, F, and M.

Bayes revisited: posterior = (prior × likelihood) / evidence:

p(a | b, c, d, f, m) = p(a) p(b, c, d, f, m | a) / p(b, c, d, f, m)

conditional independence: assume the input variables are conditionally independent given the class:

p(a | X) = p(a) ∏_{i=1}^{n} p(X_i | a) / p(X)

naïve Bayes classification: since p(X) is the same for every outcome of A,

â = argmax_{a' ∈ A} p(A = a') ∏_{i=1}^{n} p(X_i | A = a')

number of parameters: the joint probability distribution over the six Boolean variables needs 2^n − 1 = 63 parameters; naïve Bayes needs (|A| − 1) + |A| ∑_{i=1}^{n} (|X_i| − 1) = 1 + 2·5 = 11 for the five Boolean symptoms. inference runtime is O(n |A|). to estimate the parameters, count (and smooth).
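"count (and smooth)" can be sketched as follows; this is my illustrative code, not the lecture's, and it uses add-one (Laplace) smoothing as one common choice of smoother.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(examples, labels, alpha=1.0):
    """examples: list of dicts {feature_name: value}; labels: parallel list of class labels."""
    label_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature, label) -> Counter over feature values
    feature_values = defaultdict(set)     # feature -> set of values seen anywhere
    for x, y in zip(examples, labels):
        for f, v in x.items():
            value_counts[(f, y)][v] += 1
            feature_values[f].add(v)

    n = len(labels)
    prior = {y: c / n for y, c in label_counts.items()}   # p(A = a'), estimated by counting

    def likelihood(f, v, y):
        """Smoothed estimate of p(X_f = v | A = y)."""
        return (value_counts[(f, y)][v] + alpha) / (label_counts[y] + alpha * len(feature_values[f]))

    return prior, likelihood
```

Prediction then takes, over the classes, the argmax of prior[y] times the product of likelihood(f, v, y) over the observed features, exactly the rule on the previous slide.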

example:

day  outlook   temperature  humidity  wind    tennis
1    sunny     hot          high      weak    no
2    sunny     hot          high      strong  no
3    overcast  hot          high      weak    yes
4    rain      mild         high      weak    yes
5    rain      cool         normal    weak    yes
6    rain      cool         normal    strong  no
7    overcast  cool         normal    strong  yes
8    sunny     mild         high      weak    no
9    sunny     cool         normal    weak    yes
10   rain      mild         normal    weak    yes
11   sunny     mild         normal    strong  yes
12   overcast  mild         high      strong  yes
13   overcast  hot          normal    weak    yes
14   rain      mild         high      strong  no

Given today is sunny, cool but windy with high humidity, will we play tennis? [Mitchell's Machine Learning Book]

example: given today is sunny, cool but windy with high humidity, will we play tennis?

p(T = no | X) ∝ p(T = no) p(O = sunny | T = no) p(M = cool | T = no) p(H = high | T = no) p(W = strong | T = no) = (5/14)(3/5)(1/5)(4/5)(3/5) ≈ 2.1e-2

p(T = yes | X) ∝ p(T = yes) p(O = sunny | T = yes) p(M = cool | T = yes) p(H = high | T = yes) p(W = strong | T = yes) = (9/14)(2/9)(3/9)(3/9)(3/9) ≈ 5.2e-3

since the "no" score is larger, the classifier predicts no tennis.
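For completeness, the two scores can be checked with exact fractions; a small sketch (my code, not the lecture's):

```python
from fractions import Fraction as F

# counts read directly off the 14-day table: 5 "no" days and 9 "yes" days
score_no  = F(5, 14) * F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5)   # prior, sunny, cool, high, strong | no
score_yes = F(9, 14) * F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9)   # the same factors | yes
print(float(score_no), float(score_yes))   # the "no" score is roughly four times larger: predict no tennis
```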

Population-wide ANomaly Detection and Assessment (PANDA): a detector for a large-scale outdoor release of inhalational anthrax. a massive Bayes net; population-wide means each person has their own subnetwork in the model. [Wong et al., KDD 2005]

population-wide approach: anthrax is non-contagious, and this is reflected in the network structure: global vertices for Anthrax Release, Location of Release, and Time of Release feed a separate Person Model subnetwork for each person.

person model: [figure: two example person subnetworks, each connected to the global Anthrax Release, Time of Release, and Location of Release vertices; per-person vertices include Age Decile, Gender, Home Zip, Anthrax Infection, Other ED Disease, Respiratory from Anthrax, Respiratory CC from Other, ED Admit from Anthrax, ED Admit from Other, Respiratory CC, Respiratory CC When Admitted, and ED Admission]

advanced topics: learning network structure (generally a search procedure); Markov networks, which consider undirected edges; influence diagrams, which generalize the network with deterministic vertices; more inference: variable elimination, approximate inference.

more? recommended reading: the current standard bearer, the classic, and a really interesting one.