Bayesian Networks. Distinguished Prof. Dr. Panos M. Pardalos


Distinguished Prof. Dr. Panos M. Pardalos, Center for Applied Optimization, Department of Industrial & Systems Engineering, Computer & Information Science & Engineering Department, Biomedical Engineering Program, McKnight Brain Institute, University of Florida. http://www.ise.ufl.edu/pardalos/

Lecture Outline: Introduction; Probability Space; Conditional Probability; Bayes Rule; Bayesian Approach; Example; Graph Theory Concepts; Definition; Inferencing

Introduction. Bayesian networks are applied in cases of uncertainty, when we know certain [conditional] probabilities and are looking for unknown probabilities given specific conditions. Applications: bioinformatics and medicine, engineering, document classification, image processing, data fusion, decision support systems, etc. Examples: Inference: P(Diagnosis | Symptom). Anomaly detection: is this observation anomalous? Active data collection: what is the next diagnostic test, given a set of observations?

Discrete Random Variables. Let A denote a boolean-valued random variable: A denotes an event, and there is some degree of uncertainty as to whether A occurs. Examples: A = the patient has tuberculosis; A = the outcome of a coin flip is heads; A = France will win the World Cup in 2010.

Intuition Behind Probability. Intuitively, the probability of an event A equals the proportion of outcomes in which A is true. Ω is the set of all possible outcomes; its area is P(Ω) = 1. The set colored in orange corresponds to the outcomes in which A is true, so P(A) = area of the orange oval. Clearly 0 ≤ P(A) ≤ 1.

Kolmogorov's Probability Axioms. "The theory of probability as a mathematical discipline can and should be developed from axioms in exactly the same way as geometry and algebra." Andrey Nikolaevich Kolmogorov, Foundations of the Theory of Probability, 1933.
1. P(A) ≥ 0 for every event A ⊆ Ω
2. P(Ω) = 1
3. σ-additivity: any countable sequence of pairwise disjoint events A_1, A_2, ... satisfies P(∪_i A_i) = Σ_i P(A_i)

Other Ways to Deal with Uncertainty. Three-valued logic: True / False / Maybe. Fuzzy logic (truth values between 0 and 1). Non-monotonic reasoning (especially focused on penguin informatics). Dempster-Shafer theory (and an extension known as quasi-Bayesian theory). Possibilistic logic. But...

Coherence of the Axioms. Kolmogorov's axioms of probability are the only model with this property: wagers (probabilities) are assigned in such a way that, no matter what set of wagers your opponent chooses, you are not exposed to certain loss. Bruno de Finetti, Probabilismo, Napoli, Logos 14, 1931, pp. 163-219. Bruno de Finetti, Probabilism: A Critical Essay on the Theory of Probability and on the Value of Science (translation of the 1931 article), Erkenntnis, volume 31, September 1989, pp. 169-223.

Consequences of the Axioms. P(Ā) = 1 − P(A), where Ā = Ω \ A. P(∅) = 0.

Consequences of the Axioms. P(A ∪ B) = P(A) + P(B) − P(A ∩ B). P(A) = P(A ∩ B) + P(A ∩ B̄).

Conditional Probability. P(B | A) is the proportion of the space in which A is true that also has B true. Formal definition: P(B | A) = P(A ∩ B) / P(A).

Conditional Probability: Example. Let us draw a card from a deck of 52 playing cards. A = the card is a court card: P(A) = 12/52 = 3/13. B = the card is a queen: P(B) = 4/52 = 1/13, and since every queen is a court card, P(A ∩ B) = P(B) = 1/13. Applying the definition, we obtain very intuitive results: P(B | A) = (1/13)/(3/13) = 1/3 and P(A | B) = (1/13)/(1/13) = 1. C = the suit is spades: P(C) = 1/4. Note that P(C | A) = P(C) = 1/4; in other words, event C is independent of A.
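A minimal sketch (not part of the original slides) that enumerates the deck and checks these numbers by counting equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))  # 52 equally likely outcomes

def prob(event):
    """P(event) = fraction of outcomes in which the event holds."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def cond_prob(event_b, event_a):
    """P(B | A) = P(A and B) / P(A)."""
    return prob(lambda c: event_a(c) and event_b(c)) / prob(event_a)

court = lambda c: c[0] in {"J", "Q", "K"}   # event A
queen = lambda c: c[0] == "Q"               # event B
spade = lambda c: c[1] == "spades"          # event C

print(prob(court))              # 3/13
print(cond_prob(queen, court))  # 1/3
print(cond_prob(court, queen))  # 1
print(cond_prob(spade, court))  # 1/4, so C is independent of A
```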

Independent Events. Definition: two events A and B are independent if and only if P(A ∩ B) = P(A)P(B). Let us denote the independence of A and B by I(A, B). The independence of A and B implies P(A | B) = P(A), if P(B) ≠ 0, and P(B | A) = P(B), if P(A) ≠ 0. Why?

Conditional Independence. One might observe that people with longer arms tend to have higher levels of reading skills. If age is fixed, this relationship disappears: arm length and reading skills are conditionally independent given age. Definition: two events A and B are conditionally independent given C if and only if P(A ∩ B | C) = P(A | C)P(B | C). Notation: I(A, B | C). This implies P(A | B, C) = P(A | C), if P(B ∩ C) ≠ 0, and P(B | A, C) = P(B | C), if P(A ∩ C) ≠ 0.

Bayes Rule. The definition of conditional probability, P(A | B) = P(A ∩ B) / P(B), implies the chain rule P(A ∩ B) = P(A | B)P(B). By symmetry, P(A ∩ B) = P(B | A)P(A). Equating the right-hand sides and doing some algebra, we obtain Bayes rule: P(B | A) = P(A | B)P(B) / P(A).

Monty Hall Problem. The treasure is equally likely to be contained in one of the boxes A, B, and C, i.e. P(A) = P(B) = P(C) = 1/3. You are offered the choice of one of them; say you choose box A. The host of the game then opens one of the boxes that you did not choose and that does not contain the treasure.

Monty Hall Problem (continued). Suppose the host has opened box C. You are then offered the option to reconsider your choice. What would you do? In other words, what are the probabilities P(A | N_{A,C}) and P(B | N_{A,C}), where N_{A,C} denotes the event that you chose box A and the host opened box C? What does your intuition advise? Now apply Bayes rule.
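A small Bayes-rule calculation (a sketch, not from the slides) for the case where you picked box A and the host opened box C:

```python
from fractions import Fraction

# Prior: the treasure is equally likely to be in A, B, or C.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# Likelihood P(host opens C | treasure location), given that we picked A
# and the host never opens our box or the box holding the treasure.
likelihood_opens_C = {
    "A": Fraction(1, 2),  # host may open B or C
    "B": Fraction(1),     # host must open C
    "C": Fraction(0),     # host never reveals the treasure
}

# Bayes rule: P(location | host opens C) is proportional to
# P(host opens C | location) * P(location).
unnormalized = {loc: likelihood_opens_C[loc] * prior[loc] for loc in prior}
evidence = sum(unnormalized.values())
posterior = {loc: p / evidence for loc, p in unnormalized.items()}

print(posterior)  # {'A': 1/3, 'B': 2/3, 'C': 0}: switching to B doubles your chances
```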

Classification Based on Bayes Theorem. Let Y denote the class variable; for example, we want to predict whether a borrower will default. Let X = (X_1, X_2, ..., X_k) denote the attribute set (home owner, marital status, annual income, etc.). We can treat X and Y as random variables and determine P(Y | X), the posterior probability. Knowing P(Y | X), we can assign the record X to the class that maximizes the posterior probability. How can we estimate P(Y | X) from training data?

Table: Historical data for default prediction

#    Home Owner (binary)   Marital Status (categorical)   Annual Income (continuous)   Defaulted Borrower (class)
1    Yes                   Single                         125K                         No
2    No                    Married                        100K                         No
3    No                    Single                         70K                          No
4    Yes                   Married                        120K                         No
5    No                    Divorced                       95K                          Yes
6    No                    Married                        60K                          No
7    Yes                   Divorced                       220K                         No
8    No                    Single                         85K                          Yes
9    No                    Married                        75K                          No
10   No                    Single                         90K                          Yes

Bayes Approach. An accurate estimate of the posterior probability for every possible combination of attributes and classes would require a very large training set, even for a moderate number of attributes. We can utilize Bayes theorem instead: P(Y | X) = P(X | Y) P(Y) / P(X). P(X) is a constant and can be handled as a normalization multiplier. P(Y) can be easily estimated from the training set (the fraction of training records that belong to each class). Estimating P(X | Y) is a more challenging task. Methods: the naïve Bayes classifier (next slides) and Bayesian networks.

Naïve Bayes Classifier. Attributes are assumed to be conditionally independent given the class label y; thus P(X | Y = y) = ∏_{i=1}^{k} P(X_i | Y = y), and hence P(Y | X) = P(Y) ∏_{i=1}^{k} P(X_i | Y = y) / P(X). Now we need to estimate P(X_i | Y) for i = 1, ..., k.

Estimating Probabilities. P(X_i = x | Y = y) is estimated as the fraction of training instances in class y that take on the particular attribute value x. For example, P(Home Owner = Yes | Y = No) = 3/7 and P(Marital Status = Single | Y = Yes) = 2/3. What about continuous attributes? One solution is to discretize each continuous attribute and replace its value with the corresponding interval (transforming continuous attributes into ordinal ones). How can we discretize?
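A small sketch (assuming the training table above) that computes these fractions by counting:

```python
from fractions import Fraction

# (Home Owner, Marital Status, class) for the ten training records above.
records = [
    ("Yes", "Single",   "No"),  ("No", "Married",  "No"),  ("No", "Single",  "No"),
    ("Yes", "Married",  "No"),  ("No", "Divorced", "Yes"), ("No", "Married", "No"),
    ("Yes", "Divorced", "No"),  ("No", "Single",   "Yes"), ("No", "Married", "No"),
    ("No",  "Single",   "Yes"),
]

def cond_prob(attr_index, attr_value, cls):
    """P(X_i = attr_value | Y = cls): fraction of class-cls records with that value."""
    in_class = [r for r in records if r[-1] == cls]
    matching = [r for r in in_class if r[attr_index] == attr_value]
    return Fraction(len(matching), len(in_class))

print(cond_prob(0, "Yes", "No"))      # P(Home Owner = Yes | No) = 3/7
print(cond_prob(1, "Single", "Yes"))  # P(Status = Single | Yes) = 2/3
```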

Continuous Attributes. Assume a certain type of probability distribution for the continuous attribute. For example, it can be a Gaussian distribution with p.d.f. f_ij(x_i) = (1 / (√(2π) σ_ij)) · exp(−(x_i − μ_ij)² / (2σ_ij²)). The parameters μ_ij and σ_ij of f_ij can be estimated from the training records that belong to class y_j.

Continuous Attributes (continued). Using the approximation P(x_i < X_i ≤ x_i + ε | Y = y_j) = ∫ from x_i to x_i + ε of f_ij(t) dt ≈ f_ij(x_i) ε, and the fact that ε cancels out when we normalize the posterior probability P(Y | X), we may simply use P(X_i = x_i | Y = y_j) = f_ij(x_i).

Example. The sample mean of the annual income attribute with respect to class No is x̄ = (125 + 100 + 70 + ... + 75) / 7 = 110. The sample variance is s² = ((125 − 110)² + (100 − 110)² + ... + (75 − 110)²) / 6 = 2975, so s = √2975 = 54.54. Given a test record with income $120K, P(Income = 120 | No) = (1 / (√(2π) · 54.54)) · exp(−(120 − 110)² / (2 · 2975)) = 0.0072.
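A short sketch (a check of the arithmetic above, not part of the slides) that recomputes the sample statistics and the Gaussian density:

```python
import math

# Annual incomes (in $1000) of the seven training records in class "No".
incomes_no = [125, 100, 70, 120, 60, 220, 75]

mean = sum(incomes_no) / len(incomes_no)                                # 110.0
var = sum((x - mean) ** 2 for x in incomes_no) / (len(incomes_no) - 1)  # 2975.0 (sample variance)

def gaussian_pdf(x, mu, sigma2):
    """Class-conditional density used in place of P(X_i = x | Y = y)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

print(mean, var)                     # 110.0 2975.0
print(gaussian_pdf(120, mean, var))  # approximately 0.0072
```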

Example. Suppose X = (Home Owner = No, Marital Status = Married, Income = $120K). From the training data:
P(Home Owner = Yes | Y = No) = 3/7, P(Home Owner = No | Y = No) = 4/7
P(Home Owner = Yes | Y = Yes) = 0, P(Home Owner = No | Y = Yes) = 1
P(Marital Status = Divorced | Y = No) = 1/7, P(Marital Status = Married | Y = No) = 4/7, P(Marital Status = Single | Y = No) = 2/7
P(Marital Status = Divorced | Y = Yes) = 1/3, P(Marital Status = Married | Y = Yes) = 0, P(Marital Status = Single | Y = Yes) = 2/3

Example (continued). For annual income: class No: x̄ = 110, s² = 2975; class Yes: x̄ = 90, s² = 25. Class-conditional probabilities:
P(X | No) = P(Home Owner = No | No) × P(Status = Married | No) × P(Annual Income = $120K | No) = 4/7 × 4/7 × 0.0072 = 0.0024
P(X | Yes) = P(Home Owner = No | Yes) × P(Status = Married | Yes) × P(Annual Income = $120K | Yes) = 1 × 0 × 1.2 × 10⁻⁹ = 0

Example (continued). Posterior probabilities: P(No | X) = α × 7/10 × 0.0024 = 0.0016α and P(Yes | X) = 0, where α = 1/P(X). Since P(No | X) > P(Yes | X), the record is classified as No.
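Combining the pieces, a compact sketch (using the numbers derived above) of the naïve Bayes decision for this test record:

```python
# Test record X = (Home Owner = No, Marital Status = Married, Income = $120K).
prior = {"No": 7 / 10, "Yes": 3 / 10}
class_conditional = {
    #      P(Owner=No|y) * P(Married|y) * f(120K|y)
    "No":  (4 / 7) * (4 / 7) * 0.0072,   # approximately 0.0024
    "Yes": 1.0 * 0.0 * 1.2e-9,           # 0
}

unnormalized = {y: prior[y] * class_conditional[y] for y in prior}
alpha = 1 / sum(unnormalized.values())            # alpha = 1 / P(X)
posterior = {y: alpha * p for y, p in unnormalized.items()}

print(posterior)                           # {'No': 1.0, 'Yes': 0.0}
print(max(posterior, key=posterior.get))   # 'No': the predicted class
```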

Naïve Bayes Classifier: Discussion. Robust to isolated noise points, because such points are averaged out when estimating conditional probabilities from data. Can handle missing values by ignoring the instance during model building and classification. Robust to irrelevant attributes: if X_i is irrelevant, then P(X_i | Y) is almost uniformly distributed and thus has little impact on the posterior probability. Correlated attributes can degrade performance because the conditional independence assumption no longer holds; Bayesian networks can take the dependence between attributes into account.

Directed Graph. Definition: a directed graph (or digraph) G is an ordered pair G := (V, A), where V is a set whose elements are called vertices or nodes, and A ⊆ V × V is a set of ordered pairs of vertices, called directed edges, arcs, or arrows. Example: V = {V_1, V_2, V_3, V_4, V_5}, E = {(V_1, V_1), (V_1, V_4), (V_2, V_1), (V_4, V_2), (V_5, V_5)}. Cycle: V_1 → V_4 → V_2 → V_1.
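A hypothetical sketch (not from the slides) that stores this example digraph as adjacency lists and checks for a directed cycle with depth-first search, which is exactly the property a DAG must not have:

```python
# Example edge set from above, with vertices labelled 1..5.
edges = {(1, 1), (1, 4), (2, 1), (4, 2), (5, 5)}
vertices = {1, 2, 3, 4, 5}
adj = {v: [w for (u, w) in edges if u == v] for v in vertices}

def has_cycle(adj):
    """Return True if the directed graph contains a directed cycle (i.e. it is not a DAG)."""
    WHITE, GRAY, BLACK = 0, 1, 2       # unvisited / on current DFS path / finished
    color = {v: WHITE for v in adj}

    def dfs(v):
        color[v] = GRAY
        for w in adj[v]:
            if color[w] == GRAY or (color[w] == WHITE and dfs(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in adj)

print(has_cycle(adj))  # True: V1 -> V4 -> V2 -> V1 (plus two self-loops)
```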

Directed Acyclic Graph. Definition: a directed acyclic graph (DAG) is a directed graph with no directed cycles; that is, for any vertex v there is no nonempty directed path that starts and ends at v.

Some Graph Theory Notions (illustrated on a DAG with vertices V_1, ..., V_6). V_1 and V_4 are parents of V_2: (V_1, V_2) ∈ E and (V_4, V_2) ∈ E. V_5, V_3, and V_2 are descendants of V_1: V_1 is connected to V_5, V_3, and V_2 by directed paths. V_4 and V_2 are ancestors of V_3: there exist directed paths from V_4 and V_2 to V_3. V_6 and V_4 are nondescendants of V_1: there are no directed paths from V_1 to V_4 or V_6.

Bayesian Network: Definition. Elements of a Bayesian network: a directed acyclic graph (DAG) that encodes the dependence relationships among a set of variables, and a probability table associating each node with its immediate parent nodes. Each node of the graph represents a variable. Each arc asserts a dependence relationship between the pair of variables it connects. The DAG satisfies the Markov condition.

The Markov Condition. Definition: suppose we have a joint probability distribution P of the random variables in some set V and a DAG G = (V, E). We say that (G, P) satisfies the Markov condition if each variable X ∈ V is conditionally independent of the set of all its nondescendants ND_X given the set of all its parents PA_X, i.e. I({X}, ND_X | PA_X). The definition implies that a root node X, which has no parents, is unconditionally independent of its nondescendants.

Figure: Bayes network: a case study (the network over the variables E, D, HD, Hb, B, and C used in the examples below).

Markov Condition: Example

Node   Parents   Independence
E      —         I(E, {D, Hb})
D      —         I(D, E)
HD     E, D      I(HD, Hb | {E, D})
Hb     ??        ??
B      ??        ??
C      ??        ??

Note that I(A, B | C) implies I(A, D | C) whenever D ⊆ B.

Representation. Recall that a naïve Bayes classifier assumes conditional independence of the attributes X_1, X_2, ..., X_k given the target class Y. This can be represented by the Bayesian network below, in which Y is the single parent of each attribute node X_i.

Inferencing. We can compute the joint probability from a Bayesian network: P(X_1, X_2, ..., X_n) = ∏_{i=1}^{n} P(X_i | parents(X_i)). Thus we can compute any conditional probability: P(X_k | X_m) = P(X_k, X_m) / P(X_m) = (Σ over entries X matching X_k, X_m of P(X)) / (Σ over entries X matching X_m of P(X)).

Example of Inferencing. Suppose no prior information about the person is given. What is the probability of developing heart disease? Let α range over {Yes, No} (exercise) and β over {Healthy, Unhealthy} (diet). Since E and D are parentless nodes, the Markov condition gives P(E = α, D = β) = P(E = α) P(D = β), so
P(HD = Yes) = Σ_α Σ_β P(HD = Yes | E = α, D = β) P(E = α, D = β)
            = Σ_α Σ_β P(HD = Yes | E = α, D = β) P(E = α) P(D = β)
            = 0.25 · 0.7 · 0.25 + 0.45 · 0.7 · 0.75 + 0.55 · 0.3 · 0.25 + 0.75 · 0.3 · 0.75 = 0.49

Now let us compute the probability of heart disease when the person has high blood pressure. Let γ range over {Yes, No}. The probability of high blood pressure is
P(B = High) = Σ_γ P(B = High | HD = γ) P(HD = γ) = 0.85 · 0.49 + 0.2 · 0.51 = 0.5185.
The posterior probability of heart disease given high blood pressure is
P(HD = Yes | B = High) = P(B = High | HD = Yes) P(HD = Yes) / P(B = High) = (0.85 · 0.49) / 0.5185 = 0.8033.
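A short sketch reproducing these numbers; the conditional probability values are the ones used in the example above, and the structure E → HD ← D, HD → B is assumed from the case-study figure:

```python
from itertools import product

p_E = {"Yes": 0.7, "No": 0.3}                  # P(E)
p_D = {"Healthy": 0.25, "Unhealthy": 0.75}     # P(D)
p_HD_yes_given_ED = {                          # P(HD = Yes | E, D)
    ("Yes", "Healthy"): 0.25, ("Yes", "Unhealthy"): 0.45,
    ("No",  "Healthy"): 0.55, ("No",  "Unhealthy"): 0.75,
}
p_B_high_given_HD = {"Yes": 0.85, "No": 0.20}  # P(B = High | HD)

# P(HD = Yes): sum out the independent root nodes E and D.
p_hd_yes = sum(p_HD_yes_given_ED[(e, d)] * p_E[e] * p_D[d]
               for e, d in product(p_E, p_D))
print(p_hd_yes)                                # 0.49

# P(B = High): sum out HD.
p_b_high = (p_B_high_given_HD["Yes"] * p_hd_yes
            + p_B_high_given_HD["No"] * (1 - p_hd_yes))
print(p_b_high)                                # 0.5185

# Bayes rule: P(HD = Yes | B = High).
print(p_B_high_given_HD["Yes"] * p_hd_yes / p_b_high)   # approximately 0.8033
```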

Complexity Issues. Recall that we can compute any conditional probability P(X_k | X_m) = (Σ over entries X matching X_k, X_m of P(X)) / (Σ over entries X matching X_m of P(X)). In general this requires an exponentially large number of operations. Various tricks can be applied to reduce the complexity, but querying Bayes nets is NP-hard. D. M. Chickering, D. Heckerman, C. Meek, Large-Sample Learning of Bayesian Networks is NP-Hard, Journal of Machine Learning Research, 5 (2004), 1287-1330.

Discussion. A Bayes network is an elegant way of encoding causal probabilistic dependencies, and the dependency model can be represented graphically. Constructing a network requires effort, but adding a new variable is quite straightforward. Bayes networks are well suited to incomplete data, and due to the probabilistic nature of the model the method is robust to model overfitting.

What we have learned: independence and conditional independence; Bayes theorem; naïve Bayes classification; the definition of a Bayes net; computing probabilities with a Bayes net.

Literature
Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2005.
Finn V. Jensen, Thomas D. Nielsen, Bayesian Networks and Decision Graphs, 2nd Ed., Springer, 2007.
Richard E. Neapolitan, Learning Bayesian Networks, Prentice Hall, 2003.
Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.