COMP538: Introduction to Bayesian Networks


COMP538: Introduction to Bayesian Networks
Lecture 2: Bayesian Networks
Nevin L. Zhang (lzhang@cse.ust.hk)
Department of Computer Science and Engineering
Hong Kong University of Science and Technology
Fall 2008

Objective

Objective: Explain the concept of Bayesian networks.
Reading: Zhang & Guo, Chapter 2.
References: Russell & Norvig, Chapter 15.

Outline

1. Probabilistic Modeling with Joint Distribution
2. Conditional Independence and Factorization
3. Bayesian Networks
4. Manual Construction of Bayesian Networks
   - Building structures
   - Causal Bayesian networks
   - Determining Parameters
   - Local Structures
5. Remarks

Probabilistic Modeling with Joint Distribution
The probabilistic approach to reasoning under uncertainty

A problem domain is modeled by a list of variables X_1, X_2, ..., X_n.
Knowledge about the problem domain is represented by a joint probability P(X_1, X_2, ..., X_n).

Example: Alarm (Pearl 1988)
Story: In LA, burglary and earthquake are not uncommon. Either can set off the alarm. In case of alarm, the two neighbors John and Mary may call.
Problem: Estimate the probability of a burglary based on who has or has not called.
Variables: Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M).
Knowledge required by the probabilistic approach in order to solve this problem: the joint distribution P(B, E, A, J, M).

Probabilistic Modeling with Joint Distribution
Joint probability distribution

P(B, E, A, J, M):

B E A J M | Prob        B E A J M | Prob
y y y y y | .00001      n y y y y | .0002
y y y y n | .000025     n y y y n | .0004
y y y n y | .000025     n y y n y | .0004
y y y n n | .00000      n y y n n | .0002
y y n y y | .00001      n y n y y | .0002
y y n y n | .000015     n y n y n | .0002
y y n n y | .000015     n y n n y | .0002
y y n n n | .0000       n y n n n | .0002
y n y y y | .00001      n n y y y | .0001
y n y y n | .000025     n n y y n | .0002
y n y n y | .000025     n n y n y | .0002
y n y n n | .0000       n n y n n | .0001
y n n y y | .00001      n n n y y | .0001
y n n y n | .00001      n n n y n | .0001
y n n n y | .00001      n n n n y | .0001
y n n n n | .00000      n n n n n | .996

Probabilistic Modeling with Joint Distribution
Inference with the joint probability distribution

What is the probability of burglary given that Mary called, P(B=y | M=y)?

Compute the marginal probability:
P(B, M) = Σ_{E,A,J} P(B, E, A, J, M)

B M | Prob
y y | .000115
y n | .000075
n y | .00015
n n | .99966

Compute the answer (reasoning by conditioning):
P(B=y | M=y) = P(B=y, M=y) / P(M=y) = .000115 / (.000115 + .000075) = 0.61
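
To make reasoning by conditioning concrete, here is a minimal Python sketch of inference by summing a joint table. It is not the course's code, and the tiny uniform joint at the end is only a hypothetical placeholder used to exercise the function, not the 32-entry alarm table above.

```python
# Sketch of "reasoning by conditioning" directly on a joint table.
from itertools import product

def query(joint, variables, target, evidence):
    """P(target variable = target value | evidence) by summing the joint table.

    joint: dict mapping a tuple of values (one per variable) to a probability
    variables: list of variable names, in the order used by the tuples
    target: (name, value) pair whose posterior probability we want
    evidence: dict {name: value} of observed variables
    """
    idx = {v: i for i, v in enumerate(variables)}

    def consistent(assignment, constraints):
        return all(assignment[idx[v]] == val for v, val in constraints.items())

    p_evidence = sum(p for a, p in joint.items() if consistent(a, evidence))
    p_joint = sum(p for a, p in joint.items()
                  if consistent(a, {**evidence, target[0]: target[1]}))
    return p_joint / p_evidence

# Hypothetical joint P(B, A, M) over three binary variables, uniform on purpose.
variables = ["B", "A", "M"]
joint = {values: 0.125 for values in product("yn", repeat=3)}   # sums to 1
print(query(joint, variables, ("B", "y"), {"M": "y"}))           # 0.5 for this placeholder
```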

Probabilistic Modeling with Joint Distribution
Advantages

Probability theory is well-established and well-understood.

In theory, one can perform arbitrary inference among the variables given a joint probability, because the joint probability contains information about all aspects of the relationships among the variables:
- Diagnostic inference: from effects to causes. Example: P(B=y | M=y).
- Predictive inference: from causes to effects. Example: P(M=y | B=y).
- Combining evidence: P(B=y | J=y, M=y, E=n).

All inference is sanctioned by the laws of probability and hence has clear semantics.

Probabilistic Modeling with Joint Distribution
Difficulty: Complexity in model construction and inference

In the Alarm example:
- 31 numbers needed.
- Quite unnatural to assess, e.g. P(B=y, E=y, A=y, J=y, M=y).
- Computing P(B=y | M=y) takes 29 additions. [Exercise: Verify this.]

In general:
- P(X_1, X_2, ..., X_n) needs at least 2^n - 1 numbers to specify the joint probability. Exponential model size.
- Knowledge acquisition difficult (complex, unnatural).
- Exponential storage and inference.

Outline
2. Conditional Independence and Factorization

Conditional Independence and Factorization
Chain Rule and Factorization

Overcome the problem of exponential size by exploiting conditional independence.

The chain rule of probabilities:
P(X_1, X_2) = P(X_1) P(X_2 | X_1)
P(X_1, X_2, X_3) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2)
...
P(X_1, X_2, ..., X_n) = P(X_1) P(X_2 | X_1) ... P(X_n | X_1, ..., X_{n-1})
                      = Π_{i=1}^{n} P(X_i | X_1, ..., X_{i-1}).

No gains yet. The number of parameters required by the factors is
2^{n-1} + 2^{n-2} + ... + 1 = 2^n - 1.
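
A quick check of that count, assuming all variables are binary (as in the alarm example) and counting one free number per configuration of the conditioning variables:

```latex
\[
\underbrace{1}_{P(X_1)} + \underbrace{2}_{P(X_2\mid X_1)} + \underbrace{4}_{P(X_3\mid X_1,X_2)}
  + \cdots + \underbrace{2^{\,n-1}}_{P(X_n\mid X_1,\dots,X_{n-1})}
  \;=\; \sum_{i=1}^{n} 2^{\,i-1} \;=\; 2^{n}-1 .
\]
```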

Conditional Independence and Factorization
Conditional Independence

About P(X_i | X_1, ..., X_{i-1}): Domain knowledge usually allows one to identify a subset pa(X_i) ⊆ {X_1, ..., X_{i-1}} such that, given pa(X_i), X_i is independent of all variables in {X_1, ..., X_{i-1}} \ pa(X_i), i.e.
P(X_i | X_1, ..., X_{i-1}) = P(X_i | pa(X_i)).

Then the joint distribution is factorized:
P(X_1, X_2, ..., X_n) = Π_{i=1}^{n} P(X_i | pa(X_i)).

The number of parameters might have been substantially reduced.

Conditional Independence and Factorization
Example continued

P(B, E, A, J, M) = P(B) P(E | B) P(A | B, E) P(J | B, E, A) P(M | B, E, A, J)
                 = P(B) P(E) P(A | B, E) P(J | A) P(M | A)    (Factorization)

pa(B) = {}, pa(E) = {}, pa(A) = {B, E}, pa(J) = {A}, pa(M) = {A}.

Conditional probability tables (CPTs):

P(B):  B=y: .01   B=n: .99
P(E):  E=y: .02   E=n: .98

P(A | B, E):
B E | A=y   A=n
y y | .95   .05
y n | .94   .06
n y | .29   .71
n n | .001  .999

P(J | A):
A | J=y  J=n
y | .7   .3
n | .01  .99

P(M | A):
A | M=y  M=n
y | .9   .1
n | .05  .95

Conditional Independence and Factorization
Example continued

Model size reduced from 31 to 1+1+4+2+2 = 10.

Model construction easier:
- Fewer parameters to assess.
- Parameters more natural to assess, e.g. P(B=y), P(E=y), P(A=y | B=y, E=y), P(J=y | A=y), P(M=y | A=y).

Inference easier. We will see this later.

Outline
3. Bayesian Networks

Bayesian Networks
From Factorizations to Bayesian Networks

Graphically represent the conditional independence relationships: construct a directed graph by drawing an arc from X_j to X_i iff X_j ∈ pa(X_i).

pa(B) = {}, pa(E) = {}, pa(A) = {B, E}, pa(J) = {A}, pa(M) = {A}.

[Network figure: nodes B with P(B), E with P(E), A with P(A | B, E), J with P(J | A), M with P(M | A); arcs B→A, E→A, A→J, A→M.]

Also attach the conditional probability table P(X_i | pa(X_i)) to node X_i.

What results is a Bayesian network, also known as a belief network or probabilistic network.

Bayesian Networks
Formal Definition

A Bayesian network is a directed acyclic graph (DAG), where:
- each node represents a random variable, and
- each node is associated with the conditional probability of that node given its parents.

Recall: In the introduction, we said that Bayesian networks are networks of random variables.

Bayesian Networks
Understanding Bayesian networks

Qualitative level: a directed acyclic graph (DAG) where arcs represent direct probabilistic dependence.

[DAG: B → A ← E, A → J, A → M]

Absence of an arc indicates conditional independence: a variable is conditionally independent of all its nondescendants given its parents. (Will prove this later.)

The above DAG implies the following conditional independence relationships:
B ⊥ E;  J ⊥ B | A;  J ⊥ E | A;  M ⊥ B | A;  M ⊥ E | A;  M ⊥ J | A

The following are not implied:
J ⊥ B;  J ⊥ E;  J ⊥ M;  B ⊥ E | A
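
These statements can be read off mechanically. The short Python sketch below (my own illustration, not course code) encodes the alarm DAG as a parent dictionary and prints, for each variable, the statement "independent of its nondescendants given its parents".

```python
# Sketch: read "X is independent of its nondescendants given its parents"
# off the alarm DAG. The dictionary encoding of the graph is an assumption.
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

def descendants(node):
    """All variables reachable from node via directed arcs."""
    children = {v: [x for x, ps in parents.items() if v in ps] for v in parents}
    seen, stack = set(), list(children[node])
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(children[x])
    return seen

for v in parents:
    nondesc = set(parents) - {v} - descendants(v) - set(parents[v])
    if nondesc:
        print(f"{v} ⊥ {sorted(nondesc)} | {parents[v]}")
# e.g. J ⊥ ['B', 'E', 'M'] | ['A'], matching J⊥B|A, J⊥E|A and M⊥J|A on the slide.
```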

Bayesian Networks
Understanding Bayesian networks

Quantitative (numerical) level: the conditional probability tables shown earlier, P(B), P(E), P(A | B, E), P(J | A), P(M | A). They describe how the parents of a variable influence the variable.

Bayesian Networks
Understanding Bayesian networks

As a whole: a Bayesian network represents a factorization of a joint distribution:
P(X_1, X_2, ..., X_n) = Π_{i=1}^{n} P(X_i | pa(X_i)).

Multiplying all the CPTs results in a joint distribution over all variables.
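
As an illustration (a sketch, not the course's software), the factorization can be evaluated directly: the snippet below encodes the alarm CPTs from the earlier slide, multiplies them to obtain joint probabilities, and answers P(B=y | M=y) by summation.

```python
from itertools import product

# Alarm-network CPTs from the slides, stored as P(var = y | parent values).
p_b = 0.01
p_e = 0.02
p_a = {("y", "y"): 0.95, ("y", "n"): 0.94, ("n", "y"): 0.29, ("n", "n"): 0.001}
p_j = {"y": 0.7, "n": 0.01}   # P(J = y | A)
p_m = {"y": 0.9, "n": 0.05}   # P(M = y | A)

def pr(p_yes, value):
    """Probability of a binary variable taking `value` when P(y) = p_yes."""
    return p_yes if value == "y" else 1.0 - p_yes

def joint(b, e, a, j, m):
    """P(B, E, A, J, M) as the product of the five CPT entries."""
    return (pr(p_b, b) * pr(p_e, e) * pr(p_a[(b, e)], a)
            * pr(p_j[a], j) * pr(p_m[a], m))

# P(B = y | M = y) by summing the factorized joint over E, A, J (and B for the denominator).
num = sum(joint("y", e, a, j, "y") for e, a, j in product("yn", repeat=3))
den = sum(joint(b, e, a, j, "y") for b, e, a, j in product("yn", repeat=4))
print(num / den)   # ≈ 0.13 with these CPTs (the illustrative joint table earlier uses different numbers)
```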

Bayesian Networks
Example networks

Network repositories:
- Bayesian Network Repository: http://www.cs.huji.ac.il/labs/compbio/repository/
- Genie & Smile Network Repository: http://genie.sis.pitt.edu/networks.html
- Netica Net Library: http://www.norsys.com/netlibrary/index.htm
- Hugin Case Studies: http://www.hugin.com/cases/

Software:
- Genie & Smile: http://genie.sis.pitt.edu/. Free.
- Netica: http://www.norsys.com/. Free version for small nets.

Outline
4. Manual Construction of Bayesian Networks: building structures, causal Bayesian networks, determining parameters, local structures

Manual Construction of Bayesian Networks: Building structures
Procedure for constructing Bayesian network structures

1. Choose a set of variables that describes the application domain.
2. Choose an ordering for the variables.
3. Start with the empty network and add variables to the network one by one according to the ordering.
4. To add the i-th variable X_i:
   1. Determine a subset pa(X_i) of the variables already in the network (X_1, ..., X_{i-1}) such that
      P(X_i | X_1, ..., X_{i-1}) = P(X_i | pa(X_i)).
      (Domain knowledge is needed here.)
   2. Draw an arc from each variable in pa(X_i) to X_i.
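
A minimal Python sketch of this procedure (not the course's code): the choose_parents callback stands in for the domain knowledge in step 4.1, and here it simply returns the alarm-network parent sets for the ordering B, E, A, J, M.

```python
# Sketch of the structure-building procedure: add variables in order and
# attach arcs from the chosen parents. choose_parents(...) plays the role of
# domain knowledge and is a hypothetical stand-in.
def build_structure(ordering, choose_parents):
    arcs = []
    added = []
    for x in ordering:
        pa = choose_parents(x, added)        # subset of variables already added
        arcs.extend((p, x) for p in pa)      # arc p -> x for each chosen parent
        added.append(x)
    return arcs

alarm_parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
arcs = build_structure("BEAJM", lambda x, added: alarm_parents[x])
print(arcs)   # [('B', 'A'), ('E', 'A'), ('A', 'J'), ('A', 'M')]
```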

Manual Construction of Bayesian Networks: Building structures
Examples

Order 1: B, E, A, J, M
pa(B) = {}, pa(E) = {}, pa(A) = {B, E}, pa(J) = {A}, pa(M) = {A}.

Order 2: M, J, A, B, E
pa(M) = {}, pa(J) = {M}, pa(A) = {M, J}, pa(B) = {A}, pa(E) = {A, B}.

Order 3: M, J, E, B, A
pa(M) = {}, pa(J) = {M}, pa(E) = {M, J}, pa(B) = {M, J, E}, pa(A) = {M, J, B, E}.

Manual Construction of Bayesian Networks: Building structures
Building Bayesian network structures: Which order?

Naturalness of probability assessment (Howard and Matheson): (B, E, A, J, M) is a good ordering because the following distributions are natural to assess:
- P(B), P(E): frequency of burglary and earthquake.
- P(A | B, E): property of the alarm system.
- P(M | A): knowledge about Mary.
- P(J | A): knowledge about John.

The order (M, J, E, B, A) is not good because, for instance, P(B | J, M, E) is unnatural and hence difficult to assess directly.

Manual Construction of Bayesian Networks: Building structures
Building Bayesian network structures: Which order?

Minimize the number of arcs (J. Q. Smith): the order (M, J, E, B, A) is bad because it yields too many arcs. In contrast, the order (B, E, A, J, M) is good because it results in a simple structure.

Use causal relationships (Pearl): causes come before their effects. The order (M, J, E, B, A) is not good because, for instance, M and J are effects of A but come before A. In contrast, the order (B, E, A, J, M) is good because it respects the causal relationships among the variables.

Manual Construction of Bayesian Networks: Building structures
Exercise in Structure Building

Five variables about what happens in an office building:
- Fire: There is a fire in the building.
- Smoke: There is smoke in the building.
- Alarm: The fire alarm goes off.
- Leave: People leave the building.
- Tampering: Someone tampers with the fire system (e.g., opens a fire exit).

Build network structures using the following orderings. Clearly state your assumptions.
1. Order 1: tampering, fire, smoke, alarm, leave
2. Order 2: leave, alarm, smoke, fire, tampering

Manual Construction of Bayesian Networks: Causal Bayesian networks

Build a Bayesian network using causal relationships:
- Choose a set of variables that describes the domain.
- Draw an arc to a variable from each of its DIRECT causes. (Domain knowledge needed here.)

What results is a causal Bayesian network, or simply a causal network. Arcs are interpreted as indicating cause-effect relationships.

Manual Construction of Bayesian Networks: Building structures
Example: Travel (Lauritzen and Spiegelhalter)

[Network figure: Adventure → Tuberculosis; Smoking → Lung Cancer, Bronchitis; Tuberculosis and Lung Cancer → X-ray; Tuberculosis, Lung Cancer and Bronchitis → Dyspnea.]

Manual Construction of Bayesian Networks: Building structures
Use of Causality: Issue 1

Causality is not a well-understood concept. There is no widely accepted definition, and no consensus on whether it is a property of the world or a concept in our minds that helps us organize our perception of the world.

Manual Construction of Bayesian Networks: Building structures
Causality

Sometimes causal relations are obvious:
- Alarm causes people to leave the building.
- Lung cancer causes a mass on the chest X-ray.

At other times, they are not that clear:
- Whether gender influences ability in the technical sciences.
- Most of us believe smoking causes lung cancer, but the tobacco industry has a different story:
  Surgeon General (1964): smoking (s) → cancer (c).
  Tobacco industry: a common factor (g) causes both smoking (s) and cancer (c).

Manual Construction of Bayesian Networks: Building structures
Working Definition of Causality

Imagine an all-powerful agent, GOD, who can change the states of variables. X causes Y if knowing that GOD has changed the state of X changes your belief about Y.

Example: Smoking and yellow fingers are correlated.
- If we force someone to smoke for some time, their fingers will probably become yellow. So smoking is a cause of yellow fingers.
- If we paint someone's fingers yellow, that will not affect our belief about whether they smoke. So yellow fingers do not cause smoking.

A similar example applies to Earthquake and Alarm.

Manual Construction of Bayesian Networks: Building structures
Causality

Coin tossing example revisited:
- Knowing that GOD somehow made sure the coin drawn from the bag is a fair coin would affect our belief about the results of tossing.
- Knowing that GOD somehow made sure that the first toss resulted in a head does not affect our belief about the type of the coin.

So arrows go from coin type to the results of tossing.

[Figure: Coin Type → Toss 1 Result, Toss 2 Result, ..., Toss n Result]

Manual Construction of Bayesian Networks: Building structures
Use of Causality: Issue 2

[Travel network figure again: Adventure, Smoking, Tuberculosis, Lung Cancer, Bronchitis, X-ray, Dyspnea.]

Causality → network structure (building process).
Network structure → conditional independence (semantics of BN).

The causal Markov assumption bridges causality and conditional independence: a variable is independent of all its non-effects (non-descendants) given its direct causes (i.e. parents). We make this assumption if we determine the Bayesian network structure using causality.

Manual Construction of Bayesian Networks: Determining Parameters
Determining probability parameters

Later in this course, we will discuss in detail how to learn parameters from data. We will not be so much concerned with eliciting probability values from experts. However, people do that sometimes; in such a case, one would want the number of parameters to be as small as possible.

The rest of the lecture describes two concepts for reducing the number of parameters:
- Causal independence.
- Context-specific independence.
These are left to students as reading material.

Manual Construction of Bayesian Networks: Determining Parameters
Determining probability parameters

Sometimes conditional probabilities are given by domain theory. Genetic inheritance in a stud (horse) farm (Jensen, F. V. (2001). Bayesian Networks and Decision Graphs. Springer):

P(Child | Father, Mother), with each entry a distribution over the child's genotype (aa, aA, AA):

Father \ Mother |  aa           aA             AA
aa              | (1, 0, 0)     (.5, .5, 0)    (0, 1, 0)
aA              | (.5, .5, 0)   (.25, .5, .25) (0, .5, .5)
AA              | (0, 1, 0)     (0, .5, .5)    (0, 0, 1)

Genotypes: aa - sick, aA - carrier, AA - pure.
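
These entries need not be assessed one by one; they follow from Mendelian segregation (each parent passes one of its two alleles with probability 1/2). A small sketch, using my own encoding of genotypes as allele pairs:

```python
from itertools import product
from fractions import Fraction

# Sketch (not the textbook's code): derive P(Child | Father, Mother) from
# Mendelian segregation. Genotypes: aa = sick, aA = carrier, AA = pure.
alleles = {"aa": ("a", "a"), "aA": ("a", "A"), "AA": ("A", "A")}

def normalize(a1, a2):
    """Canonical genotype name for an unordered pair of alleles."""
    return a1 + a2 if a1 == a2 else "aA"

def child_distribution(father, mother):
    """Each parent passes one of its two alleles with probability 1/2."""
    dist = {"aa": Fraction(0), "aA": Fraction(0), "AA": Fraction(0)}
    for f, m in product(alleles[father], alleles[mother]):
        dist[normalize(f, m)] += Fraction(1, 4)
    return dist

for father in alleles:
    for mother in alleles:
        d = child_distribution(father, mother)
        print(father, mother, tuple(float(d[g]) for g in ("aa", "aA", "AA")))
# e.g. the aA x aA row prints (0.25, 0.5, 0.25), matching the table above.
```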

Manual Construction of Bayesian Networks: Determining Parameters
Determining probability parameters

Sometimes we need to get the numbers from experts. This is a time-consuming and difficult process. Nonetheless, many networks have been built; see the Bayesian Network Repository at http://www.cs.huji.ac.il/labs/compbio/repository/.

Combine experts' knowledge and data:
- Use assessments by experts as a starting point.
- When data become available, combine the data and the experts' assessments.
- As more and more data become available, the influence of the experts is automatically reduced.
We will show how this can be done when discussing parameter learning.

Note: Much of the course will be about how to learn Bayesian networks (structures and parameters) from data.

Manual Construction of Bayesian Networks: Determining Parameters
Reducing the number of parameters

Let E be a variable in a BN and let C_1, C_2, ..., C_m be its parents.

[Figure: C_1, C_2, ..., C_m → E]

The size of the conditional probability P(E | C_1, C_2, ..., C_m) is exponential in m. This poses a problem for knowledge acquisition, learning, and inference. In applications, there usually exist local structures that one can exploit to reduce the size of conditional probabilities.

Manual Construction of Bayesian Networks: Determining Parameters
Causal independence

Causal independence refers to the situation where the causes C_1, C_2, ..., and C_m influence E independently. In other words, the ways in which the C_i's influence E are independent.

[Figures: left, Burglary and Earthquake → Alarm; right, Burglary → Alarm-due-to-Burglary (A_b), Earthquake → Alarm-due-to-Earthquake (A_e), and A_b, A_e → Alarm.]

Example: Burglary and earthquake trigger the alarm independently.
Precise statement: A_b and A_e are independent, and A = A_b ∨ A_e; hence the noisy-OR gate (Good 1960).

Manual Construction of Bayesian Networks: Determining Parameters
Causal Independence

[Figure: C_1 → ξ_1, C_2 → ξ_2, ..., C_m → ξ_m; the ξ_i are combined by an operator * to give E.]

Formally, C_1, C_2, ..., and C_m are said to be causally independent w.r.t. the effect E if there exist random variables ξ_1, ξ_2, ..., and ξ_m such that:
1. For each i, ξ_i probabilistically depends on C_i and is conditionally independent of all other C_j's and all other ξ_j's given C_i, and
2. There exists a commutative and associative binary operator * over the domain of E such that
   E = ξ_1 * ξ_2 * ... * ξ_m.

Manual Construction of Bayesian Networks: Determining Parameters
Causal Independence

In words, individual contributions from different causes are independent, and the total influence on the effect is a combination of the individual contributions.
- ξ_i: the contribution of C_i to E.
- *: the base combination operator.
- E: an independent cause (IC) variable, known as a convergent variable in Zhang & Poole (1996).

Manual Construction of Bayesian Networks: Determining Parameters
Causal Independence

Example: Lottery
- C_i: money spent on buying lotteries of type i.
- E: change of wealth.
- ξ_i: change in wealth due to buying the i-th type of lottery.
- Base combination operator: + (noisy adder).

Other causal independence models:
1. Noisy MAX-gate: operator max.
2. Noisy AND-gate: operator AND.

Manual Construction of Bayesian Networks: Determining Parameters
Causal Independence

Theorem (2.1): If C_1, C_2, ..., C_m are causally independent w.r.t. E, then the conditional probability P(E | C_1, ..., C_m) can be obtained from the conditional probabilities P(ξ_i | C_i) through

P(E=e | C_1, ..., C_m) = Σ_{α_1 * ... * α_m = e} P(ξ_1=α_1 | C_1) ... P(ξ_m=α_m | C_m),    (1)

for each value e of E. Here * is the base combination operator of E. See Zhang and Poole (1996) for the proof.
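
To illustrate equation (1) for the noisy-OR case, here is a sketch in Python. The setup is the standard noisy-OR one, but the activation probabilities at the bottom are made up for illustration: each ξ_i is binary with P(ξ_i=1 | C_i=1) = p_i and P(ξ_i=1 | C_i=0) = 0, and the base combination operator is logical OR.

```python
from itertools import product

def noisy_or_cpt(p_active):
    """P(E=1 | C_1,...,C_m) for every parent configuration, via equation (1).

    p_active[i] = P(xi_i = 1 | C_i = 1); P(xi_i = 1 | C_i = 0) is taken to be 0.
    """
    m = len(p_active)
    cpt = {}
    for cs in product((0, 1), repeat=m):                 # parent configuration
        p_e1 = 0.0
        for alphas in product((0, 1), repeat=m):         # values of the xi_i
            if max(alphas) == 1:                         # OR of the contributions is 1
                prob = 1.0
                for c, a, p in zip(cs, alphas, p_active):
                    p_xi1 = p if c == 1 else 0.0         # P(xi_i = 1 | C_i = c)
                    prob *= p_xi1 if a == 1 else 1.0 - p_xi1
                p_e1 += prob
        cpt[cs] = p_e1
    return cpt

# Two causes with hypothetical activation probabilities 0.95 and 0.29:
print(noisy_or_cpt([0.95, 0.29]))
# {(0,0): 0.0, (0,1): 0.29, (1,0): 0.95, (1,1): 1 - 0.05*0.71 = 0.9645 (up to float rounding)}
```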

Manual Construction of Bayesian Networks: Determining Parameters
Causal Independence

Notes:
- Causal independence reduces model size: in the case of binary variables, it reduces model size from 2^{m+1} to 4m. Examples: CPCS, Carpo.
- It can also be used to speed up inference (Zhang and Poole 1996).
- Relationship with logistic regression? (Potential term project)

Manual Construction of Bayesian Networks: Determining Parameters
Parent divorcing

Another technique to reduce the number of parameters.

Top figure: a more natural model for the Travel example (Adventure, Smoking, Tuberculosis, Lung Cancer, Bronchitis, X-ray, Dyspnea), in which X-ray has parents Tuberculosis and Lung Cancer, and Dyspnea has parents Tuberculosis, Lung Cancer, and Bronchitis. It requires 1+1+2+2+2+4+8 = 20 parameters.

Bottom figure: a new node Tuberculosis-or-Lung-Cancer (TB-or-LC) is inserted between Tuberculosis/Lung Cancer and X-ray/Dyspnea. It requires only 1+1+2+2+2+4+2+4 = 18 parameters.

The difference would be bigger if, for example, D had other parents. The trick is to introduce a new node (TB-or-LC); it divorces T and L from the other parent B of D. Note that the trick would not help if the new node TB-or-LC had 4 or more states.
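
Where the two counts come from, assuming every variable (including the new TB-or-LC node) is binary and counting one free number per parent configuration:

```latex
\[
\underbrace{1}_{A} + \underbrace{1}_{S} + \underbrace{2}_{T\mid A} + \underbrace{2}_{L\mid S}
  + \underbrace{2}_{B\mid S} + \underbrace{4}_{X\mid T,L} + \underbrace{8}_{D\mid T,L,B} = 20,
\]
\[
1 + 1 + 2 + 2 + 2 + \underbrace{4}_{\text{TB-or-LC}\mid T,L}
  + \underbrace{2}_{X\mid \text{TB-or-LC}} + \underbrace{4}_{D\mid \text{TB-or-LC},\,B} = 18 .
\]
```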

Manual Construction of Bayesian Networks: Determining Parameters
Context-specific independence (CSI)

Let C be a set of variables. A context on C is an assignment of one value to each variable in C. We denote a context by C=c, where c is a set of values of the variables in C.

Two contexts are incompatible if there exists a variable that is assigned different values in the two contexts. They are compatible otherwise.

Manual Construction of Bayesian Networks: Determining Parameters
Context-specific independence

Let X, Y, Z, and C be four disjoint sets of variables. X and Y are independent given Z in context C=c if
P(X | Z, Y, C=c) = P(X | Z, C=c)
whenever P(Y, Z, C=c) > 0.

When Z is empty, one simply says that X and Y are independent in context C=c.

Manual Construction of Bayesian Networks: Determining Parameters
Context-specific independence

[Figure: Age and Gender → Number of Pregnancies]

Shafer's example: Number of pregnancies (N) is independent of Age (A) in the context Gender=male (G=m):
P(N | A, G=m) = P(N | G=m).
The number of parameters is reduced by (|A| - 1)(|N| - 1).

Manual Construction of Bayesian Networks: Determining Parameters
Context-specific independence

[Figure: Weather (W), Profession (P), Qualification (Q) → Income (I), with CPT P(I | W, P, Q)]

Income is independent of Weather in the context Profession=Programmer:
P(I | W, P=Prog, Q) = P(I | P=Prog, Q).

Income is independent of Qualification in the context Profession=Farmer:
P(I | W, P=Farmer, Q) = P(I | W, P=Farmer).

The number of parameters is reduced by (|W| - 1) |Q| (|I| - 1) + (|Q| - 1) |W| (|I| - 1).

CSI can also be exploited to speed up inference (Zhang and Poole 1999).
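
One way to store such a CPT compactly is as a list of context rules, most specific first, with one distribution per rule instead of one per full parent configuration. The sketch below is my own illustrative representation; the domains and numbers are made up.

```python
# Sketch: a context-specific CPT for P(Income | Weather, Profession, Qualification),
# stored as rules. Domains and probabilities are hypothetical.
weather = ["sunny", "rainy"]
profession = ["programmer", "farmer", "other"]
qualification = ["low", "high"]

# Each rule: (context dict, distribution over income levels).
# Programmer: income does not depend on weather; farmer: not on qualification.
rules = [
    ({"profession": "programmer", "qualification": "low"},  {"low": 0.5, "high": 0.5}),
    ({"profession": "programmer", "qualification": "high"}, {"low": 0.2, "high": 0.8}),
    ({"profession": "farmer", "weather": "sunny"},           {"low": 0.3, "high": 0.7}),
    ({"profession": "farmer", "weather": "rainy"},           {"low": 0.7, "high": 0.3}),
    ({}, {"low": 0.6, "high": 0.4}),      # default for every other configuration
]

def p_income(w, p, q):
    """Return the income distribution from the first rule compatible with (w, p, q)."""
    config = {"weather": w, "profession": p, "qualification": q}
    for context, dist in rules:
        if all(config[var] == val for var, val in context.items()):
            return dist

full_table_rows = len(weather) * len(profession) * len(qualification)   # 12 rows
print(full_table_rows, "rows in the full CPT vs", len(rules), "rules here")
print(p_income("rainy", "programmer", "high"))   # same as on a sunny day: {'low': 0.2, 'high': 0.8}
```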

Outline
5. Remarks

Remarks
Reasons for the popularity of Bayesian networks

The graphical language of Bayesian networks is intuitive and easy to understand because it captures what might be called intuitive causality. Pearl (1986) claims that it is a model for humans' inferential reasoning:
- Notions of dependence and conditional dependence are basic to human reasoning.
- The fundamental structure of human knowledge can be represented by dependence graphs.

Remarks
Reasons for the popularity of Bayesian networks

In practice, the graphical language:
- Functions as a convenient language for organizing one's knowledge about a domain.
- Facilitates interpersonal communication.
On the other hand, the language is well-defined enough to allow computer processing, with the correctness of results guaranteed by probability theory.

For probability theory, Bayesian networks provide a whole new perspective: "Probability is not really about numbers; it is about the structure of reasoning." (Glenn Shafer)