Artificial Intelligence: Bayesian Networks

Artificial Intelligence: Bayesian Networks. Adapted from slides by Tim Finin and Marie desJardins. Some material borrowed from Lise Getoor.

Outline: Bayesian networks (network structure, conditional probability tables, conditional independence); inference in Bayesian networks (exact inference, approximate inference).

Bayesian Belief Networks (BNs). Definition: BN = (DAG, CPD). DAG: directed acyclic graph (the BN's structure). Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables). Arcs: indicate probabilistic dependences between nodes (lack of a link signifies conditional independence). CPD: conditional probability distribution (the BN's parameters): conditional probabilities at each node, usually stored as a table (conditional probability table, or CPT), P(x_i | π_i), where π_i is the set of all parent nodes of x_i. Root nodes are a special case: they have no parents, so the CPD just holds the prior: π_i = ∅, so P(x_i | π_i) = P(x_i).

Example BN. Nodes a, b, c, d, e, with arcs a→b, a→c, b→d, c→d, c→e. CPTs: P(A) = 0.001; P(B|A) = 0.3, P(B|¬A) = 0.001; P(C|A) = 0.2, P(C|¬A) = 0.005; P(D|B,C) = 0.1, P(D|B,¬C) = 0.01, P(D|¬B,C) = 0.01, P(D|¬B,¬C) = 0.00001; P(E|C) = 0.4, P(E|¬C) = 0.002. Note that we only specify P(A) etc., not P(¬A), since the two have to add to one.
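A network like this one can be held in a very small data structure: for each node, a parent list plus a CPT keyed by the parents' truth values. The following is a minimal sketch in Python of this example network; the names parents, cpt, and p_node are our own illustrative choices, not part of the slides.

# Each node maps to its parent list; each CPT entry gives
# P(node = True | that parent assignment). P(node = False | ...)
# is the complement, which is why the slide only lists P(A), etc.
parents = {
    'a': [],
    'b': ['a'],
    'c': ['a'],
    'd': ['b', 'c'],
    'e': ['c'],
}
cpt = {
    'a': {(): 0.001},
    'b': {(True,): 0.3, (False,): 0.001},
    'c': {(True,): 0.2, (False,): 0.005},
    'd': {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    'e': {(True,): 0.4, (False,): 0.002},
}

def p_node(node, value, assignment):
    # P(node = value | parent values taken from assignment)
    key = tuple(assignment[p] for p in parents[node])
    p_true = cpt[node][key]
    return p_true if value else 1.0 - p_true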

Conditional independence and chaining. Conditional independence assumption: P(x_i | π_i, q) = P(x_i | π_i), where q is any set of variables (nodes) other than x_i and its successors; π_i blocks the influence of other nodes on x_i and its successors (q influences x_i only through variables in π_i). With this assumption, the complete joint probability distribution of all variables in the network can be represented by (recovered from) the local CPDs by chaining these CPDs: P(x_1, ..., x_n) = Π_{i=1..n} P(x_i | π_i).

Chaining: Example (for the network above). Computing the joint probability for all variables is easy:
P(a, b, c, d, e)
= P(e | a, b, c, d) P(a, b, c, d)    by the product rule
= P(e | c) P(a, b, c, d)             by the conditional independence assumption
= P(e | c) P(d | a, b, c) P(a, b, c)
= P(e | c) P(d | b, c) P(c | a, b) P(a, b)
= P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)
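In code, the chaining rule is just one CPT lookup per node multiplied together. A minimal sketch, assuming the hypothetical parents, cpt, and p_node names from the previous block:

def joint(assignment):
    # Chaining: P(x_1, ..., x_n) = product over nodes of P(x_i | parents(x_i))
    prob = 1.0
    for node, value in assignment.items():
        prob *= p_node(node, value, assignment)
    return prob

# Example: the slide's product P(e|c) P(d|b,c) P(c|a) P(b|a) P(a),
# evaluated at a = b = c = d = e = true
print(joint({'a': True, 'b': True, 'c': True, 'd': True, 'e': True}))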

Topological semantics. A node is conditionally independent of its non-descendants given its parents. A node is conditionally independent of all other nodes in the network given its parents, children, and children's parents (also known as its Markov blanket). The method called d-separation can be applied to decide whether a set of nodes X is independent of another set Y, given a third set Z.
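The Markov blanket can be read straight off the graph structure: parents, children, and the children's other parents. A small sketch, again reusing the hypothetical parents dictionary introduced earlier:

def markov_blanket(node):
    # Parents, children, and the children's other parents
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket

print(markov_blanket('c'))   # -> {'a', 'b', 'd', 'e'} in the example network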

Inference tasks. Simple queries: compute the posterior marginal P(X_i | E=e), e.g., P(NoGas | Gauge=empty, Lights=on, Starts=false). Conjunctive queries: P(X_i, X_j | E=e) = P(X_i | E=e) P(X_j | X_i, E=e). Optimal decisions: decision networks include utility information; probabilistic inference is required to find P(outcome | action, evidence). Value of information: which evidence should we seek next? Sensitivity analysis: which probability values are most critical? Explanation: why do I need a new starter motor?

Approaches to inference. Exact inference: enumeration, belief propagation in polytrees, variable elimination, clustering / join tree algorithms. Approximate inference: stochastic simulation / sampling methods, Markov chain Monte Carlo methods, genetic algorithms, neural networks, simulated annealing, mean field theory.

Direct inference with BNs. Instead of computing the joint, suppose we just want the probability for one variable. Exact methods of computation: enumeration; variable elimination. Join trees: get the probabilities associated with every query variable.

Inference by enumeration. Add all of the terms (atomic event probabilities) from the full joint distribution. If E are the evidence (observed) variables and Y are the other (unobserved) variables, then: P(X | e) = α P(X, e) = α Σ_y P(X, e, y). Each P(X, e, y) term can be computed using the chain rule. Computationally expensive!

Example: Enumeration (for the network above). P(x_i) = Σ_{π_i} P(x_i | π_i) P(π_i). Suppose we want P(D=true), and only the value of E is given as true:
P(d | e) = α Σ_{A,B,C} P(a, b, c, d, e) = α Σ_{A,B,C} P(a) P(b|a) P(c|a) P(d|b,c) P(e|c)
With simple iteration to compute this expression, there is going to be a lot of repetition (e.g., P(e|c) has to be recomputed every time we iterate over C=true).
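Written out directly, the enumeration is a sum of the chained joint over the hidden variables followed by normalization (the α above). The sketch below computes P(D | E=true) for the example network using the hypothetical joint helper from earlier; it deliberately makes no attempt to avoid the repeated work (P(e|c) is re-evaluated for every setting of A and B) that variable elimination would remove.

from itertools import product

def enumerate_query(query_var, evidence):
    # P(query_var | evidence) by brute-force enumeration over hidden variables
    hidden = [n for n in parents if n != query_var and n not in evidence]
    dist = {}
    for q_val in (True, False):
        total = 0.0
        for values in product((True, False), repeat=len(hidden)):
            assignment = dict(zip(hidden, values))
            assignment.update(evidence)
            assignment[query_var] = q_val
            total += joint(assignment)
        dist[q_val] = total
    alpha = 1.0 / (dist[True] + dist[False])   # normalization constant
    return {v: alpha * p for v, p in dist.items()}

print(enumerate_query('d', {'e': True}))       # P(D | E=true)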

Exercise: Enumeration. Network: smart → prepared, study → prepared; smart → pass, prepared → pass, fair → pass. Priors: p(smart) = .8, p(study) = .6, p(fair) = .9.

p(pass | smart, prepared, fair):
                     smart              ¬smart
                 prep    ¬prep      prep    ¬prep
    fair          .9      .7         .7      .2
    ¬fair         .1      .1         .1      .1

p(prepared | smart, study):
                 smart   ¬smart
    study         .9       .7
    ¬study        .5       .1

Query: What is the probability that a student studied, given that they pass the exam?
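One way to set the exercise up, as a sketch under the same hypothetical encoding used above (each CPT entry gives p(node=true | parent assignment); the numeric answer is left to the enumeration itself):

exam_parents = {
    'smart': [], 'study': [], 'fair': [],
    'prep': ['smart', 'study'],
    'pass': ['smart', 'prep', 'fair'],
}
exam_cpt = {
    'smart': {(): 0.8},
    'study': {(): 0.6},
    'fair':  {(): 0.9},
    # p(prep | smart, study), read off the second table
    'prep': {(True, True): 0.9,  (True, False): 0.5,
             (False, True): 0.7, (False, False): 0.1},
    # p(pass | smart, prep, fair), read off the first table
    'pass': {(True, True, True): 0.9,  (True, False, True): 0.7,
             (False, True, True): 0.7, (False, False, True): 0.2,
             (True, True, False): 0.1, (True, False, False): 0.1,
             (False, True, False): 0.1, (False, False, False): 0.1},
}
# The query P(study | pass=true) can then be answered with the same
# enumeration as before, after pointing parents/cpt at exam_parents/exam_cpt
# (or generalizing p_node and enumerate_query to take the network as arguments).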

Summary. Bayes nets: structure, parameters, conditional independence, chaining. BN inference: enumeration, variable elimination, sampling methods.