A Tutorial on Learning with Bayesian Networks


Paper by David Heckerman. Presented by Krishna V. Chengavalli, April 21, 2003.

Outline
Introduction; different approaches; Bayesian networks; learning probabilities and structure; Bayesian networks and supervised and unsupervised learning; conclusion.

Probability
Classical: the probability that a coin lands heads (also called the true or physical probability).
Bayesian: a person's degree of belief in the event (also called personal probability).

Example: The Telephone Example
Consider listening to a friend talking on the phone to some person x. We try to guess who the friend is talking to. Based on what they talk about, the language they use, and their tone, we assign a certain probability to some persons (background information). We update the probabilities until we can name the most likely person.

Probability Assessment
The process of measuring one's degree of belief; the analogy of a wheel with a shaded region.

Illustrating the Approaches
Classical approach: the common thumbtack problem. Assert some physical probability, estimate the probability of heads/tails from N observations, and use that estimate for the (N+1)-th trial (a minimal sketch appears below).
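For concreteness, a minimal sketch of the classical estimate, using an invented flip sequence:

```python
# A minimal sketch of the classical approach above: estimate the physical
# probability of heads from N observed flips and use it to predict the
# (N+1)-th trial. The flip sequence here is illustrative only.
flips = ["H", "T", "H", "T", "T"]                 # N = 5 observations
theta_hat = flips.count("H") / len(flips)         # maximum likelihood estimate
print("estimated P(heads on next flip) =", theta_hat)   # 0.4
```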

Bayesian Approach
The Bayesian approach to the same problem: treat the physical probability as unknown, encode the uncertainty about it as a Bayesian probability, and use the rules of probability to compute the required probability.

Probability Formulas
Bayes' theorem gives the posterior probability for θ given data D and background knowledge ξ:
p(θ | D, ξ) = p(θ | ξ) p(D | θ, ξ) / p(D | ξ)
Note: θ is an uncertain variable whose values correspond to the possible true values of the physical probability.

Likelihood Function
How good is a particular value of θ? It measures how likely that value is to have generated the observed data: L(θ : D) = p(D | θ). For the sequence H, T, H, T, T the likelihood is L(θ : D) = θ (1 - θ) θ (1 - θ) (1 - θ).
Finally, we average over θ to determine the probability that the (N+1)-th toss is heads; this is the expectation of θ with respect to p(θ | D, ξ).

Bayesian Network
A directed acyclic graph that encodes a set of conditional independence assertions about variables and a set of local probability distributions for each variable; together they define the joint probability distribution for the structure.

More about Bayesian Networks
The presence of an arc between two variables denotes conditional dependence (the slide illustrates this with a small graph over nodes A, B, and C).
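A minimal sketch of the averaging step above: the slides do not specify a prior, so this assumes a Beta(a, b) prior on θ, under which the posterior after h heads and t tails is Beta(a + h, b + t) and the probability that the next toss lands heads is the posterior mean.

```python
# Hedged sketch of the Bayesian approach: with an assumed Beta(a, b) prior on
# theta, the posterior after h heads and t tails is Beta(a + h, b + t), and the
# probability of heads on the (N+1)-th toss is the posterior mean of theta.
def posterior_predictive_heads(flips, a=1.0, b=1.0):
    h = flips.count("H")
    t = flips.count("T")
    return (a + h) / (a + b + h + t)   # E[theta | D] under the Beta posterior

flips = ["H", "T", "H", "T", "T"]      # the sequence with likelihood theta^2 (1 - theta)^3
print(posterior_predictive_heads(flips))   # 3/7 with a uniform Beta(1, 1) prior
```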

Why Bayesian Networks?
Bayesian networks handle incomplete data, learn causal relationships, avoid overfitting, and combine prior knowledge with data; they make conditional (in)dependence explicit.

Constructing a Bayesian Network
Use of causal relationships: given a set of variables, draw arcs from cause variables to their immediate effects. The use of causal semantics is largely responsible for the success of Bayesian networks.

Constructing a Bayesian Network (cont.)
Identify as many observations as possible, determine the subset that is worthwhile to model, and organize those observations into mutually exclusive and collectively exhaustive states.

Steps in Construction
Model the system of interest with random variables; arrange the nodes that represent the random variables by influence; number the nodes in order of influence and depth (roots have the lowest numbers); obtain prior probabilities at root nodes; construct conditional probability tables (CPTs) at non-root nodes; compute the joint probabilities; and program Bayes' rule and the other equations needed for the query nodes (a small sketch follows below).
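As a rough illustration of these steps, the sketch below builds the three-node car-start network used on the next slide: a prior at the root, CPTs at the non-root nodes, and the joint obtained by the chain rule. All numeric probabilities are invented for the example, not values from the tutorial.

```python
# Illustrative construction of the Battery -> Engine Turns Over -> Car Starts
# network: prior at the root node, CPTs at non-root nodes, joint via the chain
# rule P(b, e, s) = P(s | e) P(e | b) P(b). All numbers are assumptions.
p_battery = {True: 0.95, False: 0.05}                # prior at the root node
p_engine_given_battery = {True: 0.90, False: 0.05}   # P(engine turns over | battery)
p_starts_given_engine = {True: 0.85, False: 0.0}     # P(car starts | engine turns over)

def joint(b, e, s):
    """P(b, e, s) following the chain-rule factorization of this structure."""
    p_e = p_engine_given_battery[b] if e else 1 - p_engine_given_battery[b]
    p_s = p_starts_given_engine[e] if s else 1 - p_starts_given_engine[e]
    return p_battery[b] * p_e * p_s

print(joint(True, True, True))   # P(battery up, engine turns over, car starts)
```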

Chain Rule of Probability and Bayesian Inference
Bayesian inference means determining various probabilities of interest from the model. (The slide shows a query over Θ given observed data X1, X2, ..., Xm and a new case Xm+1, plus the small network Battery → Engine Turns Over → Car Starts.) By the chain rule, the joint for that network factors as P(b, e, s) = P(s | e) P(e | b) P(b).

Bayesian Inference (cont.)
Example: the probability that the battery is up given that the car starts is P(b | s) = P(b, s) / P(s), where P(b, s) is computed by summing over e for the given b and s, and P(s) by summing over both b and e for the given s (see the enumeration sketch below).

The slide also shows a second worked example with these conditional probability tables:
D: High .75, Low .25
C | D: High → Short .95, Long .05; Low → Short .10, Long .90
R: Leader .90, Follower .10
L | C, R: Short, leader → Y .90, N .10; Short, follower → Y .65, N .35; Long, leader → Y .5, N .5; Long, follower → Y .3, N .7
Query: Low, Follower.

Handling of Incomplete Data
Gibbs sampling simulates the missing data according to the available data, but it is time consuming.

Incomplete Data (cont.)
Expectation Maximization: the basic idea is to augment the observed data with the missing data. The expectation step is taken with respect to the missing values conditioned on the observed values; the maximization step re-estimates the parameters. Repeat the two steps until the estimate converges; the idea is to drive the parameter being estimated to its maximum likelihood value. EM is faster than Gibbs sampling.
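The query above can be carried out by brute-force enumeration. This sketch reuses the illustrative CPT values assumed in the construction sketch earlier; none of the numbers come from the tutorial.

```python
# Inference by enumeration: P(b | s) = P(b, s) / P(s), both terms obtained by
# summing the joint over the unobserved variable e (engine turns over).
# The CPT values are illustrative assumptions.
p_b = {True: 0.95, False: 0.05}
p_e_given_b = {True: 0.90, False: 0.05}
p_s_given_e = {True: 0.85, False: 0.0}

def joint(b, e, s):
    pe = p_e_given_b[b] if e else 1 - p_e_given_b[b]
    ps = p_s_given_e[e] if s else 1 - p_s_given_e[e]
    return p_b[b] * pe * ps

p_b_and_s = sum(joint(True, e, True) for e in (True, False))                    # sum over e
p_s = sum(joint(b, e, True) for b in (True, False) for e in (True, False))      # sum over b and e
print("P(battery up | car starts) =", p_b_and_s / p_s)
```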

Learning Probabilities
Using data to update the probabilities in the network. In the thumbtack problem no probability is "learned" as such; rather, the posterior distribution of the variable that represents the physical probability of heads is updated. We assume there is no missing data and that the parameters are independent.

Known Structure
If the structure is known, the problem is to learn the parameters for the graph from the database. Example: flipping the coin, we have a database of outcomes and learn the distribution of the variable. More complex situations bring more variables and more complex formulas (a small counting sketch appears below).

Unknown Structure
First select the model, which is done by generating a model search space, then evaluate the models given the values in the dataset.

Model Selection Criteria
The degree to which the structure fits the prior knowledge and the data; use of the relative posterior probability, and local criteria. The relative posterior probability decomposes as a log prior plus a log marginal likelihood, log p(S, D) = log p(S) + log p(D | S). The local criterion is illustrated by a medical example: an Ailment node with children Finding 1, Finding 2, and Finding 3.
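As a rough illustration of parameter learning with a known structure and complete data, the sketch below estimates a conditional probability table by counting. The variable names and records are invented for the example; a Dirichlet prior could be folded in by seeding the counts.

```python
# Parameter learning with known structure and complete data: the CPT of each
# node is estimated by counting outcomes for every configuration of its
# parents. Data and variable names are illustrative assumptions.
from collections import defaultdict

def learn_cpt(records, child, parents):
    counts = defaultdict(lambda: defaultdict(int))
    for r in records:
        counts[tuple(r[p] for p in parents)][r[child]] += 1
    return {cfg: {v: n / sum(vals.values()) for v, n in vals.items()}
            for cfg, vals in counts.items()}

records = [
    {"battery": 1, "engine": 1, "starts": 1},
    {"battery": 1, "engine": 1, "starts": 1},
    {"battery": 1, "engine": 0, "starts": 0},
    {"battery": 0, "engine": 0, "starts": 0},
]
print(learn_cpt(records, "engine", ["battery"]))   # P(engine | battery) from counts
```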

Priors
Structure priors and parameter priors; we use them to compute the relative posterior probability.

Priors on Network Parameters
Basic ideas: independence equivalence and distribution equivalence. Independence equivalence: structures encode the same independence assertions (the slide shows different arc orientations over X, Y, and Z). Distribution equivalence: structures can represent the same joint distributions.

Search Methods
Greedy search, greedy search with restarts, best-first search, and Monte Carlo methods. The search makes successive arc changes: for each valid change e in E, compute the local criterion C(X_i, Pa_i, D) to determine the score change Δ(e). In greedy search, Δ(e) is computed for all candidate changes and the change e with the maximum Δ(e) is applied. If the criterion is separable, we only need to recompute the terms affected by that change. The problem is local maxima; the way out is random restarts or simulated annealing (a toy greedy-search sketch appears below).

Supervised Learning vs. Bayesian Networks
When the input variables cause the outcome and the data are complete, the approaches are identical. If the cause-effect structure is not known, or the data are incomplete, differences arise. Small sample sizes are handled well by Bayesian networks.
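The greedy search described above can be sketched in a few lines. This is a toy illustration, not the criterion from the tutorial: it scores binary-variable structures with a simple BIC-style approximation in place of the relative posterior probability, considers only arc additions, and uses made-up data.

```python
# Toy greedy structure search over binary variables with a decomposable
# BIC-style score; data, variable names, and scores are illustrative only.
from itertools import product
from math import log

def family_score(data, child, parents):
    """BIC-like score of one node given its parent set."""
    counts = {}
    for row in data:
        counts.setdefault(tuple(row[p] for p in parents), [0, 0])[row[child]] += 1
    loglik = 0.0
    for n0, n1 in counts.values():
        for c in (n0, n1):
            if c > 0:
                loglik += c * log(c / (n0 + n1))
    n_params = 2 ** len(parents)             # one free parameter per parent configuration
    return loglik - 0.5 * n_params * log(len(data))

def total_score(data, parents_of):
    return sum(family_score(data, x, ps) for x, ps in parents_of.items())

def creates_cycle(parents_of, frm, to):
    """Would adding frm -> to create a directed cycle (i.e., is `to` an ancestor of `frm`)?"""
    stack, seen = [frm], set()
    while stack:
        node = stack.pop()
        if node == to:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents_of[node])
    return False

def greedy_search(data, variables):
    parents_of = {x: [] for x in variables}
    best, improved = total_score(data, parents_of), True
    while improved:                           # stop at a local maximum
        improved = False
        for frm, to in product(variables, variables):
            if frm == to or frm in parents_of[to] or creates_cycle(parents_of, frm, to):
                continue
            parents_of[to].append(frm)        # try one arc addition
            score = total_score(data, parents_of)
            if score > best:
                best, improved = score, True
            else:
                parents_of[to].remove(frm)    # undo the change
    return parents_of, best

# Illustrative data: b copies a, so an arc between a and b should score well.
data = [{"a": a, "b": a} for a in (0, 1)] * 20
structure, score = greedy_search(data, ["a", "b"])
print(structure, round(score, 2))
```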

Unsupervised Learning
Example: assume hidden variables, generate models with and without the hidden variables, and compare their scores. For instance, AutoClass performs data clustering this way. (The slide shows a small example table of objects 1-4 against features A, B, and C.)

Applications
Implementations in real life: Microsoft products (Microsoft Office), medical applications and biostatistics (BUGS), the NASA AutoClass project for data analysis, collaborative filtering (Microsoft MSBN), fraud detection (AT&T), and speech recognition (UC Berkeley).

Limitations
They require initial knowledge of many probabilities, and the quality and extent of that prior knowledge play an important role. There is a significant computational cost (an NP-hard task), and the unanticipated probability of an event is not taken care of.

References
A Tutorial on Learning with Bayesian Networks, David Heckerman.
http://www.stat.duke.edu/~chris/research/em.pdf
http://www.cs.huji.ac.il/~nir/nips01-Tutorial/

Integration of Analytical and Empirical Components
Commentary on "A Tutorial on Learning With Bayesian Networks" (David Heckerman), by Mohammad Saif. Lecturer: Dr. R. Maclin. Feature presentation: Krishna C. Date: April 21, 2003.

Bayesian networks are suitable for combining prior knowledge and evidence from data. Here the Bayesian network is derived from WordNet, and the empirical component is composed of a suitable probabilistic model and tagged training data (Janyce Wiebe, Tom O'Hara, and Rebecca Bruce).

WordNet and Creation of the Bayesian Network
WordNet consists of synsets connected by relations (for example, Dog is-a Mammal is-a Animal). Let "community" and "town" appear in the same sentence; the task is to assign the correct senses to them. A new Bayesian network is created for each sentence: it includes all synsets of the target words and everything reachable from them through the hypernymy (is-a) hierarchy. (The slide illustrates this with concept nodes c1 and c2 and a WordNet-derived network over synsets such as location#1, region#3, urban area#2, group#1, organization#1, gathering#1, occupation#1, and the senses of town and community.)

Probabilities
Each child node is assigned a probability inversely proportional to the number of children its hypernym has, e.g., P(town | urban area) = 1/2 = .5 (a small sketch appears below). An empirical classifier is developed for each ambiguous word; it determines the likelihood of each sense based on context. Nodes are added to each ambiguous word to represent the support for each sense derived from the corpus (e.g., for town#1 and community#2).
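A tiny sketch of this probability assignment, with invented hyponym lists standing in for real WordNet data:

```python
# Each child synset gets probability inversely proportional to the number of
# children its hypernym has. The hyponym lists below are illustrative only.
hyponyms = {
    "urban area#2": ["town#1", "municipality#1"],                # 2 children -> 1/2 each
    "group#1": ["community#2", "gathering#1", "organization#1"], # 3 children -> 1/3 each
}

def child_probability(hypernym, child):
    children = hyponyms[hypernym]
    return 1.0 / len(children) if child in children else 0.0

print(child_probability("urban area#2", "town#1"))   # 0.5, matching P(town | urban area)
```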

A Tutorial on Learning with Bayesian Networks (paper by David Heckerman, presented by Krishna V. Chengavalli), commentary by Siddharth Patwardhan.

Learning Causal Relationships: Causal Markov Condition
A directed acyclic graph C is a causal graph for variables X if the nodes in C have a one-to-one correspondence with the variables in X, and there is an arc from node A to node B in C if and only if A is a direct cause of B.

Learning Causal Relations
Learn the Bayesian network structure from data, then infer causal relationships from the network structure.

An Illustration
Given data containing two variables a and b: a is independent of b if p(b | a) = p(b | ā) = p(b). If we observe p(b | a) ≠ p(b | ā), we can conclude (by the Causal Markov Condition) that a and b are causally connected; the slide shows the candidate explanations as small graphs over a and b, including ones with additional nodes H and S. (A small independence check is sketched below.)

Learning Network Structure
To learn a structure on a set of variables X, consider a random variable z whose states correspond to the various network configurations. Using Bayesian methods, determine the posterior probability distribution of z. The structures with the highest probability are the learnt structures.
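A minimal sketch of that independence check, with invented observations and a crude threshold in place of a proper statistical test:

```python
# Estimate p(b | a) and p(b | not a) from counts and compare them.
# The data and the tolerance are illustrative assumptions.
def conditional_prob(data, given_a):
    rows = [r for r in data if r["a"] == given_a]
    return sum(r["b"] for r in rows) / len(rows)

# Illustrative observations of the binary variables a and b.
data = [{"a": 1, "b": 1}] * 40 + [{"a": 1, "b": 0}] * 10 + \
       [{"a": 0, "b": 1}] * 10 + [{"a": 0, "b": 0}] * 40

p_b_given_a = conditional_prob(data, given_a=1)       # ~0.8
p_b_given_not_a = conditional_prob(data, given_a=0)   # ~0.2

if abs(p_b_given_a - p_b_given_not_a) > 0.05:         # crude threshold, not a real test
    print("a and b appear dependent:", p_b_given_a, p_b_given_not_a)
else:
    print("no evidence of dependence")
```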

College Plans: An Example
Two candidate structures over the variables SEX, SES (socioeconomic status), IQ, PE (parental encouragement), and CP (college plans) are compared; the slide shows both graphs. P(Struct 1 | Data) ≈ 1.0, while P(Struct 2 | Data) = 1.2 × 10^-10.