Learning the structure of Bayesian belief networks


Lecture 17: Learning the structure of Bayesian belief networks
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Learning of BBNs
Learning:
- Learning of the parameters of the conditional probabilities
- Learning of the network structure
Variables:
- Observable: values present in every data sample
- Hidden: their values are never observed in the data
- Missing values: values sometimes present, sometimes not
Next, all variables are observable:
1. Learning of the parameters of a BBN
2. Learning of the model (the BBN structure)

Learning of BBN parameters. Example.
Example: the Pneumonia network. Pneumonia is the parent of Fever, Paleness, Cough, and High WBC; the entries of all conditional probability tables, e.g. P(Fever | Pneumonia) and P(High WBC | Pneumonia), are unknown ("?") and must be learned.

Learning of BBN parameters. Example.
Data D (different patient cases):
[Table of samples with columns Pal, Fev, Cou, HWB, Pneu, one row per patient.]

Estimates of the parameters of a BBN
Much like multiple coin-toss or die-roll problems: a smaller learning problem corresponds to the learning of exactly one conditional distribution. Example: P(Fever | Pneumonia = T).
Problem: how do we pick the data to learn from?

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 1: Select the data points with Pneumonia = T.
[Table of patient cases (columns Pal, Fev, Cou, HWB, Pneu) with the Pneumonia = T rows highlighted.]

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 1: Select the data points with Pneumonia = T; ignore the rest.

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 2: Select the values of the random variable that defines the distribution, Fever.

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 2: Ignore the rest.

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 3a: Learn the ML estimate. With 3 fevers among the 5 selected cases (the counts implied by the Bayesian update on the next slide):
P(Fever = T | Pneumonia = T) = 0.6, P(Fever = F | Pneumonia = T) = 0.4
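The three steps above amount to filtering rows and taking a relative frequency. A minimal sketch in Python, assuming a toy pandas table whose counts (3 fevers among 5 pneumonia cases) match the 0.6/0.4 estimate on the slide; the records below are illustrative, not the lecture's data.

```python
# Steps 1-3a: ML estimate of P(Fever | Pneumonia = T) from a toy dataset.
import pandas as pd

data = pd.DataFrame({
    "Fever":     [1, 1, 0, 1, 0, 0, 1],
    "Pneumonia": [1, 1, 1, 1, 1, 0, 0],
})

# Step 1: select the data points with Pneumonia = T, ignore the rest
subset = data[data["Pneumonia"] == 1]

# Step 2: keep only the values of the random variable Fever
fever = subset["Fever"]

# Step 3a: the ML estimate is the relative frequency
theta_ml = fever.mean()
print(f"P(Fever=T | Pneumonia=T) = {theta_ml:.2f}")   # 3/5 = 0.60
```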

Learning of BBN parameters. Bayesian learning.
Learn: P(Fever | Pneumonia = T)
Step 3b: Learn the Bayesian estimate.
Assume the prior \theta_{Fever|Pneumonia=T} ~ Beta(3, 4).
Posterior: \theta_{Fever|Pneumonia=T} ~ Beta(6, 6).

Model selection
A BBN has two components:
- the structure of the network (models the conditional independences)
- a set of parameters (the conditional child-parent distributions)
We already know how to learn the parameters for a fixed structure. But how do we learn the structure of the BBN?
[Figure: two candidate structures over Burglary, Earthquake, Alarm, JohnCalls, and MaryCalls.]
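The Bayesian update itself is just addition of counts. A minimal sketch, assuming the same illustrative counts as above (3 fevers, 2 non-fevers), which reproduces the slide's Beta(3, 4) to Beta(6, 6) update:

```python
# Step 3b: Beta prior/posterior update for theta_{Fever|Pneumonia=T}.
from scipy.stats import beta

alpha_prior, beta_prior = 3, 4     # prior Beta(3, 4) from the slide
n_fever, n_no_fever = 3, 2         # counts consistent with the ML example

alpha_post = alpha_prior + n_fever       # 6
beta_post = beta_prior + n_no_fever      # 6

# Posterior mean: the Bayesian point estimate of the parameter
print(alpha_post / (alpha_post + beta_post))   # 0.5
print(beta.mean(alpha_post, beta_post))        # same value via scipy
```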

Learning the structure
Criteria we can choose to score a structure S:
- Marginal likelihood: maximize P(D | S, \xi), where \xi represents the prior knowledge.
- Maximum posterior probability: maximize
  P(S | D, \xi) = P(D | S, \xi) P(S | \xi) / P(D | \xi)
How do we compute the marginal likelihood P(D | S, \xi)?

Learning of BBNs
Notation:
- i = 1, .., n ranges over all variables
- j = 1, .., q_i ranges over all possible parent combinations
- k = 1, .., r_i ranges over all possible variable values
- \Theta: the parameters of the BBN; \theta_{ij} is the vector of parameters \theta_{ijk} of the conditional probability distribution P(X_i = k | pa(X_i) = j), such that \sum_{k=1}^{r_i} \theta_{ijk} = 1
- N_{ijk}: the number of instances in the dataset where the parents of variable X_i take on the values j and X_i has value k
- \alpha_{ijk}: prior counts (the parameters of the Beta and Dirichlet priors)

Marginal likelihood
Integrate over all possible parameter settings:
P(D | S, \xi) = \int_{\Theta} P(D | S, \Theta, \xi) p(\Theta | S, \xi) d\Theta
Using the assumption of parameter and sample independence:
P(D | S, \xi) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}
where \alpha_{ij} = \sum_k \alpha_{ijk} and N_{ij} = \sum_k N_{ijk}. We can use the log-likelihood score instead:
\log P(D | S, \xi) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \left[ \log \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} + \sum_{k=1}^{r_i} \log \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})} \right]
The score is decomposable along the variables!

From the iid assumption:
P(D | \Theta) = \prod_{h=1}^{N} \prod_{i=1}^{n} P(x_i^h | pa(x_i)^h, \Theta)
Let r_i be the number of values that attribute x_i can take, q_i the number of possible parent combinations, and N_{ijk} the number of cases in D where x_i has value k and its parents take the values j. Then:
P(D | \Theta) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \prod_{k=1}^{r_i} \theta_{ijk}^{N_{ijk}}

From parameter independence:
p(\Theta | \xi) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} p(\theta_{ij} | \xi)
Priors for p(\theta_{ij} | \xi): \theta_{ij} = (\theta_{ij1}, ..., \theta_{ij r_i}) is a vector of parameters; we use a Dirichlet distribution with parameters \alpha_{ij1}, ..., \alpha_{ij r_i} to represent it:
p(\theta_{ij} | \xi) = Dirichlet(\alpha_{ij1}, ..., \alpha_{ij r_i})

Marginal likelihood
Combine things together:
P(D | S, \xi) = \int P(D | S, \Theta) p(\Theta | S, \xi) d\Theta
             = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \int \left[ \prod_{k=1}^{r_i} \theta_{ijk}^{N_{ijk}} \right] Dirichlet(\theta_{ij}; \alpha_{ij1}, ..., \alpha_{ij r_i}) \, d\theta_{ij}
Each factor is a ratio of Dirichlet normalization constants, since \int \prod_k \theta_{ijk}^{\alpha_{ijk} + N_{ijk} - 1} d\theta_{ij} = \prod_k \Gamma(\alpha_{ijk} + N_{ijk}) / \Gamma(\alpha_{ij} + N_{ij}), which gives
P(D | S, \xi) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}
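The closed-form score is easy to evaluate numerically if the Gamma functions are kept in log space. A minimal sketch, my own rather than the lecture's code, of the per-variable term of this marginal likelihood (the Bayesian-Dirichlet score); the counts reuse the Fever/Pneumonia example, with the Pneumonia = F row being an illustrative assumption:

```python
# Log marginal likelihood (BD score) for ONE variable X_i.
# N[j, k] are the data counts N_ijk; A[j, k] are the prior counts alpha_ijk.
import numpy as np
from scipy.special import gammaln

def log_bd_score(N, A):
    """log of prod_j Gamma(a_ij)/Gamma(a_ij+N_ij) * prod_k Gamma(a_ijk+N_ijk)/Gamma(a_ijk)."""
    a_ij = A.sum(axis=1)          # alpha_ij = sum_k alpha_ijk
    n_ij = N.sum(axis=1)          # N_ij = sum_k N_ijk
    score = (gammaln(a_ij) - gammaln(a_ij + n_ij)).sum()
    score += (gammaln(A + N) - gammaln(A)).sum()
    return score

# Fever given Pneumonia: q_i = 2 parent configurations, r_i = 2 values
N = np.array([[3., 2.],     # Pneumonia = T: 3 fevers, 2 non-fevers
              [1., 1.]])    # Pneumonia = F: illustrative counts
A = np.ones_like(N)         # uniform Dirichlet(1, 1) priors
print(log_bd_score(N, A))
```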

An alternative way to compute the marginal likelihood
Integrate over all possible parameter settings:
P(D | \xi) = \int P(D | \Theta, \xi) p(\Theta | \xi) d\Theta
Posterior of the parameters, given the data and the structure:
p(\Theta | D, \xi) = \frac{P(D | \Theta, \xi) p(\Theta | \xi)}{P(D | \xi)}
Trick: rearrange the posterior; the resulting ratio holds for any value of \Theta:
P(D | \xi) = \frac{P(D | \Theta, \xi) p(\Theta | \xi)}{p(\Theta | D, \xi)}
Gives the same solution:
P(D | S, \xi) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}

Learning the structure
- The likelihood of the data for the full BBN (structure and parameters), P(D | \Theta, S, \xi), measures the goodness of fit of the BBN to the data.
- The marginal likelihood (for the structure only), P(D | S, \xi), does not measure goodness of fit alone. It is different for structures of different complexity: it incorporates a preference towards simpler structures and implements Occam's razor!
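The trick can be checked numerically in the single-parameter case: for a Bernoulli likelihood with a Beta prior, the ratio P(D | theta) p(theta) / p(theta | D) is constant in theta and equals the marginal likelihood. A small sketch of this check (my own illustration, reusing the Beta(3, 4) prior from the Fever example):

```python
# Verify P(D) = P(D | theta) p(theta) / p(theta | D) for any theta.
from scipy.stats import beta

a, b = 3, 4          # prior Beta(3, 4), as in the Fever example
heads, tails = 3, 2  # data counts

for theta in (0.3, 0.5, 0.7):
    lik = theta**heads * (1 - theta)**tails
    # posterior is Beta(a + heads, b + tails) by conjugacy
    marginal = lik * beta.pdf(theta, a, b) / beta.pdf(theta, a + heads, b + tails)
    print(f"theta={theta}: P(D) = {marginal:.6f}")   # identical for every theta
```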

Occam's razor
Why is there a preference towards simpler structures? Rewrite the marginal likelihood as
P(D | S, \xi) = \frac{\int P(D | S, \Theta, \xi) p(\Theta | S, \xi) d\Theta}{\int p(\Theta | S, \xi) d\Theta}
using the fact that \int p(\Theta | S, \xi) d\Theta = 1.
Interpretation: in more complex structures there are more ways the parameters can be set badly. The numerator counts the good parameter assignments; the denominator counts all assignments.

Approximations of probabilistic scores
Approximations of the marginal likelihood and posterior scores: information-based measures such as
- the Akaike criterion
- the Bayesian information criterion (BIC)
- the minimum description length (MDL)
reflect the tradeoff between the fit to the data and the preference towards simpler structures.
Example: Akaike criterion. Maximize:
score(S) = \log P(D | \Theta_{ML}, S, \xi) - compl(S)
Bayesian information criterion (BIC). Maximize:
score(S) = \log P(D | \Theta_{ML}, S, \xi) - \frac{1}{2} compl(S) \log N
A sketch of both scores follows below.
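A minimal sketch of the two approximate scores, assuming compl(S) is measured by the number of free parameters (a common choice; the slide leaves compl(S) abstract) and that the ML log-likelihood comes from a separate fit of the structure:

```python
# AIC- and BIC-style structure scores from the slide.
import math

def aic_score(log_lik_ml, n_params):
    # Akaike criterion: log P(D | Theta_ML, S) - compl(S)
    return log_lik_ml - n_params

def bic_score(log_lik_ml, n_params, n_samples):
    # BIC: log P(D | Theta_ML, S) - (1/2) * compl(S) * log N
    return log_lik_ml - 0.5 * n_params * math.log(n_samples)

print(aic_score(-120.0, 9), bic_score(-120.0, 9, 500))  # toy numbers
```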

Optimizing the structure
Finding the best structure is a combinatorial optimization problem. A good feature: the score is decomposable along the variables:
\log P(D | S, \xi) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \left[ \log \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} + \sum_{k=1}^{r_i} \log \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})} \right]
Algorithm idea: search the space of structures using local changes (additions and deletions of a link).
Advantage: we do not have to compute the whole score from scratch; we recompute only the partial score for the affected variable.

Optimizing the structure. Algorithms.
- Greedy search: start from the structure with no links; repeatedly add the link that yields the best score improvement (a minimal sketch follows below).
- Metropolis algorithm (with simulated annealing): local additions and deletions; avoids being trapped in local optima.
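A greedy-search sketch (my own illustration, not the lecture's algorithm as given): it scores each variable's family with the decomposable term implemented as log_bd_score in the marginal-likelihood sketch above, and repeatedly adds the single best link; after each addition only the affected variable is rescored. Acyclicity checks, link deletions, and the Metropolis variant are omitted for brevity:

```python
# Greedy structure search over binary variables, using the decomposable
# BD score. Assumes log_bd_score (defined earlier) is in scope.
import itertools
import numpy as np

def family_counts(data, child, parents, arity=2):
    """Counts N[j, k]: parent configuration j, child value k."""
    q = arity ** len(parents)
    N = np.zeros((q, arity))
    for row in data:
        j = sum(row[p] * (arity ** idx) for idx, p in enumerate(parents))
        N[j, row[child]] += 1
    return N

def family_score(data, child, parents):
    N = family_counts(data, child, parents)
    return log_bd_score(N, np.ones_like(N))   # uniform Dirichlet priors

def greedy_search(data, n_vars):
    parents = {i: [] for i in range(n_vars)}   # start with no links
    while True:
        best_gain, best_edge = 0.0, None
        for u, v in itertools.permutations(range(n_vars), 2):
            if u in parents[v]:
                continue
            gain = (family_score(data, v, parents[v] + [u])
                    - family_score(data, v, parents[v]))
            if gain > best_gain:
                best_gain, best_edge = gain, (u, v)
        if best_edge is None:          # no addition improves the score
            return parents
        u, v = best_edge
        parents[v].append(u)           # only variable v's score changed

# Example: two binary variables where X1 is a noisy copy of X0
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 200)
x1 = (x0 ^ (rng.random(200) < 0.1)).astype(int)
print(greedy_search(np.stack([x0, x1], axis=1), 2))
```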