
1/32 The Probabilistic Analysis of Systems in Engineering. Bayesian probabilistic networks: definition and use of Bayesian probabilistic networks. K. Nishijima (nishijima@ibk.baug.ethz.ch)

2/32 Today's agenda. Probabilistic representation of systems: representation of systems with (random) variables; observable and unobservable variables; joint probability, conditional probability, marginal probability, prior/posterior probability, predictive probability. Bayesian probabilistic networks (BPNs): graphical representation and conditional probability (tables); advantages of BPNs; example.

3/32 Representation of systems. Systems may be characterized by variables, and the system's performance may be described by using these variables. Model: variables $x_1, x_2, \ldots, x_n$ and functions $f_i(x_1, x_2, \ldots, x_n)$.

4/32 Representation of systems. Systems may be characterized by variables, and the system's performance may be described by using these variables. Example: a ship hull structure system and its corresponding model. Variables: self weight, wave loads, thickness of plates, stiffness, corrosion, etc.

5/32 Representation of systems. If the states of the systems are uncertain, the systems may be characterized by random variables. Model: $x_1, x_2, \ldots, x_n$ and $f_i(x_1, x_2, \ldots, x_n)$. Probabilistic model: $X_1, X_2, \ldots, X_n$ and $f_i(X_1, X_2, \ldots, X_n)$.

6/32 Task in modeling. The main task in modeling is to identify the interrelations between all the variables involved in the model, in logical and/or probabilistic manners. Probabilistic model: $X_1, X_2, \ldots, X_n$ and $f_i(X_1, X_2, \ldots, X_n)$. Example of a logical relation: $X_3 = X_1 + X_2$. Example of a probabilistic relation: $P[X_3 \mid X_1, X_2]$. (The logical relation is a special case of the probabilistic relation.)

7/32 Aim of analysis. The aim of the analysis with models is to evaluate the quantities of interest, e.g. the probability of the occurrence of critical situations, using the logical and probabilistic relations in the models. Probabilistic model: $X_1, X_2, \ldots, X_n$ and $f_i(X_1, X_2, \ldots, X_n)$. Example of a quantity of interest: $P_F = P[X = 1]$, where $X = I[X_1 < X_2]$.
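As a minimal sketch of such a quantity of interest (the distributions below are hypothetical and chosen only for illustration, e.g. $X_1$ a resistance and $X_2$ a load), the failure probability $P_F = P[X_1 < X_2]$ can be estimated by Monte Carlo simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical example: X1 = resistance, X2 = load (both assumed Normal).
x1 = rng.normal(loc=5.0, scale=1.0, size=n)
x2 = rng.normal(loc=3.0, scale=1.0, size=n)

# Indicator variable X = I[X1 < X2]; the failure probability is its mean.
p_f = np.mean(x1 < x2)
print(f"Estimated P_F = P[X1 < X2] ~ {p_f:.4f}")
```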

8/32 Observable and unobservable. Observable variables: the values of some of the variables can be observed, e.g. by measurements. An example is the annual maximum wind speed at a given location: 10 m/s, 12 m/s, 29 m/s, 14 m/s, 35 m/s, 18 m/s.

9/32 Observable and unobservable Unobservable variables (latent variables, model parameters, etc.) Unobservable variables are not directly observed but rather inferred from the variables that are directly measured. An example is the mean value of the annual maximum wind speed at one location. The mean value may be inferred from the observations as: (10+12+29+14+35+18)/6 = 19.67 m/s

10/32 Observable and unobservable: Example. The annual maximum wind speeds $X_i$ ($i = 1, 2, 3, 4, 5$) are assumed to be modeled as $X_i \overset{iid}{\sim} N(\mu, \sigma^2)$. Here, the $X_i$ are examples of observable variables, and $\mu$ and $\sigma^2$ are examples of unobservable variables. In Bayesian approaches, $\mu$ and $\sigma^2$ themselves may be considered as random variables.
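A small sketch of this setup (all numbers are illustrative assumptions): observable wind-speed data are simulated from an assumed Normal distribution, and the unobservable parameters are then inferred from the sample.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unobservable "true" parameters (unknown in practice, assumed here only to simulate data).
mu_true, sigma_true = 20.0, 8.0

# Observable variables: annual maximum wind speeds, modeled as iid N(mu, sigma^2).
x = rng.normal(mu_true, sigma_true, size=6)

# Point inference of the unobservable parameters from the observations.
mu_hat = x.mean()
sigma2_hat = x.var(ddof=1)   # sample variance
print(f"observations: {np.round(x, 1)}")
print(f"inferred mean = {mu_hat:.2f} m/s, inferred variance = {sigma2_hat:.2f} (m/s)^2")
```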

11/32 Joint probability. Any model that represents a system can be fully described by joint probabilities (probability densities, or a combination of the two): $P[X_1, X_2, \ldots, X_n]$. However, only a few parametric families of multivariate distributions are known, e.g. the multivariate Normal and Dirichlet distributions. They are often insufficiently flexible; consider, e.g., the case where $X_1$ is continuous and $X_2$ is discrete. Moreover, a large amount of information is required to characterize multivariate distributions; $(n^2 + n)$ values are required for an $n$-dimensional multivariate Normal distribution. Also, think of a multivariate discrete distribution!

12/32 Conditional probability. An efficient (compact) way of characterizing joint probabilities is to employ conditional probabilities. Any joint probability can be built up with conditional probabilities: $P[X_1, X_2, \ldots, X_n] = \prod_{i=1}^{n} P[X_i \mid pa(X_i)]$, where $pa(X_i)$ represents the parents of $X_i$ (which will be explained later in the context of Bayesian probabilistic networks).
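A minimal sketch of this factorization with three hypothetical binary variables, where each factor is stored as a small conditional probability table:

```python
# Joint built from conditional probability tables (chain rule), for binary X1, X2, X3.
# P[X1], P[X2 | X1], P[X3 | X2] are hypothetical numbers chosen for illustration.
p_x1 = {0: 0.7, 1: 0.3}
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # p_x2_given_x1[x1][x2]
p_x3_given_x2 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # p_x3_given_x2[x2][x3]

def joint(x1, x2, x3):
    # P[X1, X2, X3] = P[X3 | X2] * P[X2 | X1] * P[X1]
    return p_x3_given_x2[x2][x3] * p_x2_given_x1[x1][x2] * p_x1[x1]

# The factorized joint is a proper distribution: it sums to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0
```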

13/32 Conditional probability: Example. Assume two situations where the joint probability of the discrete random variables $X_1$, $X_2$ and $X_3$, each of which can take 5 possible values, is modeled in two different ways. (1) $P[X_1, X_2, X_3]$: this direct representation requires $5^3 = 125$ values to describe the joint probability. (2) $P[X_1, X_2, X_3] = P[X_3 \mid X_2]\, P[X_2 \mid X_1]\, P[X_1]$: this conditional-independence representation requires only $5 + 5^2 + 5^2 = 55$ values, although it is less flexible than the first one.

14/32 Marginal probability. Any marginal probability can be calculated from the joint probability; for discrete cases: $P[X_i] = \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_n} P[X_1, X_2, \ldots, X_n]$. For continuous cases the summations may be replaced by appropriate integrals.
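A sketch of marginalization for a discrete joint probability stored as a numpy array (the entries are hypothetical): all axes except the one of interest are summed out.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical joint P[X1, X2, X3] over three discrete variables with 5 states each.
joint = rng.random((5, 5, 5))
joint /= joint.sum()            # normalize so the entries sum to 1

# Marginal P[X2]: sum over all variables except X2 (axes 0 and 2).
p_x2 = joint.sum(axis=(0, 2))
print(p_x2, p_x2.sum())         # marginal distribution over the 5 states of X2; sums to 1
```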

15/32 Marginal probability: Example. Consider a Bernoulli experiment of throwing a coin 10 times, where the probability that a head comes up is not known. Assume that $X_1$ represents the number of heads in the 10 throws, and $X_2$ represents the probability that a head comes up in each throw and follows the uniform distribution on [0, 1]. Calculate $P[X_1]$.

16/32 Marginal probability: Example (solution). Joint probability: $P[X_1 = i, X_2 = p] = P[X_1 = i \mid X_2 = p]\, P[X_2 = p] = \binom{10}{i} p^i (1-p)^{10-i} \cdot 1$. Marginal probability: $P[X_1 = i] = \int_0^1 P[X_1 = i, X_2 = p]\, dp = \binom{10}{i} \int_0^1 p^i (1-p)^{10-i}\, dp = \frac{10!}{i!\,(10-i)!} \cdot \frac{i!\,(10-i)!}{11!} = \frac{1}{11}$.
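A quick numerical check of this result (a sketch, assuming scipy is available): integrating the joint over $p$ for each $i$ reproduces the uniform value $1/11$.

```python
from math import comb
from scipy.integrate import quad

# P[X1 = i] = integral_0^1 C(10, i) p^i (1 - p)^(10 - i) dp, with a uniform prior on p.
for i in range(11):
    value, _ = quad(lambda p, i=i: comb(10, i) * p**i * (1 - p)**(10 - i), 0.0, 1.0)
    print(i, round(value, 4))   # each value is 1/11 ≈ 0.0909
```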

17/32 Prior/posterior probability. According to Bayes' theorem the following relation holds: $P[X_2 \mid X_1 = x_1] = \frac{P[X_1 = x_1 \mid X_2]}{P[X_1 = x_1]}\, P[X_2]$. Consider the special case where $X_2$ represents the parameter of the probability distribution of $X_1$. In such cases, the relation above provides a rationale for how the probability of $X_2$ may be updated with the observed data ($X_1 = x_1$). Thereby, $P[X_2]$ is the prior probability (distribution), and $P[X_2 \mid X_1 = x_1]$ is the posterior probability (distribution).

18/32 Prior/posterior probability: Example. Consider again the Bernoulli experiment of throwing a coin. Assume that 5 heads came up in 10 throws, i.e., $X_1 = 5$. Calculate $P[X_2 \mid X_1 = 5]$, the posterior probability of $X_2$.

19/32 Prior/posterior probability: Example (solution). Posterior probability: $P[X_2 = p \mid X_1 = 5] = \frac{P[X_1 = 5 \mid X_2 = p]}{P[X_1 = 5]}\, P[X_2 = p]$. Thus, $P[X_2 = p \mid X_1 = 5] = \frac{\Gamma(12)}{\Gamma(6)\,\Gamma(6)}\, p^5 (1-p)^5$.
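A quick check of this expression (a sketch, assuming scipy is available): with a uniform prior and 5 heads in 10 throws, the posterior is the Beta(6, 6) distribution, whose density agrees with the closed form above.

```python
from math import gamma
from scipy.stats import beta

p = 0.3
# Closed form from the slide: Gamma(12) / (Gamma(6) Gamma(6)) * p^5 (1 - p)^5
closed_form = gamma(12) / (gamma(6) * gamma(6)) * p**5 * (1 - p)**5
# Same density via the Beta(6, 6) distribution (uniform prior updated with 5 heads, 5 tails).
print(closed_form, beta.pdf(p, 6, 6))   # the two values agree
```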

20/32 Predictive probability. The predictive probability is the marginal probability obtained when the joint probability is marginalized with respect to the parameters of the distribution: $P[X_1] = \sum_{X_2} P[X_1, X_2] = \sum_{X_2} P[X_1 \mid X_2]\, P[X_2]$ (with an integral for continuous $X_2$), where $X_2$ is the parameter of the probability distribution of $X_1$. $P[X_1]$ is the predictive probability.

21/32 Predictive probability: Example. Consider again the Bernoulli experiment of throwing the coin. Calculate the predictive probability $P[X_1]$, using the updated (posterior) probability of $X_2$.

22/32 Predictive probability: Example (solution). $P[X_1 = i] = \int_0^1 P[X_1 = i, X_2 = p]\, dp = \int_0^1 P[X_1 = i \mid X_2 = p]\, P[X_2 = p]\, dp = \int_0^1 \binom{10}{i} p^i (1-p)^{10-i}\, \frac{\Gamma(12)}{\Gamma(6)\,\Gamma(6)}\, p^5 (1-p)^5\, dp = \binom{10}{i} \frac{\Gamma(12)}{\Gamma(6)\,\Gamma(6)} \int_0^1 p^{i+5} (1-p)^{15-i}\, dp = \frac{\Gamma(11)\,\Gamma(12)\,\Gamma(i+6)\,\Gamma(16-i)}{\Gamma(i+1)\,\Gamma(11-i)\,\Gamma(6)\,\Gamma(6)\,\Gamma(22)}$.
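The following sketch evaluates this posterior-predictive expression numerically and confirms that the probabilities over $i = 0, \ldots, 10$ sum to 1:

```python
from math import gamma

def predictive(i):
    # Posterior predictive P[X1 = i], with the Beta(6, 6) posterior for the head probability.
    return (gamma(11) * gamma(12) * gamma(i + 6) * gamma(16 - i)) / (
        gamma(i + 1) * gamma(11 - i) * gamma(6) * gamma(6) * gamma(22)
    )

probs = [predictive(i) for i in range(11)]
print([round(q, 4) for q in probs], sum(probs))   # the probabilities sum to 1
```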

23/32 Predictive probability: Example (solution). [Figure: predictive probability versus number of heads (0 to 10), comparing the predictive distribution based on the prior with the predictive distribution based on the posterior.]

24/32 Bayesian probabilistic networks: a DAG (Directed Acyclic Graph) together with conditional probability (tables).

25/32 Bayesian probabilistic networks: Example of a DAG (Directed Acyclic Graph). Nodes $X_1 \to X_2$; factorization: $P[X_2 \mid X_1]\, P[X_1]$.

26/32 Bayesian probabilistic networks: Example of a DAG. Nodes $X_1 \to X_3 \leftarrow X_2$; factorization: $P[X_3 \mid X_1, X_2]\, P[X_1]\, P[X_2]$.

27/32 Bayesian probabilistic networks: Example of a DAG. Node $X_1$ with children $X_2$ and $X_3$ ($X_2 \leftarrow X_1 \to X_3$); factorization: $P[X_2 \mid X_1]\, P[X_3 \mid X_1]\, P[X_1]$.

28/32 Bayesian probabilistic networks: Example of a DAG. Nodes $X_1 \to X_3 \leftarrow X_2$ and $X_3 \to X_4$; factorization: $P[X_4 \mid X_3]\, P[X_3 \mid X_1, X_2]\, P[X_1]\, P[X_2]$.
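As a minimal sketch of how such a factorization is used (with hypothetical binary variables and illustrative probability tables), the joint is assembled from the node-wise conditional probability tables, and a marginal such as $P[X_4]$ is obtained by summing out the remaining variables:

```python
from itertools import product

# Hypothetical conditional probability tables for binary X1, X2, X3, X4.
p_x1 = {0: 0.6, 1: 0.4}
p_x2 = {0: 0.7, 1: 0.3}
p_x3 = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.6, 1: 0.4},   # p_x3[(x1, x2)][x3]
        (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.2, 1: 0.8}}
p_x4 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}             # p_x4[x3][x4]

def joint(x1, x2, x3, x4):
    # P[X1, X2, X3, X4] = P[X4 | X3] P[X3 | X1, X2] P[X1] P[X2]
    return p_x4[x3][x4] * p_x3[(x1, x2)][x3] * p_x1[x1] * p_x2[x2]

# Marginal P[X4 = 1]: sum the joint over X1, X2, X3.
p_x4_1 = sum(joint(a, b, c, 1) for a, b, c in product((0, 1), repeat=3))
print(p_x4_1)
```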

29/32 Bayesian probabilistic networks: Counter-example of a DAG (Directed Acyclic Graph). [Figure: a directed graph over $X_1$, $X_2$, $X_3$ containing a directed cycle, which is therefore not a DAG.]

30/32 Why are BPNs useful? They offer an easy way of mind mapping and a platform for integrating all relevant aspects. Generic algorithms as well as software tools are available for creating BPNs and for calculating probabilities of interest.

31/32 Bayesian probabilistic networks: Example. Consider the Bernoulli experiment of throwing a coin 10 times. X1_before: number of heads in the first series of throws. X1_after: number of heads in the second series of throws. X2: the probability that a head comes up. This will be demonstrated.
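The demonstration itself is not part of the transcript; the following sketch (assuming scipy is available, and with a hypothetical observed count of 5 heads) illustrates the idea behind the network: X2 starts with a uniform prior, the observed count X1_before updates it, and the updated distribution yields the predictive distribution of X1_after.

```python
from scipy.stats import betabinom

# Uniform Beta(1, 1) prior on X2, the probability of a head.
a, b = 1, 1

# Observe X1_before: e.g. 5 heads in 10 throws (conjugate Beta update).
heads_before, n_before = 5, 10
a, b = a + heads_before, b + (n_before - heads_before)   # posterior Beta(6, 6)

# Predictive distribution for X1_after (number of heads in the next 10 throws).
for i in range(11):
    print(i, round(betabinom.pmf(i, 10, a, b), 4))
```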

32/32 Modeling with BPNs. Bottom-up approach: the components of a BPN are built up from expert knowledge and analyses and are integrated into the BPN. Top-down approach: the interrelations between the components of a BPN are fixed, but the probabilistic (quantitative) relations are estimated from observed data (see the sketch below). These approaches are illustrated with application examples later.
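As a minimal sketch of the top-down idea (structure fixed, probabilities estimated from data), a conditional probability table $P[X_2 \mid X_1]$ can be estimated by relative frequencies from observed samples; the data below are hypothetical.

```python
from collections import Counter

# Hypothetical observations of (x1, x2) pairs for a fixed structure X1 -> X2.
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

pair_counts = Counter(data)
x1_counts = Counter(x1 for x1, _ in data)

# Relative-frequency estimate of the conditional probability table P[X2 | X1].
cpt = {(x1, x2): pair_counts[(x1, x2)] / x1_counts[x1]
       for x1 in (0, 1) for x2 in (0, 1)}
print(cpt)   # e.g. P[X2 = 1 | X1 = 1] = 3/4 for these data
```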