Simulation of Discrete Event Systems


1 Simulation of Discrete Event Systems, Unit 10 and 11: Bayesian Networks and Dynamic Bayesian Networks. Fall/Winter term 2017/2018. Prof. Dr.-Ing. Dipl.-Wirt.-Ing. Sven Tackenberg, Benedikt Andrew Latos M.Sc. RWTH. Chair and Institute of Industrial Engineering and Ergonomics, RWTH Aachen University, Bergdriesch, Aachen.

2 Contents 1. Introduction 2. Background - Bayes theorem and rules of probability - Maximum a posteriori hypothesis - Bayesian methodology to calculate posterior distributions 3. Bayesian networks - Approach - Definition - Inference in simple Bayesian networks 4. Introduction to Dynamic Bayesian networks 5. Formalism of Dynamic Bayesian Networks

3 Focus of lecture and exercise. [Model classification diagram: static vs. dynamic, time-varying vs. time-invariant, linear vs. nonlinear, continuous states vs. discrete states, time-driven vs. event-driven, deterministic vs. stochastic, discrete-time vs. continuous-time. The lecture and exercise focus on the discrete-state, event-driven branch.]

4 1. Introduction

5 What are Bayesian networks helpful for? Experts are persons who need specific expertise to process their tasks. The typical working cycle of an expert is observe, decide, learn: based on observations, decisions are made; decisions lead to actions; an action causes good or bad results; and the results lead to learning by the expert. Experts often have to make decisions based on incomplete and conflicting information. The best decision is, in general, the one that minimizes the risk! Bayesian networks are used to build up such an expert system.

6 Motivation! Why is it worth considering the Bayes methodology? The Bayes methodology is a statistical approach to modeling and simulating discrete-event systems under uncertainty.! What is the concept of the Bayes methodology? The basic assumption is that the state variables can be represented by probability mass functions (discrete variables) or probability density functions (continuous variables). Based on the Bayes theorem, conclusions can be drawn to identify optimal decisions.! What is the Bayes methodology used for? The Bayes theorem and the associated rules of probability are a consistent and powerful basis for algorithms manipulating probability mass functions and probability density functions directly.

7 Think about: If you see that there are clouds, what is the probability that there will soon be rain? p(rain | clouds). If you know that it is raining, because you hear it patter on the roof, what is the probability that there are clouds? p(clouds | rain). Is p(rain | clouds) equal to p(clouds | rain)?

8 Repetition of relevant definitions and formulas of probability theory. Probability of event A: P(A). Probability of event A under the condition of event B: P(A | B). Bayes formula (theoretical basis of Bayesian networks): P(A | B) = P(B | A) P(A) / P(B). This formula enables the conversion of the probability of event A under the condition of event B into the probability of event B under the condition of event A. Formula of the total probability: P(A) = Σ_i P(A | B_i) P(B_i). The absolute probability of A can be calculated based on the conditional probabilities of A.
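Both formulas can be checked in a few lines of Python. This is only a sketch: the partition {B1, B2} and all probability values below are illustrative assumptions, not values from the lecture.

    # Minimal sketch of the Bayes formula and the law of total probability.
    # The numbers are illustrative assumptions, not values from the lecture.
    p_B = {"B1": 0.3, "B2": 0.7}            # priors P(B_i) of a partition
    p_A_given_B = {"B1": 0.9, "B2": 0.2}    # conditional probabilities P(A | B_i)

    # Total probability: P(A) = sum_i P(A | B_i) P(B_i)
    p_A = sum(p_A_given_B[b] * p_B[b] for b in p_B)

    # Bayes formula: P(B_1 | A) = P(A | B_1) P(B_1) / P(A)
    p_B1_given_A = p_A_given_B["B1"] * p_B["B1"] / p_A

    print(p_A)           # 0.9*0.3 + 0.2*0.7 = 0.41
    print(p_B1_given_A)  # 0.27 / 0.41 is roughly 0.659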

9 Rules of probability. 1. Product rule: The joint probability of A and B is P(A, B) = P(B | A) P(A) = P(A | B) P(B), where P(B | A) is the probability of B under the condition of A. 2. Independence: The random variables A and B are independent if the joint probability distribution can be factorized as P(A, B) = P(A) P(B). 3. Sum rule: If the hypotheses B_1, ..., B_n are mutually exclusive and therefore form a partition of the set B, the marginal likelihood of the data is P(A) = Σ_i P(A, B_i) = Σ_i P(A | B_i) P(B_i). Hence, the Bayes theorem can be expanded: P(B_i | A) = P(A | B_i) P(B_i) / Σ_j P(A | B_j) P(B_j). Note: in the Bayesian methodology, the random variables A and B are named D and h.

10 Causal networks. Introduction: Causal networks are a precursor of Bayesian networks! They are a formalism to describe causal dependencies within given situations, consisting of: a set of variables, where each variable can have different (finite or infinite) states, and a set of directed arcs. Each variable must be in one of the defined states, but the current state may be unknown! An arc A → B means: the state of variable A directly causes the occurrence of states of variable B.

11 Example of a causal network. Variables: W (Winter): {true, false}; C (Slippery roads): {true, false}; D (Klaus drank alcohol): {true, false}; K (Klaus has an accident): {true, false}; M (Mike has an accident): {true, false}. Arcs: W → C, C → K, C → M, D → K. Interpretation: Season of the year: variable W with states (true, false) has a significant impact on the condition of the street. Condition of the street: variable C with states (true, false) describes the slipperiness of the street and has a significant impact on the risk of an accident of Klaus (K) or Mike (M). Occurrence of an accident: variables K and M with states (true, false) describe the occurrence of an accident of Klaus or Mike. Condition of Klaus: variable D with states (true, false) describes whether Klaus has drunk alcohol.

12 Dependency and conditional dependency. Two variables A and B of a causal network are designated as dependent if the probabilities of the states of variable A depend on the state of variable B and vice versa: P(A, B) ≠ P(A) P(B). Two variables A and B of a causal network are designated as conditionally dependent if A and B are dependent for a specific state Z and independent for the other state Z̄: P(A, B | Z) ≠ P(A | Z) P(B | Z) and P(A, B | Z̄) = P(A | Z̄) P(B | Z̄).

13 Dependencies (1/2). Serial dependency W → C → M, with W (Winter): {true, false}, C (Slippery roads): {true, false}, M (Mike has an accident): {true, false}. Variables W and M are independent if the condition of the road C is known: once the condition of the street is known, the season has no impact on the probability of an accident. Branch C → K, C → M, with C (Slippery roads): {true, false}, K (Klaus has an accident): {true, false}, M (Mike has an accident): {true, false}. Variables K and M are independent if the condition of the road C is known. If K has an accident and the condition of the street is unknown, the probability of a slippery street increases; consequently, the probability of an accident of M increases as well.

14 Dependencies (2/2). Merge D → K ← C, with D (Klaus drank alcohol): {true, false}, C (Slippery roads): {true, false}, K (Klaus has an accident): {true, false}. Variables D and C become dependent on each other if the state of variable K is known: if Klaus (K) has an accident and the street is not slippery, then the probability that he has drunk alcohol increases.

15 2. Background

16 Example: Diagnosis of scarce faults. An X-ray test of a track is done. If the object has hairline cracks, the measurement result is "hairline crack = true" in 98% of the cases. If the object has no hairline cracks, the measurement result is "hairline crack = false" in 97% of the cases. Hairline cracks occur in only 0.8% of the produced tracks.? Calculate the probability that the track really has hairline cracks, given that a measurement indicates hairline cracks.

17 Bayes theorem. To answer this question, the Bayes theorem is used. The Bayes theorem goes back to the seminal work of the English reverend Thomas Bayes in the 18th century on games of chance. Formula: P(h | D) = P(D | h) P(h) / P(D). P(h): a priori probability of a hypothesis h (or a model), representing the initial degree of belief. P(D): a priori probability of the data D (observations). P(h | D): a posteriori probability of hypothesis h under the condition of given data D. P(D | h): probability of data D under the condition of hypothesis h. Two meanings of probability: frequencies of outcomes in random experiments, e.g. repeated rolling of a die; and degrees of belief in propositions that do not necessarily involve random experiments, e.g. the probability that a certain production machine will fail, given the evidence of a poor surface quality of the workpiece. The latter is known as the Bayesian methodology.

18 Example: Diagnosis of scarce faults. Measurement: "hairline crack = true" in 98% of cases if the object has cracks; "hairline crack = false" in 97% of cases if it has none. Hairline cracks occur in only 0.8% of the produced tracks.? Calculate the probability that the track really has cracks, given that a measurement indicates hairline cracks. Bayes theorem: P(h | D) = P(D | h) P(h) / P(D), with P(h): a priori probability of a hypothesis h (or a model) representing the initial degree of belief; P(D): a priori probability of the data D (observations); P(h | D): a posteriori probability of hypothesis h under the condition of given data D; P(D | h): probability of data D under the condition of hypothesis h. Applied to the example: P(scrap | data shows crack) = P(data shows crack | scrap) P(scrap) / P(data shows crack), where "scrap" means that the track has a crack and the denominator is the probability that the data shows a crack.

19 Example: Diagnosis of scarce faults. P(scrap | data shows crack) = P(data shows crack | scrap) P(scrap) / P(data shows crack). Given values: P(data shows crack | scrap) = 0.98 and P(data shows no crack | no scrap) = 0.97, hence P(data shows no crack | scrap) = 0.02 and P(data shows crack | no scrap) = 0.03; P(scrap) = 0.008 and P(no scrap) = 0.992. Auxiliary calculation: P(data shows crack) = P(data shows crack | scrap) P(scrap) + P(data shows crack | no scrap) P(no scrap) = 0.98 · 0.008 + 0.03 · 0.992 = 0.0376. Therefore P(scrap | data shows crack) = 0.98 · 0.008 / 0.0376 ≈ 0.21. The probability that a positively tested track also has hairline cracks is only 21%!
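A short Python sketch reproducing the calculation above, using only the values stated on the slide:

    # Hairline-crack diagnosis: posterior probability of a crack given a positive test.
    p_crack = 0.008                      # P(track has a crack)
    p_pos_given_crack = 0.98             # P(measurement shows crack | crack)
    p_pos_given_no_crack = 1.0 - 0.97    # P(measurement shows crack | no crack) = 0.03

    # Total probability of a positive measurement
    p_pos = p_pos_given_crack * p_crack + p_pos_given_no_crack * (1.0 - p_crack)

    # Bayes theorem: P(crack | positive measurement)
    p_crack_given_pos = p_pos_given_crack * p_crack / p_pos

    print(round(p_pos, 4))              # 0.0376
    print(round(p_crack_given_pos, 2))  # 0.21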

20 Bayesian methodology. The objective function for Bayesian parameter estimation is the most likely hypothesis given the observations. The hypothesis h_MAP representing the maximum of the probability mass is called the maximum a posteriori hypothesis: h_MAP = argmax_{h∈H} P(h | D) = argmax_{h∈H} P(D | h) P(h) / P(D) = argmax_{h∈H} P(D | h) P(h). The choice of P(h) and P(D | h) represents the a priori knowledge and assumptions of the modeler concerning the application domain. The hypotheses are regarded as functions of the observations, which can be adapted iteratively to the state of knowledge of an observer. If all hypotheses have the same a priori probability, the equation above can be simplified further and only the term P(D | h) has to be maximized. Each hypothesis maximizing P(D | h) is called the maximum likelihood hypothesis h_ML: h_ML = argmax_{h∈H} P(D | h).
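For a finite hypothesis space the two estimators reduce to two argmax operations. The following Python sketch illustrates this; the hypothesis names, priors and likelihoods are purely illustrative assumptions, not values from the lecture.

    # Sketch of h_MAP vs. h_ML over a finite hypothesis space.
    # Priors and likelihoods below are illustrative assumptions.
    hypotheses = ["h1", "h2", "h3"]
    prior      = {"h1": 0.5, "h2": 0.3, "h3": 0.2}   # P(h)
    likelihood = {"h1": 0.1, "h2": 0.4, "h3": 0.5}   # P(D | h) for the observed data D

    h_map = max(hypotheses, key=lambda h: likelihood[h] * prior[h])
    h_ml  = max(hypotheses, key=lambda h: likelihood[h])

    print(h_map)  # 'h2': 0.4*0.3 = 0.12 beats 0.05 and 0.10
    print(h_ml)   # 'h3': largest likelihood

With a uniform prior the two criteria coincide, which is exactly the simplification stated above.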

21 Example of Bayesian methodology (I). Workpieces of only one type are stored in a pallet cage. A produced workpiece is either faultless (index g for "good") or defective (index b for "bad"). Due to a new manufacturing process, the prior probability distribution of the frequency of faultless and defective workpieces is unknown.? Calculate the posterior distribution of the proportion of faultless workpieces step by step, as workpieces are produced, on the basis of the Bayesian methodology. The input data are a sample of N workpieces, randomly drawn from the line! The workpieces in the sample are tested independently!

22 Example of Bayesian methodology (II). Workpieces of only one type are stored in a pallet cage. A produced workpiece is either faultless (index g for "good") or defective (index b for "bad"). The proportions to be estimated under hypothesis h on the basis of the sample of size N are: h = (p̂_g, p̂_b) = (p̂_g, 1 − p̂_g), where p̂_g is the estimated proportion of good workpieces. The properties of the sample can be described sufficiently by the following aggregated quantities: n_g, the frequency of good workpieces after N tests, and n_b = N − n_g. The probability of observing exactly n_g faultless workpieces in the sample follows the binomial distribution.

23 Binomial distribution. Probably the most important discrete distribution is the binomial distribution. Consider an experiment with n trials; each trial can result in one of two states {a, b}, and the probability of a (and of b) is the same in each trial. The number of occurrences of {a} is X. The probability that a specific number of {a} appears is: P(X = x) = (n choose x) p^x (1 − p)^(n − x). The distribution is defined by n and p. Mean value: μ = n p. Variance: σ² = n p (1 − p). Example: If an accident occurs, every tenth person of the population is able to provide initial medical treatment (p = 0.1). How large is the probability that there are 0, 1, up to 10 persons out of a total of 10 who are able to provide initial medical treatment? E.g. P(X = 1) is the probability that exactly one person is able to provide treatment: P(X = 0) = 0.9^10 ≈ 0.349 and P(X = 1) = 10 · 0.1 · 0.9^9 ≈ 0.387.
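The first-aid example can be evaluated with a small helper function; this sketch uses only the values stated on the slide (n = 10, p = 0.1):

    from math import comb

    def binom_pmf(x, n, p):
        """P(X = x) for X ~ Binomial(n, p)."""
        return comb(n, x) * p**x * (1 - p)**(n - x)

    # First-aid example from the slide: n = 10 persons, p = 0.1
    print(round(binom_pmf(0, 10, 0.1), 4))  # about 0.3487
    print(round(binom_pmf(1, 10, 0.1), 4))  # about 0.3874
    # Mean n*p = 1.0, variance n*p*(1-p) = 0.9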

24 Example of Bayesian methodology (III). The binomial distribution represents the generative model of the data P(D | h) under hypothesis h: P(n_g | p_g, N) = N! / ((N − n_g)! n_g!) · p_g^(n_g) (1 − p_g)^(N − n_g), the probability of observing exactly n_g faultless workpieces in the sample. Bayesian methodology (remember): the objective function for Bayesian parameter estimation is the most likely hypothesis given the observations: h_MAP = argmax_{h∈H} P(D | h) P(h) / P(D) = argmax_{h∈H} P(D | h) P(h). Due to the new manufacturing process, there is no knowledge regarding the proportion of faultless workpieces. The prior probability of the corresponding hypothesis h is therefore described by a uniform distribution for the parameter p_g: f_p(p_g | N = 0, n_g = 0) = Γ(2) / (Γ(1) Γ(1)) · p_g^0 (1 − p_g)^0 = 1.

25 Example of Bayesian methodology (IV). Due to the Bayesian methodology we can define the a posteriori probability density: f_p(p_g | N, n_g) = Γ(N + 2) / (Γ(n_g + 1) Γ(N − n_g + 1)) · p_g^(n_g) (1 − p_g)^(N − n_g), i.e. a Beta(n_g + 1, N − n_g + 1) distribution. Incremental measuring of the workpieces drawn from the production line leads to the samples: after N = 5 measurements, n_g = 3 workpieces turned out to be faultless; after N = 10 measurements, n_g = 6 workpieces turned out to be faultless; after N = 15 measurements, n_g = 9 workpieces turned out to be faultless. For each measurement observation the initial uniform distribution is transformed into the Beta-type posterior distribution for the independent parameter p_g.

26 Example of Bayesian methodology (V). [Plot of the posterior densities f_p(p_g | n_g = 3, N = 5), f_p(p_g | n_g = 6, N = 10) and f_p(p_g | n_g = 9, N = 15) over p_g, together with the uniform prior f_p(p_g) and the corresponding MAP estimates p̂_g^MAP.]

27 Example of Bayesian methodology (VI). Conversely, when using the maximum likelihood estimator instead of the maximum a posteriori estimator, we have the point estimate: p̂_g^ML = argmax_{p_g} P(n_g | p_g, N) = argmax_{p_g} N! / ((N − n_g)! n_g!) · p_g^(n_g) (1 − p_g)^(N − n_g). For instance, the maximum likelihood value for the first sample drawn from the line (N = 5, n_g = 3) is p̂_g^ML = argmax_{p_g} p_g^3 (1 − p_g)^2. Setting the derivative to zero, d/dp_g [p_g^3 (1 − p_g)^2] = 3 p_g^2 (1 − p_g)^2 − 2 p_g^3 (1 − p_g) = 0, yields p̂_g^ML = 3/5. Obviously, the maximum likelihood estimate is equivalent to the relative frequency of the faultless workpieces in the tested sample!
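A quick numerical check of this estimate (a sketch; the grid search is just one convenient way to locate the maximum):

    import numpy as np

    # Numerical check of the maximum likelihood estimate for N = 5, n_g = 3.
    N, n_g = 5, 3
    p = np.linspace(0.0, 1.0, 100001)
    likelihood = p**n_g * (1 - p)**(N - n_g)   # binomial coefficient omitted (constant in p)

    p_ml = p[np.argmax(likelihood)]
    print(round(p_ml, 3))   # 0.6 = n_g / N

    # With the uniform prior the posterior is Beta(n_g + 1, N - n_g + 1),
    # whose mode is also n_g / N, so here h_MAP coincides with h_ML.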

28 3. Bayesian Networks

29 Example of a Bayesian Network (1/9). Network nodes: Winter, Sprinkler, Rain, Wet Grass, Wet Road, with arcs Winter → Sprinkler, Winter → Rain, Sprinkler → Wet Grass, Rain → Wet Grass, Rain → Wet Road. The CPT Θ_Winter lists the prior probabilities P(Winter = true) and P(Winter = false).

30 Example of a Bayesian Network (2/9). Same network: Winter, Sprinkler, Rain, Wet Grass, Wet Road. The CPT Θ_Rain|Winter lists the conditional probabilities P(Rain | Winter) for each combination of Rain ∈ {true, false} and Winter ∈ {true, false} (¬ represents false).

31 Example of a Bayesian Network (3/9). Same network. The CPT Θ_Wet Grass|Sprinkler,Rain lists the conditional probabilities P(Wet Grass | Sprinkler, Rain) for the four parent combinations (Sprinkler, Rain) ∈ {true, false}² and Wet Grass ∈ {true, false} (¬ represents false).

32 Example of a Bayesian Network (4/9). Same network. The CPT Θ_Wet Road|Rain lists the conditional probabilities P(Wet Road | Rain) for Rain ∈ {true, false} and Wet Road ∈ {true, false} (¬ represents false).

33 Example of a Bayesian Network (5/9). Probability distribution described by a Bayesian network. Allocation of interest: ω(Winter) = true, ω(Sprinkler) = false, ω(Rain) = true, ω(Wet Grass) = true, ω(Wet Road) = true (¬ represents false). From Θ_Winter: probability of winter P(Winter) = 0.6. From Θ_Sprinkler|Winter: probability of winter and the sprinkler not being used P(W, ¬S) = P(W) · P(¬S | W) = 0.6 · 0.8 = 0.48.

34 Example of a Bayesian Network (6/9). Allocation of interest as before. Probability of winter and the sprinkler not being used: P(W, ¬S) = 0.6 · 0.8 = 0.48. From Θ_Rain|Winter: probability of winter, the sprinkler not being used and rain: P(W, ¬S, R) = P(W, ¬S) · P(R | W).

35 Example of a Bayesian Network (7/9). Allocation of interest as before. From Θ_Wet Grass|Sprinkler,Rain: probability of winter, the sprinkler not being used, rain and wet grass: P(W, ¬S, R, WG) = P(W, ¬S, R) · P(WG | ¬S, R).

36 Example of a Bayesian Network (8/9). Allocation of interest as before. From Θ_Wet Road|Rain: probability of winter, the sprinkler not being used, rain, wet grass and wet road: P(W, ¬S, R, WG, WR) = P(W, ¬S, R, WG) · P(WR | R).

37 Example of a Bayesian Network (9/9). Probability distribution described by a Bayesian network. Allocation of interest: ω(Winter) = true, ω(Sprinkler) = false, ω(Rain) = true, ω(Wet Grass) = true, ω(Wet Road) = true. Summarized: Pr(ω) = Θ_W · Θ_S|W · Θ_R|W · Θ_WG|S,R · Θ_WR|R. This basically corresponds to the chain rule of probabilities: Pr(φ_1, ..., φ_n) = Pr(φ_1 | φ_2, ..., φ_n) · Pr(φ_2 | φ_3, ..., φ_n) · ... · Pr(φ_n).
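The product Pr(ω) is simply a sequence of CPT lookups. The Python sketch below evaluates it for the allocation of interest; only P(Winter = true) = 0.6 and P(Sprinkler = false | Winter = true) = 0.8 are taken from the slides, all remaining CPT entries are illustrative assumptions.

    # Sketch of evaluating Pr(w) = Theta_W * Theta_S|W * Theta_R|W * Theta_WG|S,R * Theta_WR|R.
    # Only P(Winter=true)=0.6 and P(Sprinkler=false|Winter=true)=0.8 come from the slides;
    # every other CPT entry below is an illustrative assumption.
    theta_W  = {True: 0.6, False: 0.4}
    theta_S  = {(True,): {True: 0.2, False: 0.8},       # P(Sprinkler | Winter)
                (False,): {True: 0.75, False: 0.25}}     # (assumed)
    theta_R  = {(True,): {True: 0.8, False: 0.2},        # P(Rain | Winter)          (assumed)
                (False,): {True: 0.1, False: 0.9}}
    theta_WG = {(False, True): {True: 0.9, False: 0.1},  # P(Wet Grass | S, R)       (assumed)
                (True, True): {True: 0.95, False: 0.05},
                (True, False): {True: 0.9, False: 0.1},
                (False, False): {True: 0.0, False: 1.0}}
    theta_WR = {(True,): {True: 0.7, False: 0.3},        # P(Wet Road | Rain)        (assumed)
                (False,): {True: 0.0, False: 1.0}}

    # Allocation of interest from the slides
    w, s, r, wg, wr = True, False, True, True, True
    prob = (theta_W[w] * theta_S[(w,)][s] * theta_R[(w,)][r]
            * theta_WG[(s, r)][wg] * theta_WR[(r,)][wr])
    print(prob)   # 0.6 * 0.8 * 0.8 * 0.9 * 0.7, roughly 0.242 with the assumed entries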

38 Approach (I). Assumption: To classify and predict a discrete event system model with uncertainty, it is necessary to make assumptions about the statistical independence of variables. Reason: The number of alternatives to factorize the joint probability distribution increases exponentially with the number of variables: P(X_1, X_2) = P(X_2 | X_1) P(X_1) = P(X_1 | X_2) P(X_2); P(X_1, X_2, X_3) = P(X_1 | X_2, X_3) P(X_2 | X_3) P(X_3) = P(X_2 | X_1, X_3) P(X_1 | X_3) P(X_3) = ... Conditional independence: the random variables X and Y are conditionally independent given Z if it holds that P(X, Y | Z) = P(X | Z) P(Y | Z), or equivalently P(X | Y, Z) = P(X | Z). Bayesian networks encode conditional independence assumptions among subsets of random system variables and are represented by a directed acyclic graphical model with: directed arcs between nodes (model structure), and conditional probability tables related to the random system variables (model parameters).

39 Approach (II). Semantics of the graphical model of Bayesian networks: Nodes are random variables serving as state variables and observables of the system model. Directed arcs are causal dependencies of the system model, from which the conditional independence of the random system variables follows. If a directed arc is drawn from node X ("Rain") to node Y ("Wet Road"), node X is called the parent node of Y and Y is called the child node of X. Nodes without parent nodes are called root nodes. A directed path from node X to Y is said to exist if one can find a valid sequence of nodes starting from X and ending in Y such that each node in the sequence is a parent of the following node in the sequence (e.g. Clouds → Rain → Wet Road). Each random variable Y with the parent nodes X_1, ..., X_n is associated with a conditional probability table (CPT) encoding the conditional probability P(Y = y | X_1 = x_1, ..., X_n = x_n), e.g. Y = Wet Road with parents X_1 = Sprinkler and X_2 = Rain.

40 Definition of a Bayesian network. Definition of a discrete Bayesian network (BN): A discrete Bayesian network is represented by the parameter tuple λ_BN = (G, Θ). G is a directed, acyclic graph. Its nodes represent discrete random variables X_i (i = 1, ..., n): a node is conditionally independent of its non-descendants, given its parents (illustrated on the slide with the nodes Clouds, Rain, Wet Road and the non-descendant Slippery Road). Θ_i = (a^i_mr) are the conditional probability tables (CPTs) of the nodes of the network with the components (values) a^i_mr = P(X_i = x_m | Parents(X_i) = w_r), with m = 1, ..., |X_i| and r = 1, ..., |Parent_1(X_i)| · ... · |Parent_k(X_i)|. The index r of the CPT columns enumerates the possible combinations of values w_r of the associated parent nodes (if the node is a root node, r is simply 1); m = 1, ..., |X_i| simply enumerates the values of the discrete random variable X_i. The column vectors in the CPTs always sum to one.

41 Factorization of the joint probability distribution. 1. Proposition: The joint probability distribution of a discrete Bayesian network with the random variables X_1, X_2, ..., X_n can be factorized as follows: P(X_1, X_2, ..., X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i)), where Parents(X_i) are the predecessors of X_i. Note: The factorization mechanism is directly associated with the graphical model: compared to a fully interlinked and structurally uninformative graph, the number of alternatives to factorize the joint probability distribution can be significantly reduced. A graphical model can be developed from first principles and established theories about cause and effect relationships. Note: Several valid factorizations can exist for a given joint probability distribution of a Bayesian model; therefore, the transformation is only forward directed!

42 Example of a Bayesian network (I). A production machine (M) tends to produce a significant amount of defective parts. Causes: its drive (D) is over-heated; the control electronics (E) are disturbed. The shop floor temperature (T) influences the over-heating of the drive (D). The shop floor temperature (T) depends on the season (S), because there is no air conditioning system. The functioning of the control electronics (E) is affected by grid (G) voltage jitters and by the shop floor temperature (T). Graphical model of conditional independencies: S → T, T → D, T → E, G → E, D → M, E → M.

43 Example of a Bayesian network (II). Graphical model of conditional independencies: S → T, T → D, T → E, G → E, D → M, E → M. Random system variables of the system model: X_1 = M with binary states {normal productivity, low productivity} = {m, ¬m}; X_2 = E with binary states {faultless, disturbed} = {e, ¬e}; X_3 = D with binary states {normal, over-heated} = {d, ¬d}; X_4 = G with binary states {no voltage jitters, significant jitters} = {g, ¬g}; X_5 = T with ternary states {high, normal, low} = {h, n, l}; X_6 = S with quaternary states {winter, spring, summer, fall} = {w, p, s, f}.

44 Example of a Bayesian network (III). Example conditional probability table CPT_T of the variable temperature (T) with states {high, normal, low} = {h, n, l}, relating to the season: one column per season value S = w, S = p, S = s, S = f and one row each for P(T = h | S), P(T = n | S), P(T = l | S). Example conditional probability table CPT_M of the production machine (M) with productivity states {normal, low} = {m, ¬m}: one column per parent combination (E, D) ∈ {e, ¬e} × {d, ¬d} and one row each for P(M = m | E, D) and P(M = ¬m | E, D).

45 Example of a Bayesian network (IV). Remember: the joint probability distribution encoded by a discrete Bayesian network with the random variables X_1, X_2, ..., X_n can be factorized as P(X_1, X_2, ..., X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i)). For the example (S → T, T → D, T → E, G → E, D → M, E → M) the following parameter setting is developed: P(M, E, D, G, T, S) = P(M | E, D) · P(E | G, T) · P(D | T) · P(T | S) · P(S) · P(G).

46 Inference in Bayesian networks (I). Overall goal: probability calculation with Bayesian networks, also referred to as inference, is the estimation of the probability mass functions of non-observable (hidden) random variables in the network when (some) states of observable variables are known.! If, due to the network structure, the child nodes are observable and hidden causes have to be estimated, the inference is called a diagnosis or bottom-up inference. Example: P(significant grid voltage jitters | low productivity of machine).! If root nodes or parent nodes are observable and effects have to be estimated, the inference is called a prognosis or top-down inference. Example: P(low productivity of machine | over-heated drive).

47 Inference in Bayesian networks (II).! Inference in Bayesian networks is very flexible: the states of arbitrary network nodes can be fixed as evidence, and the probability distributions of the other nodes can then be updated. Example: P(... | winter season, significant grid voltage jitters). However, the exact calculation of probability values is in general an NP-hard problem. Therefore, we only present closed-form solutions for chains of variables (like Markov chains) and a simple tree in this introductory course.

48 Diagnosis in chains (I). Remember the Bayes theorem: P(h | D) = P(D | h) P(h) / P(D), with P(h): a priori probability of a hypothesis h (or a model) representing the initial degree of belief; P(D): a priori probability of the data D (observations); P(h | D): a posteriori probability of hypothesis h under the condition of given data D; P(D | h): probability of data D under the condition of hypothesis h. Case 1: dual chain X → Y and {Y = y} is observed. Belief(x) = P(X = x | Y = y) = P(Y = y | X = x) P(X = x) / P(Y = y) = P(Y = y | X = x) P(X = x) / Σ_{x'} P(Y = y | X = x') P(X = x'). In short notation: Belief(x) = p(x | y) = p(y | x) p(x) / Σ_{x'} p(y | x') p(x') = c · p(x) · l(x), with c = 1 / Σ_{x'} p(y | x') p(x') and the likelihood l(x) = p(y | x); the denominator represents the a priori probability of the observed data.

49 Diagnosis in chains (II). Case 1: dual chain X → Y and {Y = y} is observed. Example: Grid → Control Electronics, observed is {Control Electronics = disturbed}. Assumptions: P(E = faultless | G = no jitters): p(e | g) = 0.9, p(¬e | g) = 0.1 (faultless electronics under the condition of no grid voltage jitters); P(E = disturbed | G = significant jitters): p(¬e | ¬g) = 0.8, p(e | ¬g) = 0.2 (disturbed electronics under the condition of significant grid voltage jitters); P(G = no jitters): p(g) = 0.95, p(¬g) = 0.05. Belief(g) = c · p(g) · l(g) = c · p(g) · p(¬e | g) = c · 0.95 · 0.1 = c · 0.095; Belief(¬g) = c · p(¬g) · l(¬g) = c · p(¬g) · p(¬e | ¬g) = c · 0.05 · 0.8 = c · 0.04. From c · 0.095 + c · 0.04 = 1 it follows that c = 1 / 0.135 ≈ 7.41, so Belief(g) ≈ 0.70 and Belief(¬g) ≈ 0.30.
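The same update in a few lines of Python (a sketch using only the probabilities given on the slide):

    # Dual-chain diagnosis Grid -> Electronics with observation E = disturbed.
    p_g = {True: 0.95, False: 0.05}               # P(G = no jitters), P(G = significant jitters)
    p_e_dist_given_g = {True: 0.10, False: 0.80}  # P(E = disturbed | G)

    unnorm = {g: p_g[g] * p_e_dist_given_g[g] for g in p_g}
    c = 1.0 / sum(unnorm.values())                # normalization constant
    belief = {g: c * unnorm[g] for g in unnorm}

    print(round(belief[True], 2), round(belief[False], 2))   # 0.7 0.3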

50 Diagnosis in chains (III). Case 2: triple chain X → Y → Z and {Z = z} is observed. Belief(x) = p(x | z) = (1 / p(z)) · p(x) · p(z | x) = c · p(x) · l(x), with the likelihood function l(x) = Σ_y p(z | y, x) p(y | x) = Σ_y p(z | y) p(y | x). Example: Grid → Control Electronics → Machine, observed is {Machine = low productivity}. Assumptions: P(M = normal productivity | E = faultless): p(m | e) = 0.95, p(¬m | e) = 0.05; P(M = low productivity | E = disturbed): p(¬m | ¬e) = 0.85, p(m | ¬e) = 0.15; P(E = faultless | G = no jitters): p(e | g) = 0.9, p(¬e | g) = 0.1; P(E = disturbed | G = significant jitters): p(¬e | ¬g) = 0.8, p(e | ¬g) = 0.2; P(G = no jitters): p(g) = 0.95, p(¬g) = 0.05.

51 Diagnosis in chains (IV). l(g) = p(¬m | g), the probability of low productivity under the condition of no voltage jitters: l(g) = p(¬m | e) p(e | g) + p(¬m | ¬e) p(¬e | g) = 0.05 · 0.9 + 0.85 · 0.1 = 0.13. Belief(g) = c · p(g) · l(g) = c · 0.95 · 0.13 = c · 0.1235, with p(g) = 0.95 from the last slide. Analogously, l(¬g) = p(¬m | ¬g) = p(¬m | e) p(e | ¬g) + p(¬m | ¬e) p(¬e | ¬g) = 0.05 · 0.2 + 0.85 · 0.8 = 0.69 and Belief(¬g) = c · p(¬g) · l(¬g) = c · 0.05 · 0.69 = c · 0.0345. With c = 1 / (0.1235 + 0.0345) ≈ 6.33 we obtain Belief(g) ≈ 0.78 and Belief(¬g) ≈ 0.22.
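Again a short check in Python, using only the values from the two preceding slides:

    # Triple-chain diagnosis Grid -> Electronics -> Machine, observation M = low productivity.
    p_g = {True: 0.95, False: 0.05}                  # P(G = no jitters / significant jitters)
    p_e_given_g = {True: {True: 0.9, False: 0.1},    # P(E = faultless/disturbed | G)
                   False: {True: 0.2, False: 0.8}}
    p_lowM_given_e = {True: 0.05, False: 0.85}       # P(M = low | E = faultless/disturbed)

    # Likelihood l(g) = sum_e P(M = low | e) * P(e | g)
    l = {g: sum(p_lowM_given_e[e] * p_e_given_g[g][e] for e in (True, False)) for g in p_g}
    unnorm = {g: p_g[g] * l[g] for g in p_g}
    c = 1.0 / sum(unnorm.values())

    print(round(l[True], 2), round(l[False], 2))                    # 0.13 0.69
    print(round(c * unnorm[True], 2), round(c * unnorm[False], 2))  # 0.78 0.22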

52 Diagnosis in chains (V). Case 3: n-tuple chain X_1 → X_2 → ... → X_n and {X_n = x_n} is observed. belief(x_1) = p(x_1 | x_n) = (1 / p(x_n)) · p(x_1) · p(x_n | x_1) = c · p(x_1) · l(x_1), with l(x_1) = Σ_{x_2} ... Σ_{x_{n-1}} p(x_n | x_{n-1}, ..., x_1) p(x_{n-1} | x_{n-2}, ..., x_1) ... p(x_2 | x_1) = Σ_{x_2} ... Σ_{x_{n-1}} p(x_n | x_{n-1}) p(x_{n-1} | x_{n-2}) ... p(x_2 | x_1).
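The nested sums can be evaluated as repeated matrix-vector products. The following Python sketch is one possible implementation of this chain diagnosis (the function and variable names are our own, not part of the lecture) and reproduces the Grid → Electronics → Machine result:

    import numpy as np

    def chain_diagnosis(prior_x1, transition_mats, obs_index):
        """Belief over X_1 in a chain X_1 -> X_2 -> ... -> X_n given the observation {X_n = x_n}.
        prior_x1:        array of P(X_1 = x)
        transition_mats: list of matrices T_k with T_k[i, j] = P(X_{k+1} = j | X_k = i)
        obs_index:       index of the observed value x_n of the last variable
        """
        # Likelihood l(x_1) = P(X_n = x_n | X_1 = x_1): propagate an indicator backwards
        l = np.zeros(transition_mats[-1].shape[1])
        l[obs_index] = 1.0
        for T in reversed(transition_mats):
            l = T @ l
        unnorm = prior_x1 * l
        return unnorm / unnorm.sum()

    # Grid -> Electronics -> Machine example (observation: M = low productivity)
    prior_g = np.array([0.95, 0.05])               # no jitters, significant jitters
    T_ge = np.array([[0.9, 0.1], [0.2, 0.8]])      # P(E | G), columns: faultless, disturbed
    T_em = np.array([[0.95, 0.05], [0.15, 0.85]])  # P(M | E), columns: normal, low
    print(chain_diagnosis(prior_g, [T_ge, T_em], obs_index=1))   # roughly [0.78, 0.22]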

53 Topologies of trees. [Two example graphs: a tree and a multiply connected tree.] Note: in a tree there is no node that merges arcs, i.e. no node has more than one parent.

54 Diagnosis in a simple tree. Case 4: simple tree with X_1 → Y, X_2 → Y and Y → Z, where {Z = z} is observed. bel(x_1) = p(x_1 | z) = (1 / p(z)) · p(x_1) · p(z | x_1) = c · p(x_1) · l(x_1), with l(x_1) = Σ_{x_2} Σ_y p(z | y, x_1, x_2) p(y | x_1, x_2) p(x_2) = Σ_{x_2} Σ_y p(z | y) p(y | x_1, x_2) p(x_2). Moreover, it is possible to derive exact inference algorithms for trees with multiple layers as well as multiply connected trees; multiply connected trees are converted into multiple-layer trees. These algorithms are given in KOCH (2000).

55 4. Introduction to Dynamic Bayesian Networks

56 Approach (I). In the previous lecture the approach of static Bayesian networks with discrete random variables was introduced, which is able to encode prior knowledge and independence assumptions of a problem domain to be modelled both efficiently and consistently in a graphical model and allows to infer the system state from incomplete data. In this lecture the primary question is how we can exploit the methodology of Bayesian networks to model and simulate stochastic processes. These processes were already analyzed in the 7th and 8th lecture. As in the case of Markov chains, we are only interested in the total probability p(x, x') of a transition from state x to state x' and do not distinguish the events triggering the state transition. For instance, it is possible to represent a discrete-state and discrete-time Markov chain as a Bayesian network: the time-indexed random variable O_t, defined over the integers 1, 2, ..., encodes the observable state of the chain in each time step t of the process. The time slices form the chain O_1 → O_2 → O_3 → ... → O_T (t = 1, 2, 3, ..., T), with the start vector π = (P(O_1 = 1), P(O_1 = 2), ...) and the transition matrix P = [p_ij], where p_ij = P(O_t = j | O_{t-1} = i), e.g. p_12 = P(O_t = 2 | O_{t-1} = 1).
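A minimal Python sketch of such a chain, propagating the state distribution forward through the time slices; the two-state start vector and transition matrix below are illustrative assumptions, not values from the lecture:

    import numpy as np

    # Two-state, first-order Markov chain (values are illustrative assumptions).
    pi = np.array([0.9, 0.1])       # initial distribution over O_1
    P  = np.array([[0.7, 0.3],      # P[i, j] = P(O_t = j | O_{t-1} = i)
                   [0.4, 0.6]])

    dist = pi.copy()
    for t in range(1, 6):           # time slices t = 1 ... 5
        print(t, dist.round(4))
        dist = dist @ P             # P(O_{t+1}) = P(O_t) * P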

57 Approach (II). Clearly, we can make use of the structure of the graphical model according to Proposition 1 of the previous lecture to factorize the joint probability distribution of the observables: P(O_1, O_2, ..., O_T) = P(O_1) P(O_2 | O_1) P(O_3 | O_2) ... P(O_T | O_{T-1}). Furthermore, we showed in the previous lecture how to compute the bottom-up inference (diagnosis) in such a Markov chain using the Bayes theorem. According to the factorization of the joint distribution, the predictive power of this simple process model is limited, because the state transition mechanism considers only two neighboring time slices. In other words, if we have modeled the state sequence {O_1, ..., O_t} and we want to predict the future state of the stochastic process O_{t+1}, the simple Markovian chain model considers only the distribution of the probability mass related to O_t in conjunction with the single-step transition probabilities p_ij. The previous instances of the process are irrelevant, given the present state. This minimum chain model is also called a first-order Markov chain, because only two consecutive time slices are linked in the graphical process model. The first-order Markov chain can be considered as the minimum structure of a dynamic Bayesian network.

58 High-order Markov chains. A significantly larger predictive power of the chain model is possible (without recoding states, see 8th lecture!) if the present state of the chain at time t does not only depend on the state in the previous time slice (t-1) but also on additional time slices in the past of the process (t-2, t-3, ...). If the memory depth of the model is 2, it is called a second-order Markov chain and drawn as the chain O_1, O_2, O_3, ..., O_T in which each node receives arcs from its two predecessors. Clearly, the joint probability distribution of the second-order Markov chain can be factorized as: P(O_1, O_2, ..., O_T) = P(O_1) P(O_2 | O_1) P(O_3 | O_2, O_1) P(O_4 | O_3, O_2) ... P(O_T | O_{T-1}, O_{T-2}). 1. Proposition: The joint probability distribution of a discrete-state, discrete-time Markov chain of order k can be factorized in each time step T as: P(O_1, O_2, ..., O_T) = P(O_1) P(O_2 | O_1) ... P(O_k | O_{k-1}, ..., O_1) · P(O_{k+1} | O_k, ..., O_1) · P(O_{k+2} | O_{k+1}, ..., O_2) · ... · P(O_T | O_{T-1}, ..., O_{T-k}).

59 Markov chains with hidden variables (I). Markov chains (MC) of finite order k are able to simulate significant memory capacity, but the number of model parameters N(λ) (λ represents the parameter tuple) that are stored in the prior and conditional probability tables grows exponentially with the order. Consider a stochastic process with three states o_t ∈ {1, 2, 3}. We have: first-order MC: N_1 = (3-1) + 3·(3-1) (initial state probabilities plus transition matrix; rows must sum up to 1); second-order MC: N_2 = (3-1) + 3·(3-1) + 3²·(3-1); k-th order MC: N_k = (3-1) + 3·(3-1) + ... + 3^k·(3-1). In order to avoid this rapid growth of the number of parameters and to be able to model processes with latent dependency structures leading to long-range correlations, the approach of Markov chains with hidden variables was invented in engineering science. These Hidden Markov Models (HMM) distinguish a not directly observable state process {Q_t} that satisfies the Markov property and a non-Markovian observation process {O_t} that depends on the state process. The graph of this kind of dynamic Bayesian network with hidden (latent) state variables consists of the hidden chain Q_1 → Q_2 → Q_3 → ... → Q_T with an observation arc Q_t → O_t in every time slice t = 1, ..., T.

60 Application examples of HMM: phoneme and word recognition on the basis of adequately sampled and encoded acoustical spectra in speech recognition; classification of human behavior when interacting with anthropomorphic robots; prediction of event sequences in communication engineering and human-computer interaction. [Figure: function model of a speech recognition system of Prof. Schukat (Jena University).]

61 Markov chains with hidden variables (II). 2. Proposition: The joint probability distribution of a Hidden Markov Model can be factorized in each time step T as: P(Q_1, ..., Q_T, O_1, ..., O_T) = P(Q_1) P(O_1 | Q_1) ∏_{t=2}^{T} P(Q_t | Q_{t-1}) P(O_t | Q_t). 1. Def.: A discrete-time, discrete-state Hidden Markov Model is represented by the parameter tuple λ_HMM = (Q, O, Π, A, B), where: Q is a set of hidden states, mapped in the following onto the integers {1, 2, ..., J}; O is a set of observable states, mapped in the following onto the integers {1, 2, ..., K}; Π = (π_1, ..., π_J) encodes the start vector indicating the initial distribution of the probability mass over the hidden states, with π_j = P(Q_1 = j) (j = 1...J); A = (a_ij) with a_ij = P(Q_t = j | Q_{t-1} = i) encodes the transition matrix of the hidden process (i, j = 1...J); B = (b_jk) with b_jk = P(O_t = k | Q_t = j) encodes the emission matrix of the observable states given the hidden states (j = 1...J, k = 1...K). Therefore, the distribution of the probability mass Π(t) over the hidden states in time step t, given the initial distribution Π, transition matrix A and emission matrix B, is Π(t) = Π A^(t-1), and the distribution over the observable states is P(O_t) = Π(t) B = Π A^(t-1) B.

62 HMM example. A fluid in a chemical reactor has two states Q = {1 (non-toxic), 2 (toxic)}. According to the molecular properties of the fluid, its state can change spontaneously (e.g. due to temperature jitters) from the non-toxic state to the toxic state with probability p_12 = 0.01 at any time instant. This state switching is irreversible. Laboratory studies have shown that the temporal unfolding of the state process can be represented with a sufficiently high level of accuracy by a first-order Markov chain model. Initially, the fluid is filled into the reactor in the non-toxic state. The measurement of the state of the fluid can only be carried out with the help of an integrated sensor; a direct state observation is not possible. The sensor is fast enough to finish the measurement in the same time instant. The sensor identifies the toxic state with a reliability of 99.9% and the non-toxic state with a reliability of 95%. How is the probability mass distributed over the observable states in time step t = 4 when the system is initialized in the non-toxic state? Solution: Π = (1, 0), A = [[0.99, 0.01], [0, 1]], B = [[0.95, 0.05], [0.001, 0.999]], and P(O_{t=4}) = Π A³ B.
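Evaluating Π A³ B numerically (a sketch; the matrices are exactly those implied by the problem statement above):

    import numpy as np
    from numpy.linalg import matrix_power

    # Reactor example: distribution over the observable sensor states at t = 4.
    Pi = np.array([1.0, 0.0])            # initially non-toxic
    A  = np.array([[0.99, 0.01],         # spontaneous, irreversible switch to toxic
                   [0.00, 1.00]])
    B  = np.array([[0.95, 0.05],         # P(O | Q = non-toxic)
                   [0.001, 0.999]])      # P(O | Q = toxic)

    obs_dist_t4 = Pi @ matrix_power(A, 3) @ B
    print(obs_dist_t4.round(4))          # roughly [0.9218 0.0782]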

63 5. Formalism of Dynamic Bayesian Networks

64 Network definition. 2. Def.: A discrete-state, discrete-time dynamic Bayesian network is represented by the parameter tuple λ_DBN = (G_1, G_tr, {Π_i}_{i∈{1,...,I}}, {CPT_j}_{j∈{1,...,J}}), where: G_1 is a directed, acyclic graph of start nodes in the first time slice (t = 1) encoding the initial distribution of the probability mass, with the same meaning as a static Bayesian network (each node is conditionally independent of its non-descendants, given its parents); G_tr is a directed, acyclic graph of transition nodes in replicated time slices encoding the transition probabilities between time steps, again with the same meaning as a static Bayesian network; Π_i = (π^i_km) encode the start vectors or start matrices of observable as well as hidden random variables X^i_1 of the start nodes in the first time slice (t = 1), with the components π^i_1m = P(X^i_1 = m) (i = 1, ..., |G_1|; m = 1, ..., |X^i_1|) if X^i_1 is a root node, or π^i_km = P(X^i_1 = m | Parents(X^i_1) = w^1_k) (k = 1, ..., |Parents(X^i_1)|) if X^i_1 is not a root node; CPT_j = (a^j_km) encode the transition matrices regarding observable as well as hidden random variables X^j_tr in the replicated time slices (t = 2, 3, ...), with the components a^j_km = P(X^j_tr = m | Parents(X^j_tr) = w^tr_k) (j = 1, ..., |G_tr(t=2)| - 1; k = 1, ..., |Parents(X^j_tr)|; m = 1, ..., |X^j_tr|).

65 Basic DBN structures with parameterization for binary states (I). 1. First-order Markov chain: G_1 contains O_1, G_tr contains the transition O_{t-1} → O_t. Π = (P(O_1 = 1), P(O_1 = 2)); CPT_1 = [[P(O_t = 1 | O_{t-1} = 1), P(O_t = 2 | O_{t-1} = 1)], [P(O_t = 1 | O_{t-1} = 2), P(O_t = 2 | O_{t-1} = 2)]] = P. 2. HMM: G_1 contains Q_1, G_tr contains Q_{t-1} → Q_t and Q_t → O_t. Π = (P(Q_1 = 1), P(Q_1 = 2)); CPT_1 = [[P(Q_t = 1 | Q_{t-1} = 1), P(Q_t = 2 | Q_{t-1} = 1)], [P(Q_t = 1 | Q_{t-1} = 2), P(Q_t = 2 | Q_{t-1} = 2)]] = A; CPT_2 = [[P(O_t = 1 | Q_t = 1), P(O_t = 2 | Q_t = 1)], [P(O_t = 1 | Q_t = 2), P(O_t = 2 | Q_t = 2)]] = B.

66 Basic DBN structures with parameterization for binary states (II). 3. Autoregressive HMM: G_1 contains Q_1 → O_1, G_tr contains Q_{t-1} → Q_t, Q_t → O_t and O_{t-1} → O_t. Π = (P(Q_1 = 1), P(Q_1 = 2)); CPT_1 = [[P(Q_t = 1 | Q_{t-1} = 1), P(Q_t = 2 | Q_{t-1} = 1)], [P(Q_t = 1 | Q_{t-1} = 2), P(Q_t = 2 | Q_{t-1} = 2)]] = A; Π_2 = [[P(O_1 = 1 | Q_1 = 1), P(O_1 = 2 | Q_1 = 1)], [P(O_1 = 1 | Q_1 = 2), P(O_1 = 2 | Q_1 = 2)]]; CPT_2 = [P(O_t = k | Q_t = j, O_{t-1} = l)] with one row per parent combination (Q_t, O_{t-1}) ∈ {1, 2}² and columns O_t = 1, 2, i.e. the entries P(O_t = 1 | Q_t = 1, O_{t-1} = 1), P(O_t = 2 | Q_t = 1, O_{t-1} = 1), ..., P(O_t = 2 | Q_t = 2, O_{t-1} = 2).

67 Basic DBN structures with parameterization for binary states (III). 4. Factorial HMM: two hidden chains Q¹ and Q² with Q¹_{t-1} → Q¹_t and Q²_{t-1} → Q²_t, and a common observation O_t with Q¹_t → O_t and Q²_t → O_t. Π_1 = (P(Q¹_1 = 1), P(Q¹_1 = 2)); Π_2 = (P(Q²_1 = 1), P(Q²_1 = 2)); CPT_1 = [[P(Q¹_t = 1 | Q¹_{t-1} = 1), P(Q¹_t = 2 | Q¹_{t-1} = 1)], [P(Q¹_t = 1 | Q¹_{t-1} = 2), P(Q¹_t = 2 | Q¹_{t-1} = 2)]]; CPT_2 = [[P(Q²_t = 1 | Q²_{t-1} = 1), P(Q²_t = 2 | Q²_{t-1} = 1)], [P(Q²_t = 1 | Q²_{t-1} = 2), P(Q²_t = 2 | Q²_{t-1} = 2)]]; CPT_3 = [P(O_t = k | Q¹_t = i, Q²_t = j)] with one row per parent combination (Q¹_t, Q²_t) ∈ {1, 2}² and columns O_t = 1, 2.

68 Factorization in DBN. 3. Def.: A DBN over two consecutive time slices with the aggregated random state variables X_{t-1} = (X¹_{t-1}, X²_{t-1}, ..., Xⁿ_{t-1}) and X_t = (X¹_t, X²_t, ..., Xⁿ_t) is a net fragment G_tr and represents the probability distribution of a transition model according to P_tr(X_t | X_{t-1}) = ∏_{i=1}^{n} P(X^i_t | Parents(X^i_t)). 3. Proposition: The joint probability distribution of the aggregated random state variables of a DBN can be factorized over the time slices 1, ..., T according to: P_DBN(X_1, ..., X_T) = P_1(X_1) ∏_{t=2}^{T} P_tr(X_t | X_{t-1}), where P_1(·) represents the initial probability distribution of the aggregated state variables in the first time slice and P_tr(·) represents the transition model defined by Def. 3.

69 Questions? Open questions?


More information

CS 5522: Artificial Intelligence II

CS 5522: Artificial Intelligence II CS 5522: Artificial Intelligence II Bayes Nets: Independence Instructor: Alan Ritter Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley. All materials available at http://ai.berkeley.edu.]

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Chapter 8&9: Classification: Part 3 Instructor: Yizhou Sun yzsun@ccs.neu.edu March 12, 2013 Midterm Report Grade Distribution 90-100 10 80-89 16 70-79 8 60-69 4

More information

Bayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018

Bayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Bayesian networks CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Soleymani Slides have been adopted from Klein and Abdeel, CS188, UC Berkeley. Outline Probability

More information

Hidden Markov Models Part 1: Introduction

Hidden Markov Models Part 1: Introduction Hidden Markov Models Part 1: Introduction CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Modeling Sequential Data Suppose that

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

CS188 Outline. We re done with Part I: Search and Planning! Part II: Probabilistic Reasoning. Part III: Machine Learning

CS188 Outline. We re done with Part I: Search and Planning! Part II: Probabilistic Reasoning. Part III: Machine Learning CS188 Outline We re done with Part I: Search and Planning! Part II: Probabilistic Reasoning Diagnosis Speech recognition Tracking objects Robot mapping Genetics Error correcting codes lots more! Part III:

More information

Learning in Bayesian Networks

Learning in Bayesian Networks Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

More information

Course Introduction. Probabilistic Modelling and Reasoning. Relationships between courses. Dealing with Uncertainty. Chris Williams.

Course Introduction. Probabilistic Modelling and Reasoning. Relationships between courses. Dealing with Uncertainty. Chris Williams. Course Introduction Probabilistic Modelling and Reasoning Chris Williams School of Informatics, University of Edinburgh September 2008 Welcome Administration Handout Books Assignments Tutorials Course

More information

Hidden Markov models

Hidden Markov models Hidden Markov models Charles Elkan November 26, 2012 Important: These lecture notes are based on notes written by Lawrence Saul. Also, these typeset notes lack illustrations. See the classroom lectures

More information

Introduction to Probabilistic Reasoning. Image credit: NASA. Assignment

Introduction to Probabilistic Reasoning. Image credit: NASA. Assignment Introduction to Probabilistic Reasoning Brian C. Williams 16.410/16.413 November 17 th, 2010 11/17/10 copyright Brian Williams, 2005-10 1 Brian C. Williams, copyright 2000-09 Image credit: NASA. Assignment

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Particle Filters and Applications of HMMs Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro

More information

Stochastic inference in Bayesian networks, Markov chain Monte Carlo methods

Stochastic inference in Bayesian networks, Markov chain Monte Carlo methods Stochastic inference in Bayesian networks, Markov chain Monte Carlo methods AI: Stochastic inference in BNs AI: Stochastic inference in BNs 1 Outline ypes of inference in (causal) BNs Hardness of exact

More information

LEARNING WITH BAYESIAN NETWORKS

LEARNING WITH BAYESIAN NETWORKS LEARNING WITH BAYESIAN NETWORKS Author: David Heckerman Presented by: Dilan Kiley Adapted from slides by: Yan Zhang - 2006, Jeremy Gould 2013, Chip Galusha -2014 Jeremy Gould 2013Chip Galus May 6th, 2016

More information

CS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016

CS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional

More information

Objectives. Probabilistic Reasoning Systems. Outline. Independence. Conditional independence. Conditional independence II.

Objectives. Probabilistic Reasoning Systems. Outline. Independence. Conditional independence. Conditional independence II. Copyright Richard J. Povinelli rev 1.0, 10/1//2001 Page 1 Probabilistic Reasoning Systems Dr. Richard J. Povinelli Objectives You should be able to apply belief networks to model a problem with uncertainty.

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 16: Bayes Nets IV Inference 3/28/2011 Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Announcements

More information

CS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine

CS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine CS 484 Data Mining Classification 7 Some slides are from Professor Padhraic Smyth at UC Irvine Bayesian Belief networks Conditional independence assumption of Naïve Bayes classifier is too strong. Allows

More information

Our Status. We re done with Part I Search and Planning!

Our Status. We re done with Part I Search and Planning! Probability [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.] Our Status We re done with Part

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

Uncertainty. Logic and Uncertainty. Russell & Norvig. Readings: Chapter 13. One problem with logical-agent approaches: C:145 Artificial

Uncertainty. Logic and Uncertainty. Russell & Norvig. Readings: Chapter 13. One problem with logical-agent approaches: C:145 Artificial C:145 Artificial Intelligence@ Uncertainty Readings: Chapter 13 Russell & Norvig. Artificial Intelligence p.1/43 Logic and Uncertainty One problem with logical-agent approaches: Agents almost never have

More information

Bayesian networks. Chapter 14, Sections 1 4

Bayesian networks. Chapter 14, Sections 1 4 Bayesian networks Chapter 14, Sections 1 4 Artificial Intelligence, spring 2013, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 14, Sections 1 4 1 Bayesian networks

More information

Probabilistic Graphical Models and Bayesian Networks. Artificial Intelligence Bert Huang Virginia Tech

Probabilistic Graphical Models and Bayesian Networks. Artificial Intelligence Bert Huang Virginia Tech Probabilistic Graphical Models and Bayesian Networks Artificial Intelligence Bert Huang Virginia Tech Concept Map for Segment Probabilistic Graphical Models Probabilistic Time Series Models Particle Filters

More information

Bayes Nets III: Inference

Bayes Nets III: Inference 1 Hal Daumé III (me@hal3.name) Bayes Nets III: Inference Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 10 Apr 2012 Many slides courtesy

More information

p(d θ ) l(θ ) 1.2 x x x

p(d θ ) l(θ ) 1.2 x x x p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to

More information

Probabilistic Graphical Networks: Definitions and Basic Results

Probabilistic Graphical Networks: Definitions and Basic Results This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical

More information

Bayes Networks. CS540 Bryan R Gibson University of Wisconsin-Madison. Slides adapted from those used by Prof. Jerry Zhu, CS540-1

Bayes Networks. CS540 Bryan R Gibson University of Wisconsin-Madison. Slides adapted from those used by Prof. Jerry Zhu, CS540-1 Bayes Networks CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 59 Outline Joint Probability: great for inference, terrible to obtain

More information

Template-Based Representations. Sargur Srihari

Template-Based Representations. Sargur Srihari Template-Based Representations Sargur srihari@cedar.buffalo.edu 1 Topics Variable-based vs Template-based Temporal Models Basic Assumptions Dynamic Bayesian Networks Hidden Markov Models Linear Dynamical

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Graphical Models Adrian Weller MLSALT4 Lecture Feb 26, 2016 With thanks to David Sontag (NYU) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,

More information

Probabilistic Reasoning Systems

Probabilistic Reasoning Systems Probabilistic Reasoning Systems Dr. Richard J. Povinelli Copyright Richard J. Povinelli rev 1.0, 10/7/2001 Page 1 Objectives You should be able to apply belief networks to model a problem with uncertainty.

More information

Introduction to Artificial Intelligence (AI)

Introduction to Artificial Intelligence (AI) Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 9 Oct, 11, 2011 Slide credit Approx. Inference : S. Thrun, P, Norvig, D. Klein CPSC 502, Lecture 9 Slide 1 Today Oct 11 Bayesian

More information

CS 188: Artificial Intelligence Spring 2009

CS 188: Artificial Intelligence Spring 2009 CS 188: Artificial Intelligence Spring 2009 Lecture 21: Hidden Markov Models 4/7/2009 John DeNero UC Berkeley Slides adapted from Dan Klein Announcements Written 3 deadline extended! Posted last Friday

More information

CS 7180: Behavioral Modeling and Decision- making in AI

CS 7180: Behavioral Modeling and Decision- making in AI CS 7180: Behavioral Modeling and Decision- making in AI Bayesian Networks for Dynamic and/or Relational Domains Prof. Amy Sliva October 12, 2012 World is not only uncertain, it is dynamic Beliefs, observations,

More information

3 : Representation of Undirected GM

3 : Representation of Undirected GM 10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:

More information

Outline. CSE 573: Artificial Intelligence Autumn Bayes Nets: Big Picture. Bayes Net Semantics. Hidden Markov Models. Example Bayes Net: Car

Outline. CSE 573: Artificial Intelligence Autumn Bayes Nets: Big Picture. Bayes Net Semantics. Hidden Markov Models. Example Bayes Net: Car CSE 573: Artificial Intelligence Autumn 2012 Bayesian Networks Dan Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer Outline Probabilistic models (and inference)

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed

More information

CS 188: Artificial Intelligence Fall Recap: Inference Example

CS 188: Artificial Intelligence Fall Recap: Inference Example CS 188: Artificial Intelligence Fall 2007 Lecture 19: Decision Diagrams 11/01/2007 Dan Klein UC Berkeley Recap: Inference Example Find P( F=bad) Restrict all factors P() P(F=bad ) P() 0.7 0.3 eather 0.7

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information