Bayesian networks. 鮑興國 Ph.D. National Taiwan University of Science and Technology


1 Bayesian networks 鮑興國 Ph.D. National Taiwan University of Science and Technology

2 Outline Introduction to Bayesian networks Bayesian networks: definition, d-separation, equivalence of networks Examples: causal graphs, explaining away, Markov chains, Naïve Bayes, etc. Undirected models Probabilistic inference: node elimination, junction tree Building the networks

3 Probabilistic Graphical Models Combination of graph theory and probability theory. Informally, the graph structure specifies which parts of the system are directly dependent, and local functions at each node specify how the parts interact. More formally, the graph encodes conditional independence assumptions, and the local functions at each node are factors in the joint probability distribution. E.g.: Bayesian networks = PGMs based on directed acyclic graphs. E.g.: Markov networks (Markov random fields) = PGMs with undirected graphs. Others: discrete models, continuous models, Gaussian graphical models, hybrid Bayesian networks, CG models, etc.

4 Applications of PGMs Machine learning, statistics, speech recognition, natural language processing, computer vision, error-control codes, bioinformatics, medical diagnosis, financial predictions, etc.

5 Why Probability? Why probabilistic graphical models? Why can a probabilistic approach help us in machine learning? Guess 1: the world is nondeterministic; or Guess 2: the world is deterministic, but we lack the relevant facts/attributes to decide the answer. A probabilistic model can make inferences about missing inputs; classification is one example (the missing value is y!). Either way, the probabilistic approach can make decisions that minimize expected loss.

6 Probabilistic Approach Classification and regression: conditional density estimation P(Y | X), then apply Bayes' rule. Unsupervised learning: density estimation P(X). Imagining Y as a latent variable, classification is no more than missing-value filling. [Figure: a data matrix with rows 1..N, feature columns x_1..x_d and x_{d+1}..x_{d+k}, and a label column y to be filled in.] Step 1: building the model; step 2: inference.

7 Why Graphical Models? We discuss this issue later on. Basically, graphical models help us deal with complicated problems in a modularized and visualized way. That is, we break problems into subproblems recursively until we can solve them.

8 Welcome to Hotel California! Nodes: Cloudy (C), Sprinkler (S), Rain (R), Wet grass (W), Roof (F). P(C, S, R, W, F) = P(C) P(S|C) P(R|C, S) P(W|C, S, R) P(F|C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S, R) P(F|R). Conditional Probability Table (CPT) for W: P(W|R, S) = 0.95, P(W|R, ~S) = 0.90, P(W|~R, S) = 0.90, P(W|~R, ~S) = 0.10. The simplifications used above: P(R|C, S) = P(R|C), P(W|C, S, R) = P(W|S, R), P(F|C, S, R, W) = P(F|R).

9 Joint Probability Goal 1: represent a joint distribution P(X = x) = P(X_1 = x_1, ..., X_n = x_n) compactly, even when there are many variables. Goal 2: efficiently calculate marginals and conditionals of such compactly represented joint distributions. For n discrete variables of arity k, the naïve table representation is HUGE: it requires k^n entries. We need to make some assumptions about the distribution. One simple assumption: independence = complete factorization, P(X) = Π_i P(X_i). But the independence assumption is too restrictive, so we make conditional independence assumptions instead.

10 Conditional Independence Notation: X_A ⊥ X_B | X_C. Definition: two sets of variables X_A and X_B are conditionally independent given a third, X_C, if P(X_A, X_B | X_C) = P(X_A | X_C) P(X_B | X_C) for all X_C, which is equivalent to saying P(X_A | X_B, X_C) = P(X_A | X_C) for all X_C. Only a subset of all distributions respect any given (nontrivial) conditional independence statement. The subset of distributions that respect all the CI assumptions we make is the family of distributions consistent with our assumptions. Bayesian networks (probabilistic graphical models) are a powerful, elegant and simple way to specify such a family.

11 Between Simple and Complex Models Given n r.v.'s X_1, X_2, ..., X_n with state space {1, 2, 3}, consider the degree of freedom (or complexity) of the model needed to describe the joint probabilities. Most general: df = 3^n − 1 (not efficient in time and space). Bayesian networks lie in between. Independent: df = 2n (P(X_i = 1), P(X_i = 2) for each i). i.i.d.: df = 2 (P(X = 1), P(X = 2)).

12 Bayesian (Belief) Networks Bayesian networks (graphical models) are an intermediate approach: the i.i.d. assumption is too restrictive, while the most general case is ineffective. BNs (GMs) represent large joint distributions compactly using a set of local relationships specified by a graph. The joint probability can then be factorized into such local relationships. A BN describes conditional independence among subsets of variables; conditional probabilities yield joint probabilities via Bayes' theorem. Bayesian networks (directed models) vs. Markov random fields (MRFs, undirected models): belief networks use conditional probabilities, while Markov random fields use potential functions.

13 Part I Bayesian Networks: Introduction to Directed Models and Undirected Models

14 Outline Introduction to Bayesian networks Bayesian networks: definition, d-separation, equivalence of networks Examples: causal graphs, explaining away, Markov chains, Naïve Bayes, etc. Undirected models Probabilistic inference: node elimination, junction tree Building the networks

15 Bayesian Networks A.K.A. belief network, directed graphical model. Definition: a set of nodes representing random variables and a set of directed edges between variables; each variable has a finite set of mutually exclusive states; to each variable X with parents Y_1, Y_2, ..., Y_n there is attached the conditional probability table P(X | Y_1, Y_2, ..., Y_n).

16 Bayesian Networks (cont.) Informally, edges represent causation. The variables together with the directed edges form a Directed Acyclic Graph (DAG): no cycles allowed. X → Y: P(X, Y) = P(X) P(Y|X); X ← Y: P(X, Y) = P(Y) P(X|Y). [Figure: example DAGs over nodes X_1, X_2, X_3, U, V, W, Z; a graph with a directed cycle is not allowed.]

17 Factorization Consider directed acyclic graphs over n variables. Each node has a (possibly empty) set of parents X_{π_i}. Each node maintains a function f_i(X_i; X_{π_i}) such that f_i > 0 and Σ_{x_i} f_i(X_i = x_i; X_{π_i}) = 1 for every value of X_{π_i}. Define the joint probability to be P(X_1, X_2, ..., X_n) = Π_i f_i(X_i; X_{π_i}). Even with no further restriction on the f_i, it is always true that f_i(X_i; X_{π_i}) = P(X_i | X_{π_i}), so we will just write P(X_1, X_2, ..., X_n) = Π_i P(X_i | X_{π_i}). Factorization of the joint in terms of local conditional probabilities: exponential in the fan-in of each node instead of in the total number of variables n.

18 So by the Alternative Definition For the six-node network (X_1 → X_2, X_1 → X_3, X_2 → X_4, X_3 → X_5, X_2 → X_6, X_5 → X_6):
P(X_1, X_2, X_3, X_4, X_5, X_6) = Π_i f_i(X_i; X_{π_i}) = P(X_1) P(X_2|X_1) P(X_3|X_1) P(X_4|X_2) P(X_5|X_3) P(X_6|X_2, X_5).
P(X_2, X_5, X_6) = Σ_{X_1, X_3, X_4} f_1(X_1) f_2(X_2; X_1) f_3(X_3; X_1) f_4(X_4; X_2) f_5(X_5; X_3) f_6(X_6; X_2, X_5) = f_6(X_6; X_2, X_5) Σ_{X_1, X_3, X_4} f_1(X_1) f_2(X_2; X_1) f_3(X_3; X_1) f_4(X_4; X_2) f_5(X_5; X_3).
P(X_2, X_5) = Σ_{X_6} Σ_{X_1, X_3, X_4} f_1(X_1) f_2(X_2; X_1) f_3(X_3; X_1) f_4(X_4; X_2) f_5(X_5; X_3) f_6(X_6; X_2, X_5) = Σ_{X_1, X_3, X_4} f_1(X_1) f_2(X_2; X_1) f_3(X_3; X_1) f_4(X_4; X_2) f_5(X_5; X_3).
Therefore P(X_6 | X_2, X_5) = P(X_2, X_5, X_6) / P(X_2, X_5) = f_6(X_6; X_2, X_5).
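A minimal Python sketch of this check, assuming binary variables and random CPTs (the graph structure is the six-node example above; everything else here is illustrative):

import itertools
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(n_parents, k=2):
    # Random conditional table P(child | parents): shape (k,)*n_parents + (k,), normalized over the last axis
    t = rng.random((k,) * (n_parents + 1))
    return t / t.sum(axis=-1, keepdims=True)

# CPTs for the six-node example: X1; X2|X1; X3|X1; X4|X2; X5|X3; X6|X2,X5 (states 0/1)
f1, f2, f3 = random_cpt(0), random_cpt(1), random_cpt(1)
f4, f5, f6 = random_cpt(1), random_cpt(1), random_cpt(2)

def joint(x1, x2, x3, x4, x5, x6):
    return (f1[x1] * f2[x1, x2] * f3[x1, x3] *
            f4[x2, x4] * f5[x3, x5] * f6[x2, x5, x6])

states = list(itertools.product([0, 1], repeat=6))
print("sum of joint:", sum(joint(*s) for s in states))   # ~1.0

# Verify P(X6 | X2, X5) recovered from the joint equals f6, e.g. at x2=1, x5=0, x6=1
num = sum(joint(*s) for s in states if (s[1], s[4], s[5]) == (1, 0, 1))
den = sum(joint(*s) for s in states if (s[1], s[4]) == (1, 0))
print(num / den, "==", f6[1, 0, 1])

The printed ratio agrees with f_6(X_6; X_2, X_5) exactly (up to floating-point error), which is the content of the derivation above.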

19 Conditional Independence in DAGs If we order the nodes in a directed graphical model so that parents always come before their children in the ordering, then the graphical model implies the following about the distribution: {X_i ⊥ X_{ν_i} | X_{π_i}} for all i, where X_{ν_i} are the nodes coming before X_i that are not its parents. In other words, the DAG is telling us that each variable is conditionally independent of its nondescendants given its parents; this is called the local Markov property. Such an ordering is called a topological ordering.

20 From the Definition (again!) Prove: X_4 ⊥ X_1 | X_2.
P(X_[1..4]) = P(X_1) P(X_2|X_1) P(X_3|X_1) P(X_4|X_2) Σ_{X_5} P(X_5|X_3) Σ_{X_6} P(X_6|X_2, X_5) = P(X_1) P(X_2|X_1) P(X_3|X_1) P(X_4|X_2).
P(X_[1..3]) = P(X_1) P(X_2|X_1) P(X_3|X_1) Σ_{X_4} P(X_4|X_2) Σ_{X_5} P(X_5|X_3) Σ_{X_6} P(X_6|X_2, X_5) = P(X_1) P(X_2|X_1) P(X_3|X_1).
So P(X_4 | X_[1..3]) = P(X_4|X_2), and P(X_4 | X_1, X_2) = P(X_4|X_2) (why?).

21 Saving in Time & Space Conditional independence in the six-node network, e.g.: X_4 ⊥ X_1 | X_2; X_5 ⊥ X_1 | X_3. The idea can be extended to sets, e.g.: X_6 ⊥ X_3 | {X_2, X_5}; {X_5, X_6} ⊥ X_1 | {X_2, X_3}. By the chain rule and these conditional independencies, P(X_[1..6]) = P(X_1) P(X_2|X_1) P(X_3|X_1, X_2) P(X_4|X_[1..3]) P(X_5|X_[1..4]) P(X_6|X_[1..5]) = P(X_1) P(X_2|X_1) P(X_3|X_1) P(X_4|X_2) P(X_5|X_3) P(X_6|X_2, X_5).

22 Saving in Time & Space (cont.) P(X_1, X_2, X_3, X_4, X_5, X_6) = P(X_1) P(X_2|X_1) P(X_3|X_1) P(X_4|X_2) P(X_5|X_3) P(X_6|X_2, X_5). Missing edges imply conditional independence. Each entry/cell of a table represents one conditional probability combination. The biggest table is the bottleneck of the computation.

23 Example: Causal Graphs Rain → Wet grass, with P(R) = 0.4, P(W|R) = 0.9, P(W|~R) = 0.2. P(R|W) = P(W|R) P(R) / P(W) = 0.9 × 0.4 / (0.9 × 0.4 + 0.2 × 0.6) = 0.75. In general, fewer values need to be specified for the whole distribution than in the unconstrained case: O(n 2^k) instead of O(2^n), where k is the maximum fan-in of a node.

24 Example: Complete Graph P(X_1, X_2, X_3, X_4, X_5) = P(X_1) P(X_2|X_1) P(X_3|X_1, X_2) P(X_4|X_1, X_2, X_3) P(X_5|X_1, X_2, X_3, X_4). A network represents a relationship between r.v.'s; however, the representation of this relationship is not unique; e.g., different orderings of the X_i can produce different networks.

25 Example: Markov Chain X_1 → X_2 → X_3 → X_4 → X_5. P(X) = P(X_1, X_2, ..., X_n) = P(X_n | X_{n−1}, ..., X_1) P(X_{n−1} | X_{n−2}, ..., X_1) ··· P(X_2 | X_1) P(X_1) = P(X_n | X_{n−1}) P(X_{n−1} | X_{n−2}) ··· P(X_2 | X_1) P(X_1) = P(X_1) Π_{i=2}^{n} P(X_i | X_{i−1}).

26 Naïve Bayes What is the connection between a BN and classification? Suppose one of the variables is the target variable. Can we compute the probability of the target variable given the other variables? In Naïve Bayes, the concept C_j is the parent of X_1, X_2, ..., X_n. The edge directions are not the same as the direction of classification! P(X_1, X_2, ..., X_n, C_j) = P(C_j) P(X_1|C_j) P(X_2|C_j) ··· P(X_n|C_j).

27 Explaining Away Sprinkler (S) and Rain (R) are both parents of Wet grass (W), with P(S) = 0.2, P(R) = 0.4, and P(W|R, S) = 0.95, P(W|R, ~S) = 0.90, P(W|~R, S) = 0.90, P(W|~R, ~S) = 0.10. P(S|W) = P(W|S) P(S) / P(W) = 0.92 × 0.2 / 0.52 ≈ 0.35. P(S|R, W) = P(W|R, S) P(S|R) / P(W|R) = P(W|R, S) P(S) / P(W|R) ≈ 0.21 < P(S|W). Knowing it rained, the probability that the sprinkler is on (as the cause of the wet grass) decreases, given that the grass is wet. Berkson's paradox!
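A minimal sketch in Python that reproduces these numbers by brute-force enumeration of the joint (using the CPT values on this slide, with S and R marginally independent):

from itertools import product

P_S = {1: 0.2, 0: 0.8}
P_R = {1: 0.4, 0: 0.6}
P_W = {(1, 1): 0.95, (1, 0): 0.90, (0, 1): 0.90, (0, 0): 0.10}  # P(W=1 | r, s)

def joint(s, r, w):
    pw = P_W[(r, s)]
    return P_S[s] * P_R[r] * (pw if w == 1 else 1.0 - pw)

def query(target, evidence):
    # P(target = 1 | evidence) by summing the full joint table
    num = den = 0.0
    for s, r, w in product([0, 1], repeat=3):
        x = {"S": s, "R": r, "W": w}
        if any(x[k] != v for k, v in evidence.items()):
            continue
        p = joint(s, r, w)
        den += p
        if x[target] == 1:
            num += p
    return num / den

print(query("S", {"W": 1}))           # ~0.35
print(query("S", {"W": 1, "R": 1}))   # ~0.21  (explaining away)
print(query("R", {"W": 1}))           # ~0.70

Conditioning on R = 1 lowers the posterior on S from about 0.35 to about 0.21, which is exactly the explaining-away effect described above.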

28 Directed Separation: Definition Definition (d-separation): a path p is said to be d-separated (or blocked) if and only if p contains a chain X → Y → Z or a fork X ← Y → Z and Y is instantiated (shaded), or p contains an inverted fork (collider) X → Y ← Z and neither Y nor any descendant of Y has received evidence.

29 Directed Separation: Definition (cont.) A set A is said to be d-separated from a set B if every path from a node in A to a node in B is d-separated. (A and B are connected if they are not separated!) A set A is said to be d-separated from a set B given a set C if every path from a node in A to a node in B is d-separated, given that all nodes in C are instantiated. (A and B are connected given C if they are not separated given C!)

30 D-separation and Conditional Independence G = set of C.I. rules = family of distributions. Probabilistic implications of d-separation: if sets A and B are d-separated given C in a DAG G, then A is independent of B conditional on C in every distribution compatible with G. Conversely, if A and B are not d-separated given C in a DAG G, then A and B are dependent conditional on C in at least one distribution compatible with G.

31 D-separation: Example I [Figure: two example DAGs, one over X, Y, Z_1, Z_2, Z_3 and one over W, R, S, T, X, Y, Z.] X and Y are d-separated given Z_1 and d-connected given Z_2. X and R are d-separated by {Y, Z_2}. X and T are d-separated by {Y, Z_2}. W and T are d-separated by {R}. W and X are not d-separated by Y. W and X are d-separated by the empty set.

32 D-separation: Example II Is X d-separated from Y? [Figure: a path from X to Y with evidence candidates e_1, e_2, e_3, e_4, e_5.] d-separated: given no evidence, or e_1, or e_3, or only e_4. d-connected: given only e_2 or only e_5 (provided the rest of the path is connected).

33 Bayes Bouncing Ball Rules A is d-separated from B given C if we cannot send a ball from any node in A to any node in B according to the rules below, where shaded nodes are in C. Case 1: chain X → Y → Z (the ball passes through an unshaded Y, bounces back from a shaded Y). Case 2: fork X ← Y → Z (the ball passes through an unshaded Y, bounces back from a shaded Y). Case 3: collider X → Y ← Z (the ball bounces back at an unshaded Y, passes through a shaded Y).

34 Bayes Ball Rules (cont.) Boundary conditions: Case 1: X → Y; Case 2: X ← Y (not really necessary). [Figure: the ball at boundary nodes, with X as an ancestor or a descendant of Y.]

35 Undirected Graphical Models A.K.A. Markov Random Fields, Markov networks. Also graphs with one node per random variable and edges that connect pairs of nodes, but now the edges are undirected. Semantics: every node set is conditionally independent from its non-neighbours given its neighbours, i.e. X_A ⊥ X_C | X_B if every path between X_A and X_C goes through X_B. Can model symmetric interactions that directed models cannot!

36 Simple Graph Separation In undirected models, simple graph separation (as opposed to d-separation) tells us about conditional independencies: X_A ⊥ X_C | X_B if every path between X_A and X_C is blocked by some node in X_B. Markov Ball algorithm: remove X_B and see if there is any path from X_A to X_C.
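A small sketch of the Markov ball idea in Python: delete the conditioning nodes and search for any remaining path (the edge list and node labels are made up for illustration):

from collections import deque

def separated(edges, A, C, B):
    # Check X_A ⊥ X_C | X_B by graph separation: remove the nodes in B,
    # then look for any remaining path from A to C (breadth-first search).
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = set(A) - set(B)
    frontier = deque(seen)
    while frontier:
        u = frontier.popleft()
        if u in C:
            return False              # found a path that avoids B
        for v in adj.get(u, ()):
            if v not in B and v not in seen:
                seen.add(v)
                frontier.append(v)
    return True

# Chain 1 - 2 - 3 - 4: nodes 1 and 4 are separated by {3} but not by the empty set
edges = [(1, 2), (2, 3), (3, 4)]
print(separated(edges, A={1}, C={4}, B={3}))    # True
print(separated(edges, A={1}, C={4}, B=set()))  # False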

37 Conditional Parameterization? In directed models, we started with p(x) = Π_i p(x_i | x_{π_i}) and we derived the d-separation semantics from that. Undirected models: we have the semantics, we need a parameterization. What about this conditional parameterization? p(x) = Π_i p(x_i | x_{neighbors(i)}). Good: product of local functions. Good: each one has a simple conditional interpretation. Bad: the local functions cannot be arbitrary, but must agree properly in order to define a valid distribution.

38 Marginal Parameterization? OK, in directed models we had p(x) = Π_i p(x_i | x_{π_i}); what about this marginal parameterization? p(x) = Π_i p(x_i, x_{neighbors(i)}). Good: product of local functions. Good: each one has a simple marginal interpretation. Bad: only very few pathological marginals on overlapping nodes can be multiplied to give a valid joint.

39 Interpretation of Clique Potentials The chain X — Y — Z implies X ⊥ Z | Y, i.e. P(X, Y, Z) = P(Y) P(X|Y) P(Z|Y). We can write this as P(X, Y, Z) = P(X, Y) P(Z|Y) = ψ_xy(X, Y) ψ_yz(Y, Z), or as P(X, Y, Z) = P(X|Y) P(Z, Y) = ψ_xy(X, Y) ψ_yz(Y, Z): we cannot have all potentials be marginals, and we cannot have all potentials be conditionals. The positive clique potentials can only be thought of as general compatibility, goodness or happiness functions over their variables, but not as probability distributions.

40 When Causality is Not Playing a Role! ψ: potential functions; X_P, X_Q, X_R: maximal cliques (X_P = {A, B, C, D}, X_Q = {D, E}, X_R = {E, F, G}). P(A, B, C, D, E, F, G) = P(A, B, C, D) P(E, F, G | A, B, C, D) = P(A, B, C, D) P(E, F, G | D) = P(A, B, C, D) P(E|D) P(F, G | D, E) = P(A, B, C, D) P(E|D) P(F, G | E) = ψ(A, B, C, D) ψ(D, E) ψ(E, F, G) = ψ(X_P) ψ(X_Q) ψ(X_R).

41 An Example of an Undirected Model Whatever factorization we pick, we know that only connected nodes can be arguments of a single local function. A clique is a fully connected subset of nodes. Thus, consider using a product of positive clique potentials: P(x) = (1/Z) Π_{cliques c} ψ_c(x_c), with partition function Z = Σ_x Π_{cliques c} ψ_c(x_c). The product of functions that don't need to agree with each other still factors in the way the graph semantics demand. Without loss of generality we can restrict ourselves to maximal cliques. (Why?) Example with potential functions: P(x_[1:5]) = (1/Z) ψ(x_1, x_2, x_3) ψ(x_3, x_4) ψ(x_3, x_5).

42 Partition Function (Normalizer) Z above is called the partition function. Computing the normalizer and its derivatives can often be the hardest part of inference and learning in undirected models. Often the factored structure of the distribution makes it possible to efficiently do the sums/integrals required to compute Z. We don't always have to compute Z, e.g. for conditional probabilities.
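A minimal Python sketch for the model on slide 41, assuming binary variables and arbitrary positive potentials, computing Z by brute force and showing that a conditional does not need Z:

import itertools
import numpy as np

rng = np.random.default_rng(1)

# P(x1..x5) = (1/Z) * psi123(x1,x2,x3) * psi34(x3,x4) * psi35(x3,x5)
psi123 = rng.random((2, 2, 2)) + 0.1
psi34  = rng.random((2, 2)) + 0.1
psi35  = rng.random((2, 2)) + 0.1

def unnormalized(x1, x2, x3, x4, x5):
    return psi123[x1, x2, x3] * psi34[x3, x4] * psi35[x3, x5]

# Brute-force partition function: sum over all 2^5 joint configurations
Z = sum(unnormalized(*x) for x in itertools.product([0, 1], repeat=5))
print("Z =", Z)

# Conditionals don't need Z: P(x4 = 1 | x3 = 1) as a ratio of unnormalized sums
num = sum(unnormalized(x1, x2, 1, 1, x5)
          for x1, x2, x5 in itertools.product([0, 1], repeat=3))
den = sum(unnormalized(x1, x2, 1, x4, x5)
          for x1, x2, x4, x5 in itertools.product([0, 1], repeat=4))
print("P(x4=1 | x3=1) =", num / den)

The brute-force sum is exponential in the number of variables, which is exactly why the factored structure (and elimination, later) matters for computing Z.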

43 Directed vs. Undirected Models Directed models use conditional probabilities for each local substructure (called Bayesian networks); they may describe some distributions which cannot be described by undirected models; mainly used in A.I., diagnosis, decision making, etc. Undirected models use potential functions for each local substructure (called Markov random fields or Markov networks); they may describe some distributions which cannot be described by directed models; used for problems with little causal structure to guide the graph construction: image restoration, certain optimization problems, models of physical systems.

44 A Directed Model of 3 r.v.'s A general dist.: P(X, Y, Z) = P(X) P(Y|X) P(Z|X, Y). Chain (X → Y → Z): P(X, Y, Z) = P(X) P(Y|X) P(Z|Y). Explaining away (X → Z ← Y): P(X, Y, Z) = P(X) P(Y) P(Z|X, Y). Fork (X ← Y → Z): P(X, Y, Z) = P(Y) P(X|Y) P(Z|Y) (the same group as the chain). One link (Y → Z): P(X, Y, Z) = P(X) P(Y) P(Z|Y). All independent: P(X, Y, Z) = P(X) P(Y) P(Z).

45 An Undirected Model of 3 r.v.'s A general dist.: P(X, Y, Z) = P(X) P(Y|X) P(Z|X, Y) = ψ(X, Y, Z). One link (3 cases): P(X, Y, Z) = P(X) P(Y) P(Z|Y) = ψ(X) ψ(Y, Z). Two links (3 cases): P(X, Y, Z) = P(Y, Z|X) P(X) = P(X) P(Y|X) P(Z|X) = ψ(X, Y) ψ(X, Z). All independent: P(X, Y, Z) = P(X) P(Y) P(Z) = ψ(X) ψ(Y) ψ(Z).

46 Between Directed and Undirected Models Directed can't do it! The four-cycle X — W — Y — Z — X with X ⊥ Y | {W, Z} and W ⊥ Z | {X, Y}: a DAG must be acyclic, so it will have at least one V-structure and the Bayes ball goes through. Undirected can't do it! The V-structure X → Y ← Z with X ⊥ Z marginally, but X and Z dependent given Y: no undirected graph over X, Y, Z encodes exactly these independencies.

47 Probabilistic Graphical Models Graphical Models ∩ Probabilistic Models. Directed models: Bayesian belief nets (Alarm network), state-space models, HMMs, Naïve Bayes classifier, PCA / ICA. Undirected models: Markov nets (Markov Random Fields), Boltzmann machines, Ising models, max-ent models, log-linear models.

48 Part II Probabilistic Inference

49 Outline Introduction to Bayesian networks Bayesian networks: definition, d-separation, equivalence of networks Examples: causal graphs, explaining away, Markov chains, Naïve Bayes, etc. Undirected models Probabilistic inference: node elimination, junction tree Building the networks

50 Probabilistic Inference Task: infer the value of some X_i given the values of some other X_j in the network. Facts: with all other attributes known, it is easy; the general case is NP-hard (Cooper, 1990, by showing 3-SAT is no harder than probabilistic inference!); approximate inference is still NP-hard (Dagum & Luby, 1993); Monte Carlo methods (Pradhan & Dagum, 1996).

51 Probabilistic Inference II Partition the random variables in a domain X into three disjoint subsets X_E, X_F, X_R. The general probabilistic inference problem is to compute the posterior P(X_F | X_E) over the query nodes X_F. This involves conditioning on the evidence nodes X_E and integrating (summing) out the marginal nodes X_R. If the joint distribution is represented as a huge table, this is trivial: just select the appropriate indices in the columns corresponding to X_E based on their values, sum over the columns corresponding to X_R, and renormalize the resulting table over X_F.

52 Probabilistic Inference III If the joint is a known continuous function, this can sometimes be done analytically; e.g. Gaussian: eliminate the rows/columns corresponding to X_R and apply the conditioning formulas for P(X_F | X_E). But what if the joint distribution over X is represented by a directed or undirected graphical model?

53 Recall Bayes' Rule For simple models, we can derive the inference formulae by hand using Bayes' rule: p(y|x) = p(x|y) p(y) / p(x) = p(x|y) p(y) / Σ_{y'} p(x|y') p(y'). This is called "reversing the arrow". In general, the calculation we want to do is P(X_F | X_E) = P(X_E, X_F) / P(X_E) = Σ_{X_R} P(X_E, X_F, X_R) / Σ_{X_F, X_R} P(X_E, X_F, X_R). Q: Can we do these sums efficiently? Q: Can we avoid repeating unnecessary work each time we do inference? A: Yes, if we exploit the factorization of the joint distribution.

54 Take Advantage of the Distributive Law Compute P(X_[1..5])?
P(X_[1..5]) = Σ_{X_6} P(X_1) P(X_2|X_1) P(X_3|X_1) P(X_4|X_2) P(X_5|X_3) P(X_6|X_2, X_5)
= P(X_1) P(X_2|X_1) P(X_3|X_1) P(X_4|X_2) P(X_5|X_3) Σ_{X_6} P(X_6|X_2, X_5)
= P(X_1) P(X_2|X_1) P(X_3|X_1) P(X_4|X_2) P(X_5|X_3).
Left form: 2 × 5 multiplications and 1 addition; right form: 5 multiplications and 1 addition!
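A small numerical check in Python, assuming binary variables and random CPTs for the six-node network (illustrative only): the sum over X_6, once pushed inside, evaluates to 1 and drops out.

import itertools
import numpy as np

rng = np.random.default_rng(2)

def cpt(n_parents, k=2):
    t = rng.random((k,) * (n_parents + 1))
    return t / t.sum(axis=-1, keepdims=True)

p1, p2, p3, p4, p5, p6 = cpt(0), cpt(1), cpt(1), cpt(1), cpt(1), cpt(2)

x1, x2, x3, x4, x5 = 1, 0, 1, 1, 0    # an arbitrary assignment of X1..X5
prefix = p1[x1] * p2[x1, x2] * p3[x1, x3] * p4[x2, x4] * p5[x3, x5]

# Naive: sum the full product over the two values of X6
naive = sum(prefix * p6[x2, x5, x6] for x6 in (0, 1))

# Distributive law: pull the sum inside; it evaluates to 1, so it drops out
pushed = prefix * sum(p6[x2, x5, x6] for x6 in (0, 1))

print(naive, pushed, prefix)   # all three agree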

55 The Generalized Distributive Law ab + ac = a(b + c). Left: 2 ×, 1 +; right: 1 ×, 1 +.

56 一場遊戲一場夢 (A Game, A Dream)?!

57 The Most Probable Path Given: a multilayer network from A to B with n layers, with transition probabilities a_kl shown on the edges. Problem: find the most probable path from A to B. One of the solutions is given by the Viterbi algorithm, using dynamic programming.

58 The Viterbi Algorithm All paths have the same start state A, so v_0(0) = 1. By keeping pointers backwards, the actual path can be found by backtracking. The full algorithm:
Initialization (i = 0): v_0(0) = 1, v_k(0) = 0 for k > 0.
Recursion (i = 1..n): v_l(i) = max_k ( v_k(i−1) a_kl ); ptr_i(l) = argmax_k ( v_k(i−1) a_kl ).
Termination: max P(Y, π*) = max_k ( v_k(n) a_k0 ); π*_n = argmax_k ( v_k(n) a_k0 ). (Note: an end state 0 is assumed.)
Traceback (i = n..1): π*_{i−1} = ptr_i(π*_i).
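A sketch of this recursion in Python, assuming the same K×K transition matrix at every layer and state 0 acting as both start and end state (the matrix values below are made up):

import numpy as np

def viterbi(a, n_steps):
    # a[k, l] = transition probability from state k to state l
    K = a.shape[0]
    v = np.zeros((n_steps + 1, K))
    ptr = np.zeros((n_steps + 1, K), dtype=int)
    v[0, 0] = 1.0                          # all paths start in state 0
    for i in range(1, n_steps + 1):
        scores = v[i - 1][:, None] * a     # scores[k, l] = v_k(i-1) * a_kl
        v[i] = scores.max(axis=0)
        ptr[i] = scores.argmax(axis=0)
    # terminate by transitioning back to the end state 0
    best_prob = (v[n_steps] * a[:, 0]).max()
    path = [int((v[n_steps] * a[:, 0]).argmax())]
    for i in range(n_steps, 1, -1):        # traceback via the stored pointers
        path.append(int(ptr[i, path[-1]]))
    return best_prob, path[::-1]

a = np.array([[0.0, 0.6, 0.4],
              [0.3, 0.5, 0.2],
              [0.5, 0.1, 0.4]])
print(viterbi(a, n_steps=4))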

59 Commutative Semi-ring A set K, together with two binary operations called "+" and "·", which satisfy the following three axioms: 1. The operation "+" is associative and commutative, and there is an additive identity element called "0" s.t. k + 0 = k for all k. 2. The operation "·" is also associative and commutative, and there is a multiplicative identity element called "1" s.t. k · 1 = k for all k. 3. The distributive law holds, i.e. (a · b) + (a · c) = a · (b + c) for all a, b, c from K. A semi-ring is a commutative ring without the additive inverse.

60 Example: max-product K = R_+ = [0, +∞), "+" : max, "·" : the usual multiplication. 1. Checking the operation "+": max(k, 0) = k for all k; i.e., identity: 0. 2. Checking the operation "·": k · 1 = k for all k; i.e., identity: 1. 3. The distributive law holds, i.e. max{a · b, a · c} = a · max(b, c) for all a, b, c from K. Other examples: min-product, min-sum, max-sum, etc.

61 Viterbi vs. GDL Viterbi is just a form of the GDL, obtained by choosing an appropriate semi-ring. In fact, many dynamic programming processes can be interpreted via the GDL. Other examples: the Baum-Welch algorithm, the FFT (fast Fourier transform) on any finite Abelian group, the Gallager-Tanner-Wiberg decoding algorithm, the BCJR algorithm, Pearl's belief propagation, the Shafer-Shenoy probability propagation algorithm, the turbo decoding algorithm, etc.
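A minimal sketch showing the same chain-elimination code specialized to two semi-rings: (sum, ×) gives marginalization, (max, ×) gives the best attainable value (the pairwise factors are random and purely illustrative):

import numpy as np

def eliminate_chain(factors, add):
    # Eliminate the variables of a chain from the left, using a pluggable
    # 'add' operation: np.sum for sum-product, np.max for max-product.
    # factors[i][a, b] is the pairwise factor on (x_{i+1}, x_{i+2}).
    message = np.ones(factors[0].shape[0])
    for f in factors:
        # combine with '*' and reduce the left variable with the chosen 'add'
        message = add(message[:, None] * f, axis=0)
    return message

rng = np.random.default_rng(3)
factors = [rng.random((2, 2)) for _ in range(4)]

print(eliminate_chain(factors, np.sum))  # sum-product: unnormalized marginal of the last variable
print(eliminate_chain(factors, np.max))  # max-product: best value for each state of the last variable

Swapping the "add" operation is all it takes to move between marginal inference and most-probable-configuration inference, which is the GDL point of this slide.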

62 Example Compute P(x_1 | x̄_6) for the six-node network: eliminate X_6, X_5, X_4, X_3, X_2 in turn, each summation producing an intermediate factor Φ over the remaining neighbours of the eliminated node; normalizing the final factor over x_1 gives the answer.

63 Single Node Posteriors For a single-node posterior (i.e. X_F is a single node), there is a simple, efficient algorithm based on eliminating nodes. Notation: x̄_i is the value of evidence node i. The algorithm, called ELIMINATION, requires a node ordering to be given, which tells it which order to do the summations in. In this ordering, the query node must appear last. Graphically, we'll remove a node from the graph once we sum it out.

64 Evidence Potentials Elimination also uses a bookkeeping trick, called evidential functions: g^E(x_i) = g(x_i) δ(x_i, x̄_i), where δ(x_i, x̄_i) is 1 if x_i = x̄_i and 0 otherwise. This trick allows us to treat conditioning in the same way as we treat marginalization. So everything boils down to doing sums: P(X_F | X̄_E) = P(X_F, X̄_E) / P(X̄_E), with P(X_F, X̄_E) = Σ_{x_R} Σ_{x_E} P(X_F, x_E, x_R) δ(x_E, x̄_E) and P(X̄_E) = Σ_{x_F} P(x_F, X̄_E). We just pick an ordering and go for it...

65 Elimination Algorithm ELIMINATE(G): place all P(X_i | X_{π_i}) and δ(X_i, x̄_i) on the active list; choose an ordering I such that F appears last; for each X_i in I: find all potentials on the active list that reference X_i and remove them from the active list; define a new potential as the sum with respect to X_i of the product of these potentials; place the new potential on the active list; end; return the product of the remaining potentials.
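A compact sketch of ELIMINATE in Python for binary variables, with factors stored as explicit tables; the sprinkler example and numbers from slide 27 are reused, with the evidence W = 1 folded in as a delta potential:

import itertools
from functools import reduce

# A factor is (vars, table): vars is a tuple of names, table maps a tuple of
# binary values (one per var) to a nonnegative number.

def multiply(f, g):
    fv, ft = f
    gv, gt = g
    vs = tuple(dict.fromkeys(fv + gv))           # union of variables, order-preserving
    table = {}
    for vals in itertools.product([0, 1], repeat=len(vs)):
        a = dict(zip(vs, vals))
        table[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return vs, table

def sum_out(f, var):
    fv, ft = f
    vs = tuple(v for v in fv if v != var)
    table = {}
    for vals, p in ft.items():
        key = tuple(val for v, val in zip(fv, vals) if v != var)
        table[key] = table.get(key, 0.0) + p
    return vs, table

def eliminate(factors, ordering):
    # Repeatedly multiply all factors mentioning the next variable in the
    # ordering, sum it out, and finally normalize the remaining product.
    active = list(factors)
    for var in ordering:
        used = [f for f in active if var in f[0]]
        active = [f for f in active if var not in f[0]]
        if used:
            active.append(sum_out(reduce(multiply, used), var))
    result = reduce(multiply, active)
    z = sum(result[1].values())
    return result[0], {k: v / z for k, v in result[1].items()}

P_S = (("S",), {(1,): 0.2, (0,): 0.8})
P_R = (("R",), {(1,): 0.4, (0,): 0.6})
P_W = (("R", "S", "W"), {(r, s, w): (p if w else 1 - p)
                         for (r, s), p in {(1, 1): 0.95, (1, 0): 0.9,
                                           (0, 1): 0.9, (0, 0): 0.1}.items()
                         for w in (0, 1)})
evid = (("W",), {(1,): 1.0, (0,): 0.0})          # delta(W, 1)
print(eliminate([P_S, P_R, P_W, evid], ordering=["W", "R"]))   # P(S | W=1) ~ {1: 0.35, 0: 0.65}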

66 Algorithm Details At each step we are trying to remove the current variable in the elimination ordering from the distribution. For marginal nodes this sums them out; for evidence nodes this conditions on their observed values using the evidential functions. Each step performs a sum over a product of potential functions. The algorithm terminates when we reach the query node, which always appears last in the ordering. We renormalize what we have left to get the final result P(X_F | X_E). For undirected models, everything is the same except that the initialization phase uses the clique potentials instead of the parent-conditionals.

67 Marginalization without Evidence Marginalization of joint distributions represented by graphical models is a special case of probabilistic inference. To compute the marginal P(X_i) of a single node, we set it to be the query node and set the evidence set to be empty. In directed models, we can ignore all nodes downstream from the query node, and marginalize only the part of the graph before it. If the node has no parents, we can read off its marginal directly. In undirected models, we need to do the full computation: compute P(X_i) / Z using elimination and then normalize in the last step of elimination to get Z. We can reuse Z later if we want to save work.

68 Efficiency Trick in Directed Elimination In directed models, we often know that a certain sum must evaluate to unity, since it is a conditional probability. For example, consider the term Φ_4 in our six-node example: Φ_4(x_2) = Σ_{x_4} p(x_4 | x_2) = 1. We can't use this trick in undirected models, because there are no guarantees about what clique potentials sum to.

69 Node Elimination The algorithm we presented is really a way of eliminating nodes from a graph one by one. For undirected graphs: for each node i in ordering I: connect all the neighbors of i; remove i from the graph; end. The removal operation requires summing out i or conditioning on observed evidence for i. Summing out i leaves a function involving all its previous neighbors, and thus they become connected by this step. The original graph, augmented by all the added edges, is now a triangulated graph. Reminder: triangulated means that every cycle of length > 3 contains a chord, i.e. an edge not on the cycle but between two nodes in the cycle.

70 Example: Node/Variable Elimination Push the sums in as far as possible. After performing the innermost sum, we create a new term Φ which does not depend on the variable summed out. Continue the summations:
P(x_1, x̄_6) = (1/Z) Σ_{x_[2..5]} ψ(x_1, x_2) ψ(x_1, x_3) ψ(x_2, x_4) ψ(x_3, x_5) ψ(x_2, x̄_6) ψ(x_5, x̄_6)
= (1/Z) Σ_{x_2} Σ_{x_4} Σ_{x_3} Σ_{x_5} ψ(x_1, x_2) ψ(x_2, x_4) ψ(x_1, x_3) ψ(x_3, x_5) ψ(x_2, x̄_6) ψ(x_5, x̄_6)
= (1/Z) Σ_{x_2} Σ_{x_4} Σ_{x_3} Σ_{x_5} ψ(x_1, x_2) ψ(x_2, x_4) ψ(x_1, x_3) ψ(x_3, x_5) Φ(x_2, x_5)
= (1/Z) Σ_{x_2} Σ_{x_4} Σ_{x_3} ψ(x_1, x_2) ψ(x_2, x_4) ψ(x_1, x_3) Φ(x_2, x_3)
= (1/Z) Σ_{x_2} ψ(x_1, x_2) Φ(x_2) Φ(x_1, x_2) = (1/Z) Φ(x_1).

71 Added Edges = Triangulation Eliminating a node connects its remaining neighbours, so elimination adds edges; the augmented graph is triangulated. It is easy to check whether a graph is triangulated (in linear time). It is easy to triangulate a non-triangulated graph. But it is very hard to do so in a way that induces small clique sizes.

72 Moralization For directed graphs, the parents may not be explicitly connected, but they are involved in the same potential function P(x_i | x_{π_i}). Thus, to think of ELIMINATION as a node-removal algorithm, we first must connect all the parents of every node and drop the directions on the links. This step is known as moralization, and it is essential: since conditioning couples parents in directed models (explaining away), we need a mechanism for respecting this when we do inference.

73 Markov Blanket Compare elimination order 1 with elimination order 2. For each eliminated node, e.g. X_i = X_5, we need to take care of (1) the child (= X_6), (2) the parents (= X_3), and (3) the parents of the child (= X_2). These three together are called the Markov blanket of X_5.

74 Moral Graph The moral graph of the six-node network has potentials ψ(x_1, x_2), ψ(x_1, x_3), ψ(x_2, x_4), ψ(x_3, x_5), ψ(x_2, x_5, x_6), up to the normalizer Z. The graph after moralization looks more general! What do we lose? E.g. X_2 ⊥ X_5 | {X_1, X_3} holds in the DAG but not in the moral graph. The moral graph is more general and loses some independencies (most specific → most general).

75 Junction Tree Moralization → Triangulation → Construct Junction Tree → Propagate Probabilities.

76 Tree-Structured Graphical Models For now, we will focus on tree-structured graphical models. Trees are an important class; they incorporate all chains (e.g. HMMs) as well. Exact inference on trees is the basis for the junction tree algorithm, which solves the general exact inference problem for directed acyclic graphs, and for many approximate algorithms which can work on intractable or cyclic graphs. Directed and undirected trees make exactly the same conditional independence assumptions, so we cover them together.

77 Elimination on Trees Recall the basic structure of ELIMINATE: 1. Convert the directed graph to an undirected one by moralization. 2. Choose an elimination ordering with the query node last. 3. Place all potentials on the active list. 4. Eliminate nodes by removing all relevant potentials, taking their product, summing out the node and placing the resulting factor back onto the potential list. What happens when the original graph is a tree? 1. No moralization is necessary. 2. There is a natural elimination ordering with the query node as root: any depth-first search order. 3. All subtrees with no evidence nodes can be ignored, since they will leave a potential of unity once they are eliminated.

78 Elimination on Trees Now consider eliminating node j, which is followed by i in the order. Which nodes appear in the potential created after summing over j? Nothing in the subtree below j (already eliminated); nothing from other subtrees, since the graph is a tree; only i, from ψ_ij which relates i and j. Call the factor that is created m_ji(x_i), and think of it as a message that j passes to i when j is eliminated. This message is created by summing over x_j the product of all earlier messages m_kj(x_j) sent to j, as well as ψ^E(x_j) if j is an evidence node.

79 Eliminate = Message Passing On a tree, ELIMINATE can be thought of as passing messages up to the query node at the root from the other nodes at the leaves or interior. Since we ignore subtrees with no evidence, observed evidence nodes are always at the leaves. The message m_ji(x_i) is created when we sum over x_j: m_ji(x_i) = Σ_{x_j} ψ^E(x_j) ψ(x_i, x_j) Π_{k ∈ c(j)} m_kj(x_j). At the final node f, we obtain the answer: p(x_f | x̄_E) ∝ ψ^E(x_f) Π_{k ∈ c(f)} m_kf(x_f). If j is an evidence node, ψ^E(x_j) = δ(x_j, x̄_j), else ψ^E(x_j) = 1. If j is a leaf node in the ordering, c(j) is empty; otherwise c(j) are the children of j in the ordering.

80 Messages are Reused in Multiple Eliminations Consider querying x_1, x_2, x_3 and x_4 in the graph below. The messages needed for x_1, x_2, x_4 individually are shown in (a)-(c). Also shown in (d) is the set of messages needed to compute all possible marginals over single query nodes. Key insight: even though the naive approach (rerun ELIMINATION) needs to compute O(N²) messages to find the marginals for all N query nodes, there are only 2(N−1) possible messages (one per direction of each edge). We can compute all possible messages in only double the amount of work it takes to do one query. Then we take the product of the relevant messages to get the marginals.

81 Computing All Possible Messages How can we compute all possible messages efficiently? Idea: respect the following Message-Passing Protocol: a node can send a message to a neighbor only when it has received messages from all its other neighbors. The protocol is realizable: designate one node arbitrarily as the root; collect messages inward to the root, then distribute them back out to the leaves. Once we have the messages, we can compute marginals using p(x_i | x̄_E) ∝ ψ^E(x_i) Π_{k ∈ c(i)} m_ki(x_i). Remember that the directed tree on which we pass messages might not be the same directed tree we started with. We can also consider synchronous or asynchronous message passing: nodes that respect the protocol but don't use the Collect-Distribute schedule above. (One must prove this terminates.)
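A small sketch of sum-product message passing on a toy tree of binary nodes; the edge list, potentials and evidence below are made up for illustration, and the collect/distribute schedule is implemented implicitly through memoized recursion:

import numpy as np

edges = [(0, 1), (1, 2), (1, 3)]                 # a star-shaped tree
rng = np.random.default_rng(4)
psi = {e: rng.random((2, 2)) + 0.1 for e in edges}
psi_E = {i: np.ones(2) for i in range(4)}
psi_E[3] = np.array([0.0, 1.0])                  # evidence: x3 = 1

def pairwise(i, j):
    # Potential on (x_i, x_j) regardless of which direction the edge is stored
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

neighbors = {i: [] for i in range(4)}
for a, b in edges:
    neighbors[a].append(b)
    neighbors[b].append(a)

messages = {}
def message(j, i):
    # m_{j->i}(x_i) = sum_{x_j} psi_E(x_j) psi(x_i, x_j) prod_{k != i} m_{k->j}(x_j)
    if (j, i) not in messages:
        incoming = np.prod([message(k, j) for k in neighbors[j] if k != i] + [psi_E[j]], axis=0)
        messages[(j, i)] = pairwise(i, j) @ incoming
    return messages[(j, i)]

def marginal(i):
    belief = psi_E[i] * np.prod([message(k, i) for k in neighbors[i]], axis=0)
    return belief / belief.sum()

for i in range(4):
    print(i, marginal(i))   # posterior marginals p(x_i | x3 = 1)

Because messages are cached, querying all four nodes reuses the same 2(N−1) = 6 messages instead of rerunning elimination from scratch for each query.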

82 Triangulation Triangulation: connect nodes in the moral graph such that no chordless cycle of 4 or more nodes remains in the graph. So, add links; there are many possible choices. HINT: try to keep the largest clique size small; this makes the junction tree algorithm more efficient. Sub-optimal triangulations of the moral graph can be found in polynomial time; triangulation that minimizes the largest clique size is NP-hard. But it is OK to use a suboptimal triangulation (slower JTA).

83 Part III Learning Belief Networks

84 Outline Introduction to Bayesian networks Bayesian networks: definition, d-separation, equivalence of networks Examples: causal graphs, explaining away, Markov chains, Naïve Bayes, etc. Undirected models Probabilistic inference: node elimination, junction tree Building the networks

85 Building Bayesian Network Models Tasks of building models: catching the model structure; determining the conditional probabilities. What decides the models? Theoretical considerations (e.g.: mixed data, overfitting, etc.); a set of data; subjective views from experts (can be partially provided!).

86 Some Issues in Learning Structure & Parameters Nodes as variables. How many edges? We can always start from a complete graph, but it may not be a good idea! A missing edge can be treated as an edge of probability zero. Undirected, directed or hybrid, and how to decide the direction in the directed case? Direction may not reflect causality. Complexity vs. training error.

87 Maximum Likelihood For IID data: P(D | θ) = Π_m P(x^(m) | θ); ℓ(θ; D) = Σ_m log P(x^(m) | θ). Idea of maximum likelihood estimation (MLE): pick the setting of parameters most likely to have generated the data we saw: θ_ML = argmax_θ ℓ(θ; D). Very commonly used in statistics. Often leads to intuitive, appealing, or natural estimators. For a start, the IID assumption makes the log likelihood into a sum, so its derivative can easily be taken term by term.

88 MLE for Directed GMs For a directed GM, the likelihood function has a nice form: log P(D | θ) = log Π_m Π_i P(x_i^(m) | x_{π_i}^(m), θ_i) = Σ_m Σ_i log P(x_i^(m) | x_{π_i}^(m), θ_i). The parameters decouple, so we can maximize the likelihood independently for each node's function by setting θ_i. We only need the values of x_i and its parents in order to estimate θ_i. Furthermore, if (x_i^(m), x_{π_i}^(m)) have sufficient statistics, we only need those. In general, for fully observed data, if we know how to estimate parameters at a single node we can do it for the whole network.

89 Three Key Regularization Ideas To avoid overfitting, we can put priors on the parameters of the class and class-conditional feature distributions. We can also tie some parameters together so that fewer of them are estimated using more data. Finally, we can make factorization or independence assumptions about the distributions. In particular, for the class-conditional distributions we can assume the features are fully dependent, partly dependent, or independent (!).

90 Discrete (Multinomial) Naive Bayes Discrete features x_i, assumed independent given the class label y: P(X_i = j | y = k) = η_ijk, so P(x | y = k, η) = Π_i Π_j η_ijk^[x_i = j]. Classification rule: P(y = k | x, η) = π_k Π_i Π_j η_ijk^[x_i = j] / Σ_q π_q Π_i Π_j η_ijq^[x_i = j] = e^{β_k^T φ(x)} / Σ_q e^{β_q^T φ(x)}, where β_k = [log η_11k; log η_12k; ...; log η_ijk; ...; log π_k] and φ(x) = [x_1 = 1; x_1 = 2; ...; x_i = j; ...; 1].

91 Fitting Discrete Naive Bayes ML parameters are class-conditional frequency counts: η̂_ijk = Σ_m [x_i^m = j][y^m = k] / Σ_m [y^m = k]. How do we know? Write down the likelihood ℓ(θ; D) = Σ_m log P(y^m | π) + Σ_{m,i} log P(x_i^m | y^m, η) and optimize it by setting its derivative to zero (careful! enforce normalization with Lagrange multipliers): ℓ(η; D) = Σ_m Σ_{ijk} [x_i^m = j][y^m = k] log η_ijk + Σ_{ik} λ_ik (1 − Σ_j η_ijk); ∂ℓ/∂η_ijk = Σ_m [x_i^m = j][y^m = k] / η_ijk − λ_ik; setting ∂ℓ/∂η_ijk = 0 gives λ_ik = Σ_m [y^m = k], hence η̂_ijk as above.
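A minimal Python sketch of these frequency-count estimates (the tiny dataset and the smoothing constant alpha are made up; alpha = 0 gives exactly the ML counts above):

import numpy as np

def fit_discrete_nb(X, y, n_vals, n_classes, alpha=0.0):
    # Class priors pi_k and tables eta[i, j, k] = P(X_i = j | y = k) by counting
    n, d = X.shape
    pi = np.bincount(y, minlength=n_classes) / n
    eta = np.full((d, n_vals, n_classes), alpha)
    for m in range(n):
        for i in range(d):
            eta[i, X[m, i], y[m]] += 1.0
    eta /= eta.sum(axis=1, keepdims=True)          # normalize over j for each (i, k)
    return pi, eta

def predict_log_proba(x, pi, eta):
    # log P(y = k | x) = log pi_k + sum_i log eta[i, x_i, k], then normalize
    scores = np.log(pi) + sum(np.log(eta[i, xi, :]) for i, xi in enumerate(x))
    return scores - np.logaddexp.reduce(scores)

X = np.array([[0, 1], [0, 2], [1, 1], [2, 0], [2, 2], [1, 0]])   # 2 ternary features
y = np.array([0, 0, 0, 1, 1, 1])
pi, eta = fit_discrete_nb(X, y, n_vals=3, n_classes=2, alpha=1.0)
print(np.exp(predict_log_proba(np.array([0, 1]), pi, eta)))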

92 Learning Markov Models The ML parameter estimates for a simple Markov model are easy: P(y_1, y_2, ..., y_T) = P(y_1, ..., y_k) Π_{t=k+1}^{T} P(y_t | y_{t−1}, y_{t−2}, ..., y_{t−k}), so log P({y}) = log P(y_1, ..., y_k) + Σ_{t=k+1}^{T} log P(y_t | y_{t−1}, y_{t−2}, ..., y_{t−k}). Each window of k + 1 outputs is a training case for the model P(y_t | y_{t−1}, y_{t−2}, ..., y_{t−k}). Example: for discrete outputs (symbols) and a 2nd-order Markov model we can use the multinomial model P(y_t = m | y_{t−1} = a, y_{t−2} = b) = α_mab. The maximum likelihood values for α are α̂_mab = #{t s.t. y_t = m, y_{t−1} = a, y_{t−2} = b} / #{t s.t. y_{t−1} = a, y_{t−2} = b}.
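A short Python sketch of these counts for a 2nd-order model (the training string is made up for illustration):

from collections import Counter

def fit_second_order(seq, alphabet):
    # ML estimate of alpha[m, a, b] = P(y_t = m | y_{t-1} = a, y_{t-2} = b)
    # by counting windows of length 3 in a single training sequence
    triples = Counter((seq[t], seq[t - 1], seq[t - 2]) for t in range(2, len(seq)))
    pairs = Counter((seq[t - 1], seq[t - 2]) for t in range(2, len(seq)))
    return {(m, a, b): triples[(m, a, b)] / pairs[(a, b)]
            for (a, b) in pairs for m in alphabet}

seq = "ababbabababbbaabab"
alpha = fit_second_order(seq, alphabet="ab")
print(alpha[("a", "b", "a")])   # P(y_t = 'a' | y_{t-1} = 'b', y_{t-2} = 'a')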

93 Summary Introduction to Bayesian networks Bayesian networks: definition, d-separation, equivalence of networks Examples: causal graphs, explaining away, Markov chains, Naïve Bayes, etc. Undirected models Probabilistic inference: node elimination, junction tree Building the networks

94 What Else Bayesian networks with continuous r.v.'s; PCA, ICA and other dimension reduction methods will be discussed in coming lectures! Models with unobservable r.v.'s: learning by Expectation-Maximization will also be delivered!


Introduction to Probabilistic Graphical Models Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in

More information

arxiv: v1 [physics.data-an] 26 Oct 2012

arxiv: v1 [physics.data-an] 26 Oct 2012 Constraints on Yield Parameters in Extended Maximum Likelihood Fits Till Moritz Karbach a, Maximilian Schlu b a TU Dortmund, Germany, moritz.karbach@cern.ch b TU Dortmund, Germany, maximilian.schlu@cern.ch

More information

Bayes Networks 6.872/HST.950

Bayes Networks 6.872/HST.950 Bayes Networks 6.872/HST.950 What Probabilistic Models Should We Use? Full joint distribution Completely expressive Hugely data-hungry Exponential computational complexity Naive Bayes (full conditional

More information

A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS. 1. Abstract

A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS. 1. Abstract A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS CASEY BRUCK 1. Abstract The goal of this aer is to rovide a concise way for undergraduate mathematics students to learn about how rime numbers behave

More information

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales Lecture 6 Classification of states We have shown that all states of an irreducible countable state Markov chain must of the same tye. This gives rise to the following classification. Definition. [Classification

More information

Computational Complexity of Inference

Computational Complexity of Inference Computational Complexity of Inference Sargur srihari@cedar.buffalo.edu 1 Topics 1. What is Inference? 2. Complexity Classes 3. Exact Inference 1. Variable Elimination Sum-Product Algorithm 2. Factor Graphs

More information

MAP Estimation Algorithms in Computer Vision - Part II

MAP Estimation Algorithms in Computer Vision - Part II MAP Estimation Algorithms in Comuter Vision - Part II M. Pawan Kumar, University of Oford Pushmeet Kohli, Microsoft Research Eamle: Image Segmentation E() = c i i + c ij i (1- j ) i i,j E: {0,1} n R 0

More information

Lecture 7: Linear Classification Methods

Lecture 7: Linear Classification Methods Homeork Homeork Lecture 7: Linear lassification Methods Final rojects? Grous oics Proosal eek 5 Lecture is oster session, Jacobs Hall Lobby, snacks Final reort 5 June. What is linear classification? lassification

More information

HENSEL S LEMMA KEITH CONRAD

HENSEL S LEMMA KEITH CONRAD HENSEL S LEMMA KEITH CONRAD 1. Introduction In the -adic integers, congruences are aroximations: for a and b in Z, a b mod n is the same as a b 1/ n. Turning information modulo one ower of into similar

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Conditional Random Field

Conditional Random Field Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions

More information

p L yi z n m x N n xi

p L yi z n m x N n xi y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

More information

Lecture 2: Simple Classifiers

Lecture 2: Simple Classifiers CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 2: Simple Classifiers Slides based on Rich Zemel s All lecture slides will be available on the course website: www.cs.toronto.edu/~jessebett/csc412

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 12 Dynamical Models CS/CNS/EE 155 Andreas Krause Homework 3 out tonight Start early!! Announcements Project milestones due today Please email to TAs 2 Parameter learning

More information

Brownian Motion and Random Prime Factorization

Brownian Motion and Random Prime Factorization Brownian Motion and Random Prime Factorization Kendrick Tang June 4, 202 Contents Introduction 2 2 Brownian Motion 2 2. Develoing Brownian Motion.................... 2 2.. Measure Saces and Borel Sigma-Algebras.........

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

PROBABILISTIC REASONING SYSTEMS

PROBABILISTIC REASONING SYSTEMS PROBABILISTIC REASONING SYSTEMS In which we explain how to build reasoning systems that use network models to reason with uncertainty according to the laws of probability theory. Outline Knowledge in uncertain

More information

Bayes Nets III: Inference

Bayes Nets III: Inference 1 Hal Daumé III (me@hal3.name) Bayes Nets III: Inference Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 10 Apr 2012 Many slides courtesy

More information

Intelligent Systems:

Intelligent Systems: Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition

More information

SQUARES IN Z/NZ. q = ( 1) (p 1)(q 1)

SQUARES IN Z/NZ. q = ( 1) (p 1)(q 1) SQUARES I Z/Z We study squares in the ring Z/Z from a theoretical and comutational oint of view. We resent two related crytograhic schemes. 1. SQUARES I Z/Z Consider for eamle the rime = 13. Write the

More information

3 More applications of derivatives

3 More applications of derivatives 3 More alications of derivatives 3.1 Eact & ineact differentials in thermodynamics So far we have been discussing total or eact differentials ( ( u u du = d + dy, (1 but we could imagine a more general

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

Bayesian Networks for Modeling and Managing Risks of Natural Hazards

Bayesian Networks for Modeling and Managing Risks of Natural Hazards [National Telford Institute and Scottish Informatics and Comuter Science Alliance, Glasgow University, Set 8, 200 ] Bayesian Networks for Modeling and Managing Risks of Natural Hazards Daniel Straub Engineering

More information