Bayesian networks. 鮑興國 Ph.D. National Taiwan University of Science and Technology
1 Bayesian networks 鮑興國 Ph.D. National Taiwan University of Science and Technology
2 Outline Introduction to Bayesian networks Bayesian networks: definition, d-separation, equivalence of networks Examples: causal graphs, explaining away, Markov chains, Naïve Bayes, etc. Undirected models Probabilistic inference: node elimination, junction tree Building the networks
3 Probabilistic Graphical Models Combination of graph theory and probability theory Informally: the graph structure specifies which parts of the system are directly dependent; local functions at each node specify how the parts interact More formally: the graph encodes conditional independence assumptions; the local functions at each node are factors in the joint probability distribution E.g. 1: Bayesian networks = PGMs based on directed acyclic graphs E.g. 2: Markov networks (Markov random fields) = PGMs with undirected graphs Others: discrete models, continuous models, Gaussian graphical models, hybrid Bayesian networks, CG models, etc.
4 Applications of PGMs Machine learning Statistics Speech recognition Natural language processing Computer vision Error-control codes Bioinformatics Medical diagnosis Financial predictions etc.
5 Why Probability? Why probabilistic graphical models? Why can a probabilistic approach help us in machine learning? Guess 1: the world is nondeterministic, or Guess 2: the world is deterministic, but we lack the relevant facts/attributes to decide the answer Either way, a probabilistic model can make inferences about missing inputs (classification is one example: the missing input is y!) and can make decisions which minimize expected loss
6 Probabilistic Approach Classification and regression: conditional density estimation P(Y|X), then apply Bayes rule Unsupervised learning: density estimation P(X) Viewing Y as a latent variable, classification is no more than missing-value filling (Figure: an N-case data table with features x_1..x_d, additional columns, and a label column y) Step 1: building the model Step 2: inference
7 Why Graphical Models? We discuss this issue later on. Basically, graphical models help us deal with complicated problems in a modularized and visualized way; that is, by breaking problems into subproblems recursively until we can solve them
8 Welcome to Hotel California! Nodes: Cloudy (C), Sprinkler (S), Rain (R), Wet grass (W), Roof (F) P(C, S, R, W, F) = P(C) P(S|C) P(R|C, S) P(W|C, S, R) P(F|C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S, R) P(F|R) Conditional Probability Table (CPT): P(W|R, S) = 0.95, P(W|R, ~S) = 0.90, P(W|~R, S) = 0.90, P(W|~R, ~S) = 0.10 The graph implies P(R|C, S) = P(R|C), P(W|C, S, R) = P(W|S, R), P(F|C, S, R, W) = P(F|R)
9 Joint Probability Goal 1: represent a joint distribution P(X = x) = P(X_1 = x_1, ..., X_n = x_n) compactly, even when there are many variables Goal 2: efficiently calculate marginals and conditionals of such compactly represented joint distributions For n discrete variables of arity k, the naïve table representation is HUGE: it requires k^n entries We need to make some assumptions about the distribution One simple assumption: independence = complete factorization: P(X = x) = Π_i P(X_i = x_i) But the independence assumption is too restrictive, so we make conditional independence assumptions instead
10 Conditional Independence Notation: X_A ⊥ X_B | X_C Definition: two sets of variables X_A and X_B are conditionally independent given a third set X_C if P(X_A, X_B | X_C) = P(X_A | X_C) P(X_B | X_C) for all X_C, which is equivalent to saying P(X_A | X_B, X_C) = P(X_A | X_C) for all X_C Only a subset of all distributions respect any given (nontrivial) conditional independence statement. The subset of distributions that respect all the CI assumptions we make is the family of distributions consistent with our assumptions Bayesian networks (probabilistic graphical models) are a powerful, elegant and simple way to specify such a family
11 Between Simple and Complex Models Given n r.v.'s X_1, X_2, ..., X_n with state space {1, 2, 3}, consider the degrees of freedom (complexity) of the model describing the joint probabilities most general: df = 3^n − 1 (not efficient in time and space) Bayesian networks sit in between independent: df = 2n (P(X_i = 1), P(X_i = 2) for each i) i.i.d.: df = 2 (P(X = 1), P(X = 2))
12 Bayesian (Belief) Networks Bayesian networks (graphical models) are an intermediate approach: the i.i.d. assumption is too restrictive, while the most general case is ineffective BNs (GMs) represent large joint distributions compactly using a set of local relationships specified by a graph; the joint probability can then be factorized into such local relationships A BN describes conditional independence among subsets of variables: conditional probabilities ↔ joint probabilities via Bayes theorem Bayesian networks (directed models) vs. Markov random fields (MRFs, undirected models): belief networks use conditional probabilities, while Markov random fields use potential functions
13 Part I Bayesian Networks: Introduction to Directed Models and Undirected Models
14 Outline Introduction to Bayesian networks Bayesian networks: definition, d-separation, equivalence of networks Examples: causal graphs, explaining away, Markov chains, Naïve Bayes, etc. Undirected models Probabilistic inference: node elimination, junction tree Building the networks
15 Bayesian Networks A.K.A. belief network, directed graphical model Definition: a set of nodes representing random variables and a set of directed edges between variables each variable has a finite set of mutually exclusive states to each variable X with parents Y_1, Y_2, ..., Y_n there is attached the conditional probability table P(X | Y_1, Y_2, ..., Y_n)
16 Bayesian Networks (cont.) Informally, edges represent causation The variables together with the directed edges form a Directed Acyclic Graph (DAG): no cycles allowed The two orientations of a single edge give P(X, Y) = P(X) P(Y|X) and P(X, Y) = P(Y) P(X|Y) (Figures: small example DAGs over nodes such as X_1, X_2, X_3, U, V, W, Y, Z, and a cyclic graph which is not allowed)
17 Factorization Consider directed acyclic graphs over n variables. Each node has a (possibly empty) set of parents π_i Each node maintains a function f_i(x_i; x_{π_i}) such that f_i > 0 and Σ_{x_i} f_i(x_i; x_{π_i}) = 1 for all x_{π_i} Define the joint probability to be P(x_1, x_2, ..., x_n) = Π_i f_i(x_i; x_{π_i}) Even with no further restriction on the f_i, it is always true that f_i(x_i; x_{π_i}) = P(x_i | x_{π_i}), so we will just write P(x_1, x_2, ..., x_n) = Π_i P(x_i | x_{π_i}) Factorization of the joint in terms of local conditional probabilities: exponential in the fan-in of each node, instead of in the total number of variables n
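The factorization above can be checked numerically. Below is a minimal sketch (the chain structure and all CPT numbers are hypothetical): each node stores a normalized conditional table, and the product of the local factors automatically defines a valid joint distribution.

```python
import itertools

# Hypothetical 3-node chain X1 -> X2 -> X3 with binary variables.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {(0, 0): 0.7, (0, 1): 0.3,   # key: (x1, x2)
                 (1, 0): 0.2, (1, 1): 0.8}
p_x3_given_x2 = {(0, 0): 0.9, (0, 1): 0.1,   # key: (x2, x3)
                 (1, 0): 0.5, (1, 1): 0.5}

def joint(x1, x2, x3):
    # Factorized joint: P(x1, x2, x3) = P(x1) P(x2|x1) P(x3|x2)
    return p_x1[x1] * p_x2_given_x1[(x1, x2)] * p_x3_given_x2[(x2, x3)]

# Because each local factor is a normalized conditional, the product
# sums to 1 over all joint states without any extra normalization.
total = sum(joint(*xs) for xs in itertools.product([0, 1], repeat=3))
print(round(total, 10))  # 1.0
```

The same check works for any DAG: as long as each f_i is positive and normalized over x_i for every parent configuration, the product is a distribution.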
18 So by the Alternative Definition P(X_1, X_2, X_3, X_4, X_5, X_6) = Π_i f_i(x_i; x_{π_i}) = P(X_1) P(X_2|X_1) P(X_3|X_1) P(X_4|X_2) P(X_5|X_3) P(X_6|X_2, X_5) P(x_2, x_5, x_6) = Σ_{x_1, x_3, x_4} f_1(x_1) f_2(x_2; x_1) f_3(x_3; x_1) f_4(x_4; x_2) f_5(x_5; x_3) f_6(x_6; x_2, x_5) = f_6(x_6; x_2, x_5) Σ_{x_1, x_3, x_4} f_1(x_1) f_2(x_2; x_1) f_3(x_3; x_1) f_4(x_4; x_2) f_5(x_5; x_3) P(x_2, x_5) = Σ_{x_1, x_3, x_4, x_6} f_1(x_1) f_2(x_2; x_1) f_3(x_3; x_1) f_4(x_4; x_2) f_5(x_5; x_3) f_6(x_6; x_2, x_5) = Σ_{x_1, x_3, x_4} f_1(x_1) f_2(x_2; x_1) f_3(x_3; x_1) f_4(x_4; x_2) f_5(x_5; x_3) Hence P(x_6 | x_2, x_5) = P(x_2, x_5, x_6) / P(x_2, x_5) = f_6(x_6; x_2, x_5)
19 Conditional Independence in DAGs If we order the nodes in a directed graphical model so that parents always come before their children, then the graphical model implies the following about the distribution: {x_i ⊥ x_{ν_i} | x_{π_i}} for all i, where ν_i are the nodes coming before x_i that are not its parents In other words, the DAG is telling us that each variable is conditionally independent of its non-descendants given its parents; this is called the local Markov property Such an ordering is called a topological ordering
20 From the Definition (again!) Prove: X_4 ⊥ X_1 | X_2 P(x_{1..4}) = Σ_{x_5} Σ_{x_6} P(x_1) P(x_2|x_1) P(x_3|x_1) P(x_4|x_2) P(x_5|x_3) P(x_6|x_2, x_5) = P(x_1) P(x_2|x_1) P(x_3|x_1) P(x_4|x_2) P(x_{1..3}) = Σ_{x_4} Σ_{x_5} Σ_{x_6} P(x_1) P(x_2|x_1) P(x_3|x_1) P(x_4|x_2) P(x_5|x_3) P(x_6|x_2, x_5) = P(x_1) P(x_2|x_1) P(x_3|x_1) Hence P(x_4 | x_{1..3}) = P(x_4 | x_2), and P(x_4 | x_1, x_2) = P(x_4 | x_2) (why?)
21 Saving in Time & Space Conditional independence: E.g. (a): X_4 ⊥ X_1 | X_2 E.g. (b): X_5 ⊥ X_1 | X_3 The idea can be extended to sets: E.g. (a): X_6 ⊥ X_3 | {X_2, X_5} E.g. (b): {X_5, X_6} ⊥ X_1 | {X_2, X_3} By the chain rule and these conditional independencies: P(x_{1..6}) = P(x_1) P(x_2|x_1) P(x_3|x_1, x_2) P(x_4|x_{1..3}) P(x_5|x_{1..4}) P(x_6|x_{1..5}) = P(x_1) P(x_2|x_1) P(x_3|x_1) P(x_4|x_2) P(x_5|x_3) P(x_6|x_2, x_5)
22 Saving in Time & Space (cont.) P(X_1, X_2, X_3, X_4, X_5, X_6) = P(X_1) P(X_2|X_1) P(X_3|X_1) P(X_4|X_2) P(X_5|X_3) P(X_6|X_2, X_5) Missing edges imply conditional independence Each entry/cell represents a conditional probability combination The biggest table is the bottleneck of the computation
23 Example: Causal Graphs Rain → Wet grass, with P(R) = 0.4, P(W|R) = 0.9, P(W|~R) = 0.2 P(R|W) = P(W|R) P(R) / P(W) = 0.9×0.4 / (0.9×0.4 + 0.2×0.6) = 0.75 In general, fewer values need to be specified for the whole distribution than in the unconstrained case: O(n·2^k) instead of O(2^n), where k is the maximum fan-in of a node
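The Bayes-rule computation on this slide is small enough to verify directly, using exactly the numbers given (P(R) = 0.4, P(W|R) = 0.9, P(W|~R) = 0.2):

```python
# Two-node causal graph Rain -> Wet grass, numbers from the slide.
p_r = 0.4
p_w_given_r = 0.9
p_w_given_not_r = 0.2

# Total probability, then Bayes rule to "reverse the arrow".
p_w = p_w_given_r * p_r + p_w_given_not_r * (1 - p_r)
p_r_given_w = p_w_given_r * p_r / p_w
print(round(p_r_given_w, 2))  # 0.75
```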
24 Example: Complete Graph P(X_1, X_2, X_3, X_4, X_5) = P(X_1) P(X_2|X_1) P(X_3|X_1, X_2) P(X_4|X_1, X_2, X_3) P(X_5|X_1, X_2, X_3, X_4) A network represents a relationship between r.v.'s; however, the representation of this relationship is not unique: different orderings of the X_i can produce different networks
25 Example: Markov Chain X_1 → X_2 → X_3 → X_4 → X_5 P(x) = P(x_1, x_2, ..., x_n) = P(x_n | x_{n−1}, ..., x_1) P(x_{n−1} | x_{n−2}, ..., x_1) ... P(x_2 | x_1) P(x_1) = P(x_n | x_{n−1}) P(x_{n−1} | x_{n−2}) ... P(x_2 | x_1) P(x_1) = P(x_1) Π_{i=2}^n P(x_i | x_{i−1})
26 Naïve Bayes What is the connection between a BN and classification? Suppose one of the variables is the target variable; can we compute the probability of the target variable given the other variables? In Naïve Bayes: concept C_j with children X_1, X_2, ..., X_n The edge directions are not the same as the direction of classification! P(X_1, X_2, ..., X_n, C_j) = P(C_j) P(X_1 | C_j) P(X_2 | C_j) ... P(X_n | C_j)
27 Explaining Away P(S) = 0.2, P(R) = 0.4 P(W|R, S) = 0.95, P(W|R, ~S) = 0.90, P(W|~R, S) = 0.90, P(W|~R, ~S) = 0.10 P(S|W) = P(W|S) P(S) / P(W) = 0.92×0.2 / 0.52 = 0.35 P(S|R, W) = P(W|R, S) P(S|R) / P(W|R) = P(W|R, S) P(S) / P(W|R) = 0.21 < P(S|W) Knowing it rained, the probability that the sprinkler is on (as the cause of the wet grass) decreases, even though the grass is known to be wet: Berkson's paradox!
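The explaining-away effect can be reproduced by brute-force enumeration. The sketch below assumes the priors P(S) = 0.2 and P(R) = 0.4 and the CPT from this slide:

```python
import itertools

p_s, p_r = 0.2, 0.4
# P(W=1 | R, S), keyed by (r, s), as on the slide
p_w = {(1, 1): 0.95, (1, 0): 0.90, (0, 1): 0.90, (0, 0): 0.10}

def prior(r, s):
    # R and S are marginally independent in this network
    return (p_r if r else 1 - p_r) * (p_s if s else 1 - p_s)

# P(S=1 | W=1): marginalize the unknown cause R
num = sum(p_w[(r, 1)] * prior(r, 1) for r in (0, 1))
den = sum(p_w[(r, s)] * prior(r, s)
          for r, s in itertools.product((0, 1), (0, 1)))
p_s_given_w = num / den

# P(S=1 | R=1, W=1): observing rain "explains away" the wet grass
p_s_given_rw = p_w[(1, 1)] * p_s / (p_w[(1, 1)] * p_s
                                    + p_w[(1, 0)] * (1 - p_s))
print(round(p_s_given_w, 2), round(p_s_given_rw, 2))  # 0.35 0.21
```

Note that S and R are independent a priori, yet become dependent once their common effect W is observed.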
28 Directed-separation (d-separation): Definition Definition (d-separation): a path p is said to be d-separated (or blocked) if and only if p contains a chain X → Y → Z or a fork X ← Y → Z and Y is instantiated (shaded), or p contains an inverted fork (collider) X → Y ← Z, and neither Y nor any descendant of Y has received evidence
29 Directed-separation: Definition (cont.) A set A is said to be d-separated from a set B if every path from a node in A to a node in B is d-separated (A and B are connected if they are not separated!) A set A is said to be d-separated from a set B given a set C if every path from a node in A to a node in B is d-separated given that all nodes in C are instantiated (A and B are connected given C if they are not separated given C!)
30 D-separation and Conditional Independence G = set of C.I. rules = family of distributions Probabilistic implications of d-separation: if sets A and B are d-separated given C in a DAG G, then A is independent of B conditional on C in every distribution compatible with G. Conversely, if A and B are not d-separated given C in a DAG G, then A and B are dependent conditional on C in at least one distribution compatible with G.
31 D-separation: Example I In the first graph, X and Y are d-separated given Z_1 and d-connected given Z_2 In the second graph: X and R are d-separated by {Y, Z} X and T are d-separated by {Y, Z} W and T are d-separated by {R} W and X are not d-separated by {Y} W and X are d-separated by the empty set
32 D-separation: Example II Is X d-separated from Y? d-separated: given no evidence, or e_1, or e_3, or only e_4 d-connected: given only e_2, or only e_5, provided the rest of the path is connected
33 Bayes' Bouncing Ball Rules A is d-separated from B given C if we cannot send a ball from any node in A to any node in B according to the rules below, where shaded nodes are in C Case 1 (chain) X → Y → Z: the ball passes through when Y is unshaded, and is blocked when Y is shaded Case 2 (fork) X ← Y → Z: the ball passes through when Y is unshaded, and is blocked when Y is shaded Case 3 (collider) X → Y ← Z: the ball is blocked when Y is unshaded, and passes through when Y is shaded
34 Bayes Ball Rules (cont.) Boundary conditions (leaf and root nodes): Case 1: X → Y with Y a leaf: a shaded Y bounces the ball back, an unshaded Y blocks it Case 2: X ← Y with Y a root: an unshaded Y bounces the ball back, a shaded Y blocks it These are not really necessary: X → Y follows from the descendant (collider) case, and X ← Y from the ancestor (fork) case
35 Undirected Graphical Models A.K.A. Markov Random Fields, Markov networks Also graphs with one node per random variable and edges that connect pairs of nodes, but now the edges are undirected Semantics: every node set is conditionally independent from its non-neighbours given its neighbours, i.e. X_A ⊥ X_C | X_B if every path between X_A and X_C goes through X_B Can model symmetric interactions that directed models cannot!
36 Simple Graph Separation In undirected models, simple graph separation (as opposed to d-separation) tells us about conditional independencies: X_A ⊥ X_C | X_B if every path between X_A and X_C is blocked by some node in X_B Markov ball algorithm: remove X_B and see if there is any path left from X_A to X_C
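The Markov ball algorithm above is just a reachability test after deleting the separator. A minimal sketch (graph and node names hypothetical):

```python
from collections import deque

def separated(adj, A, C, B):
    """Markov-ball check: remove the separator B, then BFS from A and
    see whether any node of C is still reachable."""
    blocked = set(B)
    start = set(A) - blocked
    seen, frontier = set(start), deque(start)
    while frontier:
        u = frontier.popleft()
        if u in C:
            return False          # unblocked path found: not separated
        for v in adj.get(u, ()):
            if v not in blocked and v not in seen:
                seen.add(v)
                frontier.append(v)
    return True

# Hypothetical undirected chain a - b - c: a and c are separated by {b}
adj = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
print(separated(adj, {'a'}, {'c'}, {'b'}))   # True
print(separated(adj, {'a'}, {'c'}, set()))   # False
```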
37 Conditional Parameterization? In directed models, we started with p(x) = Π_i p(x_i | x_{π_i}) and we derived the d-separation semantics from that. Undirected models: we have the semantics, and we need a parameterization. What about this conditional parameterization? p(x) = Π_i p(x_i | x_{neighbors(i)}) Good: product of local functions. Good: each one has a simple conditional interpretation. Bad: the local functions cannot be arbitrary, but must agree properly in order to define a valid distribution.
38 Marginal Parameterization? OK, what about this marginal parameterization? p(x) = Π_i p(x_i, x_{neighbors(i)}) Good: product of local functions. Good: each one has a simple marginal interpretation. Bad: only very few pathological marginals on overlapping nodes can be multiplied to give a valid joint.
39 Interpretation of Clique Potentials Chain X — Y — Z The model implies X ⊥ Z | Y: P(X, Y, Z) = P(Y) P(X|Y) P(Z|Y) We can write this as P(X, Y, Z) = P(X, Y) P(Z|Y) = ψ_xy(X, Y) ψ_yz(Y, Z), or P(X, Y, Z) = P(X|Y) P(Z, Y) = ψ_xy(X, Y) ψ_yz(Y, Z) We cannot have all potentials be marginals, and we cannot have all potentials be conditionals The positive clique potentials can only be thought of as general compatibility, goodness or happiness functions over their variables, but not as probability distributions.
40 When Causality Is Not Playing a Role! ψ: potential functions; X_P, X_Q, X_R: maximal cliques P(A, B, C, D, E, F, G) = P(A, B, C, D) P(E, F, G | A, B, C, D) = P(A, B, C, D) P(E, F, G | D) = P(A, B, C, D) P(E | D) P(F, G | D, E) = P(A, B, C, D) P(E | D) P(F, G | E) = ψ(A, B, C, D) ψ(D, E) ψ(E, F, G) = ψ(X_P) ψ(X_Q) ψ(X_R)
41 An Example of an Undirected Model Whatever factorization we pick, we know that only connected nodes can be arguments of a single local function. A clique is a fully connected subset of nodes. Thus, consider using a product of positive clique potentials: P(x) = (1/Z) Π_{cliques c} ψ_c(x_c), with partition function Z = Σ_x Π_{cliques c} ψ_c(x_c) A product of functions that don't need to agree with each other, which still factors in the way the graph semantics demand. Without loss of generality we can restrict ourselves to maximal cliques. (Why?) E.g. P(x_{1:5}) = (1/Z) ψ(x_1, x_2, x_3) ψ(x_3, x_4) ψ(x_3, x_5)
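The clique-potential construction can be illustrated on the five-node example above. The potential functions below are hypothetical (any positive functions work); brute-force summation yields the partition function Z and hence a valid distribution:

```python
import itertools

# Model from the slide: P(x_{1:5}) ∝ ψ(x1,x2,x3) ψ(x3,x4) ψ(x3,x5),
# binary nodes, arbitrary positive (hypothetical) potentials.
psi123 = lambda a, b, c: 1.0 + a + 2 * b + a * c
psi34 = lambda c, d: 2.0 if c == d else 0.5
psi35 = lambda c, e: 1.0 + c * e

def unnorm(x):
    x1, x2, x3, x4, x5 = x
    return psi123(x1, x2, x3) * psi34(x3, x4) * psi35(x3, x5)

# Partition function: sum of the unnormalized product over all states
Z = sum(unnorm(x) for x in itertools.product([0, 1], repeat=5))
p = lambda x: unnorm(x) / Z
total = sum(p(x) for x in itertools.product([0, 1], repeat=5))
print(round(total, 10))  # 1.0
```

Unlike the directed case, none of the ψ's is individually normalized; only dividing by Z makes the product a distribution.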
42 Partition Function (Normalizer) Z above is called the partition function. Computing the normalizer and its derivatives can often be the hardest part of inference and learning in undirected models. Often the factored structure of the distribution makes it possible to efficiently do the sums/integrals required to compute Z. We don't always have to compute Z, e.g. for conditional probabilities.
43 Directed vs. Undirected Models Directed models use conditional probabilities for each local substructure (called Bayesian networks) may describe some distributions which cannot be described by undirected models mainly used in A.I., diagnosis, decision making, etc. Undirected models use potential functions in each local substructure (called Markov random fields or Markov networks) may describe some distributions which cannot be described by directed models used for problems with little causal structure to guide the graph construction: image restoration, certain optimization problems, models of physical systems
44 A Directed Model of 3 r.v.'s A general distribution: P(X, Y, Z) = P(X) P(Y|X) P(Z|X, Y) Chain: P(X, Y, Z) = P(X) P(Y|X) P(Z|Y) Explaining away: P(X, Y, Z) = P(X) P(Y) P(Z|X, Y) Fork: P(X, Y, Z) = P(Y) P(X|Y) P(Z|Y) (the same group as the chain) One link: P(X, Y, Z) = P(X) P(Y) P(Z|Y) All independent: P(X, Y, Z) = P(X) P(Y) P(Z)
45 An Undirected Model of 3 r.v.'s A general distribution: P(X, Y, Z) = P(X) P(Y|X) P(Z|X, Y) = ψ(X, Y, Z) One link (3 cases): P(X, Y, Z) = P(X) P(Y) P(Z|Y) = ψ(X) ψ(Y, Z) Two links (3 cases): P(X, Y, Z) = P(Y, Z|X) P(X) = P(X) P(Y|X) P(Z|X) = ψ(X, Y) ψ(X, Z) All independent: P(X, Y, Z) = P(X) P(Y) P(Z) = ψ(X) ψ(Y) ψ(Z)
46 Between Directed and Undirected Models Directed models can't do it: no DAG on W, X, Y, Z captures exactly X ⊥ Y | {W, Z} and W ⊥ Z | {X, Y} (the undirected 4-cycle): a DAG must be acyclic, so it will have at least one V structure, and the Bayes ball goes through it once the collider is observed Undirected models can't do it: no undirected graph on X, Y, Z captures exactly the V structure X → Z ← Y, where X ⊥ Y marginally but X and Y become dependent given Z
47 Probabilistic Graphical Models Graphical Models ∩ Probabilistic Models Directed Models (Bayesian belief nets): Alarm network, state-space models, HMMs, Naïve Bayes classifier, PCA / ICA Undirected Models (Markov nets): Markov Random Field, Boltzmann machine, Ising model, max-ent model, log-linear models
48 Part II Probabilistic Inference
49 Outline Introduction to Bayesian networks Bayesian networks: definition, d-separation, equivalence of networks Examples: causal graphs, explaining away, Markov chains, Naïve Bayes, etc. Undirected models Probabilistic inference: node elimination, junction tree Building the networks
50 Probabilistic Inference Task: infer the value of some X_i given values of some other X_j in the network Facts: all other attributes known: easy general case: NP-hard (Cooper, 1990), proving 3-SAT easier than probabilistic inference! approximate inference: still NP-hard (Dagum & Luby, 1993) Monte Carlo (Pradhan & Dagum, 1996)
51 Probabilistic Inference II Partition the random variables in a domain X into three disjoint subsets X_E, X_F, X_R. The general probabilistic inference problem is to compute the posterior P(X_F | X_E) over the query nodes X_F. This involves conditioning on the evidence nodes X_E and integrating (summing) out the marginal nodes X_R If the joint distribution is represented as a huge table, this is trivial: just select the appropriate indices in the columns corresponding to X_E based on the values, sum over the columns corresponding to X_R, and renormalize the resulting table over X_F
52 Probabilistic Inference III If the joint is a known continuous function this can sometimes be done analytically, e.g. Gaussian: eliminate the rows/cols corresponding to X_R; apply the conditioning formulas for P(X_F | X_E) But what if the joint distribution over X is represented by a directed or undirected graphical model?
53 Recall Bayes' Rule For simple models, we can derive the inference formulae by hand using Bayes rule: p(x|y) = p(y|x) p(x) / p(y) This is called "reversing the arrow" In general, the calculation we want to do is: p(x_F | x_E) = Σ_{x_R} p(x_F, x_R | x_E) = p(x_F, x_E) / p(x_E) = Σ_{x_R} p(x_F, x_E, x_R) / Σ_{x_F, x_R} p(x_F, x_E, x_R) Q: Can we do these sums efficiently? Q: Can we avoid repeating unnecessary work each time we do inference? A: Yes, if we exploit the factorization of the joint distribution
54 Take Advantage of the Distributive Law Compute P(x_{1..5})? P(x_{1..5}) = Σ_{x_6} P(x_1) P(x_2|x_1) P(x_3|x_1) P(x_4|x_2) P(x_5|x_3) P(x_6|x_2, x_5) = P(x_1) P(x_2|x_1) P(x_3|x_1) P(x_4|x_2) P(x_5|x_3) Σ_{x_6} P(x_6|x_2, x_5) = P(x_1) P(x_2|x_1) P(x_3|x_1) P(x_4|x_2) P(x_5|x_3) Pushing the sum inside saves many multiplications and additions!
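The distributive-law shortcut above can be verified numerically on the six-node network, using randomly generated (hypothetical) binary CPTs: summing the full joint over x_6 gives exactly the same value as dropping the last factor, because Σ_{x_6} P(x_6 | x_2, x_5) = 1.

```python
import itertools
import random

random.seed(0)

def cpt(n_parents):
    # Random binary CPT: for each parent configuration, a normalized
    # distribution over the child's two states.
    t = {}
    for pa in itertools.product([0, 1], repeat=n_parents):
        p1 = random.random()
        t[pa] = {0: 1 - p1, 1: p1}
    return t

c1, c2, c3, c4, c5 = cpt(0), cpt(1), cpt(1), cpt(1), cpt(1)
c6 = cpt(2)  # P(x6 | x2, x5)

def joint(x1, x2, x3, x4, x5, x6):
    return (c1[()][x1] * c2[(x1,)][x2] * c3[(x1,)][x3] *
            c4[(x2,)][x4] * c5[(x3,)][x5] * c6[(x2, x5)][x6])

x = (1, 0, 1, 1, 0)
# Naive: sum the full joint over x6.
naive = sum(joint(*x, x6) for x6 in (0, 1))
# Distributive law: drop the factor whose sum is 1.
fast = (c1[()][x[0]] * c2[(x[0],)][x[1]] * c3[(x[0],)][x[2]] *
        c4[(x[1],)][x[3]] * c5[(x[2],)][x[4]])
print(abs(naive - fast) < 1e-12)  # True
```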
55 The Generalized Distributive Law ab + ac = a(b + c) left: 2 multiplications, 1 addition; right: 1 multiplication, 1 addition
56 一場遊戲一場夢 ("A Game, A Dream")?!
57 The Most Probable Path Given: a multilayer network, with transition probabilities a_kl shown on the edges. Problem: find the most probable path from A to B One solution is given by the Viterbi algorithm, using dynamic programming
58 The Viterbi Algorithm All paths have the same start state A, so v_0(0) = 1. By keeping pointers backwards, the actual path can be found by backtracking. The full algorithm: Initialisation (i = 0): v_0(0) = 1, v_k(0) = 0 for k > 0 Recursion (i = 1..n): v_l(i) = max_k v_k(i−1) a_kl; tr_l(i) = argmax_k v_k(i−1) a_kl Termination: P(x, π*) = max_k v_k(n) a_k0; π*_n = argmax_k v_k(n) a_k0 Traceback (i = n..1): π*_{i−1} = tr_{π*_i}(i) (Note: an end state 0 is assumed)
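The recursion and traceback can be sketched directly. The code below is a minimal version for the layered-network setting of the slide (the transition matrix and layer count are hypothetical, and the end-state factor a_k0 is omitted for simplicity):

```python
def viterbi(trans, n_layers, n_states):
    """Most probable path through a layered network, all paths
    starting in state 0 (v_0(0) = 1)."""
    v = [1.0] + [0.0] * (n_states - 1)
    back = []
    for _ in range(n_layers):
        new_v, ptr = [], []
        for l in range(n_states):
            # Recursion: v_l(i) = max_k v_k(i-1) * a_kl, keep argmax
            best_k = max(range(n_states), key=lambda k: v[k] * trans[k][l])
            new_v.append(v[best_k] * trans[best_k][l])
            ptr.append(best_k)
        v = new_v
        back.append(ptr)
    # Termination + traceback via the stored pointers
    end = max(range(n_states), key=lambda k: v[k])
    path = [end]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return v[end], path[::-1]

# Hypothetical 2-state example over 3 layers
a = [[0.9, 0.1], [0.4, 0.6]]
prob, path = viterbi(a, n_layers=3, n_states=2)
print(round(prob, 3), path)  # 0.729 [0, 0, 0, 0]
```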
59 Commutative Semi-ring A set K together with two binary operations called + and ·, which satisfy the following three axioms: 1. The operation + is associative and commutative, and there is an additive identity element called 0 s.t. k + 0 = k for all k 2. The operation · is also associative and commutative, and there is a multiplicative identity element called 1 s.t. k · 1 = k for all k 3. The distributive law holds, i.e. (a · b) + (a · c) = a · (b + c) for all a, b, c from K A semi-ring is a commutative ring without the additive inverse
60 Example: Max-Product K = R+ = [0, +∞), + := max, · := usual multiplication 1. Checking the operation +: max(k, 0) = k for all k; i.e., identity: 0 2. Checking the operation ·: k · 1 = k for all k; i.e., identity: 1 3. The distributive law holds, i.e. max{a·b, a·c} = a · max(b, c) for all a, b, c from K Other examples: min-product, min-sum, max-sum, etc.
61 Viterbi vs. GDL Viterbi is just a form of GDL, obtained by choosing an appropriate semi-ring In fact, many dynamic programming processes can be interpreted via the GDL Other examples: Baum-Welch algorithm, FFT (fast Fourier transform) on any finite Abelian group, Gallager-Tanner-Wiberg decoding algorithm, BCJR algorithm, Pearl's belief propagation, Shafer-Shenoy probability propagation algorithm, turbo decoding algorithm, etc.
62 Example Compute P(x_1 | x̄_6)? Push the sums inside the factored joint and sum out x_2, x_3, x_4, x_5 one at a time, each sum producing an intermediate factor Φ.
63 Single Node Posteriors For a single node posterior (i.e. X_F is a single node), there is a simple, efficient algorithm based on eliminating nodes. Notation: x̄_i is the value of evidence node i. The algorithm, called ELIMINATION, requires a node ordering to be given, which tells it which order to do the summations in. In this ordering, the query node must appear last. Graphically, we'll remove a node from the graph once we sum it out
64 Evidence Potentials Elimination also uses a bookkeeping trick, called evidential functions: g(x_i) → g(x_i) δ(x_i, x̄_i), where δ(x_i, x̄_i) is 1 if x_i = x̄_i and 0 otherwise. This trick allows us to treat conditioning in the same way as we treat marginalization. So everything boils down to doing sums: P(x_F | x̄_E) = P(x_F, x̄_E) / P(x̄_E) = Σ_{x_E} Σ_{x_R} P(x_F, x_E, x_R) δ(x_E, x̄_E) / Σ_{x_F} Σ_{x_E} Σ_{x_R} P(x_F, x_E, x_R) δ(x_E, x̄_E) We just pick an ordering and go for it...
65 Elimination Algorithm ELIMINATE(G): place all P(x_i | x_{π_i}) and δ(x_i, x̄_i) on the active list choose an ordering I such that F appears last for each X_i in I: find all potentials on the active list that reference X_i and remove them from the active list define a new potential as the sum with respect to x_i of the product of these potentials place the new potential on the active list end return the product of the remaining potentials
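The ELIMINATE loop can be sketched with factors stored as (variable list, table) pairs. All names and the example network below are hypothetical; the variables are binary, and evidence is omitted for brevity (it would just add δ factors to the initial list):

```python
import itertools

def multiply(f, g):
    fv, ft = f
    gv, gt = g
    vs = list(dict.fromkeys(fv + gv))       # union of scopes, ordered
    table = {}
    for x in itertools.product([0, 1], repeat=len(vs)):
        asg = dict(zip(vs, x))
        table[x] = (ft[tuple(asg[v] for v in fv)] *
                    gt[tuple(asg[v] for v in gv)])
    return vs, table

def sum_out(f, var):
    fv, ft = f
    i = fv.index(var)
    keep = [v for v in fv if v != var]
    table = {}
    for x, val in ft.items():
        key = x[:i] + x[i + 1:]
        table[key] = table.get(key, 0.0) + val
    return keep, table

def eliminate(factors, order, query):
    active = list(factors)
    for var in order:                        # query appears last, not here
        used = [f for f in active if var in f[0]]
        active = [f for f in active if var not in f[0]]
        prod = used[0]
        for f in used[1:]:
            prod = multiply(prod, f)
        active.append(sum_out(prod, var))
    result = active[0]
    for f in active[1:]:
        result = multiply(result, f)
    _, t = result
    z = sum(t.values())
    return {k: v / z for k, v in t.items()}  # renormalize over the query

# Hypothetical chain A -> B -> C; query P(C)
pa = (['A'], {(0,): 0.6, (1,): 0.4})
pb = (['A', 'B'], {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
pc = (['B', 'C'], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})
post = eliminate([pa, pb, pc], ['A', 'B'], 'C')
print(post)  # {(0,): 0.7, (1,): 0.3}
```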
66 Algorithm Details At each step we are trying to remove the current variable in the elimination ordering from the distribution For marginal nodes this sums them out; for evidence nodes this conditions on their observed values using the evidential functions Each step performs a sum over a product of potential functions The algorithm terminates when we reach the query node, which always appears last in the ordering We renormalize what we have left to get the final result P(x_F | x̄_E) For undirected models, everything is the same except that the initialization phase uses the clique potentials instead of the parent-conditionals
67 Marginalization without Evidence Marginalization of joint distributions represented by graphical models is a special case of probabilistic inference To compute the marginal P(x_i) of a single node, we set it to be the query node and set the evidence set to be empty In directed models, we can ignore all nodes downstream from the query node, and marginalize only the part of the graph before it If the node has no parents, we can read off its marginal directly In undirected models, we need to do the full computation: compute P(x_i)/Z using elimination and then normalize in the last step of elimination to get Z. We can reuse Z later if we want to save work
68 Efficiency Trick in Directed Elimination In directed models, we often know that a certain sum must evaluate to unity, since it is a conditional probability. For example, consider the term Φ_4 in our six-node example: Φ_4(x_2) = Σ_{x_4} P(x_4 | x_2) = 1 We can't use this trick in undirected models, because there are no guarantees about what the clique potentials sum to.
69 Node Elimination The algorithm we presented is really a way of eliminating nodes from a graph one by one. For undirected graphs: for each node i in ordering I: connect all the neighbours of i remove i from the graph end The removal operation requires summing out i or conditioning on observed evidence for i. Summing out i leaves a function involving all its previous neighbours, and thus they become connected by this step. The original graph, augmented by all the added edges, is now a triangulated graph. Reminder: triangulated means that every cycle of length > 3 contains a chord, i.e. an edge not on the cycle but between two nodes in the cycle.
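The graph-side view of elimination is easy to sketch: each removal connects the remaining neighbours, and the added "fill-in" edges are exactly what triangulates the graph. A minimal version (graph and ordering hypothetical):

```python
def fill_in(adj, order):
    """Eliminate nodes in the given order, connecting each node's
    remaining neighbours; return the set of fill-in edges added."""
    adj = {u: set(vs) for u, vs in adj.items()}
    added = set()
    for u in order:
        nbrs = list(adj[u])
        # Connect all pairs of u's neighbours (they share the new factor)
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                a, b = nbrs[i], nbrs[j]
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    added.add(frozenset((a, b)))
        for v in nbrs:                       # remove u from the graph
            adj[v].discard(u)
        del adj[u]
    return added

# Chordless 4-cycle 1-2-4-3-1: eliminating node 2 first adds chord 1-4
g = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
edges = fill_in(g, [2, 4, 3, 1])
print(sorted(tuple(sorted(e)) for e in edges))  # [(1, 4)]
```

Different orderings produce different fill-ins; a good ordering keeps the induced cliques small, which is exactly the hard part mentioned on the next slides.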
70 Example: Node/Variable Elimination Push the sums in as far as possible After performing the innermost sum, we create a new term which does not depend on the variable summed out Continue doing summations: p(x_1, x̄_6) = (1/Z) Σ_{x_2} Σ_{x_3} Σ_{x_4} Σ_{x_5} ψ(x_1, x_2) ψ(x_1, x_3) ψ(x_2, x_4) ψ(x_3, x_5) ψ(x_2, x̄_6) ψ(x_5, x̄_6) = (1/Z) Σ_{x_2} Σ_{x_3} Σ_{x_4} ψ(x_1, x_2) ψ(x_1, x_3) ψ(x_2, x_4) Σ_{x_5} ψ(x_3, x_5) Φ(x_2, x_5), where Φ(x_2, x_5) = ψ(x_2, x̄_6) ψ(x_5, x̄_6) = (1/Z) Σ_{x_2} Σ_{x_3} Σ_{x_4} ψ(x_1, x_2) ψ(x_1, x_3) ψ(x_2, x_4) Φ(x_2, x_3) = (1/Z) Σ_{x_2} ψ(x_1, x_2) (Σ_{x_4} ψ(x_2, x_4)) (Σ_{x_3} ψ(x_1, x_3) Φ(x_2, x_3)) = (1/Z) Σ_{x_2} ψ(x_1, x_2) Φ(x_2) Φ(x_1, x_2) = (1/Z) Φ(x_1)
71 Added Edges = Triangulation It is easy to check whether a graph is triangulated, in linear time It is easy to triangulate a non-triangulated graph But it is very hard to do so in a way that induces small clique sizes
72 Moralization For directed graphs, the parents may not be explicitly connected, but they are involved in the same potential function P(x_i | x_{π_i}) Thus, to think of ELIMINATION as a node-removal algorithm, we first must connect all the parents of every node and drop the directions on the links This step is known as moralization, and it is essential: since conditioning couples parents in directed models (explaining away), we need a mechanism for respecting this when we do inference
73 Markov Blanket (Elimination order 1 vs. elimination order 2) For each eliminated node, e.g. X_i = X_5, we need to take care of (1) the child (= X_6), (2) the parents (= X_3), and (3) the other parents of the child (= X_2). These three together are called the Markov blanket of X_5
74 Moral Graph p(x) = (1/Z) ψ(x_1, x_2) ψ(x_1, x_3) ψ(x_2, x_4) ψ(x_3, x_5) ψ(x_2, x_5, x_6) The graph after moralization looks more general! What do we lose? E.g. X_2 ⊥ X_5 | {X_1, X_3} holds in the DAG but not in the moral graph The moral graph is more general, but loses some independencies (spectrum: most specific ↔ most general)
75 Junction Tree Moralization → Triangulation → Construct Junction Tree → Propagate Probabilities
76 Tree-Structured Graphical Models For now, we will focus on tree-structured graphical models. Trees are an important class; they incorporate all chains (e.g. HMMs) as well. Exact inference on trees is the basis for the junction tree algorithm, which solves the general exact inference problem for directed acyclic graphs, and for many approximate algorithms which can work on intractable or cyclic graphs. Directed and undirected trees make exactly the same conditional independence assumptions, so we cover them together.
77 Elimination on Trees Recall the basic structure of ELIMINATE: 1. Convert the directed graph to an undirected one by moralization. 2. Choose an elimination ordering with the query node last. 3. Place all potentials on the active list. 4. Eliminate nodes by removing all relevant potentials, taking their product, summing out the node and placing the resulting factor back onto the potential list. What happens when the original graph is a tree? 1. No moralization is necessary. 2. There is a natural elimination ordering with the query node as root: any depth-first search order. 3. All subtrees with no evidence nodes can be ignored, since they will leave a potential of unity once they are eliminated.
78 Elimination on Trees (cont.) Now consider eliminating node j, which is followed by i in the order. Which nodes appear in the potential created after summing over x_j? nothing in the subtree below j (already eliminated) nothing from other subtrees, since the graph is a tree only i, from ψ(x_i, x_j), which relates i and j Call the factor that is created m_ji(x_i), and think of it as a message that j passes to i when j is eliminated. This message is created by summing over x_j the product of all earlier messages m_kj(x_j) sent to j, as well as ψ^E(x_j) if j is an evidence node.
79 Eliminate = Message Passing On a tree, ELIMINATE can be thought of as passing messages up to the query node at the root from the other nodes at the leaves or interior. Since we ignore subtrees with no evidence, observed evidence nodes are always at the leaves. The message m_ji(x_i) is created when we sum over x_j: m_ji(x_i) = Σ_{x_j} ψ^E(x_j) ψ(x_i, x_j) Π_{k ∈ c(j)} m_kj(x_j) At the final node f, we obtain the answer: p(x_f | x̄_E) ∝ ψ^E(x_f) Π_{k ∈ c(f)} m_kf(x_f) If j is an evidence node, ψ^E(x_j) = δ(x_j, x̄_j), else ψ^E(x_j) = 1. If j is a leaf node in the ordering, c(j) is empty; otherwise c(j) are the children of j in the ordering.
80 Messages are Reused in Multi-Elimination Consider querying x_1, x_2, x_3 and x_4 in the graph below. The messages needed for x_1, x_2, x_4 individually are shown in (a)-(c). Also shown in (d) is the set of messages needed to compute all possible marginals over single query nodes. Key insight: even though the naive approach (rerun ELIMINATION for each query) computes O(N²) messages to find marginals for all N query nodes, there are only 2(N−1) possible messages, two per edge. We can compute all possible messages in only double the amount of work it takes to do one query. Then we take the product of the relevant messages to get the marginals.
81 Computing All Possible Messages How can we compute all possible messages efficiently? Idea: respect the following Message-Passing Protocol: a node can send a message to a neighbour only when it has received messages from all its other neighbours. The protocol is realizable: designate one node arbitrarily as the root. Collect messages inward to the root, then distribute them back out to the leaves. Once we have the messages, we can compute marginals using: p(x_i | x̄_E) ∝ ψ^E(x_i) Π_{k ∈ c(i)} m_ki(x_i) Remember that the directed tree on which we pass messages might not be the same directed tree we started with. We can also consider synchronous or asynchronous message passing: nodes that respect the protocol but don't use the Collect-Distribute schedule above. We must prove this terminates.
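The message equation above can be sketched as a recursive sum-product pass on a small tree, then checked against brute-force enumeration. The tree, potentials, and node names below are hypothetical; node potentials play the role of ψ^E (all 1 here, i.e. no evidence):

```python
import itertools

def sum_product(nodes, edges, psi, node_pot):
    """Compute all single-node marginals on an undirected tree with
    binary nodes via recursively computed messages m_ji(x_i)."""
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    msgs = {}

    def pair(u, v, xu, xv):
        if (u, v) in psi:
            return psi[(u, v)][(xu, xv)]
        return psi[(v, u)][(xv, xu)]

    def message(j, i):   # message from j to its neighbour i
        if (j, i) not in msgs:
            m = [0.0, 0.0]
            for xi in (0, 1):
                for xj in (0, 1):
                    val = node_pot[j][xj] * pair(j, i, xj, xi)
                    for k in adj[j]:
                        if k != i:
                            val *= message(k, j)[xj]
                    m[xi] += val
            msgs[(j, i)] = m
        return msgs[(j, i)]

    marg = {}
    for i in nodes:      # marginal ∝ product of all incoming messages
        b = [node_pot[i][x] for x in (0, 1)]
        for k in adj[i]:
            mk = message(k, i)
            b = [b[x] * mk[x] for x in (0, 1)]
        z = sum(b)
        marg[i] = [v / z for v in b]
    return marg

# Hypothetical 3-node chain a - b - c
nodes = ['a', 'b', 'c']
edges = [('a', 'b'), ('b', 'c')]
psi = {('a', 'b'): {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 2},
       ('b', 'c'): {(0, 0): 1, (0, 1): 3, (1, 0): 3, (1, 1): 1}}
node_pot = {u: [1.0, 1.0] for u in nodes}
marg = sum_product(nodes, edges, psi, node_pot)

# Brute-force check of the marginal of b
def joint(xa, xb, xc):
    return psi[('a', 'b')][(xa, xb)] * psi[('b', 'c')][(xb, xc)]
Z = sum(joint(*x) for x in itertools.product([0, 1], repeat=3))
pb1 = sum(joint(xa, 1, xc) for xa in (0, 1) for xc in (0, 1)) / Z
print(abs(marg['b'][1] - pb1) < 1e-9)  # True
```

The `msgs` cache is the reuse from the previous slide: each of the 2(N−1) messages is computed once and shared by all queries.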
82 Triangulation Triangulation: connect nodes in the moral graph such that no chordless cycle of 4 or more nodes remains in the graph So, add links, but there are many possible choices HINT: try to keep the largest clique size small, which makes the junction tree algorithm more efficient Sub-optimal triangulations of the moral graph take polynomial time Triangulation that minimizes the largest clique size is NP-hard But it is OK to use a suboptimal triangulation (slower JTA)
83 Part III Learning Belief Networks
84 Outline Introduction to Bayesian networks Bayesian networks: definition, d-separation, equivalence of networks Examples: causal graphs, explaining away, Markov chains, Naïve Bayes, etc. Undirected models Probabilistic inference: node elimination, junction tree Building the networks
85 Building Bayesian Network Models Tasks of building models: capturing the model structure determining the conditional probabilities What decides the models? theoretical considerations (e.g.: mixed data, overfitting, etc.) a set of data subjective views from experts (can be partially provided!)
86 Some Issues in Learning Structure & Parameters Nodes as variables How many edges? we can always start from a complete graph, but it may not be a good idea! a missing edge can be treated as an edge of probability zero Undirected, directed or hybrid? And how to decide the direction in the directed case? direction may not reflect causality Complexity vs. training error
87 Maximum Likelihood For IID data: P(D|θ) = Π_m P(x^m | θ), ℓ(θ; D) = Σ_m log P(x^m | θ) Idea of maximum likelihood estimation (MLE): pick the setting of parameters most likely to have generated the data we saw: θ_ML = argmax_θ ℓ(θ; D) Very commonly used in statistics. Often leads to intuitive, appealing, or natural estimators. For a start, the IID assumption makes the log likelihood into a sum, so its derivative can be easily taken term by term.
88 MLE for Directed GMs For a directed GM, the likelihood function has a nice form: log P(D|θ) = log Π_m Π_i P(x_i^m | x_{π_i}^m, θ_i) = Σ_m Σ_i log P(x_i^m | x_{π_i}^m, θ_i) The parameters decouple, so we can maximize the likelihood independently for each node's function by setting θ_i We only need the values of x_i and its parents in order to estimate θ_i Furthermore, if x_i^m, x_{π_i}^m have sufficient statistics, we only need those. In general, for fully observed data, if we know how to estimate parameters at a single node we can do it for the whole network.
89 Three Key Regularization Ideas To avoid overfitting, we can put priors on the parameters of the class and class-conditional feature distributions We can also tie some parameters together so that fewer of them are estimated using more data Finally, we can make factorization or independence assumptions about the distributions. In particular, for the class-conditional distributions we can assume the features are fully dependent, partly dependent, or independent!
90 Discrete (Multinomial) Naive Bayes Discrete features $x_i$, assumed independent given the class label $y$:
$P(x_i = j \mid y = k) = \eta_{ijk}, \qquad P(\mathbf{x} \mid y = k, \eta) = \prod_i \prod_j \eta_{ijk}^{[x_i = j]}$
Classification rule:
$P(y = k \mid \mathbf{x}, \eta) = \dfrac{\pi_k \prod_i \prod_j \eta_{ijk}^{[x_i = j]}}{\sum_q \pi_q \prod_i \prod_j \eta_{ijq}^{[x_i = j]}} = \dfrac{e^{\beta_k^\top \tilde{\mathbf{x}}}}{\sum_q e^{\beta_q^\top \tilde{\mathbf{x}}}}$
where $\beta_k = [\log \eta_{11k}; \ldots; \log \eta_{ijk}; \ldots; \log \pi_k]$ stacks the log-parameters, and $\tilde{\mathbf{x}} = [x_1 = 1; x_1 = 2; \ldots; x_i = j; \ldots; 1]$ is the corresponding vector of indicator features with a constant 1 appended.
91 Fitting Discrete Naive Bayes ML parameters are class-conditional frequency counts:
$\hat{\eta}_{ijk} = \dfrac{\sum_m [x_i^{(m)} = j]\,[y^{(m)} = k]}{\sum_m [y^{(m)} = k]}$
How do we know? Write down the likelihood:
$\ell(\theta; \mathcal{D}) = \sum_m \log P(y^{(m)} \mid \pi) + \sum_{m,i} \log P(x_i^{(m)} \mid y^{(m)}, \eta)$
and optimize it by setting its derivative to zero (careful! enforce normalization with Lagrange multipliers):
$\tilde{\ell}(\eta; \mathcal{D}) = \sum_{ijk} \Big(\sum_m [x_i^{(m)} = j]\,[y^{(m)} = k]\Big) \log \eta_{ijk} + \sum_{ik} \lambda_{ik} \Big(1 - \sum_j \eta_{ijk}\Big)$
$\dfrac{\partial \tilde{\ell}}{\partial \eta_{ijk}} = \dfrac{\sum_m [x_i^{(m)} = j]\,[y^{(m)} = k]}{\eta_{ijk}} - \lambda_{ik} = 0 \;\Rightarrow\; \lambda_{ik} = \sum_m [y^{(m)} = k], \quad \hat{\eta}_{ijk} = \text{as above}.$
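The frequency-count solution can be implemented directly; a hedged sketch, where the function names and the tiny dataset are my own illustrative assumptions:

```python
import math
from collections import defaultdict

def fit_naive_bayes(X, y):
    # ML parameters are frequency counts: eta[(i, j, k)] is the fraction of
    # class-k examples whose feature i takes value j; pi[k] is the class prior.
    n = len(X)
    classes = sorted(set(y))
    pi = {k: sum(1 for ym in y if ym == k) / n for k in classes}
    eta = defaultdict(float)
    for k in classes:
        rows = [X[m] for m in range(n) if y[m] == k]
        for i in range(len(X[0])):
            for j in set(x[i] for x in X):
                eta[(i, j, k)] = sum(1 for x in rows if x[i] == j) / len(rows)
    return pi, eta, classes

def predict(x, pi, eta, classes):
    # argmax_k  log pi_k + sum_i log eta_{i, x_i, k}, computed in log space.
    def score(k):
        s = math.log(pi[k])
        for i, xi in enumerate(x):
            s += math.log(eta[(i, xi, k)]) if eta[(i, xi, k)] > 0 else float("-inf")
        return s
    return max(classes, key=score)

pi, eta, classes = fit_naive_bayes([(0, 1), (0, 0), (1, 1), (1, 0)], [0, 0, 1, 1])
```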
92 Learning Markov Models The ML parameter estimates for a simple Markov model are easy:
$P(y_1, y_2, \ldots, y_T) = P(y_1, \ldots, y_k) \prod_{t=k+1}^{T} P(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{t-k})$
$\log P(\{y\}) = \log P(y_1, \ldots, y_k) + \sum_{t=k+1}^{T} \log P(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{t-k})$
Each window of $k+1$ outputs is a training case for the model $P(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{t-k})$. Example: for discrete outputs (symbols) and a 2nd-order Markov model we can use the multinomial model:
$P(y_t = m \mid y_{t-1} = a, y_{t-2} = b) = \alpha_{mab}$
The maximum likelihood values for $\alpha$ are:
$\hat{\alpha}_{mab} = \dfrac{\#\{t \text{ s.t. } y_t = m,\ y_{t-1} = a,\ y_{t-2} = b\}}{\#\{t \text{ s.t. } y_{t-1} = a,\ y_{t-2} = b\}}$
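The second-order counts can be gathered in one pass over the sequence; a minimal sketch (the toy sequence is an illustrative assumption):

```python
from collections import Counter

def fit_second_order(seq):
    # alpha[(m, a, b)] = #{t: y_t=m, y_{t-1}=a, y_{t-2}=b}
    #                  / #{t: y_{t-1}=a, y_{t-2}=b}
    # Each window of 3 symbols is one training case.
    tri = Counter(zip(seq[2:], seq[1:-1], seq[:-2]))  # (y_t, y_{t-1}, y_{t-2})
    bi = Counter(zip(seq[1:-1], seq[:-2]))            # (y_{t-1}, y_{t-2})
    return {(m, a, b): c / bi[(a, b)] for (m, a, b), c in tri.items()}

alpha = fit_second_order("abababa")
# On this toy sequence 'a' always follows the context (b, a),
# so alpha[('a', 'b', 'a')] is 1.0.
```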
93 Summary Introduction to Bayesian networks. Bayesian networks: definition, d-separation, equivalence of networks. Examples: causal graphs, explaining away, Markov chains, Naïve Bayes, etc. Undirected models. Probabilistic inference: node elimination, junction tree. Building the networks.
94 What Else Bayesian networks with continuous r.v.'s: PCA, ICA, and other dimension reduction methods will be discussed in coming lectures! Models with unobservable r.v.'s: learning by Expectation-Maximization will also be delivered!