Graphical Models (Lecture 1 - Introduction)
1 Graphical Models Lecture 1 - Introduction. Tibério Caetano, tiberiocaetano.com. Statistical Machine Learning Group, NICTA Canberra. LLSS Canberra 2009. Tibério Caetano: Graphical Models Lecture 1 - Introduction
2 Material on Graphical Models. Many good books: Chris Bishop's book Pattern Recognition and Machine Learning (the Graphical Models chapter is available from his webpage in pdf format, as well as all the figures, many used here in these slides!); Judea Pearl's Probabilistic Reasoning in Intelligent Systems; Steffen Lauritzen's Graphical Models. Unpublished material: Michael Jordan's unpublished book An Introduction to Probabilistic Graphical Models; Koller and Friedman's unpublished book Structured Probabilistic Models. Videos: Sam Roweis' videos on videolectures.net. Excellent!
# results in Google Scholar for each query (in quotes): Kalman Filter >0000; EM algorithm >; Hidden Markov Models >; Bayesian Networks >600; Markov Random Fields >5000; Particle Filters >4000; Mixture Models >4000; Conditional Random Fields >500; Markov Chain Monte Carlo >; Gibbs Sampling >8000.
4 Introduction. Graphical Models have been applied to: Image Processing, Speech Processing, Natural Language Processing, Document Processing, Pattern Recognition, Bioinformatics, Computer Vision, Economics, Physics, Social Sciences.
5 Physics
6 Biology
7 Music
8 Computer Vision: Matching
9 Applications in Vision and PR: Matching
10 Image Processing
11 Image Processing
12 Introduction. Technically, Graphical Models are multivariate probabilistic models which are structured in terms of conditional independence statements. Informally, they are models that represent a system by its parts and the possible relations among them, in a probabilistic way.
13 Introduction. Questions we want to ask about these models: estimating the parameters of the model given data; obtaining data samples from the model; computing probabilities of particular outcomes; finding the most likely outcome.
14 Univariate Example. p(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)). Estimate µ and σ given X = {x_1, ..., x_n}. Sample from p(x). Compute P(µ−σ ≤ x ≤ µ+σ) := ∫ from µ−σ to µ+σ of p(x) dx. Find argmax_x p(x).
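The four operations on this slide can be sketched in a few lines for the Gaussian; this is an illustrative sketch (function names are mine, not from the slides): ML estimation of µ and σ², sampling, density evaluation, and argmax_x p(x), which for a Gaussian is just x = µ.

```python
import math
import random

def fit_gaussian(xs):
    """ML estimates (mu, sigma^2); note ML divides by n, not n - 1."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def gaussian_pdf(x, mu, var):
    """p(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

random.seed(0)
sample = [random.gauss(5.0, 2.0) for _ in range(100_000)]  # sampling step
mu_hat, var_hat = fit_gaussian(sample)                     # estimation step
mode = mu_hat                                              # argmax_x p(x) = mu
```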
15 Multivariate Case. We want to do the same things for multivariate distributions structured according to CI, efficiently and accurately. For example, given p(x_1, ..., x_n; θ): estimate θ given a sample X and a criterion (e.g. ML); compute p(x_A), A ⊆ {1, ..., n} (marginal distributions); find argmax over (x_1, ..., x_n) of p(x_1, ..., x_n; θ) (MAP assignment).
16 Naively... When trying to answer the relevant questions: p(x_1) = Σ_{x_2} ... Σ_{x_N} p(x_1, x_2, ..., x_N) costs O(|X_2| × ... × |X_N|), and similarly for other questions: NOT GOOD. We need compact representations of multivariate distributions.
17 When to Use Graphical Models. When the compactness of the model arises from conditional independence statements involving its random variables. CAUTION: Graphical Models are useful in such cases. If the probability space is structured in different ways, Graphical Models may not, and in principle should not, be the right framework to represent and deal with the probability distributions involved.
18 Graphical Models Lecture 2 - Basics. Tibério Caetano, tiberiocaetano.com. Statistical Machine Learning Group, NICTA Canberra. LLSS Canberra 2009.
19 Notation. Basic definitions involving random quantities: X - a random variable. X = (X_1, ..., X_N) - a random vector. x = (x_1, ..., x_N) - a particular realization of X. 𝒳 - set of all realizations (sample space). X_A - a random vector of variables indexed by A ⊆ {1, ..., N}; x_A for realizations. X_Ã - the random vector comprised of all variables other than those in X_A, Ã = {1, ..., N} \ A; x_Ã for realizations. p(X_A := x_A) ≡ p(x_A) := probability that X_A assumes realization x_A. Basic properties of probabilities: p(x) ≥ 0 and Σ_{x ∈ 𝒳} p(x) = 1.
20 Conditioning and Marginalization. The two rules you will always need. Conditioning: p(x_A | x_B) = p(x_A, x_B) / p(x_B), for p(x_B) > 0. Marginalization: p(x_A) = Σ_{x_Ã} p(x_A, x_Ã).
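A minimal sketch of these two rules on an explicit joint table (the table entries and helper names are illustrative, not from the slides):

```python
# Joint distribution over (A, B), each binary, as a dict {(a, b): prob}.
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def marginal(p, axis):
    """Marginalization: sum out every variable except `axis`."""
    out = {}
    for assignment, prob in p.items():
        out[assignment[axis]] = out.get(assignment[axis], 0.0) + prob
    return out

def conditional(p, axis_given, value):
    """Conditioning: p(rest | X_axis = value) = p(rest, value) / p(value)."""
    restricted = {k: v for k, v in p.items() if k[axis_given] == value}
    z = sum(restricted.values())  # p(x_B); must be > 0
    return {k: v / z for k, v in restricted.items()}

pA = marginal(joint, 0)                 # p(a): {0: 0.4, 1: 0.6}
p_given_B1 = conditional(joint, 1, 1)   # p(a, b=1) / p(b=1)
```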
21 Independence and Conditional Indep. Independence: p(x_A, x_B) = p(x_A) p(x_B). Conditional independence: p(x_A, x_B | x_C) = p(x_A | x_C) p(x_B | x_C), or equivalently p(x_A | x_B, x_C) = p(x_A | x_C), or equivalently p(x_B | x_A, x_C) = p(x_B | x_C). Notation: X_A ⊥ X_B | X_C.
22 Conditional Independence Examples. Weather tomorrow ⊥ weather yesterday | weather today. My genome ⊥ my grandparents' genome | my parents' genome. My mood ⊥ my wife's boss's mood | my wife's mood. A pixel's color ⊥ color of far away pixels | color of surrounding pixels.
23 Conditional Independence. The KEY fact is: p(x_A, x_B | x_C), a function of |A| + |B| + |C| variables, = p(x_A | x_C) p(x_B | x_C), functions of |A| + |C| and |B| + |C| variables. p factors as functions over proper subsets of variables.
24 Conditional Independence. Therefore, p(x_A, x_B | x_C) = p(x_A | x_C) p(x_B | x_C) means p(x_A, x_B, x_C) cannot assume arbitrary values for arbitrary x_A, x_B, x_C. If you vary x_A and x_B for fixed x_C, you can only realize probabilities that satisfy the above condition.
25 What is a Graphical Model? Given a set of conditional independence statements for the random vector X = (X_1, ..., X_N): {X_{A_i} ⊥ X_{B_i} | X_{C_i}}, our object of study will be the family of probability distributions p(x_1, ..., x_N) where these statements hold.
26 Questions to be addressed. Typical questions when we have a probabilistic model: estimate parameters of the model given data; compute probabilities of particular outcomes; find particularly interesting realizations (e.g. the MAP assignment). In order to manipulate the probabilistic model, we need to know the mathematical structure of p(x), and we need to find ways of computing efficiently and accurately in such structure.
27 An Exercise. What does p(x) look like? Example: p(x_1, x_2, x_3) = p(x_3 | x_2, x_1) p(x_2 | x_1) p(x_1); if X_3 ⊥ X_1 | X_2 then p(x_3 | x_2, x_1) = p(x_3 | x_2), so p(x_1, x_2, x_3) = p(x_1) p(x_2 | x_1) p(x_3 | x_2).
28 An Exercise. However, p(x_1) p(x_2 | x_1) p(x_3 | x_2) is also = p(x_2) p(x_1 | x_2) p(x_3 | x_2), since p(x_1) p(x_2 | x_1) = p(x_1, x_2) = p(x_2) p(x_1 | x_2).
29 Cond. Indep. and Factorization. So CI seems to generate factorization of p(x)! Is this useful for the questions we want to ask? Let's see an example of how expensive it is to compute p(x_3). Without factorization: p(x_3) = Σ_{x_1} Σ_{x_2} p(x_1, x_2, x_3), which is O(|X_1| × |X_2| × |X_3|). With factorization: p(x_3) = Σ_{x_2} p(x_3 | x_2) Σ_{x_1} p(x_2 | x_1) p(x_1), which is O(|X_1| × |X_2| + |X_2| × |X_3|).
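A quick numerical check of the two computations above (the distributions are random and the helper names mine): both orders of summation give the same p(x_3), but the factorized one touches O(K²) terms instead of O(K³).

```python
import itertools
import random

random.seed(1)
K = 4  # each variable takes values 0..K-1

def random_dist(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

p1 = random_dist(K)                         # p(x1)
p2_1 = [random_dist(K) for _ in range(K)]   # p(x2 | x1), row-indexed by x1
p3_2 = [random_dist(K) for _ in range(K)]   # p(x3 | x2), row-indexed by x2

def joint(x1, x2, x3):
    return p1[x1] * p2_1[x1][x2] * p3_2[x2][x3]

# Naive: sum the full joint table, O(K^3) terms.
naive = [sum(joint(a, b, c) for a, b in itertools.product(range(K), repeat=2))
         for c in range(K)]

# Distributive law: push the sums inside, O(K^2) + O(K^2) terms.
px2 = [sum(p1[a] * p2_1[a][b] for a in range(K)) for b in range(K)]
fact = [sum(p3_2[b][c] * px2[b] for b in range(K)) for c in range(K)]
```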
30 Cond. Indep. and Factorization. Therefore, conditional independence seems to induce a structure in p(x) that allows us to exploit the distributive law in order to make computations more tractable. However, what about the general case p(x_1, ..., x_N)? What is the form that p will take in general, given a set of conditional independence statements? Will we be able to exploit the distributive law in this general case as well?
31 Re-Writing the Joint Distribution. A little exercise: p(x_1, ..., x_N) = p(x_N | x_{N−1}, ..., x_1) p(x_{N−1}, ..., x_1) = p(x_N | x_{N−1}, ..., x_1) p(x_{N−1} | x_{N−2}, ..., x_1) ... p(x_1) = ∏_{i=1}^N p(x_i | x_{<i}), where x_{<i} := {x_j : 1 ≤ j < i}. Now denote by π a permutation of the labels {1, ..., N}, and write x_{<π_i} for the variables preceding x_{π_i} in the permuted ordering. Above we have π = identity, i.e. π_i = i. So we can write p(x) = ∏_{i=1}^N p(x_{π_i} | x_{<π_i}).
32 Re-Writing the Joint Distribution. So any p(x) can be written as p(x) = ∏_{i=1}^N p(x_{π_i} | x_{<π_i}). Now assume that the following CI statements hold: p(x_{π_i} | x_{<π_i}) = p(x_{π_i} | x_{a_{π_i}}) ∀i, where a_{π_i} ⊆ <π_i. Then we immediately get p(x) = ∏_{i=1}^N p(x_{π_i} | x_{a_{π_i}}).
33 Types of Graphical Models. Algebra is boring, so let's draw this. Let's represent variables as circles, and let's draw an arrow from j to i if j ∈ a_i. The resulting drawing will be a directed graph. Moreover, it will be acyclic (no directed cycles). Exercise: why? Directed Graphical Models: [Figure: a directed acyclic graph on X_1, ..., X_6.]
34 Computing in Directed Graphical Models. [Figure: the DAG on X_1, ..., X_6.] p(x) = ? (Exercise)
35 Bayesian Networks. This is why the name Graphical Models. Such Graphical Models with arrows are called Bayesian Networks, Bayes Nets, Bayes Belief Nets, Belief Networks, or, more descriptively: Directed Graphical Models.
36 Bayesian Networks. A Bayesian Network associated to a DAG is a set of probability distributions where each element can be written as p(x) = ∏_i p(x_i | x_{a_i}), where random variable x_i is represented as a node in the DAG and a_i = {j : arrow j → i in the DAG}. "a" is for parents. Colloquially we say the BN is the DAG.
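A sketch of this definition on a small invented BN (structure and numbers made up for illustration): the joint is the product of one conditional per node given its parents, and summing it over all assignments gives 1, which is the normalization the later refreshing exercise asks you to prove.

```python
import itertools

# Toy BN on binary variables: x1 -> x2, x1 -> x3, (x2, x3) -> x4.
# parents[i] lists a_i; cpt[i] maps a parent assignment to p(x_i = 1 | parents).
parents = {1: (), 2: (1,), 3: (1,), 4: (2, 3)}
cpt = {
    1: {(): 0.6},
    2: {(0,): 0.2, (1,): 0.7},
    3: {(0,): 0.5, (1,): 0.1},
    4: {(0, 0): 0.01, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.99},
}

def joint(x):
    """p(x) = prod_i p(x_i | x_{a_i}) for a full assignment dict x."""
    prob = 1.0
    for i in parents:
        p_one = cpt[i][tuple(x[j] for j in parents[i])]
        prob *= p_one if x[i] == 1 else 1.0 - p_one
    return prob

total = sum(joint(dict(zip((1, 2, 3, 4), bits)))
            for bits in itertools.product((0, 1), repeat=4))
```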
37 Topological Sorts. A permutation π of the node labels which, for every node, makes each of its parents have a smaller index than that of the node is called a topological sort of the nodes in the DAG. Theorem: every DAG has at least one topological sort. Exercise: prove.
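One topological sort can be computed with Kahn's algorithm (a standard method, sketched here with illustrative names): repeatedly output a node all of whose parents have already been output. That this never gets stuck on a DAG is exactly the theorem above.

```python
from collections import deque

def topological_sort(parents):
    """One topological sort of a DAG given as {node: (parents...)}."""
    children = {v: [] for v in parents}
    indeg = {v: len(ps) for v, ps in parents.items()}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    queue = deque(v for v, d in indeg.items() if d == 0)  # parentless nodes
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:  # v is placed, so each child loses one blocker
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    if len(order) != len(parents):
        raise ValueError("directed cycle: no topological sort exists")
    return order

order = topological_sort({1: (), 2: (1,), 3: (1,), 4: (2, 3)})
```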
38 A Little Exercise Revisited. Remember? p(x_1, ..., x_N) = p(x_N | x_{N−1}, ..., x_1) p(x_{N−1}, ..., x_1) = p(x_N | x_{N−1}, ..., x_1) p(x_{N−1} | x_{N−2}, ..., x_1) ... p(x_1) = ∏_{i=1}^N p(x_i | x_{<i}), where x_{<i} := {x_j : 1 ≤ j < i}. Now denote by π a permutation of the labels {1, ..., N}; above we have π = identity. So we can write p(x) = ∏_{i=1}^N p(x_{π_i} | x_{<π_i}).
39 Exercises. How many topological sorts has a BN where no CI statements hold? How many topological sorts has a BN where all CI statements hold?
40 Graphical Models Lecture 3 - Bayesian Networks. Tibério Caetano, tiberiocaetano.com. Statistical Machine Learning Group, NICTA Canberra. LLSS Canberra 2009.
41 Some Key Elements of Lecture 2. We can always write p(x) = ∏_i p(x_i | x_{<i}). Create a DAG with arrow j → i whenever j ∈ <i. Impose CI statements by removing some arrows. The result will be p(x) = ∏_i p(x_i | x_{a_i}). Now there will be permutations π other than the identity such that p(x) = ∏_i p(x_{π_i} | x_{a_{π_i}}), i.e. other orderings that still list each node after all of its parents.
42 Refreshing Exercise. Exercise: prove that the factorized form for the probability distribution of a Bayesian Network is indeed normalized to 1.
43 Hidden CI Statements? We have obtained a BN by introducing very convenient CI statements, namely those that shrink the factors of the expansion p(x) = ∏_{i=1}^N p(x_{π_i} | x_{<π_i}). By doing so, have we induced other CI statements? The answer is YES.
44 Head-to-Tail Nodes: Independence. [Graph: a → c → b.] Are a and b independent, i.e. does a ⊥ b hold? Check whether p(a, b) = p(a) p(b): p(a, b) = Σ_c p(a, b, c) = Σ_c p(a) p(c | a) p(b | c) = p(a) p(b | a), which in general ≠ p(a) p(b).
45 Head-to-Tail Nodes: Cond. Indep. Factorization ⇒ CI? [Graph: a → c → b.] Does p(a, b, c) = p(a) p(c | a) p(b | c) imply a ⊥ b | c? Assume p(a, b, c) = p(a) p(c | a) p(b | c) holds. Then p(a, b | c) = p(a, b, c) / p(c) = p(a) p(c | a) p(b | c) / p(c) = p(c) p(a | c) p(b | c) / p(c) = p(a | c) p(b | c). Yes.
46 Head-to-Tail Nodes: Cond. Indep. CI ⇒ Factorization? [Graph: a → c → b.] Does a ⊥ b | c imply p(a, b, c) = p(a) p(c | a) p(b | c)? Assume a ⊥ b | c, i.e. p(a, b | c) = p(a | c) p(b | c). Then p(a, b, c) = p(a, b | c) p(c) = p(a | c) p(b | c) p(c), which by Bayes = p(a) p(c | a) p(b | c). Yes.
47 Tail-to-Tail Nodes: Independence. [Graph: a ← c → b.] Does a ⊥ b hold? Check whether p(a, b) = p(a) p(b): p(a, b) = Σ_c p(a, b, c) = Σ_c p(c) p(a | c) p(b | c), which in general ≠ p(a) p(b).
48 Tail-to-Tail Nodes: Cond. Indep. Factorization ⇒ CI? [Graph: a ← c → b.] Does p(a, b, c) = p(c) p(a | c) p(b | c) imply a ⊥ b | c? Assume p(a, b, c) = p(c) p(a | c) p(b | c). Then p(a, b | c) = p(a, b, c) / p(c) = p(c) p(a | c) p(b | c) / p(c) = p(a | c) p(b | c). Yes.
49 Tail-to-Tail Nodes: Cond. Indep. CI ⇒ Factorization? [Graph: a ← c → b.] Does a ⊥ b | c imply p(a, b, c) = p(c) p(a | c) p(b | c)? Assume a ⊥ b | c holds, i.e. p(a, b | c) = p(a | c) p(b | c). Then p(a, b, c) = p(a, b | c) p(c) = p(a | c) p(b | c) p(c). Yes.
50 Head-to-Head Nodes: Independence. [Graph: a → c ← b.] Does a ⊥ b hold? Check whether p(a, b) = p(a) p(b): p(a, b) = Σ_c p(a, b, c) = Σ_c p(a) p(b) p(c | a, b) = p(a) p(b). So a ⊥ b does hold.
51 Head-to-Head Nodes: Cond. Indep. Factorization ⇒ CI? [Graph: a → c ← b.] Does p(a, b, c) = p(a) p(b) p(c | a, b) imply a ⊥ b | c? Assume p(a, b, c) = p(a) p(b) p(c | a, b) holds. Then p(a, b | c) = p(a, b, c) / p(c) = p(a) p(b) p(c | a, b) / p(c), which in general ≠ p(a | c) p(b | c).
52 CI ⇔ Factorization in 3-Node BNs. Therefore we conclude that conditional independence and factorization are equivalent for the atomic Bayesian Networks with only 3 nodes. Question: are they equivalent for any Bayesian Network? To answer, we need to characterize which conditional independence statements hold for an arbitrary factorization, and check whether a distribution that satisfies those statements will have such factorization.
53 Blocked Paths. We start by defining a blocked path, which is one containing: an observed TT (tail-to-tail) or HT (head-to-tail) node, or an HH (head-to-head) node which is not observed and none of whose descendants is observed. [Figure: two example graphs on nodes a, b, c, e, f illustrating blocked and unblocked paths.]
54 D-Separation. A set of nodes A is said to be d-separated from a set of nodes B by a set of nodes C if every path from A to B is blocked when C is in the conditioning set. [Figure: the DAG on X_1, ..., X_6.] Exercise: is X_1 d-separated from X_6 when the conditioning set is {X, X_5}?
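D-separation can be checked mechanically from the blocking rules on the previous slide. This is a brute-force sketch for small DAGs (names are mine; for large graphs one would use something like the Bayes-ball algorithm instead): enumerate every simple undirected path and test each intermediate node against the three cases.

```python
def d_separated(edges, a, b, given):
    """True iff node a is d-separated from node b by the node set `given`
    in the DAG whose directed edges (parent, child) are listed in `edges`."""
    children, nbrs = {}, {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    def descendants(v):
        seen, stack = set(), [v]
        while stack:
            for c in children.get(stack.pop(), ()):
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def paths(path):  # all simple undirected paths from a to b
        if path[-1] == b:
            yield path
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from paths(path + [n])

    for path in paths([a]):
        blocked = False
        for prev, v, nxt in zip(path, path[1:], path[2:]):
            head_to_head = (v in children.get(prev, ())) and (v in children.get(nxt, ()))
            if head_to_head:
                # HH blocks unless v or one of its descendants is observed.
                if v not in given and not (descendants(v) & set(given)):
                    blocked = True
                    break
            elif v in given:  # observed HT or TT node blocks the path
                blocked = True
                break
        if not blocked:
            return False  # found an unblocked path
    return True

# Head-to-tail chain a -> c -> b: observing c blocks the only path.
blocked = d_separated([("a", "c"), ("c", "b")], "a", "b", {"c"})
```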
55 CI ⇔ Factorization for BNs. Theorem (Factorization ⇒ CI): if a probability distribution factorizes according to a directed acyclic graph, and if A, B and C are disjoint subsets of nodes such that A is d-separated from B by C in the graph, then the distribution satisfies A ⊥ B | C. Theorem (CI ⇒ Factorization): if a probability distribution satisfies the conditional independence statements implied by d-separation over a particular directed graph, then it also factorizes according to the graph.
56 Factorization ⇒ CI for BNs. Proof strategy: DF ⇒ d-sep. d-sep: d-separation property. DF: directed factorization property.
57 CI ⇒ Factorization for BNs. Proof strategy: d-sep ⇒ DL ⇒ DF. DL: directed local Markov property: α ⊥ nd(α) | a(α), i.e. each node is independent of its non-descendants given its parents. Thus we obtain DF ⇔ d-sep ⇔ DL.
58 Relevance of CI ⇒ Factorization for BNs. Has local, wants global: CI statements are usually what is known by the expert; the expert needs the model p(x) in order to compute things; the CI ⇒ Factorization part gives p(x) from what is known (CI statements).
59 Graphical Models Lecture 4 - Markov Random Fields. Tibério Caetano, tiberiocaetano.com. Statistical Machine Learning Group, NICTA Canberra. LLSS Canberra 2009.
60 Changing the Class of CI Statements. We obtained BNs by assuming p(x_{π_i} | x_{<π_i}) = p(x_{π_i} | x_{a_{π_i}}) ∀i, where a_{π_i} ⊆ <π_i. We saw that in general such types of CI statements would produce others, and that in general all CI statements can be read as d-separation in a DAG. However, there are sets of CI statements which cannot be satisfied by any BN.
61 Markov Random Fields. Ideally we would like to have more freedom. There is another class of Graphical Models called Markov Random Fields (MRFs). MRFs allow for the specification of a different class of CI statements. The class of CI statements for MRFs can be easily defined by graphical means, in undirected graphs.
62 Graph Separation. Definition of graph separation: in an undirected graph G, with A, B and C disjoint subsets of nodes, if every path from A to B includes at least one node from C, then C is said to separate A from B in G. [Figure: node sets A, C, B, with C separating A from B.]
63 Markov Random Fields. Definition of Markov Random Field: an MRF is a set of probability distributions {p(x) : p(x) > 0} such that there exists an undirected graph G with disjoint subsets of nodes A, B, C in which, whenever C separates A from B in G, A ⊥ B | C holds in p(x). Colloquially we say that the MRF is such undirected graph. But in reality it is the set of all probability distributions whose conditional independence statements are precisely those given by graph separation in the graph.
64 Cliques and Maximal Cliques. Definitions concerning undirected graphs: a clique of a graph is a complete subgraph of it, i.e. a subgraph where every pair of nodes is connected by an edge. A maximal clique of a graph is a clique which is not a proper subset of another clique. [Figure: undirected graph on X_1, ..., X_4.] {X_1, X_2} form a clique, and {X_2, X_3, X_4} a maximal clique.
65 Factorization Property. Definition of factorization w.r.t. an undirected graph: a probability distribution is said to factorize with respect to a given undirected graph if it can be written as p(x) = (1/Z) ∏_{c ∈ C} ψ_c(x_c), where C is the set of maximal cliques, c is a maximal clique, x_c is the domain of x restricted to c, and ψ_c(x_c) is an arbitrary non-negative real-valued function. Z ensures Σ_x p(x) = 1.
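A sketch of this definition on a tiny pairwise MRF (the potential tables are invented for illustration; here the maximal cliques are just the two edges): unnormalized clique products, the partition function Z, and the resulting normalized p(x).

```python
import itertools

# Pairwise MRF on the chain x1 - x2 - x3 (binary variables); the maximal
# cliques are the two edges, each with a non-negative potential table.
psi = {
    (0, 1): [[1.0, 0.5], [0.5, 2.0]],   # psi(x1, x2)
    (1, 2): [[2.0, 1.0], [1.0, 2.0]],   # psi(x2, x3)
}

def unnormalized(x):
    prod = 1.0
    for (i, j), table in psi.items():
        prod *= table[x[i]][x[j]]
    return prod

# Z couples all factors: it sums the product over every joint assignment.
Z = sum(unnormalized(x) for x in itertools.product((0, 1), repeat=3))

def p(x):
    return unnormalized(x) / Z
```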
66 CI ⇔ Factorization for Positive MRFs. Theorem (Factorization ⇒ CI): if a probability distribution factorizes according to an undirected graph, and if A, B and C are disjoint subsets of nodes such that C separates A from B in the graph, then the distribution satisfies A ⊥ B | C. Theorem (CI ⇒ Factorization, Hammersley-Clifford): if a strictly positive probability distribution (p(x) > 0) satisfies the conditional independence statements implied by graph separation over a particular undirected graph, then it also factorizes according to the graph.
67 Factorization ⇒ CI for MRFs. Proof: ...
68 CI ⇒ Factorization for +MRFs (H-C Thm). Möbius inversion: for C ⊆ B ⊆ A ⊆ S and F : P(S) → R, F(A) = Σ_{B : B ⊆ A} Σ_{C : C ⊆ B} (−1)^{|B \ C|} F(C). Define F = φ = log p and compute the inner sum φ_B := Σ_{C ⊆ B} (−1)^{|B \ C|} φ(x_C) for the case where B is not a clique, i.e. X_i, X_j ∈ B not connected. Then the CI statement φ(x_i, x_C, x_j) + φ(x_C) = φ(x_C, x_i) + φ(x_C, x_j) holds, and splitting the sum over the subsets C ⊆ B with i, j ∉ C into the four groups (neither i nor j, i only, j only, both), the terms pair up as Σ_{C ⊆ B; i, j ∉ C} (−1)^{|B \ C|} [φ(x_i, x_C, x_j) + φ(x_C) − φ(x_C, x_i) − φ(x_C, x_j)] = 0. So φ_B = 0 whenever B is not a clique, and log p(x) = Σ over cliques B of φ_B(x_B), i.e. p factorizes according to the graph.
69 Relevance of CI ⇒ Factorization for MRFs. Relevance is analogous to the BN case: CI statements are usually what is known by the expert; the expert needs the model p(x) in order to compute things; the CI ⇒ Factorization part gives p(x) from what is known (CI statements).
70 Comparison BNs vs. MRFs. In both types of Graphical Models: a relationship between the CI statements satisfied by a distribution and the associated simplified algebraic structure of the distribution is made in terms of graphical objects. The CI statements are related to concepts of separation between variables in the graph. The simplified algebraic structure (factorization of p(x), in this case) is related to local pieces of the graph: a child + its parents in BNs, cliques in MRFs.
71 Comparison BNs vs. MRFs. Differences: the set of probability distributions that can be represented as MRFs is different from the set that can be represented as BNs. Although both MRFs and BNs are expressed as a factorization of local functions on the graph, the MRF has a normalization constant Z = Σ_x ∏_{c ∈ C} ψ_c(x_c) that couples all factors, whereas the BN has not. The local pieces of the BN are probability distributions themselves, whereas in MRFs they need only be non-negative functions, i.e. they may not have range [0, 1] as probabilities do.
72 Comparison BNs vs. MRFs. Exercises: when are the CI statements of a BN and an MRF precisely the same? A graph has 3 nodes A, B and C. We know that A ⊥ B holds, but C ⊥ A and C ⊥ B both do not hold. Can this represent a BN? An MRF?
73 I-Maps, D-Maps and P-Maps. A graph is said to be a D-map (for "dependence map") of a distribution if every conditional independence statement satisfied by the distribution is reflected in the graph. A graph is said to be an I-map (for "independence map") of a distribution if every conditional independence statement implied by the graph is satisfied in the distribution. A graph is said to be a P-map (for "perfect map") of a distribution if it is both a D-map and an I-map for the distribution.
74 I-Maps, D-Maps and P-Maps. [Figure: Venn diagram with D and U as overlapping subsets of P.] D: set of distributions on n variables that can be represented as a perfect map by a DAG. U: set of distributions on n variables that can be represented as a perfect map by an undirected graph. P: set of all distributions on n variables.
75 Markov Blankets. The Markov blanket of a node X_i, in either a BN or an MRF, is the smallest set of nodes A such that p(x_i | x_ĩ) = p(x_i | x_A), where x_ĩ denotes all variables other than x_i. BN: parents, children and co-parents of the node. MRF: neighbors of the node.
76 Exercises. Show that the Markov blanket of a node i in a BN is given by its children, parents and co-parents. Show that the Markov blanket of a node i in an MRF is given by its neighbors.
77 Graphical Models Lecture 5 - Inference. Tibério Caetano, tiberiocaetano.com. Statistical Machine Learning Group, NICTA Canberra. LLSS Canberra 2009.
78 Factorized Distributions. Our p(x) has a factorized form: for BNs we have p(x) = ∏_i p(x_i | x_{a_i}); for MRFs we have p(x) = (1/Z) ∏_{c ∈ C} ψ_c(x_c). Will this enable us to answer the relevant questions in practice?
79 Key Concept: Distributive Law. ab + ac (3 operations) = a(b + c) (2 operations). I.e., if the same constant factor (a here) is present in every term, we can gain by pulling it out. Consider computing the marginal p(x_1) for the MRF with factorization p(x) = (1/Z) ∏_{i=1}^{N−1} ψ(x_i, x_{i+1}). Exercise: which graph is this? p(x_1) = Σ_{x_2, ..., x_N} (1/Z) ∏_{i=1}^{N−1} ψ(x_i, x_{i+1}) = (1/Z) Σ_{x_2} ψ(x_1, x_2) Σ_{x_3} ψ(x_2, x_3) ... Σ_{x_N} ψ(x_{N−1}, x_N): O(∏_{i=1}^N |X_i|) vs. O(Σ_{i=1}^{N−1} |X_i| × |X_{i+1}|).
80 Elimination Algorithm. The distributive law (DL) is the key to efficient inference in GMs. The simplest algorithm using the DL is the elimination algorithm. This algorithm is appropriate when we have a single query, just like in the previous example of computing p(x_1) in a given MRF. This algorithm can be seen as successive elimination of nodes in the graph.
81 Elimination Algorithm. [Figure: successive elimination of nodes in the graph on X_1, ..., X_6.] Compute p(x_1) with an elimination order over x_6, x_5, x_4, x_3, x_2: p(x_1) = (1/Z) Σ_{x_2} Σ_{x_3} Σ_{x_4} Σ_{x_5} Σ_{x_6} ∏_c ψ_c(x_c); pushing each sum as far right as possible produces intermediate factors ("messages") m_6, m_5, m_4, m_3, m_2, each over the neighbors of the eliminated node, until only a function of x_1 remains.
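The elimination idea above can be sketched generically: keep a list of factors, and to eliminate a variable, multiply together the factors that mention it and sum it out, leaving a new intermediate factor (a "message"). This is an illustrative implementation (names and numbers mine), checked here on the chain p(x1) p(x2|x1) p(x3|x2).

```python
import itertools

def eliminate(factors, order, domain=(0, 1)):
    """Sum-product elimination. A factor is (vars, table), where `table`
    maps assignments of `vars` (a tuple of variable names) to values.
    Eliminating v multiplies all factors containing v and sums v out."""
    factors = list(factors)
    for v in order:
        involved = [f for f in factors if v in f[0]]
        factors = [f for f in factors if v not in f[0]]
        new_vars = tuple(sorted({u for vs, _ in involved for u in vs} - {v}))
        table = {}
        for assign in itertools.product(domain, repeat=len(new_vars)):
            ctx = dict(zip(new_vars, assign))
            total = 0.0
            for val in domain:
                ctx[v] = val
                prod = 1.0
                for vs, t in involved:
                    prod *= t[tuple(ctx[u] for u in vs)]
                total += prod
            table[assign] = total       # the message produced by removing v
        factors.append((new_vars, table))
    return factors

# Chain p(x1) p(x2|x1) p(x3|x2); query p(x3) by eliminating 1, then 2.
factors = [
    ((1,), {(0,): 0.4, (1,): 0.6}),
    ((1, 2), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}),
    ((2, 3), {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.25, (1, 1): 0.75}),
]
(remaining_vars, m3), = eliminate(factors, order=[1, 2])
p_x3 = {v: m3[(v,)] for v in (0, 1)}
```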
82 Belief Propagation. The belief propagation algorithm, also called probability propagation or the sum-product algorithm: does not repeat computations, and is specifically targeted at tree-structured graphs.
83 Belief Propagation in a Chain. [Figure: chain x_1, ..., x_{n−1}, x_n, x_{n+1}, ..., x_N with forward messages µ_α and backward messages µ_β.] p(x_n) = Σ_{x_{<n}} Σ_{x_{>n}} p(x) = (1/Z) [Σ_{x_{<n}} ∏_{i=1}^{n−1} ψ(x_i, x_{i+1})] [Σ_{x_{>n}} ∏_{i=n}^{N−1} ψ(x_i, x_{i+1})] = (1/Z) µ_α(x_n) µ_β(x_n), where µ_α(x_n) costs O(Σ_{i=1}^{n−1} |X_i| × |X_{i+1}|) and µ_β(x_n) costs O(Σ_{i=n}^{N−1} |X_i| × |X_{i+1}|).
84 Belief Propagation in a Chain. So, in order to compute p(x_n), we only need the incoming messages to n. But n is arbitrary, so in order to answer an arbitrary query we need an arbitrary pair of incoming messages; so we need all messages. To compute a message to the right (left), we need all previous messages coming from the left (right). So the protocol should be: start from the leaves, go up to n, then go back towards the leaves. Chain with N nodes: 2(N − 1) messages to be computed.
85 Computing Messages. Define trivial messages m_{0→1} := 1 and m_{N+1→N} := 1. For i = 2 to N, compute m_{i−1→i}(x_i) = Σ_{x_{i−1}} ψ(x_{i−1}, x_i) m_{i−2→i−1}(x_{i−1}). For i = N − 1 back to 1, compute m_{i+1→i}(x_i) = Σ_{x_{i+1}} ψ(x_i, x_{i+1}) m_{i+2→i+1}(x_{i+1}).
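The chain recursions can be sketched with pairwise potentials (potentials random, names mine) and checked against brute-force marginalization of the full joint:

```python
import itertools
import random

random.seed(2)
N, K = 5, 3  # chain length and number of states per variable
# psi[i][a][b] couples x_i = a and x_{i+1} = b, for i = 0..N-2.
psi = [[[random.random() + 0.1 for _ in range(K)] for _ in range(K)]
       for _ in range(N - 1)]

# Forward messages alpha[i] ~ mu_alpha(x_i), backward beta[i] ~ mu_beta(x_i).
alpha = [[1.0] * K for _ in range(N)]
for i in range(1, N):
    alpha[i] = [sum(alpha[i - 1][a] * psi[i - 1][a][b] for a in range(K))
                for b in range(K)]
beta = [[1.0] * K for _ in range(N)]
for i in range(N - 2, -1, -1):
    beta[i] = [sum(beta[i + 1][b] * psi[i][a][b] for b in range(K))
               for a in range(K)]

def marginal(i):
    """p(x_i) proportional to mu_alpha(x_i) * mu_beta(x_i)."""
    unnorm = [alpha[i][v] * beta[i][v] for v in range(K)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def brute_marginal(i):
    """Same marginal by summing the full joint, for checking."""
    scores = [0.0] * K
    for x in itertools.product(range(K), repeat=N):
        w = 1.0
        for j in range(N - 1):
            w *= psi[j][x[j]][x[j + 1]]
        scores[x[i]] += w
    z = sum(scores)
    return [s / z for s in scores]
```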
86 Belief Propagation in a Tree. The reason why things are so nice in the chain is that every node can be seen as a leaf after it has received the message from one side, i.e. after the nodes from which the message comes have been eliminated. The original leaves give us the right place to start the computations, and from there the adjacent nodes become leaves as well. However, this property also holds in a tree.
87 Belief Propagation in a Tree. Message passing equation: m_{j→i}(x_i) = Σ_{x_j} ψ(x_j, x_i) ∏_{k : k∼j, k≠i} m_{k→j}(x_j), where ∏_{k : k∼j, k≠i} m_{k→j}(x_j) := 1 whenever j is a leaf. Computing marginals: p(x_i) ∝ ∏_{j : j∼i} m_{j→i}(x_i).
88 Max-Product Algorithm. There are important queries other than computing marginals. For example, we may want to compute the most likely assignment x* = argmax_x p(x), as well as its probability p(x*). One possibility would be to compute p(x_i) for all i, then x̃_i = argmax_{x_i} p(x_i), and then simply set x̃ = (x̃_1, ..., x̃_N). What's the problem with this?
89 Exercises. Exercise: construct p(x_1, x_2) with x_i ∈ {0, 1} such that p(x̃_1, x̃_2) = 0, where x̃_i = argmax_{x_i} p(x_i).
90 Max-Product and Max-Sum Algorithms. Instead, we need to compute directly x* = argmax over (x_1, ..., x_N) of p(x_1, ..., x_N). We can use the distributive law again, since max(ab, ac) = a max(b, c) for a > 0. Exactly the same algorithm applies with max instead of Σ: the max-product algorithm. To avoid underflow we compute via log: argmax_x p(x) = argmax_x log p(x) = argmax_x Σ_s log f_s(x_s), since log is a monotonic function. We can still use the distributive law since (max, +) is also a commutative semiring, i.e. max(a + b, a + c) = a + max(b, c).
91 A Detail in Max-Sum. After computing the max-marginal for the root x_i and its maximizer x*_i = argmax_{x_i}, it's not a good idea to simply pass back the messages to the leaves and then terminate. Why? (There may be ties: several configurations can attain the maximum, and mixing maximizers obtained independently at different nodes need not yield a global maximizer.) In such cases it is safer to store the maximizing configurations of previous variables with respect to the next variables, and then simply backtrack to restore the maximizing path. In the particular case of a chain this is called the Viterbi algorithm, an instance of dynamic programming.
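A sketch of max-sum with backpointers on a chain (the Viterbi recursion; potentials random, names mine), which recovers a maximizing path by backtracking rather than by combining per-node maximizers:

```python
import itertools
import math
import random

random.seed(3)
N, K = 6, 3
psi = [[[random.random() + 0.1 for _ in range(K)] for _ in range(K)]
       for _ in range(N - 1)]

# Max-sum with backpointers: score[b] is the best log-weight of any
# partial path ending with the current node in state b.
score = [0.0] * K
back = []
for j in range(N - 1):
    new, bp = [], []
    for b in range(K):
        best_a = max(range(K), key=lambda a: score[a] + math.log(psi[j][a][b]))
        new.append(score[best_a] + math.log(psi[j][best_a][b]))
        bp.append(best_a)  # remember the maximizing predecessor state
    score, back = new, back + [bp]

# Backtrack from the best final state to restore the maximizing path.
path = [max(range(K), key=lambda b: score[b])]
for bp in reversed(back):
    path.append(bp[path[-1]])
path.reverse()

def weight(x):
    w = 1.0
    for j in range(N - 1):
        w *= psi[j][x[j]][x[j + 1]]
    return w
```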
92 Arbitrary Graphs. [Figure: the graph on X_1, ..., X_6.] The elimination algorithm is needed to compute marginals.
93 A Problem with the Elimination Algorithm. Elimination example: how to compute p(x_1 | x_6)? [Figure: the graph on X_1, ..., X_6.]
94 A Problem with the Elimination Algorithm. [Derivation: attaching an evidence potential δ(x_6, x̄_6) to x_6 and eliminating the remaining nodes in turn produces intermediate messages m_i(·) and the unnormalized p(x_1, x̄_6).]
95 A Problem with the Elimination Algorithm. [Derivation: normalizing, p(x_1 | x̄_6) = p(x_1, x̄_6) / Σ_{x_1} p(x_1, x̄_6) = m(x_1) / Σ_{x_1} m(x_1).]
96 A Problem with the Elimination Algorithm. Elimination example: what if now we want to compute a different query, conditioning again on x_6? [Figure: the graph on X_1, ..., X_6.]
97 A Problem with the Elimination Algorithm. [Derivation: the elimination steps are carried out again for the new query, producing intermediate messages m_i as before.]
98 A Problem with the Elimination Algorithm. [Derivation: placing the two runs side by side, most intermediate messages coincide.] Repeated computations!! How to avoid that?
99 The Junction Tree Algorithm. The junction tree algorithm is a generalization of the belief propagation algorithm for arbitrary graphs. In theory it can be applied to any graph, DAG or undirected. However, it will be efficient only for certain classes of graphs.
100 Chordal Graphs. Chordal graphs are also called triangulated graphs; the JT algorithm runs on chordal graphs. A chord in a cycle is an edge connecting two nodes in the cycle but which does not belong to the cycle, i.e. a shortcut in the cycle. A graph is chordal if every cycle of length greater than 3 has a chord. The first step in the algorithm is to triangulate the graph by proper insertion of edges. [Figure: two graphs on nodes A, B, C, D, E, F; one not chordal, one chordal.]
101 Chordal Graphs. What if a graph is not chordal? Add edges until it becomes chordal. This will change the graph. Exercise: why is this not a problem?
102 Triangulation Step. Triangulate the graph if it's not triangulated. [Figure: triangulation of the graph on X_1, ..., X_6.]
103 Junction Tree Construction. The junction tree algorithm: create a junction tree. [Figure: cliques of the triangulated graph on X_1, ..., X_6 arranged in a tree, with separators between adjacent cliques.]
104 Initialization. The junction tree algorithm: initialize clique potentials and separator potentials. Clique potentials Ψ_c are directly introduced (from the model factors); separator potentials Φ_s are initialized to ones. [Figure: junction tree for X_1, ..., X_6 with clique potentials Ψ and separator potentials Φ.]
105 Propagation. The junction tree algorithm: message passing. To pass a message from clique c to an adjacent clique d through their separator s: Φ*_s = Σ over the variables of c not in s of Ψ_c, then Ψ*_d = Ψ_d Φ*_s / Φ_s. Messages are passed inward to a root clique and then back out towards the leaves; after propagation the clique and separator potentials are consistent marginals. [Figure: message passing on the junction tree for X_1, ..., X_6.]
106 Graphical Models Lecture 6 - Learning. Tibério Caetano, tiberiocaetano.com. Statistical Machine Learning Group, NICTA Canberra. LLSS Canberra 2009.
107 Learning. We saw that, given p(x; θ), probabilistic inference consists of computing marginals of p(x; θ), conditional distributions, MAP configurations, etc. However, what is p(x; θ) in the first place? Finding p(x; θ) from data is called learning, or estimation, or statistical inference.
108 Maximum Likelihood Estimation. In the case of Graphical Models, we've seen that p(x; θ) = (1/Z) ∏_{s ∈ S} f_s(x_s; θ_s), where {x_s} are subsets of random variables and {f_s} are non-negative real-valued functions. We can re-write that as p(x; θ) = e^(Σ_{s ∈ S} log f_s(x_s; θ_s) − g(θ)), where g(θ) = log Σ_x e^(Σ_{s ∈ S} log f_s(x_s; θ_s)).
109 Data: IID Assumption. We observe data X = {x^1, ..., x^m}. We assume every x^i is a sample from the same unknown distribution p(x; θ*) (the "identical" assumption). We assume x^i and x^j, i ≠ j, to be drawn independently from p(x; θ*) (the "independence" assumption). This is the iid setting: independently and identically distributed.
110 Maximum Likelihood Estimation. The joint probability of observing the data X is thus
p(X; θ) = ∏_i p(x^i; θ) = ∏_i exp( Σ_s log f_s(x^i_s; θ_s) − g(θ) )
Seen as a function of θ, this is the likelihood function. The negative log-likelihood is
−log p(X; θ) = m g(θ) − Σ_{i=1}^m Σ_s log f_s(x^i_s; θ_s)
111 Maximum Likelihood Estimation. Maximum likelihood estimation consists of finding the θ that maximizes the likelihood function, or equivalently minimizes the negative log-likelihood:
θ* = argmin_θ [ m g(θ) − Σ_{i=1}^m Σ_s log f_s(x^i_s; θ_s) ] =: l(θ; X)
In order to minimize it we must have ∇_θ l(θ; X) = 0, and therefore each ∂_{θ_s} l(θ; X) = 0, for all s. What happens for BNs and for MRFs?
112 ML Estimation in BNs. For BNs, g(θ) = 0 (Exercise) and Σ_{x_s} f_s(x_s; θ_s) = 1, so setting the gradient of the Lagrangian to zero,
∂_{θ_s} [ m g(θ) − Σ_{i=1}^m Σ_s log f_s(x^i_s; θ_s) + Σ_s λ_s ( Σ_{x_s} f_s(x_s; θ_s) − 1 ) ] = 0
gives, for each s,
Σ_{i=1}^m ∂_{θ_s} log f_s(x^i_s; θ_s) = λ_s Σ_{x_s} ∂_{θ_s} f_s(x_s; θ_s)
Therefore we have |S| pairs of equations, where every pair can be solved independently for θ_s and λ_s. So the ML estimation problem decouples into local ML estimation problems involving only the variables in each individual set s.
113 ML Estimation in MRFs. For MRFs, g(θ) ≠ 0, so
∂_{θ_s} [ m g(θ) − Σ_{i=1}^m Σ_s log f_s(x^i_s; θ_s) ] = 0
m ∂_{θ_s} g(θ) − Σ_{i=1}^m ∂_{θ_s} log f_s(x^i_s; θ_s) = 0, for all s
Therefore we have |S| equations that cannot be solved independently, since g(θ) involves all θ_s. This may give rise to a complex non-linear system of equations. So learning in MRFs is more difficult than in BNs.
114 Exponential Families. Consider the parameterized family of distributions
p(x; θ) = exp( Φ(x)·θ − g(θ) )
Such a family of distributions is called an Exponential Family. Φ(x) is the sufficient statistics, θ is the natural parameter, and g(θ) = log Σ_x exp( Φ(x)·θ ) is the log-partition function. This is the form of several distributions of interest, like the Gaussian, binomial, multinomial, Poisson, gamma, Rayleigh, beta, etc.
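To make the definition concrete, a short sketch (illustrative, not from the slides) writes the Bernoulli distribution in natural form with sufficient statistic Φ(x) = x and log-partition g(θ) = log(1 + e^θ), and checks numerically that g′(θ) equals the mean of the sufficient statistic, the property used on the following slides.

```python
import numpy as np

# Bernoulli in exponential-family form: p(x; theta) = exp(x*theta - g(theta)),
# x in {0, 1}, with g(theta) = log(1 + e^theta)
theta = 0.7
g = np.log(1.0 + np.exp(theta))
p = np.exp(np.array([0.0, 1.0]) * theta - g)   # [p(x=0), p(x=1)]

# Central finite difference for g'(theta)
eps = 1e-6
g_prime = (np.log(1 + np.exp(theta + eps))
           - np.log(1 + np.exp(theta - eps))) / (2 * eps)

# E[Phi(x)] = 0*p(0) + 1*p(1) = p(x=1)
mean = p[1]
```

The match g′(θ) = E[Φ(x)] is not a coincidence of the Bernoulli: it holds for every exponential family, which is what makes the moment-matching conditions on the next slides work.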
115 Exponential Families. If we assume that our p(x; θ) is an exponential family, the learning problem becomes particularly convenient because it becomes convex (Why?). Recall the form of p(x; θ) for a graphical model:
p(x; θ) = exp( Σ_{s∈S} log f_s(x_s; θ_s) − g(θ) )
For it to be an exponential family we need
Σ_{s∈S} log f_s(x_s; θ_s) = Φ(x)·θ = Σ_s Φ_s(x_s)·θ_s
116 Exponential Families for MRFs. For MRFs the negative log-likelihood now becomes
−log p(X; θ) = m g(θ) − Σ_{i=1}^m Σ_s Φ_s(x^i_s)·θ_s = m g(θ) − m Σ_s μ_s(x_s)·θ_s
where we defined μ_s(x_s) := ( Σ_{i=1}^m Φ_s(x^i_s) ) / m. Taking the gradient and setting it to zero we have
∂_{θ_s} [ m g(θ) − m Σ_s μ_s(x_s)·θ_s ] = 0
∂_{θ_s} g(θ) = μ_s(x_s)
but ∂_{θ_s} g(θ) = E_{p(x;θ)}[Φ_s(x_s)] (why?), so
E_{p(x;θ)}[Φ_s(x_s)] = μ_s(x_s)
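The stationarity condition above suggests a simple learning procedure: ascend the log-likelihood, whose gradient with respect to θ_s is μ_s − E_{p(x;θ)}[Φ_s]. The sketch below (an illustration under my own parameterization, not the lecture's code: indicator sufficient statistics, one natural parameter per clique configuration) fits a small two-clique binary MRF this way and checks that the model moments converge to the empirical ones.

```python
import numpy as np
from itertools import product

states = list(product([0, 1], repeat=3))
rng = np.random.default_rng(1)
data = rng.integers(0, 2, size=(200, 3))   # synthetic observations (x1, x2, x3)

# One natural parameter per configuration of each clique {x1,x2} and {x2,x3}
theta = {"12": np.zeros((2, 2)), "23": np.zeros((2, 2))}
mu = {"12": np.zeros((2, 2)), "23": np.zeros((2, 2))}
for x1, x2, x3 in data:                    # empirical moments mu_s
    mu["12"][x1, x2] += 1.0 / len(data)
    mu["23"][x2, x3] += 1.0 / len(data)

def model_moments(theta):
    # Brute-force E_{p(x;theta)}[Phi_s] by enumerating all 8 states
    logp = np.array([theta["12"][a, b] + theta["23"][b, c] for a, b, c in states])
    p = np.exp(logp - logp.max())
    p /= p.sum()
    m = {"12": np.zeros((2, 2)), "23": np.zeros((2, 2))}
    for pk, (a, b, c) in zip(p, states):
        m["12"][a, b] += pk
        m["23"][b, c] += pk
    return m

for _ in range(3000):                      # gradient ascent on the log-likelihood
    m = model_moments(theta)
    for s in theta:
        theta[s] += 0.5 * (mu[s] - m[s])   # gradient is mu_s - E_p[Phi_s]
```

Because the log-likelihood is concave in θ, plain gradient ascent suffices; at convergence the fitted moments match the sample averages, exactly the condition derived on this slide.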
117 Exponential Families for MRFs. In other words: the ML estimate θ* must be such that the expected value of the sufficient statistics under p(x; θ*), for every clique, matches the sample average for that clique. Why is the problem convex? (Exercise)
118 Exponential Families for BNs. For BNs the negative log-likelihood now becomes
−log p(X; θ) = − Σ_{i=1}^m Σ_s Φ_s(x^i_s)·θ_s = −m Σ_s μ_s(x_s)·θ_s
where we also defined μ_s(x_s) := ( Σ_{i=1}^m Φ_s(x^i_s) ) / m. Constructing the Lagrangian corresponding to the constraints Σ_{x_s} exp( Φ_s(x_s)·θ_s ) = 1 and setting the gradient equal to zero, we have
m μ_s(x_s) = λ_s E_{p(x_s; θ_s)}[Φ_s(x_s)], for all s
which can be solved for θ_s and λ_s using Σ_{x_s} exp( Φ_s(x_s)·θ_s ) = 1
119 Example: Discrete BNs. Multinomial random variables; tabular representation. For (x_v, x_pa(v)) define φ_v := (x_v, x_pa(v)). One parameter θ_{v,φ_v} is associated to each (x_v, x_pa(v)), i.e. θ_{v,φ_v} := p(x_v | x_pa(v); θ_v). Note that there are no constraints beyond the normalization constraint. The joint is p(x_V | θ) = ∏_v θ_{v,φ_v}. The likelihood is then log p(X; θ) = log ∏_n p(x_{V,n} | θ). Continuing...
120 Example: Discrete BNs.
log p(X; θ) = log ∏_n p(x_{V,n} | θ)
= log ∏_n ∏_{x_V} p(x_V; θ)^{δ(x_V, x_{V,n})}
= Σ_n Σ_{x_V} δ(x_V, x_{V,n}) log p(x_V; θ)
= Σ_{x_V} m(x_V) log p(x_V; θ)
= Σ_{x_V} m(x_V) log ∏_v θ_{v,φ_v}
= Σ_{x_V} m(x_V) Σ_v log θ_{v,φ_v}
= Σ_v Σ_{φ_v} Σ_{x_{V\φ_v}} m(x_V) log θ_{v,φ_v}
= Σ_v Σ_{φ_v} m(φ_v) log θ_{v,φ_v}
where m(·) denotes the count of a configuration in the data.
121 Example: Discrete BNs. The Lagrangian is
L(θ, λ) = Σ_v Σ_{φ_v} m(φ_v) log θ_{v,φ_v} + Σ_v λ_v ( 1 − Σ_{x_v} θ_{v,φ_v} )
and
∂_{θ_{v,φ_v}} L(θ, λ) = m(φ_v)/θ_{v,φ_v} − λ_v = 0  ⇒  λ_v θ_{v,φ_v} = m(φ_v)
but since Σ_{x_v} θ_{v,φ_v} = 1, we get λ_v = m(x_pa(v)), so
θ_{v,φ_v} = m(φ_v) / m(x_pa(v))
Matches intuition.
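The closed-form estimate above says: count how often each family configuration occurs and divide by the count of the parent configuration. A minimal sketch (illustrative, not from the slides; data and names are made up) for a two-node network x_pa → x_v:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data for a two-node network x_pa -> x_v, both binary;
# columns are (x_pa, x_v)
data = rng.integers(0, 2, size=(500, 2))

# theta_hat(x_v | x_pa) = m(x_v, x_pa) / m(x_pa): family counts divided by
# parent counts
counts = np.zeros((2, 2))             # counts[x_pa, x_v] = m(x_v, x_pa)
for x_pa, x_v in data:
    counts[x_pa, x_v] += 1
theta_hat = counts / counts.sum(axis=1, keepdims=True)
```

Each row of `theta_hat` is a conditional distribution p(x_v | x_pa) and sums to one by construction, which is exactly the normalization constraint enforced by the Lagrange multiplier λ_v.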
122 Learning the Potentials: ML Estimation. First: how to learn the potential functions when we have observed data for all variables in the model? Second: how to learn the potential functions when there are latent (hidden) variables, i.e. we do not observe data for them?
123 Learning the Potentials: Completely Observed GMs. Consider the chain X1 − X2 − X3 with
p(x_V) = (1/Z) Ψ(x1, x2) Ψ(x2, x3)
Assume we observe N instances of this model. For IID sampling the sufficient statistics are the empirical marginals ~p(x1, x2) and ~p(x2, x3). How do we estimate Ψ(x1, x2) and Ψ(x2, x3) from the sufficient statistics?
124 Learning the Potentials: Completely Observed GMs. Let's make a guess:
Ψ̂_ML(x1, x2) = ~p(x1, x2),   Ψ̂_ML(x2, x3) = ~p(x2, x3) / ~p(x2)
so that
p̂_ML(x) = ~p(x1, x2) ~p(x2, x3) / ~p(x2)
We can verify that our guess is good because its marginals match the empirical ones:
Σ_{x3} p̂_ML(x) = ~p(x1, x2),   Σ_{x1} p̂_ML(x) = ~p(x2, x3)
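The verification above can be reproduced numerically. The following sketch (illustrative, with made-up data; not the lecture's code) builds the empirical pair marginals for the chain X1 − X2 − X3, forms p̂_ML = ~p(x1,x2) ~p(x2,x3) / ~p(x2), and checks that its pair marginals match the empirical ones exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.integers(0, 2, size=(300, 3))   # samples of (x1, x2, x3)

# Empirical marginals: the sufficient statistics for the chain X1 - X2 - X3
p12 = np.zeros((2, 2))
p23 = np.zeros((2, 2))
for x1, x2, x3 in data:
    p12[x1, x2] += 1 / len(data)
    p23[x2, x3] += 1 / len(data)
p2 = p12.sum(axis=0)                       # empirical marginal of x2

# ML guess: Psi(x1,x2) = ~p(x1,x2), Psi(x2,x3) = ~p(x2,x3)/~p(x2), so Z = 1
p_hat = p12[:, :, None] * (p23 / p2[:, None])[None, :, :]
```

Note that p̂_ML is already normalized (Z = 1), which is one of the conveniences of the decomposable case.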
125 Learning the Potentials: Completely Observed GMs. The general recipe is: for every maximal clique C, set the clique potential to its empirical marginal; for every intersection S between maximal cliques, associate an empirical marginal with that intersection and divide it into the potential of ONE of the cliques that form the intersection. This will give ML estimates for decomposable graphical models.
126 Decomposable Graphs. A cycle is chordless if vertices that are not adjacent in the cycle are not neighbors, that is, any {v_a, v_b} with non-consecutive a, b is not in E: there is no chord (or shortcut) for the cycle. A graph is chordal (also called triangulated) if it contains no chordless cycles of length greater than 3. A graph is complete if E contains all pairs of distinct elements of V. A graph G = (V, E) is decomposable if either (1) G is complete, or (2) we can express V as V = A ∪ B ∪ C where (a) A, B and C are disjoint, (b) A and C are non-empty, (c) B is complete, (d) B separates A and C in G, and (e) A ∪ B and B ∪ C are decomposable. A path is a sequence v_1, ..., v_n of distinct vertices for which all {v_i, v_{i+1}} ∈ E. A tree is an undirected graph for which every pair of vertices is connected by precisely one path. A clique of a graph G is a subset of vertices that are completely connected. A maximal clique of G is a clique for which every superset of vertices of G is not a clique. A clique tree for a graph G = (V, E) is a tree T = (V_T, E_T) where V_T is a set of cliques of G that contains all maximal cliques of G. For decomposable graphs, the derivative of the log-partition function g(θ) decouples over the cliques (Exercise), which makes MRF learning easy.
127 Learning the Potentials: Completely Observed GMs. Non-decomposable graphical models: an iterative procedure must be used, Iterative Proportional Fitting (IPF):
Ψ^{(t+1)}_C(x_C) = Ψ^{(t)}_C(x_C) ~p(x_C) / p^{(t)}(x_C)
where it can be shown that
p^{(t+1)}(x_C) = ~p(x_C)
128 EM Algorithm: Hidden Variables. How to estimate the potentials when there are unobserved variables? Answer: the EM algorithm.
129 EM Algorithm: Hidden Variables. Denote the observed variables by X and the hidden variables by Z. If we knew Z, the problem would reduce to maximizing the complete log-likelihood:
l_c(θ; x, z) = log p(x, z | θ)
130 EM Algorithm: Hidden Variables. However, we don't observe Z, so the probability of the data X is
l(θ; x) = log p(x | θ) = log Σ_z p(x, z | θ)
which is the incomplete log-likelihood. This is the quantity we really want to maximize. Note that now the logarithm cannot transform the product into a sum, since it is blocked by the sum over Z, and the optimization does not decouple.
131 EM Algorithm: Hidden Variables. The basic idea of the EM algorithm is: given that Z is not observed, we may try to optimize an averaged version (over all possible values of Z) of the complete log-likelihood. We do that through an averaging distribution q:
⟨l_c(θ; x, z)⟩_q = Σ_z q(z | x, θ) log p(x, z | θ)
and obtain the expected complete log-likelihood. The hope then is that maximizing this should at least improve the current estimate for the parameters, so that iteration would eventually maximize the log-likelihood.
132 EM Algorithm: Hidden Variables. In order to present the algorithm, we first note that
l(θ; x) = log p(x | θ) = log Σ_z p(x, z | θ) = log Σ_z q(z | x) p(x, z | θ) / q(z | x) ≥ Σ_z q(z | x) log [ p(x, z | θ) / q(z | x) ] =: L(q, θ)
where the inequality follows from Jensen's inequality and L is the auxiliary function. The EM algorithm is coordinate ascent on L.
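The lower bound L(q, θ) ≤ l(θ; x), with equality at q(z) = p(z | x, θ), can be checked numerically. The sketch below (an illustration with a made-up joint, not from the slides) evaluates both sides for a random q and for the posterior.

```python
import numpy as np

rng = np.random.default_rng(6)
# A tiny joint p(x, z | theta) for one observed x and a hidden z with 3 states;
# the remaining probability mass belongs to other values of x
p_xz = rng.random(3)
p_xz /= (p_xz.sum() + 1.0)

l_incomplete = np.log(p_xz.sum())   # l(theta; x) = log sum_z p(x, z | theta)

def L(q):
    # Auxiliary function L(q, theta) = sum_z q(z) log [ p(x, z | theta) / q(z) ]
    return np.sum(q * (np.log(p_xz) - np.log(q)))

q_rand = rng.random(3)
q_rand /= q_rand.sum()              # an arbitrary averaging distribution
q_post = p_xz / p_xz.sum()          # the posterior q(z) = p(z | x, theta)
```

For the random q, L(q, θ) falls strictly below l(θ; x); for the posterior, the bound is tight, which is the fact exploited by the E-step two slides later.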
133 EM Algorithm: Hidden Variables. The EM algorithm:
E-step: q^{(t+1)} = argmax_q L(q, θ^{(t)})
M-step: θ^{(t+1)} = argmax_θ L(q^{(t+1)}, θ)
134 EM Algorithm: Hidden Variables. Note that the M-step is equivalent to maximizing the expected complete log-likelihood:
L(q, θ) = Σ_z q(z | x) log [ p(x, z | θ) / q(z | x) ]
= Σ_z q(z | x) log p(x, z | θ) − Σ_z q(z | x) log q(z | x)
= ⟨l_c(θ; x, z)⟩_q − Σ_z q(z | x) log q(z | x)
because the second term does not depend on θ.
135 EM Algorithm. The general solution to the E-step turns out to be
q^{(t+1)}(z | x) = p(z | x, θ^{(t)})
because
L( p(z | x, θ^{(t)}), θ^{(t)} ) = Σ_z p(z | x, θ^{(t)}) log [ p(x, z | θ^{(t)}) / p(z | x, θ^{(t)}) ] = Σ_z p(z | x, θ^{(t)}) log p(x | θ^{(t)}) = log p(x | θ^{(t)}) = l(θ^{(t)}; x)
i.e. this choice makes the bound tight at θ^{(t)}.
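Putting the E-step and M-step together, a classic worked example (illustrative, not from the slides; the data and names are made up) is a mixture of two biased coins, where the coin identity Z is hidden. The E-step computes the posterior responsibilities q(z | x) = p(z | x, θ^t); the M-step re-estimates the mixing weights and biases from the weighted data.

```python
import numpy as np

rng = np.random.default_rng(5)
# Data: each of 200 rows is 20 flips of one of two coins; Z (which coin) is hidden
true_p = np.array([0.2, 0.8])
z = rng.integers(0, 2, size=200)
heads = rng.binomial(20, np.take(true_p, z))   # number of heads per row

pi = np.array([0.5, 0.5])   # initial mixing weights
p = np.array([0.3, 0.6])    # initial coin biases (must break the symmetry)
for _ in range(100):
    # E-step: q(z|x) = p(z | x, theta^t), computed in log space for stability
    logw = (np.log(pi)[None, :]
            + heads[:, None] * np.log(p)[None, :]
            + (20 - heads)[:, None] * np.log(1 - p)[None, :])
    q = np.exp(logw - logw.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)
    # M-step: maximize the expected complete log-likelihood
    pi = q.mean(axis=0)
    p = (q * heads[:, None]).sum(axis=0) / (q.sum(axis=0) * 20)

p = np.sort(p)   # fix label order for comparison with true_p
```

Each iteration cannot decrease the incomplete log-likelihood, and the recovered biases land close to the true values 0.2 and 0.8 despite never observing which coin produced each row.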
Introduction to Probability for Graphical Models
Introduction to Probability for Grahical Models CSC 4 Kaustav Kundu Thursday January 4, 06 *Most slides based on Kevin Swersky s slides, Inmar Givoni s slides, Danny Tarlow s slides, Jaser Snoek s slides,
More informationBayesian Networks Practice
Bayesian Networks Practice Part 2 2016-03-17 Byoung-Hee Kim, Seong-Ho Son Biointelligence Lab, CSE, Seoul National University Agenda Probabilistic Inference in Bayesian networks Probability basics D-searation
More informationBayesian networks. 鮑興國 Ph.D. National Taiwan University of Science and Technology
Bayesian networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Introduction to Bayesian networks Bayesian networks: definition, d-searation, equivalence of networks Eamles: causal
More informationBayesian Networks Practice
ayesian Networks Practice Part 2 2016-03-17 young-hee Kim Seong-Ho Son iointelligence ab CSE Seoul National University Agenda Probabilistic Inference in ayesian networks Probability basics D-searation
More informationPart I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS
Part I C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Probabilistic Graphical Models Graphical representation of a probabilistic model Each variable corresponds to a
More informationp L yi z n m x N n xi
y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen
More informationApproximating min-max k-clustering
Aroximating min-max k-clustering Asaf Levin July 24, 2007 Abstract We consider the roblems of set artitioning into k clusters with minimum total cost and minimum of the maximum cost of a cluster. The cost
More informationMachine Learning Lecture 14
Many slides adapted from B. Schiele, S. Roth, Z. Gharahmani Machine Learning Lecture 14 Undirected Graphical Models & Inference 23.06.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de
More informationOutline. Markov Chains and Markov Models. Outline. Markov Chains. Markov Chains Definitions Huizhen Yu
and Markov Models Huizhen Yu janey.yu@cs.helsinki.fi Det. Comuter Science, Univ. of Helsinki Some Proerties of Probabilistic Models, Sring, 200 Huizhen Yu (U.H.) and Markov Models Jan. 2 / 32 Huizhen Yu
More informationUndirected Graphical Models: Markov Random Fields
Undirected Graphical Models: Markov Random Fields 40-956 Advanced Topics in AI: Probabilistic Graphical Models Sharif University of Technology Soleymani Spring 2015 Markov Random Field Structure: undirected
More informationChapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population
Chater 7 and s Selecting a Samle Point Estimation Introduction to s of Proerties of Point Estimators Other Methods Introduction An element is the entity on which data are collected. A oulation is a collection
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More informationProbabilistic Graphical Models (I)
Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random
More informationUniform Law on the Unit Sphere of a Banach Space
Uniform Law on the Unit Shere of a Banach Sace by Bernard Beauzamy Société de Calcul Mathématique SA Faubourg Saint Honoré 75008 Paris France Setember 008 Abstract We investigate the construction of a
More informationParticle-Based Approximate Inference on Graphical Model
article-based Approimate Inference on Graphical Model Reference: robabilistic Graphical Model Ch. 2 Koller & Friedman CMU, 0-708, Fall 2009 robabilistic Graphical Models Lectures 8,9 Eric ing attern Recognition
More informationStatics and dynamics: some elementary concepts
1 Statics and dynamics: some elementary concets Dynamics is the study of the movement through time of variables such as heartbeat, temerature, secies oulation, voltage, roduction, emloyment, rices and
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationCS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016
CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationLecture 4 October 18th
Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations
More informationEmpirical Bayesian EM-based Motion Segmentation
Emirical Bayesian EM-based Motion Segmentation Nuno Vasconcelos Andrew Liman MIT Media Laboratory 0 Ames St, E5-0M, Cambridge, MA 09 fnuno,lig@media.mit.edu Abstract A recent trend in motion-based segmentation
More informationLecture 7: Linear Classification Methods
Homeork Homeork Lecture 7: Linear lassification Methods Final rojects? Grous oics Proosal eek 5 Lecture is oster session, Jacobs Hall Lobby, snacks Final reort 5 June. What is linear classification? lassification
More informationFeedback-error control
Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller
More informationConvex Analysis and Economic Theory Winter 2018
Division of the Humanities and Social Sciences Ec 181 KC Border Conve Analysis and Economic Theory Winter 2018 Toic 16: Fenchel conjugates 16.1 Conjugate functions Recall from Proosition 14.1.1 that is
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationMATH 2710: NOTES FOR ANALYSIS
MATH 270: NOTES FOR ANALYSIS The main ideas we will learn from analysis center around the idea of a limit. Limits occurs in several settings. We will start with finite limits of sequences, then cover infinite
More informationCSE 599d - Quantum Computing When Quantum Computers Fall Apart
CSE 599d - Quantum Comuting When Quantum Comuters Fall Aart Dave Bacon Deartment of Comuter Science & Engineering, University of Washington In this lecture we are going to begin discussing what haens to
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationTopic 7: Using identity types
Toic 7: Using identity tyes June 10, 2014 Now we would like to learn how to use identity tyes and how to do some actual mathematics with them. By now we have essentially introduced all inference rules
More informationLearning Sequence Motif Models Using Gibbs Sampling
Learning Sequence Motif Models Using Gibbs Samling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Sring 2018 Anthony Gitter gitter@biostat.wisc.edu These slides excluding third-arty material are licensed under
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationBiointelligence Lab School of Computer Sci. & Eng. Seoul National University
Artificial Intelligence Chater 19 easoning with Uncertain Information Biointelligence Lab School of Comuter Sci. & Eng. Seoul National University Outline l eview of Probability Theory l Probabilistic Inference
More informationRepresentation of undirected GM. Kayhan Batmanghelich
Representation of undirected GM Kayhan Batmanghelich Review Review: Directed Graphical Model Represent distribution of the form ny p(x 1,,X n = p(x i (X i i=1 Factorizes in terms of local conditional probabilities
More informationNamed Entity Recognition using Maximum Entropy Model SEEM5680
Named Entity Recognition using Maximum Entroy Model SEEM5680 Named Entity Recognition System Named Entity Recognition (NER): Identifying certain hrases/word sequences in a free text. Generally it involves
More informationPart III. for energy minimization
ICCV 2007 tutorial Part III Message-assing algorithms for energy minimization Vladimir Kolmogorov University College London Message assing ( E ( (,, Iteratively ass messages between nodes... Message udate
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationBiitlli Biointelligence Laboratory Lb School of Computer Science and Engineering. Seoul National University
Monte Carlo Samling Chater 4 2009 Course on Probabilistic Grahical Models Artificial Neural Networs, Studies in Artificial i lintelligence and Cognitive i Process Biitlli Biointelligence Laboratory Lb
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 9 Undirected Models CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due next Wednesday (Nov 4) in class Start early!!! Project milestones due Monday (Nov 9)
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture Notes Fall 2009 November, 2009 Byoung-Ta Zhang School of Computer Science and Engineering & Cognitive Science, Brain Science, and Bioinformatics Seoul National University
More information3 : Representation of Undirected GM
10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationConditional Independence and Factorization
Conditional Independence and Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More information1 Entropy 1. 3 Extensivity 4. 5 Convexity 5
Contents CONEX FUNCIONS AND HERMODYNAMIC POENIALS 1 Entroy 1 2 Energy Reresentation 2 3 Etensivity 4 4 Fundamental Equations 4 5 Conveity 5 6 Legendre transforms 6 7 Reservoirs and Legendre transforms
More informationOn parameter estimation in deformable models
Downloaded from orbitdtudk on: Dec 7, 07 On arameter estimation in deformable models Fisker, Rune; Carstensen, Jens Michael Published in: Proceedings of the 4th International Conference on Pattern Recognition
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 12 Dynamical Models CS/CNS/EE 155 Andreas Krause Homework 3 out tonight Start early!! Announcements Project milestones due today Please email to TAs 2 Parameter learning
More informationOutline. Spring It Introduction Representation. Markov Random Field. Conclusion. Conditional Independence Inference: Variable elimination
Probabilistic Graphical Models COMP 790-90 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline It Introduction ti Representation Bayesian network Conditional Independence Inference:
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationB8.1 Martingales Through Measure Theory. Concept of independence
B8.1 Martingales Through Measure Theory Concet of indeendence Motivated by the notion of indeendent events in relims robability, we have generalized the concet of indeendence to families of σ-algebras.
More informationMODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL
Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 16 Undirected Graphs Undirected Separation Inferring Marginals & Conditionals Moralization Junction Trees Triangulation Undirected Graphs Separation
More informationChapter 1: PROBABILITY BASICS
Charles Boncelet, obability, Statistics, and Random Signals," Oxford University ess, 0. ISBN: 978-0-9-0005-0 Chater : PROBABILITY BASICS Sections. What Is obability?. Exeriments, Outcomes, and Events.
More informationBayesian Networks for Modeling and Managing Risks of Natural Hazards
[National Telford Institute and Scottish Informatics and Comuter Science Alliance, Glasgow University, Set 8, 200 ] Bayesian Networks for Modeling and Managing Risks of Natural Hazards Daniel Straub Engineering
More informationNotes on Markov Networks
Notes on Markov Networks Lili Mou moull12@sei.pku.edu.cn December, 2014 This note covers basic topics in Markov networks. We mainly talk about the formal definition, Gibbs sampling for inference, and maximum
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More informationCHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit
Chater 5 Statistical Inference 69 CHAPTER 5 STATISTICAL INFERENCE.0 Hyothesis Testing.0 Decision Errors 3.0 How a Hyothesis is Tested 4.0 Test for Goodness of Fit 5.0 Inferences about Two Means It ain't
More informationMAP Estimation Algorithms in Computer Vision - Part II
MAP Estimation Algorithms in Comuter Vision - Part II M. Pawan Kumar, University of Oford Pushmeet Kohli, Microsoft Research Eamle: Image Segmentation E() = c i i + c ij i (1- j ) i i,j E: {0,1} n R 0
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationElements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley
Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the
More informationChapter 10. Supplemental Text Material
Chater 1. Sulemental Tet Material S1-1. The Covariance Matri of the Regression Coefficients In Section 1-3 of the tetbook, we show that the least squares estimator of β in the linear regression model y=
More informationBayesian Networks BY: MOHAMAD ALSABBAGH
Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationBERNOULLI TRIALS and RELATED PROBABILITY DISTRIBUTIONS
BERNOULLI TRIALS and RELATED PROBABILITY DISTRIBUTIONS A BERNOULLI TRIALS Consider tossing a coin several times It is generally agreed that the following aly here ) Each time the coin is tossed there are
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 10 Undirected Models CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due this Wednesday (Nov 4) in class Project milestones due next Monday (Nov 9) About half
More information1 Undirected Graphical Models. 2 Markov Random Fields (MRFs)
Machine Learning (ML, F16) Lecture#07 (Thursday Nov. 3rd) Lecturer: Byron Boots Undirected Graphical Models 1 Undirected Graphical Models In the previous lecture, we discussed directed graphical models.
More informationRadial Basis Function Networks: Algorithms
Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.
More informationStatistics II Logistic Regression. So far... Two-way repeated measures ANOVA: an example. RM-ANOVA example: the data after log transform
Statistics II Logistic Regression Çağrı Çöltekin Exam date & time: June 21, 10:00 13:00 (The same day/time lanned at the beginning of the semester) University of Groningen, Det of Information Science May
More informationResearch Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI **
Iranian Journal of Science & Technology, Transaction A, Vol 3, No A3 Printed in The Islamic Reublic of Iran, 26 Shiraz University Research Note REGRESSION ANALYSIS IN MARKOV HAIN * A Y ALAMUTI AND M R
More informationSystem Reliability Estimation and Confidence Regions from Subsystem and Full System Tests
009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract
More informationMonte Carlo Studies. Monte Carlo Studies. Sampling Distribution
Monte Carlo Studies Do not let yourself be intimidated by the material in this lecture This lecture involves more theory but is meant to imrove your understanding of: Samling distributions and tests of
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationAn Introduction to Bayesian Machine Learning
1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems
More informationLecture 6: Graphical Models
Lecture 6: Graphical Models Kai-Wei Chang CS @ Uniersity of Virginia kw@kwchang.net Some slides are adapted from Viek Skirmar s course on Structured Prediction 1 So far We discussed sequence labeling tasks:
More informationAn Introduction to Probabilistic Graphical Models
An Introduction to Probabilistic Graphical Models Cédric Archambeau Xerox Research Centre Europe cedric.archambeau@xrce.xerox.com Pascal Bootcamp Marseille, France, July 2010 Reference material D. Koller
More informationAI*IA 2003 Fusion of Multiple Pattern Classifiers PART III
AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers
More informationGraphical models. Sunita Sarawagi IIT Bombay
1 Graphical models Sunita Sarawagi IIT Bombay http://www.cse.iitb.ac.in/~sunita 2 Probabilistic modeling Given: several variables: x 1,... x n, n is large. Task: build a joint distribution function Pr(x
More informationRANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES
RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES AARON ZWIEBACH Abstract. In this aer we will analyze research that has been recently done in the field of discrete
More informationLecture 21: Quantum Communication
CS 880: Quantum Information Processing 0/6/00 Lecture : Quantum Communication Instructor: Dieter van Melkebeek Scribe: Mark Wellons Last lecture, we introduced the EPR airs which we will use in this lecture
More informationarxiv: v1 [physics.data-an] 26 Oct 2012
Constraints on Yield Parameters in Extended Maximum Likelihood Fits Till Moritz Karbach a, Maximilian Schlu b a TU Dortmund, Germany, moritz.karbach@cern.ch b TU Dortmund, Germany, maximilian.schlu@cern.ch
A Social Welfare Optimal Sequential Allocation Procedure
A Social Welfare Optimal Sequential Allocation Procedure Thomas Kalinowski Universität Rostock, Germany Nina Narodytska and Toby Walsh NICTA and UNSW, Australia May 2, 201 Abstract We consider a simple sequential
CS Lecture 4. Markov Random Fields
CS 6347 Lecture 4 Markov Random Fields Recap Announcements First homework is available on elearning Reminder: Office hours Tuesday from 10am-11am Last Time Bayesian networks Today Markov random fields
Generating Swerling Random Sequences
Generating Swerling Random Sequences Mark A. Richards January 1, 1999 Revised December 15, 8 1 BACKGROUND The Swerling models are models of the probability function (pdf) and time correlation properties of
Short course A vademecum of statistical pattern recognition techniques with applications to image and video analysis. Agenda
Short course A vademecum of statistical pattern recognition techniques with applications to image and video analysis Lecture 6 The Kalman filter. Particle filters Massimo Piccardi University of Technology,
4. Score normalization technical details We now discuss the technical details of the score normalization method.
SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament. We begin by giving an overview of the changes to scoring and a non-technical description of the scoring rules
Based on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
Chapter 7: Special Distributions
This chapter first presents some important distributions, and then develops the large-sample distribution theory which is crucial in estimation and statistical inference. Discrete distributions: The Bernoulli
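The snippet above breaks off at the Bernoulli distribution, the simplest of the discrete distributions it introduces. As a minimal illustration (not from the chapter itself), the Bernoulli pmf and its standard moments E[X] = p and Var[X] = p(1 - p):

```python
# Illustrative sketch: the Bernoulli(p) distribution over {0, 1}.

def bernoulli_pmf(k, p):
    # Pr(X = k) for X ~ Bernoulli(p); zero off the support {0, 1}.
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1.0 - p

def bernoulli_mean_var(p):
    # E[X] = p and Var[X] = p * (1 - p).
    return p, p * (1.0 - p)

mean, var = bernoulli_mean_var(0.3)
```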
4 : Exact Inference: Variable Elimination
10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. Xing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference
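The variable-elimination lecture referenced above is about summing variables out of a factorized joint. The smallest possible instance of that idea, with made-up numbers for illustration: eliminating A from a two-node chain A -> B to get the marginal P(B) = sum_a P(a) P(b | a).

```python
# Minimal variable-elimination step on a two-variable chain A -> B.
# The probability tables below are invented for the example.

p_a = {0: 0.6, 1: 0.4}                  # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},     # P(B | A=0)
               1: {0: 0.2, 1: 0.8}}     # P(B | A=1)

def eliminate_a(p_a, p_b_given_a):
    # Sum out A: P(b) = sum over a of P(a) * P(b | a).
    p_b = {}
    for b in (0, 1):
        p_b[b] = sum(p_a[a] * p_b_given_a[a][b] for a in p_a)
    return p_b

p_b = eliminate_a(p_a, p_b_given_a)
```

Here P(B=0) = 0.6·0.9 + 0.4·0.2 = 0.62, and full variable elimination just repeats this sum-product step once per eliminated variable.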
Inference and Representation
Inference and Representation David Sontag New York University Lecture 5, Sept. 30, 2014 David Sontag (NYU) Inference and Representation Lecture 5, Sept. 30, 2014 1 / 16 Today s lecture 1 Running-time of
Random variables. Lecture 5 - Discrete Distributions. Discrete Probability distributions. Example - Discrete probability model
Random Variables Random variables Lecture 5 - Discrete Distributions Sta102 / BME102 Colin Rundel September 8, 2014 A random variable is a numeric quantity whose value depends on the outcome of a random event
Statistical Learning
Statistical Learning Lecture 5: Bayesian Networks and Graphical Models Mário A. T. Figueiredo Instituto Superior Técnico & Instituto de Telecomunicações University of Lisbon, Portugal May 2018 Mário A.
Directed Graphical Models
CS 2750: Machine Learning Directed Graphical Models Prof. Adriana Kovashka University of Pittsburgh March 28, 2017 Graphical Models If no assumption of independence is made, must estimate an exponential
Lecture 8: Bayesian Networks
Lecture 8: Bayesian Networks. Topics: Bayesian Networks; Inference in Bayesian Networks. COMP-652 and ECSE 608, Lecture 8 - January 31, 2017. Bayes nets: P(E=1) = 0.005, P(E=0) = 0.995; P(B=1) = 0.01, P(B=0) = 0.99; E=0 E=1
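The prior tables quoted in the entry above, P(E=1) = 0.005 and P(B=1) = 0.01, are the earthquake and burglary roots of the classic alarm network. Root nodes with no shared parents are marginally independent, so their joint is just the product of the priors, a sketch (not from the lecture itself):

```python
# Sketch: joint over the two root nodes of the burglary/earthquake
# network, using the prior tables quoted in the snippet.

p_e, p_b = 0.005, 0.01  # P(Earthquake=1), P(Burglary=1)

def joint_roots(e, b):
    # Roots are marginally independent, so P(E=e, B=b) = P(E=e) * P(B=b).
    pe = p_e if e else 1 - p_e
    pb = p_b if b else 1 - p_b
    return pe * pb

# Probability that at least one of earthquake/burglary occurs:
p_either = 1 - joint_roots(0, 0)
```

This gives p_either = 1 - 0.995 * 0.99 = 0.01495, slightly less than the naive sum 0.005 + 0.01 because both can occur together.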
CMSC 425: Lecture 4 Geometry and Geometric Programming
CMSC 425: Lecture 4 Geometry and Geometric Programming. Geometry for Game Programming and Graphics: For the next few lectures, we will discuss some of the basic elements of geometry. There are many areas
Network Configuration Control Via Connectivity Graph Processes
Network Configuration Control Via Connectivity Graph Processes Abubakr Muhammad Department of Electrical and Systems Engineering University of Pennsylvania Philadelphia, PA 90 abubakr@seas.upenn.edu Magnus
Universal Finite Memory Coding of Binary Sequences
Department of Electrical Engineering Systems Universal Finite Memory Coding of Binary Sequences Thesis submitted towards the degree of Master of Science in Electrical and Electronic Engineering in Tel-Aviv
Robust hamiltonicity of random directed graphs
Robust hamiltonicity of random directed graphs Asaf Ferber Rajko Nenadov Andreas Noever asaf.ferber@inf.ethz.ch rnenadov@inf.ethz.ch anoever@inf.ethz.ch Ueli Peter upeter@inf.ethz.ch Nemanja Škorić nskoric@inf.ethz.ch
Numerical Linear Algebra
Numerical Linear Algebra Numerous applications in statistics, particularly in the fitting of linear models. Notation and conventions: Elements of a matrix A are denoted by a_ij, where i indexes the rows and
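The entry above flags the fitting of linear models as the main statistical application of numerical linear algebra. A minimal illustration of that connection (not from the notes themselves): simple least squares y ≈ a + b·x, solved via the 2×2 normal equations in pure Python.

```python
# Sketch: least-squares fit of y = a + b*x via the normal equations,
# solved directly by Cramer's rule for the 2x2 system.

def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations: [[n, sx], [sx, sxx]] @ [a, b] = [sy, sxy]
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # data lie exactly on y = 1 + 2x
```

In practice one solves the normal equations (or, better, a QR factorization) with a linear-algebra library rather than by hand; the 2×2 case just makes the structure visible.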
1 Gambler's Ruin Problem
Copyright c 2017 by Karl Sigman 1 Gambler's Ruin Problem Let N ≥ 2 be an integer and let 1 ≤ i ≤ N − 1. Consider a gambler who starts with an initial fortune of $i and then on each successive gamble either wins
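For the fair-coin version of the gambler's ruin set up above (win or lose $1 with probability 1/2 each), the probability of reaching fortune N before going broke, starting from i, has the well-known closed form i/N. A small numerical sketch checking this via the boundary-value recursion p[k] = (p[k-1] + p[k+1]) / 2 with p[0] = 0 and p[N] = 1:

```python
# Sketch: fair gambler's ruin. The win probability p[k] satisfies
# p[k] = 0.5 * (p[k-1] + p[k+1]) with p[0] = 0, p[N] = 1, solved here
# by Gauss-Seidel relaxation purely to illustrate; the exact answer is k/N.

def ruin_win_prob(i, N, sweeps=5000):
    p = [0.0] * (N + 1)
    p[N] = 1.0
    for _ in range(sweeps):
        for k in range(1, N):
            p[k] = 0.5 * (p[k - 1] + p[k + 1])
    return p[i]
```

The relaxation converges to the linear profile p[k] = k/N, matching the closed form; a direct linear solve would give the same answer without iteration.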
4.1 Notation and probability review
Directed and undirected graphical models Fall 2015 Lecture 4 October 21st Lecturer: Simon Lacoste-Julien Scribe: Jaime Roquero, JieYing Wu 4.1 Notation and probability review 4.1.1 Notations Let us recall