University of Washington Department of Electrical Engineering EE512 Spring, 2006 Graphical Models Jeff A. Bilmes <bilmes@ee.washington.edu> Lecture 1 Slides March 28 th, 2006 Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 1
Outline of Today s Lecture Class overview What are graphical models Semantics of Bayesian networks Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 2
Books and Sources for Today Jordan: Chapters 1 and 2 Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 3
Class Road Map L1: Tues, 3/28: Overview, GMs, Intro BNs. L2: Thur, 3/30 L3: Tues, 4/4 L4: Thur, 4/6 L5: Tue, 4/11 L6: Thur, 4/13 L7: Tues, 4/18 L8: Thur, 4/20 L9: Tue, 4/25 L10: Thur, 4/27 L11: Tues, 5/2 L12: Thur, 5/4 L13: Tues, 5/9 L14: Thur, 5/11 L15: Tue, 5/16 L16: Thur, 5/18 L17: Tues, 5/23 L18: Thur, 5/25 L19: Tue, 5/30 L20: Thur, 6/1: final presentations Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 4
Announcements READING: Chapter 1,2 in Jordan s book (pick up book from basement of communications copy center). List handout: name, department, and email List handout: regular makeup slot, and discussion section Syllabus Course web page: http://ssli.ee.washington.edu/ee512 Goal: powerpoint slides this quarter, they will be on the web page after lecture (but not before as you are getting them hot off the press). Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 5
Graphical Models A graphical model is a visual, abstract, and mathematically formal description of properties of families of probability distributions (densities, mass functions) There are many different types of Graphical model, ex: Bayesian Networks Factor Graph Markov Random Fields Chain Graph Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 6
GMs cover many well-known methods Graphical Models Other Semantics Causal Models Chain Graphs Factor Graphs Dependency Networks DGMs AR UGMs FST HMM ZMs DBNs Factorial HMM/Mixed Memory Markov Models BMMs Bayesian Networks Mixture Models Kalman Decision Trees Segment Models Simple Models PCA LDA MRFs Gibbs/Boltzman Distributions Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 7
Graphical Models Provide: Structure Algorithms Language Approximations Data-Bases Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 8
Graphical Models Provide GMs give us: I. Structure: A method to explore the structure of natural phenomena (causal vs. correlated relations, properties of natural signals and scenes) II. Algorithms: A set of algorithms that provide efficient probabilistic inference and statistical decision making III. Language: A mathematically formal, abstract, visual language with which to efficiently discuss families of probabilistic models and their properties. Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 9
GMs Provide GMs give us (cont): IV. Approximation: Methods to explore systems of approximation and their implications. E.g., what are the consequences of a (perhaps known to be) wrong assumption? V. Data-base: Provide a probabilistic data-base and corresponding search algorithms for making queries about properties in such model families. Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 10
GMs There are many different types of GM. Each GM has its semantics A GM (under the current semantics) is really a set of constraints. The GM represents all probability distributions that obey these constraints, including those that obey additional constraints (but not including those that obey fewer constraints). Most often, the constraints are some form of factorization property, e.g., f() factorizes (is composed of a product of factors of subsets of arguments). Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 11
Types of Queries Several types of queries we may be interested in: Compute: p(one subset of vars) Compute: p(one subset of vars another subset of vars) Find the N most probable configurations of one subset of variables given assignments of values to some other sets Q: Is one subset independent of another subset? Q: Is one subset independent of another given a third? How efficiently can we do this? Can this question be answered? What if it is too costly, can we approximate, and if so, how well? These are questions we will answer this term. GMs are like a probabilistic data-base (or data structure), a system that can be queried to provide answers to these sorts of questions. Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 12
Example Typical goal of pattern recognition: training (say, EM or gradient descent), need query of form: In this form, we need to compute p(o,h) efficiently. Bayes decision rule, need to find best class for a given unknown pattern: but this is yet another query on a probability distribution. We can train, and perform Bayes decision theory quickly if we can compute with probabilities quickly. Graphical models provide a way to reason about, and understand when this is possible, and if not, how to reasonably approximate. Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 13
Some Notation Random variables ψ ψ, ψ ψ, ψ, ψ (scalar or vector) Distributions: Subsets: Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 14
Main types of Graphical Models Markov Random Fields a form of undirected graphical model relatively simple to understand their semantics also, log-linear models, Gibbs distributions, Boltzman distributions, many exponential models, conditional random fields (CRFs), etc. Bayesian networks a form of directed graphical model originally developed to represent a form of causality, but not ideal for that (they still represent factorization) Semantics more interesting (but trickier) than MRFs Factor Graphs pure, the assembly language models for factorization properties came out of coding theory community (LDPC, Turbo codes) Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 15
Chain graphs: Main types of Graphical Models Hybrid between Bayesian networks and MRFs A set of clusters of undirected nodes connected as directed links Not as widely used, but very powerful. Ancestral graphs we probably won t cover these. Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 16
Bayesian Network Examples Mixture models C Markov Chains Q 1 Q 2 Q 3 Q 4 p( qt q1 : t 1) = p( qt qt 1) X px ( ) = cpxi i ( ) i Q 1 Q 2 Q 3 Q 4 p( qt q1 : t 1) = p( qt qt 1, qt 2) Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 17
GMs: PCA and Factor Analysis Y μ = AX X Y + Q R X N(0, Q) N( μ, R) PCA: Q = Λ, R 0, A = ortho Y = ΓX + μ FA: Q = I, R = diagonal Y = Γ X + u+ μ Other generalizations possible E.g., Q = gen. diagonal, or capture using general A since T Y N( μ, AQA + R) Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 18
Independent Component Analysis I 1 I 2 I I 1 2 X 1 X 2 X 3 X 4 The data X 1:4 is explained by the two (marginally) independent causes. Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 19
Linear Discriminant Analysis fi ( X) p( i) PC ( = i X) = f ( X) p( k) f ( X) = N( μ, Σ) j k Class conditional data has diff. mean but common covariance matrix. Fisher s formulation: project onto space spanned by the means. j k μ C X Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 20
Extensions to LDA HDA/QDA MDA HMDA C C C μ M M X μ μ Heteroschedastic Discriminant Analysis/Quadratic Discriminant Analysis Mixture Discriminant Analsysis X X Both Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 21
Generalized Decision Trees I D 1 D 2 Generalized Probabilistic Decision Trees Hierarchical Mixtures of Experts (Jordan) D 3 O Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 22
Example: Printer Troubleshooting Dechter Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 23
Classifier Combination Mixture of Experts (Sum rule) Naive Bayes (prod. rule) Other Combination Schemes X X 1 X 2... X N C S C 1 C 2 C C X 1 X 2 Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 24
Discriminative and Generative Models Generative Model C Discriminative Model C X X Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 25
HMMs/Kalman Filter HMM Q 1 Q 2 Q 3 Q 4 X 1 X 2 X 3 X 4 Autoregressive HMM Q 1 Q 2 Q 3 Q 4 X 1 X 2 X 3 X 4 Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 26
Switching Kalman Filter Q 1 Q 2 Q 3 Q 4 S 1 S 2 S 3 S 4 X 1 X 2 X 3 X 4 Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 27
Factorial HMM Q 1 Q 2 Q 3 Q 4 Q 1 Q 2 Q 3 Q 4 X 1 X 2 X 3 X 4 Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 28
Standard Language Modeling Example: standard 4-gram Pw ( h) = Pw ( w, w, w ) t t t t 1 t 2 t 3 W t 4 W t 3 W t 2 Wt 1 W t Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 29
Interpolated Uni-,Bi-,Tri-Grams Pw ( h) = P( α = 1) Pw ( ) + P( α = 2) Pw ( w ) t t t t t t t 1 + P( α = 3) p( w w, w ) t t t 1 t 2 Nothing gets zero probability Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 30
Conditional mixture tri-grams Pw ( h) = P( α = 1 w, w ) Pw ( ) t t t t 1 t 2 t + P( α = 2 w, w ) P( w w ) t t 1 t 2 t t 1 + P( α = 3 w, w ) p( w w, w ) t t 1 t 2 t t 1 t 2 Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 31
Bayesian Networks and so on We need to be more formal about what BNs mean. In the rest of this, and in the next, lecture, we start with basic semantics of BNs, move on to undirected models, and then come back to BNs again to clean up Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 32
Bayesian Networks Has nothing to do with Bayesian statistical models (there are Bayesian and non-bayesian Bayesian networks). valid: invalid: Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 33
BN: Alarm Network Example Compact representation: factors of probabilities of children given parents (J. Pearl, 1988) Sub-family specification: Directed acyclic graph (DAG) Nodes - random variables Edges - direct influence Radio Earthquake Burglary Alarm E e e e e B b b b b P(A E,B) 0.9 0.1 0.2 0.8 0.9 0.1 0.01 0.99 Together (graph and inst.): Defines a unique distribution in a factored form PBEACR (,,,, ) = PBPEPABEPR ( ) ( ) (, ) ( EPC ) ( A) Instantiation: Set of conditional probability distributions Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 34 Call
Bayesian Networks Reminder: chain rule of probability. For any order σ Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 35
Bayesian Networks Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 36
Conditional Independence Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 37
Conditional Independence & BNs GMs can help answer CI queries: X A B Z X Z Y??? C E D No!! Y Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 38
Bayesian Networks & CI Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 39
Bayesian Networks & CI Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 40
Bayesian Networks & CI Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 41
Bayesian Networks & CI Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 42
Pictorial d-separation: blocked/unblocked paths Blocked Paths Unblocked Paths Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 43
Three Canonical Cases Three 3-node examples of BNs and their conditional independence statements. V 1 V 1 V 2 V 3 V 2 V 3 V V V 2 3 1 V V 2 3 V 2 V 3 V V V 2 3 1 V V 2 3 V 1 V V 2 3 V V V 2 3 1 Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 44
Case 1 Markov Chain Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 45
Case 2 Still a Markov Chain Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 46
Case 3 NOT a Markov Chain Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 47
Examples of the three cases SUVs Greenhouse Gasses Global Warming Lung Cancer Smoking Bad Breath Genetics Cancer Smoking Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 48
Bayes Ball simple algorithm, ball bouncing along path in graph, to tell if path is blocked or not (more details in text). Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 49
What are implied conditional independences? Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 50
Two Views of a Family Lec 1: March 28th, 2006 EE512 - Graphical Models - J. Bilmes Page 51