Learning Bayes Net Structures

1 Learning Bayes Net Structures. KF, Chapter (RN, Chapter 20). Some material taken from C. Guestrin (CMU), K. Murphy (UBC)

2 Learning Bayes Nets. Four settings, by structure and data: known structure + complete data: easy; known structure + missing data: hard (EM); unknown structure + complete data: NP-hard; unknown structure + missing data: very hard!! Missing-data example: x^(7) = [x_1=t, x_2=?, x_3=f, ...]. Data: x^(1), ..., x^(m). Learn the structure and the parameters, i.e., the CPTs P(X_i | Pa_{X_i}).

3 Learning the structure of a BN. Data: <x_1^(1), ..., x_n^(1)>, ..., <x_1^(m), ..., x_n^(m)> (example network: Flu, Allergy → Sinus → Headache, Nose). Learn structure and parameters. Constraint-based approach: the BN encodes conditional independencies, so test conditional independencies in the data and find an I-map (a P-map?). Score-based approach: finding structure + parameters is density estimation, so evaluate each model as we evaluated parameters: maximum likelihood, Bayesian, etc.

4 Remember: Obtaining a P-map? Given I(P) = the independence assertions that are true for P: 1. Obtain the skeleton. 2. Obtain the immoralities. 3. Using the skeleton and immoralities, obtain every (and any) BN structure from the equivalence class. Constraint-based approach: use the Learn_PDAG algorithm. Key question: the independence test.

5 Independence tests. Statistically difficult task! Intuitive approach: mutual information, I(X,Y) = Σ_{x,y} P(x,y) log [ P(x,y) / (P(x)P(y)) ]. Mutual information and independence: X and Y are independent if and only if I(X,Y) = 0, since X ⊥ Y means P(x,y) = P(x)P(y) for all x,y, i.e., log[ P(x,y) / (P(x)P(y)) ] = 0. Conditional mutual information: X ⊥ Y | Z iff P(X,Y | Z) = P(X | Z) P(Y | Z) iff I(X,Y | Z) = 0.

6 Independence tests and the constraint-based approach. Using the data D: empirical distribution P̂(x,y) = Count(x,y)/m; mutual information Î(X,Y) = Σ_{x,y} P̂(x,y) log [ P̂(x,y) / (P̂(x)P̂(y)) ]; similarly for conditional MI. Use the Learn-PDAG algorithm: when the algorithm asks (X ⊥ Y | U)?, use Î(X,Y | U) = 0? No, that doesn't happen with finite data. Use Î(X,Y | U) < t for some t > 0? Choose t based on a statistical test, e.g., t such that p < 0.05. Many other types of independence tests exist.
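To make the thresholded test concrete, here is a minimal numpy sketch (not from the slides; the function names, the threshold value, and the toy data are illustrative assumptions) that estimates Î(X,Y) from a sample and declares independence when the estimate falls below a chosen threshold t:

    import numpy as np

    def empirical_mi(x, y):
        """Estimate I(X;Y) in bits from two discrete data vectors."""
        mi = 0.0
        for xv in np.unique(x):
            for yv in np.unique(y):
                pxy = np.mean((x == xv) & (y == yv))         # empirical joint
                px, py = np.mean(x == xv), np.mean(y == yv)  # empirical marginals
                if pxy > 0:
                    mi += pxy * np.log2(pxy / (px * py))
        return mi

    def looks_independent(x, y, t=0.01):
        """Thresholded test: treat X and Y as independent if the MI estimate is below t."""
        return empirical_mi(x, y) < t

    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=1000)
    y_ind = rng.integers(0, 2, size=1000)                    # generated independently of x
    y_dep = np.where(rng.random(1000) < 0.9, x, 1 - x)       # mostly copies x
    print(looks_independent(x, y_ind), looks_independent(x, y_dep))   # True, False (typically)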

7 Independence Tests II. For discrete data: the χ² statistic measures how far the observed counts are from what we would expect under independence, d_{χ²}(D) = Σ (observed - expected)² / expected. The exact p-value requires summing over all datasets of size N: p(t) = P({D : d(D) > t} | H_0, N). Expensive, so approximate: consider the expected distribution of d(D) (under the null hypothesis) as N → ∞ to define thresholds for a given significance level.

8 Example of classical hypothesis testing. Spin a Belgian one-euro coin N = 250 times: heads Y = 140, tails 110. Distinguish two models: H_0 = coin is unbiased (p = 0.5), H_1 = coin is biased (p ≠ 0.5). The two-sided p-value is less than 7%: p = P(Y ≥ 140) + P(Y ≤ 110) = 0.066. In MATLAB: n=250; p0=0.5; y=140; pval = (1 - binocdf(y-1, n, p0)) + binocdf(n-y, n, p0). If Y = 141, the p-value drops to about 0.05, so we would (just) reject the null hypothesis at the 5% significance level. But is the coin really biased?
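The same computation in Python, for readers without MATLAB (scipy.stats.binom is an assumed substitute for binocdf; not part of the slides):

    from scipy.stats import binom

    n, y = 250, 140
    # Two-sided p-value under H0: p = 0.5, i.e. P(Y >= 140) + P(Y <= 110)
    p_value = (1 - binom.cdf(y - 1, n, 0.5)) + binom.cdf(n - y, n, 0.5)
    print(p_value)   # about 0.066, matching the slide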

9 The build-PDAG algorithm. build-PDAG can recover the true structure up to I-equivalence in O(n³ · 2^d) time if: the maximum number of parents over nodes is d; the independence-test oracle can handle 2d + 2 variables; G is an I-map of the underlying distribution P; and P is faithful to G (no spurious independencies, i.e., no independencies not sanctioned by G). Also called the IC or PC algorithm.

10 Evaluation of the IC / PC algorithm. Bad: the faithfulness assumption rules out certain CPDs (e.g., XOR). Independence tests are typically unreliable (especially given small data sets) and make many errors; one misleading independence-test result can cause multiple errors in the resulting PDAG, so the approach is not robust to noise. Good: the PC algorithm is less dumb than local search.

11 Score-based Approach. Possible DAG structures (gazillions of them, e.g., variants of Flu, Allergy → Sinus → Headache, Nose). Data: <x_1^(1), ..., x_n^(1)>, ..., <x_1^(m), ..., x_n^(m)>. For each structure, learn parameters + evaluate, giving a score for each structure (e.g., 15,000; 10,000; 20,000; 10,500 in the figure), and keep the best.

12 Information-theoretic interpretation of maximum likelihood. Given the structure G (e.g., Flu, Allergy → Sinus → Headache, Nose), the log likelihood of the data decomposes into counts times log parameters: log P(D | θ, G) = Σ_i Σ_j Σ_k N_ijk log θ_ijk, where N_ijk is the number of data cases with X_i = k and Pa_{X_i} in configuration j, and θ_ijk is the corresponding CPT entry.
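A minimal sketch of this decomposition (not from the slides; it assumes binary variables, data as an (m × n) numpy array, and illustrative helper names), computed family by family with the MLE parameters θ̂_ijk = N_ijk / N_ij:

    import numpy as np
    from itertools import product

    def family_loglik(data, child, parents):
        """Sum of N_ijk * log(theta_ijk) for one family, with theta set to its MLE."""
        ll = 0.0
        for j in product([0, 1], repeat=len(parents)):        # parent configurations
            mask = np.all(data[:, parents] == j, axis=1) if parents else np.ones(len(data), bool)
            n_ij = mask.sum()
            for k in (0, 1):                                  # child values
                n_ijk = (mask & (data[:, child] == k)).sum()
                if n_ijk:
                    ll += n_ijk * np.log(n_ijk / n_ij)        # N_ijk * log theta_ijk
        return ll

    def loglik(data, structure):
        """structure: dict mapping each node index to a list of parent indices."""
        return sum(family_loglik(data, c, ps) for c, ps in structure.items())

    rng = np.random.default_rng(0)
    data = rng.integers(0, 2, size=(500, 3))                  # toy binary dataset
    print(loglik(data, {0: [], 1: [0], 2: [1]}))              # chain X0 -> X1 -> X2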

13 Entropy. Entropy of V with distribution [p(V=1), p(V=0)]: H(V) = - Σ_{v_i} P(V = v_i) log_2 P(V = v_i). It is the number of bits needed to convey full information, the average surprise of the result of one trial of V. Entropy is a measure of uncertainty.

14 Examples of Entropy. Fair coin: H(1/2, 1/2) = -1/2 log_2(1/2) - 1/2 log_2(1/2) = 1 bit, i.e., we need 1 bit to convey the outcome of the coin flip. Biased coin: H(1/100, 99/100) = -1/100 log_2(1/100) - 99/100 log_2(99/100) = 0.08 bit. As P(heads) → 1, the information in the actual outcome → 0: H(0, 1) = H(1, 0) = 0 bits, i.e., no uncertainty is left in the source (taking 0 log_2 0 = 0).

15 Entropy & Conditional Entropy. Entropy of a distribution: H(X) = - Σ_i P(x_i) log P(x_i), how surprising the variable is; entropy = 0 when we know everything, e.g., P(+x) = 1.0. Conditional entropy: H(X | U) = - Σ_u P(u) Σ_i P(x_i | u) log P(x_i | u), how much uncertainty is left in X after observing U.

16 Information-theoretic interpretation of maximum likelihood 2. Given the structure, the log likelihood of the data with MLE parameters is log P(D | θ̂, G) = -m Σ_i H(X_i | Pa^G_{X_i}), where H is the conditional entropy under the empirical distribution. So log P(D | θ, G) is LARGEST when each H(X_i | Pa^G_{X_i}) is SMALL, i.e., when the parents of X_i are very INFORMATIVE about X_i!

17 Score for a Bayesian Network. I(X, U) = H(X) - H(X | U), so H(X | Pa^G_X) = H(X) - I(X, Pa^G_X). The log data likelihood then becomes m Σ_i [ I(X_i, Pa^G_{X_i}) - H(X_i) ], and the Σ_i H(X_i) term doesn't involve the structure G! High I(X, Pa^G_X) means X and its parents are far from independent. So use the score Σ_i I(X_i, Pa^G_{X_i}).
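A small numeric check of the identity I(X,U) = H(X) - H(X|U) (the joint distribution below is made up for illustration; not from the slides):

    import numpy as np

    # Made-up joint distribution P(X, U) over binary X (rows) and U (columns)
    P = np.array([[0.30, 0.10],
                  [0.20, 0.40]])
    Px, Pu = P.sum(axis=1), P.sum(axis=0)

    H_X = -np.sum(Px * np.log2(Px))                      # H(X)
    H_X_given_U = -np.sum(P * np.log2(P / Pu))           # H(X|U) = -sum_{x,u} P(x,u) log P(x|u)
    I_XU = np.sum(P * np.log2(P / np.outer(Px, Pu)))     # I(X,U)

    print(H_X - H_X_given_U, I_XU)                       # the two values agree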

18 Decomposable Score. Log data likelihood, or perhaps just the score Σ_i I(X_i, Pa^G_{X_i}). A decomposable score decomposes over the families in the BN (a node and its parents); this will lead to significant computational efficiency!!! Score(G : D) = Σ_i FamScore(X_i | Pa_{X_i} : D). For MLE: FamScore(X_i | Pa_{X_i} : D) = m [ I(X_i, Pa_{X_i}) - H(X_i) ].

19 Using Decomposability. Compare Σ_i I(X_i, Pa^G_{X_i}) (plus a structure-independent constant c) for G_1: X → Y → Z and G_2: Y → Z → X. G_1: Σ_i I(X_i, Pa_{X_i}) = I(X, {}) + I(Y, X) + I(Z, Y) = 0 + I(Y, X) + I(Z, Y). G_2: Σ_i I(X_i, Pa_{X_i}) = I(Y, {}) + I(Z, Y) + I(X, Z) = 0 + I(Z, Y) + I(X, Z). So the difference is I(Y, X) - I(X, Z).

20 How many trees are there? A tree has one path between any two nodes (in the skeleton); most nodes have exactly 1 parent (plus a root with 0 parents). How many trees? Many: pick a root, then pick its children, and so on; each different choice gives another tree, so the number of trees is exponential. Nonetheless there is an efficient algorithm to find the OPTIMAL tree.

21 Best Tree Structure. Identify a tree with the set F = { Pa(X) }, where each Pa(X) is {} or another single variable. The optimal tree, given data, is argmax_F [ m Σ_i I(X_i, Pa(X_i)) - m Σ_i H(X_i) ] = argmax_F Σ_i I(X_i, Pa(X_i)), as Σ_i H(X_i) does not depend on the structure. So we want parents F such that the tree structure maximizes Σ_i I(X_i, Pa(X_i)).

22 Chow-Liu tree learning algorithm 1. For each pair of variables X_i, X_j: compute the empirical distribution P̂(x_i, x_j) = Count(x_i, x_j)/m and the mutual information Î(X_i, X_j) = Σ_{x_i, x_j} P̂(x_i, x_j) log [ P̂(x_i, x_j) / (P̂(x_i) P̂(x_j)) ]. Define a graph with nodes X_1, ..., X_n in which edge (i, j) gets weight Î(X_i, X_j). Find a maximal spanning tree. Pick a node as root and dangle the rest from it (orient edges away from the root).

23 Chow-Liu tree learning algorithm 2. Optimal tree BN: compute the maximum-weight spanning tree; for directions in the BN, pick any node as root and let breadth-first search define the edge directions. It doesn't matter which root, by score equivalence: if G and G' are I-equivalent, their scores are the same.
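A compact sketch of the algorithm (not from the slides; networkx is an assumed dependency, and empirical_mi repeats the helper from the earlier independence-test snippet): build the complete graph weighted by pairwise empirical MI, take a maximum-weight spanning tree, then orient edges away from an arbitrary root via BFS:

    import numpy as np
    import networkx as nx

    def empirical_mi(x, y):
        mi = 0.0
        for xv in np.unique(x):
            for yv in np.unique(y):
                pxy = np.mean((x == xv) & (y == yv))
                if pxy > 0:
                    mi += pxy * np.log2(pxy / (np.mean(x == xv) * np.mean(y == yv)))
        return mi

    def chow_liu(data):
        """data: (m, n) array of discrete values; returns a directed tree as an nx.DiGraph."""
        n = data.shape[1]
        g = nx.Graph()
        for i in range(n):
            for j in range(i + 1, n):
                g.add_edge(i, j, weight=empirical_mi(data[:, i], data[:, j]))
        tree = nx.maximum_spanning_tree(g, weight="weight")
        return nx.bfs_tree(tree, 0)      # root at node 0; any root gives an I-equivalent BN

    rng = np.random.default_rng(0)
    x0 = rng.integers(0, 2, 2000)
    x1 = np.where(rng.random(2000) < 0.9, x0, 1 - x0)     # x1 depends strongly on x0
    x2 = np.where(rng.random(2000) < 0.9, x1, 1 - x1)     # x2 depends strongly on x1
    print(list(chow_liu(np.column_stack([x0, x1, x2])).edges()))   # expect edges 0-1 and 1-2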

24 Chow-Liu (CL) Results. If the distribution P is tree-structured, CL finds the CORRECT tree. If P is NOT tree-structured, CL finds the tree-structured Q with minimal KL-divergence, argmin_Q KL(P; Q). Even though there are 2^{Θ(n log n)} trees, CL finds the BEST one in polynomial time, O(n² [m + log n]).

25 Can we extend Chow-Liu? 1. The naïve Bayes model overcounts evidence because it ignores correlation between features. Tree-augmented naïve Bayes (TAN) [Friedman et al. 97]: same as Chow-Liu, but score edges with the class-conditional mutual information I(X_i, X_j | C).


27 Can we extend Chow-Liu? 2. (Approximately) learning models with tree-width up to k [Narasimhan & Bilmes 04]. But: O(n^{k+1}) and more subtleties.

28 Comments on learning BN structures so far. Decomposable scores; maximum likelihood; information-theoretic interpretation; best tree (Chow-Liu); best TAN; nearly-best k-treewidth (in O(n^{k+1})).

29 Overfitting. So far: find parameters/structure that fit the training data. With too many parameters we match the TRAINING data well, but NOT new instances: overfitting! (figure: error vs. model complexity). Remedies: regularizing, the Bayesian approach, ...

30 How to Evaluate a Model? (figure: a table of SNP genotypes SNP1 ... SNP53 with a Bleeding? label, split into TRAIN and TEST rows). Find parameters on the training data, then compute the probability the model assigns to data. Computing it on the training set itself (the figure shows a log-probability of -38) gives the training-set error, which is too optimistic.

31 How to Evaluate a Model? (same SNP figure, now with a held-out TEST set). Find parameters on TRAIN, then compute the probability of the held-out TEST data (the figure shows -53). The simple hold-out-set error is slightly pessimistic.

32 How to Evaluate a Model? K-fold cross-validation, e.g., K = 3. Not as pessimistic: every point is a test example exactly once.
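A minimal K-fold sketch (not from the slides; the fit/score pair below is a deliberately trivial stand-in for fitting a BN and computing held-out log-probability):

    import numpy as np

    def cross_val_score(data, fit, score, k=3, seed=0):
        """Average held-out score over k folds; fit and score are supplied by the user."""
        idx = np.random.default_rng(seed).permutation(len(data))
        folds = np.array_split(idx, k)
        out = []
        for f in range(k):
            train = np.concatenate([folds[g] for g in range(k) if g != f])
            model = fit(data[train])
            out.append(score(model, data[folds[f]]))          # each point is a test point once
        return float(np.mean(out))

    # Toy "model": Laplace-smoothed frequency of 1s in a single binary column
    fit = lambda d: (d.sum() + 1) / (len(d) + 2)
    score = lambda p, d: float(np.sum(d * np.log(p) + (1 - d) * np.log(1 - p)))
    data = np.random.default_rng(1).integers(0, 2, size=300)
    print(cross_val_score(data, fit, score, k=3))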

33 Maximum likelihood score overfits! Information never hurts: I(X; Pa ∪ {Y}) ≥ I(X; Pa), so adding a parent never decreases the likelihood score!!!

34 Bayesian Score. Prior distributions: over structures, and over the parameters of each structure. Goal: prefer simpler structures, i.e., regularization. Posterior over structures given data: P(G | D) ∝ P(D | G) P(G), posterior ∝ likelihood × prior over graphs, where the marginal likelihood P(D | G) = ∫ P(D | G, Θ_G) P(Θ_G | G) dΘ_G integrates over the prior over parameters.

35 Bayesian score and model complexity. True model: X → Y with P(Y=t | X=t) = 0.5 + α and P(Y=t | X=f) = 0.5 - α. Structure 1: X and Y independent (no edge). The score of Structure 1 doesn't depend on α!

36 Bayesian score and model complexity (continued). Structure 1 (X and Y independent): the score doesn't depend on α. Structure 2: X → Y, with P(Y=t | X=t) = 0.5 + α and P(Y=t | X=f) = 0.5 - α. The data points are split between estimating P(Y=t | X=t) and P(Y=t | X=f). For fixed m, the edge is only worth it for large α, because the posterior over each parameter will be more diffuse with less data.

37 Bayesian, a decomposable score. Assume local and global parameter independence (e.g., θ_X and θ_{Y|+x} are independent a priori). The prior satisfies parameter modularity: if X_i has the same parents in G and G', then its parameters have the same prior (figure: two structures over A, B, C, X in which X has parents A and B in both, so Θ(X; A,B) is the same in both structures). The structure prior P(G) satisfies structure modularity, a product of terms over families, e.g., P(G) ∝ c^{|G|} with |G| = #edges and c < 1. Then the Bayesian score decomposes along families: log P(G | D) = Σ_X ScoreFam(X | Pa_X : D) (up to a constant).

38 Marginal Posterior. Given θ ~ Beta(1,1), what is the probability of ⟨H, T, T, H, H⟩?
P(f_1=H, f_2=T, f_3=T, f_4=H, f_5=H | θ ~ Beta(1,1))
= P(f_1=H | θ ~ Beta(1,1)) · P(f_2=T, f_3=T, f_4=H, f_5=H | f_1=H, θ ~ Beta(1,1))
= 1/2 · P(f_2=T, f_3=T, f_4=H, f_5=H | θ ~ Beta(2,1))
= 1/2 · P(f_2=T | θ ~ Beta(2,1)) · P(f_3=T, f_4=H, f_5=H | f_2=T, θ ~ Beta(2,1))
= 1/2 · 1/3 · P(f_3=T, f_4=H, f_5=H | θ ~ Beta(2,2))
= 1/2 · 1/3 · 2/4 · 2/5 · P(f_5=H | θ ~ Beta(3,3))
= 1/2 · 1/3 · 2/4 · 2/5 · 3/6
= (1·2·3)·(1·2) / (2·3·4·5·6) = 1/60

39 Marginal Posterior, cont'd. Given θ ~ Beta(a,b), what is P[⟨H, T, T, H, H⟩]?
P(f_1=H, f_2=T, f_3=T, f_4=H, f_5=H | θ ~ Beta(a,b))
= P(f_1=H | θ ~ Beta(a,b)) · P(f_2=T, f_3=T, f_4=H, f_5=H | f_1=H, θ ~ Beta(a,b))
= a/(a+b) · P(f_2=T, f_3=T, f_4=H, f_5=H | θ ~ Beta(a+1, b))
= ...
= [a/(a+b)] · [b/(a+b+1)] · [(b+1)/(a+b+2)] · [(a+1)/(a+b+3)] · [(a+2)/(a+b+4)]
= a(a+1)(a+2) · b(b+1) / [(a+b)(a+b+1)(a+b+2)(a+b+3)(a+b+4)]
= Γ(a+m_H) Γ(b+m_T) Γ(a+b) / [Γ(a) Γ(b) Γ(a+b+m)]
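A quick numeric check of this closed form against the sequential product (scipy.special.gammaln is an assumed dependency; not part of the slides):

    import numpy as np
    from scipy.special import gammaln

    def marginal_lik_closed(m_heads, m_tails, a, b):
        """Gamma(a+mH) Gamma(b+mT) Gamma(a+b) / (Gamma(a) Gamma(b) Gamma(a+b+m)), via logs."""
        return np.exp(gammaln(a + m_heads) + gammaln(b + m_tails) + gammaln(a + b)
                      - gammaln(a) - gammaln(b) - gammaln(a + b + m_heads + m_tails))

    def marginal_lik_sequential(flips, a, b):
        """Chain rule: multiply predictive probabilities, updating Beta(a,b) after each flip."""
        p = 1.0
        for f in flips:                       # f = 1 for heads, 0 for tails
            p *= (a if f else b) / (a + b)
            a, b = a + f, b + (1 - f)
        return p

    flips = [1, 0, 0, 1, 1]                   # the sequence H, T, T, H, H
    print(marginal_lik_sequential(flips, 1, 1))   # 1/60, as on the previous slide
    print(marginal_lik_closed(3, 2, 1, 1))        # same value from the closed form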

40 Marginal vs. Maximal Likelihood. Data D = ⟨H, T, T, H, H⟩. θ* = argmax_θ P(D | θ) = 3/5; here P(D | θ*) = (3/5)³ (2/5)². Or the Bayesian estimate from Beta(1,1): θ*_{Beta(1,1)} = 4/7 (the posterior mean). The marginal likelihood Π_i P(x_i | x_1, ..., x_{i-1}) is kinda like cross-validation: it evaluates each instance with respect to the previous instances.

41 Marginal Probability of a Graph. Given complete data and independent Dirichlet parameters, the marginal likelihood has a closed form: P(D | G) = Π_i Π_{j ∈ Val(Pa_{X_i})} [ Γ(α_ij) / Γ(α_ij + N_ij) ] Π_k [ Γ(α_ijk + N_ijk) / Γ(α_ijk) ], where α_ij = Σ_k α_ijk and N_ij = Σ_k N_ijk.

42 BIC approximation of the Bayesian Score. In general the Bayesian score has difficult integrals; for a Dirichlet prior over parameters we can use the simple Bayesian information criterion (BIC) approximation. In the limit, we can forget the prior! Theorem: for a Dirichlet prior and a BN with Dim(G) independent parameters, as m → ∞: log P(D | G) ≈ log P(D | θ̂_G, G) - (log m / 2) Dim(G), where θ̂_G is the maximum likelihood estimate of θ. The likelihood term prefers the fully-connected graph; the regularizer penalizes edges.

43 BIC approximation, a decomposable score. BIC: Score_BIC(G : D) = log P(D | θ̂_G, G) - (log m / 2) Dim(G), which decomposes over families. Using the information-theoretic formulation: Score_BIC(G : D) = m Σ_i [ I(X_i, Pa_{X_i}) - H(X_i) ] - (log m / 2) Dim(G).
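A self-contained sketch of the decomposable BIC score for binary variables (not from the slides; the structure representation and names are illustrative), with one family score per node:

    import numpy as np
    from itertools import product

    def fam_bic(data, child, parents, m):
        """Family BIC: max-likelihood term minus (log m / 2) * (#parameters of this CPT)."""
        ll = 0.0
        for j in product([0, 1], repeat=len(parents)):
            mask = np.all(data[:, parents] == j, axis=1) if parents else np.ones(m, bool)
            n_j = mask.sum()
            for k in (0, 1):
                n_jk = (mask & (data[:, child] == k)).sum()
                if n_jk:
                    ll += n_jk * np.log(n_jk / n_j)
        return ll - 0.5 * np.log(m) * 2 ** len(parents)   # one free parameter per parent config

    def bic(data, structure):
        m = len(data)
        return sum(fam_bic(data, c, ps, m) for c, ps in structure.items())

    rng = np.random.default_rng(0)
    x0 = rng.integers(0, 2, 400)
    x1 = np.where(rng.random(400) < 0.85, x0, 1 - x0)     # x1 really depends on x0
    data = np.column_stack([x0, x1])
    print(bic(data, {0: [], 1: [0]}), bic(data, {0: [], 1: []}))   # the edge should win here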

44 Consistency of the BIC and Bayesian scores. Consistency is limiting behavior and says nothing about finite sample sizes!!! A scoring function is consistent if, for the true model G*, as m → ∞, with probability 1: G* maximizes the score, and all structures not I-equivalent to G* have strictly lower score. Theorem: the BIC score (with a Dirichlet prior) is consistent. Corollary: the Bayesian score is consistent. What about the likelihood score? NO! True, the likelihood of the optimal structure is maximal, but the fully-connected graph (which is NOT I-equivalent to G*) also maximizes the score!

45 Priors for General Graphs. For finite datasets, the prior is important! Use a prior over structures satisfying prior modularity. What about the prior over parameters, how do we represent it? K2 prior: fix an α and set P(θ_{X_i | Pa_{X_i}}) = Dirichlet(α, ..., α). K2 is inconsistent: I-equivalent structures can be treated differently, as the next slides show.

46 Priors for Parameters. G_1: X → Y with θ_X ~ Beta(1,1), θ_{Y|+x} ~ Beta(1,1), θ_{Y|-x} ~ Beta(1,1). Does this make sense? EffectiveSampleSize(θ_{Y|+x}) = 2, but the prior on θ_X amounts to only 1 example with +x?? G_2: Y → X (an I-equivalent structure) with θ_Y ~ Beta(1,1), θ_{X|+y} ~ Beta(1,1), θ_{X|-y} ~ Beta(1,1). What happens after observing [+x, -y]? It should be the same in both!!

47 Priors for Parameters, after observing [+x, -y]. G_1 (X → Y): θ_X: Beta(1,1) → Beta(2,1); θ_{Y|+x}: Beta(1,1) → Beta(1,2); θ_{Y|-x}: Beta(1,1) unchanged. Predictive P_1(+x) = 2/3. G_2 (Y → X): θ_Y: Beta(1,1) → Beta(1,2); θ_{X|+y}: Beta(1,1) unchanged; θ_{X|-y}: Beta(1,1) → Beta(2,1). Predictive P_2(+x) = P_2(+x,+y) + P_2(+x,-y) = 1/3 · 1/2 + 2/3 · 2/3 = 11/18 !!! The two I-equivalent structures now disagree.

48 BDe Priors. G_1: X → Y with θ_X ~ Beta(2,2), θ_{Y|+x} ~ Beta(1,1), θ_{Y|-x} ~ Beta(1,1). This makes more sense: EffectiveSampleSize(θ_{Y|+x}) = 2, and now the prior on θ_X corresponds to 2 examples with +x. G_2: Y → X (an I-equivalent structure) with θ_Y ~ Beta(2,2), θ_{X|+y} ~ Beta(1,1), θ_{X|-y} ~ Beta(1,1). Now what happens after [+x, -y]?

49 BDe Priors, after observing [+x, -y]. G_1 (X → Y): θ_X: Beta(2,2) → Beta(3,2); θ_{Y|+x}: Beta(1,1) → Beta(1,2); θ_{Y|-x}: Beta(1,1) unchanged. Predictive P_1(+x) = 3/5. G_2 (Y → X): θ_Y: Beta(2,2) → Beta(2,3); θ_{X|+y}: Beta(1,1) unchanged; θ_{X|-y}: Beta(1,1) → Beta(2,1). Predictive P_2(+x) = P_2(+x,+y) + P_2(+x,-y) = 2/5 · 1/2 + 3/5 · 2/3 = 3/5 !!! The two I-equivalent structures now agree.
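The numbers on the last four slides can be checked mechanically. A small sketch with exact fractions (variable names are illustrative; not from the slides), computing the posterior predictive P(+x) under G_1 (X → Y) and G_2 (Y → X) after the single observation [+x, -y], first with the K2-style Beta(1,1) priors and then with the BDe priors above:

    from fractions import Fraction as F

    def px_g1(a_x, b_x):
        """G1: X -> Y. P(+x) is just the posterior mean of theta_X."""
        return F(a_x, a_x + b_x)

    def px_g2(a_y, b_y, a_xpy, b_xpy, a_xny, b_xny):
        """G2: Y -> X. P(+x) = P(+y) P(+x|+y) + P(-y) P(+x|-y)."""
        py = F(a_y, a_y + b_y)
        return py * F(a_xpy, a_xpy + b_xpy) + (1 - py) * F(a_xny, a_xny + b_xny)

    # K2 prior Beta(1,1) everywhere; posteriors after observing [+x, -y]:
    print(px_g1(2, 1))                    # 2/3
    print(px_g2(1, 2, 1, 1, 2, 1))        # 11/18  -> I-equivalent structures disagree

    # BDe prior (equivalent sample size 4, uniform P'(X,Y)); posteriors after [+x, -y]:
    print(px_g1(3, 2))                    # 3/5
    print(px_g2(2, 3, 1, 1, 2, 1))        # 3/5    -> I-equivalent structures agree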

50 BDe Prior. View the Dirichlet parameters as fictitious samples, an equivalent sample size. Pick a fictitious sample size m'. For each possible family, define a prior distribution P'(X_i, Pa_{X_i}) and set the Dirichlet hyperparameters to m' · P'(x_i, pa_{X_i}). Represent P' with a BN, usually with independent variables (a product of marginals).

51 Score Equivalence. Score equivalence: if G and G' are I-equivalent, then they have the same score. Theorem 1: the maximum likelihood score and the BIC score satisfy score equivalence. Theorem 2: if P(G) assigns the same prior to I-equivalent structures (e.g., by edge counting) and each parameter prior is Dirichlet, then the Bayesian score satisfies score equivalence if and only if the prior over parameters is represented as a BDe prior!

52 Chow-Liu for the Bayesian score. Edge weight w_{X_j → X_i} is the advantage of adding X_j as a parent of X_i, i.e., FamScore(X_i | X_j : D) - FamScore(X_i : D). We now have a directed graph and need a directed spanning forest. Note that adding an edge can hurt the Bayesian score, so choose a forest, not a tree. But if the score satisfies score equivalence, then w_{X_j → X_i} = w_{X_i → X_j}, and a simple maximum spanning forest algorithm works.

53 Structure learning for General Graphs. In a tree, a node has only one parent. Theorem: the problem of learning a BN structure with at most d parents is NP-hard for any (fixed) d ≥ 2. Most structure-learning approaches therefore use heuristics that exploit score decomposition. Next: (quickly) describe two heuristics that exploit decomposition in different ways.

54 Understanding Score Decomposition (figure: the student network, with nodes Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Happy, Job).

55 Fixed variable order 1. Pick a variable order, e.g., X_1, ..., X_n. X_i can only pick parents in {X_1, ..., X_{i-1}} (any subset), so acyclicity is guaranteed! Total score = sum of the score of each node.

56 Fixed variable order 2. Fix the max number of parents to k. For each i in the order, pick Pa_{X_i} ⊆ {X_1, ..., X_{i-1}} by exhaustively searching through all possible subsets: Pa_{X_i} is the U ⊆ {X_1, ..., X_{i-1}} that maximizes FamScore(X_i | U : D). This gives the optimal BN for each order!!! Then do greedy search through the space of orders, e.g., by trying to switch pairs of variables in the order (example: F A S H N → F S A H N). If neighboring variables in the order are switched, only the scores for that pair need to be recomputed: an O(n) speed-up per iteration.
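A sketch of the inner optimization for one fixed order (not from the slides; binary variables are assumed, and the BIC-style family score repeats the helper used earlier): for each variable, exhaustively try all parent subsets of size at most k drawn from its predecessors and keep the best:

    import numpy as np
    from itertools import combinations, product

    def fam_score(data, child, parents):
        """Max-likelihood family score with a BIC penalty (binary variables assumed)."""
        m, ll = len(data), 0.0
        for j in product([0, 1], repeat=len(parents)):
            mask = np.all(data[:, list(parents)] == j, axis=1) if parents else np.ones(m, bool)
            n_j = mask.sum()
            for v in (0, 1):
                n_jv = (mask & (data[:, child] == v)).sum()
                if n_jv:
                    ll += n_jv * np.log(n_jv / n_j)
        return ll - 0.5 * np.log(m) * 2 ** len(parents)

    def best_bn_for_order(data, order, k=2):
        """Optimal parent sets given a variable order and at most k parents per node."""
        structure = {}
        for pos, x in enumerate(order):
            preds = order[:pos]                                  # acyclicity: parents precede x
            subsets = [ps for r in range(min(k, pos) + 1) for ps in combinations(preds, r)]
            structure[x] = max(subsets, key=lambda ps: fam_score(data, x, ps))
        return structure

    rng = np.random.default_rng(0)
    x0 = rng.integers(0, 2, 500)
    x1 = np.where(rng.random(500) < 0.9, x0, 1 - x0)
    x2 = np.where(rng.random(500) < 0.9, x1, 1 - x1)
    print(best_bn_for_order(np.column_stack([x0, x1, x2]), [0, 1, 2]))   # expect {0: (), 1: (0,), 2: (1,)}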

57 Learn BN structure using local search. Start from the Chow-Liu tree. Local search moves (allowed only if the result stays acyclic!!!): add an edge, delete an edge, invert an edge. Select moves using your favorite score, computed locally (and therefore efficiently) thanks to score decomposition (FamScore).

58 Exploit score decomposition in local search (figure: the student network again). Add an edge: re-score only one family! Delete an edge: re-score only one family! Reverse an edge: re-score only two families.
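A compact greedy hill-climbing sketch over add/delete moves (not from the slides; edge reversal is omitted and the search starts from the empty graph rather than the Chow-Liu tree to keep it short, and fam_score is the helper from the previous snippet). Decomposability is what makes each move cheap: changing the parents of X_v only requires re-scoring X_v's family:

    import numpy as np
    from itertools import product

    def fam_score(data, child, parents):
        m, ll = len(data), 0.0
        for j in product([0, 1], repeat=len(parents)):
            mask = np.all(data[:, list(parents)] == j, axis=1) if parents else np.ones(m, bool)
            n_j = mask.sum()
            for v in (0, 1):
                n_jv = (mask & (data[:, child] == v)).sum()
                if n_jv:
                    ll += n_jv * np.log(n_jv / n_j)
        return ll - 0.5 * np.log(m) * 2 ** len(parents)

    def reaches(parents, src, dst):
        """True if dst is reachable from src along parent->child edges (used for acyclicity)."""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(c for c, ps in parents.items() if node in ps)
        return False

    def hill_climb(data, n):
        parents = {i: set() for i in range(n)}
        fam = {i: fam_score(data, i, ()) for i in range(n)}
        while True:
            best_delta, best_move = 0.0, None
            for u in range(n):
                for v in range(n):
                    if u == v:
                        continue
                    if u in parents[v]:                       # candidate: delete edge u -> v
                        new_ps = parents[v] - {u}
                    elif not reaches(parents, v, u):          # candidate: add edge u -> v (stays acyclic)
                        new_ps = parents[v] | {u}
                    else:
                        continue
                    delta = fam_score(data, v, tuple(sorted(new_ps))) - fam[v]   # only v's family changes
                    if delta > best_delta:
                        best_delta, best_move = delta, (v, new_ps)
            if best_move is None:
                return parents
            v, new_ps = best_move
            parents[v] = new_ps
            fam[v] = fam_score(data, v, tuple(sorted(new_ps)))

    rng = np.random.default_rng(1)
    a = rng.integers(0, 2, 600)
    b = np.where(rng.random(600) < 0.9, a, 1 - a)
    print(hill_climb(np.column_stack([a, b]), 2))             # expect a single edge between 0 and 1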

59 Some experiments (figure: results on the Alarm network).

60 Order search versus graph search. Order-search advantages: for a fixed order we get the optimal BN, a more global optimization; and the space of orders (n!) is much smaller than the space of graphs (Ω(2^{n²})). Graph-search advantages: not restricted to k parents, especially when exploiting CPD structure such as CSI; cheaper per iteration; finer moves within a graph.

61 Bayesian Model Averaging. So far we have selected a single structure, but if you are really Bayesian you must average over structures, similar to averaging over parameters: P(G | D) gives a probability for each graph. Inference for structure averaging is very hard!!! There are clever tricks in the KF text.

62 Summary wrt Learning BN Structure. Decomposable scores: data likelihood (with its information-theoretic interpretation), the Bayesian score and its BIC approximation; priors (structure and parameter assumptions; BDe if and only if score equivalence). Best tree (Chow-Liu), best TAN, nearly-best k-treewidth (in O(n^{k+1})). Search techniques: search through orders, search through structures. Bayesian model averaging.
