Machine Learning. Bayesian Learning. Michael M. Richter


1 Machine Learning Bayesian Learning

2 Topic This is concept learning the probabilistic way: everything that is stated is derived in an exact way, but it is not always true. Instead, the learned concept is equipped with a probability of being correct.

3 History Bayesian Decision Theory came long before Version Spaces, Decision Tree Learning and Neural Networks. It was studied in the field of Statistical Theory and, more specifically, in the field of Pattern Recognition. Bayesian Decision Theory is the basis of important learning schemes such as the Naïve Bayes Classifier, learning Bayesian Belief Networks and the EM Algorithm. Bayesian Decision Theory is also useful because it provides a framework within which many non-Bayesian classifiers can be studied (see [Mitchell, Sections 6.3-6.6]).

4 Why Bayesian Classification? Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems. Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct, and prior knowledge can be combined with observed data. Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities. Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured.

5 Maximum Likelihood Suppose a number of hypotheses are generated and for each one the probability of being the right one is calculated. The maximum likelihood principle says one should choose the hypothesis h with the highest probability. P(h | D) is the a posteriori probability of h (after seeing the data D), P(h) is the a priori probability of h, and P(D | h) is the likelihood of D under h.

6 Part 1 The Naïve Bayesian Approach

7 Basic Formulas for Probabilities Product rule: the probability P(A, B) of a conjunction of two events A and B is P(A, B) = P(A | B) P(B) = P(B | A) P(A). Sum rule: the probability of a disjunction of two events A and B is P(A or B) = P(A) + P(B) - P(A, B). Theorem of total probability: if the events A1, ..., An are mutually exclusive with Σ_i P(Ai) = 1, then P(B) = Σ_{i=1..n} P(B | Ai) P(Ai).
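
To make the three rules concrete, here is a small numeric check in Python; the joint distribution over the two binary events A and B is invented purely for illustration.

```python
# Toy joint distribution P[(a, b)] = P(A=a, B=b); the numbers are arbitrary.
P = {
    (True, True): 0.12, (True, False): 0.18,
    (False, True): 0.28, (False, False): 0.42,
}

def marginal_A(a): return sum(p for (x, _), p in P.items() if x == a)
def marginal_B(b): return sum(p for (_, y), p in P.items() if y == b)
def cond_B_given_A(b, a): return P[(a, b)] / marginal_A(a)

# Product rule: P(A, B) = P(B | A) P(A)
assert abs(P[(True, True)] - cond_B_given_A(True, True) * marginal_A(True)) < 1e-9

# Sum rule: P(A or B) = P(A) + P(B) - P(A, B)
p_or = 1.0 - P[(False, False)]
assert abs(p_or - (marginal_A(True) + marginal_B(True) - P[(True, True)])) < 1e-9

# Total probability: P(B) = sum over the partition {A, ~A} of P(B | Ai) P(Ai)
total = sum(cond_B_given_A(True, a) * marginal_A(a) for a in (True, False))
assert abs(total - marginal_B(True)) < 1e-9
print("all three rules check out")
```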

8 A Basic Learning Scenario (1) Event Y = y: observed example event. Event Z = z: correctness of hypothesis z. D: data. Bayes rule: P(Z = z | Y = y) = P(Z = z) P(Y = y | Z = z) / P(Y = y). For hypotheses: P(h | D) = P(h) P(D | h) / P(D), where P(h | D) is the probability that h is a correct hypothesis for data D, P(h) is the probability that h is a correct hypothesis, P(D | h) is the probability of observing D if h is correct, and P(D) is the probability of observing D.

9 A Basic Learning Scenario (2) Notation: P(h) is the a-priori probability of h, P(D | h) is the likelihood of D under h, P(h | D) is the a-posteriori probability of h given D. The basic theorem (Bayes rule): P(h | D) = P(D | h) P(h) / P(D). This theorem makes applications possible because it reduces the unknown conditional probability to ones that are known a priori.

10 A Basic Learning Scenario (3) The learner has hypotheses h1, ..., hk and uses observed data D. Wanted: some h ∈ {h1, ..., hk} for which P(h | D) is maximal (the maximum-a-posteriori hypothesis). A posteriori means: after seeing the data. Background knowledge: the a-priori probability P(h) of h. A priori means: before seeing the data.

11 Bayesian Classification and Decision (1) The Bayes decision rule selects the class with minimum conditional risk. In the case of minimum-error-rate classification, the rule selects the class with the maximum posterior probability. Suppose there are k classes c1, c2, ..., ck. Given a feature vector x, the minimum-error-rate rule assigns it to the class cj if P(cj | x) > P(ci | x) for all i ≠ j.

12 Bayesian Classification and Decision (2) An equivalent but more useful criterion for minimum-error-rate classification is: choose class cj so that P(x | cj) P(cj) > P(x | ci) P(ci) for all i ≠ j. This relies on Bayes theorem. Note: no method can exist that finds the correct hypothesis with higher probability. But: that can change if one has additional knowledge.

13 Example Assume: (1) A lab test D for a form of cancer has a 98% chance of giving a positive result if the cancer is present, and a 97% chance of giving a negative result if the cancer is absent. (2) 0.8% of the population has this cancer: P(cancer) = 0.008 and P(~cancer) = 0.992. What is the probability that the cancer is present given a positive result? P(cancer | D) = P(D | cancer) P(cancer) / P(D) = 0.98 * 0.008 / (0.98 * 0.008 + 0.03 * 0.992) ≈ 0.21
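
The same calculation can be reproduced in a few lines of Python; the helper name posterior_positive is ours and simply packages Bayes rule with the total-probability denominator.

```python
# Posterior P(cancer | positive test) via Bayes rule.
# sensitivity = P(positive | cancer), specificity = P(negative | ~cancer).
def posterior_positive(prior, sensitivity, specificity):
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)  # P(D), total probability
    return sensitivity * prior / p_pos                             # P(cancer | D)

print(round(posterior_positive(prior=0.008, sensitivity=0.98, specificity=0.97), 2))
# -> 0.21: despite the positive test the cancer is still unlikely,
#    because the prior P(cancer) = 0.008 is so small.
```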

14 MAP and ML Given some data D and a hypothesis space H, what is the most probable hypothesis h ∈ H, i.e., for which h is P(h | D) maximal? This hypothesis is called the maximum a posteriori hypothesis h_MAP: h_MAP = argmax_{h ∈ H} P(h | D) = argmax_{h ∈ H} P(D | h) P(h). Again, h_MAP is optimal in the sense that no method can exist that finds the correct hypothesis with higher probability. If P(h) = P(h') for all h, h' ∈ H, then this reduces to the maximum likelihood principle: h_ML = argmax_{h ∈ H} P(D | h).
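
As a sketch of the difference between the two criteria, the snippet below compares the MAP and ML choices over a tiny discrete hypothesis space; the priors and likelihoods are invented numbers, not taken from any real problem.

```python
# MAP vs. ML over three hypotheses; all probabilities are made up for illustration.
priors      = {"h1": 0.70, "h2": 0.25, "h3": 0.05}   # P(h)
likelihoods = {"h1": 0.01, "h2": 0.10, "h3": 0.30}   # P(D | h) for the observed data D

h_ml  = max(likelihoods, key=likelihoods.get)                  # argmax_h P(D | h)
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])  # argmax_h P(D | h) P(h)

print(h_ml, h_map)  # -> h3 h2: they differ because the priors are not uniform
```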

15 The Gibbs Classifier The Bayes optimal classifier is optimal but expensive; it uses all hypotheses in H. Non-optimal but much more efficient is the Gibbs classifier. Algorithm: Given: a sample S = {x1, ..., xm} ⊆ D of data, a hypothesis space H with a probability distribution P (the posterior P(h | D)), and some x to be classified. Method: 1. Select h ∈ H randomly according to P (that is similar to GA!). 2. Output: h(x). Surprisingly: E(error_Gibbs) ≤ 2 E(error_BayesOptimal).
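
A minimal sketch of the Gibbs classifier, assuming a toy hypothesis space of threshold functions on a one-dimensional input and an already computed posterior P(h | D); all numbers are invented for illustration.

```python
import random

hypotheses = [lambda x, t=t: int(x > t) for t in (0.2, 0.5, 0.8)]  # toy hypothesis space
posterior  = [0.2, 0.5, 0.3]                                       # assumed P(h | D), sums to 1

def gibbs_classify(x):
    h = random.choices(hypotheses, weights=posterior, k=1)[0]  # step 1: sample h ~ P(h | D)
    return h(x)                                                # step 2: output h(x)

print([gibbs_classify(0.6) for _ in range(5)])  # mostly 1, occasionally 0
```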

16 The Naïve Bayesian Algorithm (1) Learning scenario: examples x1, ..., xm with xi = (ai1, ..., ain) for attributes A1, ..., An; hypotheses H = {h1, ..., hk} for the classes; the class of x is C(x). Two ways to proceed: 1) use Bayes optimal classification, or 2) do not access H for classification. Method 2) avoids having to survey all hypotheses in H, which is often very difficult and impractical.

17 Estimation of Probabilities from Samples Setting: N boolean attributes X1, ..., XN and a class C with two values, -1 and +1. How do we estimate P(C)? E.g. by simple binomial estimation: count the number of instances with C = -1 and with C = +1. How do we estimate P(X1, ..., XN | C)? Count the instances for P(X1, ..., XN | C = +1) and for P(X1, ..., XN | C = -1). These are very complex tasks!

18 Conditional Independence Conditional independence is supposed to simplify the estimation task. Def.: (i) Y is independent of Z if for all y ∈ Y, z ∈ Z: P(Y = y, Z = z) = P(Y = y) P(Z = z). (ii) X is conditionally independent of Y given Z if P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z). Another formulation: P(X = x | Y = y, Z = z) = P(X = x | Z = z). This reduces the complexity for n variables from O(2^n) in the product space to O(n)!

19 The Naïve Bayesian Algorithm (2) Given x = (a1, ..., an), the (conditional) independence assumption says: P(a1, ..., an | h) = P(a1 | h) P(a2 | h) ... P(an | h). The assumption is called naive. It reduces the parameter estimation from the product space (which is O(2^n)) to the sum of the attribute spaces (which is O(n)). However, it is not always satisfied (e.g. thunder is not independent of rain). The goal is now to avoid needing knowledge of P(h) for all h ∈ H.

20 The Naïve Bayesian Algorithm (3) Therefore we proceed as follows: take h_MAP = h ∈ {h1, ..., hk} for which P(C(x) = h | x = (a1, ..., an)) is maximal. Equivalently: P(x = (a1, ..., an) | C(x) = h) P(C(x) = h) is maximal. The probabilities on the right side are estimated from a given set S of examples. Without the independence assumption this would be impractical because S would need to be too large.
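
The following is a minimal sketch of this procedure for discrete attributes: P(h) and the conditionals P(ai | h) are estimated by counting over a toy training set, and classification picks the class that maximizes P(h) times the product of the P(ai | h). No smoothing is applied, and the function names and data are ours, for illustration only.

```python
from collections import Counter, defaultdict

def train(examples):                      # examples: list of (attribute_tuple, clazz)
    class_counts = Counter(c for _, c in examples)
    attr_counts = defaultdict(Counter)    # attr_counts[(c, i)][value] = count
    for attrs, c in examples:
        for i, a in enumerate(attrs):
            attr_counts[(c, i)][a] += 1
    prior = {c: class_counts[c] / len(examples) for c in class_counts}
    def p_attr(a, i, c):                  # estimate of P(a_i = a | class = c)
        return attr_counts[(c, i)][a] / class_counts[c]
    return prior, p_attr

def classify(x, prior, p_attr):
    def score(c):                         # P(c) * prod_i P(a_i | c)
        s = prior[c]
        for i, a in enumerate(x):
            s *= p_attr(a, i, c)
        return s
    return max(prior, key=score)

data = [(("sunny", "hot"), "no"), (("sunny", "mild"), "yes"),
        (("rainy", "mild"), "yes"), (("rainy", "hot"), "no")]
prior, p_attr = train(data)
print(classify(("sunny", "mild"), prior, p_attr))  # -> 'yes'
```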

21 Example A naive Bayes classifier adopts the assumption of conditional independence. Given: P(pneumonia) = 0.01, P(flu) = 0.05, P(cough | pneumonia) = 0.9, P(fever | pneumonia) = 0.9, P(chest-pain | pneumonia) = 0.8, P(cough | flu) = 0.5, P(fever | flu) = 0.9, P(chest-pain | flu) = 0.1. Suppose a patient had cough and fever, but no chest pain. What is the probability ratio between pneumonia and flu? What is the best diagnosis? Solution: the ratio is (0.01 * 0.9 * 0.9 * (1 - 0.8)) / (0.05 * 0.5 * 0.9 * (1 - 0.1)) = 0.08, so flu is at least ten times more likely than pneumonia.
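
The arithmetic of the example can be checked directly; "no chest pain" contributes the factor (1 - P(chest-pain | class)).

```python
pneumonia = 0.01 * 0.9 * 0.9 * (1 - 0.8)  # P(pneumonia) * P(cough|p) * P(fever|p) * P(~pain|p)
flu       = 0.05 * 0.5 * 0.9 * (1 - 0.1)  # P(flu) * P(cough|f) * P(fever|f) * P(~pain|f)
print(round(pneumonia / flu, 2))          # -> 0.08, so flu is the more likely diagnosis
```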

22 Discussion (1) Advantages: Tends to work well despite the strong assumption of conditional independence. Experiments show it to be quite competitive with other classification methods on standard UCI datasets. Although it does not produce accurate probability estimates when its independence assumptions are violated, it may still pick the correct maximum-probability class in many cases. Able to learn conjunctive concepts in any case.

23 Discussion (2) Disadvantages: Does not perform any search of the hypothesis space; it directly constructs a hypothesis from parameter estimates that are easily calculated from the training data. Strong bias. Does not guarantee consistency with the training data. On the other hand, it typically handles noise well, since it does not even focus on completely fitting the training data.

24 Part 2 Belief Networks

25 Bayesian Belief Networks (1) Discussing the independence assumption: Positive: it makes computation feasible. Negative: it is often not satisfied. Reason: there are causal or influential relations between the attributes. Such relations are background knowledge. Idea: make them visible in a graph. Conditional independence then holds only between subsets of variables. Belief networks combine both.

26 Bayesian Belief Networks (2) A Bayesian belief net (BBN) is a directed graph, together with an associated set of probability tables. The nodes represent variables, which can be discrete or continuous. The edges represent causal/influential relationships between variables. Nodes not connected by edges are independent.

27 Causality (1) Although Bayesian networks are often used to represent causal relationships, this need not be the case: a directed edge from u to v does not require that Xv is causally dependent on Xu. Example: the graphs A → B → C and C → B → A are equivalent, that is, they impose exactly the same conditional independence requirements.

28 Causality (2) A causal network is a Bayesian network with an explicit requirement that the relationships be causal. The additional semantics of causal networks specify that if a node X is actively caused to be in a given state x (an action written as do(X = x)), then the probability density function changes to that of the network obtained by cutting the links from X's parents to X and setting X to the caused value x. Using these semantics, one can predict the impact of external interventions from data obtained prior to the intervention.

29 Influence Diagrams The network can represent influence diagrams. Such diagrams are used to represent decision models. Therefore they are a method to support decision making.

30 Example (1) A network with the nodes Temperature, Cloudiness, Winds, Rain and Umbrella. Temperature: cold, mild, hot. Cloudiness: none, partial, covered. Winds: no, mild, strong. Each node carries a conditional probability table.

31 Example (2) A network with the nodes Storm, BusTourGroup, Lightning, Campfire, Thunder and ForestFire. Associated with each node is a conditional probability table, which specifies the conditional distribution for the variable given its immediate parents in the graph. Each node is asserted to be conditionally independent of its non-descendants, given its immediate parents.
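
The factorization implied by these assumptions can be written down directly: the joint probability is the product over nodes of P(node | parents). A tiny sketch for the chain Storm → Lightning → Thunder from this network; the conditional probability tables are invented, not taken from the slide.

```python
# Each CPT maps (node_value, parent_value) -> probability; root nodes map value -> probability.
cpt_storm     = {True: 0.1, False: 0.9}                   # P(Storm)
cpt_lightning = {(True, True): 0.7, (False, True): 0.3,   # P(Lightning | Storm)
                 (True, False): 0.01, (False, False): 0.99}
cpt_thunder   = {(True, True): 0.95, (False, True): 0.05, # P(Thunder | Lightning)
                 (True, False): 0.02, (False, False): 0.98}

def joint(storm, lightning, thunder):
    # Chain rule for this network: P(S, L, T) = P(S) P(L | S) P(T | L)
    return (cpt_storm[storm]
            * cpt_lightning[(lightning, storm)]
            * cpt_thunder[(thunder, lightning)])

print(joint(True, True, True))  # -> 0.1 * 0.7 * 0.95 = 0.0665
```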

32 Inference in Bayesian Networks (1) In general: calculate conditional probabilities along the directed edges. This can be done in a forward or backward mode. Example of the forward mode: suppose we have the edge A → B, then we get P(B) = P(B | A) P(A) + P(B | ~A) P(~A) and P(~B) = P(~B | A) P(A) + P(~B | ~A) P(~A).
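
A one-edge forward pass in code, with invented numbers for P(A) and the conditional table of B:

```python
p_a         = 0.3                       # P(A)
p_b_given_a = {True: 0.9, False: 0.2}   # P(B | A) and P(B | ~A)

p_b = p_b_given_a[True] * p_a + p_b_given_a[False] * (1 - p_a)
print(round(p_b, 2), round(1 - p_b, 2))  # P(B) = 0.41, P(~B) = 0.59
```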

33 Inference in Bayesian Networks (2) Suppose we want to calculate P(A | BE). Using P(A, B) = P(A | B) P(B) we get: P(AB | E) = P(A | E) * P(B | AE) and P(AB | E) = P(B | E) * P(A | BE). Therefore: P(A | BE) = ( P(A | E) * P(B | AE) ) / P(B | E) (another version of Bayes' theorem).

34 Example (1) A network with the nodes Age, Income, House Owner, Living Location, Newspaper Preference and Voting Pattern. How likely are elderly rich people to buy the Sun? P(paper = Sun | Age > 60, Income > 60k)

35 Example (2) The same network: Age, Income, House Owner, Living Location, Newspaper Preference and Voting Pattern. How likely are elderly rich people who voted liberal to buy the Herald? P(paper = Herald | Age > 60, Income > 60k, vote = liberal)

36 Unobserved Variables Bayesian networks can be used to answer probabilistic queries about unobserved variables. They can be used to update knowledge about the state of a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference. A Bayesian network can thus be considered a mechanism for automatically applying Bayes theorem to complex problems.

37 Inference in Bayesian Networks (3) In the network we can chain over several edges. Find the probability of H given that A1, A2, A3 and E have happened: P(H | A1 A2 A3 E) = ( P(H | E) * P(A1 A2 A3 | H E) ) / P(A1 A2 A3 | E), because P(A1 A2 A3 | E) = P(A1 | A2 A3 E) * P(A2 A3 | E) = P(A1 | A2 A3 E) * P(A2 | A3 E) * P(A3 | E). With independence this simplifies, e.g. we get: P(H | A1 A2 E) = ( P(H | E) * P(A1 | H E) * P(A2 | H E) ) / ( P(A1 | E) * P(A2 | E) ).

38 Recalculation (1) Consider the net A → C ← B. Given probabilities: P(A) = 0.1, P(~A) = 0.9; P(B) = 0.4, P(~B) = 0.6. Conditional probability table for C: P(C | AB) = 0.8, P(C | A~B) = 0.6, P(C | ~AB) = 0.5, P(C | ~A~B) = 0.5, and correspondingly P(~C | AB) = 0.2, P(~C | A~B) = 0.4, P(~C | ~AB) = 0.5, P(~C | ~A~B) = 0.5.

39 Recalculation (2) Calculation of the probability of C: P(C) = P(CAB) + P(C~AB) + P(CA~B) + P(C~A~B) = P(C | AB) * P(AB) + P(C | ~AB) * P(~AB) + P(C | A~B) * P(A~B) + P(C | ~A~B) * P(~A~B) = P(C | AB) * P(A) * P(B) + P(C | ~AB) * P(~A) * P(B) + P(C | A~B) * P(A) * P(~B) + P(C | ~A~B) * P(~A) * P(~B) = 0.518. Recalculation of P(A) and P(B) if we know that C is true, using Bayes rule: P(B | C) = ( P(C | B) * P(B) ) / P(C) = ( ( P(C | AB) * P(A) + P(C | ~AB) * P(~A) ) * P(B) ) / P(C) = ( (0.8 * 0.1 + 0.5 * 0.9) * 0.4 ) / 0.518 = 0.409. P(A | C) = ( P(C | A) * P(A) ) / P(C) = ( ( P(C | AB) * P(B) + P(C | A~B) * P(~B) ) * P(A) ) / P(C) = ( (0.8 * 0.4 + 0.6 * 0.6) * 0.1 ) / 0.518 = 0.131.
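
The recalculation can be verified with a short script over the tables from the previous slide:

```python
# Net A -> C <- B with the conditional probability table of C indexed by (a, b).
p_a, p_b = 0.1, 0.4
p_c = {(True, True): 0.8, (True, False): 0.6, (False, True): 0.5, (False, False): 0.5}

def p(x, p_true):                         # P(X = x) for a binary root node
    return p_true if x else 1 - p_true

p_c_total = sum(p_c[(a, b)] * p(a, p_a) * p(b, p_b)
                for a in (True, False) for b in (True, False))
p_b_given_c = (p_c[(True, True)] * p_a + p_c[(False, True)] * (1 - p_a)) * p_b / p_c_total
p_a_given_c = (p_c[(True, True)] * p_b + p_c[(True, False)] * (1 - p_b)) * p_a / p_c_total

print(round(p_c_total, 3), round(p_b_given_c, 3), round(p_a_given_c, 3))  # -> 0.518 0.409 0.131
```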

40 Complete and Incomplete Information 1. The network structure is given in advance and all the variables are fully observable in the training examples. ==> Trivial Case: just estimate the conditional probabilities. 2. The network structure is given in advance but only some of the variables are observable in the training data. ==> Similar to learning the weights for the hidden units of a Neural Net: Gradient Ascent Procedure 3. The network structure is not known in advance. ==> Use a heuristic search or constraint-based technique to search through potential structures.

41 Parameter Learning In order to fully specify the Bayesian network and thus fully represent the joint probability distribution, it is necessary to specify for each node X the probability distribution of X conditional upon X's parents. The distribution of X conditional upon its parents may have any form.

42 Expectation Maximization: Unobservable Relevant Variables Example: assume that data points have been uniformly generated from k distinct Gaussians with the same known variance. Problem: find a hypothesis h = <μ1, μ2, ..., μk> that describes the means of the k distributions. In particular, we are looking for a maximum likelihood hypothesis for these means. We extend the problem description as follows: for each point xi there are k hidden variables zi1, ..., zik such that zil = 1 if xi was generated by the l-th normal distribution and ziq = 0 for all q ≠ l.

43 EM Algorithm Initially an arbitrary hypothesis h = <μ1, μ2, ..., μk> is chosen. The EM algorithm then iterates two steps: Step 1 (Estimation, E): calculate the expected value E[zij] of each hidden variable zij, assuming that the current hypothesis h = <μ1, μ2, ..., μk> holds. Step 2 (Maximization, M): calculate a new maximum likelihood hypothesis h' = <μ1', μ2', ..., μk'>, assuming the value taken on by each hidden variable zij is its expected value E[zij] calculated in step 1. Then replace the hypothesis h by the new hypothesis h' and iterate.
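
Below is a compact sketch of this EM loop for the setting of the previous slide (one-dimensional Gaussians with equal known variance and uniform mixing weights); the function name and the synthetic data are ours.

```python
import math, random

def em_means(xs, k, sigma=1.0, iters=50):
    mus = random.sample(xs, k)                 # arbitrary initial hypothesis <mu_1, ..., mu_k>
    for _ in range(iters):
        # E step: E[z_il] is proportional to the density of x_i under N(mu_l, sigma^2)
        resp = []
        for x in xs:
            w = [math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) for mu in mus]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M step: new maximum-likelihood means, weighting each x_i by E[z_il]
        mus = [sum(r[l] * x for r, x in zip(resp, xs)) / sum(r[l] for r in resp)
               for l in range(k)]
    return mus

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + \
       [random.gauss(5.0, 1.0) for _ in range(200)]
print([round(m, 2) for m in sorted(em_means(data, k=2))])  # roughly [0.0, 5.0]
```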

44 Problems and Limitations (1) A computational problem is exploring a previously unknown network. To calculate the probability of any branch of the network, all branches must be calculated. This process of network discovery is an NP-hard task which might either be too costly to perform, or impossible given the number and combination of variables.

45 Problems and Limitations (2) The network relies on the quality and coverage of the prior beliefs (which are knowledge!) used in the inference process. The network is only as useful as this background knowledge is reliable. Either an overly optimistic or an overly pessimistic expectation of the quality of these prior beliefs will invalidate the results. Related to this is the selection of the statistical distribution used in modeling the data. Selecting the proper distribution model to describe the data has a notable effect on the quality of the resulting network.

46 Dependency Networks Dependency networks are a generalization of and an alternative to Bayesian networks. A dependency network also has a graph and a probability component, but the graph can be cyclic. The probability component is as in a Bayesian network.

47 Loops If belief propagation (BP) is used on graphs with loops, messages may circulate indefinitely. Empirically, a good approximation is still achievable: stop after a fixed number of iterations, or stop when there is no significant change in the beliefs. If the solution is not oscillatory but converges, it usually is a good approximation.

48 Applications Bayesian learning is a standard method in many application areas, e.g. medicine (classification, prediction), image retrieval and pattern recognition, and quality control for materials. Some competitors are, e.g., support vector machines and clustering methods.

49 Tools Hugin tool: implements the propagation algorithm of Lauritzen and Spiegelhalter. A more modern and powerful BBN tool is AgenaRisk; with this tool it is possible to perform fast propagation in large BBNs (with hundreds of nodes and millions of state combinations). Others: GeNIe, WinMine Toolkit, Weka, Matlab.

50 Summary Bayes theorem; Bayesian decision; maximum a posteriori and maximum likelihood; the naïve Bayesian method and conditional independence; the Gibbs classifier; belief nets, inference in nets and belief revision; estimating unknown parameters: the EM algorithm; limitations.

51 Some References (1) Bernardo, J. M. and Smith, A. F. M. (1994): Bayesian Theory. New York: John Wiley. Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995): Bayesian Data Analysis. London: Chapman & Hall. Witten, Ian H. and Frank, Eibe: Data Mining - Practical Machine Learning Tools with Java Implementations. Morgan Kaufmann. Aha, David W.: Machine Learning tools. home.earthlink.net/~dwaha/research/machinelearning.html

52 Some References (2) Heckerman, David: A Tutorial on Learning with Bayesian Networks. In: Jordan, Michael Irwin (ed.), Learning in Graphical Models, Adaptive Computation and Machine Learning, MIT Press, 1998. Borgelt, Christian and Kruse, Rudolf (March 2002): Graphical Models for Data Analysis and Mining. Chichester. Heckerman, D., Chickering, D. M., Meek, C., Rounthwaite, R. and Kadie, C.: Dependency Networks for Inference, Collaborative Filtering, and Data Visualization. Journal of Machine Learning Research, Vol. 1, 2000.
