CMPT Machine Learning. Bayesian Learning Lecture Scribe for Week 4 Jan 30th & Feb 4th


Stephen Fagan (sfagan@sfu.ca)

Overview:
Introduction
- Who was Bayes?
- Bayesian Statistics Versus Classical Statistics
- Bayesian Learning (in a nutshell)
Prerequisites From Probability Theory
- Basic Terms
- Basic Formulas
Bayes Theorem
- Derivation
- Significance
- Example
Naive Bayes Classifier
Bayesian Belief Networks
- Conditional Independence
- Representation
- Inference
- Causation
- A Brief History of Causation

Introduction: Who was Bayes?

"It is impossible to understand a man's work unless you understand something of his character and unless you understand something of his environment."

[Portrait: Thomas Bayes? (possible, but not probable)]

Reverend Thomas Bayes (1702-1761) was an English theologian and mathematician. Motivated by his religious beliefs, he proposed the well-known argument for the existence of God known as the argument by design. Basically, the argument is: without assuming the existence of God, the operation of the universe is extremely unlikely; therefore, since the operation of the universe is a fact, it is very likely that God exists. To back up this argument, Bayes produced a general mathematical theory which introduced probabilistic inferences (a method for calculating the probability that an event will occur in the future from the frequency with which it has occurred in prior trials). Central to this theory was a theorem, now known as Bayes Theorem (1764), which states that one's evidence confirms the likelihood of an hypothesis only to the degree that the appearance of this evidence would be more probable with the assumption of the hypothesis than without it (see below for its formal statement).

Bayesian Statistics Versus Classical Statistics:

The central difference between Bayesian and Classical statistics is that in Bayesian statistics we assume that we know the probability of any event (before any calculations), while the classical statistician does not. The probabilities that the Bayesian assumes we know are called prior probabilities.

Bayesian Learning (in a nutshell):

As Bayesians, we assume that we have a prior probability distribution for all events. This gives us a quantitative method to weight the evidence that we come across during

learning. Such methods allow us to construct a more detailed ranking of the alternative hypotheses than if we were only concerned with consistency of the hypotheses with the evidence (though as we will see, consistency-based learning is a subclass of Bayesian learning). As a result, Bayesian methods provide practical learning algorithms (though they require prior probabilities - see below) such as Naive Bayes learning and Bayesian belief network learning. In addition to this, Bayesian methods are thought to provide a useful conceptual framework with which we can get a standard for evaluating other learning algorithms.

Prerequisites From Probability Theory

In order to confidently use Bayesian learning methods, we will need to be familiar with a few basic terms and formulas from Probability Theory.

Basic Terms

a. Random Variable
Since our concern is with machine learning, we can think of random variables as being like attributes which can take various values. e.g. sunny, rain, cloudy, snow

b. Domain
This is the set of possible values that a random variable can take. It could be finite or infinite.

c. Probability Distribution
This is a mapping from a domain (see above) to values in [0,1]. When the domain has a finite or countably infinite number of distinct elements, the sum of all of the probabilities given by the probability distribution equals 1. e.g. P = ⟨0.7, 0.2, 0.08, 0.02⟩, so P(sunny) = 0.7, and so on.

d. Event
Each assignment of a domain value to a random variable is called an event. e.g. rain

Basic Formulas

a. Conditional Probability
This formula allows you to calculate the probability of an event A given that event B is assumed to have been obtained. This probability is denoted by P(A|B).

P(A|B) = P(A ∧ B) / P(B)

b. Product Rule
This rule, derived from a, gives the probability of a conjunction of events A and B:
P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)

c. Sum Rule
This rule gives the probability of the disjunction of events A and B:
P(A ∨ B) = P(A) + P(B) - P(A ∧ B)

d. Theorem of Total Probability
If events A_1, ..., A_n are mutually exclusive with Σ_{i=1}^{n} P(A_i) = 1, then
P(B) = Σ_{i=1}^{n} P(B|A_i) P(A_i).

Bayes Theorem

The informal statement of Bayes Theorem is that one's evidence confirms the likelihood of an hypothesis only to the degree that the appearance of this evidence would be more probable with the assumption of the hypothesis than without it. Formally, in the special case for machine learning we get:

P(h|D) = P(D|h) P(h) / P(D)

where:
- D is a set of training data.
- h is a hypothesis.
- P(h|D) is the posterior probability, i.e. the conditional probability of h after the training data (evidence) is presented.
- P(h) is the prior probability of hypothesis h. This non-classical quantity is often found by looking at data from the past (or in the training data).
- P(D) is the prior probability of the training data D. This quantity is often a constant value, P(D) = P(D|h) P(h) + P(D|¬h) P(¬h), which can be computed easily when we insist that P(h|D) and P(¬h|D) sum to 1.
- P(D|h) is the probability of D given h, and is called the likelihood. This quantity is often easy to calculate since we sometimes assign it the value 1 when D and h are consistent, and assign it 0 when they are inconsistent.

It should be noted that Bayes Theorem is completely general and can be applied to any situation where one wants to calculate a conditional probability and one has knowledge of prior probabilities. Its generality is demonstrated through its derivation, which is very simple.

To help our intuitive understanding of Bayes theorem, consider the example where we see some clouds in the sky and we are wondering what the chances of rain are. That is, we

want to know P(rain|clouds). By Bayes Theorem, we know that this is equal to P(clouds|rain) P(rain) / P(clouds). Here are some properties of Bayes theorem which make this formula more intuitive:
- The more likely P(clouds|rain) is, the more likely P(rain|clouds) is.
- If P(clouds|rain) = 0, then P(rain|clouds) = 0. If we take all of the probabilities to be 0 or 1, then we get the propositional calculus.
- Bayes theorem is only usable when P(clouds) > 0. However, there is research about extending Bayes theorem to handle cases like P(clouds) = 0 (e.g. belief revision).
- The more likely P(rain) is, the more likely P(rain|clouds) is.
- If P(clouds) = 1, then P(rain|clouds) = P(rain).
- The more surprising your evidence (the smaller P(clouds) is), the larger its effect (the larger P(rain|clouds) is).

Derivation of Bayes Theorem

The derivation of this famous theorem is quite trivial. It is short and only uses the definition of conditional probability and the commutativity of conjunction:

P(D|h) P(h) / P(D) = [P(D ∧ h) / P(h)] P(h) / P(D) = P(D ∧ h) / P(D) = P(h ∧ D) / P(D) = P(h|D)

Despite this formal simplicity, Bayes Theorem is still considered an important result.

Significance

Bayes Theorem is important for several reasons:
1. Bayesians regard the theorem as a rule for updating beliefs in response to new evidence.
2. The posterior probability, P(h|D), is a quantity that people find hard to assess (they are more used to calculating P(D|h)). The theorem expresses this quantity in terms that are more accessible.
3. It forms the basis for some practical learning algorithms (see below).

The general Bayesian learning strategy is:
1. Start with your prior probabilities, P(H).
2. Use data D to form P(H|D).
3. Adopt the most likely hypothesis given P(H|D).
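
To make the three-step strategy concrete, here is a minimal sketch in Python. The two hypotheses h1 and h2 and all of the numbers are invented for illustration; none of them come from the lecture.

```python
# A minimal sketch of the three-step strategy above, with made-up numbers.
priors = {"h1": 0.3, "h2": 0.7}         # step 1: prior probabilities P(h)
likelihoods = {"h1": 0.9, "h2": 0.2}    # P(D | h) for the observed data D

# Step 2: form P(h | D) via Bayes theorem; P(D) is the normalizing constant.
evidence = sum(likelihoods[h] * priors[h] for h in priors)
posteriors = {h: likelihoods[h] * priors[h] / evidence for h in priors}

# Step 3: adopt the most likely hypothesis given P(h | D).
h_map = max(posteriors, key=posteriors.get)
print(posteriors)   # {'h1': 0.658..., 'h2': 0.341...}
print(h_map)        # 'h1'
```

Note that dividing by the evidence term P(D) does not change which hypothesis wins; this is the observation behind the MAP hypothesis defined next.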

Bayes Theorem is used to choose the hypothesis that has the highest probability of being correct, given some set of training data. We call such an hypothesis a maximum a posteriori (MAP) hypothesis, and denote it by h_MAP:

h_MAP = argmax_{h ∈ H} P(h|D)
      = argmax_{h ∈ H} P(D|h) P(h) / P(D)
      = argmax_{h ∈ H} P(D|h) P(h)

Notice that P(D) is omitted from the denominator in the last line because it is essentially a constant with respect to the class of hypotheses, H. If all of the hypotheses have the same prior probability, (∀ i,j) P(h_i) = P(h_j), then we define the maximum likelihood (ML) hypothesis as:

h_ML = argmax_{h_i ∈ H} P(D|h_i).

Example

P(cancer) = 0.008        P(¬cancer) = 0.992
P(+|cancer) = 0.98       P(-|cancer) = 0.02
P(+|¬cancer) = 0.03      P(-|¬cancer) = 0.97

where + and - represent positive and negative cancer-test results, respectively.

What is the probability that I have cancer given that my cancer-test result is positive? (i.e. what is P(cancer|+)?)

By Bayes Theorem, P(cancer|+) = P(+|cancer) P(cancer) / P(+). We know P(+|cancer) and P(cancer), but we must calculate P(+) as follows:

P(+) = P(+ ∧ cancer) + P(+ ∧ ¬cancer)
     = P(+|cancer) P(cancer) + P(+|¬cancer) P(¬cancer)

So,

P(cancer|+) = P(+|cancer) P(cancer) / P(+)
            = P(+|cancer) P(cancer) / [P(+|cancer) P(cancer) + P(+|¬cancer) P(¬cancer)]
            = (0.98 × 0.008) / (0.98 × 0.008 + 0.03 × 0.992)
            ≈ 0.0078 / (0.0078 + 0.0298)
            ≈ 0.21

We see that P(cancer|+) is still less than 1/2; however, this does not imply any particular action. For different people, actions will vary (even with the same information) depending on their subjective utility judgements. Another thing to notice is that this value, P(cancer|+) ≈ 0.21, is often confused (even by doctors) with P(+|cancer) = 0.98.

Naive Bayes Classifier

The naive Bayes classifier is a highly practical Bayesian learning method that can be used when:
1. the amount of training data is moderate or large (so that the frequency of the events in the data accurately reflects their probability of occurring outside of the training data), and
2. the attribute values that describe instances are independent given the classification (see below). That is, given the target value (i.e. classification), v, of an instance that has attributes a_1, a_2, ..., a_n,
P(a_1 ∧ a_2 ∧ ... ∧ a_n | v) = Π_i P(a_i | v).

Here is what the naive Bayes classifier does:
- Let x be an instance described by a conjunction of attribute values and let f(x) be a target function whose range is some finite set V (representing the classes).
- The learner is provided with a set of training examples of the target function and is then asked to classify (i.e. predict the target value of) a new instance which is described by a tuple of attributes, ⟨a_1, a_2, ..., a_n⟩.
- The learner assigns to the new instance the most probable target value, v_MAP, given the attribute values that describe it, where

v_MAP = argmax_{v_j ∈ V} P(v_j | a_1, a_2, ..., a_n)
      = argmax_{v_j ∈ V} P(a_1, a_2, ..., a_n | v_j) P(v_j) / P(a_1, a_2, ..., a_n)    (by Bayes theorem)
      = argmax_{v_j ∈ V} P(a_1, a_2, ..., a_n | v_j) P(v_j)    (because P(a_1, a_2, ..., a_n) is a constant given the instance)
      = argmax_{v_j ∈ V} P(v_j) Π_i P(a_i | v_j)    (by the assumption of conditional independence of the attributes given the target value)
      = v_NB    (denoting the target value output by the naive Bayes classifier)

To calculate this value, the learner first estimates the P(v_j) values from the training data by simply counting the frequency with which each v_j occurs in the data. The P(a_i|v_j) values are then calculated by counting the frequency with which a_i occurs in the training examples that get the target value v_j. Thus, the importance of having a large training set is due to the fact that it determines the accuracy of these critical estimates. If the number of attribute values is n and the number of distinct target values is k, then the learner only needs to calculate n × k such P(a_i|v_j) values. Computationally, this is very cheap compared with the number of P(a_1, a_2, ..., a_n | v_j) values that would have to be calculated if we did not have the assumption of conditional independence.

Another interesting thing to note about the naive Bayes learning method is that it doesn't perform an explicit search through the hypothesis space. Instead it merely counts the frequency of various data combinations in the training set to calculate probabilities. For examples of the naive Bayes classifier, see section and section 6.10 of the text.
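
The counting procedure just described is simple enough to show in a few lines. Below is a minimal sketch, assuming discrete attributes and a small hypothetical training set; the attribute names and examples are invented for illustration, and a real implementation would also smooth the counts to avoid zero probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (attribute values, target value v).
train = [
    ({"outlook": "sunny",  "wind": "strong"}, "no"),
    ({"outlook": "sunny",  "wind": "weak"},   "no"),
    ({"outlook": "rain",   "wind": "weak"},   "yes"),
    ({"outlook": "rain",   "wind": "strong"}, "no"),
    ({"outlook": "cloudy", "wind": "weak"},   "yes"),
]

# Estimate P(v_j) by counting how often each target value occurs.
target_counts = Counter(v for _, v in train)
priors = {v: c / len(train) for v, c in target_counts.items()}

# Estimate P(a_i | v_j) by counting attribute values within each class.
cond_counts = defaultdict(Counter)
for attrs, v in train:
    for a, val in attrs.items():
        cond_counts[v][(a, val)] += 1

def classify(attrs):
    """Return v_NB = argmax_v P(v) * prod_i P(a_i | v)."""
    scores = {}
    for v in priors:
        p = priors[v]
        for a, val in attrs.items():
            p *= cond_counts[v][(a, val)] / target_counts[v]
        scores[v] = p
    return max(scores, key=scores.get)

print(classify({"outlook": "rain", "wind": "weak"}))  # -> 'yes'
```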

Bayesian Belief Networks

In many cases, the condition of complete conditional independence cannot be met, and so the naive Bayes classifier will not learn successfully. However, as we have seen, removing this condition completely is computationally very expensive: we would have to find a number of conditional probabilities equal to the number of instances times the number of target values (as opposed to merely n × k). Bayesian belief networks (aka Bayes nets, belief nets, probability nets, causal nets) offer us a compromise:

"A Bayesian belief network describes the probability distribution governing a set of variables by specifying a set of conditional independence assumptions along with a set of conditional probabilities." (page 184 of text)

To summarize, Bayes nets provide compact representations of joint probability distributions in systems with a lot of independencies (but some dependencies). In addition to the information contained in the training data, Bayesian nets allow us to incorporate any prior knowledge we have about the dependencies (and independencies) among the variables. Stating conditional independence assumptions that apply only to subsets of the variables is less constraining than the global assumption of conditional independence made by the naive Bayes classifier.

Conditional Independence

Let X, Y, and Z be three discrete-valued random variables where each can take on values from the domains V(X), V(Y), and V(Z), respectively. We say that X is conditionally independent of Y given Z provided

(∀ x_i, y_j, z_k)  P(X = x_i | Y = y_j ∧ Z = z_k) = P(X = x_i | Z = z_k)

where x_i ∈ V(X), y_j ∈ V(Y), and z_k ∈ V(Z). This expression is abbreviated as P(X | Y ∧ Z) = P(X | Z). This definition easily extends to sets of variables as well (see text, page 185).

Representation

Bayesian belief networks are graphically represented by a directed acyclic graph and associated probability matrices which describe the prior and conditional probabilities of the variables. In the graph (network), each variable is represented by a node. The directed arcs between the nodes indicate that each variable is conditionally independent of its non-descendants given its immediate predecessors in the network. X is a descendant of Y if there is a directed path from Y to X. Associated with every node is a probability matrix which describes the probability distribution for that variable given the values of its immediate predecessors.

Example of a Bayesian Belief Network [figure omitted]

For the above graph, the probability matrix for the Alarm node given the events of Earthquake and Burglary might look like this:

Earthquake | Burglary | P(A | E, B) | P(¬A | E, B)
yes        | yes      |             |
yes        | no       |             |
no         | yes      |             |
no         | no       |             |

We can use the information in such tables to calculate the probability for any desired assignment ⟨y_1, ..., y_n⟩ to the tuple of network variables ⟨Y_1, ..., Y_n⟩ using

P(y_1, ..., y_n) = Π_{i=1}^{n} P(y_i | Parents(Y_i)).

Inference

Given a specified Bayes network, we may want to infer the value of a specific variable given the observed values of the other variables. Since we are dealing with probabilities, we will likely not get a specific value. Instead we will calculate a probability distribution for the variable and then output the most probable value(s). In the above example, we were wondering whether an apple tree is sick given that it is losing its leaves. The result is that chances are slightly in favor of the tree not being sick. This calculation is straightforward when the values of all of the other nodes are known, but when only a subset of the variables are known the problem becomes NP-hard. There is a lot of research being done on methods of probabilistic inference in Bayesian nets as well as on devising effective algorithms for learning Bayesian networks from training data.
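
As an illustration of the product formula and of inference when the network is fully specified, here is a minimal sketch for a hypothetical two-node network sick → loses. The structure echoes the apple-tree example, but the probability values are invented, not taken from the lecture.

```python
# Conditional probability tables for the hypothetical network sick -> loses.
p_sick = {True: 0.1, False: 0.9}                 # P(sick)
p_loses_given_sick = {True: 0.8, False: 0.15}    # P(loses = True | sick)

def joint(sick, loses):
    """P(sick, loses) = P(sick) * P(loses | Parents(loses))."""
    p_loses = p_loses_given_sick[sick]
    return p_sick[sick] * (p_loses if loses else 1 - p_loses)

# Infer P(sick | loses = True) by enumerating the joint and normalizing.
evidence = sum(joint(s, True) for s in (True, False))
posterior_sick = joint(True, True) / evidence
print(round(posterior_sick, 3))
# ~0.372: losing leaves raises P(sick), but it stays below 1/2 here.
```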

Causation

Another reason for the popularity of Bayesian belief networks is that they are thought to be a convenient way to represent causal knowledge. We already know that the arrows in a Bayes net indicate that the variables are conditionally independent of their non-descendants given their immediate predecessors in the network. This statement is known as the Markov condition. What does this condition imply about causation? Consider the following simple graph:

Switch → Light

This graph might represent the fact that the switch causes the light to be on. However, the issue is not this simple. Since the graph only represents statistical information, we might infer that there is some kind of causal relationship, but we don't know the details of such a relationship. For example, we consistently find that when we turn the switch on, the light comes on; however, with similar consistency we find that when the light is on, so is the switch.

More specifically, there are two issues concerning causation in Bayesian nets:

1. Are there unobserved common causes? That is, are there other variables, not represented in the network, that are causally related to both a node and that node's parent? For example, consider the following two graphs. In the first (two node) graph, the fact that smoking causes cancer is represented. In the second graph, a gene which causes both smoking and cancer is represented. If such a common cause existed, but was not represented in our Bayesian network, then our inferences from the network would likely be inaccurate. In cases where we are unsure whether there is an unobserved common cause, we have two options:
- Fisher suggested that controlled (randomized) experiments would help to uncover unobserved common causes. However, sometimes such experiments are impossible due to ethical constraints or because the data is uncontrollable.
- Or, we could assume there are no unobserved common causes (as long as inferences appear accurate).
Question: Are there ways, other than controlled experiments, to determine

if there might be common causes? (see work from CMU and by Pearl)

2. Which way do the causal relationships go? That is, on the assumption that there are no unobservable common causes, how do we determine what is a cause and what is an effect? For example, given our training data, can we distinguish between the following two graphs:

[Graph A and Graph B: two networks over the variables sick, loses, and dry; figure omitted]

Notation: Let A ⊥ B denote that A is independent of B.

Yes, we can tell the difference using the independence relation. In Graph A we find that sick ⊥ dry, but in Graph B we find that sick ⊥ dry | loses. So in Graph A, if we alter P(sick) then dry wouldn't change, but in Graph B if we change P(sick), then dry would change. Essentially, we want to determine whether or not P(sick | dry) = P(sick) holds. To do so we could look at our data and determine whether the percentage of dry trees that are sick is the same as the percentage of non-dry trees that are sick.
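
The last step, checking whether P(sick | dry) = P(sick) in the data, amounts to comparing two sample fractions. Here is a minimal sketch with invented records (the data are hypothetical, for illustration only).

```python
# Each record is an observation (sick, dry) for one tree; invented data.
records = [
    (True, True), (False, True), (False, True), (True, False),
    (False, False), (False, False), (True, False), (False, False),
]

def frac_sick(rows):
    """Fraction of trees in `rows` that are sick."""
    return sum(1 for sick, _ in rows if sick) / len(rows)

dry_rows     = [r for r in records if r[1]]
not_dry_rows = [r for r in records if not r[1]]

print(frac_sick(dry_rows), frac_sick(not_dry_rows), frac_sick(records))
# If the fraction of sick trees among dry trees matches the fraction among
# non-dry trees (and the overall fraction), the data supports sick ⊥ dry,
# as in Graph A; a clear difference points towards Graph B.
```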

A Brief History of Causation

18th Century, Philosophy, Hume:
- Causality isn't real; we can't see it.
- We can't distinguish cause from correlation.

19th Century, Philosophy, Mill:
- Mill's methods for causal inference:
  - There are no unobserved causes.
  - An effect is caused by a conjunction of variables.
  - Methods fail for disjunctions of causes.
- (1970: Winston reinvents Mill's methods)

20th Century, Philosophy, Positivism:
- Causation isn't real.
- Causation isn't scientific (not definable in FO Logic).
- Causation is defeasible (situation dependent).

Statistics, Pearson:
- No cause from correlation.

Fisher:
- Randomized experiments can determine causes.

1980s Philosophy, Lewis-Stalnaker:
- Theory of Counterfactuals and Causation
- Logic of "If I were to..."
- We can reason about things yet to happen.

Tetrad Group:
- Causal Graphs

Comp. Sci., Pearl:
- Bayesian graphs as compact representations of probability distributions.
- Causality wins 2001 Lakatos Award.
- How to behave given causal information.
