Bayesian Learning. Examples. Conditional Probability. Two Roles for Bayesian Methods. Prior Probability and Random Variables. The Chain Rule P (B)
|
|
- Shana Sullivan
- 5 years ago
- Views:
Transcription
1 Examples My mood can take 2 possible values: happy, sad. The weather can take 3 possible vales: sunny, rainy, cloudy My friends know me pretty well and say that: P(Mood=happy Weather=rainy) = 0.25 P(Mood=happy Weather=sunny) =0.4 P(Mood=happy Weather=cloudy) = 0.05 Can you compute: P(Mood=happy)? P(Mood=sad)? P(Weather=rainy)? Bayesian learning algorithms use probability theory as an approach to concept classification. Bayesian approaches typically do not generate an explicit hypothesis. Statistical data is used to represent implicit hypotheses. Bayesian classifiers generally produce probabilities for (possibly multiple) class assignments, rather than a single definite classification. To use Bayesian techniques, we often need to make independence assumptions that often aren t valid (but we make them anyway!). 4 1 Conditional Probability P(A B) represents the probability of A given that B isknowntobetrue. We call this a conditional or posterior probability. P(A B) = 1 is equivalent to B A. For example, suppose Rover rarely howls: P(Rover howls) = 0.01 But when there is a full moon, he always howls! Then P(Rover howls full moon) =1.0 Two Roles for Bayesian Methods 1. Provides practical learning algorithms Naive Bayes learning Bayesian belief network learning Combine prior knowledge (prior probabilities) with observed data Requires prior probabilities 2. Provides useful conceptual framework Provides gold standard for evaluating other learning algorithms Additional insight into Occam s razor 5 2 The Chain Rule Prior Probability and Random Variables P (A B) = P (A B) P (B) We can rewrite this as: P (A B) =P (A B) P (B) which is called the chain rule because we can chain together probabilities to compute the likelihood of conjunctions. Example: P (A) =0.5 P (B A) =0.6 P (C A B) =0.8 What is P (A B C)? P(A) represents the prior or unconditional probability that statement A is true, in the absence of other information. A random variable represents different outcomes of an event. Each random variable has a domain of possible values x 1...x n,which each have a probability. The probabilities for all possible outcomes must sum to 1. For example: P(Disease=CAVITY) = 0.5 P(Disease=GUM DISEASE) = 0.3 P(Disease=IMPACTED TOOTH) = 0.1 P(Disease=ROOT INFECTION) =
2 Basic Formulas for Probabilities Bayes Theorem applied to machine learning P (h D) = P (h) = prior probability of hypothesis h = prior probability of training data D P (h D) = probability of h given D P (D h) = probability of D given h Product (or Chain ) Rule: probability of a conjunction of two events: P (A B) =P (A B)P (B) =P (B A)P (A) Sum Rule: probability of a disjunction of two events: P (A B) =P (A)+P(B) P (A B) Theorem of total probability: if events A 1,...,A n are mutually n exclusive with P (A i )=1, then n P (B) = P (B A i )P (A i ) 10 7 Bayes Rule Increasing Probability with more Data Simple form: P( h) P(h D1) P(h D1,D2) P (A B) = P (A) P (B A) P (B) hypotheses hypotheses hypotheses ( a) ( b) ( c) General form: P (A B,X) = P (A X) P (B A,X) P (B X) 11 8 Choosing Hypotheses P (h D) = Generally want the most probable hypothesis given the training data. Maximum a posteriori hypothesis h MAP : P (h D) Bayes Rule Example Let S = patient has a stiff neck, M = disease is meningitis. Assume that: P(S M) =0.5 P(M) = 1/5000 P(S) = 1/20 If assume P (h i )=P(h j ) then can further simplify. The Maximum Likelihood hypothesis: h ML = argmax P (D h i ) P(M S) =? 12 9
3 Most Probable Classification of New Instances So far we ve sought the most probable hypothesis given the data D (i.e., h MAP ) Given new instance x, what is its most probable classification? h MAP (x) is not the most probable classification! Consider: Three possible hypotheses: P (h 1 D) =.4, P(h 2 D) =.3, P(h 3 D) =.3 Given new instance x h 1 (x) =+,h 2 (x) =, h 3 (x) = Brute Force MAP Hypothesis Learner For each hypothesis h in H, calculate the posterior probability: P (h D) = Output the hypothesis h MAP with the highest posterior probability: P (h D) What s most probable classification of x? Bayes Optimal Classifier Example: therefore and argmax P (h i D)P (c j h i ) P (h 1 D) =.4, P( h 1 )=0, P(+ h 1 )=1 P (h 2 D) =.3, P( h 2 )=1, P(+ h 2 )=0 P (h 3 D) =.3, P( h 3 )=1, P(+ h 3 )=0 P (h i D)P (+ h i ) =.4 P (h i D)P ( h i ) =.6 arg max v j V P (v j h i )P (h i D) = Relation to Concept Learning Consider our usual concept learning task Instance space X, hypothesis space H, training examples D Consider the FINDS learning algorithm (outputs most specific hypothesis from the version space VS H,D ) What would Bayes rule produce as the MAP hypothesis? Does FINDS output a MAP hypothesis?? Naive Bayes Classifier One of the most practical learning methods, along with decision trees, neural networks, nearest neighbor methods, etc. Requires: 1. Moderate or large training set. 2. Attributes that describe instances should be conditionally independent given the classification. Successful applications include diagnosis and text classification. Relation to Concept Learning, II Assume fixed set of instances x 1,...,x n Assume D is the set of classifications c(x 1 ),...,c(x n) Choose P (D h) P (D h) =1if h consistent with D P (D h) =0otherwise Choose P (h) to be uniform distribution: P (h) = 1 for all h H H Then, 1 if h is consistent with D P (h D) = VS H,D 0 otherwise 18 15
4 Sparse Data What if none of the training instances with class c j have attribute value a i? Then ˆP (a i c j )=0 Naive Bayes Classifier Assume target function f : X C, where each instance x is described by attributes a 1,a 2...a n. so ˆP (c j ) i ˆP (a i c j )=0!!! Most probable value of f(x) is: Typical solution is m-estimate for ˆP (a i c j ) nc + mp ˆP (a i c j ) n + m where n is number of training examples for which c = c j n c number of examples for which c = c j and a = a i p is prior estimate for ˆP (a i c j ) m is weight given to prior (i.e. # of virtual examples) c MAP = argmax c MAP = argmax c MAP = argmax P (c j a 1,a 2...a n) P (a 1,a 2...a n c j )P (c j ) P (a 1,a 2...a n) P (a 1,a 2...a n c j )P (c j ) Naive Bayes Example Consider a medical diagnosis problem with 3 possible diagnoses (Well, Cold, Allergy) and 3 symptoms (sneeze, cough, fever). Diagnosis Well Cold Allergy P (d) P (sneeze d) P (cough d) P (fever d) P (well sneeze cough fever)= Naive Bayes assumption To make the problem tractable, we often need to make the following independence assumption: P (a 1,a 2...a n c j )= P (a i c j ) i which allows us to define the Naive Bayes Classifier: c NB = argmax P (c j ) i P (a i c j ) Example, II Diagnosis Well Cold Allergy P (d) P (sneeze d) P (cough d) P (fever d) P (cold sneeze cough fever)= Naive Bayes Algorithm Naive Bayes Learn(examples) For each possible class c j ˆP (c j ) estimate P (c j ) For each attribute value a i of each attribute a ˆP (a i c j ) estimate P (a i c j ) P (allergy sneeze cough fever)= P (sneeze cough fever)= Classify New Instance(x) c NB = argmax ˆP (c j ) ˆP (a i c j ) a i x 24 21
5 Bayesian Belief Networks Naive Bayes conditional independence assumption sometimes still too restrictive But it s impossible to estimate all parameters without some such assumptions Bayesian Belief Networks describe conditional independence among subsets of variables Allows combining prior knowledge about (in)dependencies with observed training data Comments on Naive Bayes Generally works quite well despite the blanket assumption of independence Experiments show it can be quite competitive with other methods (C4.5) on standard data sets Although it does not estimate probabilities accurately when independence is violated, still often picks correct category Does not perform search! Has a fairly strong bias Hypothesis not guaranteed to fit the training data Handles noise well Conditional Independence Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is, if ( x i,y j,z k )P (X = x i Y = y j,z = z k )=P(X = x i Z = z k ) Abbreviated: P (X Y,Z)=P (X Z) Example: Thunder is conditionally independent of Rain, given Lightning P (Thunder Rain, Lightning) =P (Thunder Lightning) Naive Bayes uses this to justify: P (X, Y Z) =P (X Y,Z)=P (X Z)P (Y Z) Minimum Description Length Principle Occam s razor: prefer the shortest hypothesis MDL: prefer the hypothesis that minimizes: h MDL = argmin L C 1 (h)+l C2 (D h) where L C (x) is the description length of x under encoding C For example: H=decision trees, D=labels from training data L C1 (h) is # bits to describe tree h L C2 (D h) is # bits to describe D given h Note: L C2 (D h) =0if examples classified perfectly by h. Need only describe exceptions Hence h MDL trades off tree size for training errors Bayesian Belief Network MDL, Continued Lightning Storm BusTourGroup Campfire C C S,B S, B S,B S, B h MAP = argmin log 2 P (D h)+log 2 P (h) log 2 P (D h) log 2 P (h) (1) Thunder ForestFire Campfire Network represents a set of conditional independence assertions: Each node is asserted to be conditionally independent of its nondescendants, given its immediate predecessors Directed acyclic graph Each node has an associated conditional probability table Interesting fact from information theory: The optimal (shortest expected coding length) code for an event with probability p is log 2 p bits. So interpret (1): log 2 P (h) is length of h under optimal code log 2 P (D h) is length of D given h under optimal code By Occam s razor, prefer the hypothesis that minimizes length(h) + length(misclassifications) 30 27
6 Summary: Bayesian Belief Networks Combine prior knowledge with observed data Impact of prior knowledge (when correct!) is to lower the sample complexity Active research area: Extend from boolean to real-valued variables Parameterized distributions instead of tables Extend to first-order instead of propositional systems More effective inference methods... Interpretation The net and its associated table represents the joint probability distribution over all variables and their values In general, n P (y 1,...,y k )= P (y i Parents(Y i )) where Parents(Y i ) denotes immediate predecessors of Y i in the graph Therefore, joint distribution is fully defined by graph, plus the conditional probabilities Example: P (S, L, T, C, F, B) =? Statistical Speech Recognition Model What is the most likely sequence of words that the speech signal represents? P (words signal) = P (words)p (signal words) P (signal) P (words) is the language model. For example, fat cat is more likely than hat cat. P (signal words) is the acoustic model. For example, cat is likely to be pronounced as kæt. Learning of Bayesian Networks Several variants of this learning task Network structure might be known or unknown Training examples might provide values of all network variables, or just some If structure known and all variables observed Then it reduces to training as with a Naive Bayes classifier P (signal) is the likelihood of the speech signal, but this is constant across all interpretations of the signal so we can ignore it The Language Model (P (words)) Ideally, we d like a complete model of the English language to tell us the exact probability of a sentence. But there is no complete model of English. (Or any other natural language.) You d need to know the probability of every sentence that could ever be uttered! A grammar should be helpful, but there is no complete grammar for English and spoken language is notoriously ungrammatical anyway. Learning Bayes Nets Suppose structure known, variables partially observable Similar to training neural network with hidden units In fact, can learn Bayes Net conditional probability tables using gradient ascent! Converge to network h that (locally) maximizes P (D h) Statistical speech recognition systems approximate the likelihood of sentences by collecting n-gram statistics of word usage from large spoken language corpora
7 Summary: Probabilistic inference lends power and flexibility, can incorporate prior probabilistic knowledge Can make statistically invalid assumptions and still get reasonably good results (Naive Bayes) Provides insights into algorithms that don t even use probabilities (explicitly) Bayesian Belief Networks let us incorporate even more types of prior knowledge (than standard Bayes techniques) Many techniques that have proven their success in many application areas N-grams A unigram represents a single word. We estimate: P (w) = frequency of w in corpus / number of words in corpus. A bigram represents a pair of sequential words. We estimate: P (w 2 w 1) = frequency of w 2 following w 1 / frequency of w 1. A trigram represents a triple of sequential words. We estimate: P (w 3 w 1,w 2) = frequency of w 3 following w 1,w 2 / frequency of w 1, Unigram/Bigram Statistics Word Unigram Previous words Count OF IN IS ON TO FROM THAT WITH LINE VISION THE ON OF TO IS A THAT WE LINE VISION The N-gram language model Ideally, we d like to compute: P (w 1...w n)=p (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 )...P (w n w 1...w n 1 ) But we rarely have the data necessary to compute those statistics reliably. So we estimate by making independence assumptions and using bigrams (or trigams). The bigram model is: P (w 1...w n)=p (w 1 )P (w 2 w 1 )P (w 3 w 2 )...P (w n w n 1 ) n P (w 1...w n)= P (w i w i 1 ) Does amazingly well at predicting the next word! 39
CSCE 478/878 Lecture 6: Bayesian Learning
Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell
More informationBayesian Learning. Two Roles for Bayesian Methods. Bayes Theorem. Choosing Hypotheses
Bayesian Learning Two Roles for Bayesian Methods Probabilistic approach to inference. Quantities of interest are governed by prob. dist. and optimal decisions can be made by reasoning about these prob.
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori
More informationStephen Scott.
1 / 28 ian ian Optimal (Adapted from Ethem Alpaydin and Tom Mitchell) Naïve Nets sscott@cse.unl.edu 2 / 28 ian Optimal Naïve Nets Might have reasons (domain information) to favor some hypotheses/predictions
More informationLecture 9: Bayesian Learning
Lecture 9: Bayesian Learning Cognitive Systems II - Machine Learning Part II: Special Aspects of Concept Learning Bayes Theorem, MAL / ML hypotheses, Brute-force MAP LEARNING, MDL principle, Bayes Optimal
More informationCSCE 478/878 Lecture 6: Bayesian Learning and Graphical Models. Stephen Scott. Introduction. Outline. Bayes Theorem. Formulas
ian ian ian Might have reasons (domain information) to favor some hypotheses/predictions over others a priori ian methods work with probabilities, and have two main roles: Naïve Nets (Adapted from Ethem
More informationUncertainty. Variables. assigns to each sentence numerical degree of belief between 0 and 1. uncertainty
Bayes Classification n Uncertainty & robability n Baye's rule n Choosing Hypotheses- Maximum a posteriori n Maximum Likelihood - Baye's concept learning n Maximum Likelihood of real valued function n Bayes
More informationBayesian Learning. CSL603 - Fall 2017 Narayanan C Krishnan
Bayesian Learning CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Bayes Theorem MAP Learners Bayes optimal classifier Naïve Bayes classifier Example text classification Bayesian networks
More informationTwo Roles for Bayesian Methods
Bayesian Learning Bayes Theorem MAP, ML hypotheses MAP learners Minimum description length principle Bayes optimal classifier Naive Bayes learner Example: Learning over text data Bayesian belief networks
More informationIntroduction to Machine Learning
Introduction to Machine Learning CS4375 --- Fall 2018 Bayesian a Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell 1 Uncertainty Most real-world problems deal with
More informationIntroduction to Machine Learning
Uncertainty Introduction to Machine Learning CS4375 --- Fall 2018 a Bayesian Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell Most real-world problems deal with
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationBAYESIAN LEARNING. [Read Ch. 6] [Suggested exercises: 6.1, 6.2, 6.6]
1 BAYESIAN LEARNING [Read Ch. 6] [Suggested exercises: 6.1, 6.2, 6.6] Bayes Theorem MAP, ML hypotheses, MAP learners Minimum description length principle Bayes optimal classifier, Naive Bayes learner Example:
More informationMachine Learning. CS Spring 2015 a Bayesian Learning (I) Uncertainty
Machine Learning CS6375 --- Spring 2015 a Bayesian Learning (I) 1 Uncertainty Most real-world problems deal with uncertain information Diagnosis: Likely disease given observed symptoms Equipment repair:
More informationBayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA)
Bayesian Learning Chapter 6: Bayesian Learning CS 536: Machine Learning Littan (Wu, TA) [Read Ch. 6, except 6.3] [Suggested exercises: 6.1, 6.2, 6.6] Bayes Theore MAP, ML hypotheses MAP learners Miniu
More informationBayesian Learning Features of Bayesian learning methods:
Bayesian Learning Features of Bayesian learning methods: Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct. This provides a more
More informationTopics. Bayesian Learning. What is Bayesian Learning? Objectives for Bayesian Learning
Topics Bayesian Learning Sattiraju Prabhakar CS898O: ML Wichita State University Objectives for Bayesian Learning Bayes Theorem and MAP Bayes Optimal Classifier Naïve Bayes Classifier An Example Classifying
More informationMachine Learning. Bayesian Learning.
Machine Learning Bayesian Learning Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg Martin.Riedmiller@uos.de
More informationBayesian Learning Extension
Bayesian Learning Extension This document will go over one of the most useful forms of statistical inference known as Baye s Rule several of the concepts that extend from it. Named after Thomas Bayes this
More informationCMPT Machine Learning. Bayesian Learning Lecture Scribe for Week 4 Jan 30th & Feb 4th
CMPT 882 - Machine Learning Bayesian Learning Lecture Scribe for Week 4 Jan 30th & Feb 4th Stephen Fagan sfagan@sfu.ca Overview: Introduction - Who was Bayes? - Bayesian Statistics Versus Classical Statistics
More informationBayesian Learning. Bayesian Learning Criteria
Bayesian Learning In Bayesian learning, we are interested in the probability of a hypothesis h given the dataset D. By Bayes theorem: P (h D) = P (D h)p (h) P (D) Other useful formulas to remember are:
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationBayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction
15-0: Learning vs. Deduction Artificial Intelligence Programming Bayesian Learning Chris Brooks Department of Computer Science University of San Francisco So far, we ve seen two types of reasoning: Deductive
More informationText Categorization CSE 454. (Based on slides by Dan Weld, Tom Mitchell, and others)
Text Categorization CSE 454 (Based on slides by Dan Weld, Tom Mitchell, and others) 1 Given: Categorization A description of an instance, x X, where X is the instance language or instance space. A fixed
More informationBayesian Learning. Remark on Conditional Probabilities and Priors. Two Roles for Bayesian Methods. [Read Ch. 6] [Suggested exercises: 6.1, 6.2, 6.
Machine Learning Bayesian Learning Bayes Theorem Bayesian Learning [Read Ch. 6] [Suggested exercises: 6.1, 6.2, 6.6] Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationUncertainty and Bayesian Networks
Uncertainty and Bayesian Networks Tutorial 3 Tutorial 3 1 Outline Uncertainty Probability Syntax and Semantics for Uncertainty Inference Independence and Bayes Rule Syntax and Semantics for Bayesian Networks
More information[read Chapter 2] [suggested exercises 2.2, 2.3, 2.4, 2.6] General-to-specific ordering over hypotheses
1 CONCEPT LEARNING AND THE GENERAL-TO-SPECIFIC ORDERING [read Chapter 2] [suggested exercises 2.2, 2.3, 2.4, 2.6] Learning from examples General-to-specific ordering over hypotheses Version spaces and
More informationProbability Based Learning
Probability Based Learning Lecture 7, DD2431 Machine Learning J. Sullivan, A. Maki September 2013 Advantages of Probability Based Methods Work with sparse training data. More powerful than deterministic
More informationMachine Learning. Bayesian Learning. Acknowledgement Slides courtesy of Martin Riedmiller
Machine Learning Bayesian Learning Dr. Joschka Boedecker AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg jboedeck@informatik.uni-freiburg.de
More informationLecture 10: Introduction to reasoning under uncertainty. Uncertainty
Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,
More informationMachine Learning (CS 567)
Machine Learning (CS 567) Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol Han (cheolhan@usc.edu)
More information10/15/2015 A FAST REVIEW OF DISCRETE PROBABILITY (PART 2) Probability, Conditional Probability & Bayes Rule. Discrete random variables
Probability, Conditional Probability & Bayes Rule A FAST REVIEW OF DISCRETE PROBABILITY (PART 2) 2 Discrete random variables A random variable can take on one of a set of different values, each with an
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Chapter 8&9: Classification: Part 3 Instructor: Yizhou Sun yzsun@ccs.neu.edu March 12, 2013 Midterm Report Grade Distribution 90-100 10 80-89 16 70-79 8 60-69 4
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationPROBABILISTIC REASONING SYSTEMS
PROBABILISTIC REASONING SYSTEMS In which we explain how to build reasoning systems that use network models to reason with uncertainty according to the laws of probability theory. Outline Knowledge in uncertain
More informationThe Noisy Channel Model and Markov Models
1/24 The Noisy Channel Model and Markov Models Mark Johnson September 3, 2014 2/24 The big ideas The story so far: machine learning classifiers learn a function that maps a data item X to a label Y handle
More informationCS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning
CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we
More informationIntroduction to ML. Two examples of Learners: Naïve Bayesian Classifiers Decision Trees
Introduction to ML Two examples of Learners: Naïve Bayesian Classifiers Decision Trees Why Bayesian learning? Probabilistic learning: Calculate explicit probabilities for hypothesis, among the most practical
More informationMachine Learning, Midterm Exam: Spring 2009 SOLUTION
10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of
More informationArtificial Intelligence Programming Probability
Artificial Intelligence Programming Probability Chris Brooks Department of Computer Science University of San Francisco Department of Computer Science University of San Francisco p.1/?? 13-0: Uncertainty
More informationBasic Probability. Robert Platt Northeastern University. Some images and slides are used from: 1. AIMA 2. Chris Amato
Basic Probability Robert Platt Northeastern University Some images and slides are used from: 1. AIMA 2. Chris Amato (Discrete) Random variables What is a random variable? Suppose that the variable a denotes
More informationProbabilistic Classification
Bayesian Networks Probabilistic Classification Goal: Gather Labeled Training Data Build/Learn a Probability Model Use the model to infer class labels for unlabeled data points Example: Spam Filtering...
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationBayesian Networks BY: MOHAMAD ALSABBAGH
Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional
More informationStatistical Learning. Philipp Koehn. 10 November 2015
Statistical Learning Philipp Koehn 10 November 2015 Outline 1 Learning agents Inductive learning Decision tree learning Measuring learning performance Bayesian learning Maximum a posteriori and maximum
More informationOutline. Training Examples for EnjoySport. 2 lecture slides for textbook Machine Learning, c Tom M. Mitchell, McGraw Hill, 1997
Outline Training Examples for EnjoySport Learning from examples General-to-specific ordering over hypotheses [read Chapter 2] [suggested exercises 2.2, 2.3, 2.4, 2.6] Version spaces and candidate elimination
More informationQuantifying uncertainty & Bayesian networks
Quantifying uncertainty & Bayesian networks CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2016 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition,
More informationWhy Probability? It's the right way to look at the world.
Probability Why Probability? It's the right way to look at the world. Discrete Random Variables We denote discrete random variables with capital letters. A boolean random variable may be either true or
More informationUncertainty. Chapter 13
Uncertainty Chapter 13 Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me there on time? Problems: 1. partial observability (road state, other drivers' plans, noisy
More information13.4 INDEPENDENCE. 494 Chapter 13. Quantifying Uncertainty
494 Chapter 13. Quantifying Uncertainty table. In a realistic problem we could easily have n>100, makingo(2 n ) impractical. The full joint distribution in tabular form is just not a practical tool for
More informationPROBABILITY. Inference: constraint propagation
PROBABILITY Inference: constraint propagation! Use the constraints to reduce the number of legal values for a variable! Possible to find a solution without searching! Node consistency " A node is node-consistent
More informationCOMP 328: Machine Learning
COMP 328: Machine Learning Lecture 2: Naive Bayes Classifiers Nevin L. Zhang Department of Computer Science and Engineering The Hong Kong University of Science and Technology Spring 2010 Nevin L. Zhang
More informationBITS F464: MACHINE LEARNING
BITS F464: MACHINE LEARNING Lecture-09: Concept Learning Dr. Kamlesh Tiwari Assistant Professor Department of Computer Science and Information Systems Engineering, BITS Pilani, Rajasthan-333031 INDIA Jan
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationIntroduction to Bayesian Learning
Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline
More informationThe Naïve Bayes Classifier. Machine Learning Fall 2017
The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted
More informationBayes Theorem & Naïve Bayes. (some slides adapted from slides by Massimo Poesio, adapted from slides by Chris Manning)
Bayes Theorem & Naïve Bayes (some slides adapted from slides by Massimo Poesio, adapted from slides by Chris Manning) Review: Bayes Theorem & Diagnosis P( a b) Posterior Likelihood Prior P( b a) P( a)
More informationBayes Rule. CS789: Machine Learning and Neural Network Bayesian learning. A Side Note on Probability. What will we learn in this lecture?
Bayes Rule CS789: Machine Learning and Neural Network Bayesian learning P (Y X) = P (X Y )P (Y ) P (X) Jakramate Bootkrajang Department of Computer Science Chiang Mai University P (Y ): prior belief, prior
More informationBayesian Learning. Bayes Theorem. MAP, MLhypotheses. MAP learners. Minimum description length principle. Bayes optimal classier. Naive Bayes learner
Bayesian Learning [Read Ch. 6] [Suggested exercises: 6.1, 6.2, 6.6] Bayes Theorem MAP, MLhypotheses MAP learners Minimum description length principle Bayes optimal classier Naive Bayes learner Example:
More informationBayesian Inference. Definitions from Probability: Naive Bayes Classifiers: Advantages and Disadvantages of Naive Bayes Classifiers:
Bayesian Inference The purpose of this document is to review belief networks and naive Bayes classifiers. Definitions from Probability: Belief networks: Naive Bayes Classifiers: Advantages and Disadvantages
More informationAlgorithm-Independent Learning Issues
Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning
More informationUncertainty. Chapter 13
Uncertainty Chapter 13 Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule Uncertainty Let s say you want to get to the airport in time for a flight. Let action A
More informationArtificial Intelligence
Artificial Intelligence Dr Ahmed Rafat Abas Computer Science Dept, Faculty of Computers and Informatics, Zagazig University arabas@zu.edu.eg http://www.arsaliem.faculty.zu.edu.eg/ Uncertainty Chapter 13
More informationConfusion matrix. a = true positives b = false negatives c = false positives d = true negatives 1. F-measure combines Recall and Precision:
Confusion matrix classifier-determined positive label classifier-determined negative label true positive a b label true negative c d label Accuracy = (a+d)/(a+b+c+d) a = true positives b = false negatives
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationStatistical learning. Chapter 20, Sections 1 3 1
Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationCS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016
CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional
More informationDecision Tree Learning
Decision Tree Learning Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Machine Learning, Chapter 3 2. Data Mining: Concepts, Models,
More informationDirected Graphical Models or Bayesian Networks
Directed Graphical Models or Bayesian Networks Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Bayesian Networks One of the most exciting recent advancements in statistical AI Compact
More informationPengju XJTU 2016
Introduction to AI Chapter13 Uncertainty Pengju Ren@IAIR Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule Wumpus World Environment Squares adjacent to wumpus are
More informationIn today s lecture. Conditional probability and independence. COSC343: Artificial Intelligence. Curse of dimensionality.
In today s lecture COSC343: Artificial Intelligence Lecture 5: Bayesian Reasoning Conditional probability independence Curse of dimensionality Lech Szymanski Dept. of Computer Science, University of Otago
More informationDecision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag
Decision Trees Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Supervised Learning Input: labelled training data i.e., data plus desired output Assumption:
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationT Machine Learning: Basic Principles
Machine Learning: Basic Principles Bayesian Networks Laboratory of Computer and Information Science (CIS) Department of Computer Science and Engineering Helsinki University of Technology (TKK) Autumn 2007
More informationArtificial Intelligence CS 6364
Artificial Intelligence CS 6364 rofessor Dan Moldovan Section 12 robabilistic Reasoning Acting under uncertainty Logical agents assume propositions are - True - False - Unknown acting under uncertainty
More informationBayesian Networks. Motivation
Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations
More informationMachine Learning. Bayesian Learning. Michael M. Richter. Michael M. Richter
Machine Learning Bayesian Learning Email: mrichter@ucalgary.ca Topic This is concept learning the probabilistic way. That means, everything that is stated is done in an exact way but not always true. That
More informationArtificial Intelligence Uncertainty
Artificial Intelligence Uncertainty Ch. 13 Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me there on time? A 25, A 60, A 3600 Uncertainty: partial observability (road
More informationMaxent Models and Discriminative Estimation
Maxent Models and Discriminative Estimation Generative vs. Discriminative models (Reading: J+M Ch6) Introduction So far we ve looked at generative models Language models, Naive Bayes But there is now much
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Classification: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu September 21, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network
More informationRelationship between Least Squares Approximation and Maximum Likelihood Hypotheses
Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses Steven Bergner, Chris Demwell Lecture notes for Cmpt 882 Machine Learning February 19, 2004 Abstract In these notes, a
More informationBayesian networks. Chapter Chapter
Bayesian networks Chapter 14.1 3 Chapter 14.1 3 1 Outline Syntax Semantics Parameterized distributions Chapter 14.1 3 2 Bayesian networks A simple, graphical notation for conditional independence assertions
More informationLearning with Probabilities
Learning with Probabilities CS194-10 Fall 2011 Lecture 15 CS194-10 Fall 2011 Lecture 15 1 Outline Bayesian learning eliminates arbitrary loss functions and regularizers facilitates incorporation of prior
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification
More informationSome Concepts of Probability (Review) Volker Tresp Summer 2018
Some Concepts of Probability (Review) Volker Tresp Summer 2018 1 Definition There are different way to define what a probability stands for Mathematically, the most rigorous definition is based on Kolmogorov
More information1. True/False (40 points) total
Name: 1 2 3 4 5 6 7 total 40 20 45 45 20 15 15 200 UMBC CMSC 671 Final Exam December 20, 2009 Please write all of your answers on this exam. The exam is closed book and has seven problems that add up to
More informationUncertainty. Logic and Uncertainty. Russell & Norvig. Readings: Chapter 13. One problem with logical-agent approaches: C:145 Artificial
C:145 Artificial Intelligence@ Uncertainty Readings: Chapter 13 Russell & Norvig. Artificial Intelligence p.1/43 Logic and Uncertainty One problem with logical-agent approaches: Agents almost never have
More informationRecall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem
Recall from last time: Conditional probabilities Our probabilistic models will compute and manipulate conditional probabilities. Given two random variables X, Y, we denote by Lecture 2: Belief (Bayesian)
More informationY. Xiang, Inference with Uncertain Knowledge 1
Inference with Uncertain Knowledge Objectives Why must agent use uncertain knowledge? Fundamentals of Bayesian probability Inference with full joint distributions Inference with Bayes rule Bayesian networks
More informationLecture 2: Foundations of Concept Learning
Lecture 2: Foundations of Concept Learning Cognitive Systems II - Machine Learning WS 2005/2006 Part I: Basic Approaches to Concept Learning Version Space, Candidate Elimination, Inductive Bias Lecture
More informationNaïve Bayes. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824
Naïve Bayes Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative HW 1 out today. Please start early! Office hours Chen: Wed 4pm-5pm Shih-Yang: Fri 3pm-4pm Location: Whittemore 266
More informationProbabilistic Reasoning. Kee-Eung Kim KAIST Computer Science
Probabilistic Reasoning Kee-Eung Kim KAIST Computer Science Outline #1 Acting under uncertainty Probabilities Inference with Probabilities Independence and Bayes Rule Bayesian networks Inference in Bayesian
More informationModel Averaging (Bayesian Learning)
Model Averaging (Bayesian Learning) We want to predict the output Y of a new case that has input X = x given the training examples e: p(y x e) = m M P(Y m x e) = m M P(Y m x e)p(m x e) = m M P(Y m x)p(m
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More information