Statistical Learning: Parameter Estimation and Bayesian Networks
Today: Statistical Learning
- Parameter Estimation: Maximum Likelihood (ML), Maximum A Posteriori (MAP), Bayesian; the continuous case
- Learning parameters for a Bayesian network
- Naive Bayes: maximum likelihood estimates, priors
- Learning the structure of Bayesian networks

Coin Flip
Three candidate coins: P(H | C1) = 0.1, P(H | C2) = 0.5, P(H | C3) = 0.9.
Which coin will I use? Prior: P(C1) = P(C2) = P(C3) = 1/3.
Prior: probability of a hypothesis before we make any observations.
Uniform prior: all hypotheses are equally likely before we make any observations.

Experiment 1: Heads
P(C1 | H) = ?  P(C2 | H) = ?  P(C3 | H) = ?
Bayes rule: P(C1 | H) = P(H | C1) P(C1) / P(H), where P(H) = Σ_i P(H | Ci) P(Ci).
Result: P(C1 | H) = 0.066, P(C2 | H) = 0.333, P(C3 | H) = 0.600.
Posterior: probability of a hypothesis given data.
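A minimal sketch of this update in Python (the coin biases and the uniform prior come from the slides; the function and variable names are ours):

```python
# Three candidate coins and a uniform prior over which one is being used.
biases = {"C1": 0.1, "C2": 0.5, "C3": 0.9}   # P(H | Ci)
prior = {"C1": 1/3, "C2": 1/3, "C3": 1/3}    # P(Ci)

def posterior_after_flip(beliefs, outcome):
    """One Bayes update: P(Ci | outcome) = P(outcome | Ci) P(Ci) / P(outcome)."""
    likelihood = {c: biases[c] if outcome == "H" else 1 - biases[c] for c in beliefs}
    unnormalized = {c: likelihood[c] * beliefs[c] for c in beliefs}
    evidence = sum(unnormalized.values())               # P(outcome)
    return {c: unnormalized[c] / evidence for c in beliefs}

print(posterior_after_flip(prior, "H"))
# ≈ {'C1': 0.067, 'C2': 0.333, 'C3': 0.600}, matching the slide's 0.066 / 0.333 / 0.6
```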
Experiment 2: Tails
P(C1 | HT) = ?  P(C2 | HT) = ?  P(C3 | HT) = ?
P(Ci | HT) = α P(HT | Ci) P(Ci) = α P(H | Ci) P(T | Ci) P(Ci)
Result: P(C1 | HT) = 0.21, P(C2 | HT) = 0.58, P(C3 | HT) = 0.21.

Your Estimate?
What is the probability of heads after two experiments?
Most likely coin: C2. Best estimate for P(H): P(H | C2) = 0.5.
Maximum Likelihood estimate: the hypothesis that best fits the observed data, assuming a uniform prior.
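Chaining the same update over the two flips H, T and reading off the maximum-likelihood choice — a sketch reusing posterior_after_flip and biases from the previous snippet:

```python
# Sequential updating: the posterior after one flip becomes the prior for the next.
beliefs = prior
for outcome in ["H", "T"]:
    beliefs = posterior_after_flip(beliefs, outcome)
print(beliefs)                      # ≈ {'C1': 0.21, 'C2': 0.58, 'C3': 0.21}

# ML estimate: pick the single most probable coin (uniform prior) and use its bias.
ml_coin = max(beliefs, key=beliefs.get)
print(ml_coin, biases[ml_coin])     # C2, so the estimate is P(H) = 0.5
```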
Using Prior Knowledge
Should we always use a uniform prior?
Background knowledge: heads means you go first in Abalone against the TA, and TAs are nice people, so the TA is more likely to use a coin biased in your favor.
We can encode this in the prior: P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70.

Experiment 1: Heads
P(Ci | H) = α P(H | Ci) P(Ci)
Result: P(C1 | H) = 0.007, P(C2 | H) = 0.164, P(C3 | H) = 0.829.
(Compare the posterior under the uniform prior: 0.066, 0.333, 0.600.)

Experiment 2: Tails
P(Ci | HT) = α P(HT | Ci) P(Ci) = α P(H | Ci) P(T | Ci) P(Ci)
Result: P(C1 | HT) = 0.035, P(C2 | HT) = 0.481, P(C3 | HT) = 0.485.
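The same two updates starting from the informed prior (0.05 / 0.25 / 0.70); again a sketch reusing the earlier helper:

```python
# Encode the background knowledge in the prior instead of using the uniform one.
informed_prior = {"C1": 0.05, "C2": 0.25, "C3": 0.70}

beliefs = informed_prior
for outcome in ["H", "T"]:
    beliefs = posterior_after_flip(beliefs, outcome)
print(beliefs)                      # ≈ {'C1': 0.035, 'C2': 0.481, 'C3': 0.485}
```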
Your Estimate?
What is the probability of heads after two experiments?
Most likely coin: C3. Best estimate for P(H): P(H | C3) = 0.9.
Maximum A Posteriori (MAP) estimate: the hypothesis that best fits the observed data, assuming a non-uniform prior.

Did We Do The Right Thing?
P(C1 | HT) = 0.035, P(C2 | HT) = 0.481, P(C3 | HT) = 0.485: C2 and C3 are almost equally likely, yet the MAP estimate commits entirely to C3.

A Better Estimate
Recall: P(H) = Σ_i P(H | Ci) P(Ci | HT) = 0.68.
Bayesian estimate: minimizes prediction error, given the data and (generally) a non-uniform prior, by averaging over all hypotheses instead of picking one.
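The Bayesian estimate averages the coin biases under the current beliefs instead of committing to one coin; a short sketch reusing biases and the beliefs computed above:

```python
# Bayesian predictive estimate: average the biases, weighted by the posterior.
def predict_heads(beliefs):
    """P(H) = sum_i P(H | Ci) * P(Ci | data)."""
    return sum(biases[c] * beliefs[c] for c in beliefs)

print(predict_heads(beliefs))       # ≈ 0.68 after H, T with the informed prior
```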
Comparison
ML (Maximum Likelihood): P(H) = 0.5; after 10 experiments (HTH^8): P(H) = 0.9.
MAP (Maximum A Posteriori): P(H) = 0.9; after 10 experiments (HTH^8): P(H) = 0.9.
Bayesian: P(H) = 0.68; after 10 experiments (HTH^8): P(H) ≈ 0.9 (checked in the sketch below).

Comparison
ML (Maximum Likelihood): easy to compute.
MAP (Maximum A Posteriori): still easy to compute, and incorporates prior knowledge.
Bayesian: minimizes error, so it is great when data is scarce, but potentially much harder to compute.
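A sketch checking the HTH^8 column of the comparison; the slide does not list the exact Bayesian value, so the number printed here is computed, not quoted:

```python
# Ten flips: H, T, then eight more heads.
flips = ["H", "T"] + ["H"] * 8

for label, start in [("uniform prior", prior), ("informed prior", informed_prior)]:
    beliefs = start
    for outcome in flips:
        beliefs = posterior_after_flip(beliefs, outcome)
    best_coin = max(beliefs, key=beliefs.get)
    print(label, best_coin, biases[best_coin], round(predict_heads(beliefs), 3))
# Both priors end up picking C3 (P(H) = 0.9), and the Bayesian average is ≈ 0.89-0.90:
# with this much data the three estimates essentially agree.
```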
Summary For Now
Prior: probability of a hypothesis before we see any data.
Likelihood: probability of the data given a hypothesis.
Posterior: probability of a hypothesis after we have seen some data.
Uniform prior: a prior that makes all hypotheses equally likely.

Estimate                        Prior     Hypothesis used         Result
Maximum Likelihood Estimate     Uniform   The most likely         Point
Maximum A Posteriori Estimate   Any       The most likely         Point
Bayesian Estimate               Any       Weighted combination    Average

Continuous Case
In the previous example, we chose from a discrete set of three coins.
In general, we have to pick from a continuous distribution of biased coins: the hypothesis is the bias θ of the coin, and the prior is a density over θ.
[Figures: the prior over θ; the posterior after Experiment 1 (Heads) and Experiment 2 (Tails), under a uniform prior and under a prior encoding the background knowledge, with the ML, MAP and Bayesian estimates marked.]
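One standard way to make the continuous case concrete is a Beta prior over θ; the slides do not name this parameterization, so treat it as an assumption. Under it, the ML, MAP and Bayesian estimates have simple closed forms:

```python
# Beta(a, b) prior over theta = P(H); after h heads and t tails the posterior
# is Beta(a + h, b + t), so the three estimates are one-liners.
def coin_estimates(a, b, h, t):
    ml = h / (h + t)                                # ignores the prior entirely
    map_ = (a + h - 1) / (a + b + h + t - 2)        # mode of the posterior
    bayesian = (a + h) / (a + b + h + t)            # mean of the posterior
    return ml, map_, bayesian

print(coin_estimates(a=1, b=1, h=1, t=1))   # uniform prior, data = H, T
print(coin_estimates(a=7, b=3, h=1, t=1))   # a prior leaning toward heads (made-up numbers)
```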
After More Experiments
[Figures: the posterior over θ after many more experiments, under a uniform prior and under the background-knowledge prior, again with the ML, MAP and Bayesian estimates marked.]

Parameter Estimation and Bayesian Networks
Training data (one row per observation):
E B R A J M
T F T T F T
F F F F F T
F T F T T T
F F F T T T
F T F F F F
We have: the Bayes net structure and the observations.
We need: the Bayes net parameters, i.e. the entries of every conditional probability table.
P(B) = ?  Prior + data → now compute either a MAP or a Bayesian estimate.
P(A | E, B) = ?  P(A | E, ¬B) = ?  P(A | ¬E, B) = ?  P(A | ¬E, ¬B) = ?
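A sketch of the counting behind these maximum-likelihood CPT entries, using the five rows above (the helper name and dictionary layout are ours):

```python
# One dictionary per row of the table above (T = True, F = False).
rows = [
    dict(E=True,  B=False, R=True,  A=True,  J=False, M=True),
    dict(E=False, B=False, R=False, A=False, J=False, M=True),
    dict(E=False, B=True,  R=False, A=True,  J=True,  M=True),
    dict(E=False, B=False, R=False, A=True,  J=True,  M=True),
    dict(E=False, B=True,  R=False, A=False, J=False, M=False),
]

def ml_cpt_entry(rows, target, parents=None):
    """Maximum-likelihood estimate of P(target = True | parents) from counts."""
    parents = parents or {}
    matching = [r for r in rows if all(r[k] == v for k, v in parents.items())]
    if not matching:
        return None                     # no data for this parent configuration
    return sum(r[target] for r in matching) / len(matching)

print(ml_cpt_entry(rows, "B"))                                  # P(B) = 2/5
print(ml_cpt_entry(rows, "A", {"E": False, "B": True}))         # P(A | ¬E, B) = 1/2
```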
For the remaining entries — P(A | E, B), P(A | E, ¬B), P(A | ¬E, B), P(A | ¬E, ¬B), and so on for every node — the recipe is the same: prior + data, then a MAP or Bayesian estimate. You know the drill.

Naive Bayes
A Bayes net in which all nodes are children of a single root node.
[Figure: an example spam email mentioning words such as "FREE" and "apple".]
Why use it?
- Expressive and accurate? No: the children are assumed independent given the root, which is rarely true.
- Easy to learn? Yes.
- Useful? Sometimes.
Inference in Naive Bayes
The spam model: P(S) = 0.6, plus per-word conditionals such as P(F | S) = 0.8 and P(F | ¬S) = 0.5; entries like P(A | S) and P(A | ¬S) still have to be estimated from data.
Goal: given the evidence (the words present and absent in an email), decide whether the email is spam.
E = {A, ¬B, F, ¬K, ...}
P(S | E) = P(E | S) P(S) / P(E) = P(A, ¬B, F, ¬K, ... | S) P(S) / P(A, ¬B, F, ¬K, ...)
Independence to the rescue:
P(S | E) = P(A | S) P(¬B | S) P(F | S) P(¬K | S) P(... | S) P(S) / [P(A) P(¬B) P(F) P(¬K) P(...)]
P(¬S | E) = P(A | ¬S) P(¬B | ¬S) P(F | ¬S) P(¬K | ¬S) P(... | ¬S) P(¬S) / [P(A) P(¬B) P(F) P(¬K) P(...)]
Classify as spam if P(S | E) > P(¬S | E). The denominators are identical, so it is enough to compare
P(S | E) ∝ P(A | S) P(¬B | S) P(F | S) P(¬K | S) P(... | S) P(S)
P(¬S | E) ∝ P(A | ¬S) P(¬B | ¬S) P(F | ¬S) P(¬K | ¬S) P(... | ¬S) P(¬S)
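A sketch of this comparison in code. P(S) = 0.6 and the 0.8 / 0.5 pair for the word F come from the slide; the remaining word probabilities were lost in transcription, so the numbers below are placeholders:

```python
# Placeholder model: class prior plus P(word present | class) for each word.
p_spam = 0.6
word_probs = {                      # word: (P(word | S), P(word | not S))
    "A": (0.30, 0.05),
    "B": (0.20, 0.10),
    "F": (0.80, 0.50),
    "K": (0.01, 0.02),
}

def naive_bayes_scores(present_words):
    """Scores proportional to P(S | E) and P(¬S | E); absent words contribute 1 - p."""
    score_s, score_not_s = p_spam, 1 - p_spam
    for word, (p_s, p_not_s) in word_probs.items():
        if word in present_words:
            score_s, score_not_s = score_s * p_s, score_not_s * p_not_s
        else:
            score_s, score_not_s = score_s * (1 - p_s), score_not_s * (1 - p_not_s)
    return score_s, score_not_s

s, not_s = naive_bayes_scores({"A", "F"})
print("spam" if s > not_s else "not spam", s / (s + not_s))
```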
Where do parameters such as P(S) and P(A | S) come from? Prior + data → an estimate. Can we calculate the Maximum Likelihood estimate of a parameter θ easily?
We are looking for the maximum of a function: find the derivative and set it to zero.
What function are we maximizing? P(data | hypothesis), where each hypothesis h_θ corresponds to one value of θ.
For a sequence of flips (or word occurrences):
P(data | h_θ) = P(H | h_θ) P(T | h_θ) P(T | h_θ) P(H | h_θ) ... = θ (1-θ)(1-θ) θ ... = θ^#H (1-θ)^#T
To find the θ that maximizes θ^#H (1-θ)^#T, we take the derivative of the function and set it to 0, and we get:
θ = #H / (#H + #T)
You knew it already, right?
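The derivative step spelled out; taking logs first (which does not change the maximizer) keeps the algebra short:

```latex
\begin{align*}
\log P(\text{data} \mid h_\theta) &= \#H \,\log\theta + \#T \,\log(1-\theta) \\
\frac{d}{d\theta}\Bigl[\#H \log\theta + \#T \log(1-\theta)\Bigr]
  &= \frac{\#H}{\theta} - \frac{\#T}{1-\theta} = 0 \\
\#H\,(1-\theta) &= \#T\,\theta
  \quad\Longrightarrow\quad
  \theta = \frac{\#H}{\#H + \#T}
\end{align*}
```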
Problems With Small Samples
What happens if, in your training data, apples are not mentioned in any spam message? Then P(A | S) = 0.
Why is that bad? P(S | E) ∝ P(A | S) P(¬B | S) P(F | S) P(¬K | S) P(... | S) P(S) = 0: one zero count vetoes the entire product.

Smoothing
Smoothing is used when samples are small.
Add-one smoothing is the simplest smoothing method: just add 1 to every count.

Priors
Recall that P(S) = #S / (#S + #¬S).
If we have a slight hunch that P(S) ≈ p:  P(S) = (#S + p) / (#S + #¬S + 1).
If we have a big hunch that P(S) ≈ p:  P(S) = (#S + m·p) / (#S + #¬S + m), where m can be any number > 0.
Choosing m is like saying "I have seen m samples that make me believe that P(S) = p"; hence m is referred to as the equivalent sample size.
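A sketch of add-one smoothing and the m-estimate formula above (the function names are ours):

```python
def add_one(count_true, count_false):
    """Add-one smoothing for a binary variable: pretend each value was seen once more."""
    return (count_true + 1) / (count_true + count_false + 2)

def m_estimate(count_true, count_false, p, m):
    """Prior hunch p with equivalent sample size m: (count + m*p) / (total + m)."""
    return (count_true + m * p) / (count_true + count_false + m)

print(add_one(0, 10))                   # 1/12 instead of 0: no more zero veto
print(m_estimate(3, 7, p=0.5, m=2.0))   # (3 + 1) / 12 ≈ 0.333
```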
Priors, continued
Where should p come from?
- No prior knowledge → p = 0.5.
- If you build a personalized spam filter, you can use p = P(S) from somebody else's filter.

Inference in Naive Bayes Revisited
Recall that P(S | E) ∝ P(A | S) P(¬B | S) P(F | S) P(¬K | S) P(... | S) P(S).
We are multiplying lots of small numbers together → danger of underflow!
Solution? Use logs:
log P(S | E) = log P(A | S) + log P(¬B | S) + log P(F | S) + log P(¬K | S) + log P(... | S) + log P(S) + const (the same constant for S and ¬S).
Now we add ordinary-sized numbers, with little danger of over- or underflow errors.
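The log-space version of the earlier naive Bayes sketch (it reuses p_spam and word_probs from that snippet):

```python
import math

def naive_bayes_log_scores(present_words):
    """Same comparison as before, but with sums of logs instead of long products."""
    log_s, log_not_s = math.log(p_spam), math.log(1 - p_spam)
    for word, (p_s, p_not_s) in word_probs.items():
        if word in present_words:
            log_s, log_not_s = log_s + math.log(p_s), log_not_s + math.log(p_not_s)
        else:
            log_s, log_not_s = log_s + math.log(1 - p_s), log_not_s + math.log(1 - p_not_s)
    return log_s, log_not_s              # safe to compare even with thousands of words

ls, lns = naive_bayes_log_scores({"A", "F"})
print("spam" if ls > lns else "not spam")
```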
Learning the Structure of Bayesian Networks
General idea: look at all possible network structures and pick the one that fits the observed data best.
Impossibly slow: there are exponentially many networks, and for each one we also have to learn the parameters!
What do we do if searching the space exhaustively is too expensive? Local search (sketched below):
- Start with some network structure.
- Try to make a change (add, delete or reverse an edge).
- See if the new network is any better.

What network structure should we start with?
- A random structure with a uniform prior?
- A network that reflects our (or experts') knowledge of the field?
[Figure: a prior network with an equivalent sample size, combined with a data table over variables x1 ... x9, yields improved network(s).]

We have just described how to get an ML or MAP estimate of the structure of a Bayes net.
What would the Bayesian estimate look like? Find all possible networks and calculate their posteriors; when doing inference, the result is the weighted combination of all the networks!
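Returning to the local search idea above, a high-level sketch of the loop; score() is a stand-in for whatever structure score is used (e.g., data likelihood with a complexity penalty, or a posterior), and edge reversal and acyclicity checks are omitted for brevity:

```python
import itertools

def neighbors(edges, variables):
    """Structures reachable by adding or deleting one directed edge
    (edge reversal and cycle checks omitted to keep the sketch short)."""
    for edge in itertools.permutations(variables, 2):
        yield edges - {edge} if edge in edges else edges | {edge}

def hill_climb(variables, score, max_steps=100):
    """Greedy local search: move to the best-scoring neighbor until none improves."""
    current = frozenset()                       # start from the empty network
    for _ in range(max_steps):
        best = max(neighbors(current, variables), key=score)
        if score(best) <= score(current):
            return current                      # local optimum
        current = best
    return current
```

A hypothetical call might look like hill_climb(["E", "B", "R", "A", "J", "M"], score=my_score), where my_score is fitted to a data table like the one shown earlier.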