KEY CONCEPTS IN PROBABILITY: SMOOTHING, MLE, AND MAP
- Amberly Clara Flowers
2 Outline
- MAPs and MLEs: catch-up from last week
- Joint distributions: a new learner
- Naïve Bayes: another new learner
3 Administrivia
- Homework: due tomorrow. Hardcopy and Autolab submission (see wiki).
- Texts: Mitchell or Murphy are optional this week; an update from Tom Mitchell's long-expected new edition.
- Bishop is also excellent if you prefer, but a little harder to skip around in.
- Pick one or the other (both is overkill). The main differences are not content but notation: for instance
4 Some practical problems
I bought a loaded d20 on eBay, but it didn't come with any useful specs. How can I find out how it behaves?
[Figure: histogram of frequency by face shown]
1. Collect some data (20 rolls).
2. Estimate $\hat{\Pr}(i) = C(\text{rolls of } i) / C(\text{any roll})$.
5 A better solution
I bought a loaded d20 on eBay, but it didn't come with any specs. How can I find out how it behaves?
[Figure: histogram of frequency by face shown]
0. Imagine some data (20 rolls, each i shows up 1x).
1. Collect some data (20 rolls).
2. Estimate $\hat{\Pr}(i) = C(\text{rolls of } i) / C(\text{any roll})$, counting the imagined and the collected rolls together.
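6 A better solution?
Q: What if I used m imagined rolls with a probability of q = 1/20 of rolling any i?
With 20 imagined rolls in which each face shows up once:
$$\hat{\Pr}(i) = \frac{C(i) + 1}{C(\text{ANY}) + C(\text{IMAGINED})}$$
In general:
$$\hat{\Pr}(i) = \frac{C(i) + mq}{C(\text{ANY}) + m}$$
I can use this formula with m > 20, or even with m < 20, say with m = 1.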
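The two estimators on these slides can be sketched in a few lines of Python. This is only an illustration: the function names `mle` and `smoothed` and the list of 20 rolls are made up here, not taken from the lecture.

```python
from collections import Counter

def mle(rolls, faces):
    """Maximum likelihood estimate: C(i) / C(ANY)."""
    counts = Counter(rolls)
    total = len(rolls)
    return {i: counts[i] / total for i in faces}

def smoothed(rolls, faces, m, q=None):
    """Smoothed estimate: (C(i) + m*q) / (C(ANY) + m).

    m is the number of imagined rolls; q defaults to the uniform 1/len(faces).
    """
    if q is None:
        q = 1.0 / len(faces)
    counts = Counter(rolls)
    total = len(rolls)
    return {i: (counts[i] + m * q) / (total + m) for i in faces}

faces = range(1, 21)
# Hypothetical data: 20 rolls of a loaded d20 (invented for illustration).
rolls = [20, 20, 3, 20, 7, 20, 1, 20, 20, 13,
         20, 5, 20, 20, 9, 20, 2, 20, 20, 17]

p_mle = mle(rolls, faces)             # unseen faces get probability 0
p_m20 = smoothed(rolls, faces, m=20)  # one imagined roll per face
p_m1 = smoothed(rolls, faces, m=1)    # a much weaker prior
print(p_mle[4], p_m20[4], p_m1[4])
```

With only 20 real rolls, most faces are never observed, so the MLE assigns them probability zero; any m > 0 keeps every face possible, and larger m pulls the estimate harder toward q.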
7 Terminology (more later)
- This is called a uniform Dirichlet prior.
- C(i) and C(ANY) are sufficient statistics.
$$\hat{\Pr}(i) = \frac{C(i) + mq}{C(\text{ANY}) + m}$$
- Tom's notes are different.
- MLE = maximum likelihood estimate
- MAP = maximum a posteriori estimate
8 Some differences.
William: estimate each probability Pr(i) associated with a multinomial with MLE as
$$\hat{\Pr}(i) = \frac{C(i)}{C(\text{ANY})}$$
for C(i) = count of times you saw i, and estimate it with MAP as
$$\hat{\Pr}(i) = \frac{C(i) + mq}{C(\text{ANY}) + m}$$
Tom: estimate θ = P(heads) for a binomial with MLE as
$$\hat{\theta} = \frac{\alpha_1}{\alpha_1 + \alpha_0}$$
where α_1 = #heads and α_0 = #tails, and with MAP as
$$\hat{\theta} = \frac{\alpha_1 + \gamma_1}{(\alpha_1 + \gamma_1) + (\alpha_0 + \gamma_0)}$$
where γ_1 = #imaginary heads and γ_0 = #imaginary tails.
9 Some apparent differences.
The two forms are the same estimate under the substitution:
- C(i) = α_1
- C(ANY) = α_0 + α_1
- m = γ_0 + γ_1 (confidence in the prior)
- q = γ_1 / (γ_0 + γ_1) (the prior)
William's form, in terms of q and m, emphasizes the prior. Tom's form, in terms of imaginary heads and tails, emphasizes the pseudo-data.
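A quick numeric check of that substitution, as a sketch. It assumes Tom's MAP estimate has the form (α1 + γ1) / ((α1 + γ1) + (α0 + γ0)); the function names and the counts are hypothetical.

```python
def map_william(c_i, c_any, m, q):
    """(C(i) + m*q) / (C(ANY) + m)."""
    return (c_i + m * q) / (c_any + m)

def map_tom(a1, a0, g1, g0):
    """(heads + imaginary heads) / (all real and imaginary flips).

    Assumed form of the estimate in Tom's notation.
    """
    return (a1 + g1) / ((a1 + g1) + (a0 + g0))

a1, a0 = 7, 3    # hypothetical observed heads / tails
g1, g0 = 18, 42  # hypothetical imaginary heads / tails

m = g0 + g1          # total pseudo-count: confidence in the prior
q = g1 / (g0 + g1)   # prior probability of heads

w = map_william(a1, a0 + a1, m, q)
t = map_tom(a1, a0, g1, g0)
print(w, t)  # the two values agree up to floating-point rounding
```

Here m = 60 and q = 0.3, so the pseudo-data (18 imaginary heads, 42 imaginary tails) is just another way of writing "60 imagined samples with q = 0.3".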
10 [Figure: two panels, imagined m=60 samples with q = 0.3 and imagined m=60 samples with q = 0.4]
11 [Figure: two panels, imagined m=120 samples with q = 0.3 and imagined m=120 samples with q = 0.4]
12 Why we call this a MAP
Simple case: replace the die with a coin.
- Now there's one parameter: q = P(H).
- I start with a prior over q, P(q).
- I get some data: D = {D1=H, D2=T, ...}.
- I compute the maximum of the posterior of q.
MLE estimate:
$$\arg\max_q P(D \mid q)$$
MAP estimate:
$$\arg\max_q P(q \mid D) = \arg\max_q \frac{P(D \mid q)\,P(q)}{P(D)} = \arg\max_q P(D \mid q)\,P(q)$$
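The two argmax expressions can be compared numerically with a brute-force search over a grid of q values. This is a sketch only: the prior shape (proportional to q^5 (1-q)^5), the data counts, and all names are assumptions chosen for illustration.

```python
import math

def log_likelihood(q, heads, tails):
    """log P(D | q) for independent coin flips."""
    return heads * math.log(q) + tails * math.log(1 - q)

def log_prior(q, imag_heads, imag_tails):
    """Hypothetical prior, proportional to q**imag_heads * (1-q)**imag_tails."""
    return imag_heads * math.log(q) + imag_tails * math.log(1 - q)

# Candidate values of q, excluding 0 and 1 so the logs are defined.
grid = [k / 1000 for k in range(1, 1000)]

heads, tails = 3, 1            # hypothetical data D
imag_heads, imag_tails = 5, 5  # hypothetical prior strength

q_mle = max(grid, key=lambda q: log_likelihood(q, heads, tails))
q_map = max(
    grid,
    key=lambda q: log_likelihood(q, heads, tails)
    + log_prior(q, imag_heads, imag_tails),
)
print(q_mle, q_map)
```

P(D) does not depend on q, so it is dropped from the MAP objective, exactly as in the last step of the derivation. The MAP value lands between the MLE (3/4) and the prior's peak (1/2).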
13 Why we call this a MAP
Simple case: replace the die with a coin.
- Now there's one parameter: q = P(H).
- I start with a prior over q, P(q).
- I get some data: D = {D1=H, D2=T, ...}.
- I compute the posterior of q.
The math works if the pdf of P(q) is
P(x) = [equation not preserved in the transcript]
where α+1, β+1 are counts of imaginary pos/neg examples.
14 Why we call this a MAP
The math works if the pdf is
P(x) = [equation not preserved in the transcript]
15 Why we call this a MAP
- This is called a beta distribution.
- The generalization to multinomials is called a Dirichlet distribution.
- Parameters are [not preserved in the transcript]; f(x_1, ..., x_K) = [equation not preserved in the transcript]
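For reference, here is the coin case worked through with the standard textbook form of the beta density. The symbols a, b (prior parameters) and h, t (observed heads and tails) are introduced here for illustration, and this parameterization may differ from the slide's by a shift of one in each parameter.

```latex
% Prior: Beta(a, b) in the standard parameterization
P(q) = \frac{q^{a-1}(1-q)^{b-1}}{B(a,b)}, \qquad 0 \le q \le 1

% Likelihood of h heads and t tails
P(D \mid q) = q^{h}(1-q)^{t}

% Posterior is again a beta density: Beta(h + a, t + b)
P(q \mid D) \propto P(D \mid q)\,P(q) \propto q^{h+a-1}(1-q)^{t+b-1}

% MAP estimate: the mode of the posterior (valid when h + a > 1 and t + b > 1)
\hat{q}_{\mathrm{MAP}} = \arg\max_q P(q \mid D) = \frac{h + a - 1}{h + t + a + b - 2}

% Matching the smoothing formula (C(i) + mq') / (C(ANY) + m):
% C(i) = h, \quad C(\mathrm{ANY}) = h + t, \quad m = a + b - 2, \quad m q' = a - 1
```

In this convention the imaginary heads and tails correspond to a-1 and b-1. Because the posterior stays in the beta family, the prior acts exactly like extra counts added to the data, which is why the smoothed estimate is a MAP estimate.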
16 KEY CONCEPTS IN PROBABILITY: THE JOINT DISTRIBUTION
17 Some practical problems
I have three d6 dice: 1 standard fair die and 2 loaded dice, one loaded high and one loaded low.
- Loaded high: P(X=6) = 0.50
- Loaded low: P(X=1) = 0.50
Experiment: pick two of the dice uniformly at random (A) and roll them. Which is more likely: rolling a seven or rolling doubles?
Three combinations: HL, HF, FL. For an event D such as doubles:
$$P(D) = P(D \wedge A{=}HL) + P(D \wedge A{=}HF) + P(D \wedge A{=}FL)$$
$$P(D) = P(D \mid A{=}HL)\,P(A{=}HL) + P(D \mid A{=}HF)\,P(A{=}HF) + P(D \mid A{=}FL)\,P(A{=}FL)$$
18 A brute-force solution
A    Roll 1   Roll 2   P                    Comment
FL   1        1        1/3 * 1/6 * 1/2      doubles
FL   1        2        1/3 * 1/6 * 1/10
FL   1        ...
FL   1        6                             seven
FL   2        1
FL   2        ...
FL   6        6                             doubles
HL   1        1
HL   1        2
...
HF   1        1                             doubles
...
A joint probability table shows P(X1=x1 and ... and Xk=xk) for every possible combination of values x1, x2, ..., xk.
With this you can compute any P(A) where A is any boolean combination of the primitive events (Xi=xi), e.g.
- P(doubles)
- P(seven or eleven)
- P(total is higher than 5)
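The brute-force table is small enough to build and query directly. The sketch below assumes each loaded die spreads its remaining probability evenly over the other five faces (0.1 each, consistent with the 1/10 entry in the table) and that each pair of dice is chosen with probability 1/3; the names are illustrative.

```python
from itertools import product

FAIR = {x: 1 / 6 for x in range(1, 7)}
HIGH = {x: (0.5 if x == 6 else 0.1) for x in range(1, 7)}
LOW = {x: (0.5 if x == 1 else 0.1) for x in range(1, 7)}

# A = which pair was picked; Roll 1 uses the first die, Roll 2 the second.
PAIRS = {"HL": (HIGH, LOW), "HF": (HIGH, FAIR), "FL": (FAIR, LOW)}

def joint_table():
    """One row (A, roll1, roll2, probability) per combination of values."""
    rows = []
    for name, (d1, d2) in PAIRS.items():
        for r1, r2 in product(range(1, 7), repeat=2):
            rows.append((name, r1, r2, (1 / 3) * d1[r1] * d2[r2]))
    return rows

def prob(event, rows):
    """P(event): sum the probabilities of the rows where the event holds."""
    return sum(p for (a, r1, r2, p) in rows if event(a, r1, r2))

rows = joint_table()
p_doubles = prob(lambda a, r1, r2: r1 == r2, rows)
p_seven = prob(lambda a, r1, r2: r1 + r2 == 7, rows)
p_seven_or_eleven = prob(lambda a, r1, r2: r1 + r2 in (7, 11), rows)
print(len(rows), p_doubles, p_seven)
```

Under these assumptions a seven comes out more likely than doubles (about 0.211 versus 0.158). The difference comes entirely from the HL pair, where a 6 on the high die and a 1 on the low die is the single most likely outcome.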
22 The Joint Distribution
Example: Boolean variables A, B, C
Recipe for making a joint distribution of M variables:
1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).
2. For each combination of values, say how probable it is.
3. If you subscribe to the axioms of probability, those numbers must sum to 1.
[Table with columns A, B, C, Prob]
23 Estimating The Joint Distribution
Example: Boolean variables A, B, C
Recipe for making a joint distribution of M variables:
1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).
2. For each combination of values, estimate how probable it is from data.
3. If you subscribe to the axioms of probability, those numbers must sum to 1.
[Table with columns A, B, C, Prob]
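The recipe above can be sketched as follows. The function name `estimate_joint`, the optional smoothing argument `m`, and the four observations are assumptions made for illustration; with m = 0 the estimate is the plain relative frequency of each row.

```python
from collections import Counter
from itertools import product

def estimate_joint(records, m=0.0):
    """Estimate a joint distribution over M Boolean variables from data.

    records: list of M-tuples of bools, one per observation.
    m: number of imagined observations spread uniformly over all 2^M rows.
    """
    n_vars = len(records[0])
    combos = list(product([False, True], repeat=n_vars))  # step 1: truth table
    counts = Counter(tuple(r) for r in records)
    q = 1 / len(combos)
    total = len(records)
    # Step 2: estimate each row's probability from (optionally smoothed) counts.
    return {c: (counts[c] + m * q) / (total + m) for c in combos}

# Hypothetical observations of (A, B, C).
data = [
    (True, True, False),
    (True, True, False),
    (False, True, True),
    (False, False, False),
]
joint = estimate_joint(data)
joint_smoothed = estimate_joint(data, m=8)
print(len(joint), joint[(True, True, False)], sum(joint.values()))  # step 3: sums to 1
```

Even in this toy case, five of the eight rows have no data at all, which previews the "takes a lot of data to train" problem raised on the next slide.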
24 Pros and Cons of the Joint Distribution
Pros:
- You can do a lot with it! Answer any query Pr(Y1, Y2, ... | X1, X2, ...).
Cons:
- It takes up a lot of room!
- It takes a lot of data to train!
- It can be expensive to use.
The big question: how do you simplify (approximate, compactly store, ...) the joint and still be able to answer interesting queries?
25 Density Estimation
- Our Joint Distribution learner is our first example of something called Density Estimation.
- A Density Estimator learns a mapping from a set of attribute values to a probability.
Input Attributes → Density Estimator → Probability
Copyright Andrew W. Moore
26 Density Estimation: looking ahead
Compare it to two other major kinds of models:
- Input Attributes → Classifier → Prediction of categorical output or class (one of a few discrete values)
- Input Attributes → Density Estimator → Probability
- Input Attributes → Regressor → Prediction of real-valued output
28 Another example
Starting point: Google Books 5-gram data
- All 5-grams that appear >= 40 times in a corpus of 1M English books
- 30Gb compressed, Gb uncompressed
- Each 5-gram contains a frequency distribution over years (which I ignored)
- Pulled out counts for all 5-grams (A, B, C, D, E) where C=affect or C=effect, and turned this into a joint probability table
29 Some of the Joint Distribution
Columns: A B C D E p (the p values are not preserved in the transcript)
- is the effect of the
- is the effect of a
- The effect of this
- to this effect :
- be the effect of the
- not the effect of any
- does not affect the general
- does not affect the question
- any manner affect the principle
About 50k more rows, which summarize 90M 5-gram instances in text.
30 Example queries
Pr(C)?
Columns: c, Pr(C=c) (values not preserved)
- C=effect
- C=affect
- C=Effect
- C=EFFECT
- C=effecT
31 Example queries
Pr(B | C=affect)?
Columns: b, Pr(B=b | C=affect) (values not preserved)
- B=not
- B=to
- B=may
- B=they
- B=which
32 Example queries
Pr(C | B=not, D=the)?
Columns: c, Pr(C=c | B=not, D=the) (values not preserved)
- C=affect
- C=effect
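Queries like these reduce to summing rows of the joint table and renormalizing. The sketch below uses a made-up four-row joint over (A, B, C, D, E); the probabilities and the `query` helper are illustrative and are not the lecture's data.

```python
from collections import defaultdict

VARS = ("A", "B", "C", "D", "E")

def query(joint, target, evidence=None):
    """Return Pr(target | evidence) as a dict mapping value -> probability.

    joint: dict mapping (a, b, c, d, e) tuples to probabilities.
    evidence: dict such as {"B": "not", "D": "the"}; None means a marginal.
    """
    evidence = evidence or {}
    t = VARS.index(target)
    totals = defaultdict(float)
    for row, p in joint.items():
        # Keep only the rows that agree with every piece of evidence.
        if all(row[VARS.index(v)] == val for v, val in evidence.items()):
            totals[row[t]] += p
    z = sum(totals.values())  # Pr(evidence)
    if z == 0:
        return {}  # no rows match: the conditional is undefined
    return {val: p / z for val, p in totals.items()}

# Hypothetical toy joint (made-up probabilities that sum to 1).
joint = {
    ("is", "the", "effect", "of", "the"): 0.4,
    ("be", "the", "effect", "of", "the"): 0.2,
    ("does", "not", "affect", "the", "general"): 0.3,
    ("is", "not", "effect", "the", "same"): 0.1,
}

pr_c = query(joint, "C")
pr_b_given_affect = query(joint, "B", {"C": "affect"})
pr_c_given_bd = query(joint, "C", {"B": "not", "D": "the"})
print(pr_c, pr_b_given_affect, pr_c_given_bd)
```

The empty-dict return for unmatched evidence is a design choice for this sketch; it corresponds to the "no counts at all" situation that motivates backing off to less specific patterns.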
33 Density Estimation As a Classifier
- Input Attributes → Classifier → Prediction of categorical output or class (one of a few discrete values)
- Input Attributes → Density Estimator → Probability $P(X_1{=}x_1, \dots, X_n{=}x_n)$
- Input Attributes + Class Y → Density Estimator → Probabilities $P(Y{=}y_1 \mid X_1{=}x_1, \dots, X_n{=}x_n)$, ..., $P(Y{=}y_k \mid X_1{=}x_1, \dots, X_n{=}x_n)$
Predict: $f(X_1{=}x_1, \dots, X_n{=}x_n) = \arg\max_{y_i} P(Y{=}y_i \mid X_1{=}x_1, \dots, X_n{=}x_n)$
34 An experiment: how useful is the brute-force joint classifier?
- Test set: extracted all uses of "affect" or "effect" in a 20k-document newswire corpus: about 723 n-grams, 661 distinct.
- Tried to predict the center word C with $\arg\max_c \Pr(C{=}c \mid A{=}a, B{=}b, D{=}d, E{=}e)$, using the joint estimated from the Google n-gram data.
35 Poll time
36 Example queries
How many errors would I expect in 100 trials if my classifier always just guesses the most frequent class?
Columns: c, Pr(C=c) (values not preserved)
- C=effect
- C=affect
- C=Effect
- C=EFFECT
- C=effecT
37 Performance summary
Columns: Pattern, Used, Errors (values not preserved)
- P(C | A,B,D,E)
But: no counts at all for a,b,c,d for 622 of the 723 instances!
38 Slightly fancier idea.
Tried to predict the center word with:
- Pr(C | A=a, B=b, D=d, E=e)
- then P(C | A,B,D) if there's no data for that
- then P(C | B,D) if there's no data for that
- then P(C | B)
- then P(C)
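One way to implement this back-off chain is to keep a separate count table per pattern and use the first one that has data for the test context. This is a sketch with invented training counts; `PATTERNS`, `train`, and `predict` are hypothetical names, and the real experiment's data handling may have differed.

```python
from collections import Counter, defaultdict

# Context is (a, b, d, e); each pattern lists the positions it conditions on,
# ordered from most to least specific: (A,B,D,E), (A,B,D), (B,D), (B), ().
PATTERNS = [(0, 1, 2, 3), (0, 1, 2), (1, 2), (1,), ()]

def train(instances):
    """instances: iterable of ((a, b, d, e), center_word, count)."""
    tables = [defaultdict(Counter) for _ in PATTERNS]
    for context, c, count in instances:
        for table, pattern in zip(tables, PATTERNS):
            key = tuple(context[i] for i in pattern)
            table[key][c] += count
    return tables

def predict(tables, context):
    """Return (word, estimated probability, pattern used)."""
    for table, pattern in zip(tables, PATTERNS):
        key = tuple(context[i] for i in pattern)
        counts = table.get(key)  # .get avoids creating empty entries
        if counts:
            word, n = counts.most_common(1)[0]
            return word, n / sum(counts.values()), pattern
    return None, 0.0, None

# Hypothetical training counts (invented for illustration).
train_data = [
    (("is", "the", "of", "the"), "effect", 60),
    (("does", "not", "the", "general"), "affect", 30),
    (("did", "not", "the", "outcome"), "affect", 45),
    (("is", "not", "the", "same"), "effect", 25),
]
tables = train(train_data)

seen = predict(tables, ("is", "the", "of", "the"))
backed_off = predict(tables, ("would", "not", "the", "vote"))
unseen = predict(tables, ("x", "y", "z", "w"))
print(seen, backed_off, unseen)
```

The second query has no match for the full context or for (A, B, D), so it falls through to P(C | B, D); the third matches nothing and ends at the unconditional P(C).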
39 EXAMPLES
- "The cumulative _ of the" → effect (1.0)
- "Go into _ on January" → effect (1.0)
- "From cumulative _ of accounting": not present in train data
  - Nor is "From cumulative _ of _"
  - But "_ cumulative _ of _" → effect (1.0)
- "Would not _ Finance Minister": not present
  - But "_ not _" → affect (0.9625)
40 Performance summary
Columns: Pattern, Used, Errors (values not preserved)
- P(C | A,B,D,E)
- P(C | A,B,D)
- P(C | B,D)
- P(C | B)
- P(C)
% error 5% error 15% error
More informationIntroduction to Machine Learning
Introduction to Machine Learning CS4375 --- Fall 2018 Bayesian a Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell 1 Uncertainty Most real-world problems deal with
More informationBayesian Networks. Motivation
Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationReview of Probabilities and Basic Statistics
Alex Smola Barnabas Poczos TA: Ina Fiterau 4 th year PhD student MLD Review of Probabilities and Basic Statistics 10-701 Recitations 1/25/2013 Recitation 1: Statistics Intro 1 Overview Introduction to
More information2/3/04. Syllabus. Probability Lecture #2. Grading. Probability Theory. Events and Event Spaces. Experiments and Sample Spaces
Probability Lecture #2 Introduction to Natural Language Processing CMPSCI 585, Spring 2004 University of Massachusetts Amherst Andrew McCallum Syllabus Probability and Information Theory Spam filtering
More informationMachine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014
Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,
More informationIntroduction to Machine Learning
Uncertainty Introduction to Machine Learning CS4375 --- Fall 2018 a Bayesian Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell Most real-world problems deal with
More informationSAMPLE CHAPTER. Avi Pfeffer. FOREWORD BY Stuart Russell MANNING
SAMPLE CHAPTER Avi Pfeffer FOREWORD BY Stuart Russell MANNING Practical Probabilistic Programming by Avi Pfeffer Chapter 9 Copyright 2016 Manning Publications brief contents PART 1 INTRODUCING PROBABILISTIC
More informationCOS 424: Interacting with Data. Lecturer: Dave Blei Lecture #11 Scribe: Andrew Ferguson March 13, 2007
COS 424: Interacting with ata Lecturer: ave Blei Lecture #11 Scribe: Andrew Ferguson March 13, 2007 1 Graphical Models Wrap-up We began the lecture with some final words on graphical models. Choosing a
More informationBayesian classification CISC 5800 Professor Daniel Leeds
Bayesian classification CISC 5800 Professor Daniel Leeds Classifying with robabilities Examle goal: Determine is it cloudy out Available data: Light detector: x 0,25 Potential class (atmosheric states):
More informationWhy Probability? It's the right way to look at the world.
Probability Why Probability? It's the right way to look at the world. Discrete Random Variables We denote discrete random variables with capital letters. A boolean random variable may be either true or
More information