CS 6140: Machine Learning Spring 2016
1 CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Information Science Northeastern University Webpage:
2 Logistics Assignment 1 Due Feb 4 Electronic copy on Blackboard Hard copy in class If you have discussed a problem with someone or got the idea from other sources (e.g. academic publications, lectures, textbooks), you need to acknowledge it! Northeastern University Academic Integrity Policy http://
3 Survey What do you expect to learn from this course? Content of the course Difficulty of the material Difficulty of the assignments Amount of programming
4 What We Learned Last Week Generative Models and Discriminative Models Logistic Regression Generative Models Generative Models vs. Discriminative Models Decision Tree
5 Generative vs. Discriminative Model Generative model: Learn P(X, Y) from training samples P(X, Y) = P(Y)P(X|Y) Specifies how to generate the observed features x for y Discriminative model: Learn P(Y|X) from training samples Directly models the mapping from features x to y
6 Generative vs. Discriminative Model Easy to fit the model
7 Generative vs. Discriminative Model Easy to fit the model Generative model!
8 Generative vs. Discriminative Model Fit classes separately
9 Generative vs. Discriminative Model Fit classes separately Generative model!
10 Generative vs. Discriminative Model Handle missing features easily
11 Generative vs. Discriminative Model Handle missing features easily Generative model!
12 Generative vs. Discriminative Model Handle unlabeled training data
13 Generative vs. Discriminative Model Handle unlabeled training data Easier for generative model!
14 Generative vs. Discriminative Model Symmetric in inputs and outputs
15 Generative vs. Discriminative Model Symmetric in inputs and outputs Generative model! Define p(x, y)
16 Generative vs. Discriminative Model Handle feature preprocessing
17 Generative vs. Discriminative Model Handle feature preprocessing Discriminative model!
18 Generative vs. Discriminative Model Well-calibrated probabilities
19 Generative vs. Discriminative Model Well-calibrated probabilities Discriminative model!
20 Logistic Regression A discriminative model; sigm is the sigmoid function
21 Logistic Regression
22 Bayesian Inference
23 Bayes Rule
24 Play tennis? Decision Tree
25 Entropy Entropy H(X) of a random variable X H(X) is the expected number of bits needed to encode a randomly drawn value of X (under most efficient code)
26 Information Gain Gain(S, A) = expected reduction in entropy due to sorting on A
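Slides 25-26 can be made concrete with a small sketch (written for this transcription; the `entropy` and `info_gain` helpers are illustrative, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """H(X): expected number of bits to encode a randomly drawn value of X."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, attribute_values):
    """Gain(S, A): expected reduction in entropy due to sorting on attribute A."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute_values, labels):
        groups.setdefault(a, []).append(y)
    # Weighted average entropy of the subsets induced by A.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

An attribute that splits the labels perfectly recovers the full entropy as gain; an uninformative attribute has gain zero.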
27 Today's Outline Bayesian Statistics Frequentist Statistics Feature Selection Some slides are borrowed from Kevin Murphy's lectures
28 Fundamental principle of Bayesian statistics Everything that is uncertain is modeled with a probability distribution. Parameters Hyper-parameters Everything that is known is incorporated by conditioning on it, using Bayes rule to update our prior beliefs into posterior beliefs.
29 Fundamental principle of Bayesian statistics Everything that is uncertain is modeled with a probability distribution. Parameters Hyper-parameters Everything that is known is incorporated by conditioning on it, using Bayes rule to update our prior beliefs into posterior beliefs. Posterior Prior Likelihood
30 Advantages of Bayes Conceptually simple Handles small sample sizes Handles complex hierarchical models without overfitting No need to choose between different estimators or hypothesis-testing procedures
31 Disadvantages of Bayes Need to specify a prior! Computational issues!
32 Disadvantages of Bayes Need to specify a prior! Subjective, but every model comes with its own assumptions Estimate the prior from data -> empirical Bayes
33 Disadvantages of Bayes Computational issues! Computing the normalization constant requires integrating over all the parameters Computing posterior expectations requires integrating over all the parameters
34 Approximate inference We can evaluate posterior expectations using Monte Carlo integration
35 Monte Carlo Approximation In general, computing the distribution of a function of a random variable using the change of variables is difficult. A powerful alternative: generate samples from the distribution, then use Monte Carlo to approximate the expected value of any function of the random variable
36 Monte Carlo Approximation Many useful functions that we can approximate
37 Monte Carlo Approximation Suppose we have and We can approximate p(y) by drawing samples from p(x), squaring them, and computing the empirical distribution.
38 Monte Carlo Approximation Suppose we have and p(y)
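The squaring example on slides 37-38 is easy to reproduce; a minimal sketch (assuming, as in Murphy's treatment, x ~ Uniform(-1, 1) and y = x², which the transcription elides):

```python
import random

random.seed(0)

# Draw samples from p(x), push them through y = x**2, and use the
# empirical distribution of the y's to approximate p(y) and its moments.
xs = [random.uniform(-1, 1) for _ in range(100_000)]
ys = [x * x for x in xs]

mean_y = sum(ys) / len(ys)                          # estimates E[x^2] = 1/3
frac_small = sum(y < 0.25 for y in ys) / len(ys)    # estimates P(y < 0.25) = P(|x| < 0.5) = 0.5
```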
39 Disadvantages of Bayes Computational issues! Computing the normalization constant requires integrating over all the parameters Computing posterior expectations requires integrating over all the parameters
40 Conjugate priors For simplicity, we will mostly focus on a special kind of prior which has nice mathematical properties. A prior is said to be conjugate to a likelihood if the corresponding posterior has the same functional form as the prior.
41 Conjugate priors This means the prior family is closed under Bayesian updating. We can recursively apply the rule to update our beliefs as data streams in -> online learning
42 Coin Tossing Example Consider the problem of estimating the probability of heads from a sequence of N coin tosses: Likelihood Prior Posterior
43 Likelihood: Binomial distribution Let X = number of heads in N trials.
45 Likelihood: Bernoulli Distribution Special case of Binomial The Binomial distribution with N=1 is called the Bernoulli distribution. Specifically,
46 Fitting a Bernoulli distribution Suppose we conduct N=100 trials and get data D = (1, 0, 1, 1, 0, ...) with N1 heads and N0 tails. What is the probability of heads?
47 Fitting a Bernoulli distribution Suppose we conduct N=100 trials and get data D = (1, 0, 1, 1, 0, ...) with N1 heads and N0 tails. What is the probability of heads? Maximum likelihood estimation
48 Fitting a Bernoulli distribution
49 Fitting a Bernoulli distribution Log-likelihood
50 Fitting a Bernoulli distribution Log-likelihood
51 Fitting a Bernoulli distribution
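The derivation on slides 46-51 reduces to counting heads; a sketch (the data sequence here is hypothetical, not the slides'):

```python
import math

def bernoulli_mle(data):
    """Maximize the Bernoulli likelihood: theta_hat = N1 / N."""
    return sum(data) / len(data)

def log_likelihood(theta, data):
    """Bernoulli log-likelihood: N1*log(theta) + N0*log(1 - theta)."""
    n1 = sum(data)
    n0 = len(data) - n1
    return n1 * math.log(theta) + n0 * math.log(1 - theta)

data = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # N1 = 7 heads, N0 = 3 tails
theta_hat = bernoulli_mle(data)          # 0.7
```

No other value of theta achieves a higher log-likelihood on this data.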
52 Conjugate priors: The beta-Bernoulli model Consider the probability of heads, given a sequence of N coin tosses, X1, ..., XN. Likelihood Natural conjugate prior is the Beta distribution Posterior is also Beta, with updated counts
53 The beta distribution Beta distribution Beta function
54 Beta distribution The beta distribution
55 Updating a beta distribution Prior is Beta(2,2). Observe 1 head. Posterior is Beta(3,2), so the mean shifts from 2/4 to 3/5. Prior is Beta(3,2). Observe 1 head. Posterior is Beta(4,2), so the mean shifts from 3/5 to 4/6.
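The updates on slide 55 are just count additions; a sketch:

```python
def beta_update(a, b, heads, tails):
    """Conjugate update: Beta(a, b) prior + coin data -> Beta(a + heads, b + tails)."""
    return a + heads, b + tails

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Prior Beta(2, 2), observe 1 head: posterior Beta(3, 2), mean 2/4 -> 3/5.
a, b = beta_update(2, 2, heads=1, tails=0)
```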
56 Setting the hyper-parameters The prior hyper-parameters can be interpreted as pseudo counts The effective sample size (strength) of the prior is The prior mean is If our prior belief is p(heads) = 0.3, and we think this belief is equivalent to about 10 data points, we just solve
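Solving slide 56's system (prior mean 0.3, strength 10) gives pseudo counts 3 and 7; a sketch:

```python
def beta_hyperparams(prior_mean, strength):
    """Pseudo counts of a Beta prior with the given mean and effective sample size:
    solve alpha1 / (alpha1 + alpha0) = prior_mean and alpha1 + alpha0 = strength."""
    alpha1 = prior_mean * strength
    return alpha1, strength - alpha1

a1, a0 = beta_hyperparams(0.3, 10)   # (3.0, 7.0)
```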
57 Point Estimation The posterior is our belief state. To convert it to a single best guess (point estimate), we pick the value that minimizes some loss function, e.g., MSE -> posterior mean, 0/1 loss -> posterior mode
58 Posterior Mean Let N = N1 + N0 be the amount of data, and be the amount of virtual data The posterior mean is a convex combination of the prior mean and the MLE N1/N Prior MLE
59 MAP Estimation It is often easier to compute the posterior mode (optimization) than the posterior mean (integration). This is called maximum a posteriori estimation. For the beta distribution
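Slides 58-59: both point estimates have closed forms in the beta-Bernoulli model; a sketch with hypothetical numbers:

```python
def posterior_mean(a1, a0, n1, n0):
    """Posterior mean: a convex combination of the prior mean and the MLE n1/(n1+n0)."""
    return (a1 + n1) / (a1 + a0 + n1 + n0)

def map_estimate(a1, a0, n1, n0):
    """Posterior mode of Beta(a1+n1, a0+n0); assumes both parameters exceed 1."""
    return (a1 + n1 - 1) / (a1 + a0 + n1 + n0 - 2)

# Prior Beta(2, 2) (mean 0.5); data N1 = 3, N0 = 7 (MLE 0.3).
mean = posterior_mean(2, 2, 3, 7)   # 5/14, between the MLE and the prior mean
mode = map_estimate(2, 2, 3, 7)     # 4/12
```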
60 Summary of the beta-Bernoulli model
61 Bayesian Model Selection Faced with a set of models of different complexity, how should we choose?
62 Bayesian Model Selection Cross-validation Divide the training set into N partitions Train on N-1 partitions, and evaluate on the rest In total, fitting the model N times
63 Bayesian Model Selection Compute posterior Then compute MAP
64 Bayesian Model Selection Compute posterior Uniform prior over models Then we are picking the model which maximizes the marginal likelihood, integrated likelihood, or evidence
65 Bayes Factors To compare two models, use posterior odds Bayes factor The Bayes factor is a Bayesian version of a likelihood ratio test that can be used to compare models of different complexity
66 Example: Coin Flipping Suppose we toss a coin N=250 times and observe N1=141 heads and N0=109 tails
67 Example: Coin Flipping Suppose we toss a coin N=250 times and observe N1=141 heads and N0=109 tails Consider two hypotheses: H0: H1:
68 Example: Coin Flipping
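For slides 66-68 the marginal likelihoods have closed forms: p(D | H0) = 0.5^N for a fair coin, and p(D | H1) = B(N1+1, N0+1) / B(1, 1) under a uniform Beta(1, 1) prior on theta (the uniform prior is an assumption here, matching Murphy's treatment). A sketch:

```python
import math

def log_beta_fn(a, b):
    """log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

N1, N0 = 141, 109
N = N1 + N0

log_ml_h0 = N * math.log(0.5)                                 # H0: theta fixed at 0.5
log_ml_h1 = log_beta_fn(N1 + 1, N0 + 1) - log_beta_fn(1, 1)   # H1: theta ~ Beta(1, 1)

# Log Bayes factor of H1 over H0. It comes out slightly negative, so the
# simpler fair-coin hypothesis is mildly preferred -- Bayesian Occam's razor.
log_bf = log_ml_h1 - log_ml_h0
```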
69 Bayesian Occam's Razor Occam's Razor
70 Bayesian Occam's Razor Occam's Razor Simplest model that adequately explains the data
71 Bayesian Occam's Razor Occam's Razor Simplest model that adequately explains the data Selecting models with MLE or MAP estimates of the parameters would always favor the model with the most parameters Integrate out the parameters!
72 Bayesian Occam's Razor Overfitting early samples
73 Bayesian Occam's Razor Probability over all possible datasets Complex models must spread out their probability mass thinly
74 Bayesian Occam's Razor Complex models must spread out their probability mass thinly
75 Marginal likelihood When performing Bayesian model selection and empirical Bayes estimation, we will need This is given by a ratio of the posterior and prior normalizing constants
76 Summary of the beta-Bernoulli model
77 From coins to dice
78 Multinomial: 1 sample One-hot encoding Probability for class k
79 Likelihood
80 Conjugate Prior: Dirichlet distribution Generalization of Beta to K dimensions Normalization constant
81 Conjugate Prior: Dirichlet distribution Generalization of Beta to K dimensions (20, 20, 20) (2, 2, 2) (20, 2, 2)
82 Summary of the Dirichlet-multinomial model
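Slides 80-82: the Dirichlet update mirrors the beta case, category by category; a sketch with hypothetical die-roll counts:

```python
def dirichlet_update(alphas, counts):
    """Conjugate update: Dirichlet(alphas) prior + multinomial counts
    -> Dirichlet(alphas + counts), added per category."""
    return [a + n for a, n in zip(alphas, counts)]

def dirichlet_mean(alphas):
    """Posterior mean: each alpha normalized by the total."""
    total = sum(alphas)
    return [a / total for a in alphas]

# Uniform Dirichlet(1, ..., 1) prior over a six-sided die, 60 observed rolls.
posterior = dirichlet_update([1] * 6, [9, 11, 10, 8, 12, 10])
means = dirichlet_mean(posterior)
```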
83 Frequentist Statistics We have seen how Bayesian inference offers a principled solution to the parameter estimation problem.
84 Frequentist Statistics Parameter estimation MAP estimate MLE
85 Why maximum likelihood? The KL divergence from the true distribution p to the approximation q is
86 Why maximum likelihood? The KL divergence from the true distribution p to the approximation q is Empirical distribution
87 Maximum Likelihood = min KL (to empirical distribution) KL divergence to the empirical distribution
88 Maximum Likelihood = min KL (to empirical distribution) KL divergence to the empirical distribution Hence minimizing KL is equivalent to minimizing the average negative log likelihood on the training set
89 Bernoulli MLE Remember that
90 However Suppose we toss a coin N=3 times and see 3 tails. We would estimate the probability of heads as 0.
91 However Suppose we toss a coin N=3 times and see 3 tails. We would estimate the probability of heads as 0. Too few samples -> sparse data!
92 However Suppose we toss a coin N=3 times and see 3 tails. We would estimate the probability of heads as 0. We can add pseudo counts C0 and C1 (e.g., 0.1) to the sufficient statistics N0 and N1 to get a better behaved estimate. This is the MAP estimate using a Beta prior.
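The fix on slide 92 is one line; a sketch:

```python
def smoothed_estimate(n1, n0, c1=0.1, c0=0.1):
    """Add pseudo counts to the sufficient statistics before normalizing.
    Equivalent to the MAP estimate under a Beta(c1 + 1, c0 + 1) prior."""
    return (n1 + c1) / (n1 + n0 + c1 + c0)

p_mle = 0 / 3                      # 3 tails, 0 heads: the MLE says heads is impossible
p_map = smoothed_estimate(0, 3)    # small but nonzero
```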
93 MLE for the multinomial If xn ∈ {1, ..., K}, the likelihood is The log-likelihood is
94 Computing the multinomial MLE
95 Computing the multinomial MLE
96 Computing the Gaussian MLE
97 Computing the Gaussian MLE
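Slides 96-97: the Gaussian MLE is the sample mean and the divide-by-N variance; a sketch with hypothetical data:

```python
def gaussian_mle(xs):
    """MLE for a univariate Gaussian: sample mean and biased (1/N) variance."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    return mu, sigma2

mu, sigma2 = gaussian_mle([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```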
98 Bayesian vs. Frequentist MLE returns a point estimate In frequentist statistics, we treat D as random and the parameter as fixed, and ask how the estimate would change if D changed. In Bayesian statistics, we treat D as fixed and the parameter as random, and model our uncertainty with the posterior
99 Unbiased estimators The bias of an estimator is defined as An estimator is unbiased if bias = 0.
100 Unbiased estimators The MLE for the Gaussian mean is unbiased
101 Is being unbiased enough?
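Slides 100-101: the MLE of the Gaussian mean is unbiased, but the MLE of the variance is not, since E[sigma_hat^2] = (N-1)/N * sigma^2. A quick simulation (an illustration written for this transcription, not from the slides):

```python
import random

random.seed(0)

# MLE variance for N = 2 standard-normal samples; its expectation is
# (N - 1) / N * sigma^2 = 0.5, not the true sigma^2 = 1.
trials = 200_000
total = 0.0
for _ in range(trials):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    m = (x1 + x2) / 2
    total += ((x1 - m) ** 2 + (x2 - m) ** 2) / 2
avg_mle_var = total / trials   # close to 0.5
```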
102 Consistent estimators An estimator is consistent if it converges (in probability) to the true value with enough data MLE is a consistent estimator.
103 Bias-variance tradeoff Being unbiased is not necessarily desirable! Suppose our loss function is mean squared error where
104 Feature Selection If predictive accuracy is the goal, it is often best to keep all predictors and use L2 regularization We often want to select a subset of the inputs that are most relevant for predicting the output, to get sparse models interpretability, speed, possibly better predictive accuracy
105 Filter methods Compute the relevance of each feature to the label marginally Computationally efficient
106 Correlation coefficient Measures the extent to which Xj and Y are linearly related
107 Correlation coefficient Mutual information Can model nonlinear, non-Gaussian dependencies For discrete data
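Slide 107's point, that mutual information catches dependencies the correlation coefficient misses, can be shown on discrete data (the example below is constructed for this transcription):

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X; Y) in bits from a list of (x, y) samples (plug-in estimate)."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Y depends on X deterministically but nonlinearly (with zero linear
# correlation), yet the mutual information is a full bit.
data = [(0, 0), (1, 1), (2, 1), (3, 0)] * 25
mi = mutual_information(data)
```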
108 Wrapper Methods Perform discrete search in model space Wrap the search around standard model fitting
109 Wrapper Methods Forward selection for linear regression At each step, add the feature that maximally reduces residual error
110 Wrapper Methods Forward selection for linear regression At each step, add the feature that maximally reduces residual error
111 Wrapper Methods Forward selection for linear regression Put the estimation in
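Slides 109-111 describe greedy forward selection; a sketch using least squares via NumPy (the synthetic data is hypothetical, not from the slides):

```python
import numpy as np

def rss(X, y, feats):
    """Residual sum of squares of a least-squares fit on the chosen columns."""
    w, *_ = np.linalg.lstsq(X[:, feats], y, rcond=None)
    return float(np.sum((y - X[:, feats] @ w) ** 2))

def forward_selection(X, y, k):
    """At each step, add the feature that maximally reduces residual error."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 2] + 0.5 * X[:, 4] + 0.01 * rng.normal(size=100)
order = forward_selection(X, y, 2)   # strongest predictor enters first
```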
112 What we learned today Bayesian Statistics Frequentist Statistics Feature Selection Some slides are borrowed from Kevin Murphy's lectures
113 Homework Read Murphy Ch. 5, 6 Assignment 1 due 02/04, 6pm! Both hard copy and electronic copy
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 1 Evalua:on
More informationDiscrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14
CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 Introduction One of the key properties of coin flips is independence: if you flip a fair coin ten times and get ten
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationCS 361: Probability & Statistics
October 17, 2017 CS 361: Probability & Statistics Inference Maximum likelihood: drawbacks A couple of things might trip up max likelihood estimation: 1) Finding the maximum of some functions can be quite
More informationMachine Learning & Data Mining CS/CNS/EE 155. Lecture 11: Hidden Markov Models
Machine Learning & Data Mining CS/CNS/EE 155 Lecture 11: Hidden Markov Models 1 Kaggle Compe==on Part 1 2 Kaggle Compe==on Part 2 3 Announcements Updated Kaggle Report Due Date: 9pm on Monday Feb 13 th
More informationProbability and Statistical Decision Theory
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Probability and Statistical Decision Theory Many slides attributable to: Erik Sudderth (UCI) Prof. Mike Hughes
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationPreliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com
1 School of Oriental and African Studies September 2015 Department of Economics Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com Gujarati D. Basic Econometrics, Appendix
More informationMachine Learning & Data Mining CS/CNS/EE 155. Lecture 8: Hidden Markov Models
Machine Learning & Data Mining CS/CNS/EE 155 Lecture 8: Hidden Markov Models 1 x = Fish Sleep y = (N, V) Sequence Predic=on (POS Tagging) x = The Dog Ate My Homework y = (D, N, V, D, N) x = The Fox Jumped
More informationCS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning
CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we
More informationA Brief Review of Probability, Bayesian Statistics, and Information Theory
A Brief Review of Probability, Bayesian Statistics, and Information Theory Brendan Frey Electrical and Computer Engineering University of Toronto frey@psi.toronto.edu http://www.psi.toronto.edu A system
More informationComputational Cognitive Science
Computational Cognitive Science Lecture 8: Frank Keller School of Informatics University of Edinburgh keller@inf.ed.ac.uk Based on slides by Sharon Goldwater October 14, 2016 Frank Keller Computational
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationSta$s$cal sequence recogni$on
Sta$s$cal sequence recogni$on Determinis$c sequence recogni$on Last $me, temporal integra$on of local distances via DP Integrates local matches over $me Normalizes $me varia$ons For cts speech, segments
More informationIntroduction to Machine Learning
Introduction to Machine Learning Generative Models Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationPriors in Dependency network learning
Priors in Dependency network learning Sushmita Roy sroy@biostat.wisc.edu Computa:onal Network Biology Biosta2s2cs & Medical Informa2cs 826 Computer Sciences 838 hbps://compnetbiocourse.discovery.wisc.edu
More informationParameter Es*ma*on: Cracking Incomplete Data
Parameter Es*ma*on: Cracking Incomplete Data Khaled S. Refaat Collaborators: Arthur Choi and Adnan Darwiche Agenda Learning Graphical Models Complete vs. Incomplete Data Exploi*ng Data for Decomposi*on
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationBayesian RL Seminar. Chris Mansley September 9, 2008
Bayesian RL Seminar Chris Mansley September 9, 2008 Bayes Basic Probability One of the basic principles of probability theory, the chain rule, will allow us to derive most of the background material in
More informationCSC 411 Lecture 3: Decision Trees
CSC 411 Lecture 3: Decision Trees Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 03-Decision Trees 1 / 33 Today Decision Trees Simple but powerful learning
More informationA.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I. kevin small & byron wallace
A.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I kevin small & byron wallace today a review of probability random variables, maximum likelihood, etc. crucial for clinical
More informationClass Notes. Examining Repeated Measures Data on Individuals
Ronald Heck Week 12: Class Notes 1 Class Notes Examining Repeated Measures Data on Individuals Generalized linear mixed models (GLMM) also provide a means of incorporang longitudinal designs with categorical
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationHierarchical Models & Bayesian Model Selection
Hierarchical Models & Bayesian Model Selection Geoffrey Roeder Departments of Computer Science and Statistics University of British Columbia Jan. 20, 2016 Contact information Please report any typos or
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More information