
1 Logistic Regression Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019

2 Administrative Please start HW 1 early! Questions are welcome!

3 Two principles for estimating parameters
Maximum Likelihood Estimate (MLE): choose θ that maximizes the probability of the observed data
  θ_MLE = argmax_θ P(Data | θ)
Maximum a posteriori (MAP) estimation: choose θ that is most probable given the prior and the data
  θ_MAP = argmax_θ P(θ | D) = argmax_θ P(Data | θ) P(θ) / P(Data)
Slide credit: Tom Mitchell
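To make the two principles concrete, here is a minimal sketch for a Bernoulli coin-flip model; the data and the Beta prior hyperparameters a, b are invented for the example.

```python
import numpy as np

# Coin-flip example (hypothetical data): estimate theta = P(heads).
data = np.array([1, 1, 0, 1, 0, 1, 1, 1])  # 6 heads, 2 tails
n_heads = data.sum()

# MLE: maximizes P(Data | theta) for a Bernoulli model.
theta_mle = n_heads / len(data)

# MAP with a Beta(a, b) prior: maximizes P(theta | Data).
# The prior acts like (a - 1) imaginary heads and (b - 1) imaginary tails.
a, b = 3, 3  # assumed prior hyperparameters
theta_map = (n_heads + a - 1) / (len(data) + a + b - 2)

print(theta_mle, theta_map)  # 0.75 vs. 0.666...: the prior pulls the estimate toward 0.5
```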

4 Naïve Bayes classifier
Want to learn P(Y | X_1, ..., X_n), but this requires on the order of 2^n parameters.
How about applying Bayes rule?
  P(Y | X_1, ..., X_n) = P(X_1, ..., X_n | Y) P(Y) / P(X_1, ..., X_n)
  P(X_1, ..., X_n | Y): needs (2^n − 1) × 2 parameters
  P(Y): needs 1 parameter
Apply the conditional independence assumption:
  P(X_1, ..., X_n | Y) = Π_{j=1}^n P(X_j | Y): needs n × 2 parameters

5 Naïve Bayes classifier
Bayes rule:
  P(Y = y_k | X_1, ..., X_n) = P(Y = y_k) P(X_1, ..., X_n | Y = y_k) / Σ_j P(Y = y_j) P(X_1, ..., X_n | Y = y_j)
Assume conditional independence among the X_i's:
  P(Y = y_k | X_1, ..., X_n) = P(Y = y_k) Π_i P(X_i | Y = y_k) / Σ_j P(Y = y_j) Π_i P(X_i | Y = y_j)
Pick the most probable Y:
  Y ← argmax_{y_k} P(Y = y_k) Π_i P(X_i | Y = y_k)
Slide credit: Tom Mitchell

6 Example: estimating P(Y | X_1, X_2)
Parameters to estimate:
  Bayes rule: P(Y) and P(X_1, X_2 | Y)
  Conditional indep.: P(X_1, X_2 | Y) = P(X_1 | Y) P(X_2 | Y)
Estimated parameters:
  P(Y = 1) = 0.4               P(Y = 0) = 0.6
  P(X_1 = 1 | Y = 1) = 0.2     P(X_1 = 0 | Y = 1) = 0.8
  P(X_1 = 1 | Y = 0) = 0.7     P(X_1 = 0 | Y = 0) = 0.3
  P(X_2 = 1 | Y = 1) = 0.3     P(X_2 = 0 | Y = 1) = 0.7
  P(X_2 = 1 | Y = 0) = 0.9     P(X_2 = 0 | Y = 0) = 0.1
Test example: X_1 = 1, X_2 = 0
  Y = 1: P(Y = 1) P(X_1 = 1 | Y = 1) P(X_2 = 0 | Y = 1) = 0.4 × 0.2 × 0.7 = 0.056
  Y = 0: P(Y = 0) P(X_1 = 1 | Y = 0) P(X_2 = 0 | Y = 0) = 0.6 × 0.7 × 0.1 = 0.042
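The same computation in code, using the probability tables from this slide:

```python
# Naive Bayes scores for the slide's worked example.
p_y = {1: 0.4, 0: 0.6}
p_x1 = {(1, 1): 0.2, (1, 0): 0.7, (0, 1): 0.8, (0, 0): 0.3}  # (x1, y) -> P(X1=x1 | Y=y)
p_x2 = {(1, 1): 0.3, (1, 0): 0.9, (0, 1): 0.7, (0, 0): 0.1}  # (x2, y) -> P(X2=x2 | Y=y)

x1, x2 = 1, 0  # test example
scores = {y: p_y[y] * p_x1[(x1, y)] * p_x2[(x2, y)] for y in (0, 1)}
print(scores)                       # {0: 0.042, 1: 0.056}
print(max(scores, key=scores.get))  # predict Y = 1
```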

7 Naïve Bayes algorithm (discrete X_i)
For each value y_k:
  Estimate π_k = P(Y = y_k)
  For each value x_ij of each attribute X_i, estimate θ_ijk = P(X_i = x_ij | Y = y_k)
Classify X^test:
  Y ← argmax_{y_k} P(Y = y_k) Π_i P(X_i^test | Y = y_k)
  Y ← argmax_{y_k} π_k Π_i θ_ijk
Slide credit: Tom Mitchell

8 Estimating parameters: discrete Y, X_i
Maximum likelihood estimates (MLE):
  π̂_k = P̂(Y = y_k) = #D{Y = y_k} / |D|
  θ̂_ijk = P̂(X_i = x_ij | Y = y_k) = #D{X_i = x_ij ∧ Y = y_k} / #D{Y = y_k}
Slide credit: Tom Mitchell
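A minimal counting sketch of these MLE formulas (the function name and data layout are just illustrative choices):

```python
import numpy as np

def fit_nb_mle(X, y):
    """MLE parameter estimates for discrete naive Bayes (a sketch).

    X: (m, n) array of discrete feature values; y: (m,) array of labels.
    Returns class priors pi[y_k] and conditionals theta[(i, x_ij, y_k)].
    """
    m, n = X.shape
    pi, theta = {}, {}
    for y_k in np.unique(y):
        mask = (y == y_k)
        pi[y_k] = mask.mean()                     # #D{Y = y_k} / |D|
        for i in range(n):
            for x_ij in np.unique(X[:, i]):
                # #D{X_i = x_ij and Y = y_k} / #D{Y = y_k}
                theta[(i, x_ij, y_k)] = (X[mask, i] == x_ij).mean()
    return pi, theta
```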

9 Exercise
F = 1 iff you live in Fox Ridge
S = 1 iff you watched the Super Bowl last night
D = 1 iff you drive to VT
G = 1 iff you went to the gym in the last month

Estimate:
  P(F = 1) = ___               P(F = 0) = ___
  P(S = 1 | F = 1) = ___       P(S = 0 | F = 1) = ___
  P(S = 1 | F = 0) = ___       P(S = 0 | F = 0) = ___
  P(D = 1 | F = 1) = ___       P(D = 0 | F = 1) = ___
  P(D = 1 | F = 0) = ___       P(D = 0 | F = 0) = ___
  P(G = 1 | F = 1) = ___       P(G = 0 | F = 1) = ___
  P(G = 1 | F = 0) = ___       P(G = 0 | F = 0) = ___

  P(F | S, D, G) ∝ P(F) P(S | F) P(D | F) P(G | F)

10 Naïve Bayes: Subtlety #1
Often the X_i are not really conditionally independent. Naïve Bayes often works pretty well anyway: it often gives the right classification, even when not the right probability [Domingos & Pazzani, 1996].
What is the effect on the estimated P(Y | X)? For example, what if we have two identical copies of a feature, X_i = X_k?
  P(Y = y_k | X_1, ..., X_n) ∝ P(Y = y_k) Π_i P(X_i | Y = y_k)
Slide credit: Tom Mitchell

11 Naïve Bayes: Subtlety #2
The MLE estimate for P(X_i | Y = y_k) might be zero (for example, X_i = birthdate, X_i = Feb_4_1995).
Why worry about just one parameter out of many? A single zero factor wipes out the entire product:
  P(Y = y_k | X_1, ..., X_n) ∝ P(Y = y_k) Π_i P(X_i | Y = y_k)
What can we do to address this? MAP estimates (adding imaginary examples).
Slide credit: Tom Mitchell

12 Estimating parameters: discrete Y, X_i
Maximum likelihood estimates (MLE):
  π̂_k = P̂(Y = y_k) = #D{Y = y_k} / |D|
  θ̂_ijk = P̂(X_i = x_ij | Y = y_k) = #D{X_i = x_ij, Y = y_k} / #D{Y = y_k}
MAP estimates (Dirichlet priors):
  π̂_k = P̂(Y = y_k) = (#D{Y = y_k} + (β_k − 1)) / (|D| + Σ_m (β_m − 1))
  θ̂_ijk = P̂(X_i = x_ij | Y = y_k) = (#D{X_i = x_ij, Y = y_k} + (β_k − 1)) / (#D{Y = y_k} + Σ_m (β_m − 1))
Slide credit: Tom Mitchell
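A sketch of the MAP estimate with a symmetric Dirichlet prior; β = 2 corresponds to adding one imaginary example per outcome (Laplace smoothing):

```python
import numpy as np

def map_estimate(counts, beta=2.0):
    """MAP probability estimate under a symmetric Dirichlet prior (a sketch).

    counts: array of event counts; beta: prior hyperparameter.
    beta = 2 adds one imaginary example per outcome, so no
    probability estimate can be exactly zero.
    """
    counts = np.asarray(counts, dtype=float)
    return (counts + (beta - 1)) / (counts.sum() + len(counts) * (beta - 1))

# A feature value never observed with Y = y_k still gets nonzero probability:
print(map_estimate([0, 7, 3]))  # [0.0769..., 0.6153..., 0.3076...]
```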

13 What if we have continuous X_i?
Gaussian Naïve Bayes (GNB): assume
  P(X_i = x | Y = y_k) = (1 / (√(2π) σ_ik)) exp(−(x − μ_ik)² / (2σ_ik²))
Additional assumption options on σ_ik:
  it is independent of Y (σ_i), independent of X_i (σ_k), or independent of both X_i and Y (σ)
Slide credit: Tom Mitchell

14 Naïve Bayes algorithm (continuous X_i)
For each value y_k:
  Estimate π_k = P(Y = y_k)
  For each attribute X_i, estimate the class-conditional mean μ_ik and variance σ_ik
Classify X^test:
  Y ← argmax_{y_k} P(Y = y_k) Π_i P(X_i^test | Y = y_k)
  Y ← argmax_{y_k} π_k Π_i Normal(X_i^test; μ_ik, σ_ik)
Slide credit: Tom Mitchell
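A minimal Gaussian naïve Bayes sketch following this algorithm (assuming NumPy/SciPy are available and each class has enough samples for a nonzero std; the product of densities is computed in log space to avoid underflow):

```python
import numpy as np
from scipy.stats import norm

def fit_gnb(X, y):
    """Estimate GNB parameters (a sketch): priors, means, stds per class."""
    classes = np.unique(y)
    pi = np.array([(y == c).mean() for c in classes])
    mu = np.array([X[y == c].mean(axis=0) for c in classes])
    sigma = np.array([X[y == c].std(axis=0) for c in classes])
    return classes, pi, mu, sigma

def predict_gnb(x, classes, pi, mu, sigma):
    """Pick argmax_k pi_k * prod_i Normal(x_i; mu_ik, sigma_ik)."""
    log_scores = np.log(pi) + norm.logpdf(x, mu, sigma).sum(axis=1)
    return classes[np.argmax(log_scores)]
```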

15 Things to remember
Probability basics: conditional probability, joint probability, Bayes rule
Estimating parameters from data:
  Maximum likelihood (ML): maximize P(Data | θ)
  Maximum a posteriori (MAP): maximize P(θ | Data)
Naïve Bayes:
  P(Y = y_k | X_1, ..., X_n) ∝ P(Y = y_k) Π_i P(X_i | Y = y_k)

16 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification

17 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification

18 Motivating example: Malignant? 1 (Yes) / 0 (No)
[Figure: tumor size on the x-axis, malignant (0/1) on the y-axis, with a fitted line h_θ(x) = θᵀx]
Threshold the classifier output h_θ(x) at 0.5:
  If h_θ(x) ≥ 0.5, predict y = 1
  If h_θ(x) < 0.5, predict y = 0
Slide credit: Andrew Ng

19 Classification: y = 1 or y = 0
h_θ(x) = θᵀx (from linear regression) can be > 1 or < 0.
Logistic regression: 0 ≤ h_θ(x) ≤ 1.
Despite its name, logistic regression is actually a classification algorithm.
Slide credit: Andrew Ng

20 Hypothesis representation
Want 0 ≤ h_θ(x) ≤ 1:
  h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z))
  h_θ(x) = 1 / (1 + e^(−θᵀx))
[Figure: the sigmoid (logistic) function g(z), an S-shaped curve from 0 to 1]
Slide credit: Andrew Ng
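A direct translation of the sigmoid and the hypothesis into code:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); maps reals into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic regression hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(np.dot(theta, x))

# g(0) = 0.5; large positive z -> 1; large negative z -> 0
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # [~0.0000454, 0.5, ~0.99995]
```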

21 Interpretation of hypothesis output
h_θ(x) = estimated probability that y = 1 on input x.
Example: if x = [x_0; x_1] = [1; tumorSize] and h_θ(x) = 0.7, tell the patient there is a 70% chance of the tumor being malignant.
Slide credit: Andrew Ng

22 Logistic regression
  h_θ(x) = g(θᵀx), with g(z) = 1 / (1 + e^(−z)) and z = θᵀx
Suppose we predict y = 1 if h_θ(x) ≥ 0.5, i.e., when z = θᵀx ≥ 0,
and predict y = 0 if h_θ(x) < 0.5, i.e., when z = θᵀx < 0.
Slide credit: Andrew Ng

23 Decision boundary
  h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2)
[Figure: Age vs. Tumor Size scatter plot with a linear decision boundary]
E.g., θ_0 = −3, θ_1 = 1, θ_2 = 1: predict y = 1 if −3 + x_1 + x_2 ≥ 0, i.e., x_1 + x_2 ≥ 3.
Slide credit: Andrew Ng
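Checking the decision rule for this slide's example parameters (a small sketch):

```python
import numpy as np

# theta for the slide's example decision boundary x1 + x2 = 3.
theta = np.array([-3.0, 1.0, 1.0])

def predict(x1, x2):
    """Predict y = 1 iff theta^T [1, x1, x2] >= 0, i.e., x1 + x2 >= 3."""
    return int(theta @ np.array([1.0, x1, x2]) >= 0)

print(predict(1.0, 1.0))  # 0: below the boundary x1 + x2 = 3
print(predict(2.0, 2.0))  # 1: on/above the boundary
```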

24 Non-linear decision boundaries
  h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1² + θ_4 x_2²)
E.g., θ_0 = −1, θ_1 = 0, θ_2 = 0, θ_3 = 1, θ_4 = 1: predict y = 1 if −1 + x_1² + x_2² ≥ 0 (a circular boundary).
More complex boundaries come from higher-order features:
  h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1² + θ_4 x_1² x_2 + θ_5 x_1² x_2² + θ_6 x_1³ x_2 + ...)
Slide credit: Andrew Ng

25 Where does the form come from?
Logistic regression hypothesis representation:
  h_θ(x) = 1 / (1 + e^(−θᵀx)) = 1 / (1 + e^(−(θ_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n)))
Consider learning f: X → Y, where
  X is a vector of real-valued features (X_1, ..., X_n) and Y is Boolean.
Assume all X_i are conditionally independent given Y.
Model P(X_i | Y = y_k) as Gaussian N(μ_ik, σ_i), and P(Y) as Bernoulli(π).
What is P(Y | X_1, X_2, ..., X_n)?
Slide credit: Tom Mitchell

26 Deriving the form
P(Y = 1 | X)
  = P(Y = 1) P(X | Y = 1) / (P(Y = 1) P(X | Y = 1) + P(Y = 0) P(X | Y = 0))    [applying Bayes rule]
  = 1 / (1 + P(Y = 0) P(X | Y = 0) / (P(Y = 1) P(X | Y = 1)))    [divide by P(Y = 1) P(X | Y = 1)]
  = 1 / (1 + exp(ln(P(Y = 0) P(X | Y = 0) / (P(Y = 1) P(X | Y = 1)))))    [apply exp(ln(·))]
  = 1 / (1 + exp(ln((1 − π)/π) + Σ_i ln(P(X_i | Y = 0) / P(X_i | Y = 1))))
Plugging in the Gaussian model P(X_i | Y = y_k) = (1 / (√(2π) σ_i)) exp(−(x − μ_ik)² / (2σ_i²)):
  P(Y = 1 | X_1, ..., X_n) = 1 / (1 + exp(ln((1 − π)/π) + Σ_i (((μ_i0 − μ_i1)/σ_i²) X_i + (μ_i1² − μ_i0²)/(2σ_i²))))
  = 1 / (1 + exp(θ_0 + Σ_i θ_i X_i))
Slide credit: Tom Mitchell
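A numerical sanity check of this derivation (a sketch with invented parameter values): under the GNB assumptions, the exact Bayes-rule posterior and the derived logistic form should agree.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
pi = 0.3                                          # P(Y = 1)
mu0, mu1 = np.array([0.0, 1.0]), np.array([2.0, -1.0])
sigma = np.array([1.0, 0.5])                      # shared across classes (sigma_i)

x = rng.normal(size=2)                            # an arbitrary test point

# Exact posterior via Bayes rule
p1 = pi * norm.pdf(x, mu1, sigma).prod()
p0 = (1 - pi) * norm.pdf(x, mu0, sigma).prod()
posterior = p1 / (p1 + p0)

# The derived form 1 / (1 + exp(theta_0 + sum_i theta_i X_i))
theta0 = np.log((1 - pi) / pi) + ((mu1**2 - mu0**2) / (2 * sigma**2)).sum()
theta = (mu0 - mu1) / sigma**2
logistic = 1.0 / (1.0 + np.exp(theta0 + theta @ x))

print(posterior, logistic)  # the two values agree
```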

27 [figure-only slide]

28 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification

29 Training set with m examples
  {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}
  x = [x_0; x_1; ...; x_n], with x_0 = 1 and y ∈ {0, 1}
  h_θ(x) = 1 / (1 + e^(−θᵀx))
How do we choose the parameters θ?
Slide credit: Andrew Ng

30 Cost function for linear regression
  J(θ) = (1/(2m)) Σ_{i=1}^m (h_θ(x^(i)) − y^(i))² = (1/m) Σ_{i=1}^m Cost(h_θ(x^(i)), y^(i))
  where Cost(h_θ(x), y) = (1/2)(h_θ(x) − y)²
Slide credit: Andrew Ng

31 Cost function for logistic regression
  Cost(h_θ(x), y) = −log(h_θ(x))        if y = 1
                    −log(1 − h_θ(x))    if y = 0
[Figure: the two cost curves over 0 ≤ h_θ(x) ≤ 1, one for y = 1 and one for y = 0]
Slide credit: Andrew Ng

32 Logistic regression cost function
  Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1;  −log(1 − h_θ(x)) if y = 0
Equivalently, in one expression:
  Cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x))
  If y = 1: Cost(h_θ(x), y) = −log(h_θ(x))
  If y = 0: Cost(h_θ(x), y) = −log(1 − h_θ(x))
Slide credit: Andrew Ng

33 Logistic regression
  J(θ) = (1/m) Σ_{i=1}^m Cost(h_θ(x^(i)), y^(i))
       = −(1/m) Σ_{i=1}^m [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
Learning: fit the parameters θ by min_θ J(θ)
Prediction: given a new x, output h_θ(x) = 1 / (1 + e^(−θᵀx))
Slide credit: Andrew Ng
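A vectorized sketch of J(θ); the clipping constant eps is an implementation detail added here to keep the logs finite:

```python
import numpy as np

def cross_entropy_cost(theta, X, y, eps=1e-12):
    """J(theta) = -(1/m) sum[ y log h + (1 - y) log(1 - h) ]  (a sketch).

    X: (m, n) design matrix with a leading column of ones; y: (m,) 0/1 labels.
    """
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    h = np.clip(h, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```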

34 Where does the cost come from?
Training set with m examples: {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}
Maximum likelihood estimate for the parameter θ:
  θ_MLE = argmax_θ P_θ((x^(1), y^(1)), ..., (x^(m), y^(m))) = argmax_θ Π_{i=1}^m P_θ(x^(i), y^(i))
Maximum conditional likelihood estimate for the parameter θ?
Slide credit: Tom Mitchell

35 Goal: choose θ to maximize the conditional likelihood of the training data
  P_θ(Y = 1 | X = x) = h_θ(x) = 1 / (1 + e^(−θᵀx))
  P_θ(Y = 0 | X = x) = 1 − h_θ(x) = e^(−θᵀx) / (1 + e^(−θᵀx))
Training data D = {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}
  Data likelihood = Π_{i=1}^m P_θ(x^(i), y^(i))
  Data conditional likelihood = Π_{i=1}^m P_θ(y^(i) | x^(i))
  θ_MCLE = argmax_θ Π_{i=1}^m P_θ(y^(i) | x^(i))
Slide credit: Tom Mitchell

36 Expressing the conditional log-likelihood
  L(θ) = log Π_{i=1}^m P_θ(y^(i) | x^(i)) = Σ_{i=1}^m log P_θ(y^(i) | x^(i))
       = Σ_{i=1}^m [ y^(i) log P_θ(Y = 1 | x^(i)) + (1 − y^(i)) log P_θ(Y = 0 | x^(i)) ]
       = Σ_{i=1}^m [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
Maximizing L(θ) is exactly minimizing the logistic regression cost:
  Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1;  −log(1 − h_θ(x)) if y = 0

37 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification

38 Gradient descent
  J(θ) = −(1/m) Σ_{i=1}^m [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
Goal: min_θ J(θ)
Good news: J(θ) is convex! Bad news: there is no analytical solution.
Repeat {
  θ_j ← θ_j − α (∂/∂θ_j) J(θ)
} (simultaneously update all θ_j)
where (∂/∂θ_j) J(θ) = (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i)
Slide credit: Andrew Ng

39 Gradient descent
  J(θ) = −(1/m) Σ_{i=1}^m [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
Goal: min_θ J(θ)
Repeat {
  θ_j ← θ_j − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i)
} (simultaneously update all θ_j)
Slide credit: Andrew Ng
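The full loop, vectorized over all θ_j (a minimal sketch; the learning rate and iteration count are arbitrary defaults):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression (a sketch).

    X: (m, n) design matrix with a leading column of ones; y: (m,) 0/1 labels.
    Implements the slide's update theta_j -= alpha * (1/m) * sum((h - y) * x_j)
    for all j simultaneously.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x) for every example
        grad = X.T @ (h - y) / m              # (1/m) * sum((h - y) * x)
        theta -= alpha * grad
    return theta
```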

40 Gradient descent for linear vs. logistic regression
Gradient descent for linear regression:
Repeat {
  θ_j ← θ_j − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i)
}   with h_θ(x) = θᵀx
Gradient descent for logistic regression:
Repeat {
  θ_j ← θ_j − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i)
}   with h_θ(x) = 1 / (1 + e^(−θᵀx))
The updates look identical; only the hypothesis h_θ(x) differs.
Slide credit: Andrew Ng

41 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification

42 How about MAP?
Maximum conditional likelihood estimate (MCLE):
  θ_MCLE = argmax_θ Π_{i=1}^m P_θ(y^(i) | x^(i))
Maximum conditional a posteriori estimate (MCAP):
  θ_MCAP = argmax_θ [ Π_{i=1}^m P_θ(y^(i) | x^(i)) ] P(θ)

43 Prior P(θ)
Common choice of P(θ): Normal distribution with zero mean and identity covariance.
  Pushes the parameters towards zero
  Corresponds to L2 regularization
  Helps avoid very large weights and overfitting
Slide credit: Tom Mitchell

44 MLE vs. MAP
Maximum conditional likelihood estimate (MCLE):
  θ_j ← θ_j − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i)
Maximum conditional a posteriori estimate (MCAP):
  θ_j ← θ_j − αλθ_j − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i)
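The MCAP update differs from the MLE loop only by the weight-decay term −αλθ_j coming from the Gaussian prior; a sketch:

```python
import numpy as np

def gradient_descent_map(X, y, alpha=0.1, lam=0.01, iters=1000):
    """MCAP / L2-regularized logistic regression (a sketch).

    Identical to the MLE loop except for the weight-decay term
    -alpha * lam * theta. For simplicity this version also shrinks the
    bias theta_0; in practice the bias is often left unregularized.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))
        theta -= alpha * lam * theta + alpha * (X.T @ (h - y) / m)
    return theta
```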

45 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification

46 Multi-class classification examples
Email foldering/tagging: Work, Friends, Family, Hobby
Medical diagnosis: Not ill, Cold, Flu
Weather: Sunny, Cloudy, Rain, Snow
Slide credit: Andrew Ng

47 Binary classification vs. multiclass classification
[Figure: two scatter plots in the (x_1, x_2) plane; the left shows two classes, the right shows three]

48 One-vs-all (one-vs-rest)
[Figure: a three-class data set in the (x_1, x_2) plane split into three binary problems, one per class]
Train one classifier per class (Class 1 vs. rest, Class 2 vs. rest, Class 3 vs. rest):
  h_θ^(i)(x) = P(y = i | x; θ)   for i = 1, 2, 3
Slide credit: Andrew Ng

49 One-vs-all
Train a logistic regression classifier h_θ^(i)(x) for each class i to predict the probability that y = i.
Given a new input x, pick the class i that maximizes h_θ^(i)(x).
Slide credit: Andrew Ng
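A one-vs-rest wrapper over any binary trainer, e.g. the gradient descent sketch above (function names here are illustrative):

```python
import numpy as np

def train_one_vs_all(X, y, classes, fit_binary):
    """One-vs-rest (a sketch): fit_binary(X, y01) returns theta for 0/1 labels.

    Usage: thetas = train_one_vs_all(X, y, np.unique(y), gradient_descent)
    """
    return {c: fit_binary(X, (y == c).astype(float)) for c in classes}

def predict_one_vs_all(x, thetas):
    """Pick the class whose classifier reports the highest P(y = i | x)."""
    probs = {c: 1.0 / (1.0 + np.exp(-(x @ t))) for c, t in thetas.items()}
    return max(probs, key=probs.get)
```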

50 Generative vs. discriminative
Generative approach (e.g., Naïve Bayes):
  Estimate P(Y) and P(X | Y)
  Prediction: ŷ = argmax_y P(Y = y) P(X = x | Y = y)
Discriminative approach (e.g., logistic regression):
  Estimate P(Y | X) directly (or a discriminant function, e.g., SVM)
  Prediction: ŷ = argmax_y P(Y = y | X = x)

51 Further readings
Tom M. Mitchell. "Generative and Discriminative Classifiers: Naïve Bayes and Logistic Regression."
Andrew Ng and Michael Jordan. "On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes."

52 Things to remember
Hypothesis representation: h_θ(x) = 1 / (1 + e^(−θᵀx))
Cost function: Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1;  −log(1 − h_θ(x)) if y = 0
Logistic regression with gradient descent:
  θ_j ← θ_j − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i)
Regularization:
  θ_j ← θ_j − αλθ_j − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i)
Multi-class classification: one-vs-all, predict argmax_i h_θ^(i)(x)

53 Coming up: Regularization, Support Vector Machine
