Logistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824
|
|
- Angel Glenn
- 5 years ago
- Views:
Transcription
1 Logistic Regression Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019
2 Administrative Please start HW 1 early! Questions are welcome!
3 Two principles for estimating parameters Maximum Likelihood Estimate (MLE) Choose θ that maximizes probability of observed data θ MLE = argmax θ P(Data θ) Maximum a posteriori estimation (MAP) Choose θ that is most probable given prior probability and data θ MAP = argmax θ P θ D = argmax θ P Data θ P θ P(Data) Slide credit: Tom Mitchell
4 Naïve Bayes classifier Want to learn P Y X 1,, X n ) But require 2 n parameters... How about applying Bayes rule? P Y X 1,, X n ) = P(X 1,,X n Y P Y P(X 1,,X n ) P(X 1,, X n Y P Y P(X 1,, X n Y : Need (2 n 1) 2 parameters P(Y): Need 1 parameter Apply conditional independence assumption n P X 1,, X n Y = ς j=1 P(X j Y): Need n 2 parameters
5 Naïve Bayes classifier Bayes rule: P Y = y k X 1,, X n ) = P(Y = y k)p(x 1,, X n Y = y k σ j P Y = y j P X 1,, X n Y = y j Assume conditional independence among X i s: P Y = y k X 1,, X n ) = P Y = y k Π i P X i Y = y k ) σ j P Y = y j Π i P X i Y = y j ) Pick the most probable Y Y argmax y k P Y = y k Π i P X i Y = y k ) Slide credit: Tom Mitchell
6 Example P Y X 1, X 2 Estimating parameters P Y P X 1, X 2 Y = P Y P X 1 Y P(X 2 Y Bayes rule P Y = 1 = 0.4 P X 1 = 1 Y = 1 = 0.2 P X 1 = 1 Y = 0 = 0.7 P X 2 = 1 Y = 1 = 0.3 P X 2 = 1 Y = 0 = 0.9 Conditional indep. P Y = 0 = 0.6 P X 1 = 0 Y = 1 = 0.8 P X 1 = 0 Y = 0 = 0.3 P X 2 = 0 Y = 1 = 0.7 P X 2 = 0 Y = 0 = 0.1 Test example: X 1 = 1, X 2 = 0 Y = 1: P Y = 1 P X 1 = 1 Y = 1 P X 2 = 0 Y = 1 = = Y = 0: P Y = 0 P X 1 = 1 Y = 0 P X 2 = 0 Y = 0 = = 0.042
7 Naïve Bayes algorithm discrete X i For each value y k Estimate π k = P(Y = y k ) For each value x ij of each attribute X i Estimate θ ijk = P(X i = x ijk Y = y k ) Classify X test Y argmax y k P Y = y k Π i P X i test Y = y k ) Y argmax y k π k Π i θ ijk Slide credit: Tom Mitchell
8 Estimating parameters: discrete Y, X i Maximum likelihood estimates (MLE) π k = P Y = y k = #D Y = y k D መθ ijk = P X i = x ij Y = y k = #D X i = x ij ^ Y = y k #D{Y = y k } Slide credit: Tom Mitchell
9 F = 1 iff you live in Fox Ridge S = 1 iff you watched the superbowl last night D = 1 iff you drive to VT G = 1 iff you went to gym in the last month P F = 1 = P S = 1 F = 1 = P S = 1 F = 0 = P D = 1 F = 1 = P D = 1 F = 0 = P G = 1 F = 1 = P G = 1 F = 0 = P F = 0 = P S = 0 F = 1 = P S = 0 F = 0 = P D = 0 F = 1 = P D = 0 F = 0 = P G = 0 F = 1 = P G = 0 F = 0 = P F S, D, G = P F P S F P D F P(G F)
10 Naïve Bayes: Subtlety #1 Often the X i are not really conditionally independent Naïve Bayes often works pretty well anyway Often the right classification, even when not the right probability [Domingos & Pazzani, 1996]) What is the effect on estimated P(Y X)? What if we have two copies: X i = X k P Y = y k X 1,, X n ) P Y = y k Π i P X i Y = y k ) Slide credit: Tom Mitchell
11 Naïve Bayes: Subtlety #2 MLE estimate for P X i Y = y k ) might be zero. (for example, X i = birthdate. X i = Feb_4_1995) Why worry about just one parameter out of many? P Y = y k X 1,, X n ) P Y = y k Π i P X i Y = y k ) What can we do to address this? MAP estimates (adding imaginary examples) Slide credit: Tom Mitchell
12 Estimating parameters: discrete Y, X i Maximum likelihood estimates (MLE) π k = P Y = y k = #D Y = y k D መθ ijk = P X i = x ij Y = y k = #D X i = x ij, Y = y k #D{Y = y k } MAP estimates (Dirichlet priors): π k = P Y = y k = #D Y = y k + (β k 1) D + σ m (β m 1) መθ ijk = P X i = x ij Y = y k = #D X i = x ij, Y = y k + (β k 1) #D{Y = y k } + σ m (β m 1) Slide credit: Tom Mitchell
13 What if we have continuous X i Gaussian Naïve Bayes (GNB): assume P X i = x Y = y k = 1 exp( x μ ik 2 2πσ ik 2σ ik 2 ) Additional assumption on σ ik : Is independent of Y (σ i ) Is independent of X i (σ k ) Is independent of X i and Y (σ) Slide credit: Tom Mitchell
14 Naïve Bayes algorithm continuous X i For each value y k Estimate π k = P(Y = y k ) For each attribute X i estimate Class conditional mean μ ik, variance σ ik Classify X test Y argmax P Y = y k Π i P X test i Y = y k ) y k Y argmax π k Π i Normal(X test i, μ ik, σ ik ) y k Slide credit: Tom Mitchell
15 Things to remember Probability basics Conditional probability, joint probability, Bayes rule Estimating parameters from data Maximum likelihood (ML) Maximum a posteriori estimation (MAP) maximize P(Data θ) maximize P(θ Data) Naive Bayes P Y = y k X 1,, X n ) P Y = y k Π i P X i Y = y k )
16 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification
17 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification
18 1 (Yes) Malignant? 0 (No) h θ x = θ x Tumor Size Threshold classifier output h θ x at 0.5 If h θ x 0.5, predict y = 1 If h θ x < 0.5, predict y = 0 Slide credit: Andrew Ng
19 Classification: y = 1 or y = 0 h θ x = θ x (from linear regression) can be > 1 or < 0 Logistic regression: 0 h θ x 1 Logistic regression is actually for classification Slide credit: Andrew Ng
20 Hypothesis representation Want 0 h θ x 1 h θ x = g θ x, where g z = 1 1+e z h θ x = g(z) e θ x Sigmoid function Logistic function z Slide credit: Andrew Ng
21 Interpretation of hypothesis output h θ x = estimated probability that y = 1 on input x Example: If x = x 0 x = 1 h θ x = tumorsize Tell patient that 70% chance of tumor being malignant Slide credit: Andrew Ng
22 Logistic regression h θ x = g θ x 1 g z = 1 + e z g(z) z = θ x Suppose predict y = 1 if h θ x 0.5 predict y = 0 if h θ x < 0.5 z = θ x 0 z = θ x < 0 Slide credit: Andrew Ng
23 Decision boundary h θ x = g(θ 0 + θ 1 x 1 + θ 2 x 2 ) Age E.g., θ 0 = 3, θ 1 = 1, θ 2 = 1 Tumor Size Predict y = 1 if 3 + x 1 + x 2 0 Slide credit: Andrew Ng
24 h θ x = g(θ 0 + θ 1 x 1 + θ 2 x 2 + θ 3 x θ 4 x 2 2 ) E.g., θ 0 = 1, θ 1 = 0, θ 2 = 0, θ 3 = 1, θ 4 = 1 Predict y = 1 if 1 + x x h θ x = g(θ 0 + θ 1 x 1 + θ 2 x 2 + θ 3 x θ 4 x 1 2 x 2 + θ 5 x 1 2 x θ 6 x 1 3 x 2 + ) Slide credit: Andrew Ng
25 Where does the form come from? Logistic regression hypothesis representation 1 h θ x = 1 + e = 1 θ x 1 + e (θ 0+θ 1 x 1 +θ 2 x 2 + +θ n x n ) Consider learning f: X Y, where X is a vector of real-valued features X 1,, X n Y is Boolean Assume all X i are conditionally independent given Y Model P X i Y = y k as Gaussian N μ ik, σ i Model P Y as Bernoulli π What is P Y X 1, X 2,, X n? Slide credit: Tom Mitchell
26 P Y = 1 X = = = = P x y k = 1 2πσ i e P Y=1 P(X Y=1) P Y=1 P(X Y=1) +P(Y=0)P(X Y=0) 1 1+ P Y=0 P(X Y=0) P Y=1 P(X Y=1) 1+exp(ln( 1+exp(ln 1 π π x μ ik 2 2σ i 2 1 P Y=0 P(X Y=0) P Y=1 P(X Y=1) )) P Y = 1 X 1, X 2,, X n = 1 +σ i ln P(X i Y=0) P(X i Y=1) ) ( μ i0 μ i1 i σ i 2 X i + μ i1 2 2 μ i0 2 ) 2σ i exp(θ 0 + σ i θ i X i ) Applying Bayes rule Divide by P Y = 1 P(X Y = 1) Apply exp(ln( )) Plug in P(X i Y) Slide credit: Tom Mitchell
27
28 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification
29 Training set with m examples { x 1, y 1, x 2, y 2,, x m, y m x 0 x x 1 x n x 0 = 1, y {0, 1} h θ x = e θ x How to choose parameters θ? Slide credit: Andrew Ng
30 Cost function for Linear Regression J θ = 1 m 2m i=1 m h θ x i y i 2 1 = m i=1 Cost(h θ (x i ), y)) Cost(h θ x, y) = 1 2 h θ x y 2 Slide credit: Andrew Ng
31 Cost function for Logistic Regression Cost(h θ x, y) = log h θ x if y = 1 log 1 h θ x if y = 0 if y = 1 if y = 0 0 h θ x 1 0 h θ x 1 Slide credit: Andrew Ng
32 Logistic regression cost function Cost(h θ x, y) = log h θ x if y = 1 log 1 h θ x if y = 0 Cost h θ x, y = y log h θ x (1 y) log 1 h θ x If y = 1: Cost h θ x, y = log h θ x If y = 0: Cost h θ x, y = log 1 h θ x Slide credit: Andrew Ng
33 Logistic regression m J θ = 1 m i=1 Cost(h θ (x i ), y (i) )) = 1 σ m i=1 m y (i) log h θ x (i) + (1 y (i) ) log 1 h θ x (i) Learning: fit parameter θ min θ J(θ) Prediction: given new x Output h θ x = 1 1+e θ x Slide credit: Andrew Ng
34 Where does the cost come from? Training set with m examples x 1, y 1, x 2, y 2,, x m, y m Maximum likelihood estimate for parameter θ θ MLE = argmax P θ x 1, y 1, x 2, y 2,, x m, y m θ = argmax θ m i=1 P θ x i, y i Maximum conditional likelihood estimate for parameter θ Slide credit: Tom Mitchell
35 Goal: choose θ to maximize conditional likelihood of training data P θ Y = 1 X = x = h θ x = P θ Y = 0 X = x = 1 h θ x = 1 1+e θ x e θ x 1+e θ x Training data D = x 1, y 1, x 2, y 2,, x m, y m Data likelihood = ς m i=1 P θ x i, y i Data conditional likelihood = ς m i=1 P θ y (i) x i θ MCLE = argmax θ ς m i=1 P θ y (i) x i Slide credit: Tom Mitchell
36 Expressing conditional log-likelihood m m L θ = log P θ y (i) x i = log P θ y (i) x i m i=1 i=1 = y (i) log P θ y (i) = 1 x i + 1 y i log P θ y (i) = 0 x i i=1 = σ m i=1 y (i) log (h θ (x (i) )) + 1 y i log(1 h θ (x (i) )) Cost(h θ x, y) = log h θ x if y = 1 log 1 h θ x if y = 0
37 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification
38 Gradient descent m J θ = 1 m i=1 y (i) log h θ x (i) + (1 y (i) ) log 1 h θ x (i) Goal: min θ J(θ) Good news: Convex function! Bad news: No analytical solution Repeat { θ j θ j α θ j J(θ) } (Simultaneously update all θ j ) m θ j J θ = 1 m i=1 (h θ x i y (i) ) x j (i) Slide credit: Andrew Ng
39 Gradient descent m J θ = 1 m i=1 y (i) log h θ x (i) + (1 y (i) ) log 1 h θ x (i) Goal: min θ J(θ) Repeat { m θ j θ j α 1 m i=1 } (Simultaneously update all θ j ) h θ x i y (i) x j (i) Slide credit: Andrew Ng
40 Gradient descent for Linear Regression Repeat { m θ j θ j α 1 m i=1 } h θ x i y (i) x j (i) h θ x = θ x Gradient descent for Logistic Regression Repeat { m θ j θ j α 1 m i=1 h θ x i y (i) x j (i) h θ x = e θ x } Slide credit: Andrew Ng
41 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification
42 How about MAP? Maximum conditional likelihood estimate (MCLE) θ MCLE = argmax θ ς m i=1 P θ y (i) x i Maximum conditional a posterior estimate (MCAP) θ MCAP = argmax θ ς m i=1 P θ y (i) x i P(θ)
43 Prior P(θ) Common choice of P(θ): Normal distribution, zero mean, identity covariance Pushes parameters towards zeros Corresponds to Regularization Helps avoid very large weights and overfitting Slide credit: Tom Mitchell
44 MLE vs. MAP Maximum conditional likelihood estimate (MCLE) m θ j θ j α 1 m i=1 h θ x i y (i) x j (i) Maximum conditional a posterior estimate (MCAP) m θ j θ j αλθ j α 1 m i=1 h θ x i y (i) x j (i)
45 Logistic Regression Hypothesis representation Cost function Logistic regression with gradient descent Regularization Multi-class classification
46 Multi-class classification foldering/taggning: Work, Friends, Family, Hobby Medical diagrams: Not ill, Cold, Flu Weather: Sunny, Cloudy, Rain, Snow Slide credit: Andrew Ng
47 Binary classification Multiclass classification x 2 x 2 x 1 x 1
48 One-vs-all (one-vs-rest) x 2 h θ 1 x x 2 x 1 h θ 2 x x 2 Class 1: Class 2: Class 3: h θ i x 1 h θ 3 x = P y = i x; θ (i = 1, 2, 3) x x 2 x 1 x 1 Slide credit: Andrew Ng
49 One-vs-all Train a logistic regression classifier h θ i x for each class i to predict the probability that y = i Given a new input x, pick the class i that maximizes max i h θ i x Slide credit: Andrew Ng
50 Generative Approach Ex: Naïve Bayes Discriminative Approach Ex: Logistic regression Estimate P(Y) and P(X Y) Estimate P(Y X) directly (Or a discriminant function: e.g., SVM) Prediction y = argmax y P Y = y P(X = x Y = y) Prediction y = P(Y = y X = x)
51 Further readings Tom M. Mitchell Generative and discriminative classifiers: Naïve Bayes and Logistic Regression Andrew Ng, Michael Jordan On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes
52 Things to remember Hypothesis representation h θ x = e θ x Cost function Cost(h θ x, y) = log h θ x if y = 1 log 1 h θ x if y = 0 Logistic regression with gradient descent Regularization Multi-class classification m θ j θ j α 1 m i=1 θ j θ j αλθ j α 1 m i=1 max i h θ i x h θ x i y (i) x j (i) m h θ x i y (i) x j (i)
53 Coming up Regularization Support Vector Machine
Naïve Bayes. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824
Naïve Bayes Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative HW 1 out today. Please start early! Office hours Chen: Wed 4pm-5pm Shih-Yang: Fri 3pm-4pm Location: Whittemore 266
More informationSupport Vector Machine I
Support Vector Machine I Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative Please use piazza. No emails. HW 0 grades are back. Re-grade request for one week. HW 1 due soon. HW
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationMachine Learning Gaussian Naïve Bayes Big Picture
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 27, 2011 Today: Naïve Bayes Big Picture Logistic regression Gradient ascent Generative discriminative
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 22, 2011 Today: MLE and MAP Bayes Classifiers Naïve Bayes Readings: Mitchell: Naïve Bayes and Logistic
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 26, 2015 Today: Bayes Classifiers Conditional Independence Naïve Bayes Readings: Mitchell: Naïve Bayes
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm
More informationClassification Based on Probability
Logistic Regression These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton s slides and grateful acknowledgement to the many others who made their course materials freely
More informationEstimating Parameters
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 13, 2012 Today: Bayes Classifiers Naïve Bayes Gaussian Naïve Bayes Readings: Mitchell: Naïve Bayes
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationLogistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logistic Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationLast Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression
CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationBayesian Classifiers, Conditional Independence and Naïve Bayes. Required reading: Naïve Bayes and Logistic Regression (available on class website)
Bayesian Classifiers, Conditional Independence and Naïve Bayes Required reading: Naïve Bayes and Logistic Regression (available on class website) Machine Learning 10-701 Tom M. Mitchell Machine Learning
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 4, 2015 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression (Continued) Generative v. Discriminative Decision rees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 31 st, 2007 2005-2007 Carlos Guestrin 1 Generative
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationOutline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation
More informationSupport Vector Machine II
Support Vector Machine II Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative HW 1 due tonight HW 2 released. Online Scalable Learning Adaptive to Unknown Dynamics and Graphs Yanning
More informationThe Naïve Bayes Classifier. Machine Learning Fall 2017
The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationNaïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 18 Oct. 31, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Naïve Bayes Matt Gormley Lecture 18 Oct. 31, 2018 1 Reminders Homework 6: PAC Learning
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationMachine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.
More informationProbabilistic modeling. The slides are closely adapted from Subhransu Maji s slides
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori
More informationLogis&c Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logis&c Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationCSCE 478/878 Lecture 6: Bayesian Learning
Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationCSC 411: Lecture 09: Naive Bayes
CSC 411: Lecture 09: Naive Bayes Class based on Raquel Urtasun & Rich Zemel s lectures Sanja Fidler University of Toronto Feb 8, 2015 Urtasun, Zemel, Fidler (UofT) CSC 411: 09-Naive Bayes Feb 8, 2015 1
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More informationIs the test error unbiased for these programs?
Is the test error unbiased for these programs? Xtrain avg N o Preprocessing by de meaning using whole TEST set 2017 Kevin Jamieson 1 Is the test error unbiased for this program? e Stott see non for f x
More informationIntroduction to Machine Learning
Introduction to Machine Learning Bayesian Classification Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationMachine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 7: Multiclass Classification Princeton University COS 495 Instructor: Yingyu Liang Example: image classification indoor Indoor outdoor Example: image classification (multiclass)
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationClassification Logistic Regression
Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining MLE and MAP Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due tonight. Assignment 5: Will be released
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationMachine Learning, Midterm Exam: Spring 2009 SOLUTION
10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of
More informationMachine Learning. Classification. Bayes Classifier. Representing data: Choosing hypothesis class. Learning: h:x a Y. Eric Xing
Machine Learning 10-701/15 701/15-781, 781, Spring 2008 Naïve Bayes Classifier Eric Xing Lecture 3, January 23, 2006 Reading: Chap. 4 CB and handouts Classification Representing data: Choosing hypothesis
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationMachine Learning: Assignment 1
10-701 Machine Learning: Assignment 1 Due on Februrary 0, 014 at 1 noon Barnabas Poczos, Aarti Singh Instructions: Failure to follow these directions may result in loss of points. Your solutions for this
More informationMachine Learning, Midterm Exam
10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationStephen Scott.
1 / 28 ian ian Optimal (Adapted from Ethem Alpaydin and Tom Mitchell) Naïve Nets sscott@cse.unl.edu 2 / 28 ian Optimal Naïve Nets Might have reasons (domain information) to favor some hypotheses/predictions
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationLogistic Regression. William Cohen
Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationMLE/MAP + Naïve Bayes
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes MLE / MAP Readings: Estimating Probabilities (Mitchell, 2016)
More informationMachine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber
Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to
More informationGenerative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham
Generative classifiers: The Gaussian classifier Ata Kaban School of Computer Science University of Birmingham Outline We have already seen how Bayes rule can be turned into a classifier In all our examples
More informationGenerative Learning algorithms
CS9 Lecture notes Andrew Ng Part IV Generative Learning algorithms So far, we ve mainly been talking about learning algorithms that model p(y x; θ), the conditional distribution of y given x. For instance,
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer
More informationBayesian Learning. Reading: Tom Mitchell, Generative and discriminative classifiers: Naive Bayes and logistic regression, Sections 1-2.
Bayesian Learning Reading: Tom Mitchell, Generative and discriminative classifiers: Naive Bayes and logistic regression, Sections 1-2. (Linked from class website) Conditional Probability Probability of
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationECE521 Lecture7. Logistic Regression
ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from Jordan Boyd-Graber, Tom Mitchell, Ziv Bar-Joseph Machine Learning: Chenhao Tan Boulder 1 of 27 Quiz question For
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationGenerative Learning. INFO-4604, Applied Machine Learning University of Colorado Boulder. November 29, 2018 Prof. Michael Paul
Generative Learning INFO-4604, Applied Machine Learning University of Colorado Boulder November 29, 2018 Prof. Michael Paul Generative vs Discriminative The classification algorithms we have seen so far
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationCSE446: Clustering and EM Spring 2017
CSE446: Clustering and EM Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin, Dan Klein, and Luke Zettlemoyer Clustering systems: Unsupervised learning Clustering Detect patterns in unlabeled
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationIntroduction to Machine Learning
Introduction to Machine Learning CS4375 --- Fall 2018 Bayesian a Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell 1 Uncertainty Most real-world problems deal with
More informationComments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert
Logistic regression Comments Mini-review and feedback These are equivalent: x > w = w > x Clarification: this course is about getting you to be able to think as a machine learning expert There has to be
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationChapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)
HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationIntroduction to Machine Learning
Uncertainty Introduction to Machine Learning CS4375 --- Fall 2018 a Bayesian Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell Most real-world problems deal with
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationGaussian discriminant analysis Naive Bayes
DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate
More information10-701/ Machine Learning - Midterm Exam, Fall 2010
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationCS446: Machine Learning Fall Final Exam. December 6 th, 2016
CS446: Machine Learning Fall 2016 Final Exam December 6 th, 2016 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains
More informationBayesian Learning. Bayesian Learning Criteria
Bayesian Learning In Bayesian learning, we are interested in the probability of a hypothesis h given the dataset D. By Bayes theorem: P (h D) = P (D h)p (h) P (D) Other useful formulas to remember are:
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Antti Ukkonen TAs: Saska Dönges and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer,
More information