HTF: Ch4; B: Ch4. Linear Classifiers. R. Greiner, CMPUT 466/551
1 HTF: Ch4; B: Ch4. Linear Classifiers. R. Greiner, CMPUT 466/551
2 Outline. Framework. Exact: Minimize Mistakes (Perceptron Training), Matrix inversion (LMS). Logistic Regression: Max Likelihood Estimation (MLE) of P(y|x); Gradient descent (MSE; MLE); Newton-Raphson. Linear Discriminant Analysis: Max Likelihood Estimation (MLE) of P(y, x); Direct Computation; Fisher's Linear Discriminant.
3 Diagnosing Butterfly-itis
4 Classifier: Decision Boundaries. A classifier partitions the input space X into decision regions (here: #wings, #antennae). A linear threshold unit has a linear decision boundary. Defn: a set of points that can be separated by a linear decision boundary is "linearly separable".
5 Linear Separators: draw a separating line in the (#wings, #antennae) plane; instances on one side have butterfly-itis. So the query instance "?" is Not butterfly-itis.
6 The separating line can be angled: if .3·#wings + 7.5·#antennae + w₀ > 0, then butterfly-itis.
7 Linear Separators, in General. Given data with many features F₁ (Temp), F₂ (Press), F₃ (Color), ..., Fₙ, and a class label (diseaseX? Yes/No), find weights {w₁, w₂, ..., wₙ, w₀} such that Σⱼ wⱼ·xⱼ + w₀ > 0 means Class = Yes.
8 Linear Separator. [diagram of a linear threshold unit: inputs xⱼ, weights wⱼ, summation Σⱼ wⱼ·xⱼ, thresholded output]
9 Linear Separator: o(x) = sgn(Σⱼ wⱼ·xⱼ). Performance: given {wᵢ} and the values of an instance x, compute the response. Learning: given labeled data, find the correct {wᵢ}. Linear Threshold Unit = Perceptron.
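The "performance" step on this slide, computing the response of a linear threshold unit for given weights, can be sketched as follows (a minimal illustration; the weight values, chosen to echo the earlier .3/7.5 butterfly-itis rule, are made up):

```python
# Linear threshold unit (perceptron) response: o(x) = sgn(w . x),
# with an extra input x0 = 1 so that w0 plays the role of -b.

def ltu_response(w, x):
    """Return +1 if w . [1, x] > 0, else -1."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else -1

# Hypothetical weights [w0, w_wings, w_antennae]
w = [-4.0, 0.3, 7.5]
print(ltu_response(w, [2, 1]))  # 2 wings, 1 antenna
print(ltu_response(w, [2, 0]))  # 2 wings, 0 antennae
```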
10 Geometric View. Consider 3 training examples: [.0, .0]; [0.5, 3.0]; [.0, .0]. Want a classifier that looks like...
11 A Linear Equation is a Hyperplane. w·x = Σᵢ wᵢ·xᵢ; the set {x : w·x = 0} is a (hyper)plane; output 1 if w·x > 0, and 0 otherwise.
12 Linear Threshold Unit: Perceptron. Squashing function sgn: R → {−1, 1}: sgn(r) = 1 if r > 0, −1 otherwise (Heaviside variant: 1 if r > 0, 0 otherwise). Actually want w·x > b, but... create an extra input x₀ fixed at 1; the corresponding weight w₀ corresponds to −b.
13 Learning Perceptrons. Can represent any linearly-separated surface... any hyper-plane between two half-spaces. Remarkable learning algorithm [Rosenblatt 1960]: if a function f can be represented by a perceptron, then the learning alg is guaranteed to quickly converge to f! Enormous popularity, early/mid 60's. But some simple fns cannot be represented (killed the field, temporarily!).
14 Perceptron Learning. Hypothesis space is... Fixed size: O(2^(n²)) distinct perceptrons over n boolean features. Deterministic. Continuous parameters. Learning algorithm: Various: local search, direct computation, ... Eager. Online / Batch.
15 Task. Input: labeled data {[xⁱ, yⁱ]}, transformed to... Output: w ∈ Rʳ. Goal: want w s.t. ∀i: sgn(w·[1, xⁱ]) = yⁱ ... minimize mistakes wrt the data...
16 Error Function. Given data {[xⁱ, yⁱ]}ᵢ₌₁..m, optimize...
1. Classification error (Perceptron Training; Matrix Inversion): err_Class(w) = (1/m)·Σᵢ I[yⁱ ≠ o_w(xⁱ)]
2. Mean-squared error, LMS (Matrix Inversion; Gradient Descent): err_MSE(w) = (1/m)·Σᵢ [yⁱ − o_w(xⁱ)]²
3. Log Conditional Probability, LR (MSE Gradient Descent; LCL Gradient Descent): LCL(w) = (1/m)·Σᵢ log P_w(yⁱ | xⁱ)
4. Log Joint Probability, LDA/FDA (Direct Computation): LL(w) = (1/m)·Σᵢ log P_w(yⁱ, xⁱ)
17 #1: Optimal Classification Error. For each labeled instance [x, y]: Err = y − o_w(x), where y is the target value and o_w(x) = sgn(w·x) is the perceptron output. Idea: move the weights in the appropriate direction, to push Err → 0. If Err > 0 (error on a POSITIVE example): need to increase sgn(w·x), ie need to increase w·x. Input j contributes wⱼ·xⱼ to w·x: if xⱼ > 0, increasing wⱼ will increase w·x; if xⱼ < 0, decreasing wⱼ will increase w·x; so wⱼ ← wⱼ + xⱼ. If Err < 0 (error on a NEGATIVE example): wⱼ ← wⱼ − xⱼ.
18 Local Search via Gradient Descent. [figure]
19 #1a: Mistake Bound Perceptron Alg. Initialize w = 0. Do until bored: predict "+" iff w·x > 0, else "−"; on a mistake on a positive x: w ← w + x; on a mistake on a negative x: w ← w − x. (Worked trace: starting from weights [0 0 0], the weight vector is updated after each mistake on instances #1, #2, #3 until every instance is classified OK.)
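A minimal sketch of this mistake-driven loop (labels in {−1, +1}, instances already augmented with the constant input; the toy data here is invented, not the slide's instances #1-#3):

```python
def perceptron_train(data, max_epochs=100):
    """Mistake-driven perceptron: on a mistake on (x, y), set w <- w + y*x."""
    w = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:  # mistake (or on boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # consistent with all of the training data
            return w
    return w

# Linearly separable toy data: ([x0 = 1, x1], label)
data = [([1.0, 2.0], 1), ([1.0, -1.0], -1), ([1.0, 3.0], 1)]
w = perceptron_train(data)
print(w)
```

On linearly separable data the loop stops after an epoch with zero mistakes, matching the convergence guarantee quoted on the following slides.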
20 Mistake Bound Theorem (see SVM). Theorem [Rosenblatt 1960]: if the data is consistent with some linear threshold w, then the number of mistakes is ≤ 1/γ², where γ measures the wiggle room available: if ‖x‖ = 1, then γ is the max, over all consistent planes, of the minimum distance of an example to that plane. w·x is ∝ the distance of x to the separator, as w·x = 0 at the boundary; so (w·x)/‖w‖ is the projection of x onto w, PERPENDICULAR to the boundary line, ie is the distance from x to that line (once normalized).
21 Proof of Convergence. x is wrong wrt w iff y·(w·x) < 0. Let w* be the unit vector representing the target plane, and let γ = min over examples of y·(w*·x). Let w be the hypothesis plane. Consider w·w* and ‖w‖: on each mistake, y·x is added to w, so w = Σ { y·x : y·(w·x) < 0 at the time of the mistake }.
22 Proof con't. If x is a mistake: w·w* increases by y·(x·w*) ≥ γ, while ‖w + y·x‖² = ‖w‖² + 2·y·(w·x) + ‖x‖² ≤ ‖w‖² + 1 (since y·(w·x) < 0 on a mistake). So after M mistakes: M·γ ≤ w·w* ≤ ‖w‖ ≤ √M, giving M ≤ 1/γ².
23 #1b: Perceptron Training Rule. For each labeled instance [x, y]: Err[x, y] = y − o_w(x) ∈ {−2, 0, +2}. If Err[x, y] = 0: correct! Do nothing; Δw = 0 = Err[x, y]·x. If Err[x, y] = +2: mistake on a positive! Increment by +2x; Δw = +2x = Err[x, y]·x. If Err[x, y] = −2: mistake on a negative! Increment by −2x; Δw = −2x = Err[x, y]·x. In all cases... Δwᵢ = Err[x, y]·xᵢ = [y − o_w(x)]·xᵢ. Batch mode: do ALL updates at once! Δwⱼ = Σᵢ [yⁱ − o_w(xⁱ)]·xⱼⁱ; wⱼ ← wⱼ + η·Δwⱼ.
24 [Batch perceptron training, for each feature j:] 0. Fix w; initialize Δw = 0. 1. For each row i, compute: a. Eⁱ = yⁱ − o_w(xⁱ); b. Δw += Eⁱ·xⁱ (ie, Δwⱼ += Eⁱ·xⱼⁱ). 2. Increment w ← w + η·Δw.
25 Correctness. The rule is intuitive: it climbs in the correct direction... Thrm: converges to the correct answer, if... the training data is linearly separable and η is sufficiently small. Proof: weight space has EXACTLY 1 minimum (no non-global minima), so with enough examples it finds the correct function! Explains the early popularity. If η is too large, it may overshoot; if η is too small, it takes too long. So often η = η(k), which decays with the number of iterations k.
26 #1c: Matrix Version? Task: given {xⁱ, yⁱ}ᵢ, where yⁱ ∈ {−1, +1} is the label, find {wᵢ} s.t. the linear equalities X·w = y hold. Solution: w = X⁻¹·y.
27 Issues. 1. Why restrict to only yⁱ ∈ {−1, +1}? If from a discrete set yⁱ ∈ {0, 1, ..., m}: general non-binary classification. If ARBITRARY yⁱ ∈ R: regression. 2. What if NO w works? (...X is singular; overconstrained...) Could try to minimize the residual: Σᵢ I[yⁱ ≠ w·xⁱ] is NP-hard!; Σᵢ (yⁱ − w·xⁱ)², ie ‖X·w − y‖², is easy!
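The "easy" option, minimizing the residual Σᵢ (yⁱ − w·xⁱ)² = ‖Xw − y‖², has a closed-form least-squares solution; a small NumPy sketch on invented, overconstrained data:

```python
import numpy as np

# Overconstrained: 4 examples, 2 weights, so no w satisfies Xw = y exactly.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])  # first column is the constant input x0 = 1
y = np.array([-1.0, -1.0, 1.0, 1.0])

# w minimizing ||Xw - y||^2 (the easy squared loss, not the NP-hard 0/1 loss)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
predictions = np.sign(X @ w)
print(w, predictions)
```

Here the least-squares w still classifies every example correctly after thresholding, even though Xw = y has no exact solution.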
28 L₂ Error vs 0/1-Loss. The 0/1 loss function is not smooth/differentiable; the MSE error is smooth, differentiable, and is an overbound on the 0/1 loss...
29 Gradient Descent for Perceptron? Why not gradient descent for the THRESHOLDed perceptron? It needs a gradient (derivative), and sgn is not differentiable. Gradient descent is a general approach: it requires a continuously parameterized hypothesis, and the error must be differentiable wrt the parameters. But... it can be slow (many iterations) and may only find a LOCAL optimum.
30 Linear Separators Facts. GOOD NEWS: if the data is linearly separable, then a FAST ALGORITHM finds a correct {wᵢ}! But...
31 Linear Separators Facts. GOOD NEWS: if the data is linearly separable, then a FAST ALGORITHM finds a correct {wᵢ}! But some data sets are NOT linearly separable! Stay tuned!
32 #2: LMS Version of Classifier. View as regression: find the best linear mapping w from X to Y: w* = argmin_w Err_LMS(X, Y; w), where Err_LMS(X, Y; w) = Σᵢ (yⁱ − w·xⁱ)². Threshold: if wᵀx > 0.5, return 1; else 0. (See Chapter 3.)
33 General Idea. Use a discriminant function δₖ(x) for each class k; eg, δₖ(x) = P(G = k | X = x). Classification rule: return k = argmaxⱼ δⱼ(x). If each δⱼ is linear, the decision boundaries are piecewise hyperplanes.
34 Linear Classification using Linear Regression. 2-D input space x = (X₁, X₂); K = 3 classes; training sample N = 5. Indicator targets Y = (Y₁, Y₂, Y₃) ∈ { [1,0,0], [0,1,0], [0,0,1] }, stacked into matrices X, Y. Regression output: Ŷ(x) = [Ŷ₁(x), Ŷ₂(x), Ŷ₃(x)] = [xᵀβ̂₁, xᵀβ̂₂, xᵀβ̂₃], with β̂ = (XᵀX)⁻¹XᵀY. Classification rule: Ĝ(x) = argmaxₖ Ŷₖ(x).
35 Use Linear Regression for Classification? But regression minimizes the sum of squared errors on the target function, which gives strong influence to outliers. [figures: great separation vs bad separation]
36 #3: Logistic Regression. Want to compute P_w(y | x)... based on parameters w. But w·x has range (−∞, +∞), and a probability must be in the range [0, 1]. Need a squashing function (−∞, +∞) → [0, 1]: σ(z) = 1 / (1 + e^(−z)).
37 Alternative Derivation. Let a = ln[ P / (1 − P) ] be the log odds (with P = P(y | x)); then e^a = P / (1 − P), so P = e^a / (1 + e^a) = 1 / (1 + e^(−a)) = σ(a).
38 Sigmoid Unit. [figure]
39 Logistic Regression con't. Assume 2 classes: P_w(y = 1 | x) = σ(w·x) = 1 / (1 + e^(−w·x)); P_w(y = 0 | x) = 1 − σ(w·x) = e^(−w·x) / (1 + e^(−w·x)). Log odds: log[ P_w(1 | x) / P_w(0 | x) ] = w·x: linear!
40 How to learn the parameters w? Depends on the goal. A: Minimize MSE: Σᵢ (yⁱ − o_w(xⁱ))². B: Maximize likelihood: Σᵢ log P_w(yⁱ | xⁱ).
41 MSE Gradient for Sigmoid Unit. Error: E = Σⱼ ½(yʲ − o_w(xʲ))². For a single training instance: input xʲ = [x₁ʲ, ..., xₖʲ]; computed output oʲ = σ(Σᵢ xᵢʲ·wᵢ) = σ(zʲ), where zʲ = Σᵢ xᵢʲ·wᵢ using the current {wᵢ}; correct output yʲ. Stochastic error gradient (ignore the j superscript below): σ(z) = 1 / (1 + e^(−z)).
42 Derivative of Sigmoid. dσ(a)/da = d/da [ (1 + e^(−a))⁻¹ ] = e^(−a) / (1 + e^(−a))² = [1/(1 + e^(−a))]·[e^(−a)/(1 + e^(−a))] = σ(a)·(1 − σ(a)).
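The identity σ′(a) = σ(a)·(1 − σ(a)) can be checked numerically against a finite difference (a small sketch; the sample points are arbitrary):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Compare the analytic derivative sigma(a) * (1 - sigma(a))
# with a central finite difference at a few points.
h = 1e-6
for a in (-2.0, 0.0, 1.5):
    analytic = sigmoid(a) * (1.0 - sigmoid(a))
    numeric = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)
    print(a, abs(analytic - numeric) < 1e-8)
```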
43 Updating LR Weights (MSE). Note: as we have already computed oʲ = σ(zʲ) to get the answer, it is trivial to compute σ′(zʲ) = σ(zʲ)·(1 − σ(zʲ)). Update: wᵢ ← wᵢ + Δwᵢ, where Δwᵢ = η·Σⱼ (yʲ − oʲ)·oʲ·(1 − oʲ)·xᵢʲ.
44 LMS [batch training of the sigmoid unit, for each feature j:] 0. Fix w; initialize Δw = 0. 1. For each row i, compute: a. Eⁱ = (yⁱ − oⁱ)·oⁱ·(1 − oⁱ); b. Δw += Eⁱ·xⁱ (ie, Δwⱼ += Eⁱ·xⱼⁱ). 2. Increment w ← w + η·Δw.
45 B: Or... Learn the Conditional Probability. As we are fitting a probability distribution, better to return the distribution P_w that is most likely given the training data S: argmax_w P(w | S) = argmax_w P(S | w)·P(w)/P(S) (Bayes' rule) = argmax_w P(S | w)·P(w) (as P(S) does not depend on w) = argmax_w P(S | w) (as P(w) is uniform) = argmax_w log P(S | w) (as log is monotonic).
46 ML Estimation. P(S | w) is the likelihood function; L(w) = log P(S | w). w* = argmax_w L(w) is the maximum likelihood estimator (MLE).
47 Computing the Likelihood. As the training examples [xⁱ, yⁱ] are iid (drawn independently from the same unknown distribution), log P(S | w) = log Πᵢ P_w(xⁱ, yⁱ) = Σᵢ log P_w(xⁱ, yⁱ) = Σᵢ log P_w(yⁱ | xⁱ) + Σᵢ log P_w(xⁱ). Here P_w(xⁱ) = 1/n (not dependent on w, over the empirical sample S), so w* = argmax_w Σᵢ log P_w(yⁱ | xⁱ).
48 Fit Logistic Regression by Gradient Ascent. Want w* = argmax_w J(w), where J(w) = Σᵢ r(xⁱ, yⁱ, w). For y ∈ {0, 1}: r(x, y, w) = log P_w(y | x) = y·log P_w(1 | x) + (1 − y)·log P_w(0 | x). So climb along ∂J/∂wⱼ = Σᵢ ∂r(xⁱ, yⁱ, w)/∂wⱼ.
49 Gradient Descent. Write pⁱ = P_w(y = 1 | xⁱ) = σ(w·xⁱ). Then ∂r/∂wⱼ = ∂/∂wⱼ [ y·log p + (1 − y)·log(1 − p) ] = [ y/p − (1 − y)/(1 − p) ]·∂p/∂wⱼ, and ∂p/∂wⱼ = σ(w·x)·(1 − σ(w·x))·xⱼ = p·(1 − p)·xⱼ, so ∂r/∂wⱼ = (y − p)·xⱼ and ∂J/∂wⱼ = Σᵢ (yⁱ − pⁱ)·xⱼⁱ.
50 Gradient Ascent for Logistic Regression (MLE): Δwⱼ = Σᵢ (yⁱ − pⁱ)·xⱼⁱ; wⱼ ← wⱼ + η·Δwⱼ.
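This update rule can be written out as a short batch gradient-ascent loop (a sketch; the toy data, learning rate, and epoch count are invented):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lr_fit(xs, ys, eta=0.1, epochs=2000):
    """Batch gradient ascent on the log conditional likelihood, labels in {0, 1}."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        delta = [0.0] * len(w)
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))  # p^i = P_w(1 | x^i)
            for j, xj in enumerate(x):
                delta[j] += (y - p) * xj  # delta_wj += (y^i - p^i) * x^i_j
        w = [wi + eta * d for wi, d in zip(w, delta)]  # w_j <- w_j + eta * delta_wj
    return w

xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # x0 = 1 is the constant input
ys = [0, 0, 1, 1]
w = lr_fit(xs, ys)
probs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in xs]
print([round(p) for p in probs])
```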
51 Comments on MLE Algorithm. This is BATCH; the obvious online alg is stochastic gradient ascent. Can use a second-order Newton-Raphson alg for faster convergence: a weighted least squares computation, aka Iteratively-Reweighted Least Squares (IRLS).
52 Use Logistic Regression for Classification. Return YES iff P(1 | x) > P(0 | x), iff P(1 | x)/P(0 | x) > 1, iff ln[ P(1 | x)/P(0 | x) ] > 0, iff ln[ (1/(1 + e^(−w·x))) / (e^(−w·x)/(1 + e^(−w·x))) ] = ln e^(w·x) = w·x > 0. Logistic regression learns an LTU!
53 Logistic Regression for K > 2 Classes. Note: K − 1 different wᵢ weight vectors, each of the input dimension.
54 Learning LR Weights. P_w(y | x) = exp(w·x)/(1 + exp(w·x)) if y = 1, and 1/(1 + exp(w·x)) if y = 0. Updates: MSE gives Δwⱼ = Σᵢ (yⁱ − oⁱ)·oⁱ·(1 − oⁱ)·xⱼⁱ; MLE gives Δwⱼ = Σᵢ (yⁱ − pⁱ)·xⱼⁱ.
55 MaxProb vs LMS [batch training, for each feature j:] 0. Fix w; initialize Δw = 0. 1. For each row i, compute: a. Eⁱ = yⁱ − pⁱ [MaxProb] vs Eⁱ = (yⁱ − oⁱ)·oⁱ·(1 − oⁱ) [LMS]; b. Δw += Eⁱ·xⁱ. 2. Increment w ← w + η·Δw.
56 Logistic Regression Computation. p + 1 non-linear equations; solve by the Newton-Raphson method: β_new = β_old − [∂²l(β)/∂β∂βᵀ]⁻¹·∂l(β)/∂β. Here l(β) = Σᵢ log Pr(G = yⁱ | X = xⁱ) = Σᵢ [ yⁱ·βᵀxⁱ − log(1 + exp(βᵀxⁱ)) ], and setting ∂l/∂β = Σᵢ xⁱ·(yⁱ − p(xⁱ; β)) = 0 gives the equations to solve.
57 Newton-Raphson Method. A gen'l technique for solving f(x) = 0, even if f is non-linear. Taylor series: f(x_{n+1}) ≈ f(xₙ) + (x_{n+1} − xₙ)·f′(xₙ). When xₙ is near the root, set f(x_{n+1}) = 0: x_{n+1} = xₙ − f(xₙ)/f′(xₙ). Iterate.
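The iteration x_{n+1} = xₙ − f(xₙ)/f′(xₙ) in a few lines (the example equation, x² − 2 = 0, is chosen only for illustration):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson: iterate x <- x - f(x)/f'(x) until f(x) is ~0."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)
    return x

# Solve x^2 - 2 = 0; the root is sqrt(2)
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```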
58 Newton-Raphson in Multi-dimensions. To solve the equations f₁(x₁, ..., x_N) = 0, ..., f_N(x₁, ..., x_N) = 0: Taylor series f(xⁿ⁺¹) ≈ f(xⁿ) + J(xⁿ)·(xⁿ⁺¹ − xⁿ), where J is the Jacobian matrix, Jₖⱼ = ∂fₖ/∂xⱼ. N-R step: xⁿ⁺¹ = xⁿ − J(xⁿ)⁻¹·f(xⁿ).
59 Newton-Raphson: Example. [worked 2-D example: a pair of nonlinear equations in x and y involving sin and cos, solved by iterating with the 2×2 Jacobian]
60 Maximum Likelihood Parameter Estimation. Find the unknown parameters (mean µ & standard deviation σ) of a Gaussian pdf p(x; µ, σ) = (1/√(2πσ²))·exp(−(x − µ)²/(2σ²)), given N independent samples {x₁, ..., x_N}. Estimate the parameters that maximize the likelihood function L(µ, σ) = Πᵢ p(xᵢ; µ, σ): (µ̂, σ̂) = argmax_{µ,σ} L(µ, σ).
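For the Gaussian, this maximization has the closed form µ̂ = (1/N)·Σᵢ xᵢ and σ̂² = (1/N)·Σᵢ (xᵢ − µ̂)² (note the 1/N of the MLE, not the 1/(N−1) of the unbiased estimator); a quick sketch with invented samples:

```python
import math

def gaussian_mle(samples):
    """Closed-form MLE of (mu, sigma) from N iid samples of a Gaussian."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n  # 1/N: the (biased) MLE variance
    return mu, math.sqrt(var)

samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = gaussian_mle(samples)
print(mu, sigma)
```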
61 Logistic Regression Algs for LTUs. Learns the conditional probability distribution P(y | x). Local search: begin with an initial weight vector; iteratively modify it to maximize the objective function, the log likelihood of the data (ie, seek w s.t. the probability distribution P_w is most likely given the data). Eager: the classifier is constructed from the training examples, which can then be discarded. Online or batch.
62 Masking of Some Class. Linear regression of the indicator matrix can lead to masking (2-D input space and three classes): along the viewing direction, the middle class's fit Ŷ₂ = xᵀβ̂₂ is everywhere dominated by Ŷ₁ = xᵀβ̂₁ or Ŷ₃ = xᵀβ̂₃, so the middle class is never predicted. LDA can avoid this masking.
63 #4: Linear Discriminant Analysis. LDA learns the joint distribution P(y, x). As P(y, x) = P(y | x)·P(x): optimizing P(y, x) ≠ optimizing P(y | x). Generative model: P(y, x) is a model of how the data is generated. Eg, factor P(y, x) = P(y)·P(x | y): P(y) generates a value for y; then P(x | y) generates a value for x given this y. Belief net: Y → X.
64 Linear Discriminant Analysis, con't. P(y, x) = P(y)·P(x | y). P(y) is a simple discrete distribution; eg P(y = 0) = 0.31, P(y = 1) = 0.69: 31% negative examples, 69% positive examples. Assume P(x | y) is multivariate normal, with mean µₖ and covariance Σ.
65 Estimating the LDA Model. Linear discriminant analysis assumes the form P(x | y) = N(x; µ_y, Σ): µ_y is the mean for examples belonging to class y; the covariance matrix Σ is shared by all classes! Can estimate LDA directly: mₖ = #training examples in class k; estimate of P(y = k): p̂ₖ = mₖ/m; µ̂ₖ = (1/mₖ)·Σ_{i: yⁱ=k} xⁱ; Σ̂ = (1/m)·Σᵢ (xⁱ − µ̂_{yⁱ})·(xⁱ − µ̂_{yⁱ})ᵀ (subtract each µ̂_{yⁱ} from the corresponding xⁱ before taking the outer product).
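These estimates (class frequencies, class means, and a covariance pooled over the class-centered examples) are direct to compute; a NumPy sketch on invented data (not the 7-example problem worked on the next slides):

```python
import numpy as np

def estimate_lda(X, y):
    """MLE of the LDA model: priors p_k, means mu_k, one shared covariance."""
    m = len(y)
    classes = np.unique(y)
    priors = {k: np.mean(y == k) for k in classes}        # p_k = m_k / m
    means = {k: X[y == k].mean(axis=0) for k in classes}  # mu_k
    # Subtract each example's own class mean before the outer products
    centered = np.vstack([X[y == k] - means[k] for k in classes])
    sigma = centered.T @ centered / m                     # shared covariance
    return priors, means, sigma

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 2.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0], [7.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1, 1])
priors, means, sigma = estimate_lda(X, y)
print(priors[0], means[0], sigma.shape)
```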
66 Example of Estimation. m = 7 examples; m₊ = 3 positive, m₋ = 4 negative. p̂₊ = 3/7, p̂₋ = 4/7. (Note: do NOT pre-pend x₀ = 1!) [worked computation of µ̂₊ and µ̂₋]
67 Estimation, con't. [worked computation of Σ̂: sum the seven centered outer products (xⁱ − µ̂_{yⁱ})(xⁱ − µ̂_{yⁱ})ᵀ and divide by 7]
68 Classifying, Using LDA. How to classify a new instance, given the estimates p̂ₖ, µ̂ₖ, Σ̂? Eg, the class for the instance x = [5, 4, 6]ᵀ: compare p̂₊·N(x; µ̂₊, Σ̂) against p̂₋·N(x; µ̂₋, Σ̂) and return the larger. [worked computation]
69 LDA Learns an LTU. Consider the 2-class case with a 0/1 loss function. Classify as y = 1 iff P(y = 1, x) > P(y = 0, x), ie iff P(y = 1)·N(x; µ₁, Σ) > P(y = 0)·N(x; µ₀, Σ).
70 LDA Learns an LTU (2). Taking logs, the quadratic term xᵀΣ⁻¹x is common to both sides and cancels, leaving the linear comparison xᵀΣ⁻¹µ₁ − ½µ₁ᵀΣ⁻¹µ₁ + log P(y = 1) vs xᵀΣ⁻¹µ₀ − ½µ₀ᵀΣ⁻¹µ₀ + log P(y = 0). (As Σ⁻¹ is symmetric, xᵀΣ⁻¹µ = µᵀΣ⁻¹x.)
71 LDA Learns an LTU (3). So let w = Σ⁻¹·(µ₁ − µ₀) and let c = ½µ₁ᵀΣ⁻¹µ₁ − ½µ₀ᵀΣ⁻¹µ₀ − log[P(y = 1)/P(y = 0)]; classify as y = 1 iff w·x − c > 0: an LTU!!
72 LDA: Example. LDA was able to avoid masking here. [figure]
73 View LDA wrt Mahalanobis Distance. Squared Mahalanobis distance between x and µ: D_M(x, µ)² = (x − µ)ᵀΣ⁻¹(x − µ). Σ⁻¹ is a linear distortion that converts the standard Euclidean distance into the Mahalanobis distance. LDA (with equal priors) classifies x as y = 0 if D_M(x, µ₀) < D_M(x, µ₁); in general, use the discriminant δₖ(x) = log P(y = k) − ½·D_M(x, µₖ)² = log πₖ − ½·D_M(x, µₖ)².
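A sketch of the squared Mahalanobis distance and the equal-priors rule "pick the class with the nearer mean" (the covariance, means, and query point are invented):

```python
import numpy as np

def mahalanobis_sq(x, mu, sigma_inv):
    """Squared Mahalanobis distance (x - mu)^T Sigma^-1 (x - mu)."""
    d = x - mu
    return float(d @ sigma_inv @ d)

sigma = np.array([[2.0, 0.0],
                  [0.0, 0.5]])
sigma_inv = np.linalg.inv(sigma)

mu0 = np.array([0.0, 0.0])
mu1 = np.array([4.0, 0.0])
x = np.array([1.0, 1.0])

d0 = mahalanobis_sq(x, mu0, sigma_inv)
d1 = mahalanobis_sq(x, mu1, sigma_inv)
print(d0, d1)  # with equal priors, classify x as class 0 iff d0 < d1
```

With Σ = I this reduces to squared Euclidean distance; the Σ⁻¹ in the middle is exactly the linear distortion mentioned above.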
74 Generalizations of LDA. General Gaussian Classifier (QDA): allow each class k to have its own Σₖ; the classifier is a quadratic threshold unit, not an LTU. Naïve Gaussian Classifier: allow each class k to have its own Σₖ, but require each Σₖ to be diagonal (within each class, any pair of features xᵢ and xⱼ are independent); the classifier is still a quadratic threshold unit, but of a restricted form. Most-discriminating low-dimensional projection: Fisher's Linear Discriminant.
75 QDA and Masking. Better than linear regression in terms of handling masking; usually computationally more expensive than LDA.
76 Variants of LDA (covariance matrix; n features, k classes):
Name | Same Σ for all classes? | Diagonal? | #params
LDA | yes | no | O(n²)
Naïve Gaussian Classifier | no | yes | k·n
General Gaussian Classifier | no | no | O(k·n²)
77 Versions of ?DA. [figure: decision boundaries for LDA, Quadratic (QDA), Naïve, and SuperSimple discriminant analysis]
78 Summary of Linear Discriminant Analysis. Learns the joint probability distribution P(y, x). Direct computation: the ML estimate of P(y, x) is computed directly from the data, without search. But need to invert a matrix, which is O(n³). Eager: the classifier is constructed from the training examples, which can then be discarded. Batch: only a batch algorithm; an online LDA alg requires an online alg for incrementally updating Σ⁻¹ [easy if Σ⁻¹ is diagonal...].
79 Fisher's Linear Discriminant. LDA finds a (K−1)-dim hyperplane (K = number of classes); project x and the {µₖ} onto that hyperplane; classify as the nearest µₖ within the hyperplane. Better: find the hyperplane that maximally separates the projections of the x's (wrt the within-class scatter): Fisher's Linear Discriminant.
80 Fisher Linear Discriminant. Recall any vector w projects Rⁿ → R (x ↦ wᵀx). Goal: want a w that separates the classes: each positive wᵀx far from each negative wᵀx. Perhaps project onto µ₊ − µ₋? Still overlap. Why?
81 Fisher Linear Discriminant. Problem with projecting onto µ₊ − µ₋: it does not consider the scatter within each class. Goal: want a w that separates the classes: each positive wᵀx far from each negative wᵀx; the positive x's: wᵀx close to each other; the negative x's: wᵀx close to each other. Scatter of the projected instances: s₊² = Σ_{i∈+} (wᵀxⁱ − m₊)², s₋² = Σ_{i∈−} (wᵀxⁱ − m₋)².
82 Fisher Linear Discriminant (con't). Recall any vector w projects Rⁿ → R. Goal: want a w that separates the classes: positive x's with wᵀx close to each other, negative x's with wᵀx close to each other, and each positive wᵀx far from each negative wᵀx. Scatter: s₊² = Σ_{i∈+} (wᵀxⁱ − m₊)², s₋² = Σ_{i∈−} (wᵀxⁱ − m₋)².
83 FLD, con't. Separate the means m₊ and m₋: maximize (m₊ − m₋)². Minimize each spread s₊², s₋²: minimize s₊² + s₋². Objective function: maximize J(w) = (m₊ − m₋)² / (s₊² + s₋²). Numerator: (m₊ − m₋)² = (wᵀµ₊ − wᵀµ₋)² = wᵀ(µ₊ − µ₋)(µ₊ − µ₋)ᵀw = wᵀS_B·w, with between-class scatter S_B = (µ₊ − µ₋)(µ₊ − µ₋)ᵀ.
84 FLD, III. s₊² = Σ_{i∈+} (wᵀxⁱ − m₊)² = wᵀ[ Σ_{i∈+} (xⁱ − µ₊)(xⁱ − µ₊)ᵀ ]w = wᵀS₊·w, where S₊ = Σ_{i∈+} (xⁱ − µ₊)(xⁱ − µ₊)ᵀ is the within-class scatter matrix for class +, and likewise S₋ for class −. S_W = S₊ + S₋, so s₊² + s₋² = wᵀS_W·w.
85 FLD, IV. Maximizing J(w) = wᵀS_B·w / wᵀS_W·w is equivalent to w* = argmax_w wᵀS_B·w s.t. wᵀS_W·w = 1. Lagrange: L(w, λ) = wᵀS_B·w − λ·(wᵀS_W·w − 1); setting ∂L/∂w = 0 gives S_B·w = λ·S_W·w, so w* is an eigenvector of S_W⁻¹S_B.
86 FLD, V. The optimal w* is an eigenvector of S_W⁻¹S_B. When P(x | yᵢ) ~ N(µᵢ, Σ), this gives the LINEAR DISCRIMINANT w = Σ⁻¹(µ₊ − µ₋). FLD is the optimal classifier if the classes are normally distributed. Can use it even if they are not Gaussian: after projecting the d-dim x to the 1-dim wᵀx, just use any classification method.
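The two-class Fisher direction w ∝ S_W⁻¹(µ₊ − µ₋) from this derivation, sketched in NumPy (the toy clusters are invented):

```python
import numpy as np

def fisher_direction(X_pos, X_neg):
    """w = S_W^-1 (m+ - m-), with S_W the sum of the within-class scatters."""
    m_pos, m_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)

    def scatter(X, m):
        return (X - m).T @ (X - m)

    S_w = scatter(X_pos, m_pos) + scatter(X_neg, m_neg)
    return np.linalg.solve(S_w, m_pos - m_neg)

X_pos = np.array([[4.0, 2.0], [5.0, 3.0], [6.0, 2.0], [5.0, 1.0]])
X_neg = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, -1.0]])
w = fisher_direction(X_pos, X_neg)

# The projections w^T x of the two classes should be well separated
print(X_pos @ w, X_neg @ w)
```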
87 Fisher's LD vs LDA. Fisher's LD ≡ LDA when: the prior probabilities are the same, and each class-conditional density is multivariate Gaussian with a common covariance matrix. Fisher's LD does not assume Gaussian densities, and it can be used to reduce dimensions, even in the multiple-class scenario.
88 Comparing LMS, Logistic Regression, LDA, FLD. Which is best: LMS, LR, LDA, or FLD? There is an ongoing debate within the machine learning community about the relative merits of direct classifiers [LMS], conditional models P(y | x) [LR], and generative models P(y, x) [LDA, FLD]. Stay tuned...
89 Issues in Debate. Statistical efficiency: if the generative model P(y, x) is correct, then it usually gives better accuracy, particularly if the training sample is small. Computational efficiency: generative models are typically the easiest to compute (LDA/FLD are computed directly, without iteration). Robustness to changing loss functions: LMS must re-train the classifier when the loss function changes; no retraining is needed for generative and conditional models. Robustness to model assumptions: a generative model usually performs poorly when its assumptions are violated; eg, LDA works poorly if P(x | y) is non-Gaussian. Logistic regression is more robust, and LMS is even more robust. Robustness to missing values and noise: in many applications, some of the features xᵢⱼ may be missing or corrupted for some of the training examples; generative models typically provide better ways of handling this than non-generative models.
90 Other Algorithms for Learning LTUs. Naive Bayes [discussed later]: for K classes, produces an LTU. Winnow [?discussed later?]: can handle large numbers of "irrelevant" features (features whose weights should be zero).
91 Learning Theory. Assume the data is truly linearly separable... Sample complexity: given ε, δ ∈ (0, 1), want an LTU that has error rate on new examples less than ε, with probability > 1 − δ; it suffices to learn from (be consistent with) sufficiently many labeled training examples. Computational complexity: there is a polynomial-time algorithm for finding a consistent LTU (reduction from linear programming). The agnostic case is different.
Woodhouse College 0 Page Introduction... 3 Terminolog... 3 Activit... 4 Wh use numerical methods?... Change of sign... Activit... 6 Interval Bisection... 7 Decimal Search... 8 Coursework Requirements on
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationECE521 Lecture7. Logistic Regression
ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 7 Unsupervised Learning Statistical Perspective Probability Models Discrete & Continuous: Gaussian, Bernoulli, Multinomial Maimum Likelihood Logistic
More informationMulticlass Logistic Regression
Multiclass Logistic Regression Sargur. Srihari University at Buffalo, State University of ew York USA Machine Learning Srihari Topics in Linear Classification using Probabilistic Discriminative Models
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Antti Ukkonen TAs: Saska Dönges and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer,
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More informationApril 9, Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá. Linear Classification Models. Fabio A. González Ph.D.
Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá April 9, 2018 Content 1 2 3 4 Outline 1 2 3 4 problems { C 1, y(x) threshold predict(x) = C 2, y(x) < threshold, with threshold
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More informationMachine Learning - Waseda University Logistic Regression
Machine Learning - Waseda University Logistic Regression AD June AD ) June / 9 Introduction Assume you are given some training data { x i, y i } i= where xi R d and y i can take C different values. Given
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationLecture 5: LDA and Logistic Regression
Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant
More informationMachine Learning: The Perceptron. Lecture 06
Machine Learning: he Perceptron Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu 1 McCulloch-Pitts Neuron Function 0 1 w 0 activation / output function 1 w 1 w w
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More information6.867 Machine learning
6.867 Machine learning Mid-term eam October 8, 6 ( points) Your name and MIT ID: .5.5 y.5 y.5 a).5.5 b).5.5.5.5 y.5 y.5 c).5.5 d).5.5 Figure : Plots of linear regression results with different types of
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationLinear and Logistic Regression. Dr. Xiaowei Huang
Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics
More informationMidterm exam CS 189/289, Fall 2015
Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points
More informationStructured Prediction with Perceptron: Theory and Algorithms
Structured Prediction with Perceptron: Theor and Algorithms the man bit the dog the man hit the dog DT NN VBD DT NN 那 人咬了狗 =+1 =-1 Kai Zhao Dept. Computer Science Graduate Center, the Cit Universit of
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x))
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard and Mitch Marcus (and lots original slides by
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationLinear classifiers: Overfitting and regularization
Linear classifiers: Overfitting and regularization Emily Fox University of Washington January 25, 2017 Logistic regression recap 1 . Thus far, we focused on decision boundaries Score(x i ) = w 0 h 0 (x
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression (Continued) Generative v. Discriminative Decision rees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 31 st, 2007 2005-2007 Carlos Guestrin 1 Generative
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationCOMP 652: Machine Learning. Lecture 12. COMP Lecture 12 1 / 37
COMP 652: Machine Learning Lecture 12 COMP 652 Lecture 12 1 / 37 Today Perceptrons Definition Perceptron learning rule Convergence (Linear) support vector machines Margin & max margin classifier Formulation
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationThe Perceptron algorithm
The Perceptron algorithm Tirgul 3 November 2016 Agnostic PAC Learnability A hypothesis class H is agnostic PAC learnable if there exists a function m H : 0,1 2 N and a learning algorithm with the following
More informationMachine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More informationMining Classification Knowledge
Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology COST Doctoral School, Troina 2008 Outline 1. Bayesian classification
More informationMachine Learning. Lecture 3: Logistic Regression. Feng Li.
Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification
More informationLogistic Regression. William Cohen
Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationLanguage and Statistics II
Language and Statistics II Lecture 19: EM for Models of Structure Noah Smith Epectation-Maimization E step: i,, q i # p r $ t = p r i % ' $ t i, p r $ t i,' soft assignment or voting M step: r t +1 # argma
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More information