Logistic Classifier CISC 5800 Professor Daniel Leeds

Transcription:

Classification strategy: generative vs. discriminative

Generative, e.g., Bayes/Naive Bayes:
- Identify a probability distribution for each class
- Determine the class with maximum probability for the data example

Discriminative, e.g., logistic regression:
- Identify the boundary between classes
- Determine which side of the boundary a new data example falls on

Linear algebra: data features

- Vector: a list of numbers; each number describes a data feature
- Matrix: a list of lists of numbers; the features for each data point

[Table: number of word occurrences of wolf, lion, monkey, broker, analyst, and dividend in Document 1, Document 2, and Document 3; the individual counts were not fully preserved in transcription.]

Feature space: each data feature defines a dimension in space, so each document becomes a point whose coordinates are its word counts.
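To make the vector/matrix view concrete, here is a minimal sketch; the documents and the resulting counts are made up for illustration and are not the values from the slide's table.

```python
# Represent each document as a vector of word counts over a fixed vocabulary.
# The vocabulary matches the slide; the documents are illustrative only.
vocabulary = ["wolf", "lion", "monkey", "broker", "analyst", "dividend"]

documents = [
    "the wolf chased the lion while the monkey watched the wolf",  # nature-themed
    "the broker told the analyst the dividend would grow",         # finance-themed
]

def count_features(text, vocab):
    """Return a list of word-occurrence counts, one per vocabulary word."""
    words = text.lower().split()
    return [words.count(v) for v in vocab]

# Each row is one document (a data point); each column is one feature (a dimension).
feature_matrix = [count_features(doc, vocabulary) for doc in documents]

for doc, row in zip(documents, feature_matrix):
    print(row, "<-", doc)
```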

The dot product

The dot product compares two vectors a = (a_1, ..., a_n) and b = (b_1, ..., b_n):
  a . b = sum_{i=1}^n a_i b_i = a^T b

The dot product, continued

- The squared magnitude of a vector is the sum of the squares of its elements: |a|^2 = sum_i a_i^2
- If a has unit magnitude, a . b is the projection of b onto a
  Example: a = (0.6, 0.8), b = (1.5, 1): a . b = 0.6*1.5 + 0.8*1 = 0.9 + 0.8 = 1.7
  With c = (0, 0.5): a . c = 0.6*0 + 0.8*0.5 = 0 + 0.4 = 0.4

Separating boundary, defined by w and b

- A separating hyperplane splits class 1 and class 2
- The hyperplane is defined by the vector w perpendicular to it and the offset b; points on the plane satisfy w^T x + b = 0
[Figure: a 2D feature space split by the line w^T x + b = 0, with w drawn perpendicular to the boundary.]
- Is data point x in class 1 or class 2?
  w^T x + b > 0 -> class 1
  w^T x + b < 0 -> class 2
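A small sketch of this decision rule; the weight vector and bias here are arbitrary illustrative values, not the (partially garbled) example from the slide.

```python
# Classify points by which side of the hyperplane w^T x + b = 0 they fall on.
def dot(a, b):
    """Dot product a . b = sum_i a_i * b_i."""
    return sum(ai * bi for ai, bi in zip(a, b))

w = [1.0, 2.0]   # vector perpendicular to the separating hyperplane (assumed)
b = -4.0         # offset (assumed)

def classify(x):
    """Return class 1 if w^T x + b > 0, else class 2."""
    return 1 if dot(w, x) + b > 0 else 2

print(classify([5.0, 1.0]))  # 1*5 + 2*1 - 4 =  3 > 0 -> class 1
print(classify([1.0, 1.0]))  # 1*1 + 2*1 - 4 = -1 < 0 -> class 2
```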

Notational simplification

Recall: w^T x = w . x = sum_{i=1}^n w_i x_i
Define x'_{1:n} = x_{1:n} and x'_{n+1} = 1 for all inputs x, and w'_{1:n} = w_{1:n} and w'_{n+1} = b.
Now w'^T x' = w^T x + b.
From here on, assume x_{n+1} = 1 always and w_{n+1} = b always, so the bias is folded into the weight vector.

From real-number projection to 0/1 label

- Binary classification: 0 is class A, 1 is class B
- Rather than modeling p(x|y) as a generative classifier would, the sigmoid function g(h) models the label probability directly:
  g(h) = 1 / (1 + e^{-h})
  p(y=1 | x; θ) = g(w^T x) = 1 / (1 + e^{-w^T x})
  p(y=0 | x; θ) = 1 - g(w^T x) = e^{-w^T x} / (1 + e^{-w^T x})
  where h = w^T x = sum_j w_j x_j + b
[Plot: g(h) vs. h; g rises from 0 toward 1 and passes through 0.5 at h = 0.]

Learning parameters for classification

Similar to MLE for the Bayes classifier: form the likelihood of the labels y_1, ..., y_n (different from the Bayesian likelihood).
- If y_i is in class A, y_i = 0: multiply in (1 - g(x_i; w))
- If y_i is in class B, y_i = 1: multiply in g(x_i; w)

argmax_w L(y | x; w) = argmax_w prod_i [1 - g(x_i; w)]^{(1 - y_i)} [g(x_i; w)]^{y_i}

Taking the log:
  sum_i (1 - y_i) log(1 - g(x_i; w)) + y_i log g(x_i; w)
  = sum_i y_i log[ g(x_i; w) / (1 - g(x_i; w)) ] + log(1 - g(x_i; w))
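A sketch of this probability model and log-likelihood, assuming each input vector already carries the constant feature x_{n+1} = 1 so that the bias b is folded into w; the toy data at the bottom is purely illustrative.

```python
import math

def sigmoid(h):
    """g(h) = 1 / (1 + e^{-h})."""
    return 1.0 / (1.0 + math.exp(-h))

def p_class_B(x, w):
    """p(y=1 | x; w) = g(w^T x); assumes x includes the constant 1 feature."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def log_likelihood(X, y, w):
    """sum_i (1 - y_i) log(1 - g(x_i; w)) + y_i log g(x_i; w)."""
    total = 0.0
    for xi, yi in zip(X, y):
        g = p_class_B(xi, w)
        total += (1 - yi) * math.log(1 - g) + yi * math.log(g)
    return total

# Tiny illustrative dataset: one real feature plus the constant 1 feature.
X = [[2.0, 1.0], [-1.0, 1.0]]
y = [1, 0]
print(log_likelihood(X, y, [0.5, 0.0]))
```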

Learning parameters for classification, continued

  sum_i y_i log[ g(x_i; w) / (1 - g(x_i; w)) ] + log(1 - g(x_i; w))
  = sum_i y_i log[ (1 / (1 + e^{-w^T x_i})) * ((1 + e^{-w^T x_i}) / e^{-w^T x_i}) ] + log( e^{-w^T x_i} / (1 + e^{-w^T x_i}) )
  = sum_i y_i w^T x_i - w^T x_i - log(1 + e^{-w^T x_i})

Using g(h) = 1 / (1 + e^{-h}) and g'(h) = e^{-h} / (1 + e^{-h})^2, the derivative with respect to each weight w_j is:

  dL/dw_j = sum_i y_i x_{i,j} - x_{i,j} + x_{i,j} e^{-w^T x_i} / (1 + e^{-w^T x_i})
          = sum_i x_{i,j} ( y_i - g(w^T x_i) )

Iterative gradient ascent

- Begin with initial guessed weights w
- For each data point (y_i, x_i), update each weight w_j:
  w_j <- w_j + ε x_{i,j} ( y_i - g(w^T x_i) )
- Choose the step size ε so the change is not too big or too small

In the update term x_{i,j} ( y_i - g(w^T x_i) ), y_i is the true data label and g(w^T x_i) is the computed data label.

Intuition:
- If y_i = 1 and g(w^T x_i) = 0, and x_{i,j} > 0, make w_j larger and push w^T x_i to be larger
- If y_i = 0 and g(w^T x_i) = 1, and x_{i,j} > 0, make w_j smaller and push w^T x_i to be smaller

Iterative gradient ascent, big picture

- Initialize w with random values
- Loop across all training data points x_i, updating every feature weight w_j
- Repeat this loop many times (multiple passes over the training data)
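A minimal sketch of the per-example update rule w_j <- w_j + ε x_{i,j}(y_i - g(w^T x_i)); the step size, number of passes, and toy dataset are illustrative choices, and the weights start at zero rather than random values for reproducibility.

```python
import math

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

def train(X, y, epsilon=0.1, num_passes=100):
    """Iterative gradient ascent for logistic regression.

    X: list of feature vectors (each ending with the constant 1 feature).
    y: list of 0/1 labels. epsilon and num_passes are illustrative values.
    """
    w = [0.0] * len(X[0])           # initial guessed weights (slide uses random)
    for _ in range(num_passes):     # repeat the loop over the data many times
        for xi, yi in zip(X, y):    # loop across all training data points
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(len(w)): # update each feature weight
                w[j] += epsilon * xi[j] * (yi - g)
    return w

# Toy data: class B (y=1) when the first feature is large.
X = [[3.0, 1.0], [2.5, 1.0], [-2.0, 1.0], [-3.5, 1.0]]
y = [1, 1, 0, 0]
w = train(X, y)
print(w)
print([round(sigmoid(sum(wj * xj for wj, xj in zip(w, xi))), 2) for xi in X])
```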

Gradient ascent for L(y | x; w)

Typical gradient ascent can get stuck in local maxima. Here,
  L(y | x; w) = prod_i [1 - g(x_i; w)]^{(1 - y_i)} [g(x_i; w)]^{y_i}
and its log are concave in w, so there is at most one maximum.

MAP for a discriminative classifier

- MLE: P(y=1 | x; w) ~ g(w^T x)
- MAP: P(y=1, w | x) ∝ P(y=1 | x; w) P(w) ~ g(w^T x) * P(w) (different from the Bayesian posterior)
- P(w) acts as a prior on the weights:
  - L2 regularization: keep the magnitudes of all weights small
  - L1 regularization: keep the number of non-zero weights small

L1 and L2 update rules

  L1: w_j <- w_j + ε [ x_{i,j} ( y_i - g(w^T x_i) ) - λ sgn(w_j) ]
  L2: w_j <- w_j + ε [ x_{i,j} ( y_i - g(w^T x_i) ) - λ w_j ]

  where sgn(w_j) = +1 if w_j > 0, and -1 if w_j < 0

Thinking about your data: numeric and non-numeric features

Data to be classified can have multiple features x = (x_1, ..., x_n). Each feature could be:
- Numeric: e.g., loudness of music, measured in decibels
- Non-numeric: e.g., an Action, including Laugh, Cry, Jump, Dance
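A sketch of the regularized per-example updates; the λ, ε, and example weights are illustrative. The only change from the unregularized update is the penalty term subtracted from the gradient.

```python
import math

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

def sgn(v):
    """sgn(w_j) = +1 if w_j > 0, -1 if w_j < 0 (0 at exactly 0)."""
    return (v > 0) - (v < 0)

def regularized_update(w, xi, yi, epsilon=0.1, lam=0.01, penalty="L2"):
    """One per-example gradient-ascent step with an L1 or L2 penalty."""
    g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
    new_w = []
    for wj, xj in zip(w, xi):
        grad = xj * (yi - g)
        if penalty == "L1":
            grad -= lam * sgn(wj)   # pushes individual weights toward exactly zero
        else:
            grad -= lam * wj        # shrinks the magnitude of every weight
        new_w.append(wj + epsilon * grad)
    return new_w

w = [0.5, -0.2, 0.0]
print(regularized_update(w, [1.0, 2.0, 1.0], 1, penalty="L1"))
print(regularized_update(w, [1.0, 2.0, 1.0], 1, penalty="L2"))
```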

Classifier choice

- Logistic regression only makes sense for numeric data
- Gaussian Bayes only makes sense for numeric data
- Multinomial Bayes makes sense for non-numeric data

Non-numeric features -> numeric

You may map non-numeric features to a continuous space.
Example: Mood = {Depressed, Disappointed, Neutral, Happy, Excited}
Switch to: HappinessLevel = {-2, -1, 0, 1, 2}
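A sketch of this Mood-to-HappinessLevel encoding, using the {-2, ..., 2} scale reconstructed above, so the resulting numeric value can be fed to a numeric classifier such as logistic regression.

```python
# Map a non-numeric Mood feature onto a numeric HappinessLevel scale.
happiness_level = {
    "Depressed": -2,
    "Disappointed": -1,
    "Neutral": 0,
    "Happy": 1,
    "Excited": 2,
}

def encode_mood(mood):
    """Return the numeric HappinessLevel for a Mood value."""
    return happiness_level[mood]

print(encode_mood("Disappointed"))  # -1
print(encode_mood("Excited"))       # 2
```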