9.913 Pattern Recognition for Vision. Class IV Part I Bayesian Decision Theory Yuri Ivanov
|
|
- Tamsin Blankenship
- 6 years ago
- Views:
Transcription
1 9.93 Class IV Part I Bayesan Decson Theory Yur Ivanov
2 TOC Roadmap to Machne Learnng Bayesan Decson Makng Mnmum Error Rate Decsons Mnmum Rsk Decsons Mnmax Crteron Operatng Characterstcs
3 Notaton x - scalar varable x - vector varable, sometmes x when clear px ( ) - probablty densty functon (contnuous varable) Px ( ) - probablty mass functon (dscrete varable) p ( abc,,, d e, f, gh, ) denstyof these functonof these - condtonal densty f ( x) d x f( x, x,... x ) dx dx... dx 2 n 2... f( x, x,... x ) dx dx... dx 2 n 2 n n A w > B - f A > B then the answer s ω, otherwse ω 2 w 2
4 Machne Learnng Learnng machne: G x S y LM ŷ = f ( xw, ) G Generator, or Nature: mplements p(x) S Supervsor: mplements p(y x) LM - Learnng Machne: mplements f(x, w)
5 Loss and Rsk L( y, f ( x, w )) - Loss Functon how much penalty we get for devatons from true y z R ( w ) = L( y, f ( x, w )) p( x, y) dxdy - Expected Rsk how much penalty we get on average Goal of learnng s to fnd f(x,w*) such that R(w*) s mnmal.
6 Quck Illustraton What does t mean? Rw ( ) L( y, f ( xw, )) pxydxdy (, ) = From basc probablty: pxy (, ) = py ( xpx ) ( ) If no nose: py ( x ) = d ( ygx, ( )) Rw ( ) = Lgx ( ( ), f ( xw, )) pxdx ( )
7 Illustraton cont. So, Rw ( ) = Lgx ( ( ), f ( xw, )) pxdx ( ) y x
8 Example Classfcaton: Measurements: Labels: x [ -, y = { 0,} ] - 0 x 0 y <= p(x, y) x y = 0 y = Fnd f(x,w*) such that R(w*) s mnmal.
9 Example Let s choose f ( xa, ) = H ( xa, ) - step functon L( y, f ( xa, )) = -d ( y - H ( xa, )) - + for every mstake (supermposed) Y = 0 [ d Y ] Ra ( ) = - ( - H ( xa, )) pxy (, = Y ) dx
10 Learnng n Realty Fundamental problem: where do we get p(x, y)??? What we want: z R ( w ) = L ( y, f ( x, w )) p( x, y ) d xdy Approxmate: estmate rsk functonal by averagng loss over observed (tranng) data. What we get: n R ( w ) R ( w ) = L ( y, f ( x, w )) e n = Replace expected rsk wth emprcal rsk
11 What We Call Learnng Taxonomes of machne learnng: by source of evaluaton Supervsed, Transductve, Unsupervsed, Renforcement by nductve prncple ERM, SRM, MDL, Bayesan estmaton by objectve classfcaton, regresson, vector quantzaton and densty estmaton
12 Taxonomy by Evaluaton Source Supervsed (classfcaton, regresson) Evaluaton source - mmedate error vector, that s, we get to see the true y Transductve Evaluaton source mmedate error vector for SOME of the data Unsupervsed (clusterng, densty estmaton) Evaluaton source - nternal metrc we don t get to see true y Renforcement Evaluaton source - envronment we get to see some scalar value (possbly delayed) that n some way related to whether the label we chose was correct
13 Taxonomy by Inductve Prncple Emprcal Rsk Mnmzaton (ERM) = Structural Rsk Mnmzaton (SRM) w, h = Mnmum Descrpton Length Bayesan Estmaton n mn L ( y, f ( x, w )), f ( x, w ) ` w n n mn( L ( y, f ( x, w )) +F ( h )), n f ( x, w ) `, ` ` `, k h k 2 k n l ( D, H ) = l ( D H ) + l( H ) mn( L ( y, f ( x, w )) + F ( w )) w n z n P ( x C ) = P ( x q ) p ( q C ) d q mn( L ( y, f ( x, w )) + F( P( w))) n = = complexty
14 Taxonomy by Learnng Objectve Classfcaton L( y, f ( x, w )) = - d ( y, f ( x, w)) Regresson L( y, f ( x, w )) = ( y - f ( x, w )) 2 Densty Estmaton L( f ( x, w )) = - log( f ( x, w)) Clusterng/Vector Quantzaton L( f ( x, w )) = ( x - f ( x, w )) ( x - f ( x, w))
15 The Lay of the Land n fnd f ( or w ) = arg mn( L ( y, f ( x, w )) +F) f ( or w ) n = Classfcaton Regresson Densty Estmaton Vector Quantzaton/Clusterng Supervsed Transductve Unsupervsed Renforcement ERM SRM MDL Bayesan Regularzaton
16 Class Prors Makng a decson about observaton x s fndng a rule that says: If x s n regon A, decde a, f x s n regon B, decde b w w = w - state of nature { } C = P ( w) - Pror probablty Shorthand P ( w ) P( w = w ) C = P ( w ) = Poor man s decson rule: Decde w f P ( w ) > P( w2) otherwse w 2
17 Class-Condtonal Densty px ( w ) - class-condtonal densty functon - p ( x w ) dx = px ( w ) - densty for x gven that the nature s n the state w How do we decde whch class x came from? Decdng on ths s not far?
18 Jont Densty p ( x, w ) = p ( x w ) P( w) - jont densty functon Good It s far Bad not very convenent. P ( w ) = 0.2 P ( w 2) = 0.8 Ths relates to the measurement probablstcally, we need a functon!
19 Bayes Rule and Posteror We want a probablty of ω for each value of x: P ( w x) Note that p ( x, w ) = p ( x w ) P ( w ) = P ( w xpx ) ( ) lkelhood pror p ( x w ) P( w ) P ( w x ) = px ( ) posteror evdence Bayes rule Contnuum of bnary dstrbutons
20 Margnalzaton Bayes Rule: how to convert pror to posteror by usng measurements: P ( w x ) = p ( x w ) P( w ) px ( ) What s p(x)? C p ( x ) = p ( x, w ) = p ( x w ) P( w ) C = = Or, more generally: p ( x ) = pxy (, ) dy - margnalzaton
21 Makng Decsons Wth posterors we can compare class probabltes Intutvely: w = arg max p ( w x) ( ) What s the probablty of error? p ( error x ) = p ( w x ) f we choose 2 p ( w x ) f we choose 2 P ( error ) = P ( error, x ) dx = P ( error xpx ) ( ) dx x w w
22 Errors How bad s a decson threshold? Pe ( ) = Px ( R, w ) + Px ( R, w ) 2 2 T* Recall that Pa ( A ) = pada ( ) a A Pe () = px (, w ) dx + px (, w ) dx = x R 2 x R 2 R R 2 = + x R px ( w ) P ( w ) dx px ( w ) P ( w ) dx 2 2 x R 2
23 Bayes Decson Rule To mnmze P ( error ) = P ( error xpx ) ( ) dx we need to make P(error x) as small as possble for all x [ ] [ x ] mn P ( error ) = mn P ( error ) px ( ) dx Bayes error w [ P w x P w x ] = mn ( ), ( 2 ) px ( ) dx Bayes decson rule: Decde w f P ( w x ) > P ( w x); otherwse w 2 2
24 Bayes Decson Rule For Bayes decson rule (back to the st example): R regon where we always choose ω R 2 regon where we always choose ω 2
25 Loss Functon w, w 2,..., w c - set of { } a, a 2,..., a a - set of { } For classes actons L ( a w ) - Loss functon, penalty sustaned for takng We make ths up j an acton α when the state of nature s ω j d x R condtonal rsk for takng an acton α n ω j s the expected (average) loss for classfyng ONE x: c R ( a x ) = L ( a w ) P ( w x) j j j =
26 Bayes Rsk We wll see a lot of x Ø c ø R = R ( a xp ) ( x ) dx = Œ L ( a w j ) P ( w j x) œ px ( ) dx º j = ß Ø ø = Œ L p x œ dx es. To see how well we do, we average agan: c ( a w j ) ( w j, ) º j = ß Ths s exactly the expresson for expected rsk from before Smlarly to the earler argument about P(error): [ ] [ ] * R = R a x) ( ) = mn mn ( px dx R Bayes rsk -
27 Quck Summary L ( a w ) j - loss R ( a x ) = E Ø w x º L( a wj ) ø ß x [ ( a )] R = E R x - condtonal rsk (expected loss) - total rsk (expected cond. rsk) R * = mn [ R] - Bayes rsk (mnmum rsk)
28 Mnmum Rsk Classfcaton Two classes and a smple acton: a l = L( a w ) j j w - decde to choose = { w, w } 2 ω L Ø ø = Œ 3 0 œ º ß Then: R ( a x ) = l P ( w x ) + l2 P ( w 2 x ) R ( a 2 x ) = l2 P ( w x ) + l22 P ( w 2 x ) Obvous decson decde n favor of the class wth mnmal rsk: w R ( a x ) < R ( a x ) 2 w 2
29 Lkelhood Rato Test Rewrtng R(α x) s: w w ( l - l ) P ( w x ) > ( l - l ) P ( w x ) 2 w ( l - l ) p ( x w ) P ( w ) > ( l - l ) p ( x w ) P ( w ) w w ( w ) ( > ( w ) w 2 ( 2 Class model Class pror p x p x l - l ) P ( w ) l - l ) P ( w ) 2 2 Lkelhood Rato Test
30 LRT Example You are drvng to Blockbuster s to return a vdeo due today It s 5 mn to mdnght You ht a red lght You see a car that you 60% sure looks lke a polce car Traffc fne s $5 AND you are late Blockbuster s fne s $0 Should you run the red lght?
31 Mnmum Rsk P ( polce x ) = 0.6 P ( polce x ) = 0.4 You pay run not run polce not polce $5 $0 $0 $0 L 5 0 = Ł 0 0 ł R ( run x ) = l P ( polce x ) + l2 P ( polce x ) = $9 R ( wat x ) = l2 P ( polce x ) + l22 P ( polce x ) = $0 The rsk s hgher f you wat
32 LRT Way OK WAIT Let s say we have ths polceness feature Decson threshold px px w ( w ) ( > ( w ) w 2 ( l - l ) P ( w ) l - l ) P ( w )
33 LRT Example OK WAIT OK WAIT Threshold s dependent on prors px px w ( w ) ( > ( w ) w 2 ( l - l ) P ( w ) l - l ) P ( w )
34 LRT Example L 5 0 = 0 Ł 0 ł L 5-60 = 0 Ł 0 ł OK WAIT OK OK Threshold s dependent on loss WAIT px px w ( w ) ( > ( w ) w 2 ( l - l ) P ( w ) l - l ) P ( w )
35 Mnmum Error Rate Classfcaton Let s smplfy the Mn. Rsk classfcaton: 0 L = 0 Ł ł Then the condtonal rsk becomes: c R ( a x ) = L ( a w ) P ( w x) j j j = = P ( w j x ) = -P( w x) " j - zero-one loss, just counts errors So, ω havng the hghest value of the posteror mnmzes the rsk: P ( w x ) > P ( w x ) " j w j - good ol Bayes decson rule
36 Mnmax Crteron Is there a decson rule such that the rsk s nsenstve to prors? R R [ ( w x) ( w )] = + After some algebra: l P l P x dx 2 2 R 2 [ ( w x) ( w )] + + l P l P x dx RP ( ( w )) = l + ( l - l ) px ( w ) dx + P ( w ) f( T) R Mnmax rsk Goal fnd the decson boundary T, such that f(t) s 0. At the boundary for whch the mnmum rsk s maxmal the rsk s ndependent of prors.
37 Messy Illustraton of the Mnmax Soluton R + P ( w ) f( T) mm -cannot be smaller than Bayes rsk RPw ( ( )) R + P ( w ) f( T) mm Bayes rsk s concave down 0 T : f( T ) 0 T : f( T ) = 0 P( w) Worst possble Bayes rsk s ndependent of prors
38 Dscrmnant Functons and Decson Surfaces Dscrmnant functons convenently represent classfers: C = ( ), ( ),... g ( x)} { g x g 2 x n w : = arg max g ( x) ( ) Eg: g ( x ) =-R ( a x) g ( x ) = P ( w x) g ( x ) = ln p( x w ) + ln P( w ) Dscrmnants DO NOT have to relate to probabltes.
39 Dscrmnant for Bnary Classfcaton For two-class problem: gx ( ) = g - g ( x) ( x ) 2 Then (assumng that classes are encoded as - and +): w = sgn( gx ( )) Eg: gx ( ) = P ( w x ) - P ( w x) 2 px ( w ) P ( w ) gx ( ) = ln + ln px ( w ) P ( w ) 2 2
40 Boundary for Two Normal Dstrbutons If we assume a Gaussan for a class model: px ( w ) = (2 p ) S d / 2 /2 e T - ( x m x m ) - ( - ) S ( - ) 2 and the mnmum error rate classfer: [ ( w ) P( w )] g ( x ) = ln px = =- ( ( ) T - ( ) ) ) d / 2 /2 x - m S x - m - ln Ø p S ø + P w 2 º (2 ß ln ( )
41 Boundary Between Two Normal Dstrbutons Dscrmnant (after some algebra): g ( x ) = g ( x )- g ( x ) == x Wx + wx + w - quadratc where Specal cases: T 2 0 W = w =S m -S m w o = ( - - S ) -S well, the rest of t ) S =S 2 W = 0 g ( x )- lnear 2) s I W Ø ø S = = Œ - œ ( ) 2 s 2s I = s I g x - a crcle º 2 ß - a matrx - a vector - a scalar
42 Evaluatng Decsons Is 70% classfcaton rate good or bad? 70% = BAD 70% = GOOD Dscrmnablty: d ' = m - m 2 Hgh d means that the classes are easy to dscrmnate. s
43 ROC We do not know m, m2, s but we can get: P x x x w * ( > 2 ) P x x x w * ( > ) P x x x w * ( < 2 ) P x x x w * ( < ) - probablty of ht - probablty of false alarm - probablty of mss - probablty of correct rejecton Each x* corresponds to a pont on ht/false_alarm plane. Ths s called an ROC curve
44 ROC Curve
45 Realty In practce, t s done for a sngle parameter Usng the data for whch true ω s known: Identfy a parameter of nterest Identfy the parameter range Vary the parameter wthn the range Compute P(ht) and P(false_alarm) emprcally for each value It tells us how well the classfer can deal wth the data set.
46 Homework Part I A chance puzzle Try to solve t Understand the soluton Smulate n Matlab Buld an ROC curve Almost lke n class Can you mplement t effcently?
P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /
Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons
More informationGenerative classification models
CS 675 Intro to Machne Learnng Lecture Generatve classfcaton models Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Data: D { d, d,.., dn} d, Classfcaton represents a dscrete class value Goal: learn
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationMIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU
Group M D L M Chapter Bayesan Decson heory Xn-Shun Xu @ SDU School of Computer Scence and echnology, Shandong Unversty Bayesan Decson heory Bayesan decson theory s a statstcal approach to data mnng/pattern
More informationMaximum Likelihood Estimation (MLE)
Maxmum Lkelhood Estmaton (MLE) Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 01 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationThe Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD
he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s
More informationLecture 12: Classification
Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna
More informationLogistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton
More informationSupport Vector Machines
Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far So far Supervsed machne learnng Lnear models Non-lnear models Unsupervsed machne learnng Generc scaffoldng So far
More informationClassification learning II
Lecture 8 Classfcaton learnng II Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Logstc regresson model Defnes a lnear decson boundar Dscrmnant functons: g g g g here g z / e z f, g g - s a logstc functon
More informationSupport Vector Machines
Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far Supervsed machne learnng Lnear models Least squares regresson Fsher s dscrmnant, Perceptron, Logstc model Non-lnear
More informationDepartment of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING
MACHINE LEANING Vasant Honavar Bonformatcs and Computatonal Bology rogram Center for Computatonal Intellgence, Learnng, & Dscovery Iowa State Unversty honavar@cs.astate.edu www.cs.astate.edu/~honavar/
More informationFinite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin
Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of
More informationWhy Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one)
Why Bayesan? 3. Bayes and Normal Models Alex M. Martnez alex@ece.osu.edu Handouts Handoutsfor forece ECE874 874Sp Sp007 If all our research (n PR was to dsappear and you could only save one theory, whch
More informationINF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018
INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton
More informationENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition
EG 880/988 - Specal opcs n Computer Engneerng: Pattern Recognton Memoral Unversty of ewfoundland Pattern Recognton Lecture 7 May 3, 006 http://wwwengrmunca/~charlesr Offce Hours: uesdays hursdays 8:30-9:30
More informationCS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements
CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before
More informationStatistical pattern recognition
Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve
More informationSpace of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics
/7/7 CSE 73: Artfcal Intellgence Bayesan - Learnng Deter Fox Sldes adapted from Dan Weld, Jack Breese, Dan Klen, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer What s Beng Learned? Space
More informationMachine learning: Density estimation
CS 70 Foundatons of AI Lecture 3 Machne learnng: ensty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square ata: ensty estmaton {.. n} x a vector of attrbute values Objectve: estmate the model of
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationLogistic Classifier CISC 5800 Professor Daniel Leeds
lon 9/7/8 Logstc Classfer CISC 58 Professor Danel Leeds Classfcaton strategy: generatve vs. dscrmnatve Generatve, e.g., Bayes/Naïve Bayes: 5 5 Identfy probablty dstrbuton for each class Determne class
More informationClustering & Unsupervised Learning
Clusterng & Unsupervsed Learnng Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 2012 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationOther NN Models. Reinforcement learning (RL) Probabilistic neural networks
Other NN Models Renforcement learnng (RL) Probablstc neural networks Support vector machne (SVM) Renforcement learnng g( (RL) Basc deas: Supervsed dlearnng: (delta rule, BP) Samples (x, f(x)) to learn
More informationEvaluation of classifiers MLPs
Lecture Evaluaton of classfers MLPs Mlos Hausrecht mlos@cs.ptt.edu 539 Sennott Square Evaluaton For any data set e use to test the model e can buld a confuson matrx: Counts of examples th: class label
More informationPattern Classification
attern Classfcaton All materals n these sldes were taken from attern Classfcaton nd ed by R. O. Duda,. E. Hart and D. G. Stork, John Wley & Sons, 000 wth the ermsson of the authors and the ublsher Chater
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.
More informationComposite Hypotheses testing
Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter
More informationLearning from Data 1 Naive Bayes
Learnng from Data 1 Nave Bayes Davd Barber dbarber@anc.ed.ac.uk course page : http://anc.ed.ac.uk/ dbarber/lfd1/lfd1.html c Davd Barber 2001, 2002 1 Learnng from Data 1 : c Davd Barber 2001,2002 2 1 Why
More informationClustering & (Ken Kreutz-Delgado) UCSD
Clusterng & Unsupervsed Learnng Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y ), fnd an approxmatng
More informationThe big picture. Outline
The bg pcture Vncent Claveau IRISA - CNRS, sldes from E. Kjak INSA Rennes Notatons classes: C = {ω = 1,.., C} tranng set S of sze m, composed of m ponts (x, ω ) per class ω representaton space: R d (=
More informationCIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M
CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute
More informationMLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012
MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:
More informationMachine Learning. Classification. Theory of Classification and Nonparametric Classifier. Representing data: Hypothesis (classifier) Eric Xing
Machne Learnng 0-70/5 70/5-78, 78, Fall 008 Theory of Classfcaton and Nonarametrc Classfer Erc ng Lecture, Setember 0, 008 Readng: Cha.,5 CB and handouts Classfcaton Reresentng data: M K Hyothess classfer
More informationEvaluation for sets of classes
Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton
More informationEngineering Risk Benefit Analysis
Engneerng Rsk Beneft Analyss.55, 2.943, 3.577, 6.938, 0.86, 3.62, 6.862, 22.82, ESD.72, ESD.72 RPRA 2. Elements of Probablty Theory George E. Apostolaks Massachusetts Insttute of Technology Sprng 2007
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationLecture 10 Support Vector Machines II
Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed
More informationMAXIMUM A POSTERIORI TRANSDUCTION
MAXIMUM A POSTERIORI TRANSDUCTION LI-WEI WANG, JU-FU FENG School of Mathematcal Scences, Peng Unversty, Bejng, 0087, Chna Center for Informaton Scences, Peng Unversty, Bejng, 0087, Chna E-MIAL: {wanglw,
More informationThe conjugate prior to a Bernoulli is. A) Bernoulli B) Gaussian C) Beta D) none of the above
The conjugate pror to a Bernoull s A) Bernoull B) Gaussan C) Beta D) none of the above The conjugate pror to a Gaussan s A) Bernoull B) Gaussan C) Beta D) none of the above MAP estmates A) argmax θ p(θ
More informationLecture 20: Hypothesis testing
Lecture : Hpothess testng Much of statstcs nvolves hpothess testng compare a new nterestng hpothess, H (the Alternatve hpothess to the borng, old, well-known case, H (the Null Hpothess or, decde whether
More informationSupport Vector Machines
Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationxp(x µ) = 0 p(x = 0 µ) + 1 p(x = 1 µ) = µ
CSE 455/555 Sprng 2013 Homework 7: Parametrc Technques Jason J. Corso Computer Scence and Engneerng SUY at Buffalo jcorso@buffalo.edu Solutons by Yngbo Zhou Ths assgnment does not need to be submtted and
More informationProbabilistic Classification: Bayes Classifiers. Lecture 6:
Probablstc Classfcaton: Bayes Classfers Lecture : Classfcaton Models Sam Rowes January, Generatve model: p(x, y) = p(y)p(x y). p(y) are called class prors. p(x y) are called class condtonal feature dstrbutons.
More informationINTRODUCTION TO MACHINE LEARNING 3RD EDITION
ETHEM ALPAYDIN The MIT Press, 2014 Lecture Sldes for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydn@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/2ml3e CHAPTER 3: BAYESIAN DECISION THEORY Probablty
More informationStanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011
Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected
More informationArtificial Intelligence Bayesian Networks
Artfcal Intellgence Bayesan Networks Adapted from sldes by Tm Fnn and Mare desjardns. Some materal borrowed from Lse Getoor. 1 Outlne Bayesan networks Network structure Condtonal probablty tables Condtonal
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 89 Fall 206 Introducton to Machne Learnng Fnal Do not open the exam before you are nstructed to do so The exam s closed book, closed notes except your one-page cheat sheet Usage of electronc devces
More informationLinear Classification, SVMs and Nearest Neighbors
1 CSE 473 Lecture 25 (Chapter 18) Lnear Classfcaton, SVMs and Nearest Neghbors CSE AI faculty + Chrs Bshop, Dan Klen, Stuart Russell, Andrew Moore Motvaton: Face Detecton How do we buld a classfer to dstngush
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationLinear Feature Engineering 11
Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19
More information8/25/17. Data Modeling. Data Modeling. Data Modeling. Patrice Koehl Department of Biological Sciences National University of Singapore
8/5/17 Data Modelng Patrce Koehl Department of Bologcal Scences atonal Unversty of Sngapore http://www.cs.ucdavs.edu/~koehl/teachng/bl59 koehl@cs.ucdavs.edu Data Modelng Ø Data Modelng: least squares Ø
More informationSDMML HT MSc Problem Sheet 4
SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be
More informationCS 3710: Visual Recognition Classification and Detection. Adriana Kovashka Department of Computer Science January 13, 2015
CS 3710: Vsual Recognton Classfcaton and Detecton Adrana Kovashka Department of Computer Scence January 13, 2015 Plan for Today Vsual recognton bascs part 2: Classfcaton and detecton Adrana s research
More informationEM and Structure Learning
EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder
More informationClassification. Representing data: Hypothesis (classifier) Lecture 2, September 14, Reading: Eric CMU,
Machne Learnng 10-701/15-781, 781, Fall 2011 Nonparametrc methods Erc Xng Lecture 2, September 14, 2011 Readng: 1 Classfcaton Representng data: Hypothess (classfer) 2 1 Clusterng 3 Supervsed vs. Unsupervsed
More informationOutline. Multivariate Parametric Methods. Multivariate Data. Basic Multivariate Statistics. Steven J Zeil
Outlne Multvarate Parametrc Methods Steven J Zel Old Domnon Unv. Fall 2010 1 Multvarate Data 2 Multvarate ormal Dstrbuton 3 Multvarate Classfcaton Dscrmnants Tunng Complexty Dscrete Features 4 Multvarate
More informationDiscriminative classifier: Logistic Regression. CS534-Machine Learning
Dscrmnatve classfer: Logstc Regresson CS534-Machne Learnng 2 Logstc Regresson Gven tranng set D stc regresson learns the condtonal dstrbuton We ll assume onl to classes and a parametrc form for here s
More informationSupport Vector Machines. Vibhav Gogate The University of Texas at dallas
Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationLecture Notes on Linear Regression
Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume
More informationCS47300: Web Information Search and Management
CS47300: Web Informaton Search and Management Probablstc Retreval Models Prof. Chrs Clfton 7 September 2018 Materal adapted from course created by Dr. Luo S, now leadng Albaba research group 14 Why probabltes
More informationSupport Vector Machines
/14/018 Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x
More informationCHAPTER 3: BAYESIAN DECISION THEORY
HATER 3: BAYESIAN DEISION THEORY Decson mang under uncertanty 3 Data comes from a process that s completely not nown The lac of nowledge can be compensated by modelng t as a random process May be the underlyng
More informationj) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1
Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons
More informationMarkov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement
Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs
More informationLecture 2: Prelude to the big shrink
Lecture 2: Prelude to the bg shrnk Last tme A slght detour wth vsualzaton tools (hey, t was the frst day... why not start out wth somethng pretty to look at?) Then, we consdered a smple 120a-style regresson
More informationBayesian decision theory. Nuno Vasconcelos ECE Department, UCSD
Bayesan decson theory Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world observatons decson functon L[,y] loss of predctn y wth the epected value of the
More informationProbabilistic & Unsupervised Learning. Introduction and Foundations
Probablstc & Unsupervsed Learnng Introducton and Foundatons Maneesh Sahan maneesh@gatsby.ucl.ac.uk Gatsby Computatonal Neuroscence Unt, and MSc ML/CSML, Dept Computer Scence Unversty College London Term
More informationHidden Markov Models
CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte
More informationStatistical machine learning and its application to neonatal seizure detection
19/Oct/2009 Statstcal machne learnng and ts applcaton to neonatal sezure detecton Presented by Andry Temko Department of Electrcal and Electronc Engneerng Page 2 of 42 A. Temko, Statstcal Machne Learnng
More informationBayesian decision theory. Nuno Vasconcelos ECE Department, UCSD
Bayesan decson theory Nuno Vasconcelos ECE Department UCSD Notaton the notaton n DHS s qute sloppy e.. show that error error z z dz really not clear what ths means we wll use the follown notaton subscrpts
More informationClassification Bayesian Classifiers
lassfcaton Bayesan lassfers Jeff Howbert Introducton to Machne Learnng Wnter 2014 1 Bayesan classfcaton A robablstc framework for solvng classfcaton roblems. Used where class assgnment s not determnstc,.e.
More informationC4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )
C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationStat 543 Exam 2 Spring 2016
Stat 543 Exam 2 Sprng 2016 I have nether gven nor receved unauthorzed assstance on ths exam. Name Sgned Date Name Prnted Ths Exam conssts of 11 questons. Do at least 10 of the 11 parts of the man exam.
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours
UNIVERSITY OF TORONTO Faculty of Arts and Scence December 005 Examnatons STA47HF/STA005HF Duraton - hours AIDS ALLOWED: (to be suppled by the student) Non-programmable calculator One handwrtten 8.5'' x
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationStatistical analysis using matlab. HY 439 Presented by: George Fortetsanakis
Statstcal analyss usng matlab HY 439 Presented by: George Fortetsanaks Roadmap Probablty dstrbutons Statstcal estmaton Fttng data to probablty dstrbutons Contnuous dstrbutons Contnuous random varable X
More informationLimited Dependent Variables
Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages
More informationStat 543 Exam 2 Spring 2016
Stat 543 Exam 2 Sprng 206 I have nether gven nor receved unauthorzed assstance on ths exam. Name Sgned Date Name Prnted Ths Exam conssts of questons. Do at least 0 of the parts of the man exam. I wll score
More informationFeature Selection: Part 1
CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?
More informationLinear discriminants. Nuno Vasconcelos ECE Department, UCSD
Lnear dscrmnants Nuno Vasconcelos ECE Department UCSD Classfcaton a classfcaton problem as to tpes of varables e.g. X - vector of observatons features n te orld Y - state class of te orld X R 2 fever blood
More informationConjugacy and the Exponential Family
CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the
More informationPattern Classification
Pattern Classfcaton All materals n these sldes ere taken from Pattern Classfcaton (nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wley & Sons, 000 th the permsson of the authors and the publsher
More information3.1 ML and Empirical Distribution
67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum
More informationRelevance Vector Machines Explained
October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes
More informationBayesian Learning. Smart Home Health Analytics Spring Nirmalya Roy Department of Information Systems University of Maryland Baltimore County
Smart Home Health Analytcs Sprng 2018 Bayesan Learnng Nrmalya Roy Department of Informaton Systems Unversty of Maryland Baltmore ounty www.umbc.edu Bayesan Learnng ombnes pror knowledge wth evdence to
More informationStatistical Foundations of Pattern Recognition
Statstcal Foundatons of Pattern Recognton Learnng Objectves Bayes Theorem Decson-mang Confdence factors Dscrmnants The connecton to neural nets Statstcal Foundatons of Pattern Recognton NDE measurement
More informationProbability review. Adopted from notes of Andrew W. Moore and Eric Xing from CMU. Copyright Andrew W. Moore Slide 1
robablty reew dopted from notes of ndrew W. Moore and Erc Xng from CMU Copyrght ndrew W. Moore Slde So far our classfers are determnstc! For a gen X, the classfers we learned so far ge a sngle predcted
More informationIntroduction to Hidden Markov Models
Introducton to Hdden Markov Models Alperen Degrmenc Ths document contans dervatons and algorthms for mplementng Hdden Markov Models. The content presented here s a collecton of my notes and personal nsghts
More informationUVA$CS$6316$$ $Fall$2015$Graduate:$$ Machine$Learning$$ $ $Lecture$15:$LogisAc$Regression$/$ GeneraAve$vs.$DiscriminaAve$$
Dr.YanjunQ/UVACS6316/f15 UVACS6316 Fall2015Graduate: MachneLearnng Lecture15:LogsAcRegresson/ GeneraAvevs.DscrmnaAve 10/21/15 Dr.YanjunQ UnverstyofVrgna Departmentof ComputerScence 1 Wherearewe?! FvemajorsecHonsofthscourse
More informationLossy Compression. Compromise accuracy of reconstruction for increased compression.
Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost
More informationProbability Theory (revisited)
Probablty Theory (revsted) Summary Probablty v.s. plausblty Random varables Smulaton of Random Experments Challenge The alarm of a shop rang. Soon afterwards, a man was seen runnng n the street, persecuted
More informationNeural networks. Nuno Vasconcelos ECE Department, UCSD
Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X
More informationMaximum Likelihood Estimation
Maxmum Lkelhood Estmaton INFO-2301: Quanttatve Reasonng 2 Mchael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quanttatve Reasonng 2 Paul and Boyd-Graber Maxmum Lkelhood Estmaton 1 of 9 Why MLE?
More informationA Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach
A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland
More informationNonlinear Classifiers II
Nonlnear Classfers II Nonlnear Classfers: Introducton Classfers Supervsed Classfers Lnear Classfers Perceptron Least Squares Methods Lnear Support Vector Machne Nonlnear Classfers Part I: Mult Layer Neural
More information