ML4NLP Introduction to Classification


1 ML4NLP Introduction to Classification CS 590NLP Dan Goldwasser Purdue University

2 Statistical Language Modeling Intuition: by looking at large quantities of text we can find statistical regularities. Distinguish between correct and incorrect sentences. Language models define a probability distribution over strings (e.g., sentences) in a language. We can use a language model to score and rank sentences: I don't know {whether, weather} to laugh or cry. P( I don't .. whether to laugh .. ) > P( I don't .. weather to laugh .. )

3 Language Modeling with N-grams
Unigram model: P(w_1) P(w_2) ... P(w_n)
Bigram model: P(w_1) P(w_2 | w_1) ... P(w_n | w_{n-1})
Trigram model: P(w_1) P(w_2 | w_1) ... P(w_n | w_{n-2}, w_{n-1})
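
To make the n-gram factorization concrete, here is a minimal bigram scoring sketch in Python (not from the lecture; the toy corpus and naive tokenization are my additions). It estimates P(w_i | w_{i-1}) by counting and uses the bigram product above to score the whether/weather pair:

```python
from collections import Counter

# A minimal bigram language model over a made-up toy corpus.
corpus = ("i don t know whether to laugh or cry . "
          "i know the weather is bad .").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    # MLE estimate: P(word | prev) = count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def score(sentence):
    # P(w_1) P(w_2 | w_1) ... P(w_n | w_{n-1})
    words = sentence.split()
    p = unigrams[words[0]] / len(corpus)
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p

print(score("know whether to laugh"))  # > 0: "know whether" occurs in the corpus
print(score("know weather to laugh"))  # 0.0: "know weather" never occurs
```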

4 Evaluating Language Models Assuming that we have a language model, how can we tell if it is good? Option 1: try to generate Shakespeare.. This is known as qualitative evaluation. Option 2: Quantitative evaluation. Option 2.1: See how well you do on spelling correction. This is known as extrinsic evaluation. Option 2.2: Find an independent measure for LM quality. This is known as intrinsic evaluation.

5 When are LMs applicable? Finding regularity in language is surprisingly useful! Easy example: weather/whether. But also: Translation (can you produce legal French from source English?); Caption generation (combine output of visual sensors into a grammatical sentence). Deep Visual-Semantic Alignments for Generating Image Descriptions

6 Classification A fundamental machine learning tool, widely applicable in NLP. Supervised learning: Learner is given a collection of labeled documents. Emails: spam/not spam; Reviews: pos/neg. Build a function mapping documents to labels. Key property: Generalization; the function should work well on new data.

7 Sentiment Analysis Dude, I just watched this horror flick! Selling points: nightmare scenes, torture scenes, terrible monsters, that was so bad a##! Don't buy the popcorn, it was terrible; the monsters selling it must have wanted to torture me, it was so bad it gave me nightmares! What should your learning algorithm look at?

8 Deceptive Reviews What should your learning algorithm look at? Finding Deceptive Opinion Spam by Any Stretch of the Imagination. Ott et al. ACL 2011

9 Power Relations Blah blah Unacceptable blah. Your honor, I agree blah blah blah. What should your learning algorithm look at? Echoes of Power: Language Effects and Power Differences in Social Interaction. Danescu-Niculescu-Mizil et al. WWW 2012.

10 Power Relations Communicative behaviors are patterned and coordinated, like a dance [Niederhoffer and Pennebaker 2002]. Echoes of Power: Language Effects and Power Differences in Social Interaction. Danescu-Niculescu-Mizil et al. WWW 2012.

11 Classification We assume we have a labeled dataset. How can we build a classifier? Decide on a representation and a learning algorithm. Essentially: function approximation. Representation: what is the domain of the function? Learning: how to find a good approximation? We will look into several simple examples: Naïve Bayes, Perceptron. Let's start with some definitions..

12 Basic Definitions Given: D, a set of labeled examples {<x_i, y_i>}. Goal: Learn a function f(x) s.t. f(x_i) ≈ y_i. Note: y can be binary or categorical. Typically the input x is represented as a vector of features. Break D into three parts: Training set (used by the learning algorithm), Test set (evaluate the learned model), Development set (tuning the learning algorithm). Evaluation: performance measure over the test set. Accuracy: proportion of correct predictions (test data).
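
As an illustration of these definitions, a minimal sketch of the train/dev/test protocol and the accuracy measure (hypothetical data and a trivial majority-class baseline; not from the slides):

```python
import random

# Hypothetical labeled data: (document, label) pairs.
data = [("doc%d" % i, i % 2) for i in range(100)]
random.seed(0)
random.shuffle(data)

train = data[:70]    # used by the learning algorithm
dev = data[70:85]    # used for tuning the learning algorithm
test = data[85:]     # used only for the final evaluation

def accuracy(predict, examples):
    # Accuracy: proportion of correct predictions.
    return sum(predict(x) == y for x, y in examples) / len(examples)

# Trivial majority-class baseline, "trained" (i.e., counted) on the training set:
train_labels = [y for _, y in train]
majority = max(set(train_labels), key=train_labels.count)
print(accuracy(lambda x: majority, test))
```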

13 Precision and Recall Given a dataset, we train a classifier that gets 99% accuracy. Did we do a good job? Build a classifier for brain tumors: 99.9% of brain scans do not show signs of tumor. Did we do a good job? By simply saying NO to all examples we reduce the error by a factor of 10! Clearly accuracy is not the best way to evaluate the learning system when the data is heavily skewed! Intuition: we need a measure that captures the (rare) class we care about!

14 Precision and Recall The learner can make two kinds of mistakes: False Positive and False Negative.

                 Predicted: 1      Predicted: 0
True label: 1    True Positive     False Negative
True label: 0    False Positive    True Negative

Precision: when we predicted the rare class, how often are we right? Precision = True Pos / (True Pos + False Pos) = True Pos / Predicted Pos.
Recall: out of all the instances of the rare class, how many did we catch? Recall = True Pos / (True Pos + False Neg) = True Pos / Actual Pos.
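
A small sketch computing precision and recall from the confusion counts defined above (the gold/pred label sequences are made up for illustration):

```python
# 1 marks the rare class we care about.
gold = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]

tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))  # true positives
fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))  # false positives
fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))  # false negatives

precision = tp / (tp + fp)  # of our positive predictions, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many did we catch
print(precision, recall)    # 2/3 and 2/3 here
```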

15 F-Score Precision and Recall give us two reference points to compare learning performance. Which algorithm is better? We need a single score. Option 1: Average = (P + R) / 2. Option 2: F-Score = 2PR / (P + R). Properties of F-score: ranges between 0-1; prefers precision and recall with similar values.
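
A quick numeric illustration of why the F-score is preferred over the plain average: all three (P, R) pairs below average to 0.50, but F drops sharply as the two values diverge.

```python
def f_score(p, r):
    # Harmonic-style combination: F = 2PR / (P + R)
    return 2 * p * r / (p + r) if p + r else 0.0

for p, r in [(0.5, 0.5), (0.9, 0.1), (0.99, 0.01)]:
    avg = (p + r) / 2
    print("P=%.2f R=%.2f  average=%.2f  F=%.2f" % (p, r, avg, f_score(p, r)))
# P=0.50 R=0.50  average=0.50  F=0.50
# P=0.90 R=0.10  average=0.50  F=0.18
# P=0.99 R=0.01  average=0.50  F=0.02
```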

16 Simple Example: Naïve Bayes Naïve Bayes: simple probabilistic classifier. Given a set of labeled data: documents D, each associated with a label v. Simple feature representation: BoW. Learning: construct a probability distribution P(v | d). Prediction: assign the label with the highest probability. Relies on strong simplifying assumptions.

17 Simple Representation: BoW Basic idea: (sentiment analysis) I loved this movie, it is awesome! I couldn't stop laughing for two hours! Mapping input to label can be done by representing the frequencies of individual words: document → word counts. Simple, yet surprisingly powerful representation!
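
A minimal BoW sketch (naive whitespace tokenization, purely illustrative):

```python
from collections import Counter

def bow(document):
    # Map a document to its word counts, discarding word order.
    return Counter(document.lower().split())

print(bow("I loved this movie it is awesome! I couldn't stop laughing!"))
# Counter({'i': 2, 'loved': 1, 'this': 1, 'movie': 1, ...})
```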

18 Bayes Rule Naïve Bayes is a simple probabilistic classification method, based on Bayes rule: P(v | d) = P(d | v) P(v) / P(d)

19 Bascs of Naïve Bayes P(v) - the pror probablty of a label v Reflects background knowledge; before data s observed. If no nformaton - unform dstrbuton. P(D) - The probablty that ths sample of the Data s observed. (No knowledge of the label) P(D v): The probablty of observng the sample D, gven that the label v s the target (Lkelhood) P(v D): The posteror probablty of v. The probablty that v s the target, gven that D has been observed. 9

20 Bayes Rule Naïve Bayes is a simple classification method, based on Bayes rule: P(v | d) = P(d | v) P(v) / P(d). Check your intuition: P(v | d) increases with P(v) and with P(d | v); P(v | d) decreases with P(d).
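
A toy numeric check of this intuition, with made-up spam-filtering numbers:

```python
# Bayes rule P(v|d) = P(d|v) P(v) / P(d); v = "spam", d = "contains 'free'".
p_v = 0.2             # prior: P(spam)
p_d_given_v = 0.6     # likelihood: P('free' | spam)
p_d_given_not = 0.05  # P('free' | not spam)

# P(d) by total probability over the two labels:
p_d = p_d_given_v * p_v + p_d_given_not * (1 - p_v)

posterior = p_d_given_v * p_v / p_d
print(posterior)  # 0.75: raising P(v) or P(d|v) raises it; raising P(d) lowers it
```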

21 Naïve Bayes P(v D) P(D v) P(v)/P(D) The learner consders a set of canddate labels, and attempts to fnd the most probable one v V, gven the observed data. Such maxmally probable assgnment s called maxmum a posteror assgnment (MAP); Bayes theorem s used to compute t: v MAP argmax v V P(v D) argmax v v P(D v) P(v)/P(D) argmax v V P(D v) P(v) Snce P(D) s the same for all v V

22 Naïve Bayes How can we compute P(v D)? Basc dea: represent document as a set of features, such as BoW features v MAP argmax v VP(v x) argmax v V P(v x,x 2,...,x n ) P(x v MAP argmax,x 2,...,x n v )P(v ) v V P(x,x 2,...,x n ) argmax v VP(x,x 2,...,x n v )P(v )

23 NB: Parameter Estimation v_MAP = argmax_v P(x_1, x_2, ..., x_n | v) P(v). Given training data we can estimate the two terms. Estimating P(v) is easy: for each value v, count how many times it appears in the training data. Question: assume binary x_i's. How many parameters does the model require? However, it is not feasible to estimate P(x_1, ..., x_n | v): in this case we have to estimate, for each target value, the probability of each instance (most of which will not occur). In order to use a Bayesian classifier in practice, we need to make assumptions that will allow us to estimate these quantities.

24 NB: Independence Assumption Bag of words representation: word position can be ignored. Conditional independence: assume feature probabilities are independent given the label, P(x_i | x_1, ..., x_{i-1}; v) = P(x_i | v). Both assumptions are not true, but they help simplify the model, and simple models work well.

25 Naive Bayes v_MAP = argmax_v P(x_1, x_2, ..., x_n | v) P(v)
P(x_1, x_2, ..., x_n | v) = P(x_1 | x_2, ..., x_n, v) P(x_2, ..., x_n | v)
= P(x_1 | x_2, ..., x_n, v) P(x_2 | x_3, ..., x_n, v) P(x_3, ..., x_n | v)
= ...
= P(x_1 | x_2, ..., x_n, v) P(x_2 | x_3, ..., x_n, v) P(x_3 | x_4, ..., x_n, v) ... P(x_n | v)
Assumption: feature values are independent given the target value: P(x_1, ..., x_n | v) = ∏_{i=1}^n P(x_i | v)

26 Estimating Probabilities (MLE) Assume a document classification problem, using word features:
v_NB = argmax_{v ∈ {like, dislike}} P(v) ∏_i P(word_i | v)
How do we estimate P(word_k | v)? MLE: P(word_k | v) = n_k / n, where n_k = #(occurrences of word_k in training documents labeled v) and n = #(word occurrences in documents labeled v).
Sparsity of data is a problem: if n is small, the estimate is not accurate; if n_k is 0, it will dominate the estimate: we will never predict v if a word that never appeared in training (with v) appears in the test data.
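
A sketch of the MLE estimate and the zero-count problem it creates (toy documents, illustrative only):

```python
from collections import Counter

# Training documents labeled "like" (made up for illustration).
like_docs = ["great movie loved it", "loved the acting great fun"]
tokens = " ".join(like_docs).split()
counts = Counter(tokens)
n = len(tokens)

def p_mle(word):
    # P(word | like) = n_k / n
    return counts[word] / n

print(p_mle("loved"))    # 2/9: seen, a reasonable estimate
print(p_mle("awesome"))  # 0.0: never seen with this label, so any test document
                         # containing "awesome" gets probability 0 for "like"
```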

27 Robust Estimation of Probabilities v_NB = argmax_{v ∈ {like, dislike}} P(v) ∏_i P(x_i | v)
This process is called smoothing. There are many ways to do it, some better justified than others; an empirical issue. Here:
P(x_k | v) = (n_k + m p) / (n + m)
n_k is #(occurrences of the word in the presence of v), n is #(occurrences of the label v), p is a prior estimate of P(x_k | v) (e.g., uniform), m is the equivalent sample size (# of labels). Laplace rule: for the Boolean case, p = 1/2, m = 2:
P(x_k | v) = (n_k + 1) / (n + 2)
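
The same toy estimate with smoothing applied; the general (n_k + mp)/(n + m) form reduces to the Laplace rule for p = 1/2, m = 2:

```python
from collections import Counter

tokens = "great movie loved it loved the acting great fun".split()
counts = Counter(tokens)
n = len(tokens)

def p_smoothed(word, m=2, p=0.5):
    # m: equivalent sample size; p: prior estimate. Defaults give the Laplace rule.
    return (counts[word] + m * p) / (n + m)

print(p_smoothed("loved"))    # (2 + 1) / (9 + 2)
print(p_smoothed("awesome"))  # (0 + 1) / (9 + 2): unseen words no longer zero out
```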

28 Naïve Bayes Very easy to mplement Converges very quckly Learnng s ust countng Performs well n practce Appled to many document classfcaton tasks If data set s small, NB can perform better than sophstcated algorthms Strong ndependence assumptons If assumptons hold: NB s the optmal classfer Even f not, can perform well Next: from NB to learnng lnear threshold functons

29 Naïve Bayes: Two Classes Notce that the naïve Bayes method gves a method for predctng rather than an explct classfer In the case of two classes, v {0,} we predct that v ff: P(v P(v ) 0) n P(x n P(x v v ) 0) > 29

30 Naïve Bayes: Two Classes Notce that the naïve Bayes method gves a method for predctng rather than an explct classfer. In the case of two classes, v {0,} we predct that v ff: P(v P(v ) 0) n P(x n P(x v v ) 0) > Denote: p P(x v ), q P(x v 0) P(v ) P(v 0) n n p x (- p ) -x q x (- q ) -x > 30

31 Naïve Bayes: Two Classes In the case of two classes, v {0,} we predct that v ff: P(v P(v ) 0) n n p q x x (- q (- p (- p ) ) -x -x P(v P(v ) 0) n n (- q p )( - p q )( - q ) ) x x > 3

32 Naïve Bayes: Two Classes In the case of two classes, v {0,} we predct that v ff: P(v P(v ) 0) n n p q x x (- q (- p (- p ) ) -x -x P(v P(v ) 0) n n (- q p )( - p q )( - q ) ) x x > Take logarthm; we predct v ff : log P(v P(v ) 0) + log - p - q + p (log - p log q - q )x > 0 32

33 Naïve Bayes: Two Classes In the case of two classes, v {0,} we predct that v ff: P(v P(v ) 0) n n p q x x (- q (- p (- p ) ) -x -x P(v P(v ) 0) n n (- q p )( - p q )( - q ) ) x x > Take logarthm; we predct v ff : log P(v P(v ) 0) + log - p - q + p (log - p log q - q )x > 0 We get that nave Bayes s a lnear separator wth : w log p log q log p - q - p - q q - p f p q then w 0 and the feature s rrelevant Introducton to Machne Learnng. Fall

34 Linear Classifiers Linear threshold functions: associate a weight (w_i) with each feature (x_i). Prediction: sign(b + w^T x) = sign(b + Σ_i w_i x_i). If b + w^T x ≥ 0, predict y = 1; otherwise, predict y = -1. NB is a linear threshold function: its weight vector (w) is assigned by computing conditional probabilities. In fact, linear threshold functions are a very popular representation!

35 Linear Classifiers sign(b + w^T x). Each point in this space is a document; the coordinates (e.g., x_1, x_2) are determined by feature activations.

36 Expressivity Linear functions are quite expressive: there exists a linear function that is consistent with the data. A famous negative example (XOR):

37 Expressivity By transforming the feature space these functions can be made linear. Represent each point in 2D as (x, x²).

38 Expressivity sign(b + w^T x). More realistic scenario: the data is almost linearly separable, except for some noise.

39 Features So far we have discussed the BoW representation. In fact, you can use a very rich representation. Broader definition: functions mapping attributes of the input to a Boolean/categorical/numeric value.
φ_i(x) = 1 if x is capitalized, 0 otherwise
φ_k(x) = 1 if x contains ''good'' more than twice, 0 otherwise
Question: assume that you have a lexicon containing positive and negative sentiment words. How can you use it to improve over BoW?
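
A sketch of such feature functions, including one possible answer to the lexicon question: count matches against (hypothetical) positive and negative word lists instead of keeping every word as its own feature.

```python
# Hypothetical sentiment lexicons (assumed word lists, not from the lecture).
POSITIVE = {"good", "great", "awesome", "loved"}
NEGATIVE = {"bad", "terrible", "boring", "awful"}

def features(doc):
    words = doc.lower().split()
    return {
        "is_capitalized": int(doc[:1].isupper()),
        "good_more_than_twice": int(words.count("good") > 2),
        # Lexicon counts collapse many rare words into two robust features:
        "num_positive_words": sum(w in POSITIVE for w in words),
        "num_negative_words": sum(w in NEGATIVE for w in words),
    }

print(features("Good acting but a terrible boring plot"))
# {'is_capitalized': 1, 'good_more_than_twice': 0,
#  'num_positive_words': 1, 'num_negative_words': 2}
```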

40 Perceptron One of the earliest learning algorithms. Introduced by Rosenblatt in 1958 to model neural learning. Goal: directly search for a separating hyperplane. If one exists, perceptron will find it; if not, it will never converge. Online algorithm: considers one example at a time (NB looks at the entire data set). Error driven algorithm: updates the weights only when a mistake is made.

41 Perceptron Intuition

42 Perceptron We learn f: X → {-1, +1} represented as f = sgn(w·x), where X = {0,1}^n or X = R^n and w ∈ R^n. Given labeled examples: {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}:
1. Initialize w = 0 ∈ R^n
2. Cycle through all examples:
a. Predict the label of instance x to be y' = sgn(w·x)
b. If y' ≠ y, update the weight vector: w = w + r y x (r - a constant, learning rate). Otherwise, if y' = y, leave the weights unchanged.
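
A direct transcription of this algorithm into Python (a sketch; the AND-style toy data and the fixed epoch count are my additions):

```python
def perceptron(examples, rate=1.0, epochs=10):
    n = len(examples[0][0])
    w = [0.0] * n                       # 1. initialize w = 0
    for _ in range(epochs):             # 2. cycle through all examples
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
            if pred != y:               # update only on a mistake
                w = [wi + rate * y * xi for wi, xi in zip(w, x)]
    return w

# AND-like separable data over {0,1}^3; the last coordinate is a constant
# bias feature, so a separator through the origin suffices.
data = [((0, 0, 1), -1), ((0, 1, 1), -1), ((1, 0, 1), -1), ((1, 1, 1), 1)]
w = perceptron(data)
print(w)  # a weight vector that classifies all four examples correctly
```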

43 Margin The margin of a hyperplane for a dataset is the distance between the hyperplane and the data point nearest to it.

44 Margin The margin of a hyperplane for a dataset is the distance between the hyperplane and the data point nearest to it. The margin of a data set (γ) is the maximum margin possible for that dataset using any weight vector.

45 Mistake Bound for Perceptron Let D = {(x_i, y_i)} be a labeled dataset that is separable. Let ||x_i|| ≤ R for all examples. Let γ be the margin of the dataset D. Then, the perceptron algorithm will make at most R² / γ² mistakes on the data.
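
An empirical sketch of the bound on synthetic separable data. Since the margin measured against any particular unit-norm separator u lower-bounds the true margin γ, R²/margin_u² is a valid (looser) upper bound on the mistake count:

```python
import random

random.seed(1)
# Synthetic separable data: the label is the sign of the first coordinate,
# and points within 0.2 of the boundary are rejected to enforce a margin.
data = []
while len(data) < 200:
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    if abs(x[0]) >= 0.2:
        data.append((x, 1 if x[0] > 0 else -1))

R = max((x[0] ** 2 + x[1] ** 2) ** 0.5 for x, _ in data)
gamma_u = min(y * x[0] for x, y in data)   # margin w.r.t. u = (1, 0); <= true gamma

mistakes, w = 0, [0.0, 0.0]
for _ in range(50):                        # enough passes to converge here
    for x, y in data:
        if y * (w[0] * x[0] + w[1] * x[1]) <= 0:   # mistake (or on the boundary)
            w = [w[0] + y * x[0], w[1] + y * x[1]]
            mistakes += 1

print(mistakes, "<=", (R / gamma_u) ** 2)  # the bound holds (usually loosely)
```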

46 Practical Example Task: context sensitive spelling: {principle, principal}, {weather, whether}. Source: Scaling to Very Very Large Corpora for Natural Language Disambiguation. Michele Banko, Eric Brill. Microsoft Research, Redmond, WA.

47 Deceptive Reviews What should your learning algorithm look at? Finding Deceptive Opinion Spam by Any Stretch of the Imagination. Ott et al. ACL 2011

48 Deception Classification

49 Summary Classification is a basic tool for NLP. E.g., what is the topic of a document? Classifier: mapping from input to label. Label: binary or categorical. We saw two simple learning algorithms for finding the parameters of linear classification functions: Naïve Bayes and Perceptron. Next: more sophisticated algorithms, and applications (or how to get it to work!).

50 Questions?
