Learning Undirected Models. Instructor: Su-In Lee, University of Washington, Seattle. Mean Field Approximation


Readings: K&F 20.3, 20.4, 20.6, 20.7
Learning Undirected Models
Lecture 18, June 2011. CSE 515, Statistical Methods, Spring 2011.
Instructor: Su-In Lee, University of Washington, Seattle

Mean Field Approximation
- Is the energy functional convex in the parameters of Q?
  - The entropy term (−x ln x) is concave in x
  - But the expectation term contains products of marginals (terms of the form x·y), which are not jointly concave or convex in (x, y)
  - So the energy functional is not concave in the parameters of Q, and mean field optimization can converge to local optima
- The energy functional is easy to compute, even for networks where inference is complex
- The energy functional for a fully factored distribution Q can be rewritten simply as a sum of expectations, each one over a small set of variables:
  F[P̃_F, Q] = Σ_{φ ∈ F} E_Q[ln φ] + H_Q(U)
  E_Q[ln φ] = Σ_{u_φ} Q(u_φ) ln φ(u_φ) = Σ_{u_φ} ( Π_{X_i ∈ Scope[φ]} Q(x_i) ) ln φ(u_φ)
  H_Q(U) = Σ_i H_Q(X_i)
- The complexity of this expression depends on the size of the factors in P_F, and not on the topology of the network
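
To make the last point concrete, the fully factored energy functional can be evaluated by looping over each factor's scope and adding the marginal entropies; no inference in the original network is needed. The following is a minimal sketch (not from the lecture), with made-up factor values and mean field marginals for a toy three-variable pairwise network:

```python
import numpy as np

# Toy pairwise Markov network over binary variables X0, X1, X2 with factors
# phi_01(X0, X1) and phi_12(X1, X2). Values are arbitrary, for illustration only.
factors = {
    (0, 1): np.array([[1.0, 0.5], [0.5, 2.0]]),
    (1, 2): np.array([[1.5, 1.0], [0.2, 1.0]]),
}

# Fully factored (mean field) distribution Q(X) = prod_i Q_i(X_i).
Q = {0: np.array([0.6, 0.4]), 1: np.array([0.3, 0.7]), 2: np.array([0.5, 0.5])}

def energy_functional(factors, Q):
    """F[P_F, Q] = sum_phi E_Q[ln phi] + sum_i H_Q(X_i) for a fully factored Q."""
    value = 0.0
    # Each expectation only touches the variables in the factor's scope.
    for (i, j), phi in factors.items():
        for xi in range(2):
            for xj in range(2):
                value += Q[i][xi] * Q[j][xj] * np.log(phi[xi, xj])
    # The entropy of a fully factored Q is the sum of the marginal entropies.
    for qi in Q.values():
        value -= np.sum(qi * np.log(qi))
    return value

print(energy_functional(factors, Q))
```

Each expectation touches only the variables in its own factor, which is why the cost scales with factor size rather than with the topology of the network.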

Learning Undirected Graphs
- The likelihood function
- Learning parameters
  - Collective classification with HMM, MEMM, CRF
  - Generative vs. discriminative models
  - Directed vs. undirected models
  - Learning with incomplete data
- Learning with priors
  - Maximum a posteriori (MAP) estimation
- Learning with alternative objectives
  - Pseudo-likelihood objective
  - Max-margin learning
- Structure learning

Collective Classification
- Taking a set of interrelated instances and jointly labeling them
- Sequential labeling: labeling instances organized in a sequence
- Example: handwriting recognition
  - A sequence of observations (features): images of the handwritten letters b, r, a, c, e
  - Label them with some joint label, using local information and exploiting correlations between neighboring letters
- Model-based approach
  - Training data: fully labeled (both Y and X are observed)
  - Test data: only X is observed

Collective Classification
- Trade-offs between different models
  - Hidden Markov Model (HMM)
  - Maximum Entropy Markov Model (MEMM)
  - Conditional Random Field (CRF)
- [Figure: the HMM, MEMM, and CRF graphical structures over Y and X]
- Y: joint label; X: a sequence of observations (features)

Hidden Markov Model
- For each classification task i:
  - A single (hidden) state variable Y_i (e.g., the label)
  - A single (observed) observation variable X_i (e.g., the image)
- Observation probability P(X_i | Y_i), for example P(X_i = image of 'b' | Y_i = 'b')
- Transition probability P(Y_i | Y_{i−1}): statistical dependencies between the neighboring Y_i's
- [Figure: the unrolled network Y_0 → Y_1 → … with each X_i attached to its Y_i]

Hidden Markov Model
- For each classification task i:
  - A single (hidden) state variable Y_i
  - A single (observed) observation variable X_i
- Observation probability P(X_i | Y_i)
- Transition probability P(Y_i | Y_{i−1}), assumed to be sparse
  - Usually encoded by a state transition graph
- [Figure: a state transition graph over states y_1, …, y_4 with the corresponding transition table P(Y' | Y)]

Learning: Hidden Markov Model
- Generative models
  - Define a joint probability P(Y, X) over the paired label Y and observation X
  - Parameters are trained to maximize the joint log-likelihood log P(Y, X)
- HMM joint distribution: P(X, Y) = ?
- We can label new observations x by inferring P(Y | X = x)
- To make inference tractable, there are typically no long-range dependencies (Markov assumption)
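
As a concrete answer to the "P(X, Y) = ?" prompt: the standard HMM factorization is P(X, Y) = P(Y_1) Π_i P(Y_i | Y_{i−1}) Π_i P(X_i | Y_i). The sketch below (toy numbers, not from the lecture) evaluates this joint in log space for a short sequence:

```python
import numpy as np

# Toy HMM with 2 hidden states and 2 observation symbols (numbers are made up).
pi = np.array([0.6, 0.4])                      # P(Y_1)
A  = np.array([[0.7, 0.3], [0.2, 0.8]])        # A[i, j] = P(Y_t = j | Y_{t-1} = i)
B  = np.array([[0.9, 0.1], [0.3, 0.7]])        # B[i, k] = P(X_t = k | Y_t = i)

def hmm_joint_log_prob(y, x):
    """log P(X = x, Y = y) = log P(Y_1) + sum_t log P(Y_t | Y_{t-1}) + sum_t log P(X_t | Y_t)."""
    logp = np.log(pi[y[0]]) + np.log(B[y[0], x[0]])
    for t in range(1, len(y)):
        logp += np.log(A[y[t - 1], y[t]]) + np.log(B[y[t], x[t]])
    return logp

print(hmm_joint_log_prob(y=[0, 0, 1], x=[0, 1, 1]))
```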

Discriminative (Conditional) Models
- Specify the probability of possible label sequences given the observations, P(Y | X)
- X is always observed
- Key advantage: does not waste parameters on modeling P(X)
  - The distribution over Y can depend on non-independent features of X without modeling the feature dependences
  - Transition probabilities can depend on past and future observations
- Two representations
  - Maximum Entropy Markov Models (MEMMs)
  - Conditional Random Fields (CRFs)

Max Entropy Markov Models
- Model the probability of the next state given the previous state and the observations
- Discriminative model: provides a model for P(Y | X)
- Weakness: the label bias problem
  - (Y_i ⊥ X_j | X_1, …, X_i) for any j > i: an observation from later in the sequence has absolutely no effect on the probability of the current state
- [Figure: the HMM and MEMM graphical structures]
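
The MEMM's "per-state model" is typically a log-linear conditional P(Y_i | Y_{i−1}, X_i), normalized separately for each previous state. A minimal sketch with illustrative (made-up) weights and features; this per-previous-state normalization is exactly what gives rise to the label bias problem discussed next:

```python
import numpy as np

def memm_local_conditional(w, feats, y_prev, x_i):
    """P(Y_i | Y_{i-1} = y_prev, X_i = x_i) as a softmax over candidate next labels.

    w[y_prev] is a weight matrix (one row per candidate next label);
    feats(x_i) is the feature vector of the current observation.
    Normalization is carried out separately for each previous state.
    """
    scores = w[y_prev] @ feats(x_i)
    scores -= scores.max()                      # numerical stability
    p = np.exp(scores)
    return p / p.sum()

# Toy example: 2 labels, 3 observation features (all numbers are illustrative).
w = {0: np.array([[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]),
     1: np.array([[0.1, 0.1, 0.1], [0.2, -0.1, 0.6]])}
feats = lambda x_i: np.array(x_i, dtype=float)

print(memm_local_conditional(w, feats, y_prev=0, x_i=[1.0, 0.0, 1.0]))
```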

Label-Bias Problem: Example
- Weakness: the current label is not affected by future observations X
- A model for distinguishing 'rob' from 'rib'
  - Suppose we get an input sequence X = 'rib'
  - First step: 'r' matches both possible states, equally likely
  - Next, 'i' is observed, but since both y_1 and y_4 have only one outgoing state, they both give probability 1 to the next state
  - Note: if one word is more likely in the training data, it will win
  - This does not happen in HMMs
- [Figure: a state transition graph with start state y_0, one branch through y_1, y_2 and another through y_4, y_5, both ending at y_3, spelling 'r-i-b' and 'r-o-b', with the corresponding transition table P(Y' | Y)]

Conditional Random Fields
- The advantages of MEMMs, without the label bias problem
- Key difference
  - MEMMs use a per-state model for the conditional probability of the next state given the current state
  - CRFs have a single model for the joint probability of the entire sequence of labels given the observations
  - Thus, weights of different features at different states can trade off against each other
- CRF training
  - Maximum likelihood estimation or MAP (a little later)
  - The objective function is concave, guaranteeing convergence to the global optimum
- [Figure: the MEMM and CRF graphical structures]

Conditional Random Fields
- Let G = (V, E) be a graph with vertices V and edges E, such that Y = (Y_v), v ∈ V
- Then (X, Y) is a CRF if the random variables Y_v obey the Markov property with respect to the graph:
  P(Y_v | X, Y_w, w ≠ v) = P(Y_v | X, Y_w, w ~ v)
  where w ~ v means that Y_w is a neighbor of Y_v
- and if it models only P(Y | X)
- [Figure: two CRF graphical structures over Y and X]

Conditional Random Fields
- Joint probability distribution, for trees over Y (the cliques, and thus the potentials, are the edges and vertices):
  p_θ(y | x) ∝ exp( Σ_{e ∈ E, k} λ_k f_k(e, y[e], x) + Σ_{v ∈ V, k} μ_k g_k(v, y[v], x) )
- x are the observed variables, y are the state variables
- y[S] denotes the components of y associated with the vertices in S
- f_k is an edge feature with weight λ_k; g_k is a vertex feature with weight μ_k
- Note that the features can be over all of the variables in x
- [Figure: a chain CRF over Y and X]
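
The chain-CRF formula above can be checked on a very small example by scoring a labeling with weighted edge and vertex features and normalizing over all labelings of the chain. The sketch below uses brute-force enumeration for the partition function (real implementations use forward-backward dynamic programming); the features and weights are made up for illustration:

```python
import itertools
import numpy as np

LABELS = [0, 1]

def vertex_feats(v, y_v, x):
    # g_k(v, y[v], x): simple indicator-times-observation features.
    return np.array([float(y_v == 0) * x[v], float(y_v == 1) * x[v]])

def edge_feats(y_u, y_v):
    # f_k(e, y[e], x): indicator features on the label pair (ignores x for brevity).
    return np.array([float(y_u == y_v), float(y_u != y_v)])

def score(y, x, lam, mu):
    """Log of the unnormalized probability: sum_e lam.f + sum_v mu.g."""
    s = sum(mu @ vertex_feats(v, y[v], x) for v in range(len(y)))
    s += sum(lam @ edge_feats(y[v - 1], y[v]) for v in range(1, len(y)))
    return s

def crf_log_prob(y, x, lam, mu):
    """log p(y | x) with Z(x) computed by enumerating all labelings of the chain."""
    logZ = np.logaddexp.reduce(
        [score(list(yy), x, lam, mu) for yy in itertools.product(LABELS, repeat=len(x))])
    return score(y, x, lam, mu) - logZ

lam, mu = np.array([0.8, -0.3]), np.array([0.5, -0.5])   # illustrative weights
print(crf_log_prob([0, 0, 1], x=[1.0, 0.2, -0.7], lam=lam, mu=mu))
```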

Comparison 1/3
- Computational perspective
  - Purely directed models (HMMs and MEMMs) are much more easily learned
    - Their parameters can be computed in closed form using MLE or Bayesian estimation
  - A CRF requires an iterative gradient-based approach, and inference must be run for every training instance
- [Figure: the HMM, MEMM, and CRF graphical structures]

Comparison 2/3
- Ability to use a rich feature set
  - Success in a classification task often depends strongly on the quality of our features
  - In an HMM, we must explicitly model the distribution over the features X, including the interactions between them
    - Depending on the features, this type of model is very hard and often impossible to construct correctly
  - MEMMs and CRFs are both discriminative models, so they avoid this challenge entirely
- [Figure: the HMM, MEMM, and CRF graphical structures]

Comparison: Summary
- Independence assumptions made by the model
  - In MEMMs, (Y_i ⊥ X_j | X_1, …, X_i) for any j > i: the current label is not affected by future observations (the label bias problem)
- Summary
  - In cases where there are many correlated features, discriminative models are probably better
  - If only limited data are available, the stronger bias of the generative model (modeling P(X)) may dominate and allow learning with fewer samples
  - Among the discriminative models, MEMMs should probably be avoided in cases where many transitions are close to deterministic (the label bias problem)
  - In many cases, CRFs are likely to be a safer choice, but the computational cost may be prohibitive for large datasets

Learning Undirected Graphs
- The likelihood function
- Learning parameters
  - Collective classification with HMM, MEMM, CRF
  - Generative vs. discriminative models
  - Directed vs. undirected models
  - Learning with incomplete data
- Learning with priors
  - Maximum a posteriori (MAP) estimation
- Learning with alternative objectives
  - Pseudo-likelihood objective
  - Max-margin learning
- Structure learning

Learning with Missing Data
- In MLE with complete data, the gradient is
  (1/M) ∂ℓ(θ : D)/∂θ_i = E_D[f_i(d)] − E_θ[f_i]
  - The first term is the (average) number of times feature f_i is true in the data D
  - The second term is the expected number of times feature f_i is true according to the model
- With missing data (Y hidden, X observed), the gradient of the likelihood is now a difference of two expectations:
  (1/M) ∂ℓ(θ : D)/∂θ_i = E_θ[f_i | observed data] − E_θ[f_i]
  - The first term is the expected number of times feature f_i is true given the observed data (hidden variables filled in by the current model)
  - The second term is the expected number of times feature f_i is true according to the model
- Can use gradient descent or EM

Learning Undirected Graphs
- The likelihood function
- Learning parameters
  - Collective classification with HMM, MEMM, CRF
  - Generative vs. discriminative models
  - Directed vs. undirected models
  - Learning with incomplete data
- Learning with priors
  - Maximum a posteriori (MAP) estimation
- Learning with alternative objectives
  - Pseudo-likelihood objective
  - Max-margin learning
- Structure learning
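
The "empirical counts minus model expectations" form of the complete-data gradient can be verified by brute force on a tiny log-linear model. This sketch (illustrative features and data, not from the lecture) enumerates all configurations to obtain the model expectation, which is precisely the step that requires inference in a real network:

```python
import itertools
import numpy as np

def features(x):
    """Feature vector f(x) for a configuration of 3 binary variables (illustrative)."""
    return np.array([x[0] * x[1], x[1] * x[2], x[0], x[2]], dtype=float)

CONFIGS = list(itertools.product([0, 1], repeat=3))

def log_likelihood_gradient(theta, data):
    """(1/M) d l(theta : D) / d theta = E_D[f] - E_theta[f]."""
    emp = np.mean([features(d) for d in data], axis=0)            # E_D[f]
    scores = np.array([theta @ features(x) for x in CONFIGS])
    p = np.exp(scores - np.logaddexp.reduce(scores))              # P_theta(x) by enumeration
    model = sum(p_x * features(x) for p_x, x in zip(p, CONFIGS))  # E_theta[f]
    return emp - model

theta = np.zeros(4)
data = [(1, 1, 0), (1, 0, 0), (1, 1, 1)]                          # toy fully observed instances
print(log_likelihood_gradient(theta, data))
```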

Maximum a Posteriori (MAP) Estimation
- Introduce a prior distribution P(θ) over the model parameters
- Bayesian approach: given D = {x[1], …, x[M]},
  P(x[M+1] | D) = ∫ P(x[M+1] | θ) P(θ | D) dθ
- Maximum a posteriori (MAP) estimation:
  argmax_θ P(θ | D) = argmax_θ P(θ) P(D | θ)
- Maximum likelihood estimation (MLE):
  argmax_θ P(D | θ)

Gaussian Prior
- MAP estimation: argmax_θ P(θ | D) = argmax_θ P(θ) P(D | θ)
- Converting to log-space: log P(θ | D) = log P(D | θ) + log P(θ) + const
- Gaussian prior:
  P(θ | σ²) = Π_k 1/(√(2π) σ) exp( −θ_k² / (2σ²) )
- This gives L2 regularization:
  log P(θ) = −Σ_k θ_k² / (2σ²) + const
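
Putting the two pieces above together: with a Gaussian prior, maximizing the log-posterior is the same as maximizing the log-likelihood minus an L2 penalty,

  log P(θ | D) = log P(D | θ) − Σ_k θ_k² / (2σ²) + const,

so MAP estimation with this prior is L2-regularized maximum likelihood, with the penalty strength controlled by σ² (a smaller σ² pulls the parameters more strongly toward 0).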

Laplacian Prior
- Laplacian prior:
  P(θ | β) = Π_k 1/(2β) exp( −|θ_k| / β )
- Converting to log-space, this gives L1 regularization:
  log P(θ) = −Σ_k |θ_k| / β + const
- [Figure: the Laplacian distribution (β = 1) and the Gaussian distribution (σ² = 1)]

Why Regularization?
- Both forms of regularization penalize parameters whose magnitude is large
- Why is a bias in favor of parameters of low magnitude a reasonable one?
  - A prior often serves to pull the distribution toward an uninformed one, smoothing out fluctuations in the data
  - A distribution is smooth if the probabilities assigned to different assignments are not radically different
- Consider two assignments ξ and ξ'. The log of their relative probability is
  ln [ P(ξ) / P(ξ') ] = Σ_{k=1}^K θ_k f_k(ξ) − Σ_{k=1}^K θ_k f_k(ξ') = Σ_{k=1}^K θ_k ( f_k(ξ) − f_k(ξ') )
- When all θ_k have small magnitude, this log-ratio is also bounded, resulting in a smooth distribution

L1 vs L2 Regularization
- Gaussian prior (L2): penalty −Σ_k θ_k² / (2σ²)
- Laplacian prior (L1): penalty −Σ_k |θ_k| / β
- Key differences:
  - In L2, the penalty grows quadratically with the parameter magnitude; in L1, the penalty is linear in the parameter magnitude
  - In L2, as a parameter gets close to 0, the effect of the penalty diminishes, whereas in the L1 case the penalty stays linear all the way until the parameter value is 0
  - The models learned with L1 regularization therefore tend to be much sparser than in the L2 case
  - The strength of the L1 penalty depends on the hyper-parameter β
- [Figure: the L1 and L2 penalty functions around 0]

Learning Undirected Graphs
- The likelihood function
- Learning parameters
  - Collective classification with HMM, MEMM, CRF
  - Generative vs. discriminative models
  - Directed vs. undirected models
  - Learning with incomplete data
- Learning with priors
  - Maximum a posteriori (MAP) estimation
- Learning with alternative objectives
  - Pseudo-likelihood objective
  - Max-margin learning
- Structure learning
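
The sparsity difference can be seen on a one-dimensional example: with an L2 penalty the optimum is shrunk toward 0 but is never exactly 0, while the L1 penalty zeroes the parameter outright once the penalty outweighs the data term (soft-thresholding). A minimal sketch with an arbitrary quadratic data term (all numbers are illustrative):

```python
import numpy as np

# Data term: 0.5 * (theta - a)^2, i.e., the unpenalized optimum is theta = a.
a, lam = 0.8, 1.0   # 'a' and the regularization strength are arbitrary

# L2 (Gaussian prior): minimize 0.5*(theta - a)^2 + 0.5*lam*theta^2
theta_l2 = a / (1.0 + lam)          # closed form: shrunk toward 0, never exactly 0

# L1 (Laplacian prior): minimize 0.5*(theta - a)^2 + lam*|theta|
theta_l1 = np.sign(a) * max(abs(a) - lam, 0.0)   # soft-thresholding: exactly 0 here

print(theta_l2, theta_l1)   # e.g. 0.4 vs 0.0 -- only L1 produces an exact zero
```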

Why Alternative Objectives?
- The log-likelihood objective, in the case of a single data instance ξ:
  ℓ(θ : ξ) = ln P̃(ξ) − ln Z(θ) = ln P̃(ξ) − ln Σ_{ξ'} P̃(ξ')
- MLE can be viewed as aiming to increase the distance between the log of the un-normalized probability (log-measure) of ξ and the aggregate of the measures of all instances
- Key difficulty: the 2nd term involves a summation over the exponentially many instances in Val(X)
  - In MLE, we have to compute the log-likelihood in every iteration (approximate inference)
- Alternative objectives
  - Aim to increase the difference between the log-measure of the data instance and a more tractable set of other instances ('contrastive' objectives)

Pseudo-likelihood
- For a data instance ξ, using the chain rule, we can write
  P(ξ) = Π_{i=1}^n P(x_i | x_1, …, x_{i−1})
- We can approximate this formulation by replacing each term with the conditional probability of x_i given all the other variables x_{−i}:
  P(ξ) ≈ Π_{i=1}^n P(x_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n)
- This approximation leads to the pseudolikelihood objective. Given D with M training instances:
  ℓ_PL(θ : D) = (1/M) Σ_m Σ_i ln P(x_i[m] | x_{−i}[m], θ)
  (the sums run over each instance m and each variable i)
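
The computational point is that each term of the pseudolikelihood normalizes over the values of a single variable X_i, with all other variables clamped to their observed values. A minimal sketch for a small pairwise binary model (parameters and data are made up for illustration):

```python
import numpy as np

# Pairwise log-linear model over binary variables: theta_pair[(i, j)] rewards X_i == X_j.
theta_pair = {(0, 1): 1.2, (1, 2): -0.5, (0, 2): 0.3}

def local_score(x, i, value):
    """Sum of the log-factors that involve X_i, with X_i set to `value` and the rest clamped."""
    s = 0.0
    for (a, b), w in theta_pair.items():
        if i in (a, b):
            xa = value if a == i else x[a]
            xb = value if b == i else x[b]
            s += w * float(xa == xb)
    return s

def pseudolikelihood(data):
    """(1/M) sum_m sum_i log P(x_i[m] | x_{-i}[m]); each term normalizes over one variable only."""
    total = 0.0
    for x in data:
        for i in range(len(x)):
            scores = np.array([local_score(x, i, v) for v in (0, 1)])
            total += scores[x[i]] - np.logaddexp.reduce(scores)
    return total / len(data)

data = [(1, 1, 0), (0, 0, 0), (1, 0, 1)]    # toy fully observed instances
print(pseudolikelihood(data))
```

No term ever sums over joint configurations of more than one variable, which is why no global inference is needed.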

Gradient of Pseudolikelihood
- Pseudolikelihood objective:
  ℓ_PL(θ : D) = (1/M) Σ_m Σ_i ln P(x_i[m] | x_{−i}[m], θ)
- Each term is a log-conditional-likelihood term over a single variable X_i, conditioned on all the remaining variables:
  ln P(x_i | x_{−i}) = Σ_{k: X_i ∈ Scope[f_k]} θ_k f_k[x_i, u] − ln Σ_{x_i'} exp( Σ_{k: X_i ∈ Scope[f_k]} θ_k f_k[x_i', u] )
- The 2nd term involves a summation over the values of only a single variable X_i (so it does not require inference at each step)
- Widely used in vision, spatial statistics, etc.
- Jointly concave over all parameters
- Consistent estimator: as the number of data instances M goes to infinity, with probability 1, the true parameter Θ* (the maximizer of the log-likelihood objective) is a global optimum of the pseudolikelihood objective

Pseudolikelihood vs Likelihood
- When does the pseudolikelihood not work well?
  - It depends on the types of queries for which we intend to use the model
- The pseudolikelihood objective is a better training objective if we plan to run queries where we condition on most of the variables and query the values of only a few; then the pseudolikelihood objective is a very close match to the type of predictions we would like to make
- Any example? Netflix collaborative filtering
- [Figure: a movie-rating matrix (Star Wars I, II, VI, Harry Potter I, II, The Matrix, Indiana Jones) in which a new user's missing ratings are filled in by probabilistic inference]

Pseudolikelihood vs Likelihood
- The likelihood objective is a better training objective if a typical query involves most or all of the variables in the model
  - E.g., given E = image, what is P(X = labels | E = image)?
  - [Figure: image segmentation with a grid-structured Markov network; images from the website of Prof. Daphne Koller's lab]
- However, even in cases where the likelihood is the more appropriate objective, we may have to resort to the pseudolikelihood for computational reasons
  - In many cases, this objective performs surprisingly well

Max-margin Training
- Say that we want to use the model for predicting a MAP assignment (e.g., image segmentation)
- In this setting, our training set consists of a set of pairs D = {(y[m], x[m])}, m = 1, …, M
- Given an observation x[m], we want our learned model to give the highest probability to y[m]
  - In other words, we want the probability P_θ(y[m] | x[m]) to be higher than any other probability P_θ(y | x[m]) for y ≠ y[m]
- To increase our confidence in the prediction, we would like to make the log-probability gap as large as possible by increasing
  ln P_θ(y[m] | x[m]) − max_{y ≠ y[m]} ln P_θ(y | x[m])
- This difference between the log-probability of the target assignment y[m] and that of the next best assignment is called the margin; the higher the margin, the more confident the model is
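
One useful consequence of the log-linear form: the partition function ln Z(x[m]) appears in both terms of the margin and cancels, so

  ln P_θ(y[m] | x[m]) − ln P_θ(y | x[m]) = Σ_k θ_k f_k(y[m], x[m]) − Σ_k θ_k f_k(y, x[m]),

and maximizing the margin only asks the feature score of the true labeling to beat that of every competing labeling; Z never has to be computed.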

Handwriting Recognition Example
- Margin: ln P_θ(y[m] | x[m]) − max_{y ≠ y[m]} ln P_θ(y | x[m])
- For the CRF over the handwritten word 'brace':
  - We want: ln P_θ('brace' | x) to be higher than ln P_θ(y | x) for every other labeling y ('aaaaa', 'aaaab', …, 'zzzzz')
  - Equivalently: ln P_θ('brace' | x) − max_{y ≠ 'brace'} ln P_θ(y | x) should be large ("a lot!")

Learning Undirected Graphs
- The likelihood function
- Learning parameters
  - Collective classification with HMM, MEMM, CRF
  - Generative vs. discriminative models
  - Directed vs. undirected models
  - Learning with incomplete data
- Learning with priors
  - Maximum a posteriori (MAP) estimation
- Learning with alternative objectives
- Structure learning
  - Structure learning via L1 regularization

Structure Learning
- Start with atomic features
- Greedily conjoin features to improve the score
- Problem: need to re-estimate the weights for each new candidate
- Approximation: keep the weights of the previous features constant

Structure Learning via L1 Regularization*
- Treat the structure learning problem as a parameter estimation problem in a fully connected network
- Use L1 regularization to obtain a sparse representation
- [Figure: a fully connected network over X_1, …, X_5 and the sparse network recovered after L1 regularization]
- Likelihood or pseudolikelihood objective
- Convex optimization problem
* Lee 07, Wainwright 07, Hoefling
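
In practice, the L1-regularized (pseudo)likelihood is optimized with gradient methods in which the penalty is handled by soft-thresholding the edge weights at each step, so many of them become exactly zero and the corresponding edges drop out of the network. A minimal sketch of one such proximal-gradient (ISTA-style) update; the gradient function here is a made-up placeholder standing in for, e.g., the pseudolikelihood gradient:

```python
import numpy as np

def proximal_gradient_step(theta_edges, grad_fn, step, lam):
    """One ISTA-style update for an L1-regularized objective over edge weights.

    theta_edges: dict mapping candidate edge (i, j) -> weight (fully connected to start)
    grad_fn:     function returning the gradient of the negative (pseudo)likelihood
    step:        gradient step size
    lam:         L1 regularization strength
    """
    grad = grad_fn(theta_edges)
    new_theta = {}
    for edge, w in theta_edges.items():
        w = w - step * grad[edge]                       # gradient step on the smooth part
        w = np.sign(w) * max(abs(w) - step * lam, 0.0)  # soft-threshold: produces exact zeros
        new_theta[edge] = w
    # Edges whose weight is exactly zero are effectively removed from the structure.
    return new_theta

# Toy illustration with a fake gradient (all numbers are made up).
theta = {(0, 1): 0.9, (0, 2): 0.05, (1, 2): -0.4}
fake_grad = lambda th: {e: 0.1 for e in th}
print(proximal_gradient_step(theta, fake_grad, step=0.5, lam=0.2))
```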
