Pattern Classification (III) & Pattern Verification


Prepared by Prof. Hui Jiang, CSE6328

CSE6328.3 Speech & Language Processing, No. 5
Pattern Classification (III) & Pattern Verification
Prof. Hui Jiang, Department of Computer Science and Engineering, York University

Model Parameter Estimation

Maximum Likelihood (ML) Estimation:
- The ML method: the most popular model estimation method.
- The EM (Expectation-Maximization) algorithm.
- Examples: univariate Gaussian distribution, multivariate Gaussian distribution, multinomial distribution, Gaussian mixture model, Markov chain model (n-gram for language modeling), Hidden Markov Model (HMM).

Discriminative Training (alternative model estimation methods):
- Maximum Mutual Information (MMI)
- Minimum Classification Error (MCE)
- Large Margin Estimation (LME)

Bayesian Model Estimation:
- Bayesian theory
- MDI (Minimum Discrimination Information)

Dept. of CSE, York Univ.

Discriminative Training (I): Maximum Mutual Information Estimation

The model is viewed as a noisy data-generation channel: class W → observation feature X. Determine the model parameters to maximize the mutual information between X and W (a close relation between the model and the noisy data-generation channel):

  I(X, W) = E[ log p(X, W) / (p(X) p(W)) ]

Difficulty: the joint distribution p(X, W) is unknown.
Solution: collect a representative training set {(X_t, W_t)}, t = 1, ..., T, to approximate the joint distribution:

  Λ_MMI = arg max_Λ I(X, W) ≈ arg max_Λ Σ_{t=1}^T log [ p_Λ(X_t | W_t) P(W_t) / Σ_{W'} p_Λ(X_t | W') P(W') ]

Optimization: iterative gradient-ascent method; growth-transformation method.
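As an illustrative sketch (not from the slides), the approximated MMI criterion can be evaluated for toy 1-D Gaussian class models; the class means, variances, priors, and training pairs below are all assumptions made for this example:

```python
# Sketch: MMI criterion for 1-D Gaussian class models, approximated
# over a labeled training set (all numbers here are toy assumptions).
import math

def gauss_loglik(x, mu, var):
    """log N(x; mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def mmi_objective(data, models, priors):
    """Sum over (x_t, w_t) of the log posterior log p(w_t | x_t) --
    the quantity MMI training maximizes w.r.t. the model parameters."""
    total = 0.0
    for x, w in data:
        num = gauss_loglik(x, *models[w]) + math.log(priors[w])
        den = math.log(sum(math.exp(gauss_loglik(x, *m)) * priors[c]
                           for c, m in models.items()))
        total += num - den
    return total

models = {0: (0.0, 1.0), 1: (3.0, 1.0)}   # assumed (mean, variance) per class
priors = {0: 0.5, 1: 0.5}
data = [(-0.2, 0), (0.4, 0), (2.8, 1), (3.5, 1)]
print(mmi_objective(data, models, priors))
```

Well-placed class models score higher under this criterion than badly placed ones, which is what gradient-ascent or growth-transformation optimization exploits.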

Discriminative Training (II): Minimum Classification Error Estimation

In an N-class pattern classification problem, given a set of training data D = {X_1, ..., X_T}, estimate the model parameters for all classes so as to minimize the total number of classification errors in D.

MCE: minimize the empirical classification errors; the objective function is the total number of classification errors in D.

For each training sample X_t belonging to class i, define a misclassification measure:

  d_i(X_t) = -g_i(X_t; Λ_i) + max_{j≠i} g_j(X_t; Λ_j)

or, with log-likelihood discriminants,

  d_i(X_t) = -ln p(X_t | Λ_i) + max_{j≠i} ln p(X_t | Λ_j)

If d_i(X_t) > 0: incorrect classification (an error). If d_i(X_t) < 0: correct classification (no error).

Soft-max: approximate the max by a differentiable function:

  d_i(X_t) = -g_i(X_t; Λ_i) + ln [ (1/(N-1)) Σ_{j≠i} exp(η g_j(X_t; Λ_j)) ]^{1/η}

where η > 0.

The error count for one sample X_t (of class i) is H(d_i(X_t)), where H(·) is the step function. The total number of errors in the training set is:

  Q(Λ) = Σ_{t=1}^T H(d_i(X_t))

The step function is not differentiable, so it is approximated by a sigmoid function, giving the smoothed total error count over the training set:

  Q'(Λ) = Σ_{t=1}^T l(d_i(X_t)),   l(d) = 1 / (1 + e^{-a·d})

where a > 0 is a parameter that controls the sigmoid's shape.

MCE estimation of the model parameters for all classes:

  {Λ}_MCE = arg min_Λ Q'(Λ)

Optimization: no closed-form solution is available; use an iterative gradient-descent method, the GPD (generalized probabilistic descent) method:

  Λ^(n+1) = Λ^(n) - ε_n ∇Q'(Λ) |_{Λ = Λ^(n)}

The MCE/GPD Method

1. Find initial model parameters (e.g., the ML estimates).
2. Derive the gradient of the objective function.
3. Calculate the value of the gradient at the current model parameters.
4. Update the model parameters: Λ^(n+1) = Λ^(n) - ε_n ∇Q'(Λ^(n)).
5. Iterate until convergence.

How to calculate the gradient:

  ∇Q'(Λ) = Σ_{t=1}^T (∂l/∂d) (∂d_i(X_t)/∂Λ),   with ∂l/∂d = a·l(d)(1 - l(d))

The key issue in MCE/GPD is how to set a proper step size ε_n; it must be chosen experimentally.
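The MCE/GPD loop can be sketched numerically. Everything below is an assumption made for illustration: two 1-D unit-variance Gaussian classes whose means are the trainable parameters, toy values for η, a, and a fixed step size ε, and a finite-difference gradient standing in for the analytic one:

```python
# Minimal MCE/GPD sketch (toy setting, not the slides' exact recipe).
import math

ETA, A, EPS = 2.0, 1.0, 0.05    # soft-max η, sigmoid slope a, step size ε

def g(x, mu):                   # discriminant: log-likelihood, unit variance
    return -0.5 * (x - mu) ** 2

def d_i(x, i, mus):             # soft-max misclassification measure d_i(x)
    others = [g(x, m) for j, m in enumerate(mus) if j != i]
    lse = math.log(sum(math.exp(ETA * v) for v in others) / len(others))
    return -g(x, mus[i]) + lse / ETA

def smoothed_errors(data, mus):  # Q'(Λ) = Σ_t sigmoid(a · d_i(x_t))
    return sum(1.0 / (1.0 + math.exp(-A * d_i(x, i, mus))) for x, i in data)

def gpd_step(data, mus, h=1e-5):
    """One GPD update with a forward-difference gradient of Q'."""
    grad = []
    for k in range(len(mus)):
        bumped = mus[:k] + [mus[k] + h] + mus[k + 1:]
        grad.append((smoothed_errors(data, bumped) - smoothed_errors(data, mus)) / h)
    return [m - EPS * gk for m, gk in zip(mus, grad)]

data = [(-1.2, 0), (-0.8, 0), (0.9, 1), (1.3, 1)]  # (x, class label)
mus = [0.5, -0.5]               # deliberately bad initial means
for _ in range(200):
    mus = gpd_step(data, mus)
print(mus, smoothed_errors(data, mus))
```

Starting from means that misclassify every sample, descending the smoothed error count moves each mean toward its own class's data, so the smoothed error drops well below its initial value.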

Overtraining (Overfitting)

A low classification error rate on the training set does not always lead to a low error rate on a new test set, due to overtraining.

Measuring the performance of MCE: [figure: the objective function and the classification error (in %) plotted over training iterations]

When to stop: monitor three quantities during MCE/GPD training: the objective function, the error rate on the training set, and the error rate on the test set.

Large Margin Estimation

[figure: two models Λ1 and Λ2 with separation boundary F(X; Λ1) - F(X; Λ2) = 0]

Large-Margin Classifier

[figure: the original separation boundary F(X; Λ1) - F(X; Λ2) = 0 and, after margin enlargement, the new separation boundary F(X; Λ1') - F(X; Λ2') = 0]

How to define the separation margin?

In a 2-class separable problem:
- For a data token x of class 1, the margin is d(x) = F(x; Λ1) - F(x; Λ2) > 0.
- For a data token x of class 2, the margin is d(x) = F(x; Λ2) - F(x; Λ1) > 0.

Extension to the multiple-class problem with N classes Λ1, Λ2, ..., ΛN:
- For a data token x of class i:

  d(x) = F(x; Λi) - max_{j≠i} F(x; Λj) = min_{j≠i} [ F(x; Λi) - F(x; Λj) ]
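The two formulations of the multi-class margin (best rival vs. minimum pairwise difference) are equivalent, which a few lines can check directly; the score values below are hypothetical values of F(x; Λ_j) for a single token:

```python
# Sketch: the two equivalent forms of the multi-class separation margin.
def margin(scores, i):
    """d(x) = F_i - max_{j != i} F_j for a token of class i."""
    rivals = [s for j, s in enumerate(scores) if j != i]
    return scores[i] - max(rivals)

def margin_pairwise(scores, i):
    """d(x) = min_{j != i} (F_i - F_j); identical to margin()."""
    return min(scores[i] - s for j, s in enumerate(scores) if j != i)

scores = [-2.1, -0.7, -1.4]     # hypothetical F(x; Λ_1..Λ_3) for one token x
print(margin(scores, 1))        # positive margin: the token is correctly classified
```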

Large Margin Estimation

An N-class problem: each class is represented by one model, Λ = {Λ1, Λ2, ..., ΛN}.

Given a training set D, define a subset S, called the support token set, based on the initial models:

  S = { x_i ∈ D | 0 ≤ d(x_i) ≤ ε }

Large-Margin Estimation (LME):

  Λ̂ = arg max_Λ min_{x_i ∈ S} d(x_i),   subject to d(x_i) > 0 for all x_i ∈ S

Bayesian Theory

Bayesian methods view model parameters as random variables having some known prior distribution.

- Prior specification: specify the prior distribution of the model parameters θ as p(θ).
- Training data D allow us to convert the prior distribution into a posterior distribution (Bayesian learning):

  p(θ | D) = p(D | θ) p(θ) / p(D) ∝ p(D | θ) p(θ)

- We infer or decide everything solely based on the posterior distribution (Bayesian inference):
  - Model estimation: MAP (maximum a posteriori) estimation.
  - Pattern classification: Bayesian classification.
  - Sequential (on-line, incremental) learning.
  - Others: prediction, model selection, etc.

Bayesian Learning

[figure: the posterior p(θ | D), the likelihood p(D | θ), and the prior p(θ) as curves over θ, with θ_MAP and θ_ML marked]

The MAP estimation of model parameters: make a point estimate of θ based on the posterior distribution:

  θ_MAP = arg max_θ p(θ | D) = arg max_θ p(D | θ) p(θ)

Then θ_MAP is treated as the estimate of the model parameters, just like the ML estimate. Sometimes the EM algorithm is needed to derive θ_MAP. MAP estimation optimally combines prior knowledge with the new information provided by the data.

MAP estimation is used in speech recognition to adapt speech models to a particular speaker, to cope with various accents:
- Start from a generic speaker-independent speech model (the prior).
- Collect a small set of data from the particular speaker.
- The MAP estimate gives a speaker-adapted model which suits this particular speaker better.

Bayesian Classification

Assume we have N classes; each class i has a class-conditional pdf p(X | θ_i) with parameters θ_i. The prior knowledge about θ_i is included in a prior p(θ_i). For each class we have a training data set D_i.

Problem: classify an unknown datum Y into one of the N classes. Bayesian classification is done as:

  î = arg max_i p(Y | D_i) = arg max_i ∫ p(Y | θ_i) p(θ_i | D_i) dθ_i

where p(θ_i | D_i) = p(D_i | θ_i) p(θ_i) / p(D_i).

Recursive Bayes Learning (Sequential Bayesian Learning)

Bayesian theory provides a framework for on-line learning (a.k.a. incremental learning, adaptive learning): when we observe training data one by one, we can dynamically adjust the model so that it learns incrementally from the data.

Assume we observe the training data set D = {x_1, x_2, ..., x_n} one datum at a time:

  p(θ) → p(θ | x_1) → p(θ | x_1, x_2) → ... → p(θ | D)

Learning rule: posterior ∝ likelihood × prior. At each stage, the current posterior summarizes our knowledge about the model and serves as the prior for the next datum.
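The recursive learning rule can be sketched on a discrete grid of candidate parameter values; the grid, the toy data, and the unit-variance Gaussian likelihood below are all assumptions for the example. Updating the posterior one datum at a time gives the same answer as a single batch update:

```python
# Sketch: recursive Bayes learning over a discrete grid of candidate
# means θ for a unit-variance Gaussian likelihood (toy assumptions).
import math

thetas = [0.0, 0.5, 1.0, 1.5, 2.0]           # candidate parameter values

def lik(x, th):                              # p(x | θ), unit variance
    return math.exp(-0.5 * (x - th) ** 2) / math.sqrt(2 * math.pi)

def normalize(w):
    s = sum(w)
    return [v / s for v in w]

data = [0.9, 1.2, 1.1]
post = [1.0 / len(thetas)] * len(thetas)     # flat (noninformative) prior
for x in data:                               # posterior ∝ likelihood × prior
    post = normalize([p * lik(x, th) for p, th in zip(post, thetas)])

batch = normalize([math.prod(lik(x, th) for x in data) for th in thetas])
print(post)
```

The per-datum normalization is optional mathematically (it cancels), but keeping the posterior normalized at every stage is what makes it directly usable as the prior for the next datum.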

How to Specify Priors

- Noninformative priors: in case we do not have enough prior knowledge, just use a flat prior at the beginning.
- Conjugate priors (for computational convenience): for some models whose probability functions form a reproducing density, we can choose the prior in a special form, called the conjugate prior, such that after Bayesian learning the posterior has exactly the same functional form as the prior, with only its parameters updated. Not every model has a conjugate prior.

Conjugate prior for a univariate Gaussian model with only the mean unknown:

  p(x | μ) = (1 / (√(2π) σ)) exp[ -(x - μ)² / (2σ²) ]

If we choose the prior to be a Gaussian distribution (the Gaussian's conjugate prior is Gaussian),

  p(μ) = (1 / (√(2π) σ₀)) exp[ -(μ - μ₀)² / (2σ₀²) ]

then after observing a new datum x₁ the posterior is still Gaussian:

  p(μ | x₁) = (1 / (√(2π) σ₁)) exp[ -(μ - μ₁)² / (2σ₁²) ]

where

  μ₁ = (σ₀² x₁ + σ² μ₀) / (σ₀² + σ²),   σ₁² = σ₀² σ² / (σ₀² + σ²)

The Sequential MAP Estimate of a Gaussian

For a univariate Gaussian with an unknown mean, the MAP estimate of the mean after observing x₁ is:

  μ₁ = (σ₀² x₁ + σ² μ₀) / (σ₀² + σ²)

After observing the next datum x₂:

  μ₂ = (σ₁² x₂ + σ² μ₁) / (σ₁² + σ²),   with σ₁² = σ₀² σ² / (σ₀² + σ²)

and so on for each new datum.
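The sequential conjugate update can be sketched directly; the prior parameters, the known observation variance, and the data stream below are assumptions for the example:

```python
# Sketch: sequential MAP estimate of a Gaussian mean with known variance
# σ² and a Gaussian prior N(μ0, σ0²) (all values are toy assumptions).
def map_update(x, mu, var_prior, var_lik):
    """One conjugate update; returns (new posterior mean, new posterior variance)."""
    mu_new = (var_prior * x + var_lik * mu) / (var_prior + var_lik)
    var_new = var_prior * var_lik / (var_prior + var_lik)
    return mu_new, var_new

mu, v = 0.0, 10.0              # vague prior N(0, 10)
sigma2 = 1.0                   # known observation variance
for x in [2.1, 1.9, 2.0, 2.2]:
    mu, v = map_update(x, mu, v, sigma2)
print(mu, v)
```

With each datum the posterior mean (the MAP estimate, since the posterior is Gaussian) moves toward the sample mean, and the posterior variance shrinks, reflecting growing confidence.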