Pattern Classification (VI). 杜俊, jundu@ustc.edu.cn


Outline

- Bayesian Decision Theory: how to make the optimal decision? The maximum a posteriori (MAP) decision rule.
- Generative Models: joint distribution of observation and label sequences; model estimation: MLE, Bayesian learning, discriminative training.
- Discriminative Models: model the posterior probability directly; discriminant functions; logistic regression, support vector machine, neural network.

Statistical Models: Roadmap

- Continuous data: 1-d Gaussian → multivariate Gaussian → GMM → (with a Markov chain) CDHMM.
- Discrete data: multinomial → mixture of multinomials → (with a Markov chain) DDHMM.
- Graphical models.
- Estimation: ML, Bayesian, DT (discriminative training).

Model Parameter Estimation

- Maximum Likelihood (ML) estimation: the most popular model estimation method; the EM (Expectation-Maximization) algorithm. Examples: univariate Gaussian distribution, multivariate Gaussian distribution, multinomial distribution, Gaussian mixture model, Markov chain model (n-gram for language modeling), hidden Markov model (HMM); the simplest two cases are sketched below.
- Discriminative training: minimum classification error (MCE), maximum mutual information (MMI).
- Bayesian model estimation: Bayesian theory.
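As a reminder of what the ML solutions look like for the two simplest examples in the list, here is a tiny NumPy sketch; the data values are made up for illustration.

```python
import numpy as np

x = np.array([1.8, 2.1, 1.5, 2.4, 2.0])   # continuous samples (illustrative)
mu_ml = x.mean()                           # ML mean: (1/T) * sum(x_t)
var_ml = ((x - mu_ml) ** 2).mean()         # ML variance (divides by T, not T-1)

counts = np.array([12, 30, 8])             # event counts for a multinomial
p_ml = counts / counts.sum()               # ML probabilities: relative frequencies
```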

Minimum Classification Error Estimation (I)

In an N-class pattern classification problem, given a set of training data D = {(X_1, c_1), ..., (X_T, c_T)}, estimate the model parameters Λ = {λ_1, ..., λ_N} for all classes to minimize the total classification errors in D.

MCE: minimize empirical classification errors. Objective function = total classification errors in D.

For each training sample X_t with true class c_t, define a misclassification measure:

d_t(X_t) = -\ln p(X_t \mid \lambda_{c_t}) + \max_{j \neq c_t} \ln p(X_t \mid \lambda_j)

If d_t > 0: incorrect classification, counted as 1 error. If d_t < 0: correct classification, counted as 0 errors.

Minimum Classification Error Estimation (II)

Soft-max: approximate the max in d_t by a differentiable function:

d_t(X_t) = -\ln p(X_t \mid \lambda_{c_t}) + \frac{1}{\eta} \ln \Big[ \frac{1}{N-1} \sum_{j \neq c_t} \exp\big(\eta \ln p(X_t \mid \lambda_j)\big) \Big]

where η > 1; the approximation tightens to the hard max as η → ∞.
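A quick numerical check of that limit, with illustrative rival log-likelihoods (the function name and values are mine):

```python
import numpy as np

def soft_max(g, eta):
    # (1/eta) * ln[(1/(N-1)) * sum_j exp(eta * g_j)] over rival scores g
    z = eta * np.asarray(g)
    m = z.max()
    return (m + np.log(np.mean(np.exp(z - m)))) / eta

rivals = [-3.2, -1.0, -2.5]            # rival log-likelihoods (illustrative)
for eta in (1, 2, 10, 100):
    print(eta, soft_max(rivals, eta))  # approaches max(rivals) = -1.0
```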

Minimum Classification Error Estimation (III)

The error count for one sample is a step function H(d_t). Total errors in the training set:

Q(\Lambda) = \sum_{t=1}^{T} H(d_t)

The step function is not differentiable, so it is approximated by a sigmoid function, giving the smoothed total errors in the training set:

Q(\Lambda) \approx Q'(\Lambda) = \sum_{t=1}^{T} l(d_t), \qquad l(d) = \frac{1}{1 + e^{-a d}}

where a > 0 is a parameter to control its shape.
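The same kind of check shows l(d) tightening to the 0-1 step H(d) as a grows (values illustrative):

```python
import numpy as np

l = lambda d, a: 1.0 / (1.0 + np.exp(-a * d))  # smoothed error count
for a in (1, 5, 50):
    print(a, l(-0.3, a), l(0.3, a))            # tends to (0, 1) = H(d)
```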

Minimum Classification Error Estimation (IV)

MCE estimation of model parameters for all classes:

\Lambda_{MCE} = \arg\min_{\Lambda} Q'(\Lambda)

Optimization: no simple closed-form solution is available; use the iterative gradient descent method: stochastic GD, batch mode, or mini-batch mode.

Minimum Classification Error Estimation (V)

1. Find initial model parameters (e.g., the ML estimates).
2. Derive the gradient of the objective function.
3. Calculate the value of the gradient based on the current parameters.
4. Update the model parameters.
5. Iterate until convergence.

How to Calculate the Gradient?

\frac{\partial Q'(\Lambda)}{\partial \lambda_i} = \sum_{t=1}^{T} \frac{\partial l(d_t)}{\partial \lambda_i} = \sum_{t=1}^{T} \frac{\partial l(d_t)}{\partial d_t} \cdot \frac{\partial d_t}{\partial \lambda_i}, \qquad \frac{\partial l(d)}{\partial d} = a \, l(d) \big(1 - l(d)\big)

The key issue in MCE training is to set a proper step size experimentally.
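Slides (I)-(V) can be wired together in a few lines. The sketch below is my own construction, not the lecture's reference code: it trains the mean vectors of unit-covariance Gaussian class models by stochastic gradient descent on the smoothed error Q', using the soft-max measure from slide (II); the hyper-parameters a, eta, lr, epochs are illustrative and would be tuned experimentally.

```python
import numpy as np

def log_gauss(x, mu):
    # ln N(x; mu, I) up to an additive constant (unit covariance assumed)
    return -0.5 * np.sum((x - mu) ** 2)

def sigmoid(d, a):
    return 1.0 / (1.0 + np.exp(-a * d))

def mce_sgd(X, y, mus, a=1.0, eta=2.0, lr=0.05, epochs=50):
    """Stochastic-gradient MCE training of Gaussian mean vectors.
    X: (T, dim) samples; y: length-T class labels;
    mus: list of initial mean vectors (e.g., the ML estimates)."""
    N = len(mus)
    for _ in range(epochs):
        for x, c in zip(X, y):
            g = np.array([log_gauss(x, mu) for mu in mus])
            rivals = [j for j in range(N) if j != c]
            z = eta * g[rivals]
            m = z.max()
            # soft-max misclassification measure d_t (slide II)
            d = -g[c] + (m + np.log(np.mean(np.exp(z - m)))) / eta
            s = sigmoid(d, a)                        # smoothed error l(d_t)
            coef = lr * a * s * (1.0 - s)            # step size times dl/dd
            w = np.exp(z - m) / np.exp(z - m).sum()  # rival soft-max weights
            mus[c] += coef * (x - mus[c])            # pull true mean toward x
            for wj, j in zip(w, rivals):
                mus[j] -= coef * wj * (x - mus[j])   # push rival means away
    return mus
```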

Overtraining (Overfitting)

A low classification error rate on the training set does not always lead to a low error rate on a new test set, due to overtraining.

Measuring Performance of MCE

When to converge: monitor three quantities during MCE training: the objective function, the error rate on the training set, and the error rate on the test set. [Figure: objective function and classification error (%) plotted against training iterations.]

Maximum Mutual Information Estimation (I)

The model is viewed as a noisy data generation channel: class id C → noisy data generation channel → observation feature X. Maximize the mutual information between C and X:

I(C, X) = \sum_{C, X} p(C, X) \ln \frac{p(C, X)}{p(C)\, p(X)}

\Lambda_{MMI} = \arg\max_{\Lambda} I(C, X)

Maximum Mutual Information Estimation (II)

Difficulty: the joint distribution p(C, X) is unknown.

Solution: collect a representative training set {(X_1, c_1), ..., (X_T, c_T)} to approximate the joint distribution:

\Lambda_{MMI} = \arg\max_{\Lambda} \sum_{t=1}^{T} \ln \frac{p(X_t \mid \lambda_{c_t})\, p(c_t)}{\sum_j p(X_t \mid \lambda_j)\, p(j)}

Optimization: iterative gradient-ascent method; growth-transformation method.
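For reference, a minimal sketch of this empirical objective for univariate Gaussian class models; the function and argument names are mine, and the class priors p(j) are assumed given.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def mmi_objective(xs, cs, mus, sigmas, priors):
    """Empirical MMI criterion:
    sum_t ln[ p(x_t|c_t) P(c_t) / sum_j p(x_t|j) P(j) ]."""
    total = 0.0
    for x, c in zip(xs, cs):
        joint = [gauss_pdf(x, m, s) * p for m, s, p in zip(mus, sigmas, priors)]
        total += np.log(joint[c] / sum(joint))
    return total
```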

Bayesian Model Estimation

Bayesian methods view model parameters as random variables having some known prior distribution.

Prior specification: specify the prior distribution of the model parameters θ as p(θ).

Bayesian learning: the training data D allow us to convert the prior distribution into a posterior distribution, p(θ | D) ∝ p(D | θ) p(θ).

[Figure: Bayesian learning: the prior p(θ), the likelihood p(D | θ), and the posterior p(θ | D) plotted over θ, with the point estimates θ_MAP and θ_ML marked.]

MAP Estimation

Make a point estimate of θ based on the posterior distribution:

\theta_{MAP} = \arg\max_{\theta} p(\theta \mid D) = \arg\max_{\theta} p(D \mid \theta)\, p(\theta)

Then θ_MAP is treated as the estimate of the model parameters, just like the ML estimate. Sometimes the EM algorithm is needed to derive it.

MAP estimation optimally combines prior knowledge with the new information provided by the data. For example, MAP estimation is used in speech recognition to adapt speech models to a particular speaker, to cope with various accents:

- Start from a generic speaker-independent speech model (the prior).
- Collect a small set of data from a particular speaker.
- The MAP estimate gives a speaker-adaptive model which suits this particular speaker better.

How to Specify Priors

Noninformative priors: without enough prior knowledge, just use a flat prior.

Conjugate priors: chosen for computational convenience. After Bayesian learning, the posterior has exactly the same functional form as the prior, except that all parameters are updated. Not every model has a conjugate prior.

Conjugate Prior

For a univariate Gaussian model with only an unknown mean:

p(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big[-\frac{(x-\mu)^2}{2\sigma^2}\Big]

the conjugate prior of the Gaussian is Gaussian, p(\mu) = N(\mu;\, \mu_0, \sigma_0^2). After observing a new data point x, the posterior will still be Gaussian:

p(\mu \mid x) = N(\mu;\, \mu_1, \sigma_1^2), \qquad \mu_1 = \frac{\sigma_0^2\, x + \sigma^2 \mu_0}{\sigma_0^2 + \sigma^2}, \qquad \sigma_1^2 = \frac{\sigma_0^2\, \sigma^2}{\sigma_0^2 + \sigma^2}
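In code, the update is a direct transcription of the two formulas (the function name and argument order are mine):

```python
def gaussian_posterior(mu0, v0, x, v):
    """Conjugate update for a Gaussian mean with known variance v:
    prior N(mu0, v0) and one observation x give posterior N(mu1, v1)."""
    mu1 = (v0 * x + v * mu0) / (v0 + v)
    v1 = (v0 * v) / (v0 + v)
    return mu1, v1
```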

The Sequential MAP Estimate of Gaussian

For a univariate Gaussian with unknown mean, the MAP estimate of its mean after observing x_1 is the posterior mean:

\hat{\mu}_1 = \frac{\sigma_0^2\, x_1 + \sigma^2 \mu_0}{\sigma_0^2 + \sigma^2}, \qquad \sigma_1^2 = \frac{\sigma_0^2\, \sigma^2}{\sigma_0^2 + \sigma^2}

After observing the next data point x_2, the posterior serves as the new prior and the same update applies:

\hat{\mu}_2 = \frac{\sigma_1^2\, x_2 + \sigma^2 \hat{\mu}_1}{\sigma_1^2 + \sigma^2}, \qquad \sigma_2^2 = \frac{\sigma_1^2\, \sigma^2}{\sigma_1^2 + \sigma^2}
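Chaining that update, with each posterior serving as the next prior, reproduces the batch posterior mean (σ²μ_0 + σ_0² Σ x_t)/(σ² + nσ_0²). A quick check, reusing gaussian_posterior from the previous sketch with illustrative numbers:

```python
mu, v = 0.0, 4.0              # prior N(0, 4); data variance assumed to be 1
for x in (1.2, 0.8, 1.1):     # illustrative observations
    mu, v = gaussian_posterior(mu, v, x, 1.0)
print(mu)                     # ~0.954, matching the batch posterior mean
```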

Project: Building a 2-Class Classifier

- Given some data from two classes, build a classifier with multivariate Gaussian models (ML estimation), and test it with the plug-in MAP decision rule.
- Improve it with GMM models: initialize the GMM with K-means clustering, estimate the GMM with the EM algorithm, and investigate GMMs with mixture numbers 2, 4, 8 (see the sketch below).
- Improve the Gaussian classifier with discriminative training (minimum classification error estimation).
- Preferably program in C/C++. Report all of your experiments and your best classifier.
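For orientation only (the project itself suggests C/C++), here is a compact NumPy sketch of the GMM steps: K-means initialization followed by EM for a diagonal-covariance GMM. Every name and constant is my own choice, and the sketch assumes no cluster or mixture component goes empty.

```python
import numpy as np

def kmeans_init(X, K, iters=20, seed=0):
    """Plain K-means to seed the GMM weights, means, and diagonal variances."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        z = np.argmin(((X[:, None] - mu[None]) ** 2).sum(-1), axis=1)
        mu = np.stack([X[z == k].mean(0) for k in range(K)])
    w = np.bincount(z, minlength=K) / len(X)
    var = np.stack([X[z == k].var(0) + 1e-6 for k in range(K)])
    return w, mu, var

def em_gmm(X, w, mu, var, iters=50):
    """EM re-estimation for a diagonal-covariance GMM."""
    for _ in range(iters):
        # E-step: responsibilities gamma[t, k] from per-component log-densities
        logp = (-0.5 * (((X[:, None] - mu[None]) ** 2) / var[None]
                        + np.log(2 * np.pi * var[None])).sum(-1)
                + np.log(w)[None])
        logp -= logp.max(1, keepdims=True)       # numerical stability
        gamma = np.exp(logp)
        gamma /= gamma.sum(1, keepdims=True)
        # M-step: re-estimate weights, means, and diagonal variances
        Nk = gamma.sum(0)
        w = Nk / len(X)
        mu = (gamma.T @ X) / Nk[:, None]
        var = (gamma.T @ (X ** 2)) / Nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var
```

Class-conditional GMMs trained this way plug directly into the MAP decision rule from the earlier slides.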