EE513 Audio Signals and Systems: Statistical Pattern Classification. Kevin D. Donohue, Electrical and Computer Engineering, University of Kentucky.



Interpretation of Auditory Scenes
Human perception and cognition greatly exceed any computer-based system for abstracting sounds into objects and creating meaningful auditory scenes. This perception of objects, not just detection of acoustic energy, allows for interpretation of situations, leading to an appropriate response or further analyses. Sensory organs (ears) separate acoustic energy into frequency bands and convert band energy into neural firings. The auditory cortex receives the neural responses and abstracts an auditory scene.

Auditory Scene
Perception derives a useful representation of reality from sensory input. An auditory stream refers to a perceptual unit associated with a single happening (A. S. Bregman, 1990). Acoustic-to-neural conversion → organize into auditory streams → representation of reality.

Computer Interpretation
In order for a computer algorithm to interpret a scene:
- Acoustic signals must be converted to numbers using meaningful models.
- Sets of numbers, or patterns, are mapped into events (perceptions).
- Events are analyzed with other events in relation to the goal of the algorithm and mapped into a situation (cognition, or deriving meaning).
- The situation is mapped into an action/response.
Numbers extracted from the acoustic signal for the purpose of classification (determination of an event) are referred to as features. Time-based features are extracted from signal transforms such as the envelope and correlations. Frequency-based features are extracted from signal transforms such as the spectrum, cepstrum, and power spectral density.
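As a rough illustration of these transforms, the sketch below computes them for a mono signal with NumPy/SciPy; the signal x and sampling rate fs are placeholders, not anything from the lecture.

```python
# Minimal sketch: time- and frequency-based feature transforms for a 1-D
# signal. `x` and `fs` are illustrative placeholders, not lecture data.
import numpy as np
from scipy.signal import hilbert, welch

def feature_transforms(x, fs):
    envelope = np.abs(hilbert(x))                      # time-based: envelope
    acorr = np.correlate(x, x, mode="full")            # time-based: correlation
    spectrum = np.abs(np.fft.rfft(x))                  # frequency: spectrum
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-12))  # frequency: real cepstrum
    freqs, psd = welch(x, fs=fs)                       # frequency: PSD (Welch)
    return envelope, acorr, spectrum, cepstrum, freqs, psd
```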

Feature Selection Example
Consider a problem of discriminating between the spoken words yes and no based on 2 features:
1. The estimate of the first formant frequency (resonance of the spectral envelope), g1.
2. The ratio in dB of the amplitude of the second formant frequency over that of the third formant frequency, g2.
A fictitious experiment was performed and these features were computed for 25 recordings of people saying these words. The features were plotted for each class to develop an algorithm to classify these samples correctly.

Feature Plot
Define a feature vector G = [g1, g2]^T. Plot G with green o's given a yes was spoken, and with red x's given a no was spoken.

Minimum Distance Approach
Create representative (mean) vectors for the yes and no features:

$$\boldsymbol{\mu}_{yes} = \frac{1}{25}\sum_{n=1}^{25}\mathbf{G}_n^{yes} \qquad \boldsymbol{\mu}_{no} = \frac{1}{25}\sum_{n=1}^{25}\mathbf{G}_n^{no}$$

For a new sample with estimated features G, use the decision rule:

$$\left\|\mathbf{G}-\boldsymbol{\mu}_{no}\right\| \;\underset{no}{\overset{yes}{\gtrless}}\; \left\|\mathbf{G}-\boldsymbol{\mu}_{yes}\right\|$$

This results in 3 incorrect decisions.
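A minimal sketch of this nearest-mean rule follows; since the 25 recordings from the slide are not available, the data below are synthetic stand-ins with invented means and spreads.

```python
# Nearest-mean (minimum distance) classifier for the yes/no example.
# The training data here are synthetic placeholders, not the lecture's data.
import numpy as np

rng = np.random.default_rng(0)
G_yes = rng.normal([500.0, 3.0], [60.0, 1.0], size=(25, 2))  # fictitious "yes"
G_no = rng.normal([700.0, 1.0], [60.0, 1.0], size=(25, 2))   # fictitious "no"

mu_yes = G_yes.mean(axis=0)  # representative vector for "yes"
mu_no = G_no.mean(axis=0)    # representative vector for "no"

def classify(G):
    # Decide the word whose mean vector is closest in Euclidean distance.
    return "yes" if np.linalg.norm(G - mu_yes) < np.linalg.norm(G - mu_no) else "no"

print(classify(np.array([520.0, 2.5])))
```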

Normalization With SD
The frequency features had larger values than the amplitude ratios, and therefore had more influence in the decision process. Remove scale differences by normalizing each feature by its standard deviation over all classes:

$$\hat{\sigma}^2 = \frac{1}{2}\left[\frac{1}{25}\sum_{n=1}^{25}\left(g_n^{yes}-\mu^{yes}\right)^2 + \frac{1}{25}\sum_{n=1}^{25}\left(g_n^{no}-\mu^{no}\right)^2\right]$$

Now 4 errors result. Why would it change?
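Continuing the synthetic sketch above, each feature is divided by its standard deviation pooled over both classes before the distance is computed (again only an illustration of the idea):

```python
# Normalize each feature by its pooled standard deviation before computing
# distances (continues the synthetic yes/no example above).
var_pooled = 0.5 * (((G_yes - mu_yes) ** 2).mean(axis=0)
                    + ((G_no - mu_no) ** 2).mean(axis=0))
sigma = np.sqrt(var_pooled)  # per-feature standard deviation over all classes

def classify_normalized(G):
    d_yes = np.linalg.norm((G - mu_yes) / sigma)
    d_no = np.linalg.norm((G - mu_no) / sigma)
    return "yes" if d_yes < d_no else "no"
```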

Minimum Distance Classifier
Consider a feature vector x with the potential to be classified as belonging to one of K exclusive classes. The classification decision will be based on the distance of the feature vector to one of the template vectors representing each of the K classes. The decision rule is: for a given observation x and a set of template vectors z_k, one for each class, decide on class $\hat{k}$ such that:

$$\hat{k} = \arg\min_{k}\left[D_k\right], \qquad D_k = \left(\mathbf{x}-\mathbf{z}_k\right)^T\left(\mathbf{x}-\mathbf{z}_k\right)$$

Minimum Distance Classifier
If some features need to be weighted more than others in the decision process, as well as exploiting correlation between the features, the distance for each feature can be weighted to give the weighted minimum distance classifier:

$$\hat{k} = \arg\min_{k}\left[D_k\right], \qquad D_k = \left(\mathbf{x}-\mathbf{z}_k\right)^T\mathbf{W}\left(\mathbf{x}-\mathbf{z}_k\right)$$

where W is a square matrix of weights with dimension equal to the length of x. If W is a diagonal matrix, it simply scales each of the features in the decision process; off-diagonal terms scale the correlation between features. If W is the inverse of the covariance matrix of the features in x, and z_k is the mean feature vector for each class, then the above distances are referred to as the Mahalanobis distance:

$$\mathbf{z}_k = E\left[\mathbf{x}\mid\omega_k\right], \qquad \mathbf{W} = \left(E\left[\left(\mathbf{x}-\mathbf{z}_k\right)\left(\mathbf{x}-\mathbf{z}_k\right)^T\mid\omega_k\right]\right)^{-1}$$
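A sketch of the Mahalanobis version on the synthetic yes/no data above, with W taken as the inverse of a pooled covariance estimate (one reasonable choice; the slide leaves the estimator unspecified):

```python
# Weighted minimum distance with W = inverse covariance (Mahalanobis),
# continuing the synthetic yes/no example.
centered = np.vstack([G_yes - mu_yes, G_no - mu_no])
W = np.linalg.inv(np.cov(centered.T))  # inverse of pooled covariance estimate

def mahalanobis_sq(G, mu):
    d = G - mu
    return d @ W @ d  # (x - z)^T W (x - z)

def classify_weighted(G):
    return "yes" if mahalanobis_sq(G, mu_yes) < mahalanobis_sq(G, mu_no) else "no"
```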

Correlation Receiver
It can be shown that selecting the class based on the minimum distance between the observation vector and the template vector is equivalent to finding the maximum correlation between the observation vector and the template:

$$\arg\min_{k}\left[D_k = \left(\mathbf{x}-\mathbf{z}_k\right)^T\left(\mathbf{x}-\mathbf{z}_k\right)\right] = \arg\max_{k}\left[C_k = \mathbf{x}^T\mathbf{z}_k\right]$$

or

$$\arg\min_{k}\left[D_k = \left(\mathbf{x}-\mathbf{z}_k\right)^T\mathbf{W}\left(\mathbf{x}-\mathbf{z}_k\right)\right] = \arg\max_{k}\left[C_k = \mathbf{x}^T\mathbf{W}\mathbf{z}_k\right]$$

where the template vectors have been normalized such that $\mathbf{z}_k^T\mathbf{z}_k = P$, with P a constant for all k.
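The equivalence follows because $\|\mathbf{x}-\mathbf{z}_k\|^2 = \|\mathbf{x}\|^2 - 2\mathbf{x}^T\mathbf{z}_k + P$, so only the correlation term varies with k. A quick numeric check with two equal-energy templates (made-up values):

```python
# Check: with equal-energy templates, minimum distance and maximum
# correlation pick the same class. Values are made up for illustration.
import numpy as np

z = np.array([[1.0, 2.0], [2.0, -1.0]])  # two templates, equal norm sqrt(5)
x = np.array([0.9, 1.8])                 # observation

dists = [(x - zk) @ (x - zk) for zk in z]  # D_k
corrs = [x @ zk for zk in z]               # C_k
assert np.argmin(dists) == np.argmax(corrs)
```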

Definitions
A random variable (RV) is a function that maps events (sets) into a discrete set of real numbers for a discrete RV, or a continuous set of real numbers for a continuous RV. A random process (RP) is a series of RVs indexed by a countable set for a discrete RP, or by a non-countable set for a continuous RP.

Definitions: PDF (First Order)
The likelihood of RV values is described through the probability density function (pdf):

$$\Pr\left[x_a < x \le x_b\right] = \int_{x_a}^{x_b} p_X(x)\,dx, \qquad p_X(x) \ge 0 \quad\text{and}\quad \int_{-\infty}^{\infty} p_X(x)\,dx = 1$$

Definitions: Joint PDF
The probabilities describing more than one RV are described by a joint pdf:

$$\Pr\left[x_a < x \le x_b,\; y_a < y \le y_b\right] = \int_{y_a}^{y_b}\int_{x_a}^{x_b} p_{XY}(x,y)\,dx\,dy$$

$$p_{XY}(x,y) \ge 0 \quad\text{and}\quad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p_{XY}(x,y)\,dx\,dy = 1$$

Definitions: Conditional PDF
The probabilities describing an RV, given that another event has already occurred, are described by a conditional pdf. Closely related to this is Bayes' rule:

$$p_{X\mid Y}(x \mid y) = \frac{p_{XY}(x,y)}{p_Y(y)}, \qquad p_{X\mid Y}(x \mid y) = \frac{p_{Y\mid X}(y \mid x)\,p_X(x)}{p_Y(y)}$$

Examples: Gaussian PDF
A first-order Gaussian RV pdf (scalar x) with mean µ and standard deviation σ is given by:

$$p_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

A higher-order joint Gaussian pdf (column vector x = [x_1, x_2, …, x_N]^T) with mean vector m and covariance matrix C is given by:

$$p_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{N/2}\left|\mathbf{C}\right|^{1/2}}\exp\!\left(-\frac{1}{2}\left(\mathbf{x}-\mathbf{m}\right)^T\mathbf{C}^{-1}\left(\mathbf{x}-\mathbf{m}\right)\right)$$

$$\mathbf{m} = E[\mathbf{x}], \qquad \mathbf{C} = E\left[\left(\mathbf{x}-\mathbf{m}\right)\left(\mathbf{x}-\mathbf{m}\right)^T\right]$$
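A direct transcription of the vector formula into code (a sketch; the standard bivariate case gives 1/(2π) ≈ 0.159 at the origin, a convenient sanity check):

```python
# Evaluate the N-dimensional Gaussian pdf from the formula above.
import numpy as np

def gaussian_pdf(x, m, C):
    N = len(m)
    d = x - m
    quad = d @ np.linalg.solve(C, d)  # (x - m)^T C^{-1} (x - m)
    norm = np.sqrt((2.0 * np.pi) ** N * np.linalg.det(C))
    return np.exp(-0.5 * quad) / norm

print(gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2)))  # 1/(2*pi) ~ 0.1592
```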

Example: Uncorrelated
Prove that for an Nth-order sequence of uncorrelated, zero-mean Gaussian RVs (assume variances are equal for all elements) the joint PDF can be written as:

$$p_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\left(2\pi\sigma^2\right)^{N/2}}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=1}^{N}x_n^2\right)$$

Note that for Gaussian RVs, uncorrelated implies statistical independence. What would the autocorrelation of this sequence look like? How would the above analysis change if the RVs were not zero mean?
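For intuition on the autocorrelation question, a quick simulation (illustrative, not part of the proof): a white zero-mean Gaussian sequence should give roughly σ² at lag 0 and approximately zero at other lags.

```python
# Empirical autocorrelation of a zero-mean uncorrelated Gaussian sequence.
import numpy as np

rng = np.random.default_rng(1)
sigma, N = 2.0, 100_000
x = rng.normal(0.0, sigma, N)

r0 = np.mean(x * x)            # lag 0: approx sigma**2 = 4
r1 = np.mean(x[:-1] * x[1:])   # lag 1: approx 0
print(r0, r1)
```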

Class PDFs
When features are modeled as RVs, their pdfs can be used to derive distance measures for the classifier, and an optimal decision rule that minimizes classification error can be designed. Consider K classes, individually denoted by ω_k. Feature values x associated with each class can be described by:
- the a posteriori probability p(ω_k | x): likelihood of the class after the observation/data;
- the a priori probability p(ω_k): likelihood of the class before the observation/data;
- the likelihood function p(x | ω_k): likelihood of the observation/data given a class.

Class PDFs
The likelihood function can be estimated through empirical studies. Consider 3 speakers whose 3rd formant frequency is distributed by:

[Figure: likelihood functions for the 3 speakers, with the decision thresholds marked.]

Classifier probabilities can be obtained from Bayes' rule.

Maximum A Posteriori Decision Rule
For K classes and observed feature vector x, the maximum a posteriori (MAP) decision rule states:

Decide ω_i if p(ω_i | x) > p(ω_j | x) for all j ≠ i

or, by applying Bayes' rule:

Decide ω_i if p(x | ω_i) p(ω_i) > p(x | ω_j) p(ω_j) for all j ≠ i

For the binary case this reduces to the log-likelihood ratio test:

$$\ln p(\mathbf{x}\mid\omega_i) - \ln p(\mathbf{x}\mid\omega_j) \;\underset{\omega_j}{\overset{\omega_i}{\gtrless}}\; \ln\frac{p(\omega_j)}{p(\omega_i)}$$
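A minimal sketch of the binary rule for scalar Gaussian likelihoods; the means, standard deviations, and priors below are invented for illustration:

```python
# Binary MAP decision via the log-likelihood ratio (illustrative parameters).
import numpy as np

mu1, s1, P1 = 0.0, 1.0, 0.6  # class 1: mean, std, prior
mu2, s2, P2 = 2.0, 1.0, 0.4  # class 2: mean, std, prior

def log_lik(x, mu, s):
    return -0.5 * ((x - mu) / s) ** 2 - np.log(s * np.sqrt(2.0 * np.pi))

def map_decide(x):
    llr = log_lik(x, mu1, s1) - log_lik(x, mu2, s2)
    return 1 if llr > np.log(P2 / P1) else 2  # decide class 1 if LLR > ln(P2/P1)

print(map_decide(0.8), map_decide(1.5))  # -> 1 2 for these parameters
```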

Example
Consider a 2-class problem with Gaussian-distributed feature vectors x = [x_1, x_2, …, x_N]^T, where for each class k:

$$\mathbf{m}_k = E\left[\mathbf{x}\mid\omega_k\right], \qquad \mathbf{C}_k = E\left[\left(\mathbf{x}-\mathbf{m}_k\right)\left(\mathbf{x}-\mathbf{m}_k\right)^T\mid\omega_k\right]$$

Derive the log-likelihood ratio and describe how the classifier uses distance information to discriminate between the classes.

Homework
Consider 2 features for use in a binary classification problem. The features are Gaussian distributed, with means and covariances given for the feature vector x = [x_1, x_2]^T. Derive the log-likelihood ratio and corresponding classifier for the 3 different cases listed below. Comment on how each classifier computes distance and uses it in the classification process.

[Three cases of class priors, mean vectors m_1, m_2, and covariance matrices C_1, C_2; the numeric values did not survive transcription.]

Classification Error
Classification error is the percentage of decision statistics that occur on the wrong side of the threshold, scaled by the percentage of times such an event occurs. For the 3-speaker example, with decision statistic λ and thresholds λ_1 and λ_2:

$$P_e = p(\omega_1)\int_{\lambda_1}^{\infty} p(\lambda\mid\omega_1)\,d\lambda + p(\omega_2)\int_{-\infty}^{\lambda_1} p(\lambda\mid\omega_2)\,d\lambda + p(\omega_2)\int_{\lambda_2}^{\infty} p(\lambda\mid\omega_2)\,d\lambda + p(\omega_3)\int_{-\infty}^{\lambda_2} p(\lambda\mid\omega_3)\,d\lambda$$
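As a numeric illustration of these error integrals in the simplest two-class case (one threshold, Gaussian densities; all values invented), the Gaussian CDF evaluates each term directly:

```python
# Two-class error probability with a single threshold (illustrative values).
from scipy.stats import norm

mu1, mu2, s = 0.0, 2.0, 1.0   # class-conditional Gaussian parameters
P1, P2 = 0.5, 0.5             # priors
thr = 1.0                     # midpoint threshold for equal priors/variances

# P_e = P1 * Pr(lambda > thr | class 1) + P2 * Pr(lambda < thr | class 2)
Pe = P1 * norm.sf(thr, loc=mu1, scale=s) + P2 * norm.cdf(thr, loc=mu2, scale=s)
print(Pe)  # ~0.1587 for these values
```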

Homework
For the previous example, write an expression for the probability of a correct classification by changing the integrals and limits (i.e., do not simply write P_c = 1 − P_e).

Approximating a Bayes Classifier
If density functions are not known:
- Determine template vectors that minimize distances to feature vectors in each class for training data (vector quantization).
- Assume a form for the density function and estimate its parameters directly or iteratively from the data (parametric methods or expectation maximization).
- Learn posterior probabilities directly from training data and interpolate on test data (neural networks).
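As a sketch of the first option, here is a tiny k-means style vector quantizer that picks template vectors minimizing within-class distances (a generic formulation, not a specific algorithm from the lecture):

```python
# Minimal k-means style vector quantization for template selection.
import numpy as np

def vector_quantize(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest template.
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each template to the mean of its assigned vectors.
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers, labels
```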