STUDY OF ATTRACTOR VARIATION IN THE RECONSTRUCTED PHASE SPACE OF SPEECH SIGNALS

Similar documents
PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS

Statistical Pattern Recognition

Vector Quantization: a Limiting Case of EM

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE

1 Review of Probability & Statistics

1 Inferential Methods for Correlation and Regression Analysis

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

REGRESSION (Physics 1210 Notes, Partial Modified Appendix A)

Optimum LMSE Discrete Transform

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

A statistical method to determine sample size to estimate characteristic value of soil parameters

Random Variables, Sampling and Estimation

RAINFALL PREDICTION BY WAVELET DECOMPOSITION

Dr. Maddah ENMG 617 EM Statistics 11/26/12. Multiple Regression (2) (Chapter 15, Hines)

Mixtures of Gaussians and the EM Algorithm

Chapter 11 Output Analysis for a Single Model. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

Assessment of extreme discharges of the Vltava River in Prague

ADVANCED SOFTWARE ENGINEERING

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

Lecture 2: Monte Carlo Simulation

Goodness-Of-Fit For The Generalized Exponential Distribution. Abstract

Expectation-Maximization Algorithm.

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Machine Learning Brett Bernstein

11 Correlation and Regression

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

Final Examination Solutions 17/6/2010

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

Basis for simulation techniques

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Chapter 2 Descriptive Statistics

Chapter 13, Part A Analysis of Variance and Experimental Design

Jacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3

Pattern Classification, Ch4 (Part 1)

EE 6885 Statistical Pattern Recognition

The target reliability and design working life

Power and Type II Error

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random

Discrete Orthogonal Moment Features Using Chebyshev Polynomials

General IxJ Contingency Tables

Basics of Probability Theory (for Theory of Computation courses)

Session 5. (1) Principal component analysis and Karhunen-Loève transformation

577. Estimation of surface roughness using high frequency vibrations

Statistical Analysis on Uncertainty for Autocorrelated Measurements and its Applications to Key Comparisons

Algebra of Least Squares

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9

A proposed discrete distribution for the statistical modeling of

Chapter 6 Principles of Data Reduction

Sensitivity Analysis of Daubechies 4 Wavelet Coefficients for Reduction of Reconstructed Image Error

R. van Zyl 1, A.J. van der Merwe 2. Quintiles International, University of the Free State

Statistical Noise Models and Diagnostics

CS284A: Representations and Algorithms in Molecular Biology

Excellent Performances of The Third-level Disturbed Chaos in The Cryptography Algorithm and The Spread Spectrum Communication

Extended Weighted Linear Prediction Using the Autocorrelation Snapshot

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

Information-based Feature Selection

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

On an Application of Bayesian Estimation

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

15-780: Graduate Artificial Intelligence. Density estimation

Topic 9: Sampling Distributions of Estimators

CHAPTER 4 BIVARIATE DISTRIBUTION EXTENSION

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

Analysis of Experimental Measurements

OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES

There is no straightforward approach for choosing the warmup period l.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Logit regression Logit regression

Non-Linear Maximum Likelihood Feature Transformation For Speech Recognition

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions

Assessment and Modeling of Forests. FR 4218 Spring Assignment 1 Solutions

Chapter 12 EM algorithms The Expectation-Maximization (EM) algorithm is a maximum likelihood method for models that have hidden variables eg. Gaussian

ANALYSIS OF EXPERIMENTAL ERRORS

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 4, Issue 10, April 2015

Direction of Arrival Estimation Method in Underdetermined Condition Zhang Youzhi a, Li Weibo b, Wang Hanli c

Access to the published version may require journal subscription. Published with permission from: Elsevier.

NYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

Topic 9: Sampling Distributions of Estimators

10-701/ Machine Learning Mid-term Exam Solution

Estimation for Complete Data

Pixel Recurrent Neural Networks

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science. BACKGROUND EXAM September 30, 2004.

[412] A TEST FOR HOMOGENEITY OF THE MARGINAL DISTRIBUTIONS IN A TWO-WAY CLASSIFICATION

Riesz-Fischer Sequences and Lower Frame Bounds

CHAPTER 10 INFINITE SEQUENCES AND SERIES

- E < p. ˆ p q ˆ E = q ˆ = 1 - p ˆ = sample proportion of x failures in a sample size of n. where. x n sample proportion. population proportion

Accuracy assessment methods and challenges

a b c d e f g h Supplementary Information

Activity 3: Length Measurements with the Four-Sided Meter Stick

( ) = p and P( i = b) = q.

Infinite Sequences and Series

Investigating the Significance of a Correlation Coefficient using Jackknife Estimates

THE SYSTEMATIC AND THE RANDOM. ERRORS - DUE TO ELEMENT TOLERANCES OF ELECTRICAL NETWORKS

Transcription:

STUDY OF ATTRACTOR VARIATION IN THE RECONSTRUCTED PHASE SPACE OF SPEECH SIGNALS Jiji Ye Departmet of Electrical ad Computer Egieerig Milwaukee, WI USA jiji.ye@mu.edu Michael T. Johso Departmet of Electrical ad Computer Egieerig Milwaukee, WI USA mike.johso@mu.edu Richard J. Povielli Departmet of Electrical ad Computer Egieerig Milwaukee, WI USA richard.povielli@mu.edu ABSTRACT This paper presets a study of the attractor variatio i the recostructed phase spaces of isolated phoemes. The approach is based o recet work i time-domai sigal classificatio usig dyamical sigal models, whereby a statistical distributio model is obtaied from the phase space ad used for maximum likelihood classificatio. Two sets of experimets are preseted i this paper. The first uses a variable time lag phase space to examie the effect of fudametal frequecy o attractor patters. The secod focuses o speaker variability through a ivestigatio of speaker-depedet phoeme classificatio across speaker sets of icreasig size. The results are discussed at the ed of the paper. 1. INTRODUCTION State of the art speech recogitio systems typically use cepstral coefficiet features, obtaied via a frame-based spectral aalysis of the speech sigal (Deller et al., 2). However, recet work i phase space recostructio techiques (Abarbael, 1996; Katz ad Schrieber, 2) for oliear modelig of time-series sigals has motivated ivestigatio ito the efficacy of usig dyamical systems models i the time-domai for speech recogitio (Ye et al., 22). I theory, recostructed phase spaces capture the full dyamics of the uderlyig system, icludig oliear iformatio ot preserved by traditioal spectral techiques, leadig to possibilities for improved recogitio accuracy. This paper is based o work supported by NSF uder Grat No. IIS-11358 The classical techique for phoeme classificatio is Hidde Markov Models (HMM) (Lee ad Ho, 1989; Youg, 1992), ofte based o Gaussia Mixture Model (GMM) observatio probabilities. The most commo features are Mel Frequecy Cepstral Coefficiets (MFCCs). I cotrast, the recostructed phase space is a plot of the time-lagged vectors of a sigal. Such phase spaces have bee show to be topologically equivalet to the origial system, if the embeddig dimesio is large eough (Sauer et al., 1991). Structural patters occur i this processig space, commoly referred to as trajectories or attractors, which ca be quatified through ivariat metrics such as correlatio dimesio or Lyapuov expoets or through direct models of the phase space distributio. Previous results o phoeme classificatio (Ye et al., 22) have show that a Bayes classifier over statistical models of the recostructed phase spaces are effective i classifyig phoemes. Phase space recostructios are ot specific to ay particular productio model of the uderlyig system, assumig oly that the dimesio of the system is fiite. We would like to be able to take advatage of our a priori kowledge about speech productio mechaisms to improve usefuless of phase space models for speech recogitio i particular. I pursuit of this, we have implemeted two sets of experimets to study attractor variatio, the first to look at fudametal frequecy effects i vowels ad the secod to look at speaker variability issues. Fudametal frequecy, as a parameter that varies sigificatly but does ot cotai iformatio about the geeratig phoeme, should clearly affect the phase space i a adverse way for classificatio. This hypothesis is examied through a compesatio techique usig variable lag recostructios. Speaker variability is

a as of yet ukow factor with regard to the amout of variace caused i uderlyig attractor characteristics, ad is a importat issue i the questio of how well this techique will work for speaker-idepedet tasks. Iitial experimets have show some sigificat discrimiability i such tasks, but performed at a measurably lower accuracy tha that for speaker-depedet tests. Each of these experimets use a oparametric distributio model based o bi couts with a maximum likelihood phoeme classifier, as described i more detail i the ext sectio. The TIMIT database is used for both tasks. 2. METHOD 2.1. Phase Space Recostructio Phase space recostructio techiques are fouded o uderlyig priciples of dyamical system theory (Sauer et al., 1991; Takes, 198) ad have bee applied to a variety of time series aalysis ad oliear sigals processig applicatios [refs xx] (Abarbael, 1996; Katz ad Schrieber, 2). Give a time series x = x, = 1... N (1) where is a time idex ad N is the umber of observatios, the vectors i a recostructed phase space are formed as x = [ x x x ], (2) ( d 1) where is the time lag ad d is the embeddig dimesio. Take as a whole, the sigal forms a trajectory matrix compiled from the time-lagged sigal vectors: X = x x x 1 1+ 1 + ( d 1) x x x 2 2+ 2 + ( d 1) x x x N ( d 1) N ( d 2) N Figure 1 shows a illustrative two-dimesioal recostructed phase space trajectory, while Figure 2 shows the same space with idividual poits oly, as modeled by the statistical approach used here. (3) Figure 1 Recostructed phase space of the vowel phoeme /aa/ illustratig trajectory Figure 2 Recostructed phase space of the vowel phoeme /aa/ illustratig desity The time lag used i the recostructed phase space is empirical but guided by some key measures such as mutual iformatio ad autocorrelatio (Abarbael, 1996; Katz ad Schrieber, 2). Usig such measures, a time lag of six is appropriate for isolated phoeme recogitio usig TIMIT. For the variable lag experimets baselie time lags of 6 ad 12 are used, as discussed i details later.

I practice, the phase-space recostructio is zeromeaed ad the amplitude variatio is radially ormalized via: x µ x X = (4) σ where σ r x N 1 x µ x N ( d 1) =+ r 1 ( d 1) is a origial poit i the phase space, µ is the sample mea of the colums of X. X 2.2. Noparametric Distributio Model of Recostructed Phase Space A statistical characterizatio of the recostructed phase space, related to the atural measure or atural distributio of the uderlyig attractor (Abarbael, 1996; Katz ad Schrieber, 2), is estimated by dividig the recostructed phase space ito 1 histogram bis as is illustrated i Figure 2. This is doe by dividig each dimesio ito te partitios such that each partitio cotais approximately 1% of the data poits. The itercepts of the bis are determied usig all the traiig data. 2.3. The Bayes Classifier The estimates of the atural distributio are used as models for a Bayes classifier. This classifier simply computes the coditioal probabilities of the differet classes give the phase space ad the selects the class with the highest coditioal probability: N ˆ ω = arg max{ pˆ ( )} pˆ i X = arg max i ( x ) (6) i= 1 C i= 1 C = 1 p x is the bi-based likelihood of a where ( ) ˆi poit i the phase space, C is the umber of phoeme classes, ad ˆω is the resultig maximum likelihood hypothesis. 3. EXPERIMENT DESIGN 3.1. Variable Lag Model for Vowels The basic idea of this experimet is to use variable time lags istead of a fixed time lag for embeddig vowel phoemes, as a fuctio of the uderlyig fudametal frequecy of the vowel. A estimate 2 (5) of the fudametal frequecy is used to determie the appropriate embeddig lag. The fudametal frequecy estimate algorithm for vowels used here is based o the computatio of autocorrelatio i the time domai as implemeted by the Etropic ESPS package (Etropic, 1993). The typical vowel fudametal frequecy rage for male speakers is 1~15Hz, with a average of about 125Hz, while the typical rage for female speakers is 175~256Hz, with a average of about 2Hz. For this experimet oly male speakers were used. I the recostructed phase space, a lower fudametal frequecy has a loger period, correspodig to a larger time lag. With a baselie time lag ad mea fudametal frequecy give as ad f respectively, we perform fudametal frequecy compesatio via the equatios f = f (7) ad f ' = (8) f where is the ew time lag ad f is the fudametal frequecy estimate of the phoeme example. This time lag is rouded ad used for phase space recostructio, for both estimatio of the phoeme distributios across the traiig set ad maximum likelihood classificatio of the test set examples. Two differet baselie time lags are used i the experimets. A time lag of 6 correspods to that chose through examiatio of the mutual iformatio ad autocorrelatio heuristics; however, roudig effects lead to quite a low resolutio o the lags i the experimet, which vary primarily betwee 5, 6, ad 7. To achieve a slightly higher resolutio, a secod set of experimets at a time lag of 12 is implemeted for compariso. Sice the fial time lags used for recostructio are give by a fudametal frequecy ratio, the value of the baselie frequecy is ot of great importace, but should be chose to be ear the mea fudametal frequecy. Baselies of 12Hz ad 13Hz were ivestigated. The fial time lag is give i accordace with equatio (8) above. The data set used here icludes 6 male speakers for traiig ad 3 differet male speakers for testig, all withi the same dialect regio.

3.2. Speaker Variability Usig the phase space recostructio techique for speaker-idepedet tasks clearly requires that the attractor patter across differet speakers is cosistet. Icosistecy of attractor structures across differet speakers would be expected to lead to smoothed ad imprecise phoeme models with resultig poor classificatio accuracy. The experimets preseted here are desiged to ivestigate the iter-speaker variatio of attractor patters. Although a umber of differet attractor distace metrics could be used for this purpose, the best such choice is ot readily apparet ad we have istead focused o classificatio accuracy as a fuctio of the umber of speakers i a closedset speaker depedet recogitio task. The higher the cosistecy of attractors across speakers, the less accuracy degradatio should be expected as the umber of speakers i the set is icreased. All speakers are male speakers selected from the same dialect regio withi the TIMIT corpus. The bi-based models ad maximum likelihood classificatio methods discussed previously are used i all cases. The oly variable is the umber of speakers for isolated phoeme classificatio tasks. To examie speaker variability effects across differet classes of phoemes, vowels, fricatives ad asals are tested separately. The overall data set is a group of 22 male speakers, from which subsets of 22, 17, 11, 6, 3, 2 or 1 speaker(s) have bee radomly selected. Classificatio experimets are performed o sets of 7 fricatives, 7 vowels, ad 5 asals. Exp 1: d = 2, = 6, =, f Exp 2: d = 2, = 6, f = 12 Hz, =, f f Exp 3: d = 2, = 6, f = 13 Hz, =, f Exp 4: d = 2, = 12, =, f Exp 5: d = 2, = 12, f = 12 Hz, = f, f Exp 6: d = 2, = 12, f = 13 Hz, = f, where d is the embeddig dimesio, is the default time lag, f is the default fudametal frequecy, f is the estimated fudametal frequecy, ad is the actual embeddig time lag. Table 1 shows the resultig rages for give the parameters, while Table 2 ad 3 show the classificatio results. f 6 12Hz 5~7 6 13Hz 5~8 12 12Hz 1~14 12 13Hz 1~16 Table 1 Rage of give ad f Exp 1 Exp 2 Exp 3 27.7% 33.78% 35.14% 4. RESULTS Table 2 Vowel phoeme classificatio results for 4.1. Variable Lag for Vowels lag of 6 The 7 vowel set { /aw/ /ay/ /ey/ /ix/ /iy/ /ow/ /oy/ } is used for these experimets. As described previously, data are selected from 6 male speakers for traiig ad 3 differet male speakers for testig, all withi the same dialect regio. There are a total of six experimets, three with a baselie lag of 6 ad three with a baselie lag of 12. I each case the tests are ru with a fixed lag as well as with variable lags based o both 12Hz ad 13Hz. The six experimets are summarized as follows: Exp 4 Exp 5 Exp 6 39.19% 35.14% 39.86% Table 3 Vowel phoeme classificatio results for lag of 12 As ca be see from the above results, the best classificatio accuracy is obtaied by usig a variable lag model with default fudametal frequecy of 13Hz.

4.2. Speaker Variability The evaluatio of speaker variability was carried out usig the leave-oe-out cross validatio. The overall classificatio accuracies for the three types of phoemes are show i Table 4, Table 5 ad Table 6. Spkr# 1 2 6 22 Acc(%) 58. 51.6 49.26 48.58 Table 4 Phoeme classificatio results of fricatives with differet umber of speakers Spkr# 1 2 6 11 17 22 Acc(%) 61.9 49.9 49.58 46.92 45.46 46.3 Table 5 Phoeme classificatio results of vowels with differet umber of speakers Spkr# 1 2 3 6 22 Acc(%) 51.79 4. 31.91 29.95 26.76 Table 6 Phoeme classificatio results of asals with differet umber of speakers Figure 3 is a visual iterpretatio of the results preseted above. The classificatio results are plotted agaist the umber of speakers for vowels, fricatives ad asals respectively. reported i the previous paper (Ye et al., 22), for a speaker-idepedet task over vowels, asals, ad fricatives. I all three phoeme types, the accuracy was relatively uchaged after 2 or 3 speakers. 5. WORK IN PROGRESS Because of the large differece betwee the baselie umbers i the variable lag experimets, we will verify ad ivestigate them by ruig other experimets. The experimets preseted are performed o a speaker-idepedet dataset, which could affect the results due to speaker variability. The additioal ivestigatio o the effect of fudametal frequecy o attractor patters will be icluded i the fial paper. 6. DISCUSSION AND CONCLUSIONS We have examied attractor variatio i the recostructed phase space of isolated phoemes. The variable lag experimets show a improvemet whe the phase space icorporates fudametal frequecy compesatio. The speaker variability experimets idicate that there is a sigificat amout of cosistecy across attractor patters betwee speakers, which shows that the recostructed phase space represetatio of speech sigal has the discrimiative power for the speaker-idepedet tasks. Future work would iclude discoverig the method to compesate speaker variability for the phase space recostructio techique o speech recogitio applicatios. REFERENCES Figure 3 The classificatio accuracy vs. umber of speakers As ca be see from Figure 3, the degree of attractor variatio across speakers is differet for these three types of phoemes. Nasals appear to have the largest variability while the fricatives have the least, which is cosistet with the results Abarbael, H.D.I., 1996. Aalysis of observed chaotic data. Spriger, New York, xiv, 272 pp. Deller, J.R., Hase, J.H.L. ad Proakis, J.G., 2. Discrete-Time Processig of Speech Sigals. IEEE Press, New York, 98 pp. Etropic, 1993. ESPS Programs Maual. Etropic Research laboratory. Katz, H. ad Schrieber, T., 2. Noliear Time Series Aalysis. Cambridge Noliear Sciece Series 7. Cambridge Uiversity Press, New York, NY, 34 pp. Lee, K.-F. ad Ho, H.-W., 1989. Speaker-idepedet phoe recogitio usig hidde Markov models.

IEEE Trasactios o Acoustics, Speech ad Sigal Processig, 37(11): 1641-1648. Sauer, T., Yorke, J.A. ad Casdagli, M., 1991. Embedology. Joural of Statistical Physics, 65(3/4): 579-616. Takes, F., 198. Detectig strage attractors i turbulece. Dyamical Systems ad Turbulece, 898: 366-381. Ye, J., Povielli, R.J. ad Johso, M.T., 22. Phoeme Classificatio Usig Naive Bayes Classifier I Recostructed Phase Space. 1th Digital Sigal Processig Workshop. Youg, S., 1992. The geeral use of tyig i phoemebased HMM speech recogitio. ICASSP, 1: 569-572.