A New Statistic Feature of the Short-Time Amplitude Spectrum Values for Human's Unvoiced Pronunciation


XIAODONG ZHUANG
Qingdao University, Electronics & Information College, Qingdao, 266071, CHINA

Abstract: - In this paper, a new statistic feature of the discrete short-time amplitude spectrum is discovered by experiments on signals of unvoiced pronunciation. For the randomly varying short-time spectrum, this feature reveals the relationship between the amplitude's average and its standard deviation for every frequency component. The association between the amplitude distributions of different frequency components is also studied. A new model representing this association is inspired by the normalized histogram of amplitude. By mathematical analysis, the newly discovered statistic feature is shown to be necessary evidence supporting the proposed model, and it can also serve as direct evidence for the widely used hypothesis that the amplitude has an identical distribution type for all frequencies.

Key-Words: - unvoiced pronunciation, short-time spectrum, amplitude distribution, statistic analysis

1 Introduction
A speech signal can be modelled mathematically as a stochastic process. Speech features are random and time-varying in both the time domain and transformed domains such as the short-time spectrum [1,2]. The statistic feature of the speech signal is one of the important research topics. In the frequency domain, the short-time amplitude spectrum values can be treated mathematically as random variables, and there have been studies estimating their probability distribution, which facilitates applications such as speech enhancement [3,4]. Such studies are based on the large amount of speech data in corpora like TIMIT, or on other databases of daily speech signals from the internet [2,5]. However, these studies are based on words or sentences spoken in daily-life communication, which are a mixture of various pronunciation types including vowels, consonants, plosives, etc.
Based on such corpora, the estimated statistic feature is in fact the overall feature of a signal that mixes different pronunciation types. Therefore, it is necessary to further study the statistic feature of each specific pronunciation type (or specific phoneme) alone, because different types have different pronunciation mechanisms. Unvoiced pronunciation is one of the major pronunciation types, and it is closely related to the aerodynamic process in the vocal tract [6-8]. The physical process of unvoiced pronunciation is complicated, but a statistical study of its signal may reveal some of its underlying properties.

In this paper, the statistic study is carried out in the frequency domain for unvoiced pronunciation. A novel statistical feature named the consistent standard deviation coefficient is discovered for short-time amplitude spectrum data; it is revealed by the statistic study of stable and sustained signals of unvoiced pronunciation. Moreover, the relationship between the amplitude probability distributions of two different frequency components is investigated, and a new model representing this relationship is proposed. The validity of the new model is supported by mathematical analysis, with the discovered statistic feature as direct evidence. The model has potential applications such as speech synthesis.

2 New Statistic Feature in Frequency Domain for Unvoiced Pronunciation
In order to obtain sufficient data for the statistic study, the signals used in this study are stable and sustained pronunciations. For each unvoiced phoneme studied, its signal is recorded, and each signal is studied alone. For each signal, the short-time Fourier transform (STFT) is used to gather sufficient spectrum data for the statistic study. Since the STFT used is in discrete form, the spectrum has a finite number of discrete components, and the statistic study is eventually performed for each frequency component individually.

E-ISSN: 2224-3488, Volume 12, 2016
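The data-gathering step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the author's implementation: the function name `stft_amplitude`, the non-overlapping hop size, the 16 kHz rate, and the white-noise test signal are assumptions made for the example, while the frame length of 512 samples and the Hamming window follow the settings stated in the text.

```python
import numpy as np

def stft_amplitude(signal, frame_len=512, hop=512):
    """Short-time amplitude spectrum, shape (n_frames, frame_len // 2 + 1).

    frame_len=512 and the Hamming window follow the settings given in the text;
    hop=512 (no overlap between frames) is an assumption made for this sketch.
    """
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # rfft keeps the non-negative frequencies of each windowed real-valued frame
    return np.abs(np.fft.rfft(frames * window, axis=1))

rng = np.random.default_rng(0)
fs = 16000                            # assumed sampling rate in Hz
noise = rng.standard_normal(2 * fs)   # synthetic stand-in, not a recorded phoneme
amp = stft_amplitude(noise)
print(amp.shape)                      # (number of frames, number of frequency bins)
```

Each column of `amp` then holds the random amplitude samples of one frequency component, which is the form of data the per-frequency statistics are computed from.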

Since there is currently little corpus of sustained phoneme pronunciation, the signals were captured using microphones connected to the sound cards of computers. The signals were recorded at a sampling frequency of 16 kHz, with 16 bits per sample. To guarantee the generality of the experimental results, signals were captured for a group of unvoiced pronunciations spoken by different speakers, and on different recording platforms (different microphones and sound cards on different computers). During collection, the speakers were informed of the requirement of stable pronunciation over a sufficient time length, which is necessary for a reliable statistic study. For each unvoiced phoneme, the stability of pronunciation largely determines the effectiveness of further analysis; therefore each signal was captured several times, and the most stable recording was selected.

In the STFT of each signal, the frame length is set to 512, which corresponds to a time interval of 32 ms at the 16 kHz sampling frequency. A Hamming window is applied to each frame. Let $\omega_k$ denote the $k$-th frequency component in the STFT. Due to the randomness of the signal, the amplitude of $\omega_k$ also varies randomly from frame to frame. Let $a(\omega_k)$ and $\sigma^2(\omega_k)$ represent the estimated average and variance of the amplitude of $\omega_k$ respectively. The estimated standard deviation $\sigma(\omega_k)$ is the square root of $\sigma^2(\omega_k)$. Mathematically, $a(\omega_k)$ and $\sigma(\omega_k)$ are two functions of frequency, and their curves can be drawn after $a$ and $\sigma$ are estimated for each frequency $\omega_k$.

These basic statistics were estimated for a dozen unvoiced phonemes. Some typical results are shown in Fig. 1 and Fig. 2 as the curves of $a(\omega_k)$ and $\sigma(\omega_k)$. There is a clear similarity between the curves of $a(\omega_k)$ and $\sigma(\omega_k)$. The same similarity exists in all the other results for unvoiced pronunciation in the experiments, which motivates the following study of the relationship between the two functions $a(\omega_k)$ and $\sigma(\omega_k)$.

Fig. 1. The estimated expectation and standard deviation of the short-time amplitude spectrum for [h]: (a) amplitude expectation $a(\omega_k)$; (b) amplitude standard deviation $\sigma(\omega_k)$.

Fig. 2. The estimated expectation and standard deviation of the short-time amplitude spectrum for unvoiced [e]: (a) amplitude expectation $a(\omega_k)$; (b) amplitude standard deviation $\sigma(\omega_k)$.

Besides the above experimental results, the relationship between $a(\omega_k)$ and $\sigma(\omega_k)$ is quantitatively verified by calculating the correlation coefficient between the two curves of $a(\omega_k)$ and $\sigma(\omega_k)$. The correlation coefficient is calculated in discrete form:

$$\rho_{a\sigma} = \frac{\frac{1}{N}\sum_{k=1}^{N}\sigma(\omega_k)\,a(\omega_k)}{\sqrt{\frac{1}{N}\sum_{k=1}^{N}\sigma^2(\omega_k)}\;\sqrt{\frac{1}{N}\sum_{k=1}^{N}a^2(\omega_k)}} \tag{1}$$

where $N$ is the number of discrete frequencies in the discrete spectrum. Some of the experimental results are shown in Table 1, which are based on the pronunciation signals recorded for one male speaker. The correlation coefficients between $a(\omega_k)$ and $\sigma(\omega_k)$ are calculated for different unvoiced phonemes, and they are very close to 1.0. Considering the unavoidable error caused by the instability of sustained natural pronunciation, and also the noise introduced in the signal capture process, the results indicate that $a(\omega_k)$ and $\sigma(\omega_k)$ are strongly related by a linear proportional relationship, which is a new statistic feature discovered for human's unvoiced pronunciation.

Table 1. The correlation coefficient of $a(\omega_k)$ and $\sigma(\omega_k)$ for unvoiced pronunciation

Pronunciation            ρ between a(ω_k) and σ(ω_k)    Number of signal frames
[s] (male)               .991                           35748
[θ] (male)               .985                           816
[f] (male)               .9948                          4179
[h] (male)               .998                           199
unvoiced [a] (male)      .996                           17497
unvoiced [ə] (male)      .9817                          45336
unvoiced [e] (male)      .9913                          4187
unvoiced [i] (male)      .9896                          44147

Because the standard deviation coefficient is the ratio of $\sigma$ to $a$, the above statistic feature is named the feature of consistent standard deviation coefficient. In other words, for the pronunciation of an unvoiced phoneme, the proportional coefficient between the standard deviation and the expectation is the same for all frequency components in the short-time amplitude spectrum. This feature can also be expressed by:

$$\sigma(\omega_k) = c_s \cdot a(\omega_k) \tag{2}$$

where $c_s$ is the consistent standard deviation coefficient of amplitude for all frequency components. The subscript $s$ means that Equation (2) holds for one signal of unvoiced pronunciation. If the signal is changed to that of a different unvoiced pronunciation, the value of $c_s$ may also change.
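As a rough illustration of how Equations (1) and (2) could be evaluated, the sketch below estimates the per-frequency average and standard deviation from a matrix of amplitude samples and then computes the correlation coefficient and a single proportionality coefficient. Several things here are assumptions rather than the paper's procedure: the data are synthetic (a hypothetical spectral envelope multiplied by Rayleigh-distributed samples), the correlation is taken in the uncentered form with no mean removal (a mean-removed Pearson version could be substituted), and $c_s$ is estimated as a least-squares slope through the origin, which the text does not specify.

```python
import numpy as np

def uncentered_correlation(a, sigma):
    """Correlation between two curves without mean removal (assumed form of Eq. (1))."""
    return np.mean(a * sigma) / np.sqrt(np.mean(a ** 2) * np.mean(sigma ** 2))

def std_coefficient(a, sigma):
    """Least-squares slope through the origin for sigma = c_s * a (an assumed estimator)."""
    return np.sum(a * sigma) / np.sum(a ** 2)

rng = np.random.default_rng(1)
n_frames, n_bins = 5000, 257
# Hypothetical spectral envelope: a different average amplitude for every frequency bin
envelope = 1.0 + 4.0 * np.exp(-np.linspace(0.0, 5.0, n_bins))
# Synthetic amplitude data, shape (n_frames, n_bins); not measured speech
amp = envelope * rng.rayleigh(scale=1.0, size=(n_frames, n_bins))

a = amp.mean(axis=0)       # estimated average a(omega_k) for each bin
sigma = amp.std(axis=0)    # estimated standard deviation sigma(omega_k) for each bin

rho = uncentered_correlation(a, sigma)
c_s = std_coefficient(a, sigma)
print(rho, c_s)
```

For this synthetic data every bin shares the same distribution shape, so `rho` comes out close to 1 and `c_s` close to the Rayleigh value of about 0.52; real recordings would be expected to deviate more because of pronunciation instability and capture noise.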
Because the expectation and the standard deviation are two basic statistics of a random variable, the feature of consistent standard deviation coefficient indicates that there is a certain association between the amplitude probability distributions of different frequency components, which is studied in the next section.

3 The Relationship between Amplitude Probability Distributions of Different Frequency Components
Based on the spectrum data obtained by STFT, the histogram of amplitude for each frequency component $\omega_k$ is computed. The histogram reflects the distribution of the random amplitude data for each $\omega_k$, which is closely related to the amplitude probability distribution. Therefore, the amplitude histogram of each $\omega_k$ is compared to those of the other frequencies, in order to study the relationship between the corresponding probability distributions.

In addition, in order to study the amplitude distribution type of different $\omega_k$ without the influence of their different average values, the normalized histogram is also computed for each $\omega_k$. The normalization is with respect to the average of amplitude. First, the average amplitude for $\omega_k$ is computed. Then each amplitude value of $\omega_k$ is divided by that average as a preprocessing step. The normalized histogram is computed from the data after this preprocessing.

For a dozen unvoiced phonemes, both the original histogram and the normalized histogram of amplitude are computed for comparison. Two typical results are shown in Fig. 3 and Fig. 4. In order to find clues to the relationship between the amplitude distributions of different $\omega_k$, the histogram curves of every $\omega_k$ are plotted together as a family of curves.
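The normalization procedure just described (divide each frequency component's amplitude samples by their own average, then histogram) might be sketched as follows. The function name `normalized_histograms`, the number of histogram bins, the range limit `max_ratio`, and the synthetic Rayleigh data are all assumptions made for illustration; the paper does not state its histogram settings.

```python
import numpy as np

def normalized_histograms(amp, bins=50, max_ratio=4.0):
    """Mean-normalized amplitude histograms, one per frequency component.

    amp: array of shape (n_frames, n_bins) holding amplitude samples.
    Returns (centers, hists) with hists of shape (n_bins, bins).
    """
    edges = np.linspace(0.0, max_ratio, bins + 1)   # shared edges so curves are comparable
    normalized = amp / amp.mean(axis=0)             # divide each column by its own average
    hists = np.stack([
        np.histogram(normalized[:, k], bins=edges, density=True)[0]
        for k in range(amp.shape[1])
    ])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hists

rng = np.random.default_rng(2)
envelope = np.linspace(1.0, 5.0, 32)                # hypothetical per-frequency averages
amp = envelope * rng.rayleigh(scale=1.0, size=(4000, 32))   # synthetic stand-in data

centers, hists = normalized_histograms(amp)
# Largest disagreement between the curves at any histogram bin
spread = hists.std(axis=0).max()
print(hists.shape, spread)
```

Plotting every row of `hists` against `centers` gives the family of curves; a small `spread` corresponds to the curves forming a narrow belt around one central curve.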

Fig. 3. The amplitude histogram of each frequency $\omega_k$ for [h]: (a) the amplitude histograms before amplitude normalization; (b) the amplitude histograms after amplitude normalization.

Fig. 4. The amplitude histogram of each frequency $\omega_k$ for unvoiced [e]: (a) the amplitude histograms before amplitude normalization; (b) the amplitude histograms after amplitude normalization.

In Fig. 3(a) and Fig. 4(a), the original histogram curves are mixed and there is no obvious regularity among them. However, in Fig. 3(b) and Fig. 4(b), the normalized histogram curves clearly converge to one central curve (shown in black), especially when compared to panel (a) of these figures. Because the normalized histogram curves converge closely, plotting them together results in a belt around a central curve. Similar results are obtained for the other unvoiced phonemes. The results indicate a strong association between the amplitude distributions of different $\omega_k$.

Based on the above results, a new model of amplitude distribution in the frequency domain is proposed for human's unvoiced pronunciation. In the model, for the signal of a given unvoiced pronunciation, the amplitude distributions for different $\omega_k$ are of the same type, but with different expectation (or average) values. In other words, there is a prototype distribution function $p_0(x)$, from which the amplitude distribution of any $\omega_k$ can be derived by varying the expectation. The prototype $p_0(x)$ corresponds to the central curve (in black) in Fig. 3(b) or Fig. 4(b). This model can also be described mathematically as follows. As a random variable, the amplitude $X$ of some $\omega_k$ is modeled as the scaling of a prototype random variable $X_0$, whose expectation is 1:

$$X = a \cdot X_0 \tag{3}$$

where $a$ is the scaling parameter. Equation (3) is a mathematical description of the proposed model. In the model, $X_0$ is the same for each frequency component, but the scaling parameter $a$ may be different for different $\omega_k$. Besides the normalized amplitude histograms, which are the direct inspiration of the model, the model is also supported by the newly discovered statistic feature in Section 2.
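A small simulation can make the scaling model of Equation (3) concrete. The sketch below draws samples from a hypothetical unit-mean prototype and scales them by several hypothetical values of the scaling parameter; the gamma prototype, the chosen scale values, and the sample size are assumptions for illustration only, since the paper leaves the specific prototype pdf to future work.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_prototype(size):
    """Hypothetical prototype X0 with unit mean: Gamma(shape=2, scale=0.5)."""
    return rng.gamma(shape=2.0, scale=0.5, size=size)

# Hypothetical scaling parameters, one per frequency component
a = np.array([0.5, 1.0, 3.0, 10.0])

# X = a * X0, with independent prototype draws in each column
x = a * sample_prototype((200_000, a.size))

means = x.mean(axis=0)             # should track the scaling parameters
ratios = x.std(axis=0) / means     # standard deviation coefficient per component
print(means)
print(ratios)
```

Under this assumed prototype the sample means follow the scale values while the ratio of standard deviation to mean stays near the same constant (about 0.707 for this gamma choice) for every scale, which is the behaviour the consistent standard deviation coefficient describes.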
In the following, the feature of consistent standard deviation coefficient is deduced theoretically from the proposed model; in other words, the model accords well with the feature of consistent standard deviation coefficient discovered in the experiments. First, consider the probability distribution of $X$ in Equation (3), given that $p_0(x)$ is the probability distribution of $X_0$. According to Equation (3), the expectation of $X$ is:

$$E[X] = E[a X_0] = a\,E[X_0] = a \tag{4}$$

so $a$ is the expectation of $X$. Based on the pdf (probability distribution function) of a function of a random variable in probability theory, the probability distribution of $X$ can be deduced as:

$$p(x) = \frac{1}{a}\, p_0\!\left(\frac{x}{a}\right) \tag{5}$$

Second, consider the standard deviation coefficient of $X$:

$$\frac{\sigma_X}{E[X]} = \frac{\sqrt{\mathrm{Var}(X)}}{E[X]} = \frac{\sqrt{\int_0^{\infty} \left(x - E[X]\right)^2 p(x)\, dx}}{E[X]} \tag{6}$$

Considering Equations (4) and (5), Equation (6) can be rewritten as:

$$\frac{\sigma_X}{a} = \frac{\sqrt{\int_0^{\infty} (x - a)^2\, \frac{1}{a}\, p_0\!\left(\frac{x}{a}\right) dx}}{a} \tag{7}$$

Then apply the variable substitution $x = a x_0$ to the integral on the right side of Equation (7):

$$\frac{\sigma_X}{a} = \frac{\sqrt{\int_0^{\infty} (a x_0 - a)^2\, \frac{1}{a}\, p_0(x_0)\, d(a x_0)}}{a} = \frac{\sqrt{a^2 \int_0^{\infty} (x_0 - 1)^2\, p_0(x_0)\, dx_0}}{a} \tag{8}$$

Remember that the variables $X$ and $X_0$ represent amplitude values, which are non-negative. Therefore, $a$ is also non-negative, so $\sqrt{a^2} = a$. Then Equation (8) can be rewritten as:

$$\frac{\sigma_X}{a} = \sqrt{\int_0^{\infty} (x_0 - 1)^2\, p_0(x_0)\, dx_0} \tag{9}$$

Notice that, because $E[X_0] = 1$, the right side of Equation (9) is just the standard deviation of $X_0$. Therefore,

$$\frac{\sigma_X}{a} = \sigma_{X_0} \tag{10}$$

Notice that the right side of Equation (10) is a constant once the prototype distribution $p_0(x)$ is given. Therefore, the standard deviation coefficient of $X$ is the same whatever the scaling factor $a$ is, and it is equal to that of the prototype variable $X_0$. This accords well with the experimental results shown in Section 2. Therefore, the feature of consistent standard deviation coefficient supports the model proposed here.

4 Conclusion
In this paper, the statistic feature of unvoiced pronunciation in the frequency domain is studied. The study focuses on the short-time amplitude spectrum, and is based on the data obtained by STFT of signals of stable and sustained unvoiced pronunciations. A new statistic feature named the consistent standard deviation coefficient is discovered. This feature indicates strong associations between the amplitude distributions of different frequency components. Such association is also revealed by comparing the normalized amplitude histograms of all frequency components. A new model is proposed to represent this association. In this model, the random variables representing the amplitude of every frequency component belong to the same pdf type, but they have different expectations. If the prototype pdf $p_0(x)$ is determined, the pdf of any frequency's amplitude $X$ can be derived by $p(x) = \frac{1}{a}\, p_0\!\left(\frac{x}{a}\right)$, where $a$ is the expectation of $X$. Moreover, by mathematical analysis, this model accords well with the feature of consistent standard deviation coefficient. The results in the paper deepen the understanding of the stochastic features of unvoiced pronunciation, which is an important topic in speech signal analysis.
In future work, the specific pdf type that suits the short-time amplitude spectrum data for unvoiced pronunciation will be studied. Other types of pronunciation, such as voiced phonemes, will also be studied statistically for possible new features.

References:
[1] W. B. Davenport, An experimental study of speech wave probability distributions, J. Acoust. Soc. Amer., Vol. 24, No. 4, 1952, pp. 390-399.
[2] S. Gazor, W. Zhang, Speech probability distribution, IEEE Signal Processing Letters, Vol. 10, No. 7, 2003, pp. 204-207.
[3] B. J. Borgstrom, A. Alwan, Log-spectral amplitude estimation with Generalized Gamma distributions for speech enhancement, Proceedings of 2011 IEEE ICASSP, 2011, pp. 4756-4759.
[4] J. S. Erkelens, J. Jensen, R. Heusdens, Speech enhancement based on Rayleigh mixture modeling of speech spectral amplitude distributions, 15th European Signal Processing Conference, 2007, pp. 65-69.
[5] J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallett, N. Dahlgren, V. Zue, TIMIT Acoustic-phonetic continuous speech corpus, Linguistic Data Consortium, Philadelphia, 1993.
[6] D. J. Sinder, M. H. Krane, J. L. Flanagan, Synthesis of fricative sounds using an aeroacoustic noise generation model, Proceedings of 16th International Congress on Acoustics, Vol. 1, 1998, pp. 249-250.
[7] Richard S. McGowan, An aeroacoustics approach to phonation: some experimental and theoretical observations, Haskins Laboratories: Status Report on Speech Research SR-86/87, pp. 107-116.
[8] R. Mittal, B. D. Erath, M. W. Plesniak, Fluid dynamics of human phonation and speech, Annual Review of Fluid Mechanics, Vol. 45, 2013, pp. 437-467.