A New Statistic Feature of the Short-Time Amplitude Spectrum Values for Human's Unvoiced Pronunciation

XIAODONG ZHUANG
Qingdao University, Electronics & Information College, Qingdao, 266071, CHINA

Abstract: - In this paper, a new statistic feature of the discrete short-time amplitude spectrum is discovered by experiments on signals of unvoiced pronunciation. For the randomly varying short-time spectrum, this feature reveals the relationship between the amplitude's average and its standard deviation for every frequency component. In addition, the association between the amplitude distributions of different frequency components is studied. A new model representing this association is inspired by the normalized histogram of amplitude. By mathematical analysis, the newly discovered statistic feature is shown to be necessary evidence supporting the proposed model, and it can also serve as direct evidence for the widely used hypothesis of identical distribution of amplitude for all frequencies.

Key-Words: - unvoiced pronunciation, short-time spectrum, amplitude distribution, statistic analysis

1 Introduction

A speech signal can be mathematically modelled as a stochastic process. Speech features are random and time-varying in both the time domain and transformed domains such as the short-time spectrum [1,2]. The statistic feature of the speech signal is one of the important research topics. In the frequency domain, the short-time amplitude spectrum values can be mathematically taken as random variables, and there have been studies estimating their probability distribution, which facilitates applications in speech enhancement [3,4]. Such studies are based on the large amount of speech data in corpora like TIMIT or other databases of daily speech signals from the internet [2,5]. However, these studies are based on words or sentences spoken in daily-life communication, which are a mixture of various pronunciation types including vowel, consonant, plosive, etc.
Based on such corpora, the estimated statistic feature is in fact the overall feature of a signal mixed from different pronunciation types. Therefore, it is necessary to further study the statistic feature of each specific pronunciation type (or specific phoneme) alone, because different types have different pronunciation mechanisms. Unvoiced pronunciation is one of the major pronunciation types, and it is closely related to the aerodynamic process in the vocal tract [6-8]. The physical process of unvoiced pronunciation is complicated, but the statistical study of its signal may reveal some of its underlying properties.

In this paper, the statistic study is carried out in the frequency domain for unvoiced pronunciation. A novel statistical feature named consistent standard deviation coefficient is discovered for short-time amplitude spectrum data, which is revealed by the statistic study on stable and sustained signals of unvoiced pronunciation. Moreover, the relationship between the amplitude probability distributions of two different frequency components is investigated, based on which a new model is proposed to represent this relationship. The validity of the new model is supported by mathematical analysis with the discovered statistic feature as direct evidence, and the model has potential applications such as speech synthesis.

2 New Statistic Feature in Frequency Domain for Unvoiced Pronunciation

In order to obtain sufficient data for statistic study, the signals used in this study are stable and sustained pronunciations. For each unvoiced phoneme studied, its signal is recorded, and each signal is studied alone. For each signal, the short-time Fourier transform (STFT) is used to gather sufficient spectrum data for the statistic study. Since the STFT used is in discrete form, the spectrum has a finite number of discrete components, and the statistic study is eventually performed for each frequency component individually.

E-ISSN: 2224-3488, Volume 12, 2016

Since there is currently little corpus of sustained phoneme pronunciation, signals were captured using microphones connected to the sound cards of computers. The signals were recorded at a sample frequency of 16 kHz, with 16 bits per sample. To guarantee the generality of the experimental results, signals were captured for a group of unvoiced pronunciations spoken by different speakers, and on different recording platforms (different microphones and sound cards on different computers). During the collection of signals, the speakers were informed of the requirement of stable pronunciation over a sufficient time length, which is needed for a reliable statistic study. For each unvoiced phoneme, the stability of pronunciation largely determines the effectiveness of further analysis; therefore the signals were captured repeatedly several times, and the most stable signal was selected.

In the STFT of each signal, the frame length is set to 512, which corresponds to a time interval of 32 ms at the 16 kHz sampling frequency. A Hamming window is applied to each frame in the STFT. Let $\omega_k$ denote the $k$-th frequency component in the STFT. Due to the randomness of the signal, the amplitude of $\omega_k$ also varies randomly from frame to frame. Let $a(\omega_k)$ and $\sigma^2(\omega_k)$ represent the estimated average and variance of the amplitude of $\omega_k$ respectively. The estimated standard deviation $\sigma(\omega_k)$ is the square root of $\sigma^2(\omega_k)$. Mathematically, $a(\omega_k)$ and $\sigma(\omega_k)$ are two functions of frequency, and their curves can be drawn after $a$ and $\sigma$ are estimated for each frequency $\omega_k$.

For a dozen unvoiced phonemes, the above basic statistics are estimated. Some typical results are shown in Fig. 1 and Fig. 2 as the curves of $a(\omega_k)$ and $\sigma(\omega_k)$. There is a clear similarity between the curves of $a(\omega_k)$ and $\sigma(\omega_k)$. Such similarity also exists in all the other results of unvoiced pronunciation in the experiments, which motivates the following study of the relationship between the two functions $a(\omega_k)$ and $\sigma(\omega_k)$.

Fig. 1. The estimated expectation and standard deviation of the short-time amplitude spectrum for [h]: (a) amplitude expectation $a(\omega_k)$; (b) amplitude standard deviation $\sigma(\omega_k)$

Fig. 2. The estimated expectation and standard deviation of the short-time amplitude spectrum for unvoiced [e]: (a) amplitude expectation $a(\omega_k)$; (b) amplitude standard deviation $\sigma(\omega_k)$

Besides the above experimental results, the relationship between $a(\omega_k)$ and $\sigma(\omega_k)$ is quantitatively verified by calculating the correlation coefficient between the two curves. The correlation coefficient is calculated in discrete form:

$$\rho_{a\sigma} = \frac{\sum_{k=1}^{N} \sigma(\omega_k)\, a(\omega_k)}{\sqrt{\sum_{k=1}^{N} \sigma^2(\omega_k)}\,\sqrt{\sum_{k=1}^{N} a^2(\omega_k)}} \qquad (1)$$

where $N$ is the number of discrete frequencies in the discrete spectrum. Some of the experimental results are shown in Table 1, which are based on the pronunciation signals recorded for one male speaker. The correlation coefficients between $a(\omega_k)$ and $\sigma(\omega_k)$ are calculated for different unvoiced phonemes, and all of them are very close to 1.0. Considering the unavoidable error caused by the instability of sustained natural pronunciation, and also the noise introduced in the signal capture process, the results indicate that $a(\omega_k)$ and $\sigma(\omega_k)$ are strongly related by a linear proportional relationship, which is a new statistic feature discovered for human's unvoiced pronunciation.

Table 1 The correlation coefficient of $a(\omega_k)$ and $\sigma(\omega_k)$ for unvoiced pronunciation

Pronunciation          ρ between a(ω_k) and σ(ω_k)    Number of signal frames
[s] (male)             .991                           35748
[θ] (male)             .985                           816
[f] (male)             .9948                          4179
[h] (male)             .998                           199
unvoiced [a] (male)    .996                           17497
unvoiced [ə] (male)    .9817                          45336
unvoiced [e] (male)    .9913                          4187
unvoiced [i] (male)    .9896                          44147

Because the standard deviation coefficient (also known as the coefficient of variation) represents the ratio of $\sigma$ to $a$, the above statistic feature is named the feature of consistent standard deviation coefficient. In other words, for the pronunciation of an unvoiced phoneme, the proportional coefficient between the standard deviation and the expectation is consistent for all the frequency components in the short-time amplitude spectrum. This feature can also be expressed by:

$$\sigma(\omega_k) = c_s \cdot a(\omega_k) \qquad (2)$$

where $c_s$ is the consistent standard deviation coefficient of amplitude for all frequency components. The subscript $s$ means that Equation (2) holds for one signal of unvoiced pronunciation. If the signal is changed to that of a different unvoiced pronunciation, the value of $c_s$ may also change.
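The estimation steps of this section can be sketched as follows. This is a minimal sketch, not the code used in the experiments: it assumes NumPy, non-overlapping frames (the frame overlap is not stated above), and a synthetic coloured-noise signal standing in for a recorded unvoiced phoneme. The function names `amplitude_stats`, `correlation` and `std_coefficient` are hypothetical, and the least-squares estimate of the standard deviation coefficient is one possible choice that the text does not prescribe.

```python
import numpy as np


def amplitude_stats(signal, frame_len=512):
    """Per-frequency average and standard deviation of the short-time amplitude spectrum."""
    n_frames = len(signal) // frame_len
    # Assumption: non-overlapping frames; trailing samples that do not fill a frame are dropped.
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    window = np.hamming(frame_len)
    # One row per frame, one column per discrete frequency component.
    amplitude = np.abs(np.fft.rfft(frames * window, axis=1))
    return amplitude.mean(axis=0), amplitude.std(axis=0)


def correlation(a, sigma):
    """Discrete correlation coefficient between the two curves, as in Equation (1)."""
    return np.sum(sigma * a) / (np.sqrt(np.sum(sigma ** 2)) * np.sqrt(np.sum(a ** 2)))


def std_coefficient(a, sigma):
    """Least-squares slope of sigma against a through the origin (an assumed estimator)."""
    return np.sum(sigma * a) / np.sum(a ** 2)


# Synthetic stand-in for a sustained unvoiced sound: 10 s of coloured Gaussian noise at 16 kHz.
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000 * 10)
signal = np.convolve(noise, [1.0, -0.9], mode="same")

a, sigma = amplitude_stats(signal)
rho = correlation(a, sigma)
c_s = std_coefficient(a, sigma)
print(f"rho = {rho:.4f}, c_s = {c_s:.4f}")
```

On this synthetic signal the correlation comes out close to 1, which only illustrates the computation; it is not a substitute for the recorded data in Table 1.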
Because the expectation and the standard deviation are two basic statistics of a random variable, the feature of consistent standard deviation coefficient indicates that there is a certain association between the amplitude probability distributions of different frequency components, which is studied in the next section.

3 The Relationship between Amplitude Probability Distributions of Different Frequency Components

Based on the spectrum data obtained by STFT, the histogram of amplitude for each frequency component $\omega_k$ is computed. The histogram reflects the distribution of the random amplitude data for each $\omega_k$, which is closely related to the amplitude probability distribution. Therefore, the amplitude histogram of each $\omega_k$ is compared to those of other frequencies, in order to study the relationship between the corresponding probability distributions.

In addition, in order to study the amplitude distribution type of different $\omega_k$ without the influence of different average values, the normalized histogram is also computed for each $\omega_k$. The normalization is with respect to the average of amplitude. First, the average of amplitude for $\omega_k$ is computed. After that, each amplitude datum of $\omega_k$ is divided by that average value as a preprocessing step. The normalized histogram is then computed from the preprocessed data.

For a dozen unvoiced phonemes, the original histogram and the normalized histogram of amplitude are both computed for comparison. Two typical results are shown in Fig. 3 and Fig. 4. In order to find clues to the relationship between the amplitude distributions of different $\omega_k$, the histogram curves of every $\omega_k$ are plotted together as a family of curves.

Fig. 3. The amplitude histogram of each frequency $\omega_k$ for [h]: (a) the amplitude histograms before amplitude normalization; (b) the amplitude histograms after amplitude normalization

Fig. 4. The amplitude histogram of each frequency $\omega_k$ for unvoiced [e]: (a) the amplitude histograms before amplitude normalization; (b) the amplitude histograms after amplitude normalization

In Fig. 3(a) and Fig. 4(a), the original histogram curves are mixed and there is no obvious regularity among them. However, in Fig. 3(b) and Fig. 4(b), the normalized histogram curves clearly converge to one central curve (shown in black colour), especially when compared to panel (a) of these figures. Because the normalized histogram curves converge closely, the combined plot forms a belt around the central curve. For other unvoiced phonemes, similar results are obtained. The results indicate a strong association between the amplitude distributions of different $\omega_k$.

Based on the above results, a new model of amplitude distribution in the frequency domain is proposed for human's unvoiced pronunciation. In the model, for the signal of a given unvoiced pronunciation, the amplitude distributions for different $\omega_k$ are of the same type, but with different expectation (or average) values. In other words, there is a prototype distribution function $p_0(x)$, from which the amplitude distribution of any $\omega_k$ can be derived by varying the expectation. The prototype $p_0(x)$ corresponds to the central curve (in black colour) in Fig. 3(b) or Fig. 4(b). This model can also be described mathematically as follows. As a random variable, the amplitude $X$ of some $\omega_k$ is modelled as the scaling of a prototype random variable $X_0$, whose expectation is 1:

$$X = a \cdot X_0 \qquad (3)$$

where $a$ is the scaling parameter. Equation (3) is the mathematical description of the proposed model. In the model, $X_0$ is the same for each frequency component, but the scaling parameter $a$ may be different for different $\omega_k$. Besides being directly inspired by the normalized amplitude histograms, the model is also supported by the newly discovered statistic feature in Section 2.
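The scaling model of Equation (3) and the histogram normalization described above can be illustrated numerically. The sketch below assumes NumPy and, purely for illustration, a Rayleigh prototype rescaled to expectation 1; the prototype type is not determined in this paper. The scaling values and the helper `normalized_histogram` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumption for illustration only: a Rayleigh prototype X0 with expectation 1.
# A Rayleigh variable with scale s has mean s * sqrt(pi / 2).
proto_scale = np.sqrt(2.0 / np.pi)
n_frames = 50000
scales = np.array([0.5, 2.0, 7.5])  # hypothetical scaling parameters, one per frequency

# Equation (3): each frequency's amplitude is a scaled copy of the prototype variable.
x0 = rng.rayleigh(scale=proto_scale, size=(len(scales), n_frames))
amplitudes = scales[:, None] * x0


def normalized_histogram(samples, bins):
    """Histogram of the samples after dividing them by their own average."""
    normalized = samples / samples.mean()
    density, _ = np.histogram(normalized, bins=bins, density=True)
    return density


bins = np.linspace(0.0, 4.0, 41)
hists = np.array([normalized_histogram(row, bins) for row in amplitudes])

# Largest gap between any normalized histogram and the average ("central") curve.
spread = np.max(np.abs(hists - hists.mean(axis=0)))

# Standard deviation coefficient (sigma divided by average) for each frequency.
coeffs = amplitudes.std(axis=1) / amplitudes.mean(axis=1)
print("spread:", spread)
print("coefficients:", coeffs)
```

Under this assumed prototype, the normalized histograms nearly coincide and the standard deviation coefficients are almost identical despite the different scaling parameters, which is the behaviour the model is meant to capture.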
In the following, the feature of consistent standard deviation coefficient is theoretically deduced from the proposed model; in other words, the model accords well with the feature of consistent standard deviation coefficient discovered in the experiments.

First, consider the probability distribution of $X$ in Equation (3), given that $p_0(x)$ is the probability distribution of $X_0$. According to Equation (3), the expectation of $X$ is:

$$E[X] = E[a X_0] = a\,E[X_0] = a \qquad (4)$$

where $a$ is therefore the expectation of $X$. Based on the pdf (probability density function) of a function of a random variable in probability theory, the probability distribution of $X$ can be deduced as:

$$p(x) = \frac{1}{a}\, p_0\!\left(\frac{x}{a}\right) \qquad (5)$$

Second, consider the standard deviation coefficient of $X$:

$$\frac{\sigma}{a} = \frac{\sqrt{\mathrm{Var}(X)}}{a} = \frac{\sqrt{\int_0^{\infty} (x - a)^2\, p(x)\, dx}}{a} \qquad (6)$$

Considering Equations (4) and (5), Equation (6) can be rewritten as:

$$\frac{\sigma}{a} = \frac{\sqrt{\int_0^{\infty} (x - a)^2\, \frac{1}{a}\, p_0\!\left(\frac{x}{a}\right) dx}}{a} \qquad (7)$$

Then apply the variable substitution $x = a x_0$ to the integral on the right side of Equation (7):

$$\frac{\sigma}{a} = \frac{\sqrt{\int_0^{\infty} (a x_0 - a)^2\, \frac{1}{a}\, p_0(x_0)\, d(a x_0)}}{a} = \frac{\sqrt{a^2 \int_0^{\infty} (x_0 - 1)^2\, p_0(x_0)\, dx_0}}{a} \qquad (8)$$

Remember that the variables $X$ and $X_0$ represent amplitude values, which are non-negative. Therefore, $a$ is also non-negative, so that $\sqrt{a^2} = a$. Then Equation (8) can be rewritten as:

$$\frac{\sigma}{a} = \sqrt{\int_0^{\infty} (x_0 - 1)^2\, p_0(x_0)\, dx_0} \qquad (9)$$

Notice that the right side of Equation (9) is just the standard deviation of $X_0$, denoted $\sigma_0$. Since $E[X_0] = 1$, it is also the standard deviation coefficient of $X_0$. Therefore,

$$\frac{\sigma}{a} = \frac{\sigma_0}{E[X_0]} = \sigma_0 \qquad (10)$$

Notice that the right side of Equation (10) is constant given the prototype distribution $p_0(x)$. Therefore, the standard deviation coefficient of $X$ is the same whatever the scaling factor $a$ is, and is equal to that of the prototype variable $X_0$. This accords well with the experimental results shown in Section 2. Therefore, the feature of consistent standard deviation coefficient supports the model proposed here.

4 Conclusion

In this paper, the statistic feature of unvoiced pronunciation in the frequency domain is studied. The study focuses on the short-time amplitude spectrum, and is based on the data obtained by STFT on signals of stable and sustained unvoiced pronunciations. A new statistic feature named consistent standard deviation coefficient is discovered. This feature indicates strong associations between the amplitude distributions of different frequency components. Such association is also revealed by comparing the normalized amplitude histograms of all frequency components. A new model is proposed to represent this association. In this model, the random variables representing the amplitude of every frequency component belong to the same pdf type, but they have different expectations. If the prototype pdf $p_0(x)$ is determined, the pdf of any frequency's amplitude $X$ can be derived by $p(x) = \frac{1}{a}\, p_0\!\left(\frac{x}{a}\right)$, where $a$ is the expectation of $X$. Moreover, by mathematical analysis, this model accords well with the feature of consistent standard deviation coefficient. The results in the paper deepen the understanding of the stochastic features of unvoiced pronunciation, which is an important topic in speech signal analysis.
In future work, the specific pdf type that fits the short-time amplitude spectrum data of unvoiced pronunciation will be studied. Other types of pronunciation, such as voiced phonemes, will also be studied statistically for possible new features.

References:
[1] W. B. Davenport, An experimental study of speech wave probability distributions. J. Acoust. Soc. Amer., Vol. 24, No. 4, 1952, pp. 390-399.
[2] S. Gazor, W. Zhang, Speech probability distribution, IEEE Signal Processing Letters, Vol. 10, No. 7, 2003, pp. 204-207.
[3] B. J. Borgstrom, A. Alwan, Log-spectral amplitude estimation with Generalized Gamma distributions for speech enhancement. Proceedings of 2011 IEEE ICASSP, 2011, pp. 4756-4759.
[4] J. S. Erkelens, J. Jensen, R. Heusdens, Speech enhancement based on Rayleigh mixture modeling of speech spectral amplitude distributions. 15th European Signal Processing Conference, 2007, pp. 65-69.
[5] J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallett, N. Dahlgren, V. Zue, TIMIT Acoustic-phonetic continuous speech corpus, Linguistic Data Consortium, Philadelphia, 1993.
[6] D. J. Sinder, M. H. Krane, J. L. Flanagan, Synthesis of fricative sounds using an aeroacoustic noise generation model. Proceedings of 16th International Congress on Acoustics, Vol. 1, 1998, pp. 249-250.
[7] Richard S. McGowan, An aeroacoustics approach to phonation: some experimental and theoretical observations, Haskins Laboratories: Status Report on Speech Research SR-86/87, pp. 107-116.
[8] R. Mittal, B. D. Erath, M. W. Plesniak, Fluid dynamics of human phonation and speech. Annual Review of Fluid Mechanics, Vol. 45, 2013, pp. 437-467.