
THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR SPEECH CODING

Petr Pollák & Pavel Sovka
Czech Technical University of Prague, ČVUT FEL, 166 27 Praha 6, Czech Republic
E-mail: pollak@noel.feld.cvut.cz

Abstract

The problems and possible solutions of robust LPC parametrization in the presence of noise are discussed. First, experience with preprocessing by standard speech enhancement techniques is summarized; these techniques are based on spectral subtraction and its modifications. The possibility of correlation subtraction is discussed in the second part. This problem is solved for the case of white noise, because the white noise variance can be estimated efficiently from the correlation matrix of noisy speech.

1 INTRODUCTION

Linear Predictive Coding (LPC) analysis [4], [3] is a frequently used tool for the parametrization of speech signals. Nevertheless, the standard technique is not robust to noise: the solution is biased, so further processing with the biased parameters is degraded or fails completely.

The analyzed signal s[n] is assumed to be predicted by a linear predictor of order p as

    \hat{s}[n] = - \sum_{k=1}^{p} a[k] \, s[n-k].    (1)

The predictor coefficients a[k] are found by minimizing the mean square prediction error and satisfy the normal equations in matrix form,

    R_{ss} \, a = - r_s.    (2)

We assume the autocorrelation method of solution, so the normal equations are the Yule-Walker equations: the matrix R_ss is the symmetric Toeplitz correlation matrix built from the correlations R[0], ..., R[p-1], and r_s is the vector of correlations R[1], ..., R[p].

When the speech signal is corrupted by uncorrelated additive noise, x[n] = s[n] + n[n], we obtain the correlation matrix of noisy speech R_xx, and the Yule-Walker equations yield a biased solution for the predictor coefficients a:

    R_{xx} \, a = - r_x.    (3)

We have studied two different noise removal techniques, which are described in the following two sections.

2 PREPROCESSING WITH SPECTRAL SUBTRACTION

Preprocessing with standard speech enhancement was the first robust parametrization technique we tested. We have studied many modifications of such algorithms in detail as speech enhancement for transmission, with respect to speech distortion and the level of noise suppression [5], [6]. Here we carried out experiments with an isolated-word recognizer built with the HTK tools under noisy conditions, using different modifications of spectral subtraction as the first step before parametrization and recognition. The results were published in [7] and can be summarized as follows:

- Generally, these speech enhancement algorithms improve the input SNR of the processed speech and consequently improve the initial conditions of noisy speech for further parametrization. They do not, however, produce a parametrization that is truly independent of the noise level.
- The speech distortion they introduce ultimately decreases the speech recognition rate; the recognition rate achieved for clean speech cannot be reached.
- The negative influence of the distortion produced by spectral subtraction can be reduced by training on speech preprocessed by the same algorithm.
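As an illustration only (the paper gives no code), the following sketch shows one plausible form of this preprocessing chain: frame-wise magnitude spectral subtraction followed by standard autocorrelation-method LPC solving the Yule-Walker equations (2). The function names, the spectral floor, and the use of SciPy's Toeplitz solver are assumptions made here, not details from the authors.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_from_frame(frame, p=12):
    """Autocorrelation-method LPC: solve the Yule-Walker equations (2)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:] / len(frame)
    return solve_toeplitz((r[:p], r[:p]), -r[1:p + 1])      # R_ss a = -r_s

def spectral_subtract_frame(noisy_frame, noise_mag, floor=0.02):
    """Magnitude spectral subtraction of one windowed frame, keeping the noisy phase."""
    spec = np.fft.rfft(noisy_frame)
    mag = np.abs(spec) - noise_mag                 # subtract a noise magnitude estimate
    mag = np.maximum(mag, floor * np.abs(spec))    # spectral floor avoids negative magnitudes
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(noisy_frame))

# Typical use: estimate noise_mag by averaging |FFT| over leading noise-only frames,
# enhance each windowed frame, and then parametrize the enhanced frame as usual:
#   enhanced = spectral_subtract_frame(win * x[k:k + N], noise_mag)
#   a = lpc_from_frame(enhanced)
```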

The enhanced signal need not actually be generated: the autocorrelation coefficients can be evaluated from the power spectral density of the enhanced speech and then used directly in the solution of the Yule-Walker equations. Such techniques are very efficient from the point of view of computational cost.

3 NOISE CORRELATION COMPENSATION TECHNIQUE

This robust parametrization technique is derived from the algorithm of Gesbert et al. [2] and is based on noise correlation compensation (correlation subtraction). The basic problem is, again, the estimation of the noise correlation.

3.1 Theoretical solution for the white noise case

The theoretical solution in the presence of additive white noise seems to be very simple. Since the noise is assumed to be white, its correlation matrix is theoretically diagonal with all diagonal values equal to R_b[0] = σ², and the autocorrelation coefficients satisfy R_x[0] = R_s[0] + σ² and R_x[k] = R_s[k] for k > 0, i.e. the Yule-Walker equations for the biased solution take the form

    (R_{ss} + \sigma^2 I) \, a = - r_x.    (4)

When we know the noise variance σ², we can look for the unbiased solution from the Yule-Walker equations modified as

    (R_{xx} - \sigma^2 I) \, a = - r_x.    (5)

In this case the problem reduces to the estimation of the noise variance σ².

3.2 Estimation of the white noise variance

For strongly correlated signals the eigenvalues of the correlation matrix fall rapidly towards zero. We assume the standard orthonormal eigendecomposition

    R_{ss} = U \, D \, U^T,    (6)

where D is the diagonal matrix of eigenvalues fulfilling the condition

    \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_r > 0, \qquad \lambda_{r+1} = \dots = \lambda_p = 0.    (7)

The idea of the described technique is to estimate the noise variance from the eigenvalues of the noisy correlation matrix. Since the correlation matrix of white noise is diagonal, it must hold that

    R_{xx} = U (D + \sigma^2 I) \, U^T,    (8)

so the noise variance can be estimated from the minimal eigenvalue of the noisy correlation matrix as

    \hat{\sigma}^2 = \lambda_{min}.    (9)

Problems: The last eigenvalue of the speech correlation matrix is not exactly equal to zero, so the last eigenvalue of the noisy correlation matrix is only close to σ², with some error:

    \lambda^x_i = \lambda_i + \sigma^2, \quad i = 1, \dots, p, \qquad \text{so} \quad \lambda^x_p \ge \sigma^2.    (10)

Let us nevertheless assume that the noise variance can be estimated from the minimal eigenvalue of the noisy speech correlation matrix, eq. (9). The behaviour of the minimal eigenvalue normalized by the sum of all eigenvalues, λ_min / Σ_i λ_i, is shown in Fig. 1. This ratio is very small, especially for voiced parts of speech, so the minimal eigenvalue can there be considered very close to zero. Fig. 2 shows that the minimal eigenvalue is larger in the noisy case; the dashed line shows how close it is to the noise variance (the true σ² is used there instead of λ_min).

Figure 1: Order of magnitude fluctuation for clean speech (top panel: clean speech s[n] over discrete time; bottom panel: λ_min / Σ_i λ_i over discrete time).

Figure 2: Order of magnitude fluctuation for noisy speech (top panel: noisy speech x[n], white noise case; bottom panel: λ_min / Σ_i λ_i over discrete time).
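A minimal sketch of the estimate (9), assuming biased autocorrelation estimates and a synthetic AR(2) signal standing in for strongly correlated (voiced) speech; the names and the test setup are illustrative, not the authors' code.

```python
import numpy as np
from scipy.linalg import toeplitz

def estimate_white_noise_variance(x, p=12):
    """Estimate sigma^2 as the minimal eigenvalue of the noisy correlation matrix, eq. (9)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:] / len(x)   # biased autocorrelation R[0..]
    lam = np.linalg.eigvalsh(toeplitz(r[:p]))                   # eigenvalues in ascending order
    return lam[0], lam

# Illustrative check: strongly correlated AR(2) signal plus white noise of known variance.
rng = np.random.default_rng(0)
s = np.zeros(8000)
for n in range(2, len(s)):
    s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2] + 0.1 * rng.standard_normal()
sigma2_true = 0.05
x = s + np.sqrt(sigma2_true) * rng.standard_normal(len(s))
sigma2_hat, lam = estimate_white_noise_variance(x)
print(sigma2_hat, sigma2_true, lam[0] / lam.sum())   # lambda_min slightly overestimates sigma^2
```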

3.3 Basic technique

This method follows the theoretical solution exactly:

    (R_{xx} - \lambda_{min} I) \, a = - r_x.    (11)

The matrix R_xx - λ_min I is singular, so the solution cannot be found by a simple inverse of the modified matrix, nor by the Levinson-Durbin algorithm. The pseudoinverse of the modified matrix, i.e. the inverse in the Moore-Penrose sense, must be used. More precisely, for a non-singular matrix satisfying R_mm = U D U^T the inverse is

    R_{mm}^{-1} = U \, D^{-1} U^T,    (12)
    D^{-1} = diag(1/\lambda_1, 1/\lambda_2, \dots, 1/\lambda_p).    (13)

In the singular case a number of the last eigenvalues are equal to zero, D = diag(λ_1, ..., λ_r, 0, ..., 0), and the inverse must be taken as the pseudoinverse

    R_{mm}^{+} = U \, D^{+} U^T,    (14)
    D^{+} = diag(1/\lambda_1, \dots, 1/\lambda_r, 0, \dots, 0).    (15)

Problems:

- The results are better for voiced speech sequences. The absolute cepstral distance from the original clean speech is smaller, but the distance between successive frames increases. This confirms the typical behaviour demonstrated in Fig. 3.
- The solution is not always stable (the reflection coefficients can have a modulus greater than 1, see Fig. 6), so polynomial stabilization with respect to the unit circle must be used. The most important problem, this instability of the solution, appears already in the case of clean speech; however, the spectral characteristics of the unstable model are close to those of the unbiased solution for clean speech, and when polynomial stabilization is applied the resulting cepstral coefficients seem to converge to the unbiased solution.
- The exact noise variance was also used instead of λ_min, so that the error of its estimation was eliminated, but the solution remains practically the same. This shows that the non-zero off-diagonal elements of the noise correlation matrix have a non-negligible influence.
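One possible realization of the basic technique (11)-(15) is sketched below, with the pseudoinverse built from the eigendecomposition as in (14)-(15); np.linalg.pinv would serve equally well. The function names and the rank tolerance are assumptions made for illustration.

```python
import numpy as np

def pinv_from_eig(M, rtol=1e-10):
    """Moore-Penrose pseudoinverse of a symmetric matrix, following eqs. (12)-(15)."""
    lam, U = np.linalg.eigh(M)                                      # M = U diag(lam) U^T
    keep = lam > rtol * lam.max()
    inv_lam = np.where(keep, 1.0 / np.where(keep, lam, 1.0), 0.0)   # 1/lambda_i or 0
    return (U * inv_lam) @ U.T

def basic_technique(R_xx, r_x):
    """Solve (R_xx - lambda_min I) a = -r_x, eq. (11), using the pseudoinverse."""
    lam_min = np.linalg.eigvalsh(R_xx)[0]          # noise variance estimate, eq. (9)
    M = R_xx - lam_min * np.eye(R_xx.shape[0])     # singular (or nearly singular) by construction
    return pinv_from_eig(M) @ (-r_x)
```

As noted above, the resulting predictor may still be unstable, so a stabilization step with respect to the unit circle would follow in practice.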

3.4 Scaled factor modification

This approach tests the behaviour when not all of λ_min is subtracted; a scaling constant q applied to λ_min is used. In general it can lie in the range 0 < q < 1, a typical value being 0.8 to 0.9:

    (R_{xx} - q \, \lambda_{min} I) \, a = - r_x.    (16)

The singularity of the modified correlation matrix is overcome in this case, and the standard technique (Levinson-Durbin) can be used.

Problems:

- For values of q close to 1, the results approach those of the basic technique described above, together with the problems mentioned there.
- Some level of noise remains in the analyzed signal; how much depends on the parameter q and on the noise variance.

3.5 Experiments with modeling in the correlation domain

We assume white noise uncorrelated with the speech signal. Since we had some problems fulfilling this condition in the MATLAB environment, we tried to model the correlation matrices directly. The correlation matrix of white noise is modeled as a diagonal matrix of the noise variance, σ² I, plus a Toeplitz matrix of random values E, so that

    R_{xx} = R_{ss} + \sigma^2 I + E.    (17)

For the experiment whose results are shown in Tab. 5 the random values were scaled down by a small factor. (Without these random values the solution of the problem is trivial, i.e. only the addition and subtraction of a diagonal matrix.)

Comments:

- The algorithm starts giving better results in the case of diagonal noise correlation matrix modeling, i.e. when the non-zero off-diagonal and cross-correlation elements are omitted.
- Increasing the window length should decrease the influence of the undesirable cross-correlation and off-diagonal elements, but the window length cannot be increased without respecting the stationary parts of speech.

4 EXPERIMENTS

4.1 Classification criteria

Since the cepstral coefficients computed by the recursion from the predictor coefficients are usually used as parameters for speech recognizers, the Euclidean cepstral distance between two cepstral vectors was used as the classification criterion. Let c[k] denote the unbiased parameters, i.e. the parameters of the standard technique for the signal without any noise, c1[k] the biased parameters for speech with noise, and c2[k] the robust parameters of the noise removal technique. Then we compute, for frame i, the cepstral distances from the unbiased solution,

    cd1_i = [ \sum_k (c_i[k] - c1_i[k])^2 ]^{1/2},  \quad  cd2_i = [ \sum_k (c_i[k] - c2_i[k])^2 ]^{1/2},    (18)

and the cepstral distances between successive frames,

    cds1_i = [ \sum_k (c1_i[k] - c1_{i+1}[k])^2 ]^{1/2},  \quad  cds2_i = [ \sum_k (c2_i[k] - c2_{i+1}[k])^2 ]^{1/2}.    (19)

The cds values must always be compared with the cds of clean speech, because the distance between successive frames of noisy speech is very small as a result of noise masking.
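The criteria (18)-(19) are simple to compute once cepstral coefficients are available; a small sketch follows, using the standard predictor-to-cepstrum recursion for the sign convention of eq. (1). The helper names are hypothetical.

```python
import numpy as np

def lpc_to_cepstrum(a, n_cep):
    """Cepstral coefficients from predictor coefficients a[1..p] of eq. (1),
    i.e. A(z) = 1 + sum_k a[k] z^(-k); the gain term c[0] is omitted."""
    p = len(a)
    c = np.zeros(n_cep + 1)
    for m in range(1, n_cep + 1):
        acc = -a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc -= (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]

def cepstral_distance(c_a, c_b):
    """Euclidean cepstral distance as in eqs. (18) and (19)."""
    return float(np.sqrt(np.sum((np.asarray(c_a) - np.asarray(c_b)) ** 2)))

# cd of frame i:   cepstral_distance(c_clean[i], c_robust[i])
# cds of frame i:  cepstral_distance(c_robust[i], c_robust[i + 1])
```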

4.2 Results of experiments

All the discussed techniques were tested on a small database of typical speech segments. The database was created from the TIMIT database, i.e. an American English database, and the experiments were carried out in MATLAB. The speech was mixed with white noise generated in MATLAB at different levels; four constant noise levels, measured as long-time global SNR, were used, the lowest being approximately -6 dB.

The results of the two basic versions of speech enhancement preprocessing are presented in Tab. 1 (PSD subtraction, i.e. Wiener filtering) and Tab. 2 (spectral magnitude subtraction). The distance from the unbiased solution is reduced but remains relatively high; on the other hand, the cepstral distance between successive frames does not increase very much.

Table 1: Global results for preprocessing by Wiener filtering (averaged cd1, cd2, cds1, cds2 for the four SNR conditions).

Table 2: Global results for preprocessing by magnitude spectral subtraction (averaged cd1, cd2, cds1, cds2 for the four SNR conditions).

The results of the experiments with the correlation compensation techniques are given in Tabs. 3-5. Here especially the distance between successive frames increases, as a consequence of the problems mentioned above. The results of the experiments with modeling in the correlation domain are much better; in particular, Fig. 5 suggests that the robust solution converges to the unbiased one.

Table 3: Global results for the basic technique (averaged cd1, cd2, cds1, cds2 for the four SNR conditions).

Table 4: Global results for the scaled subtraction method (averaged cd1, cd2, cds1, cds2 for the four SNR conditions).

Table 5: Global results for the experiments with the modeled correlation matrix (averaged cd1, cd2, cds1, cds2 for the four SNR conditions).

5 CONCLUSIONS

The problems and comments connected with the particular parts of this study were mentioned above. They can be summarized in the following points:

- Although some global criteria did not give satisfactory results, the described approach seems to converge to the unbiased solution. Large differences between the signals in the generated database may deteriorate the results; conclusive tests can only be made by applying the method in a recognition system and comparing the results for clean speech, noisy speech, and the robust parametrization technique.
- The robust parameters give an unstable model which has to be stabilized with respect to the unit circle. A more detailed analysis of the robust parametrization for speech without any noise, focused on this instability of the solution, remains to be done.

REFERENCES

[1] S. F. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-27(2):113-120, April 1979.

[2] D. Gesbert, P. Duhamel, and M. Unser. Régression linéaire en présence de bruit: une solution adaptative non-biaisée. In GRETSI, France, 1995.

[3] J. Makhoul. Linear prediction: A tutorial review. Proceedings of the IEEE, 63(4):561-580, April 1975.

[4] J. D. Markel and A. H. Gray, Jr. Linear Prediction of Speech. Springer-Verlag, 1976.

[5] P. Pollák, P. Sovka, and J. Uhlíř. Noise suppression system for a car. In Proceedings of the 3rd European Conference on Speech, Communication, and Technology - EUROSPEECH'93, Berlin, September 1993.

[6] P. Sovka, P. Pollák, and J. Kybic. Extended spectral subtraction. In EUSIPCO'96, Trieste, September 1996.

[7] T. Kreisinger, P. Pollák, P. Sovka, and J. Uhlíř. Study of speech recognition in noisy environment. In The European Conference on Signal Analysis and Prediction, Praha, June 1997.

Figure 3: One-signal demonstration for the basic technique: dashed curve - original clean speech, dash-dotted curve - noisy speech, solid curve - robustly modeled noisy speech.

Figure 4: Demonstration for clean speech and the basic technique: dashed curve - original clean speech, solid curve - robustly modeled noisy speech.

Figure 5: Basic technique - predictor coefficients a[k] over one signal.

Figure 6: Basic technique - reflection coefficients rc[k] over one signal.

Figure 7: Basic technique - cepstral coefficients c[k] over one signal.

Figure 8: Correlation modeling - predictor coefficients a[k] over one signal.

Figure 9: Correlation modeling - reflection coefficients rc[k] over one signal.

Figure 10: Correlation modeling - cepstral coefficients c[k] over one signal.