A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis

A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis. Authors: Augustine H. Gray and John D. Markel. Presented by Kaviraj, Komaljit, and Vaibhav.

Spectral Flatness. Spectral flatness is a measure of the noisiness versus tonality of a spectrum: for tonal signals it is close to 0, and for noisy signals it is close to 1. (The example spectra on the slide are labelled SFM = 0 and SFM = 0.51.)
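
As a quick, self-contained illustration of this interpretation (my own sketch, not from the paper; the test signals and frame length are arbitrary), the snippet below estimates spectral flatness as the ratio of the geometric to the arithmetic mean of an averaged power spectrum, for a pure tone and for white noise:

```python
import numpy as np

def spectral_flatness(power_spectrum):
    """Ratio of geometric to arithmetic mean: ~0 for tonal, ~1 for flat spectra."""
    p = np.asarray(power_spectrum, dtype=float) + 1e-300   # guard against log(0)
    return np.exp(np.mean(np.log(p))) / np.mean(p)

def averaged_periodogram(x, frame_len=256):
    """Average periodograms of consecutive frames to smooth the spectral estimate."""
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)

rng = np.random.default_rng(0)
n = 1 << 16
tone = np.sin(2 * np.pi * (25 / 256) * np.arange(n))    # strongly tonal test signal
noise = rng.standard_normal(n)                          # white Gaussian noise

print("tone  SFM ~", spectral_flatness(averaged_periodogram(tone)))    # close to 0
print("noise SFM ~", spectral_flatness(averaged_periodogram(noise)))   # close to 1
```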

Define the log spectrum of a time sequence. For a time sequence e_n with z-transform E(z), the normalized log spectrum is V(θ) = ln[ |E(e^{jθ})|² / r_e(0) ], where r_e(0) denotes the energy of the time sequence, r_e(0) = (1/2π) ∫ |E(e^{jθ})|² dθ. 1. A perfectly flat or constant spectrum will yield a normalized log spectrum of zero. 2. Nonzero values of V represent deviations from a flat or constant spectrum.

Measuring deviations from flatness. Let's consider two such measures: η, which weights the negative and positive excursions of the normalized log spectrum equally, and μ, which weighs the positive excursions more heavily and the negative excursions less heavily.

μ or η? Since the peaks of speech log spectra (more precisely, the formants) play a more important role than the valleys in the perception of speech, it is preferable to use an integrand that is not symmetric but weighs the positive excursions of V more heavily than the negative ones. Thus μ has the desired properties.

μ(E). Because V(θ) is the normalized log spectrum of the signal, the average of e^V over frequency is unity. The measure μ(E) = (1/2π) ∫ [e^{V(θ)} − V(θ) − 1] dθ may therefore be simplified to μ(E) = −(1/2π) ∫ V(θ) dθ. Thus −μ(E), to within a factor of 2, represents the zeroth quefrency of the cepstrum, and exp[−μ(E)] represents the ratio of the geometric to the arithmetic mean of the spectrum.

Ξ(E), the spectral-flatness measure: Ξ(E) = exp[−μ(E)]. With this normalization the spectral-flatness measure Ξ(E) lies between 0 and 1, and equals 1 for a perfectly flat spectrum.
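
A small numerical check of these definitions (again my own sketch, on an arbitrary random sequence): it builds V(θ) on a dense DFT grid, verifies that the average of e^V is unity, computes μ(E) both from the asymmetric integrand and as −mean(V), and confirms that Ξ(E) = exp[−μ(E)] equals the geometric-to-arithmetic mean ratio of the spectrum:

```python
import numpy as np

rng = np.random.default_rng(1)
e = rng.standard_normal(64)               # any finite-energy time sequence e_n
nfft = 1 << 14                            # dense frequency grid approximating the integral over theta

P = np.abs(np.fft.fft(e, nfft)) ** 2      # |E(e^{j theta})|^2
r_e0 = np.sum(e ** 2)                     # energy r_e(0); also the arithmetic mean of P over the grid
V = np.log(P / r_e0)                      # normalized log spectrum V(theta)

print("mean of exp(V):", np.mean(np.exp(V)))                 # ~1 by construction
mu = np.mean(np.exp(V) - V - 1)                              # asymmetric integrand: positive excursions weigh more
print("mu(E):", mu, "  -mean(V):", -np.mean(V))              # equal, since the average of exp(V) is unity
flatness = np.exp(-mu)                                       # Xi(E) = exp[-mu(E)]
gm_over_am = np.exp(np.mean(np.log(P))) / np.mean(P)         # geometric / arithmetic mean of the spectrum
print("Xi(E):", flatness, "  GM/AM:", gm_over_am)            # identical
```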

Spectral-flatness transformations. Let E(z) = X(z) A_M(z) be the output of the inverse filter A_M(z). Since A_M(z) is minimum phase with leading coefficient 1, residue calculus can be applied to show that (1/2π) ∫ ln|A_M(e^{jθ})|² dθ = 0. Hence we obtain the result (1/2π) ∫ ln|E(e^{jθ})|² dθ = (1/2π) ∫ ln|X(e^{jθ})|² dθ.
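
This zero-integral property is easy to verify numerically for any monic minimum-phase polynomial; the zero locations in the sketch below are arbitrary choices of mine, not values from the paper:

```python
import numpy as np

# Arbitrary zeros strictly inside the unit circle give a monic, minimum-phase A_M(z).
zeros = [0.9 * np.exp(1j * 0.4), 0.9 * np.exp(-1j * 0.4),
         0.7 * np.exp(1j * 2.0), 0.7 * np.exp(-1j * 2.0)]
a = np.real(np.poly(zeros))                       # coefficients 1, a_1, ..., a_M of A_M(z)

theta = 2 * np.pi * np.arange(1 << 14) / (1 << 14)
# |A_M(e^{j theta})| equals the polynomial's magnitude on the circle, since the z^{-M} factor has unit modulus.
A_mag2 = np.abs(np.polyval(a, np.exp(1j * theta))) ** 2
print("mean of ln|A_M|^2 over the unit circle:", np.mean(np.log(A_mag2)))   # ~0
```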

Log spectrum of X(z). Using the previous result and the definition of the spectral-flatness measure, Ξ(E) = Ξ(X) r_x(0) / r_e(0). (i) To find the filter coefficients, the results are described in terms of what are often called the k-parameters, k_0, k_1, ..., k_{M−1}, whose basic property is that each has magnitude less than unity, and α_{m+1} = α_m (1 − k_m²) for m = 0, 1, ..., M−1, which recursively leads to α_M = α_0 ∏_{m=0}^{M−1} (1 − k_m²). Using these results, and noting that r_e(0) = α_M, Equation (i) may be rewritten as Ξ(E) = Ξ(X) / ∏_{m=0}^{M−1} (1 − k_m²).
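
To make these relations concrete, here is a minimal Levinson-Durbin sketch (the code, test signal, and variable names are mine; the recursion itself is the standard one behind the k-parameters). It checks that every |k_m| < 1, that the residual energy r_e(0) of the inverse-filtered signal equals α_M, and hence that Ξ(E) = Ξ(X) r_x(0)/α_M = Ξ(X)/∏(1 − k_m²):

```python
import numpy as np

def levinson(r, M):
    """Levinson-Durbin: inverse-filter coefficients a (a[0] = 1), reflection
    coefficients k (the slide's k_0 ... k_{M-1}), and error energies alpha_0 ... alpha_M."""
    a = np.array([1.0])
    alpha = [float(r[0])]                          # alpha_0 = r_x(0)
    k = []
    for m in range(1, M + 1):
        acc = r[m] + np.dot(a[1:], r[m - 1:0:-1])  # correlation of the current predictor with r
        km = -acc / alpha[-1]
        a = np.concatenate([a, [0.0]])
        a = a + km * a[::-1]                       # order-update of the predictor
        k.append(km)
        alpha.append(alpha[-1] * (1.0 - km ** 2))  # next error energy: alpha * (1 - k^2)
    return a, np.array(k), np.array(alpha)

rng = np.random.default_rng(2)
x = np.convolve(rng.standard_normal(512), [1.0, 0.6, 0.3])    # mildly coloured test signal
M = 8
r = np.array([np.dot(x[:len(x) - lag], x[lag:]) for lag in range(M + 1)])  # r_x(0..M)

a, k, alpha = levinson(r, M)
e = np.convolve(x, a)                              # inverse-filter output, E(z) = X(z) A_M(z)

print("all |k_m| < 1:", bool(np.all(np.abs(k) < 1)))
print("r_e(0):", np.sum(e ** 2), "  alpha_M:", alpha[-1])     # equal: residual energy is alpha_M
print("alpha_0 * prod(1 - k_m^2):", alpha[0] * np.prod(1 - k ** 2))
# Hence Xi(E) = Xi(X) r_x(0) / alpha_M = Xi(X) / prod(1 - k_m^2).
```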

Log spectrum of X(z), continued. It has been shown that the energy of the time sequence associated with α_M^{1/2} / A_M(z) is equal to r_x(0) = α_0. Thus one can decompose the log spectrum of X(z) in terms of the log spectra of E(z) and 1/A_M(z), together with the spectral-flatness measure determined earlier.

SPECTRAL FLATNESS OF TWO DRIVING-FUNCTION MODELS. Unvoiced driving-function model: the driving function is uncorrelated Gaussian noise; the spectral-flatness measure has an expected value of roughly exp(−γ), or about −2.5 dB (γ is Euler's constant). Voiced driving-function model: the driving function is a set of L + 1 equally spaced samples, where y_l is the first sample in the time window, y_{l+LP} is the last, and P represents a pitch period. The spectral flatness lies in [(L!)²/(2L)!, 1]. E.g., for two samples (L = 1) the measure lies between 1/2 and 1, i.e., between −3 dB and 0 dB. If all samples have the same size, the spectral-flatness measure equals 1/(L + 1).
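
Both model predictions can be sanity-checked with a short simulation (mine; the frame size, pitch period, and the half-bin frequency grid used to dodge the pulse train's spectral nulls are arbitrary implementation choices):

```python
import numpy as np

def flatness(spec):
    return np.exp(np.mean(np.log(spec))) / np.mean(spec)

rng = np.random.default_rng(3)

# Unvoiced model: the flatness of single frames of white Gaussian noise.
frames = rng.standard_normal((2000, 256))
sfm_db = np.array([10 * np.log10(flatness(np.abs(np.fft.fft(f)) ** 2)) for f in frames])
print("noise frames: mean flatness =", sfm_db.mean(), "dB",
      "  exp(-gamma) ->", 10 * np.log10(np.exp(-np.euler_gamma)), "dB")   # both roughly -2.5 dB

# Voiced model: L + 1 equal samples spaced one pitch period apart.
L, pitch = 2, 40
theta = 2 * np.pi * (np.arange(1 << 16) + 0.5) / (1 << 16)   # half-bin offset avoids the exact spectral nulls
E = np.exp(-1j * np.outer(theta, pitch * np.arange(L + 1))).sum(axis=1)
print("equal pulse train: flatness ~", flatness(np.abs(E) ** 2), "  1/(L+1) =", 1 / (L + 1))
```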

WILL THE REST FOLLOW SOON. Spectrogram of the utterance "Will the rest follow soon," with the voiced and unvoiced portions marked.

Spectral-flatness measures 10 log10 Ξ(X) at the input to the inverse filter and 10 log10 Ξ(E) at its output, for four analysis conditions: (a) rectangular window, F_s = 6.5 kHz, M = 8, N = 128; (b) Hamming window, F_s = 6.5 kHz, M = 8, N = 128; (c) Hamming window, F_s = 13 kHz, M = 16, N = 256; (d) Hamming window with twice the window length, F_s = 6.5 kHz, M = 8, N = 256. Each data window has N points, so the time windows have length N/F_s = NΔt.

Conclusions from the figure. The spectral-flatness measure of the inverse filter output varies far less than that of the input. During unvoiced portions, the theoretical model predicted an average level of −2.5 dB, which compares well with the experimental results. During voiced portions, the theoretical model predicted a wide range of values, [(L!)²/(2L)!, 1], which also compares well with the experimental results. In 5(d), where the number of pitch periods per analysis window is doubled, the spectral-flatness measure is decreased. Comparing 5(a) and 5(b), the spectral-flatness measure is decreased during voiced portions by the use of a Hamming window. Comparing 5(b) and 5(c), increasing the sampling rate reduces the spectral-flatness measure of voiced sounds.

SPECTRAL FLATNESS AND ILL CONDITIONING 1. It takes more accuracy to analyze voiced sounds than unvoiced. 2. The use of a Hamming or Hanning window increases the amount of computational accuracy needed. 3. Increasing the sampling rate increases the amount of computational accuracy needed. 4. Proper pre-emphasis or pre-whitening can decrease the amount of computational accuracy needed.

Measure of Ill-Conditioning. Ill-conditioning may be quantified by ρ_M = [ ∏_{m=0}^{M} (α_m / α_0) ]^{1/(M+1)}. This expression indicates that ρ_M is the geometric mean of the decreasing sequence (α_m / α_0). It decreases from 1 and, as M increases, approaches the spectral-flatness measure as its limiting value. The spectral-flatness measure is thus both a lower bound and a limiting value of ρ_M, and is itself a measure of ill-conditioning: a value of 1 corresponds to a perfectly conditioned problem and a value of 0 to a singular problem.
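
Taking ρ_M as the geometric mean of (α_m/α_0), equivalently (det R_M)^{1/(M+1)}/r_x(0) (my reading of this slide), the sketch below computes it directly from the Toeplitz autocorrelation matrix for a flat and for a strongly coloured test signal; the signals and lag counts are arbitrary:

```python
import numpy as np

def rho(r, M):
    """rho_M = (det R_M)^(1/(M+1)) / r(0): geometric over arithmetic mean of the
    eigenvalues of the (M+1)x(M+1) autocorrelation matrix (1 = well conditioned, 0 = singular)."""
    idx = np.arange(M + 1)
    R = r[np.abs(idx[:, None] - idx[None, :])]        # Toeplitz autocorrelation matrix
    sign, logdet = np.linalg.slogdet(R)
    return np.exp(logdet / (M + 1)) / r[0]

rng = np.random.default_rng(4)
white = rng.standard_normal(4096)                     # spectrally flat signal
coloured = np.convolve(white, np.ones(8))             # strongly low-passed, spectrally peaky signal

for name, x in [("white   ", white), ("coloured", coloured)]:
    r = np.array([np.dot(x[:len(x) - lag], x[lag:]) for lag in range(17)])
    print(name, [round(float(rho(r, M)), 3) for M in (1, 4, 8, 16)])
# rho_M stays near 1 for the flat signal and falls with M toward the (small)
# spectral flatness of the coloured signal, i.e. toward an ill-conditioned problem.
```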

PREEMPHASIS OF THE SPEECH DATA. Differencing the input speech signal accentuates the higher formants, allowing more accurate formant tracking. One approach to preemphasis is to use a low-order inverse filter and maximize the spectral flatness of its output. A simple first-order preemphasis filter is proposed for this purpose.

First-Order Preemphasis. Preemphasis filter form: 1 − µz^{-1}, where µ = r_s(1) / r_s(0) and r_s(n) is the autocorrelation sequence of the input data {s_n}. If {f_n} is the time sequence at the preemphasis filter output, then Ξ(F) = Ξ(S) r_s(0) / r_f(0), and a direct evaluation gives r_f(0) = (1 + µ²) r_s(0) − 2µ r_s(1). The choice µ = r_s(1)/r_s(0) minimizes r_f(0), giving the maximum flatness Ξ(F)_max = Ξ(S) / (1 − µ²).
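
A quick check of these formulas on synthetic data (my own sketch; the colouring filter is arbitrary): with µ = r_s(1)/r_s(0), the output energy r_f(0) = (1 + µ²)r_s(0) − 2µ r_s(1) collapses to (1 − µ²)r_s(0), and the measured flatness of the preemphasized signal rises accordingly:

```python
import numpy as np

def flatness(x, nfft=1 << 14):
    P = np.abs(np.fft.fft(x, nfft)) ** 2
    return np.exp(np.mean(np.log(P))) / np.mean(P)

rng = np.random.default_rng(5)
s = np.convolve(rng.standard_normal(1024), [1.0, 1.7, 0.72])[:1024]   # spectrally tilted test signal

r0 = np.dot(s, s)                        # r_s(0)
r1 = np.dot(s[:-1], s[1:])               # r_s(1)
mu = r1 / r0                             # first-order predictor coefficient
f = np.convolve(s, [1.0, -mu])           # preemphasis output f_n = s_n - mu * s_{n-1}

print("r_f(0)                         :", np.sum(f ** 2))
print("(1 + mu^2) r_s(0) - 2 mu r_s(1):", (1 + mu ** 2) * r0 - 2 * mu * r1)
print("(1 - mu^2) r_s(0)              :", (1 - mu ** 2) * r0)      # same value for this choice of mu
print("flatness before:", flatness(s), "  after:", flatness(f))    # after ~ before / (1 - mu^2)
```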

Experimental Results Spectral Flatness with Preemphasis (a) Rectangular Window, (b) Hamming Window.

Conclusions. A spectral-flatness measure has been developed, with numerical values from 0 to 1; a perfectly flat or constant spectrum has a flatness of 1, i.e., 0 dB. The lower the spectral flatness, the more ill-conditioned the problem. Preemphasis of the speech signal by means of a one-term linear predictor was shown to greatly enhance the spectral flatness of the signal.