Lab 9a. Linear Predictive Coding for Speech Processing


EE275 Lab, October 27, 2007

Figure 1: Block diagram of the simplified model of speech production. (Diagram labels: Pitch Period, Impulse Train Generator, Voiced/Unvoiced Speech Switch, Vocal Tract Parameters, Time-Varying Digital Filter H(z), Random Noise Generator.)

Sections 0.4 and 0.5 contain the Lab Experiment and the Lab Report requirements.

0.1 Basic Principles of Linear Predictive Analysis

The basic discrete-time model of speech production is shown in Figure 1. The composite spectral effects of radiation, the vocal tract, and the glottal excitation are represented by a time-varying digital filter. Over short intervals during which the parameters can be considered stationary, the filter is treated as time-invariant. The steady-state transfer function H(z) of the filter part of the model is

H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_p z^{-p}}    (1)

The vocal tract system is excited by the signal u[n], which is an impulse train for voiced speech or random noise for unvoiced speech. The parameters of this speech model are therefore: the voiced/unvoiced classification, the pitch period for voiced speech, the gain parameter G, and the filter coefficients {a_k}. These are the parameters that are transmitted in coded speech.

There are many methods for estimating the pitch period and the voiced/unvoiced classification. They are not discussed here and are not implemented in this demo. What is implemented is a method for determining the filter coefficients (lattice filter coefficients, referred to as reflection coefficients). It is these filter coefficients that are transmitted, along with a residual signal, instead of the parameters listed above.

We consider the simplified all-pole model of Figure 1, equation (1), as the natural representation of non-nasal voiced sounds. (For nasals and fricatives, acoustic theory calls for both poles and zeros in the vocal tract transfer function H(z).)
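To make equation (1) and the Figure 1 model concrete, here is a minimal Python/NumPy sketch (not part of the MATLAB demo) that synthesizes a voiced, vowel-like segment by driving an all-pole filter with a pitch-period impulse train. The sampling rate, pitch period, gain, and coefficient values below are illustrative assumptions only.

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000                 # sampling rate in Hz (illustrative)
pitch_period = 80         # 80 samples -> 100 Hz pitch (illustrative)
G = 0.5                   # gain parameter G

# Voiced excitation u[n]: a pitch-period impulse train. Replace it with
# np.random.randn(...) for an unvoiced, noise-like excitation.
u = np.zeros(fs // 4)                     # 0.25 s of excitation
u[::pitch_period] = 1.0

# Illustrative predictor coefficients a_k; in practice they come from
# linear predictive analysis of real speech (Sections 0.2 and 0.3).
a = np.array([1.3, -0.8, 0.3, -0.1])

# Synthesis filter H(z) = G / (1 - a_1 z^-1 - ... - a_p z^-p):
# the denominator polynomial is A(z) = 1 - sum_k a_k z^-k.
A = np.concatenate(([1.0], -a))
s = lfilter([G], A, u)                    # synthetic "speech" s[n]
```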

In practice, if the filter order p is high enough, the all-pole model provides a fairly good representation of almost all speech sounds. The major advantage of the all-pole model is that the gain parameter G and the filter coefficients a_k can be estimated in a straightforward and computationally efficient way using the method of linear predictive analysis.

0.2 Linear Prediction Analysis & Synthesis Filters

We assume that speech is modeled as shown in Figure 1. The speech s[n] is related to the excitation u[n] by

s[n] = \sum_{k=1}^{p} a_k s[n-k] + G u[n]    (2)

To obtain the model coefficients, we proceed as follows. Suppose we try to predict the signal s[n] at time n from its previous values at times n-1, n-2, and so on. A linear predictor with prediction coefficients \alpha_k is a system whose output is

\hat{s}[n] = \sum_{k=1}^{p} \alpha_k s[n-k]    (3)

The transfer function of the p-th order linear predictor of equation (3) is the polynomial

P(z) = \sum_{k=1}^{p} \alpha_k z^{-k}

The prediction error e[n] is defined as

e[n] = s[n] - \hat{s}[n] = s[n] - \sum_{k=1}^{p} \alpha_k s[n-k]    (4)

Equivalently, E(z) = A(z) S(z), where

A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}

Comparing equations (2) and (4), it is seen that when the speech signal obeys the model of (2) exactly, then \alpha_k = a_k exactly. In that case e[n] = G u[n] and E(z) = G U(z). Thus the prediction error filter A(z) is the inverse filter of the system H(z) of (1); that is,

E(z) = G U(z) = A(z) S(z)

and hence

H(z) = \frac{S(z)}{U(z)} = \frac{G}{A(z)}

So A(z) is the analysis filter and H(z) is the synthesis filter. The basic problem of linear prediction analysis is to determine the set of predictor coefficients \alpha_k directly from the speech signal. Because of the non-stationary nature of speech, the coefficients are determined for short segments over which the signal can be considered approximately stationary. They are found by minimizing the mean-square prediction error. The resulting coefficients are then taken as the parameters of the system function H(z), which is used to synthesize that speech segment. The method of determining these coefficients is outlined below.
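Before the derivation, here is a minimal sketch of the analysis/synthesis filtering just described, written in Python/SciPy rather than in the MATLAB demo itself and assuming the predictor coefficients \alpha_k are already available for the segment: the analysis filter A(z) produces the residual e[n], and the synthesis filter 1/A(z) recovers the segment from the residual. The segment length and coefficient values are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def analysis_synthesis(segment, alpha):
    """Apply A(z) and then 1/A(z) to one short speech segment.

    segment : 1-D array, a short (quasi-stationary) block of speech
    alpha   : predictor coefficients [alpha_1, ..., alpha_p]
    """
    # Analysis (prediction-error) filter A(z) = 1 - sum_k alpha_k z^-k (all-zero).
    A = np.concatenate(([1.0], -np.asarray(alpha, dtype=float)))
    residual = lfilter(A, [1.0], segment)      # e[n] = A(z) s[n]

    # Synthesis filter 1/A(z) (all-pole) reconstructs the segment from e[n].
    reconstructed = lfilter([1.0], A, residual)
    return residual, reconstructed

# Example with made-up data: the reconstruction matches the input.
rng = np.random.default_rng(0)
seg = rng.standard_normal(160)                 # pretend 20 ms frame at 8 kHz
e, s_hat = analysis_synthesis(seg, alpha=[1.3, -0.8, 0.3, -0.1])
print(np.allclose(seg, s_hat))                 # True, up to numerical precision
```

In the demo the same idea is implemented with lattice filters parameterized by reflection coefficients rather than the direct-form A(z) used in this sketch.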

0.3 Minimum Mean-Square Error and the Orthogonality Principle

We view the linear prediction problem of equation (3) as predicting one random variable from a set of other random variables. Given random variables (x_1, x_2, ..., x_n), we wish to find n constants a_1, a_2, ..., a_n that form a linear estimate of a random variable s as

\hat{s} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n    (5)

This is done by requiring that the mean-square value of the resulting error \epsilon = s - \hat{s},

P = E\{ [s - (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)]^2 \},

be minimum. Setting the partial derivatives of P to zero,

\frac{\partial P}{\partial a_i} = E\{ 2 [s - (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)] (-x_i) \} = 0    (6)

for i = 1, 2, ..., n yields the so-called Yule-Walker equations

R_{11} a_1 + R_{12} a_2 + \cdots + R_{1n} a_n = R_{01}
R_{21} a_1 + R_{22} a_2 + \cdots + R_{2n} a_n = R_{02}
R_{31} a_1 + R_{32} a_2 + \cdots + R_{3n} a_n = R_{03}
\vdots
R_{n1} a_1 + R_{n2} a_2 + \cdots + R_{nn} a_n = R_{0n}    (7)

where

R_{ji} = E\{ x_i x_j \},    R_{0j} = E\{ s x_j \}.

If the data x_i are linearly independent, the determinant of the coefficient matrix R_{ij} is positive, so the system has a unique solution. Equation (7) is solved for the unknown coefficients a_k, k = 1, 2, ..., n (the \alpha_k of the previous section) using the so-called Levinson-Durbin algorithm. Accordingly, the problem essentially consists of determining, for a short segment of speech, the matrix of correlation coefficients R_{ij} and then solving (in effect, inverting) that matrix to obtain the prediction coefficients, which are then transmitted. All of this often has to be done in real time.

0.4 MATLAB LPC DEMO

Run the demo as per the instructions in Lab 9.

Demo Description

The demo consists of two parts: analysis and synthesis. The analysis portion is found in the transmitter section of the system.

Analysis Section: In this simulation, the speech signal is divided into frames of 20 ms (160 samples) with an overlap of 10 ms (80 samples). Each frame is windowed using a Hamming window.
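For one windowed frame, the autocorrelation method reduces equation (7) to a symmetric Toeplitz system, which a Levinson-type solver handles efficiently. The following Python/SciPy sketch illustrates this step; the synthetic frame data and the choice of order p = 10 are illustrative assumptions, while the 160-sample (20 ms at 8 kHz) frame length matches the demo description above. The actual demo goes on to convert the result to reflection coefficients for its lattice filter.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_frame(frame, p):
    """Autocorrelation-method LPC for one frame.

    Windows the frame, builds the Toeplitz system of equation (7) from its
    autocorrelation lags, and solves it with a Levinson-type solver.
    Returns the predictor coefficients [alpha_1, ..., alpha_p].
    """
    w = frame * np.hamming(len(frame))
    # Autocorrelation lags r[0..p] of the windowed frame.
    full = np.correlate(w, w, mode="full")
    r = full[len(w) - 1 : len(w) + p]
    # Equation (7) in Toeplitz form: Toeplitz(r[0..p-1]) * alpha = r[1..p].
    alpha = solve_toeplitz(r[:p], r[1 : p + 1])
    return alpha

# Example on a synthetic 20 ms frame (160 samples at 8 kHz), order p = 10.
rng = np.random.default_rng(1)
frame = rng.standard_normal(160)
alpha = lpc_frame(frame, p=10)
```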

Each windowed frame of the original speech signal is then passed through an analysis filter, which is an all-zero filter. It is a so-called lattice filter whose coefficients, referred to as reflection coefficients, are obtained in the previous step. The output of this filter is called the residual signal. The residual is what is transmitted here, along with the filter coefficients. In this demo, the analysis section output is simply connected to the synthesis portion.

Synthesis Section: The residual signal is passed through a synthesis filter, which is the inverse of the analysis filter. The output of the synthesis filter is the original signal.

0.5 LAB REPORT

Give a brief description of what exactly is happening in the analysis and synthesis portions of the MATLAB LPC speech analysis and synthesis demo. Observe the residual signal and the filter coefficients generated in the analysis section that are then transmitted to the synthesis section.

Figure 2: Ref: MATLAB Help, Linear Predicting & Coding of Speech.
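The demo's lattice analysis filter is parameterized by reflection coefficients rather than by the direct-form coefficients \alpha_k. The Levinson-Durbin recursion that solves equation (7) produces these reflection coefficients as a by-product. The sketch below is one common formulation of that recursion in Python (sign conventions vary across texts, and this is an illustrative implementation, not the demo's code); it returns the predictor coefficients, the reflection coefficients, and the final prediction error energy for a given set of autocorrelation lags.

```python
import numpy as np

def levinson_durbin(r, p):
    """Levinson-Durbin recursion for the autocorrelation method.

    r : autocorrelation lags r[0..p] of a windowed frame
    p : prediction order
    Returns (alpha, k, E): predictor coefficients alpha_1..alpha_p,
    reflection coefficients k_1..k_p, and the prediction error energy E.
    If all |k_i| < 1, the synthesis filter 1/A(z) is stable.
    """
    r = np.asarray(r, dtype=float)
    alpha = np.zeros(p)
    k = np.zeros(p)
    E = r[0]
    for i in range(1, p + 1):
        prev = alpha[:i - 1]                        # order-(i-1) coefficients
        # Reflection coefficient for stage i.
        acc = r[i] - np.dot(prev, r[1:i][::-1])
        k[i - 1] = acc / E
        # Update the predictor coefficients to order i.
        new_alpha = alpha.copy()
        new_alpha[i - 1] = k[i - 1]
        new_alpha[:i - 1] = prev - k[i - 1] * prev[::-1]
        alpha = new_alpha
        # Update the prediction error energy.
        E *= 1.0 - k[i - 1] ** 2
    return alpha, k, E

# Example with made-up autocorrelation lags.
alpha, k, E = levinson_durbin([1.0, 0.6, 0.2, -0.1], p=3)
```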

Class notes: mirchand/ee276-2003