LPC Analysis
Thursday, October 29


Prediction & Regression

We hypothesize that there is some systematic relation between the values of two variables, X and Y. If this hypothesis is true, we can (partially) predict the observed values of one (Y) from observations of the other (X). When we do this we say we regress the values of the dependent variable, Y, onto the independent variable, X.

Linear Regression

The simplest form of systematic relationship between two variables is a linear one. This means that the predicted values of Y fall on a straight line, which can be characterized by a slope (b) and a Y-intercept (I):

Ypred = b*X + I
Res = Y - Ypred
Y = b*X + I + Res
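To make this concrete, here is a minimal MATLAB sketch with made-up toy data (hypothetical values, not the coffee data used below), using the built-in polyfit rather than the regress function defined later:

% Toy data (hypothetical, for illustration only)
X = [1 2 3 4 5]';              % independent variable
Y = [2.1 3.9 6.2 8.1 9.8]';    % dependent variable
b = polyfit(X, Y, 1);          % b(1) = slope, b(2) = intercept
Ypred = b(1)*X + b(2);         % predicted values fall on a straight line
Res = Y - Ypred;               % residuals: what the line fails to predict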

Example: predicting coffee preferences

Suppose I want to learn how to maximize my coffee pleasure. I can rate my morning coffee on a scale of 1-7 and record the values of some potential predictor variables, such as:

days since ROAST
hours since GRINDing
DARKness of roast

Then I can see how well these variables predict my coffee preferences.

Regression Results: ROAST

>> plotreg (roast, rating);

[Figure: ratings vs. roast, with the regression line (Ypred)]

What is RSQ?

RSQ = R^2, where R is the correlation coefficient, a measure of the strength of the association between the two variables. It is also equal to the proportion of the variance of Y that is explained by the variance in X:

RSQ = variance of Ypred / variance of Y
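Continuing the toy sketch above, both definitions of RSQ can be checked in a couple of lines (corrcoef and var are built-in MATLAB functions):

% Two equivalent ways to compute RSQ for a least-squares line
R = corrcoef(X, Y);            % 2x2 matrix of correlation coefficients
Rsq1 = R(1,2)^2;               % squared correlation coefficient
Rsq2 = var(Ypred)/var(Y);      % variance explained / total variance
% Rsq1 and Rsq2 agree (up to rounding) for a least-squares fit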

Displaying Regression Results

function [b, res, rsq] = plotreg (X, Y)
% Louis Goldstein
% 27 October 2009
%
% input arguments:
%   X = independent variable column vector
%   Y = dependent variable column vector
%
% output arguments:
%   b   = vector of regression coefficients,
%         one for each column of X, plus a constant
%   res = vector of residuals
%   rsq = R^2

[b, res, rsq] = regress (X, Y);

% calculate line showing Ypred for the range of X values
xline = linspace(0, max(X), 100);
yline = b(1)*xline + b(2);

% plot data points and regression line
plot (X, Y, 'or', xline, yline, 'b');
title (['slope = ', num2str(b(1)), '; RSQ = ', num2str(rsq)]);

How does regression work?

(1) Y = b*X + I

Matrix form:

(2) Y = X*b, where X is a matrix, [Xdata ones], and b is a column vector with one entry for each column of X (the coefficient for the column of ones is the constant, I).

Premultiplying both sides by the inverse of X (strictly, the pseudoinverse, since X is not square):

(3) X^-1 * Y = X^-1 * X * b
(4) X^-1 * Y = b
(5) b = X\Y (MATLAB's backslash operator)
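The same toy fit can be done with the matrix formulation; because Xmat is not square, the backslash operator computes the least-squares solution:

% Append a column of ones so the constant becomes the last coefficient
Xmat = [X ones(length(X),1)];
b = Xmat \ Y;                  % least-squares solution of Y = Xmat*b
% b(1) is the slope and b(2) the intercept, matching polyfit above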

Regression in Matlab

function [b, res, Rsq] = regress (X, Y)

% Append a vector of ones to X.
% The coefficient associated with this vector is the constant.
N = length(X);
X = [X ones(N,1)];

% Do the regression
b = X\Y;

% Calculate residuals
Ypredicted = X * b;
res = Y - Ypredicted;

% Calculate Rsq
SSreg = (std(Ypredicted).^2) .* (N-1);
SStot = (std(Y).^2) .* (N-1);
Rsq = SSreg/SStot;

Results: GRIND

>> plotreg (grind, rating);

Reduced slope, reduced RSQ (compared with ROAST).

Results: DARK

>> plotreg (dark, rating);

Reduced slope, reduced RSQ.

Multiple Regression

If we want as good a prediction as possible of the dependent variable, we can attempt to fit a single linear equation to all of the possible X variables:

Y = b1*X1 + b2*X2 + b3*X3 + ... + bM*XM + K + Res

The b coefficients will be related to the b coefficients of the individual regressions (they will be equal if the variables are uncorrelated). RSQ will always increase as variables are added, but not every addition will produce a significant increase.

Coffee multiple regression

[b, Res, RSQ] = regress ([roast grind dark], rating);

b coefficients:
  ROAST  -0.3597
  GRIND  -0.1602
  DARK    0.1402

RSQ = 0.8946

Linear Prediction of Speech

Because speech is often periodic (with a variety of frequencies), there will be predictability (correlation) between sample points at different lags. We can do a multiple regression on a sequence of speech samples, attempting to predict their values from the values of the M previous sample points:

Y(k) = a(1)*Y(k-1) + a(2)*Y(k-2) + ... + a(M)*Y(k-M) + Res

We perform this over a window of data during which the correlations (and frequency components) are assumed not to change (5-10 ms).

Matrix Construction

Example: M = 5, window = 20 points

Y =
  -0.0023
  -0.0033
  -0.0027
  -0.0016
  -0.0024
  -0.0036
  -0.0029
  -0.0031
  -0.0048
  -0.0056
  -0.0070
  -0.0090
  -0.0137
  -0.0096
  -0.0078
  -0.0109
  -0.0101
  -0.0084
  -0.0071
  -0.0073

X =
  -0.0036  -0.0043  -0.0038  -0.0023  -0.0023
  -0.0023  -0.0036  -0.0043  -0.0038  -0.0023
  -0.0033  -0.0023  -0.0036  -0.0043  -0.0038
  -0.0027  -0.0033  -0.0023  -0.0036  -0.0043
  -0.0016  -0.0027  -0.0033  -0.0023  -0.0036
  -0.0024  -0.0016  -0.0027  -0.0033  -0.0023
  -0.0036  -0.0024  -0.0016  -0.0027  -0.0033
  -0.0029  -0.0036  -0.0024  -0.0016  -0.0027
  -0.0031  -0.0029  -0.0036  -0.0024  -0.0016
  -0.0048  -0.0031  -0.0029  -0.0036  -0.0024
  -0.0056  -0.0048  -0.0031  -0.0029  -0.0036
  -0.0070  -0.0056  -0.0048  -0.0031  -0.0029
  -0.0090  -0.0070  -0.0056  -0.0048  -0.0031
  -0.0137  -0.0090  -0.0070  -0.0056  -0.0048
  -0.0096  -0.0137  -0.0090  -0.0070  -0.0056
  -0.0078  -0.0096  -0.0137  -0.0090  -0.0070
  -0.0109  -0.0078  -0.0096  -0.0137  -0.0090
  -0.0101  -0.0109  -0.0078  -0.0096  -0.0137
  -0.0084  -0.0101  -0.0109  -0.0078  -0.0096
  -0.0071  -0.0084  -0.0101  -0.0109  -0.0078

Each row of X holds the M = 5 samples that precede the corresponding element of Y; column i of X is the signal delayed by i samples.

Interpretation of Regression

(1) Y = b1*X1 + b2*X2 + ... + bM*XM + K + Res
(2) Y(k) = a1*Y(k-1) + a2*Y(k-2) + ... + aM*Y(k-M) + K + Res

The residual is the signal that remains after the short-time resonances (due to the vocal tract) are removed, so it represents what is not predictable by a short-time filter: the pulses of the source. By rearranging (2), we can create an inverse filter that takes a speech signal as input and produces a sequence of pulses, E (ignoring K):

(3) E = Res = Y(k) - a1*Y(k-1) - a2*Y(k-2) - ... - aM*Y(k-M)

Inverse and Forward Filters

The inverse filter thus has the following form:

A(z) = 1 - a1*z^-1 - a2*z^-2 - ... - aM*z^-M
E = A(z)*Y

But we can also rearrange this to derive the forward filter, which generates the output (Y) from E and the filter:

Y = E/A(z)

The a coefficients are now in the denominator.
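In MATLAB, both operations can be carried out with the built-in filter function. A minimal sketch, assuming Y holds one windowed frame of speech and acoef = [1 -a1 -a2 ... -aM], as constructed in lpc_demo below:

% Inverse filter: numerator A(z), denominator 1 (an FIR filter).
% Takes the speech in and puts the residual/excitation out.
E = filter(acoef, 1, Y);
% Forward filter: numerator 1, denominator A(z) (an all-pole IIR filter).
% Regenerates a prediction of the speech from the excitation.
Yhat = filter(1, acoef, E);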

function [acoef, Res] = lpc_demo (signal, srate, winsize, M, ibeg)
% Input arguments:
%   signal   vector containing time series data
%   srate    sample rate in Hz
%   winsize  window size in number of samples
%   M        number of coefficients
%   ibeg     beginning sample number

% iend is the last sample in the window
iend = ibeg + winsize - 1;

% Y is a vector of winsize data points from signal.
% Y is the dependent variable for the regression.
Y = signal(ibeg:iend);

% Matrix X contains the independent variables for the regression.
% It contains M columns, each of which contains
% elements of signal (Y) delayed progressively by one sample.
for i = 1:M
    X(:,i) = signal(ibeg-i:iend-i);
end

% Perform the regression.
% The coefficients are the weights that are applied to each
% of the M columns in order to predict Y.
% Since the columns are delayed versions of Y,
% the weights represent the best prediction of Y from the
% previous M sample points.
[acoef, Res] = regress(X,Y);

% The predicted signal is given by subtracting the residual from Y
Pred = Y - Res;

% Plot original window and predicted
subplot (211), plot (signal(ibeg:iend))
title ('Original Signal')
subplot (212), plot (Pred)
title ('Predicted Signal')
subplot
pause

% Plot original window and residual
subplot (211), plot (signal(ibeg:iend))
title ('Original Signal')
subplot (212), plot (Res)
title ('Residual')
subplot
pause

% Get rid of the M+1st term returned by the regression.
% It just represents the mean (DC level) of the signal.
acoef(M+1) = [];

% The returned coefficients (acoef) can be used to predict
% Y(k) from the preceding M samples:
% E1: Y(k) = a(1)*Y(k-1) + a(2)*Y(k-2) + ... + a(M)*Y(k-M) + Res
% We want what is called the INVERSE filter, which will
% take Y as input and output the Res signal.
% That is, it will take all the structure out of the signal
% and give us an impulse sequence. This can be obtained from
% equation E1 by moving Y(k) and Res to the other side of E1:
% E2: Res = 1*Y(k) - a(1)*Y(k-1) - a(2)*Y(k-2) - ... - a(M)*Y(k-M)
% Thus, the first coefficient of the inverse filter is 1;
% the rest are the negatives of the coefficients returned.
% Note: -acoef' is used to convert from a column to a row vector.
acoef = [1 -acoef'];

% Plot magnitude of transfer function for the INVERSE FILTER.
% freqz returns the value of the transfer function, h,
% for a given filter numerator and denominator
% at npoints along the frequency scale.
% The frequency of each point (in rad/sample) is returned in w.
num = acoef;
den = 1;
npoints = 100;
[h, w] = freqz(num,den,npoints);
plot (w*srate./(2*pi), 20*log10(abs(h)))
grid
xlabel ('Frequency in Hz')
ylabel ('Magnitude of Transfer Function')
title (['INVERSE FILTER: M = ', num2str(M), ...
        ' winsize = ', num2str(winsize), ...
        ' Beginning sample = ', num2str(ibeg)])
pause

% Plot magnitude of transfer function if the coefficients are
% used as coefficients of the FORWARD filter (in the denominator).
% freqz returns the value of the transfer function, h,
% for a given filter numerator and denominator
% at npoints along the frequency scale.
% The frequency of each point (in rad/sample) is returned in w.
num = 1;
den = acoef;
npoints = 100;
[h, w] = freqz(num,den,npoints);
plot (w*srate./(2*pi), 20*log10(abs(h)))
grid
xlabel ('Frequency in Hz')
ylabel ('Log Magnitude of Transfer Function')
title (['FORWARD FILTER: M = ', num2str(M), ...
        ' winsize = ', num2str(winsize), ...
        ' Beginning sample = ', num2str(ibeg)])

Example

F  (Hz):  290  2070  2960  3300  3750
BW (Hz):   60   200   400   250   200

>> iy = syn_noplot (10000, F, BW, 100, .5);

[Figure: magnitude responses of the inverse filter and the forward filter]

>> Frec = formants (acoef, 10000)
Frec =
   1.0e+03 *
    0.2924    2.0714    2.9626    3.3014    3.7507

Compare the original formants, F: 290 2070 2960 3300 3750
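The formants function used here is course code that is not reproduced in this transcript. A minimal sketch of the standard root-solving approach it presumably implements (the name and exact behavior are assumptions):

function [F, BW] = formants (acoef, srate)
% Estimate formant frequencies and bandwidths from LPC coefficients
% (assumed sketch, not the course's actual implementation).
r = roots(acoef);               % poles of the forward filter 1/A(z)
r = r(imag(r) > 0);             % keep one member of each complex pair
F = angle(r)*srate/(2*pi);      % pole angle -> frequency in Hz
BW = -log(abs(r))*srate/pi;     % pole radius -> bandwidth in Hz
[F, isort] = sort(F);           % order by frequency, as in the example
BW = BW(isort);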

Choice of M

We saw that a digital resonator (formant) can be represented with 2 feedback filter coefficients. So M needs to be at least 2*NF, where NF is the expected number of formants, given the length of the vocal tract (VT) and the sampling rate (sr).

Example: For a VT of 17 cm, formants are spaced 1000 Hz apart, on average. So for an sr of 10000 Hz, there will be about 5 formants below the Nyquist frequency, requiring 10 coefficients:

M = sr/1000

But the effect of the glottal resonator and other factors (anti-resonances) requires about 4 additional coefficients:

M = (sr/1000) + 4

and subtract 2 of these for a somewhat shorter vocal tract.

Recovered formant frequencies (Frec, 1.0e+03 * Hz) as a function of M:

Original F (Hz):   290     2070    2960    3300    3750

M = 10:   0.2655  2.2215  3.1564  3.6343  (only four formants recovered)
M = 12:   0.2924  2.0714  2.9626  3.3014  3.7507
M = 14:   0.2922  2.0698  2.9582  3.2986  3.7496

[Figure: formant tracks for "I", "yo-yo", "wow", "why" analyzed with M = 36, M = 48, and M = 60]

LPC of running speech: windowing

We can measure how the LPC parameters, and the F and BW values we calculate from them, change over the course of an utterance. Analysis frames need to be long enough to include at least one full cycle of the lowest VT resonance, and preferably somewhat longer. F1 can be as low as about 200 Hz (in high vowels), which would make 5 ms the minimum duration. We also want the formants to change smoothly (except near stop closures and releases), and not to show jitter from one frame to the next. So the longer the window, the better. But we also want to track meaningful VT changes smoothly, and if the window is too long, we could get discrete jumps from one frame to the next. Solution: use a relatively long window (e.g., 20 ms), but slide the window through time by a smaller step (e.g., 10 ms), so successive windows overlap.

We also want to weight the points near the middle of the window more heavily in the regression (reducing the effect of the window's own frequencies appearing in the output). So we multiply the signal in each window by a Hamming weighting function.

function [A, G] = get_lpc (signal, srate, M, window, slide)
%
% get_lpc
% Louis Goldstein
% March 2005
%
% Output arguments:
%   A       filter coefficients (M+1 rows x nframes columns)
%   G       gain coefficients (vector of length nframes)
%
% Input arguments:
%   signal  signal to be analyzed
%   srate   sampling rate in Hz
%   M       LPC order (default = srate/1000 + 4)
%   window  size of analysis window in ms (default = 20)
%   slide   no. of ms to slide analysis window for each frame (default = 10)

if nargin < 3, M = floor(srate/1000) + 4; end
if nargin < 4, window = 20; end
if nargin < 5, slide = 10; end

samp_tot   = length(signal);
samp_win   = fix((window/1000)*srate);
samp_slide = fix((slide/1000)*srate);
nframes = floor(samp_tot/samp_slide) - ceil((samp_win-samp_slide)/samp_slide);

A = []; G = [];
for i = 1:nframes
    begin = 1 + (i-1)*samp_slide;
    [Ai,Gi] = lpc (hamming(samp_win).*signal(begin:begin+samp_win-1), M);
    A = [A Ai'];
    G = [G Gi];
end

LPC synthesis

The LPC filter coefficients can be used to re-synthesize the speech (with an impulse source). This can be used to compress the data required to represent speech, but it can also be used to manipulate f0, formants, and intensity independently.

[Audio: original vs. re-synthesized]

Synthesis method

(1) Produce an impulse train.
(2) Take frames of samples and filter them through the M-order filter defined by the LPC coefficients.
(3) Concatenate the frames (as in syn1).

A frame-by-frame sketch follows.
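This is a minimal sketch of the method under stated assumptions: A, G, nframes, samp_slide, and srate come from get_lpc above, G is treated as each frame's filter gain, and the fixed 100 Hz source is illustrative (syn1 is course code not shown here):

% Resynthesize speech from LPC frames A (columns) and gains G
T0 = round(srate/100);                 % impulse period: f0 = 100 Hz (assumed)
Z = zeros(size(A,1)-1, 1);             % filter state carried across frames
out = [];
for i = 1:nframes
    exc = zeros(samp_slide,1);         % one frame of excitation
    exc(1:T0:end) = 1;                 % (1) impulse train
    [frame, Z] = filter(G(i), A(:,i), exc, Z);  % (2) forward (all-pole) filter
    out = [out; frame];                % (3) concatenate frames
end

Note that this naive sketch restarts the impulse train at each frame boundary; a real synthesizer would carry the source phase across frames.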

Formant Analysis

Plot the waveform, the excitation (gain), and the formants. Use the gain to zero out the formant display during silence. (M = 48; a sketch of the display follows.)
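A sketch of such a display, assuming Ftrack holds one row of formant frequencies per frame (e.g., from applying formants to each column of A) and that G and slide come from get_lpc; all names and the threshold are illustrative assumptions:

% Blank the formant display where the gain indicates silence
thresh = 0.05*max(G);                 % silence threshold (assumed)
Fplot = Ftrack;
Fplot(G < thresh, :) = NaN;           % NaN points are not drawn by plot
t = (0:length(G)-1)*slide/1000;       % frame times in seconds
subplot (311), plot (signal), title ('Waveform')
subplot (312), plot (t, G), title ('Excitation (Gain)')
subplot (313), plot (t, Fplot, '.'), title ('Formants'), xlabel ('Time (s)')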