The Z-Transform

For a phasor, X(k) = e^(jωk), we have previously derived:

Y = H(z)X

That is, the output of the filter, Y(k), is derived by multiplying the input signal, X(k), by the transfer function, H(z). However, this equation holds only when X(k) is a phasor. It is not true in the general case. Why not?
The Z-Transform (cont.)

Suppose X is the unit step and H(z) = z^-1. Multiplying by H(z) should delay X by one sample, but H(z)X = e^(-jω)X, which is not just a shift of X by one sample.

[Figure: unit step X(k), samples ..., -2, -1, 0, 1, 2, 3, ...]
The Z-Transform (cont.)

Y = H(z)X

It is possible to devise a transformation of a time sequence for which this equation will always hold. That transformation is the z-transform. If

- X*(z) is the z-transform of X(k),
- Y*(z) is the z-transform of Y(k), and
- H(z) is the transfer function of a filter,

then:

Y*(z) = H(z) X*(z)    (2)

What's useful about the z-transform is that it lets us derive the output of filtering by multiplying the input by the transfer function.
What is the Z-Transform?

The z-transform of a signal is derived by taking each sample value in sequence as the coefficient of successive negative powers of a complex variable z, and adding the terms together:

X*(z) = X(0)z^0 + X(1)z^-1 + X(2)z^-2 + ...
What is the Z-Transform? (cont.)

X*(z) = X(0)z^0 + X(1)z^-1 + X(2)z^-2 + ...

Thus the z-transform is a polynomial in negative powers of z, just as H is. Note that the z-transform is itself a function of z, not a number: given a value for z (a point in the complex plane), it returns a value for X*(z).
What is the Z-Transform? (cont.)

Multiplying the z-transform of a signal by z^-1 is equivalent to shifting the signal one sample later in time. This was a property of z^-1 that we illustrated for phasors, but once a signal is transformed into the z-domain, it holds for any signal.

X*(z) = X(0)z^0 + X(1)z^-1 + X(2)z^-2 + ...
H(z) = z^-1
Y*(z) = X(0)z^-1 + X(1)z^-2 + X(2)z^-3 + ...

Y*(z) is the z-transform of X delayed by one sample, that is, the z-transform of X(k-1).
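This delay property can be checked numerically. Below is a minimal Python sketch; the signal values and the function name `ztransform` are illustrative, not part of the lecture code:

```python
import numpy as np

def ztransform(x, z):
    """Evaluate X*(z) = sum_k x(k) z^(-k) at a single point z."""
    k = np.arange(len(x))
    return np.sum(np.asarray(x) * z**(-k))

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])   # arbitrary example signal
y = np.concatenate(([0.0], x))             # x delayed by one sample

z = np.exp(1j * 0.7)                       # an arbitrary point on the unit circle

# The z-transform of the delayed signal equals z^-1 times that of the original.
assert np.isclose(ztransform(y, z), z**-1 * ztransform(x, z))
```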
[Figure: X(k) at samples ..., -2, -1, 0, 1, 2, 3, ...; Y(k), X delayed by one sample, at samples ..., -1, 0, 1, 2, 3, 4, ...]

X*(z) = X(0)z^0 + X(1)z^-1 + X(2)z^-2 + ...
Y*(z) = Y(0)z^0 + Y(1)z^-1 + Y(2)z^-2 + ...
      = X(-1)z^0 + X(0)z^-1 + X(1)z^-2 + ...
Discrete Fourier Transform (DFT)

The values of X*(z) around the unit circle have a special meaning. These are the values for which z = e^(jω). The value of the z-transform at a point on the circle, X*(e^(jω)), for a particular value of ω tells us about the content of the signal at that frequency:

- Magnitude(X*(e^(jω))): the amplitude of that frequency.
- Arg(X*(e^(jω))): the phase of that frequency.

This evaluation of X*(e^(jω)) around the unit circle is referred to as the discrete Fourier transform (DFT) of the signal.
Discrete Fourier Transform (DFT) (cont.)

Why does the evaluation of the z-transform around the unit circle have this "special" property? Consider how similar this is to calculating the frequency response of a filter. A moving-average filter can be considered a time waveform: it is the impulse response of the filter. Its coefficients can also be considered coefficients of a polynomial in z.

To get the frequency response, we:
- chose a set of test frequencies (a vector ω),
- computed z from these (z = e^(jω)),
- evaluated the polynomial (for a given ω) by summing its terms.

To get the frequency content, we:
- consider successive signal points to be the coefficients of successive powers of z,
- choose a set of test frequencies (a vector ω),
- compute z from these (z = e^(jω)),
- evaluate the polynomial (for a given ω) by summing its terms.

In either case, the magnitude of the resulting complex number corresponds to the amplitude response, while the argument corresponds to the phase response.
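The connection can be verified numerically: evaluating X*(z) = Σ X(k)z^-k at N equally spaced points z = e^(j2πm/N) reproduces the standard DFT. A numpy sketch (the example signal is arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5])   # arbitrary example signal
N = len(x)

# Evaluate X*(z) = sum_k x(k) z^(-k) at z = e^(j*2*pi*m/N), m = 0..N-1
X = np.array([sum(x[k] * np.exp(2j*np.pi*m/N)**(-k) for k in range(N))
              for m in range(N)])

# This matches the standard DFT as computed by numpy's FFT
assert np.allclose(X, np.fft.fft(x))
```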
DFT vs. Frequency Response

There is an important way in which computing the DFT of a signal differs from computing the frequency response of a filter:

- Frequency response: allows an arbitrary number of test frequencies.
- DFT: limited to N frequencies (distributed equally around the entire unit circle), where N is the number of points in the signal being analyzed.

These N frequencies have the property that if we add sinusoids at those N frequencies with the amplitudes and phases calculated in the DFT, we reconstruct the original signal exactly.
Discrete Fourier Transform (DFT) (cont.)

Since these N frequencies go all the way around the unit circle, half of them are redundant. The useful part of the unit circle (0 to π radians) is thus divided into (N/2) + 1 frequencies.

[Figure: unit circle with frequencies marked at 0, π/2, and π]
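For a real signal this redundancy is conjugate symmetry: bin N-m is the complex conjugate of bin m, so only bins 0 through N/2 carry new information. A small numpy check (the example signal is random):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)        # real example signal, N = 8
X = np.fft.fft(x)
N = len(x)

# X[N-m] is the complex conjugate of X[m]: the upper half is redundant.
for m in range(1, N//2):
    assert np.isclose(X[N-m], np.conj(X[m]))
```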
function dft(x)

% first find the number of points
N = length(x);

% Fill a vector of test frequencies to plot:
% N frequencies evenly spaced between 0 and 2*pi radians
w = linspace(0, 2*pi - (2*pi)/N, N);

% Compute a vector of z values from w
z = exp(j*w);

% Create a vector F that will eventually contain the dft.
% There will be N elements in this vector.
% First set all the elements of this vector to 0.
F = zeros(1,N);
% Now compute F by summing the terms of the polynomial in z
for n = 0:N-1
    F = F + (x(n+1) .* (z.^(-n)));
end

% Plot the magnitude of F
figure(1)
w = w/pi;
stem(w, abs(F))
xlabel('frequency in fractions of pi')
ylabel('amplitude')
pause

% Plot the argument of F
figure(2)
plot(w, angle(F))
xlabel('frequency in fractions of pi')
ylabel('phase')
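For comparison, here is a hypothetical Python port of the same polynomial evaluation (using the negative powers of z from the z-transform definition), validated against numpy's built-in FFT; the function name `dft_poly` and the test signal are illustrative:

```python
import numpy as np

def dft_poly(x):
    """DFT by evaluating the z-transform polynomial at N points on the unit circle."""
    N = len(x)
    w = np.linspace(0, 2*np.pi - 2*np.pi/N, N)   # N test frequencies
    z = np.exp(1j * w)                           # points on the unit circle
    F = np.zeros(N, dtype=complex)
    for n in range(N):                           # sum the polynomial terms
        F = F + x[n] * z**(-n)
    return F

x = np.array([1.0, 0.5, -0.25, 2.0])             # arbitrary example signal
assert np.allclose(dft_poly(x), np.fft.fft(x))
```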
Example

>> srate = 10000;
>> f600 = phasor(srate,600);
>> dft(f600(1:50));

[Plot: DFT magnitude; frequency axis runs from 0 Hz up to 5000 Hz and back to 0 Hz]
The output spectrum runs from 0 Hz to the Nyquist frequency. There is a non-zero frequency component only at the frequency of the phasor, 600 Hz.
Example

>> srate = 10000;
>> f200 = phasor(srate,200);
>> dft(f200(1:50));

[Plot: DFT magnitude; frequency axis runs from 0 Hz up to 5000 Hz and back to 0 Hz; non-zero component at 200 Hz]
Example

>> f26 = f200 + f600;
>> dft(f26(1:50));

[Plot: DFT magnitude with non-zero components at 200 Hz and 600 Hz]
Frequency Resolution

Since there are N equally spaced frequencies returned by the DFT between 0 Hz and srate, the frequency spacing of the Fourier components can be calculated as follows:

fint = srate/N

Since the duration (in seconds) of the analyzed signal is equal to N/srate, it is also true that:

fint = 1/signal_duration

So if we take a 10 ms dft, we will have a frequency resolution of 100 Hz.
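As a quick arithmetic check, using the sample rate from the examples above:

```python
srate = 10000              # samples per second
N = 100                    # number of points: 10 ms at 10 kHz

fint = srate / N           # spacing of the Fourier components, in Hz
duration = N / srate       # duration of the analyzed signal, in seconds

assert fint == 100.0                      # 100 Hz resolution for a 10 ms dft
assert abs(fint - 1/duration) < 1e-9      # same answer from fint = 1/signal_duration
```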
Leakage

If we analyze a 500 Hz phasor using the same analysis conditions, we get a surprising result:

>> f500 = phasor(srate,500);
>> spectrum(f500(1:50), srate)

Where do all these non-zero frequencies come from? They are called leakage.
Leakage (cont.)

The problem is that Fourier analysis assumes that the section of signal analyzed repeats infinitely in both directions in time. If the analyzed signal contains a fractional period, then duplicating the analyzed signal indefinitely in time produces a signal with transients.

[Figure: 50 samples of a 500 Hz signal repeated, with discontinuities at the boundaries]
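The contrast can be demonstrated directly. At srate = 10000 with N = 50, a 600 Hz phasor completes exactly 3 periods in the window, while a 500 Hz phasor completes 2.5 periods. A numpy sketch (the phasors are built inline, since the lecture's `phasor` function is not shown here):

```python
import numpy as np

srate = 10000
n = np.arange(50)                      # 50 samples = 5 ms window
f600 = np.exp(2j*np.pi*600*n/srate)    # exactly 3 periods in the window
f500 = np.exp(2j*np.pi*500*n/srate)    # 2.5 periods: fractional!

bins_600 = np.abs(np.fft.fft(f600))
bins_500 = np.abs(np.fft.fft(f500))

# 600 Hz falls exactly on bin 3 (600 = 3 * srate/50): all energy in one bin.
assert np.sum(bins_600 > 1e-6) == 1
# 500 Hz falls between bins 2 and 3: energy leaks into many bins.
assert np.sum(bins_500 > 1e-6) > 10
```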
Windowing

To get around this problem, we multiply the segment to be analyzed by a windowing function, which eliminates the transients.

[Figure: Hamming and Hanning window shapes]
Windowing: The Signal

[Figure: unwindowed signal vs. windowed signal (Hanning)]

>> plot([hanning(50)'.*f500(1:50)], 'o-');
Windowing: The Spectrum

[Figure: unwindowed spectrum vs. windowed spectrum]

>> spectrum(hanning(50)'.*f500(1:50), srate)
Windowing: Note

Note that windowing cannot completely solve the problem. For one thing, there is no Fourier component at 500 Hz when N = 50, only components at 400 Hz and 600 Hz. Also, not all of the other leakage components have been completely eliminated.
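The improvement can be quantified by comparing the leakage energy in bins far from the signal frequency, with and without a Hanning window. A numpy sketch (the window and the 500 Hz test signal are built inline; the bin range and the factor of 10 are illustrative):

```python
import numpy as np

srate = 10000
n = np.arange(50)
f500 = np.cos(2*np.pi*500*n/srate)          # real 500 Hz sinusoid, 2.5 periods

hann = 0.5 - 0.5*np.cos(2*np.pi*n/50)       # Hanning window (periodic form)
raw = np.abs(np.fft.fft(f500))
win = np.abs(np.fft.fft(hann*f500))

# Away from the 500 Hz peak (near bins 2-3 and their mirror near bins 47-48),
# the windowed analysis leaks far less energy than the unwindowed one.
assert np.sum(win[6:45]**2) < np.sum(raw[6:45]**2) / 10
```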
More points...

>> spectrum(f500(1:100), srate)
Zero-Padding

There are cases when we want to use a very small time window to analyze a signal. If a signal is changing over time, we do not want to use a window whose duration includes significant change in frequency, amplitude, or phase. Consider the problem of formant frequency analysis: if we analyze a segment of a speech signal (e.g., a segment of the vowel in a synthetic [da] syllable) using a 25 ms time window, we will get frequency components that are 40 Hz apart.
Zero-Padding (cont.)

This means that individual harmonics of the voiced source will be resolved.*

*Note that it is usual to plot a spectrum on a decibel (dB) scale, which is a log scale.

>> spectrum(hamming(250).*da(2501:2750), 10000)
Zero-Padding (cont.)

We may want to find single peaks in the spectrum corresponding to the vocal tract resonances. In this case, we should analyze a shorter time window: formant resonances are, in fact, on a much faster time scale than the voiced source. However, if we reduce the number of input points to 40, we will only get 40 frequencies.

>> spectrum(hamming(40).*da(2501:2540), 10000)
Zero-Padding (cont.)

To solve this, we can pad the input signal with extra zeros. This means we can get the value of more frequencies, and the spectrum will look smoother.

[Figure: smoothed spectrum with formant peaks labeled F1, F2, F3. We can now see the formants more clearly.]

>> spectrum(hamming(40).*da(2501:2540), 10000, 256)
function spectrum(signal, srate, Nfreqs)

if nargin < 3
    N = length(signal);
else
    N = Nfreqs*2;
end

Y = fft(signal, N);

% get frequency values of successive fft points
w = 0:2*pi/N:2*pi-(2*pi/N);   % these are values in radians/sample
freq = w*srate/(2*pi);        % these are values in Hz

figure(1)
stem(freq(1:(N/2)+1), abs(Y(1:(N/2)+1)))
xlabel('Frequency in Hz')
ylabel('Amplitude')
grid

figure(2)
plot(freq(1:(N/2)+1), 20*log10(abs(Y(1:(N/2)+1))))
xlabel('Frequency in Hz')
ylabel('Amplitude in dB')
grid
Zero-Padding (cont.)

Note that we cannot increase frequency resolution this way, but we get a better picture of how the energy is distributed between the (non-padded) frequency components. If you look at the code for the dft, you can see that it will get exactly the same answer for the frequencies the two analyses have in common (why?). So all we are doing is allowing ourselves to evaluate the z-transform of the input at more than N frequencies. And it turns out that if you add the resulting components up, they will reproduce the original signal, followed by some number of (padding) zeros.
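The claim that the padded and unpadded analyses agree at their common frequencies is easy to verify: padding 50 points out to 200 gives four times as many bins, and every 4th padded bin coincides with one of the original frequencies (numpy sketch, random example signal):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(50)        # arbitrary 50-point example signal

X50 = np.fft.fft(x)                # 50 frequencies
X200 = np.fft.fft(x, 200)          # zero-padded to 200 frequencies

# Every 4th padded bin lands on one of the original 50 frequencies,
# and the values there are identical.
assert np.allclose(X200[::4], X50)
```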
Zero-Padding (cont.)

Note that padding will also help with the case of a signal that is not exactly periodic in the window (i.e., leakage).

[Figure: unpadded spectrum vs. zero-padded spectrum]

>> spectrum(hamming(50).*f500(1:50), 10000, 200)
Pre-emphasis

In order to analyze formant resonances it is sometimes useful to remove the 6 dB/octave decrease in spectral energy that is due to the nature of the voiced source. (The voiced source actually rolls off at 12 dB/octave, but half of that is compensated by the radiation of the sound from the lips.) This 6 dB/octave decrease can be removed by high-pass filtering the speech; this is called pre-emphasis. It can easily be done in MATLAB by:
- using the filter command with b coefficients = [1 -1], or
- using the diff command (first difference).
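The equivalence of the two approaches can be illustrated in Python: the first difference y(k) = x(k) - x(k-1) is the same as an FIR filter with coefficients [1, -1], here applied via convolution (the signal values are an arbitrary example):

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 25.0])   # arbitrary example signal

# First difference: y(k) = x(k) - x(k-1), a simple high-pass (pre-emphasis) filter.
d = np.diff(x)

# The same operation as an FIR filter with b = [1, -1].
y = np.convolve(x, [1, -1])[1:len(x)]            # skip the k=0 startup sample

assert np.allclose(y, d)
```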
Pre-emphasis: Example

[Figure: spectra without and with pre-emphasis]

>> spectrum(hamming(40).*diff(da(2501:2541)), 10000, 256)
Pre-emphasis: Example (cont.)

[Figure: spectrum with pre-emphasis]

The resulting peaks in the spectrum agree well with the formant frequencies that were specified to the synthesizer: [636 1129 2379 3933].
Spectrograms

A spectrogram shows how the spectrum of a signal changes over time. A spectrogram can be generated in MATLAB by calculating a sequence of FFTs, which results in a matrix of analysis frames by frequency, with the amplitude as the value in each cell of the matrix. We can then use the image (or imagesc) command to create an image in which each cell of the fft matrix controls the grey-scale value of a rectangular patch in the image.
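The frame-by-frame FFT idea can be sketched in a few lines of Python. The function name `sgram` and all parameter values below are illustrative, not the lecture's make_spect:

```python
import numpy as np

def sgram(x, wsize, shift):
    """Spectrogram sketch: magnitude FFT of successive windowed frames."""
    w = np.hanning(wsize)
    starts = range(0, len(x) - wsize + 1, shift)
    frames = [np.abs(np.fft.rfft(w * x[s:s+wsize])) for s in starts]
    return np.array(frames).T      # rows: frequency bins, columns: analysis frames

x = np.random.default_rng(2).standard_normal(1000)   # stand-in for a speech signal
sg = sgram(x, wsize=64, shift=10)
assert sg.shape == (33, 94)        # 64//2 + 1 frequency bins, one frame per 10 samples
```

The resulting matrix can then be displayed as an image (e.g., with matplotlib's imshow), exactly as the MATLAB code does with imagesc.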
function make_spect(signal, srate, Wsize, maxfreq, contrast)

clf;
if nargin < 4, maxfreq = srate/2; end;
if nargin < 5, contrast = 4; end;
if maxfreq > srate/2, maxfreq = srate/2; end;
if srate < 25000, Nfreqs = 256; else Nfreqs = 512; end;
freqsplot = round(Nfreqs .* (maxfreq/(srate/2)));

% pre-emphasize, so analyze diff of signal
sg = ComputeSgram(diff(signal), srate, Nfreqs, freqsplot, Wsize, 1);
nframes = size(sg,2);    % frames are 1 millisecond

f = linspace(0, maxfreq, freqsplot);
set(axes, 'xlim', [1 nframes], 'ylim', [0 maxfreq]);

figure(1)
imagesc(1:nframes, f, sg);
set(gca, 'ydir', 'normal', 'box', 'on');
colormap(flipud(gray(256).^contrast));
xlabel('Time in Milliseconds');
ylabel('Frequency in Hz');
[frame, ifreq, button] = ginput(1);
while (length(frame) > 0)
    iframe = round(frame);
    isample = iframe .* round(srate/1000);
    sp = ComputeSpectrum(diff(signal), srate, Nfreqs, freqsplot, Wsize, isample);
    figure(2)
    clf
    plot(f, 20*log10(sp))
    xlabel('Frequency in Hz')
    ylabel('dB')
    grid
    figure(1)
    [frame, ifreq, button] = ginput(1);
end
function sg = ComputeSgram(s, srate, Nfreqs, freqsplot, Wsize, shift)

nsamps = length(s);
wsize = floor(Wsize*srate/1000);      % window size (samples)
wsize = wsize + mod(wsize,2);         % make sure it's even
shift = shift * srate/1000;           % fractional samples per shift
nframes = round(nsamps/shift);
w = hanning(wsize);
sg = zeros(freqsplot, nframes);
s = [s ; zeros(wsize,1)];             % append zeros at end
sx = 1;                               % fractional sample index

for fi = 1:nframes,
    si = round(sx);
    pf = abs(fft(w .* s(si:si+wsize-1), Nfreqs*2));
    sg(:,fi) = pf(2:freqsplot+1);     % drop DC
    sx = sx + shift;
end;

sg = filter(ones(3,1)/3, 1, abs(sg), [], 2);   % clean holes
function sp = ComputeSpectrum(s, srate, Nfreqs, freqsplot, Wsize, isample)

wsize = floor(Wsize*srate/1000);      % window size (samples)
wsize = wsize + mod(wsize,2);         % make sure it's even
w = hanning(wsize);
si = isample;                         % start the analysis frame at the chosen sample
pf = abs(fft(w .* s(si:si+wsize-1), Nfreqs*2));
sp = pf(2:freqsplot+1);               % drop DC and unwanted freqs
>> make_spect(sad, srate, 6, 5000, 10)

>> make_spect(sad, srate, 25, 5000, 10)

>> make_spect(dude, srate, 6, 5000, 10)

>> make_spect(dude, srate, 25, 5000, 10)