Frequency Domain Speech Analysis

Marshall Phelps
6 years ago

1 Frequency Domain Speech Analysis. Short-Time Fourier Analysis: windowed (short-time) Fourier transform; spectrogram of speech signals; filter bank implementation*. Cepstral Analysis: (real) cepstrum and complex cepstrum; complex cepstrum for speech; pitch detection; echo hiding.

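2 Fourier Transform: $F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt$, $f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega$. Joseph Fourier (1768-1830). It's a simple and powerful idea: can any signal be represented by a linear combination of sines and cosines?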

3 Deep Impact of FT. A versatile tool for solving many problems in science and engineering. Mathematics: functional/harmonic analysis. Physics: thermodynamics, Fourier optics. Astronomy: radar imaging, FT spectrometer. Biomedical engineering: MRI, FT infrared spectroscopy. Electrical engineering: frequency domain analysis, wireless communication, signal processing.

4 Spectral Analysis

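5 Discrete Fourier Transform (DFT). Fourier transform of a sequence: $X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$. DFT: sample at $\omega = \frac{2k\pi}{N}$ to get $X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2k\pi}{N} n}$.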
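The DFT sum on this slide can be evaluated directly. The following is a minimal sketch in Python (standard library only, rather than the MATLAB used elsewhere in the deck); the helper names `dft` and `idft` are illustrative, and the direct O(N^2) evaluation is meant for checking the definition, not for speed.

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(+j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# A unit impulse has a flat spectrum; a constant puts everything in bin k = 0.
impulse_spectrum = dft([1, 0, 0, 0])
constant_spectrum = dft([1, 1, 1, 1])
```

Note that Python indexes bins from 0, whereas MATLAB's `fft` output starts at index 1.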

6 Properties of DFT: linearity, periodicity, shift, convolution.

7 DFT Under MATLAB. >X=fft(x): for a length-N real signal x, the output X is a length-N complex sequence with the low frequencies at the two ends (indices near 1 and N). X=fftshift(fft(x)) moves the low frequencies to the center of X instead of the boundary. >X=fft(x,N): x is padded with zeros if it has fewer than N points and truncated if it has more.

8 DFT of Simple Waveforms [Figure columns: Time, Frequency]

9 Time-Frequency Localization* [Figure: Heisenberg box in the time-frequency plane; axes: Time, Frequency] Heisenberg's uncertainty principle.

10 Implications for Signal Analysis. You CANNOT arbitrarily improve the resolution of both the time analysis and the frequency analysis. [Figure: time domain representation and frequency domain representation, linked by the FT] How do we define instantaneous frequency?

11 Speech Signal Analysis. Why is the (long-term) FT not appropriate for speech signals? The FT is the ideal tool for analyzing periodic or stationary signals, where the frequency domain representation greatly helps the analysis. Like many other phenomena we observe in the natural world, speech signals are transient or nonstationary: their properties change markedly as a function of time. Because of Heisenberg's uncertainty principle, we can only find a compromise between time localization and frequency localization.

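12 Short-Time (Windowed) FT. (Normal) Fourier transform of a sequence: $X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$. Time-dependent Fourier transform: $X_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} w(n-m)\, x(m)\, e^{-j\omega m}$, where $n$ is time, $\omega$ is frequency, and $w$ is the window (typically Hamming).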
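A minimal Python sketch of the time-dependent Fourier transform follows (standard library only, not the MATLAB used on the slides). The function names, the frame/hop layout, and the test signal are assumptions made for illustration. Each frame is indexed from its own start, which changes only the phase of $X_n(e^{j\omega})$ relative to the slide's definition, not the magnitude shown in a spectrogram.

```python
import cmath
import math

def hamming(L):
    """Symmetric Hamming window of length L."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (L - 1)) for i in range(L)]

def stft_magnitude(x, win_len, hop, nfft):
    """Magnitude of the windowed DFT for each frame (bins 0 .. nfft//2)."""
    w = hamming(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + i] * w[i] for i in range(win_len)]
        frames.append([
            abs(sum(seg[i] * cmath.exp(-2j * math.pi * k * i / nfft)
                    for i in range(win_len)))
            for k in range(nfft // 2 + 1)
        ])
    return frames

# Hypothetical nonstationary signal: 4 cycles per 32 samples, then 10 cycles per 32 samples.
x = [math.sin(2 * math.pi * (4 if n < 128 else 10) * n / 32) for n in range(256)]
frames = stft_magnitude(x, win_len=32, hop=16, nfft=32)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in frames]
```

The long-term FT of `x` would show both tones at once; the per-frame peaks in `peak_bins` show when each tone is present, at the cost of a frequency resolution limited by the 32-sample window.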

13 Definition of Spectrogram (Sonogram) [Figure: speech and windowed speech; labels: time, freq., window]

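14 Interpretation of Spectrogram [Figure: time-frequency plane with axes Time and Frequency. A horizontal slice at frequency w shows the evolution of the short-time spectrum at frequency w along time; a vertical slice at time n is the STFT at time n.]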

15 History of Sonagraph (Visual Speech). The spectrogram has been used in almost every phase of speech research for over 70 years (DSP has been around for 57 years). Before DSP, a device called the sound spectrograph (also called a wave analyzer) was widely used.

16 Example of Spectrogram: Chirps. Chirps are analytic signals with a well-defined instantaneous frequency that sweeps over time.

17 Example of Speech Spectrogram

18 Spectrogram Reading. We will use a MATLAB demo to test your spectrogram-reading ability.

19 Spectrogram via Filter Bank* [Figure: filter bank]

20 Spectrogram Calculation Under MATLAB
Method 1: Use the COLEA toolbox (it has a nice GUI).
Method 2: Use the demo program below. Cut and paste it, save it as specgram_demo.m, then run:
>[x,fs]=wavread(filename);
>specgram_demo(x,fs);

% function specgram_demo(y,fs)
% display the spectrogram of speech signal
% demo for EE493Q Fall 2006
function specgram_demo(y,fs)
% calculate the table of amplitudes
[b,f,t]=specgram(y,1024,fs,256,192);
% calculate amplitude about 50dB down from maximum (20*log10(300) is roughly 49.5)
bmin=max(max(abs(b)))/300;
% plot top 50dB as image
imagesc(t,f,20*log10(max(abs(b),bmin)/bmin));
% label plot
axis xy; xlabel('time (s)'); ylabel('frequency (Hz)');
% build a grey scale
lgrays=zeros(100,3);
for i=1:100
    lgrays(i,:) = 1-i/100;
end
% the demo applies the hot colormap; use colormap(lgrays) for the grey scale built above
colormap(hot);

21 /Hello/

22 /Don't ask me/

23 Impact of Window on Spectrogram

24 Frequency Domain Speech Analysis. Short-Time Fourier Analysis: windowed (short-time) Fourier transform; spectrogram of speech signals; filter bank implementation*. Cepstral Analysis: (real) cepstrum and complex cepstrum; complex cepstrum for speech; pitch detection; echo hiding.

25 From Spectrum to Cepstrum. Recall the fundamental assumption about the speech signal: speech can be represented as the output of a linear filtering system whose excitation and system response vary slowly with time. To separate the excitation e(n) from the system response h(n), we need to perform some kind of deconvolution. In the frequency domain the convolution becomes a product: X(w)=E(w)H(w). Multiplication is not as easy to deal with as addition; can we convert the product into a sum?

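26 The Power of Logarithm. Applying $\log(\cdot)$ turns the product $X(e^{j\omega}) = E(e^{j\omega})\, H(e^{j\omega})$ into the sum $\hat{X}(e^{j\omega}) = \hat{E}(e^{j\omega}) + \hat{H}(e^{j\omega})$. Note on the complex logarithm: since X(w) (the FT of x(n)) is typically complex-valued, we need to define the complex logarithm as follows (i.e., take the logarithm of the magnitude and keep the phase as the imaginary part): $\hat{X}(e^{j\omega}) = \log[X(e^{j\omega})] = \log|X(e^{j\omega})| + j \arg[X(e^{j\omega})]$.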
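A small numeric check of this idea at a single frequency, written as a Python sketch with made-up values for $E$ and $H$ (they are not taken from any speech signal). It shows that the log-magnitudes add exactly, while the phases add only up to a multiple of $2\pi$ when the principal value is used, which is where the phase unwrapping problem comes from.

```python
import cmath
import math

# Hypothetical spectrum values at one frequency (magnitude, phase in radians).
E = cmath.rect(2.0, 2.5)   # excitation
H = cmath.rect(0.5, 1.5)   # system response
X = E * H                  # product in the frequency domain

# Real part of the complex logarithm: log-magnitudes add exactly.
log_mag_sum = math.log(abs(E)) + math.log(abs(H))
log_mag_X = math.log(abs(X))

# Imaginary part: the true phases add (2.5 + 1.5 = 4.0 rad), but cmath.phase
# returns the principal value in (-pi, pi], so the sum is off by 2*pi*wraps.
phase_sum = cmath.phase(E) + cmath.phase(H)
phase_X = cmath.phase(X)
wraps = (phase_sum - phase_X) / (2 * math.pi)
```

Here `wraps` comes out as one full turn, so a naive sum of principal-value phases would not match the phase of the product.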

27 Complex Cepstrum

28 Phase Unwrapping Problem* Note that

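29 (Real) Cepstrum. Processing chain: $x(n) \rightarrow \mathcal{F}(\cdot) \rightarrow X(e^{j\omega}) \rightarrow$ log magnitude $\rightarrow \log|X(e^{j\omega})| \rightarrow \mathcal{F}^{-1}(\cdot) \rightarrow c(n)$, i.e. $c(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log|X(e^{j\omega})|\, e^{j\omega n}\, d\omega$. One can show that the cepstrum c(n) is the even part of the complex cepstrum, i.e. $c(n) = \frac{1}{2}[\hat{x}(n) + \hat{x}(-n)]$. Hint: $|X(\omega)|$ and $\arg[X(\omega)]$ are even and odd functions respectively.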

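30 Complex Cepstrum Example. Let $x(n) = \delta(n) + a\,\delta(n-N)$, $0 < a < 1$. Then $X(e^{j\omega}) = 1 + a e^{-j\omega N}$ and $\hat{X}(e^{j\omega}) = \log[X(e^{j\omega})] = \log[1 + a e^{-j\omega N}] = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} a^n e^{-j\omega n N}$, so $\hat{x}(n) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1} a^k}{k}\, \delta(n - kN)$. Conclusion: the complex cepstrum of a train of uniformly spaced impulses is also a uniformly spaced impulse train with the same spacing.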
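The example can be checked numerically through the real cepstrum, which is the even part of the complex cepstrum and therefore should be about $a/2$ at $n = N$ and about $-a^2/4$ at $n = 2N$. This Python sketch uses a direct DFT of length `M` instead of the continuous-frequency integral, so small time-aliasing terms are expected; the values of `a`, `N`, and `M` are arbitrary choices for illustration.

```python
import cmath
import math

def real_cepstrum(x):
    """c(n) = IDFT{ log|DFT{x}| }, using a direct O(M^2) DFT of length M = len(x)."""
    M = len(x)
    log_mag = [
        math.log(abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))))
        for k in range(M)
    ]
    return [
        (sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / M) for k in range(M)) / M).real
        for n in range(M)
    ]

a, N, M = 0.5, 8, 64          # echo gain, echo delay, DFT length (assumed values)
x = [0.0] * M
x[0], x[N] = 1.0, a           # x(n) = delta(n) + a * delta(n - N)
c = real_cepstrum(x)

# Nonzero cepstral values appear only at multiples of N:
# c[N] is close to a/2 and c[2N] is close to -a^2/4.
```

The same spacing rule is what makes cepstral peaks useful for pitch detection and for detecting inserted echoes.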

31 Complex Cepstrum of Speech. Model of speech: voiced speech is produced by a quasi-periodic pulse train exciting a slowly time-varying linear system, i.e. e(n)=p(n). Unvoiced speech is produced by random noise exciting a slowly time-varying linear system, i.e. e(n)=u(n). The glottis, the vocal tract, and the radiation at the lips can all be modeled by slowly time-varying linear systems.

32 Full Model for Voiced Speech

33 A Look Inside the Transfer Function for Voiced Speech. Components: glottal pulse model, vocal tract model, radiation model. Note: the combination of glottal pulse, vocal tract, and radiation corresponds to the low-time part of the cepstrum (around the origin).

34 Complex Cepstrum for Voiced Speech [Figure panels: ARG(X(ejw)), arg(X(ejw))]

35 Full Model for Unvoiced Speech. Transfer function for unvoiced speech. Note the two differences from voiced speech: 1) the excitation is no longer an impulse train but random noise; 2) no glottal pulse model is involved.

36 Complex Cepstrum for Unvoiced Speech

37 Pitch Detection in Cepstrum Domain

38 Frequency Domain Speech Analysis. Short-Time Fourier Analysis: windowed (short-time) Fourier transform; spectrogram of speech signals; filter bank implementation*. Cepstral Analysis: (real) cepstrum and complex cepstrum; complex cepstrum for speech; pitch detection; echo hiding.

39 Background on Music Piracy. An estimated $4.3 billion is lost each year due to piracy of digital music content. That is what the whole Napster story was about: no more free music. Watermarking was proposed as one possible technical solution for copyright protection, but its future remains uncertain.

40 What is Watermarking?

41 Echo Hiding. Proposed by Gruhl, Bender and Lu of the MIT Media Lab in 1996. Since then, various audio data hiding techniques have been proposed. The basic idea is to exploit the masking property of the human auditory system: when an echoed signal is placed close to the host signal, it is inaudible to human ears but detectable by machine. Decoding is done in the cepstrum domain. That is why we are interested in it here!

42 Demo of Speech with Echoes. Original speech; modified speech with severe echo; modified speech with slight echo; modified speech with five echoes. Conclusion: as long as echoes are inserted in the right place and

43 How to Hide One Bit?

44 Toy Example

45 Encoding Multiple Bits

46 Encoding Multiple Bits (Cont'd)

47 Cepstrum Decoding

48 Decoding Examples [Figure panels: Bit 1, Bit 0]

49 Echo Hiding Summary. Information is embedded by adding echoes located at different positions. By controlling the amplitude and delay of the echoes, we can achieve perceptual transparency. Embedded information can be extracted by detecting echoes in the cepstrum domain. The downside of this approach is its lack of security, i.e., a hacker can easily remove the echoes or make them undetectable.
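The embed/decode loop summarized above can be sketched in a few lines of Python (standard library only). This is a toy under stated assumptions, not the scheme from the lecture: the host is synthetic white noise instead of speech, the echo gain `a = 0.5` is far too strong to be inaudible, the delays `d0` and `d1` are arbitrary, and the helper names are invented for illustration. The host is zero-padded so the echo fits inside the frame, and the frame length must be a power of two for the small recursive FFT.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])

def real_cepstrum(x):
    """c(n) = IDFT{ log|DFT{x}| }; log|X| is real and even, so the IDFT is fft(.)/M."""
    M = len(x)
    log_mag = [math.log(abs(v)) for v in fft(x)]
    return [v.real / M for v in fft(log_mag)]

def embed_bit(host, bit, d0, d1, a, M):
    """Add one echo of gain a at delay d1 (bit 1) or d0 (bit 0)."""
    d = d1 if bit else d0
    y = [0.0] * M
    for n, v in enumerate(host):
        y[n] += v
        y[n + d] += a * v
    return y

def decode_bit(y, d0, d1):
    """Compare the cepstrum at the two candidate echo delays."""
    c = real_cepstrum(y)
    return 1 if c[d1] > c[d0] else 0

random.seed(0)
M, d0, d1, a = 1024, 32, 48, 0.5            # assumed frame length, delays, echo gain
host = [random.gauss(0.0, 1.0) for _ in range(M - d1)]
decoded = [decode_bit(embed_bit(host, b, d0, d1, a, M), d0, d1) for b in (0, 1)]
```

Because the echo adds a cepstral peak of roughly `a/2` at its delay, the comparison in `decode_bit` recovers the bit; the same observation explains the security weakness noted on the slide, since anyone who knows the candidate delays can find or remove the echo the same way.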

50 One-Minute Survey. What is the muddiest point in this week's lecture? What is the difference between the short-time FT and the FT? What is the use of the cepstrum in echo hiding applications?
