Sinusoidal Modeling. Yannis Stylianou, University of Crete, Computer Science Dept., Greece. SPCC.

Sinusoidal Modeling
Yannis Stylianou
University of Crete, Computer Science Dept., Greece
yannis@csd.uoc.gr
SPCC 2016

Outline
1. Speech Production
2. Modulators
3. Sinusoidal Modeling: Sinusoidal Models, Voiced Speech, Unvoiced Speech, Synthesis, Amplitude and Phase Interpolation
4. Harmonic Modeling: HNM Speech Models
5. References

A Simple View
Figure used with permission: [1]

Downward-looking view into the larynx: vocal folds
Left: voicing. Right: breathing.
Figure used with permission: [1]

Glottal airflow velocity
Figure used with permission: [1]

Vocal tract shapes
Figure used with permission: [1]

Example
Let the excitation of the vocal tract, whose impulse response is h[n], be:
$$u[n] = g[n] * p[n]$$
where * denotes convolution. Then the output speech, x[n, τ], is given by:
$$x[n,\tau] = w[n,\tau]\,\{h[n] * (g[n] * p[n])\}$$
and
$$X(\omega,\tau) = \frac{1}{P}\sum_{k=-\infty}^{\infty} H(\omega_k)\,G(\omega_k)\,W(\omega-\omega_k,\tau)$$

Harmonics and Formants
Figure used with permission: [1]

Sinusoidal Models
Sinusoidal model:
$$x(t) = \sum_{k=-K}^{K} \gamma_k\, e^{j2\pi f_k t}$$
Special case, harmonic model:
$$x(t) = \sum_{k=-K}^{K} \gamma_k\, e^{j2\pi k f_0 t}$$
Estimation of parameters (linear approach):
$$\gamma = F_{f_k}\, s$$
Model evaluation, mean-squared error (MSE):
$$\epsilon = \int_{w} |s(t) - x(t)|^2\, dt$$
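
A minimal discrete-time sketch of the harmonic model and its error measure. It assumes the frame spans a whole number of pitch periods, in which case the harmonic exponentials are orthogonal and the linear (least-squares) estimate of each γ_k reduces to a projection of the frame onto that harmonic. The function names, the sampling rate, and the test signal are illustrative assumptions, not part of the slides.

```python
import cmath
import math


def harmonic_amplitudes(s, f0, fs, num_harm):
    """Estimate gamma_k for k = -K..K by projecting the frame onto each harmonic.

    Assumption: len(s) covers an integer number of pitch periods, so the
    harmonics are orthogonal and projection equals the least-squares solution.
    """
    n_samples = len(s)
    gammas = {}
    for k in range(-num_harm, num_harm + 1):
        acc = sum(s[n] * cmath.exp(-2j * math.pi * k * f0 * n / fs)
                  for n in range(n_samples))
        gammas[k] = acc / n_samples
    return gammas


def harmonic_model(gammas, f0, fs, n_samples):
    """Evaluate x[n] = sum_k gamma_k exp(j 2 pi k f0 n / fs)."""
    return [sum(g * cmath.exp(2j * math.pi * k * f0 * n / fs)
                for k, g in gammas.items())
            for n in range(n_samples)]


def squared_error(s, x):
    """Discrete counterpart of the integral of |s - x|^2 over the window."""
    return sum(abs(a - b) ** 2 for a, b in zip(s, x))


# Hypothetical test frame: two harmonics of 200 Hz, two pitch periods at 8 kHz.
FS, F0, FRAME_LEN = 8000.0, 200.0, 80
frame = [math.cos(2 * math.pi * F0 * n / FS)
         + 0.5 * math.cos(2 * math.pi * 2 * F0 * n / FS + 0.7)
         for n in range(FRAME_LEN)]
gammas = harmonic_amplitudes(frame, F0, FS, num_harm=2)
error = squared_error(frame, harmonic_model(gammas, F0, FS, FRAME_LEN))
```

When the frame length is not a multiple of the pitch period, the projection is no longer exact and a full least-squares solve is needed instead.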

Source-Filter with Sinusoids
Source:
$$u(t) = \sum_{k=-K(t)}^{K(t)} \alpha_k(t)\, e^{j\phi_k(t)}$$
where:
$$\phi_k(t) = \int_0^t \Omega_k(\sigma)\, d\sigma + \phi_k$$
Filter: h(t, τ) with Fourier Transform (FT):
$$H(t,\Omega) = M(t,\Omega)\, e^{j\Phi(t,\Omega)}$$

Source-Filter with Sinusoids
$$s(t) = \sum_{k=-K(t)}^{K(t)} A_k(t)\, e^{j\theta_k(t)}$$
where:
$$A_k(t) = \alpha_k(t)\, M[t, \Omega_k(t)]$$
$$\theta_k(t) = \phi_k(t) + \Phi[t, \Omega_k(t)] = \int_0^t \Omega_k(\sigma)\, d\sigma + \Phi[t, \Omega_k(t)] + \phi_k$$

Sinusoidal Analysis: Stationarity Assumption
We assume stationarity inside the analysis window:
$$A_k(t) = A_k, \quad \Omega_k(t) = \Omega_k, \quad K(t) = K, \qquad t_l - \frac{T}{2} \le t \le t_l + \frac{T}{2}$$
which leads to:
$$\theta_k(t) = \Omega_k (t - t_l) + \theta_k$$
and to:
$$s(t) = \sum_{k=-K}^{K} A_k\, e^{j\theta_k}\, e^{j\Omega_k (t - t_l)}$$
In discrete time:
$$s[n] = \sum_{k=-K}^{K} \gamma_k\, e^{j\omega_k n}, \qquad -\frac{N_w - 1}{2} \le n \le \frac{N_w - 1}{2}$$

Mean-Squared Error
Given the original measured waveform, y[n], and the synthetic speech waveform, s[n], estimate the unknown parameters $A_k^l$, $\omega_k^l$, and $\theta_k^l$ by minimizing the MSE criterion:
$$\epsilon^l = \sum_{n=-(N_w-1)/2}^{(N_w-1)/2} |y[n] - s[n]|^2$$
which can be written as:
$$\epsilon^l = \sum_{n=-(N_w-1)/2}^{(N_w-1)/2} |y[n]|^2 + N_w \sum_{k=1}^{K^l} \left( |Y(\omega_k^l) - \gamma_k^l|^2 - |Y(\omega_k^l)|^2 \right)$$
which can be reduced further to:
$$\epsilon^l = \sum_{n=-(N_w-1)/2}^{(N_w-1)/2} |y[n]|^2 - N_w \sum_{k=1}^{K^l} |Y(\omega_k^l)|^2$$
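
The reduced form says that the error is smallest where |Y(ω)|² is largest, which motivates picking spectral peaks. The sketch below checks this numerically for the single-sinusoid case (K = 1), assuming Y(ω) is the short-time spectrum normalized by the window length N_w and that the optimal complex amplitude is γ = Y(ω). The test frame, the frequency grid, and all names are illustrative assumptions.

```python
import cmath
import math


def short_time_spectrum(y, omega):
    """Y(omega) = (1/N_w) * sum_n y[n] exp(-j omega n), n = -(N_w-1)/2 .. (N_w-1)/2."""
    half = (len(y) - 1) // 2
    return sum(v * cmath.exp(-1j * omega * (i - half))
               for i, v in enumerate(y)) / len(y)


def frame_error(y, omega, gamma):
    """Direct evaluation of sum_n |y[n] - gamma exp(j omega n)|^2."""
    half = (len(y) - 1) // 2
    return sum(abs(v - gamma * cmath.exp(1j * omega * (i - half))) ** 2
               for i, v in enumerate(y))


# Hypothetical frame: a strong sinusoid at 0.30 rad/sample plus a weak one.
N_W = 101
HALF = (N_W - 1) // 2
y = [2.0 * cmath.exp(1j * (0.30 * n + 0.5)) + 0.1 * cmath.exp(1j * 1.7 * n)
     for n in range(-HALF, HALF + 1)]

energy = sum(abs(v) ** 2 for v in y)
grid = [i * math.pi / 2000 for i in range(2001)]   # candidate frequencies in [0, pi]
best_omega = max(grid, key=lambda w: abs(short_time_spectrum(y, w)))
gamma = short_time_spectrum(y, best_omega)         # amplitude and phase at the peak
direct = frame_error(y, best_omega, gamma)         # left-hand side of the identity
reduced = energy - N_W * abs(gamma) ** 2           # reduced form from the slide
```

For several sinusoids the identity holds only approximately, to the extent that the components are well resolved by the analysis window.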

Karhunen-Loève Expansion
The Karhunen-Loève expansion allows a random process to be constructed from harmonic sinusoids with uncorrelated complex amplitudes.
The estimated power spectrum should not vary too much over consecutive frequencies.
Under these constraints, for unvoiced speech analyzed with a window at least 20 ms wide, a 100 Hz harmonic structure provides good results.
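
A rough sketch of the idea: a noise-like frame built from harmonics of 100 Hz whose complex amplitudes are drawn independently. Unit magnitudes (a flat spectrum) with uniformly random phases are an illustrative assumption, as are the 8 kHz sampling rate, the seed, and the function name.

```python
import cmath
import math
import random


def unvoiced_frame(fs=8000, duration_ms=20, spacing_hz=100, seed=0):
    """Synthesize one real-valued frame from harmonics of `spacing_hz`.

    Independent random phases give uncorrelated complex amplitudes;
    the flat magnitude spectrum is an assumption made for illustration.
    """
    rng = random.Random(seed)
    n_samples = int(fs * duration_ms / 1000)
    # Number of harmonics strictly below the Nyquist frequency.
    n_harm = math.ceil(fs / 2 / spacing_hz) - 1
    amps = [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(n_harm)]
    frame = []
    for n in range(n_samples):
        value = sum(a * cmath.exp(2j * math.pi * (k + 1) * spacing_hz * n / fs)
                    for k, a in enumerate(amps))
        frame.append(2.0 * value.real)   # add the conjugate-symmetric half
    return frame, n_harm


frame, n_harm = unvoiced_frame()
```

In an analysis/synthesis system the magnitudes would instead follow the estimated spectral envelope of the unvoiced frame.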

Sinusoidal Analysis: Peak picking
Figure used with permission: [1]

Sinusoidal Synthesis: OLA, a simple solution

Problem of frequency matching

Frame-to-Frame Peak Matching

The birth/death process

A birth/death process in speech

Amplitude and Phase Interpolation
Linear amplitude interpolation model:
$$A_k^l[n] = A_k^l + \left(A_k^{l+1} - A_k^l\right)\frac{n}{L}, \qquad n = 0, 1, 2, \ldots, L-1$$
Cubic phase interpolation model:
$$\theta(t) = \zeta + \gamma t + \alpha t^2 + \beta t^3$$
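
A sketch of both interpolators for one matched track between frames l and l+1. The slide gives only the cubic form; the way the coefficients are chosen here (match phase and frequency at both frame boundaries, and pick the 2π multiple M that gives the smoothest phase track) follows the McAulay-Quatieri approach and is an assumption about what the slide intends. All names and the example values are illustrative.

```python
import math


def interpolate_amplitude(a_l, a_next, frame_len):
    """A[n] = A^l + (A^{l+1} - A^l) * n / L for n = 0 .. L-1."""
    return [a_l + (a_next - a_l) * n / frame_len for n in range(frame_len)]


def cubic_phase_coefficients(theta_l, omega_l, theta_next, omega_next, frame_len):
    """Return (zeta, gamma, alpha, beta) of theta(t) = zeta + gamma t + alpha t^2 + beta t^3.

    Boundary conditions: theta(0) = theta_l, theta'(0) = omega_l,
    theta(L) = theta_next + 2 pi M, theta'(L) = omega_next.
    """
    t = float(frame_len)
    # Integer number of phase wraps that gives the smoothest interpolation.
    m = round(((theta_l + omega_l * t - theta_next)
               + (omega_next - omega_l) * t / 2.0) / (2.0 * math.pi))
    d = theta_next + 2.0 * math.pi * m - theta_l - omega_l * t
    dw = omega_next - omega_l
    alpha = 3.0 * d / t ** 2 - dw / t
    beta = -2.0 * d / t ** 3 + dw / t ** 2
    return theta_l, omega_l, alpha, beta


def interpolate_phase(coeffs, frame_len):
    """Evaluate the cubic phase at n = 0 .. L-1."""
    zeta, gamma, alpha, beta = coeffs
    return [zeta + gamma * n + alpha * n ** 2 + beta * n ** 3
            for n in range(frame_len)]


# Hypothetical track: frequencies in rad/sample, frame shift of 80 samples.
L = 80
THETA_L, OMEGA_L, THETA_NEXT, OMEGA_NEXT = 0.2, 0.30, 1.0, 0.32
amps = interpolate_amplitude(1.0, 1.5, L)
coeffs = cubic_phase_coefficients(THETA_L, OMEGA_L, THETA_NEXT, OMEGA_NEXT, L)
phases = interpolate_phase(coeffs, L)
track = [a * math.cos(p) for a, p in zip(amps, phases)]
```

Summing such tracks over all matched sinusoids gives the synthetic frame.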

Block diagram of the Synthesis System

Synthesis: Reconstruction Example
Figure used with permission: [1]

Harmonic model
Sinusoidal model:
$$x(t) = \sum_{k=-K}^{K} \gamma_k\, e^{j2\pi f_k t}$$
Special case, harmonic model:
$$x(t) = \sum_{k=-K}^{K} \gamma_k\, e^{j2\pi k f_0 t}$$
Estimation of parameters (linear approach):
$$\gamma = F_{f_k}\, s$$
Model evaluation, mean-squared error (MSE):
$$\epsilon = \int_{w} |s(t) - x(t)|^2\, dt$$

Brief overview of HNM
HNM is a pitch-synchronous harmonic plus noise representation of the speech signal.
The speech spectrum is divided into a low and a high band, delimited by the so-called maximum voiced frequency.
The low band of the spectrum (below the maximum voiced frequency) is represented solely by harmonically related sine waves.
The upper band is modeled as a noise component modulated by a time-domain amplitude envelope.
HNM allows high-quality copy synthesis and prosodic modifications.

HNM in equations
Harmonic part:
$$h(t) = \sum_{k=-L(t)}^{L(t)} A_k(t)\, e^{j k \omega_0(t)\, t}$$
Noise part:
$$n(t) = e(t)\, [v(\tau, t) * b(t)]$$
Speech:
$$s(t) = h(t) + n(t)$$

Periodic part
HNM1: sum of exponential functions without slope
$$h_1[n] = \sum_{k=-L(n_a^i)}^{L(n_a^i)} a_k(n_a^i)\, e^{j2\pi k f_0(n_a^i)(n - n_a^i)}$$
HNM2: sum of exponential functions with complex slope
$$h_2[n] = \Re\left\{ \sum_{k=1}^{L(n_a^i)} A_k(n)\, \exp\left( j2\pi k f_0(n_a^i)(n - n_a^i) \right) \right\}$$
where
$$A_k(n) = a_k(n_a^i) + (n - n_a^i)\, b_k(n_a^i)$$
with $a_k(n_a^i)$ and $b_k(n_a^i)$ complex numbers (amplitude and slope, respectively). $\Re$ denotes taking the real part.

Periodic part (continued)
HNM3: sum of sinusoids with time-varying real amplitudes
$$h_3[n] = \sum_{k=0}^{L(n_a^i)} a_k(n) \cos(\varphi_k(n))$$
where
$$a_k(n) = c_{k0} + c_{k1}(n - n_a^i)^1 + \cdots + c_{kp}(n - n_a^i)^{p(n)}$$
$$\varphi_k(n) = \epsilon_k + 2\pi k \zeta (n - n_a^i)$$
and p(n) is the order of the amplitude polynomial, which is, in general, a time-varying parameter.

Residual (Noise) part
The non-periodic part is just the residual signal obtained by subtracting the periodic part (harmonic part) from the original speech signal in the time domain:
$$r[n] = s[n] - h[n]$$
where h[n] is either $h_1[n]$, $h_2[n]$, or $h_3[n]$.

Amplitudes and phases estimation
Once $f_0$ has been estimated for voiced frames, amplitudes and phases are estimated by minimizing the criterion:
$$\epsilon = \sum_{n=n_a^i - N}^{n_a^i + N} w^2[n]\, (s[n] - \hat{h}[n])^2$$
where $n_a^i = n_a^{i-1} + P(n_a^{i-1})$, and $P(n_a^{i-1})$ denotes the pitch period at $n_a^{i-1}$.
For HNM1 and HNM2, this criterion has a quadratic form and is minimized by solving an over-determined system of linear equations in the least-squares sense.
For HNM3, however, a non-linear system of equations has to be solved.
The residual signal r[n] is estimated by:
$$\hat{r}[n] = s[n] - \hat{h}[n]$$
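
A small sketch of the HNM1 case: the weighted criterion is minimized by solving the normal equations of the over-determined linear system, and the residual is obtained by subtraction. The Hamming window, the two-pitch-period frame, the plain Gaussian-elimination solver, and all names are illustrative assumptions, not the author's implementation.

```python
import cmath
import math


def solve_linear(a, b):
    """Solve a x = b (complex, square) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def hnm1_amplitudes(s, window, f0, fs, num_harm):
    """Weighted least squares for a_k, k = -L..L; s and window cover n = -N..N."""
    half = (len(s) - 1) // 2
    ks = list(range(-num_harm, num_harm + 1))
    basis = [[cmath.exp(2j * math.pi * k * f0 * n / fs) for k in ks]
             for n in range(-half, half + 1)]
    w2 = [w * w for w in window]
    rows = range(len(s))
    # Normal equations: (B^H W^2 B) a = B^H W^2 s
    gram = [[sum(w2[i] * basis[i][p].conjugate() * basis[i][q] for i in rows)
             for q in range(len(ks))] for p in range(len(ks))]
    rhs = [sum(w2[i] * basis[i][p].conjugate() * s[i] for i in rows)
           for p in range(len(ks))]
    return dict(zip(ks, solve_linear(gram, rhs))), basis


# Hypothetical voiced frame: two harmonics of 200 Hz at 8 kHz, N = one pitch period.
FS, F0, HALF = 8000.0, 200.0, 40
idx = range(-HALF, HALF + 1)
window = [0.54 + 0.46 * math.cos(math.pi * n / HALF) for n in idx]   # Hamming (assumption)
s = [math.cos(2 * math.pi * F0 * n / FS + 0.3)
     + 0.5 * math.cos(2 * math.pi * 2 * F0 * n / FS - 1.0) for n in idx]
amps, basis = hnm1_amplitudes(s, window, F0, FS, num_harm=2)
harmonic = [sum(a * e for a, e in zip(amps.values(), row)).real for row in basis]
residual = [x - h for x, h in zip(s, harmonic)]   # r_hat[n] = s[n] - h_hat[n]
```

With real speech the residual is of course not zero; it carries the noise part that the later slides model.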

Time-domain characteristics of r̂[n]
[Figure: four panels, each plotted against the number of samples: (a) original speech signal, (b) error signal using HNM1, (c) error signal using HNM2, (d) error signal using HNM3]

Variance of the residual signal
The variance of the residual signal is given as:
$$E(rr^h) = I - WP\,(P^h W^h W P)^{-1} P^h W^h$$
[Figure: variance using the three harmonic models, plotted against time in samples]

Modeling error
[Figure: two panels plotted against time in seconds: (a) original signal, (b) modeling error in dB]

Modeling the residual signal
Full-bandwidth representation using a low-order (10th) AR filter.
Time-domain characteristics of the residual signal can be modeled using various functions.
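
One common way to fit a low-order AR (all-pole) filter is the autocorrelation method with the Levinson-Durbin recursion; the slides do not say which estimator is used, so this choice is an assumption. The stand-in "residual" below is simply the impulse response of a known second-order all-pole filter, used so that the fit can be checked; all names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Biased-sum autocorrelation r[0..max_lag] of a real sequence."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err) with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction-error energy
    return a, err


def all_pole_impulse_response(a, length):
    """Impulse response of 1 / A(z), used here only to build a test signal."""
    h = []
    for n in range(length):
        v = 1.0 if n == 0 else 0.0
        for i in range(1, len(a)):
            if n - i >= 0:
                v -= a[i] * h[n - i]
        h.append(v)
    return h


# Hypothetical stand-in for the residual: poles at 0.8 and 0.5.
TRUE_A = [1.0, -1.3, 0.4]
residual = all_pole_impulse_response(TRUE_A, 400)
ar_coeffs, pred_err = levinson_durbin(autocorrelation(residual, 10), order=10)
```

The resulting filter 1/A(z) describes the spectral shape of the residual; its time-domain energy modulation has to be modeled separately.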

Time-domain energy modulation
[Figure: two waveform panels plotted against time in samples]

Triangular envelope

Signal envelope
There are many ways to obtain the envelope of a signal, such as:
Hilbert Transform (analytic signal)
Low-pass local energy (energy envelope):
$$e[n] = \frac{1}{2N+1} \sum_{k=-N}^{N} |r[n-k]|$$
where r[n] denotes the residual signal.
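
A direct sketch of the energy envelope as a moving average of the rectified residual. Treating samples outside the signal as zero at the edges is an assumption, as are the function name and the toy input.

```python
def energy_envelope(r, half_width):
    """e[n] = 1/(2N+1) * sum_{k=-N..N} |r[n-k]|, with zeros assumed outside the signal."""
    size = len(r)
    envelope = []
    for n in range(size):
        lo = max(0, n - half_width)
        hi = min(size - 1, n + half_width)
        total = sum(abs(r[m]) for m in range(lo, hi + 1))
        envelope.append(total / (2 * half_width + 1))
    return envelope


# Hypothetical toy residual with one short burst.
toy_residual = [0.0, 0.0, 3.0, -3.0, 0.0, 0.0, 0.0]
toy_envelope = energy_envelope(toy_residual, half_width=1)
```

A larger N gives a smoother envelope at the cost of blurring the burst structure within a pitch period.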

Example of energy envelope
[Figure: example of energy envelope, with N = 7, plotted against time in samples]
The energy envelope can be efficiently parameterized with a few Fourier coefficients:
$$\hat{e}[n] = \sum_{k=-L_e}^{L_e} A_k\, e^{j2\pi k (f_0/f_s) n}$$
where $L_e$ is set to 3 or 4.

Comparing time-domain modulations
[Figure: four panels plotted against time in samples: noise part, triangular envelope, Hilbert envelope, energy envelope]

Sound Examples using HNM1
Examples with HNM1 and the maximum voiced frequency fixed at 4000 Hz.
Speakers: Male, Male, Female, Female. Versions: Original, Triangular, Hilbert, Energy.

Sound Examples using HNM2
Examples with HNM2 and the maximum voiced frequency fixed at 4000 Hz.
Speakers: Male, Male, Female, Female. Versions: Original, Triangular, Hilbert, Energy.

Sensitivity to f0
[Figure: two panels, (a) and (b), showing magnitude (dB) against frequency in Hz, from 0 to 4000 Hz]

[1] T. F. Quatieri, Discrete-Time Speech Signal Processing. Englewood Cliffs, NJ: Prentice Hall, 2002.

THANK YOU for your attention on this part