FOURIER-BASED METHODS FOR THE SPECTRAL ANALYSIS OF MUSICAL SOUNDS. Sylvain Marchand

University of Brest, Lab-STICC CNRS UMR 6285, 29200 Brest, Brittany, France

ABSTRACT

When dealing with musical sounds, the short-time Fourier transform prevails and sinusoids play a key role, according to both acoustics (vibrating modes) and psychoacoustics (pure tones). The values obtained when decomposing the signal on the time-frequency atoms are usually assigned to their geometrical centers, leading to estimation errors for the sinusoidal parameters. To correct this, one can exploit the amplitude or phase information, use the derivatives of the analysis window, or those of the audio signal. This leads to three methods (phase vocoder, spectral reassignment, derivative algorithm) that are equally efficient: they are in fact different formulations of the best analysis method based on the Fourier spectrum.

Index Terms: Sound analysis, sinusoidal modeling, Fourier transform, spectral reassignment

1. INTRODUCTION

Additive synthesis can be considered as a spectrum modeling technique. It is originally rooted in Fourier's theorem, which states that any periodic function can be modeled as a sum of sinusoids at various amplitudes and harmonically related frequencies. Sinusoidal modeling consists in considering the trajectories in time of the amplitude and frequency parameters of each sinusoid present in the sound. This was proposed by McAulay and Quatieri [1] for speech signals and by Smith and Serra [2] for musical sounds. Sinusoidal modeling leads to meaningful sound representations, suitable e.g. for audio effects (time stretching, pitch shifting, etc.), audio coding, source separation, or music transcription. The sinusoidal model being parametric, an important problem is to estimate the model parameters as accurately as possible, in order to get high-quality sounds. In this paper, we focus on estimators based on the Fourier spectrum and well-suited for musical sounds, although other approaches exist (e.g. see [3]) and other signals could also be considered.
After a presentation of sinusoidal modeling in Section 2, Section 3 describes three methods for the estimation of the sinusoidal parameters, and Section 4 shows that they are all efficient in practice and in fact equivalent in theory: they are different formulations of the best Fourier-based analysis method. Together with Lagrange, we studied in [4] the equivalence of these estimators in the stationary case and only for one sinusoidal parameter (the frequency). This paper can be regarded as an extension, where spectral reassignment plays a central role.

(This research was partly supported by the ANR agency, DReaM project (ANR-09-CORD-006).)

2. SPECTRAL SOUND MODELING

2.1. Sinusoidal Modeling

Let us consider here the sinusoidal model under its most general expression, which is a sum of complex sinusoids / exponentials (the partials) with slowly time-varying amplitudes $a_p$ and non-harmonically related frequencies $\omega_p$ (defined as the first derivatives of the phases $\phi_p$). The resulting signal $s$ is thus

$$s(t) = \sum_{p=1}^{P} a_p(t) \exp(j\phi_p(t)) \qquad (1)$$

where $P$ is the number of partials. Since this paper focuses on the statistical quality of the parameter estimators rather than their frequency resolution, the signal model is reduced to only one partial ($P = 1$). The subscript notation for the partials is then useless. Let us also define $\Pi_0$ as being the value of a given parameter $\Pi$ at time 0, corresponding to the center of the analysis frame. The signal $s$ is then

$$s(t) = \exp\Big( \underbrace{(\lambda_0 + \mu_0 t)}_{\lambda(t) = \log(a(t))} + j \underbrace{(\phi_0 + \omega_0 t)}_{\phi(t)} \Big) \qquad (2)$$

where $\mu_0$ (the amplitude modulation) is the derivative of $\lambda$ (the log-amplitude), and $\omega_0$ (the frequency) is the derivative of $\phi$ (the phase). Thus, the log-amplitude and the phase are modeled by polynomials of degree 1, which can be viewed either as truncated Taylor expansions of more complicated amplitude and frequency modulations (e.g. tremolo / vibrato), or as an extension of the stationary case where $\mu_0 = 0$.

2.2. Spectral Analysis

The main problem we have to tackle now is the estimation of the model parameters.
This can be achieved, as in the stationary case, by using the short-time Fourier transform (STFT):

$$S_w(t, \omega) = \int_{-\infty}^{+\infty} s(\tau)\, w(\tau - t) \exp(-j\omega(\tau - t))\, d\tau \qquad (3)$$

where $S_w$ is the short-time spectrum of the signal $s$.

[Fig. 1. Magnitude spectrum of a harmonic sound, resulting from a fast (discrete) Fourier transform (FFT). Each partial causes a peak and will be considered individually. Axes: frequency (Hz), magnitude (dB).]

[Fig. 2. Zoom on one peak: the analyzed sinusoid (dashed line) is observed from the spectrum of the analysis window through the discrete frequencies (solid lines) of the Fourier transform, leading to several bins with significant energy. Axes: frequency (bin), amplitude.]

Note that, as in [5], we use here a slightly modified definition of the STFT. Indeed, we let the time reference slide with the window, which is also the case in practice when the STFT is implemented using a sliding fast Fourier transform (FFT).

For the sake of simplicity, all the mathematical derivations will be done in the continuous domain. However, in practice the signals are discrete (with some sampling frequency $F_s$). The Fourier transform (FT) will then be replaced by its discrete version (DFT) of size $N$, and the time will be expressed in samples (sample $n$ being at time $n/F_s$) and the frequency in bins (bin $m$ being at frequency $mF_s/N$).

Many instrumental sounds are harmonic, meaning that the frequencies of the partials are multiples of some fundamental frequency (related to our perception of pitch). The magnitude of the Fourier spectrum then exhibits a series of peaks (see Fig. 1). Each peak is a local maximum $m$ in the magnitude spectrum and corresponds to some partial.

$S_w$ involves an analysis window $w$, usually symmetric and band-limited in such a way that for any frequency corresponding to one specific partial, the influence of the other partials can be neglected (in the general case when $P > 1$). In the stationary case ($\mu_0 = 0$), the spectrum of the analysis window gets simply centered on the frequency $\omega_0$ and multiplied by the complex amplitude

$$s_0 = a_0 \exp(j\phi_0) = \exp(\lambda_0 + j\phi_0), \qquad (4)$$

as shown in Fig. 2, which can be regarded as a zoom on one of the peaks of the preceding figure.
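The discrete setting described above can be illustrated with a minimal sketch. It assumes NumPy, a symmetric Hann window, and illustrative values for the sampling frequency, the frame size and the test sinusoid; the helper name `centered_dft` is hypothetical and not part of the paper. It computes the spectrum of one frame with the time reference at the frame center, picks the peak bin, and converts it to Hz.

```python
import numpy as np

FS = 44100.0   # sampling frequency F_s in Hz (illustrative value)
N = 511        # odd DFT size, so that the frame has a center sample

def centered_dft(frame, window):
    """DFT of a windowed frame whose time origin is the frame center."""
    # ifftshift moves the center sample to index 0, so the phases are
    # measured at the center of the analysis window.
    return np.fft.fft(np.fft.ifftshift(frame * window))

# Stationary complex sinusoid (mu_0 = 0), sampled at n = -(N-1)/2 ... (N-1)/2.
a0, phi0, f0 = 0.5, 0.3, 440.0
n = np.arange(N) - (N - 1) // 2
s = a0 * np.exp(1j * (phi0 + 2 * np.pi * f0 * n / FS))

w = np.hanning(N)                        # symmetric Hann window
S = centered_dft(s, w)

half = S[: N // 2]                       # positive-frequency bins only
m = int(np.argmax(np.abs(half)))         # local maximum = detected partial
bin_frequency_hz = m * FS / N            # bin m is at frequency m * F_s / N
peak_phase = float(np.angle(S[m]))       # phase read at the frame center
peak_amplitude = float(np.abs(S[m]) / np.sum(w))  # a_0 scaled by the window spectrum
```

Because the sinusoid does not fall exactly on a bin, `bin_frequency_hz` is only a coarse estimate and `peak_amplitude` is slightly below `a0`; this is the error that the estimators discussed in this paper aim to correct.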
In the non-stationary case however, considering Equation (3) at estimation time 0, we see that $s_0$ gets multiplied by $\Gamma_w(\omega_0 - \omega, \mu_0)$ where

$$\Gamma_w(\omega, \mu_0) = \int_{-\infty}^{+\infty} w(t) \exp(\mu_0 t + j\omega t)\, dt. \qquad (5)$$

In the special case of using a Gaussian window for $w$, an analytic formula can be derived. Otherwise, it is always possible to compute $\Gamma_w$ directly from Equation (5). Once the estimated amplitude modulation $\hat{\mu}_0$ and frequency $\hat{\omega}_0$ are known, the amplitude and phase parameters can eventually be estimated since

$$\hat{s}_0 = \frac{S_w(0, \omega_m)}{\Gamma_w(\hat{\omega}_0 - \omega_m, \hat{\mu}_0)}, \qquad (6)$$

where $\omega_m$ is the (discrete) frequency of the local maximum of the magnitude spectrum where the partial is detected.

3. SINUSOIDAL ESTIMATION

The remaining problem is to estimate the amplitude modulation and frequency parameters. In this section, we present 3 methods providing estimation functions $\hat{\mu}$ and $\hat{\omega}$ for these parameters. In practice, for each detected partial at time 0 (center of the analysis frame) and (discrete) frequency $\omega_m$, the estimates of its parameters are given by:

$$\hat{\mu}_0 = \hat{\mu}(0, \omega_m), \qquad (7)$$
$$\hat{\omega}_0 = \hat{\omega}(0, \omega_m). \qquad (8)$$

3.1. Difference Method (Phase Vocoder)

The Fourier-based approach started together with computer music, about 50 years ago. The phase vocoder introduced by Flanagan and Golden [6] was already using the phases of the Fourier spectra to estimate the frequencies of the partials, and more precisely the phase differences of consecutive spectra [7]. This simple yet efficient difference approach was generalized recently to the non-stationary case [8].

Thus, $S_w$ is the spectrum of the frame centered at the desired (discrete) estimation time $t$, and we let $S_w^{\pm}(\omega) = S_w(t \pm 1/F_s, \omega)$ be its left ($-$: previous, i.e. one sample before) and right ($+$: next, i.e. one sample after) neighboring spectra, respectively ($F_s$ denoting the sampling frequency). Since the log-amplitude and phase differences correspond to the real and imaginary parts of the logarithm of spectral ratios, respectively, let us define:

$$\Delta_\lambda(S_1, S_2) = \log|S_1| - \log|S_2| = \Re(\log(S_1/S_2)), \qquad (9)$$
$$\Delta_\phi(S_1, S_2) = \angle S_1 - \angle S_2 = \Im(\log(S_1/S_2)) \qquad (10)$$

($S_1$ and $S_2$ denoting two arbitrary complex spectra). Since we can measure the amplitudes of the spectra, we can compute the left and right estimates of the amplitude modulation, and retain their mean as the final estimation:

$$\mu^- = \Delta_\lambda(S_w, S_w^-)\, F_s, \quad \mu^+ = \Delta_\lambda(S_w^+, S_w)\, F_s, \quad \hat{\mu} = (\mu^- + \mu^+)/2. \qquad (11)$$

Similarly, with the measured phases of the spectra, we can compute an estimation of the instantaneous frequency:

$$\omega^- = \Delta_\phi(S_w, S_w^-)\, F_s, \quad \omega^+ = \Delta_\phi(S_w^+, S_w)\, F_s, \quad \hat{\omega} = (\omega^- + \omega^+)/2. \qquad (12)$$

In practice, we are looking for positive frequencies and, since the phases are measured modulo $2\pi$, after each call to the $\Delta_\phi$ function we must apply the phase unwrapping procedure of the phase vocoder, i.e. adding $2\pi$ to the result if it is lower than 0.

3.2. Spectral Reassignment

Reassignment was first proposed by Kodera et al. [9] and was generalized by Auger and Flandrin [10] to improve time-frequency representations. Usually, the values obtained when decomposing the signal on the time-frequency atoms are assigned to the geometrical center of the cells (center of the analysis window and bins of the Fourier transform). The reassignment method assigns each value to the center of gravity of the cell's energy. The method uses the knowledge of the first derivative $w'$, obtained by analytic differentiation of the analysis window $w$, in order to adjust the frequency inside the Fourier transform bin. This approach was generalized for the amplitude modulation in the non-stationary case (see [5]).
The complex short-time spectrum resulting from the STFT is, in the polar form:

$$S_w(t, \omega) = a(t, \omega) \exp(j\phi(t, \omega)) \qquad (13)$$

where the instantaneous amplitude $a$ and phase $\phi$ are real-valued functions of time $t$ and frequency $\omega$. By considering Equation (3), we can easily derive:

$$\frac{\partial}{\partial t} \log(S_w(t, \omega)) = j\omega - \frac{S_{w'}(t, \omega)}{S_w(t, \omega)} \qquad (14)$$

where $w'$ denotes the derivative of $w$. Then, since the amplitude modulation (resp. frequency) is the derivative of the log-amplitude (resp. phase), from Equations (13) and (14), we obtain the reassigned parameters:

$$\hat{\mu}(t, \omega) = \frac{\partial}{\partial t} \Re(\log(S_w(t, \omega))) = -\Re\left(\frac{S_{w'}(t, \omega)}{S_w(t, \omega)}\right), \qquad (15)$$
$$\hat{\omega}(t, \omega) = \frac{\partial}{\partial t} \Im(\log(S_w(t, \omega))) = \omega - \Im\left(\frac{S_{w'}(t, \omega)}{S_w(t, \omega)}\right). \qquad (16)$$

3.3. Derivative Method

Together with Desainte-Catherine in [11], we proposed to use the signal derivatives to estimate the sinusoidal parameters in the stationary case; and with Depalle in [5], we generalized this derivative method to the non-stationary case. Indeed, considering Equation (2), since the derivative of an exponential is an exponential, we have $s'(t) = (\mu_0 + j\omega_0)\, s(t)$ and thus $\Re(s'/s) = \mu_0$ and $\Im(s'/s) = \omega_0$. For this method to work in the case of a signal made of several partials, we have to switch to the spectral domain and define:

$$\hat{\mu} = \Re\left(\frac{S'_w}{S_w}\right), \qquad (17)$$
$$\hat{\omega} = \Im\left(\frac{S'_w}{S_w}\right), \qquad (18)$$

where $S'_w$ is the short-time spectrum of the signal derivative $s'$. As shown in [5], in practice this (discrete) derivative can be obtained by convolving the discrete signal $s$ with the following differentiator filter:

$$h[n] = F_s \frac{(-1)^n}{n} \ \text{for}\ n \neq 0, \ \text{and}\ h[0] = 0 \qquad (19)$$

of infinite time support. Thus, in practice, we multiply $h$ by a (finite-length) Hann window. This results in a high-pass filter, and can lead to estimation problems in the high frequencies (above approx. 3/4 of the Nyquist frequency), fortunately above the audible limit (16 kHz).
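As an illustration of Equations (17)-(19), here is a minimal sketch of the derivative method on a noiseless synthetic partial. It assumes NumPy, a Hann-tapered differentiator of 1023 taps, a Hann analysis window, and illustrative signal values; the function names (`differentiator`, `frame_spectrum_at_bin`, `derivative_estimates`) are hypothetical and not taken from the paper.

```python
import numpy as np

FS = 44100.0   # sampling frequency F_s in Hz (illustrative value)
N = 511        # analysis frame size (odd)
L = 1023       # differentiator length (odd), an assumption of this sketch

def differentiator(length, fs):
    """Equation (19) truncated to `length` taps and tapered by a Hann window."""
    k = np.arange(length) - (length - 1) // 2      # tap index, 0 at the center
    sign = 1.0 - 2.0 * (np.abs(k) % 2)             # (-1)^k
    h = np.zeros(length)
    nz = k != 0
    h[nz] = fs * sign[nz] / k[nz]                  # h[0] stays 0
    return h * np.hanning(length)

def frame_spectrum_at_bin(x, center, window, m):
    """Bin m of the spectrum of the frame centered on sample `center`."""
    half = (len(window) - 1) // 2
    n = np.arange(-half, half + 1)                 # time reference = frame center
    frame = x[center - half : center + half + 1]
    return np.sum(frame * window * np.exp(-2j * np.pi * m * n / len(window)))

def derivative_estimates(x, center, m):
    """Equations (17)-(18): real and imaginary parts of S'_w / S_w at bin m."""
    dx = np.convolve(x, differentiator(L, FS), mode="same")   # approximates s'
    w = np.hanning(N)
    ratio = (frame_spectrum_at_bin(dx, center, w, m)
             / frame_spectrum_at_bin(x, center, w, m))
    return ratio.real, ratio.imag                  # (mu_hat, omega_hat)

# Noiseless test partial following Equation (2), long enough that the central
# frame is not affected by the zero-padding of the convolution at the edges.
mu0, f0 = 20.0, 440.0
half_len = 1024
t = np.arange(-half_len, half_len + 1) / FS
s = np.exp(mu0 * t + 1j * (0.3 + 2 * np.pi * f0 * t))

m = int(round(f0 * N / FS))                        # peak bin of the partial
mu_hat, omega_hat = derivative_estimates(s, half_len, m)
f_hat = omega_hat / (2 * np.pi)                    # estimated frequency in Hz
```

The finite-length filter only approximates the ideal differentiator, so a small residual error is expected even without noise, and the estimates degrade towards the Nyquist frequency as noted above.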
So, for each partial $m$ detected in the (discrete) Fourier spectrum at time $t$ and frequency $\omega_m$, together with Equations (7) and (8), we now have 3 ways to estimate the amplitude modulation and frequency parameters of the partial: the difference approach with Equations (11)-(12), the reassignment approach with Equations (15)-(16), and the derivative approach with Equations (17)-(18). Once these parameters are known, the others (amplitude and phase) can be estimated in turn using Equation (6).
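The difference and reassignment estimators, followed by the amplitude and phase recovery of Equation (6), can be sketched in the same way. This is a minimal illustration under stated assumptions, not the author's implementation: NumPy, a Hann window with zero endpoints whose analytic derivative is sampled directly, $\Gamma_w$ evaluated as a discrete sum, illustrative signal values, and hypothetical function names.

```python
import numpy as np

FS = 44100.0                     # sampling frequency F_s in Hz (illustrative)
N = 511                          # odd frame size
HALF = (N - 1) // 2
n = np.arange(-HALF, HALF + 1)   # sample index, 0 = frame center
w = 0.5 + 0.5 * np.cos(2 * np.pi * n / (N - 1))                  # Hann window
dw = -(np.pi * FS / (N - 1)) * np.sin(2 * np.pi * n / (N - 1))   # w' (per second)

def spectrum(x, center, window, m):
    """Bin m of the short-time spectrum of the frame centered on `center`."""
    frame = x[center - HALF : center + HALF + 1]
    return np.sum(frame * window * np.exp(-2j * np.pi * m * n / N))

def difference_estimates(x, center, m):
    """Equations (11)-(12): first-order differences of consecutive spectra."""
    s_prev, s_mid, s_next = (spectrum(x, center + d, w, m) for d in (-1, 0, 1))
    mus, omegas = [], []
    for s1, s2 in ((s_mid, s_prev), (s_next, s_mid)):
        r = np.log(s1 / s2)      # real part: Delta_lambda, imaginary: Delta_phi
        dphi = r.imag + 2 * np.pi if r.imag < 0 else r.imag   # phase unwrapping
        mus.append(r.real * FS)
        omegas.append(dphi * FS)
    return 0.5 * sum(mus), 0.5 * sum(omegas)

def reassignment_estimates(x, center, m):
    """Equations (15)-(16): ratio of the spectra taken with w' and with w."""
    ratio = spectrum(x, center, dw, m) / spectrum(x, center, w, m)
    omega_m = 2 * np.pi * m * FS / N
    return -ratio.real, omega_m - ratio.imag

def amplitude_phase(x, center, m, mu_hat, omega_hat):
    """Equation (6), with Gamma_w of Equation (5) computed as a discrete sum."""
    omega_m = 2 * np.pi * m * FS / N
    gamma = np.sum(w * np.exp((mu_hat + 1j * (omega_hat - omega_m)) * n / FS))
    s0 = spectrum(x, center, w, m) / gamma
    return np.abs(s0), np.angle(s0)

# Noiseless test partial following Equation (2); time 0 is sample index c.
a0, phi0, mu0, f0 = 0.8, 0.5, 20.0, 440.0
c = 400
t = (np.arange(2 * c + 1) - c) / FS
s = a0 * np.exp(mu0 * t + 1j * (phi0 + 2 * np.pi * f0 * t))

m = int(round(f0 * N / FS))      # peak bin of the partial
mu_d, om_d = difference_estimates(s, c, m)
mu_r, om_r = reassignment_estimates(s, c, m)
a_hat, phi_hat = amplitude_phase(s, c, m, mu_d, om_d)
```

On this single noiseless exponential, the spectral ratio between consecutive frames equals $\exp((\mu_0 + j\omega_0)/F_s)$, so the difference estimates match the true parameters up to rounding errors. The reassignment estimates rely on sampling the continuous-time window derivative, which leaves a small residual error in the discrete case.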

4. COMPARING THE ESTIMATORS

Now the question is: which is the best approach?

4.1. Experimental Results

To quantitatively evaluate the precision of these approaches for the estimation of all the model parameters, we ran the same experiments as in [5, 8]. We consider discrete-time signals $s$, with sampling rate $F_s$, each consisting of 1 complex exponential generated according to Equation (2) with an initial amplitude $a_0 = 1$, and mixed with a Gaussian white noise. In our experiments, we set the sampling frequency $F_s = 44100$ Hz, the FFT size $N = 511$, and the signal-to-noise ratio (SNR) goes from $-20$ dB to $+100$ dB by steps of 5 dB. For each SNR and for each analysis method, we test several parameter combinations: 99 frequencies ($\omega_0$) linearly distributed in the $(0, 3F_s/8)$ interval, 9 phases ($\phi_0$) linearly distributed in $(-\pi, +\pi)$, and 5 amplitude modulations ($\mu_0$) linearly distributed in $[-100, +100]$. For the analysis window $w$, we use the symmetric Hann window. We focus on the variance of the estimation error over this test set (the mean being zero for unbiased estimators).

We then compare the difference method, the reassignment method, and two variants of the derivative method: the estimated derivative method, where the derivative $s'$ is estimated using the differentiator filter of Equation (19) of size 1023; and the theoretic derivative method, where the exact derivative is used for $s'$ since it is analytically known for the test signals. The results of the theoretic derivative method can be regarded as the best performance the estimated derivative method could achieve, at the expense of a longer differentiator filter though.

When looking at the results of these experiments (see Fig. 3), we see that all these methods are very efficient, close to the Cramér-Rao bounds, which are the limits to the best possible performance achievable by an unbiased estimator given a data set (see [5]). At high SNRs, the estimated derivative method is biased because of the approximation of the derivative by the finite-length differentiator filter.
Applying the spectral reassignment on the discrete spectrum causes a bias, as noticed by Hainsworth [12], degrading the performance of the reassignment method. Perhaps surprisingly, the simplest method, the difference method, is the most efficient.

4.2. Theoretical Equivalence

4.2.1. Reassignment and Derivative

In [4], the reassignment (Section 3.2) and derivative (Section 3.3) methods were proven to be theoretically equivalent, at least as regards the estimation of the frequency in the stationary case. In [5], we generalized the proof of the equivalence to the non-stationary case, and for the estimation of both the frequency and the amplitude modulation. More precisely, we introduce $\rho = \tau - t$, which gives another (equivalent) expression for the STFT (see Equation (3)):

$$S_w(t, \omega) = \int_{-\infty}^{+\infty} s(t + \rho)\, w(\rho) \exp(-j\omega\rho)\, d\rho \qquad (20)$$

from which we can derive

$$\frac{\partial}{\partial t} \log(S_w(t, \omega)) = \frac{S'_w(t, \omega)}{S_w(t, \omega)}. \qquad (21)$$

By considering Equation (21) instead of Equation (14) in Section 3.2 (reassignment approach), we would have obtained the equations of Section 3.3 (derivative approach). Thus the reassignment approach is equivalent to the derivative approach, at least in the continuous case. A different proof, based on integration by parts, can be found in [13].

4.2.2. Reassignment and Difference

It turns out that the reassignment approach is also equivalent to the difference approach in the discrete case. Indeed, in the phase vocoder approach the parameters are estimated by first-order differences, approximating the differentiation of the spectrum of Equations (15)-(16) in the discrete-time case.

5. CONCLUSIONS

Although the three approaches we presented in Section 3 are equivalent in theory, the small differences observed in practice in Section 4.1 are due to a bias of the reassignment method in the discrete case, and of the derivative method when using a finite-length differentiator filter. The most efficient approach turns out to be the simplest one, based on first-order differences, i.e. a rather crude approximation of differentiation. But further investigations are necessary, since in theory the reassignment method (resp.
the derivative method) requires the analysis window (resp. time signal) to be differentiable, which are a priori different conditions.

6. REFERENCES

[1] R. J. McAulay and T. F. Quatieri, "Speech Analysis/Synthesis Based on a Sinusoidal Representation," IEEE Trans. on Acous., Speech, and Sig. Proc., vol. 34, no. 4, pp. 744-754, 1986.
[2] J. O. Smith III and X. Serra, "PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds based on a Sinusoidal Representation," in Proc. Int. Computer Music Conf., 1987, pp. 290-297.
[3] R. Badeau, G. Richard, and B. David, "Performance of ESPRIT for Estimating Mixtures of Complex Exponentials Modulated by Polynomials," IEEE Trans. on Sig. Proc., vol. 56, no. 2, pp. 492-504, 2008.
[4] S. Marchand and M. Lagrange, "On the Equivalence of Phase-Based Methods for the Estimation of Instantaneous Frequency," in Proc. 14th European Conf. on Sig. Proc., 2006.
[5] S. Marchand and Ph. Depalle, "Generalization of the Derivative Analysis Method to Non-Stationary Sinusoidal Modeling," in Proc. Int. Conf. on Digital Audio Effects, 2008, pp. 281-288.

[Fig. 3. Estimation errors as functions of the SNR for the amplitude (a), amplitude modulation (b), phase (c), and frequency (d), with the reassignment, difference, theoretical derivative, and estimated derivative methods, and with comparison to the Cramér-Rao bounds. Each panel plots the variance of the error (log10 scale) against the signal-to-noise ratio (dB).]

[6] J. L. Flanagan and R. M. Golden, "Phase vocoder," Bell System Tech. Journal, vol. 45, pp. 1493-1509, 1966.
[7] M. B. Dolson, "The Phase Vocoder: A Tutorial," Computer Music Journal, vol. 10, no. 4, pp. 14-27, 1986.
[8] S. Marchand, "The Simplest Analysis Method for Non-Stationary Sinusoidal Modeling," in Proc. Int. Conf. on Digital Audio Effects, 2012, pp. 23-26.
[9] K. Kodera, R. Gendrin, and C. de Villedary, "Analysis of Time-Varying Signals with Small BT Values," IEEE Trans. on Acous., Speech, and Sig. Proc., vol. 26, no. 1, pp. 64-76, 1978.
[10] F. Auger and P. Flandrin, "Improving the Readability of Time-Frequency and Time-Scale Representations by the Reassignment Method," IEEE Trans. on Sig. Proc., vol. 43, no. 5, pp. 1068-1089, 1995.
[11] M. Desainte-Catherine and S. Marchand, "High Precision Fourier Analysis of Sounds Using Signal Derivatives," Journal of the AES, vol. 48, no. 7/8, pp. 654-667, 2000.
[12] S. W. Hainsworth, Techniques for the Automated Analysis of Musical Audio, Ph.D. thesis, University of Cambridge, United Kingdom, 2003.
[13] X. Wen and M. Sandler, "Notes on Model-Based Non-Stationary Sinusoid Estimation Methods Using Derivatives," in Proc. Int. Conf. on Digital Audio Effects, 2009.