Melodic contour estimation with B-spline models using a MDL criterion

Meodic contour estimation with B-spine modes using a MDL criterion Damien Loive, Ney Barbot, Oivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-305 Lannion Cedex France {damien.oive,ney.barbot,oivier.boeffard}@irisa.fr http://www.irisa.fr/cordia Abstract This artice describes a new approach to estimate F 0 curves using a B-Spine mode characterized by a knot sequence and associated contro points. The free parameters of the mode are the number of knots and their ocation. The free-knot pacement, which is a NP-hard probem, is done using a goba MLE within a simuated-anneaing strategy. The optima knot number estimation is provided by MDL methodoogy. Three criteria are proposed: contro points are first considered as integer vaues, next as rea coefficients with fixed precision and then with variabe precision. Experiments are conducted in a speech processing context on a 7000 syabes french corpus. We show that a variabe precision criterion gives better resuts in terms of RMS error (0.4Hz) as we as in terms of reduction of the number of B-spine degrees of freedom (63% of the fu mode). 1. Introduction Intonationa speech modes are widey used in the speech technoogy fied. In particuar, Text-to-Speech synthesis (TTS) cannot circumvent the use of such modes to predict meodic contours from text and speaking styes. More recenty, these modes were used to find an optima sequence of acoustic units taking account of prosodic characteristics [1]. Athough intonation resuts from a combination of numerous inguistic factors, this artice focuses on the acoustic parameters known to be the main factor of prosodic perception, which is the fundamenta frequency F 0. F 0 contours, extracted from speech signa, ensue from the evoution of the vibration of the voca fods over time. A wide range of pubications deas with modeing of such contours. Especiay, we can cite modes ike MoMe [], Tit [3], as we as Sakai and Gass s mode [4] based on reguar spine functions. In this artice, we propose a B-spine styisation which integrates variabe oca reguarity notion without affecting the goba approximation of the F 0 curve, estimated according to the east square error criterion. In [5], a comparison between spines and B-spines abiity to mode F 0 is presented using a east squares criterion. The parameters of the B-spine mode are the knot number and their pacement. A free-knot pacement aows to catch contour irreguarities. A Monte-Caro type agorithm (Simuated-Anneaing) is impemented to circumvent the combinatoria compexity of the free-knot pacement. Our main goa is then to estimate the optima number of knots and to determine a parsimonious cass of B-spine curves. The framework we choose is the minimum description ength (MDL) criterion which offers an efficient compromise between mode precision and its number of parameters. In section, we introduce the B-spine mode and its parameters, estimated in section 3 according to a east squares criterion. In section 4, we present the MDL criteria to optimize the number of parameters of the mode. In section 5, the experimenta protoco is described and resuts are given in section 6.. B-spine mode In this section, we present the spine functions and their generaization to B-spine curves which have the abiity to mode open curves with variabe oca reguarity. Let us consider an interva [a, b] and its subdivision into + 1 sub-intervas: a = t m <... < t m++1 = b. A spine of degree m associated to (t i ) is a poynomia function of degree m inside each sub-interva and beongs to the cass of functions C m 1, which are m 1 times continuousy differentiabe, on [a, b). The transition points are caed the interna knots. The set of spines of degree m is a inear space of dimension m + + 1. In [], the MoMe agorithm aims to provide meodic contour representation using a quadratic spine. For that, n target points are estimated to constitute the stationary points of a spine of degree. Transition points (t i ) correspond to the abscissas of the n target points and their n 1 median points. Thus, the quadratic spine is associated to = n 3 interna knots and is characterized by the n constraints due to the n points to interpoate with an horizonta tangent. The inear space of spines associated to (t i) admits a basis of B-spine functions. In order to define them, we add to this knot sequence the (m+1) times dupicated vaues t 0 = t m = a and t m++1 = t m++1 = b, caed externa knots. We denote t the vector containing the interna and externa knots. Then, B-spines are recursivey defined: for i from 0 to m +, the B-spines of degree 0 are given by B i 0(t) = 1 [ti,t i+1 )(t) and for i from 0 to m +, the B-spines of degree m are given by B i m(t) = t ti Bm 1(t) i + ti+m+1 t B t i+m t i t i+m+1 t m 1(t) i+1 i+1 where quotients equa zero if t i = t i+m or t i+1 = t i+m+1. Moreover, B-spines are non-negative and satisfy t [a, b), m+ X i=0 B i m(t) = 1. (1)

Consequenty, a spine function of degree m can be written as a inear combination of B-spine functions of same degree. Spine generaization to B-spine curve aows dupicated knots, the knot mutipicity corresponding to the number of times it appears in t. This way, a B-spine curve of degree m associated to a knot vector t can be written as a inear combination of B-spine functions of degree m where the coefficients c 0,..., c m+ are caed contro points. Let a sequence x 0,..., x N 1 inside [a, b], the matrix representation of the B-spine curve at point x j is given by g θ (x j) = m+ X i=0 c ib i m (x j) = (Bc) j () where B i m(x j) is the (j, i)-th entry of the matrix B, c is the coumn vector containing the contro points and θ = (t m+1,..., t m+, c 0,..., c +m ). Now we assess the impact of each parameter of a B-spine curve g θ. The knot effect on the curve g θ is mainy controed by its mutipicity. If we consider an interna knot t i, greater its mutipicity m i is, ower is the smoothness of the curve at t i. More precisey, if g θ beongs to C m 1 between two consecutive knots, it beongs to C m m i at knot t i. For instance, if m i = m 1 the curvature of the B-spine curve changes at t i, if m i = m the curve can not be differentiated at t i and if m i > m, it is not continuous at t i. As regards contro points, they have a oca infuence. Indeed, suppose we change the position of c i, it ony aters the term c i Bm i in (), and since Bm i is zero outside [t i, t i+m+1 ), it does not propagate outside the corresponding curve segment. 3. Parameter estimation Let a set of measurements {(x j, y j), 0 j N 1} of a meodic contour, where (x j ) is a non-decreasing sequence. Take the probem of the B-spine curve g θ of degree m such as the points (x j, g θ (x j )) best fit these data using the east squares error criterion. In this section, we suppose the number of interna knots known, and et t 0 = t m = x 0 and t m++1 = t m++1 = x N 1. First, we derive the optima contro points and then we estimate the interna knot ocation. 3.1. Contro points In this section, the knot vector t is supposed known and we determine the B-spine curve g θ of degree m, that is to say its contro points, which minimizes the mean square error with the data. Let us denote y the coumn vector containing the y j, the contro points are estimated such as bc = arg min c y Bc according to (). After derivation, we obtain bc = B T B 1 B T y (3) if the matrix B T B is invertibe, i.e. when the coumns of B are ineary independent. That is to say t must not have knots with mutipicity order greater than m+1 to avoid nu coumns of B. Besides, the number N of ines of B has to be reay greater than the number of coumns in order to differentiate the coumns, i.e. N m + + 1. (4) Finay, for a given knot sequence t, the optima B-spine curve of degree m according to a east squares criterion is described by the N foowing points x j (Bbc) j = 3.. Knot pacement 1 B B B T B T y. (5) j Let us now consider the optimization of the interna knot ocation. We introduce the maximum ikeihood criterion and we choose a Monte-Caro strategy (Simuated-Anneaing, SA) to compute the soution b t of this NP-hard probem. We restrict the knot ocation to the set of observations, and knots can then be represented by integers between 0 and N 1. Our experiments show that such a restriction does not seem to decrease the quaity of the fina curve estimate. Let us denote e j the error between the observation y j at time x j and its estimation g θ (x j ). To simpify cacuus, the errors e 0,..., e N 1 are supposed independent and identicay distributed according to a zero mean Gaussian aw with variance σ. Then, given the mode, the og-ikeihood of y is og p y; θ = 1 σ y Bc N og πσ. (6) Let us notice that bc, defined by (3), is the maximum ikeihood estimate of c. The maximum ikeihood estimate b t of t is defined by b t = arg min t y B B T B 1 B T y where the matrix B depends on t. To determine b t, we used a goba optimization agorithm based on the Simuated-Anneaing (SA) strategy, presented in [5]. The proposed reation between the parameters of the mode and the random distributions of the SA agorithm is as foows: 1. SA sampes a vector v inside {x 1,..., x N }.. An interna knot vector t is defined by sorting the coordinates of v. 3. A knot vector t is defined by adding (m + 1) times x 0 and x N 1 to the extremities of t. 4. If two consecutive knots ie in an interva smaer than 5% of the fu range [x 0, x N 1 ], the two knots are merged and the mutipicity order of the first knot is increased by one. This process is then appied recursivey to a the knots defining t. Let us reca that knot mutipicity must not be greater than m + 1. Before tacking the optimization of the number of interna knots, it is usefu to sum up the main choices and strategies as regards the considered B-spine mode b θ. First, for a given, we start to estimate an optima knot-sequence b t using the SA agorithm. Then with the associated matrix B and (3), we compute the optima bc. In the foowing section, we ook for the best mode, that is to say the optima vaue of, according to a MDL criterion. 4. B-spine and MDL The minimum description ength (MDL) criterion enabes a compromise between the quaity of a mode b θ and its compexity. More precisey, MDL consists in determining b which

minimizes the description ength L(y) of the data y. According to [6], we have b = arg min where L bθ L(y) = arg min L bθ og p y; θ b and og p y; b θ stand respectivey for the description ength of the estimated mode and the observations conditioned by the mode. The expression (6), using the unknown error variance σ, we use its maximum ikeihood estimate c σ : cσ = arg max σ og p y; θ b = 1 N y Bbc. Hence, σ is estimated by the root mean square error (RMS) between the observations and the B-spine curve. Thus, using expression (6), we have L(y) = L bθ 4.1. Principe of soution + N + N og (π) + N og (RMS). We consider now the description ength of the vector θ b. This one is composed of interna knots and ( + m + 1) contro points. We suppose that the knots have a the same description ength, as we as the contro points. Regarding the interna knots, they are inside the interva (x 0, x N 1 ) at observation paces. Hence, each knot can be represented by an integer between 0 and N 1 and so og (N) bits are sufficient to encode it. As for the contro points, they are rea parameters estimated from N data points. When N is arge, the ength of a rea parameter is generay approached by og ( N) which is asymptoticay optima [6]. However, each contro point has a oca infuence on the B-spine mode g θ b and if the number of observations is sma the asymptotic approximation is not we adapted [7]. We then consider a description ength of a contro point based on a uniform prior on the interva [ α, α] [8]. For a given contro point description ength precision ε, its ength is given by og (α) + 1 og (ε). The choices of α and ε are discussed in the foowing paragraph. In short, the description ength of θ b can be written as L bθ = og (N) + (m + + 1) (og (α) + 1 og (ε)). 4.. Theoretica bounds on contro points For a given b t, the contro points bc are assumed generated by an uniform prior on [ α, α]. The maximum ikeihood estimate of α is bc = max i bc i. Thus, moduo a constant independent of, we obtain the first MDL criterion L(y) = (m + + 1) og bc + 1 og (ε) +N og (RMS) + og (N) (7) denoted criterion (a). Considering the expression (3) of bc, we estabish a bound on contro point definition interva and we can then consider the associated uniform prior. Proposition 4.1 Let us denote µ the smaest singuar vaue of the matrix B, we have µ > 0 and bc y /µ. Proof. The matrix B being of maxima rank, i.e. m + + 1, its smaest singuar vaue µ is positive. According to [9], and using (3), we have (B T B) 1 B T = 1/µ bc bc = (B T B) 1 B T y (B T B) 1 B T y y /µ. From proposition 4.1, if the contro points are assumed uniformy distributed on [ y /µ, y /µ], we obtain the foowing MDL criterion L(y) = (m + + 1) (og ( y /µ) + 1 og (ε)) denoted criterion (b). 4.3. Contro point precision +N og (RMS) + og (N) (8) For a given b t, we now study the infuence of the contro point precision ε on the rebuit mode. Let us noteec an approximation of bc such that ec bc ε. According to the chosen error evauation between the estimated curve Bbc and the rebuit one Bec, we derive a sufficient precision ε. Proposition 4. We have Bec Bbc ε. Proof. According to (1), B = 1 and we get Bec Bbc B ec bc ε. As a resut, if we want a variation between the estimated and rebuit points smaer than the error between the data and the estimated points, choosing a contro point precision of ε = y Bbc is sufficient. Coroary 4.1 The RMS error between the rebuit curve B c and the optima one Bbc is ess than ε. Proof. We reca that for any vector x of R N, x x N x (9) and using proposition 4., the RMS error between Bec and Bbc then verifies Bec Bbc / N Bec Bbc ε. Thus, if we wish a RMS error between the estimated B- spine curve Bbc and Bec ess than the RMS error between y and Bbc, taking ε = RMS = y Bbc / N is sufficient. 4.4. MDL criteria for B-spines In paragraph 4., we have introduced two MDL criteria depending on the choice of the contro point description ength, caed (a) and (b), defined by (7) and (8). For each of them, we distinguish three sub-criteria in function of the precision ε. The first one considers ε as fixed and the others as a variabe precision depending on the mode. We consider that the error between the estimated and rebuit curves has not to be ower than the one between the data and the estimated curve. According to paragraph 4.3, we choose ε = RMS and then ε = y Bbc. From criteria (a) and (b), we obtain three sub-criteria: (a).1 and (b).1: ε fixed for a curves, (a). and (b).: ε = RMS, (a).3 and (b).3: ε = y Bbc.

5. Experimenta protoco The main objective consists in estimating the B-spine mode b θ using the above MDL criteria in order to obtain a compromise between the mode quaity (assessed by the RMS error) and the number of degrees of freedom (D.F.). We introduce the methodoogica hypotheses common to a experiments which answer to the three foowing questions: what is the reation between the RMS error and the number of D.F. of the mode? What is the MDL criteria sensibiity in function of the precision ε? Finay, how do the proposed criteria behave? The speech corpus used in these experiments is a French one made of approximatey 7000 syabes randomy extracted from a 7000 sentences corpus. The acoustic signa was recorded in a professiona recording studio; the speaker was asked to read the text. Then, the acoustic signa was annotated and segmented into phonetic units. The fundamenta frequency, F 0, was anayzed in an automatic way according to an estimation process based primariy on the autocorreation function of the speech signa. Next, an automatic agorithm was appied to the phonetic chain pronounced by the speaker so as to find the underying syabes. We choose the syabe as the minima support of a meodic contour. Our objective is to measure the performance of the unsupervised styisation of meodic contours using different decision criteria. For interna knots, the mode b θ depends on ( + m + 1) parameters, and if this number is greater than the number N of F 0 vaues, it is more economica to memorize the curve. When we estimate a B-spine mode, we then choose to satisfy the foowing condition on the interna knot number : N ( + m + 1). The curves having different engths, we normaize the number of interna knots in order to cacuate the means of comparabe number of D.F. for a the curves. More precisey, we divide the knot number of a B-spine mode by the number of points of the curve. A normaized number of D.F. equa to 1 wi then correspond to the whoe curve, i.e. to a fu mode. A proposed experiments use mean vaues cacuation: the mean vaue of the RMS errors and the mean vaue of the B- spine modes D.F. These ones are formuated as confidence intervas with a confidence threshod equa to 99%. Considering the nature of our experiments, a singe experimenta sampe containing 7000 curves is used, the confidence intervas are obtained by resamping the empirica distributions (Bootstrap methodoogy). 6. Resuts and discussion The MDL criterion seects the mode structure between a the possibe modes for a given syabe. So the first stage is the estimation of the modes θ for each syabe varying interna knot number. The second stage is to appy the MDL criteria on each syabe to decide which mode b θ is the best one. In this section, we present the different experiments we carried out. 6.1. RMS error versus D.F. These experiments are conducted in order to observe the evoution of the RMS error in function of the degrees of freedom number. The obtained curve (figure 1) permits to measure the parameter number impact on the estimated modes quaity. The normaized D.F. number varies from 0 to 1. A syabes of the corpus are taken into account in this figure. When D.F. number increases, RMS error decreases. Athough this resut was foreseeabe, it justifies the search of a RMS (Hz), 99% confidence interva 14 13 1 11 10 9 8 7 6 5 4 3 1 0 ower bound upper bound mean 0.1 0. 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Normaized degrees of freedom Figure 1: RMS error 99% confidence interva evoution in function of the normaized degrees of freedom. compromise between the mode precision and its compexity. In particuar, we notice a point of infection in the curve with an average normaized D.F. number cose to 0.65. This vaue corresponds to a mean vaue of RMS error cose to 1Hz. The increase at the end of the curve probaby shows a weakness of the SA strategy when the knot number becomes high. Indeed, the condition (4) is no more satisfied. Athough the optimization process is goba, SA does not necessariy provide an optima knot vector. Finay, the MDL goa being to obtain a compromise between precision and compexity, a satisfying criterion shoud estimate a mean RMS error and a mean normaized D.F. number near the infection point of the curve. 6.. MDL and ε reation It is important to compare variabe precision MDL criteria to fixed precision ones. For that, we study the evoution of the RMS error according to a the possibe vaues of ε (criteria (a).1 et (b).1). Thus, we assess the infuence of the parameter precision ε on the RMS error and the D.F. number. Figure represents the RMS error confidence intervas in function of ε vaues. Figure 3 shows the evoution of the normaized D.F. number in function of ε. RMS error vaues and ε vaues are represented with a ogarithmic scae. Notice that the fact of considering the contro points as integers corresponds to the choice of ε = 1. 6..1. Criterion (a) Less precisey the mode is coded (ε arger), higher knot number is aowed. Indeed, when ε increases, the term og ε decreases, the criterion seects a higher vaue of and the RMS error then decreases. In figure, we can notice a quite significant variation of the RMS error. Indeed, there is a scae factor cose to 10 between its maximum and minimum vaues. Moreover, we observe an important increase of the D.F. number. To sum up, the curve variations underine the impact of ε on the RMS error and the chosen D.F. number. 6... Criterion (b) The same observations as for criterion (a) are vaid for criterion (b). Indeed, the infuence of ε is important, it governs the criterion performance. However, this criterion gives higher RMS error vaues than the previous one. According to proposition 4.1, the contro point description ength is greater and penaizes more the MDL criterion (b). Therefore, the average D.F. number stemming from (b) is ower than the one from (a), inducing a RMS error increase. This observation is coherent with the evoution of the RMS error in function of the D.F. number in figure 1.

RMS (Hz), 99% confidence interva 9 8 7 6 5 4 3 1 ower bound upper bound mean 0-30 -5-0 -15-10 -5 0 og ε Figure : RMS error 99% confidence interva in function of og ε for criterion (a). Normaized degrees of freedom, 99% confidence interva 1 0.9 0.8 0.7 0.6 0.5 0.4-30 -5-0 -15-10 -5 0 og ε Figure 3: Normaized degrees of freedom 99% confidence interva evoution in function of og ε for criterion (a). 6.3. Proposed MDL criteria anaysis To evauate the proposed criteria, we cacuate confidence intervas on the RMS error and the normaized D.F. number for the MDL seected modes. The resuts are summarized in tabe 1. By comparing these confidence intervas to previous resuts, we wi be abe to observe the reiabiity of the proposed criteria. Contrary to the preceding experiments, ε is variabe (criteria (a)., (a).3, (b).3 et (b).3). These resuts permit the distinction of two compromises. First, the criterion (a). gives a mean RMS error of 0.68Hz with a normaized D.F. number of 0.599, whie criterion (b). gives respectivey 1.57Hz and 0.544. Criterion (b) uses a contro point description ength higher than the one used in (a). Therefore, it penaizes more and the seected vaues are smaer. The mean RMS error then increases. Secondy, two variabe modes were tested for each criterion. The use of y Bbc sighty improves the RMS error resuts for each criterion and impies an increase of the D.F. number. Indeed, according to (9), RMS y Bbc so the criteria (a).3 et (b).3 are ess penaizing. If the RMS error is privieged, the third versions of the criteria are better than the others. To compare fixed and variabe precision criterion, it is necessary to compare the normaized D.F. numbers, taking equa mean RMS errors. Simiary, a reciproca comparison is essentia. For criterion (a).3, if we choose a mean RMS error equa to 0.4Hz, figure gives the og ε vaue 3. By reporting this Tabe 1: 99% confidence intervas for the RMS error and the Normaized Degrees of Freedom (Norm. D.F.). Criteria RMS (Hz) norm. D.F. (a). 0.68 ± 0.11 0.599 ± 0.006 (a).3 0.4 ± 0.07 0.67 ± 0.006 (b). 1.57 ± 0.16 0.544 ± 0.006 (b).3 1.11 ± 0.1 0.567 ± 0.006 vaue on figure 3, the obtained D.F. number is approximatey 0.75. If we appy this reasoning in the same way with equa D.F., we can concude that criterion (a).3 is better than a fixed precision one. The same reasoning is vaid for (b).3. We have to compare the variabe precision criteria to the genera evoution of the RMS error in function of the normaized degrees of freedom (fig.1). Criterion (a).3 is ocated near the infection point of the curve at coordinates (0.67, 0.4). Criterion (b).3 is before the infection point at (0.567, 1.11). We concude that criterion (b) is ess efficient than (a). In [5], we compared a B-spine mode to a spine mode using an experimenta framework cose to the one used here. We showed, tabe 1 page 4 of the artice, that B-spine modes with a normaized degree order of 60% eads to a mean RMS error about 3Hz (this resut can be found figure 1 abscissa 0.6). As for the spine mode, it ed to a mean RMS error around 1Hz. In [10], it has been shown that MoMe eads to 6Hz of mean RMS error in the best case. These resuts suggest, on the one hand, that a B-spine mode outperforms a spine mode and, on the other hand, a MDL criterion with variabe precision improves B-spine mode performance. 7. Concusion In this artice, we present a new approach to estimate meodic contours using a B-Spine mode. The precision of such a modeing is important to characterize meody in speech processing. B-spine mode generaizes spine mode and aows to describe precisey the oca irreguarities of the curve. The main contribution of this artice concerns the estimation of an optima knot number for the B-spine mode thanks to the MDL methodoogy. Appied to F 0 contours modeing at the syabic eve, this approach eads to a mean RMS error of 0.4Hz with a normaized number of degrees of freedom equa to 0.63 (a normaized number of degrees of freedom equa to 1 corresponds to the use of as much parameters as points of the curve). These RMS vaues are, on the one hand, ess than the F 0 JND threshod (around a few Hertz) and on the other hand, they are obtained with a compression factor reativey high (37% on average). 8. References [1] A. Raux and A. Back, A unit seection approach to f0 modeing and its appication to emphasis, in Proc. ASRU Conf., 003, pp. 700 703. [] D. Hirst, A. D. Cristo, and R. Espesser, Leves of representation and eves of anaysis for the description of intonation systems, In M. Horne (Ed.),Prosody : Theory and Experiment, Kuwer Academic Pusbisher, vo. 14, pp. 51 87, 000. [3] P. Tayor, Anaysis and synthesis of intonation using the

tit mode, J. Acoust. Soc. America, vo. 107, pp. 1697 1714, 000. [4] S. Sakai and J. Gass, Fundamenta frequency modeing for corpus-based speech synthesis based on statistica earning techniques, in Proc. ASRU Conf., 003, pp. 71 717. [5] N. Barbot, O. Boeffard, and D. Loive, F0 styisation with a free-knot b-spine mode and simuated-anneaing optimization, in Proc. Eurospeech Conf., 005, pp. 35 38. [6] M. H. Hansen and B. Yu, Mode seection and the principe of minimum description ength, J. Amer. Stat. Assoc., vo. 96, no. 454, pp. 746 774, 001. [7] M. Figueiredo, J. Leitao, and A. Jain, Unsupervised contour representation and estimation using b-spines and a minimum description ength criterion, IEEE Trans. Image Proc., vo. 9, no. 6, pp. 1075 1086, 000. [8] T. C. Lee, An introduction to coding theory and the twopart minimum description ength principe, Int. Stat. Review, vo. 69, no., pp. 169 183, 001. [9] G. Goub and C. V. Loan, Matrix Computations. Johns Hopkins Univ. Press, Batimore, MA., 1989. [10] S. Mouine, O. Boeffard, and P. Bagshaw, Automatic adaptation of the mome f 0 styisation agorithm to new corpora, in Proc. of ICSLP, 004.