
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007

High-Resolution Spherical Quantization of Sinusoidal Parameters

Pim Korten, Jesper Jensen, and Richard Heusdens

Abstract—Sinusoidal coding is an often employed technique in low bit-rate audio coding. Therefore, methods for efficient quantization of sinusoidal parameters are of great importance. In this paper, we use high-resolution assumptions to derive analytical expressions for the optimal entropy-constrained unrestricted spherical quantizers for the amplitude, phase, and frequency parameters of the sinusoidal model. This is done both for the case of a single sinusoid, and for the more practically relevant case of multiple sinusoids distributed across multiple segments. To account for psychoacoustical effects of the auditory system, a perceptual distortion measure is used. The optimal quantizers minimize a high-resolution approximation of the expected perceptual distortion, while the corresponding quantization indices satisfy an entropy constraint. The quantizers turn out to be flexible and of low complexity, in the sense that they can be determined easily for varying bit rate requirements, without any sort of retraining or iterative procedures. In an objective comparison, it is shown that for the squared error distortion measure, the rate-distortion performance of the proposed method is very close to that of the theoretically optimal entropy-constrained vector quantization. Furthermore, for the perceptual distortion measure, the proposed scheme is shown to objectively outperform an existing sinusoidal quantization scheme, where frequency quantization is done independently. Finally, a subjective listening test, in which the proposed scheme is compared to an existing state-of-the-art sinusoidal quantization scheme with fixed quantizers for all input signals, indicates that the proposed scheme leads to an average bit rate reduction of 20%, at the same subjective quality level as the existing scheme.

Index Terms—High-resolution quantization, point density functions, sinusoidal coding, unrestricted spherical quantization.

Manuscript received June 3, 2005; revised June 13, The work was supported by STW, applied science division of NWO, and the technology program of the Dutch ministry of Economic Affairs. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Gerald Schuller. The authors are with the Department of Mediamatics, Delft University of Technology, 2628 CD Delft, The Netherlands (p.e.l.korten@tudelft.nl; j.jensen@tudelft.nl; r.heusdens@tudelft.nl).

I. INTRODUCTION

PARAMETRIC coding has proven to be very effective for representing audio signals at low bit rates [1]-[4]. Typically, a parametric coder is subdivided into several separate subcoders, each operating on different components of the input signal; these generally include a sinusoidal component and a noise component, and sometimes also include a transient component. For many audio signals, the sinusoidal component, represented by amplitude, phase, and frequency parameters, is perceptually the most important of the three [3]. Consequently, the main part of the bit budget is typically assigned to this component. Often, the bit budget available for encoding the sinusoidal component is allocated dynamically based on the bit needs of the other component subcoders. For this reason, it is desirable to have simple and flexible quantizers which can adapt easily to changing bit-rate requirements without any sort of iterative quantizer (re)design procedures. Developing an efficient quantization scheme for the sinusoidal component and its corresponding parameters is therefore critical.

Fig. 1. Sinusoidal coding.

Fig. 1 shows a block diagram of a typical sinusoidal subcoder, which models the input signal as a sum of sinusoids, each described by its amplitude, phase, and frequency. Note that the input signal for the sinusoidal coder may consist of the original signal or the output of the transient coder. The parameters are then quantized and the corresponding quantization indices are entropy encoded.

In this paper, we focus on quantizing the sinusoidal amplitude, phase, and frequency parameters efficiently. More specifically, we aim at minimizing the quantization distortion (as measured by an appropriate distortion measure), subject to an entropy constraint. The quantizers in this paper are derived under high-resolution assumptions, i.e., the input space is assumed to be covered by a very large number of quantization cells. Consequently, the probability density functions of the input variables can be assumed constant in each quantization cell [5]. Using these assumptions, considerable simplifications can be made in distortion and entropy formulas, resulting in analytically simple expressions for the optimal quantizers, which turn out to be valid already at practical low bit rates.
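For concreteness, the synthesis side of such a subcoder can be sketched as follows: each segment is rebuilt as a sum of sinusoids from its (amplitude, phase, frequency) triplets, and it is these triplets, not the waveform, that are quantized and entropy encoded. The cosine form, the function name, and the numeric values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def synthesize_segment(triplets, frame_length):
    """Illustrative sketch: reconstruct one segment as a sum of sinusoids.

    triplets     -- iterable of (amplitude, phase, frequency) tuples,
                    frequency in radians per sample
    frame_length -- number of samples in the segment
    """
    n = np.arange(frame_length)
    segment = np.zeros(frame_length)
    for amplitude, phase, frequency in triplets:
        segment += amplitude * np.cos(frequency * n + phase)
    return segment

# Example: two sinusoids in a 1024-sample segment.
example = synthesize_segment([(0.8, 0.3, 0.12), (0.2, 1.1, 0.45)], 1024)
```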

2 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 967 The proposed quantization scheme in this paper consists of optimal high-resolution entropy-constrained quantizers for amplitude, phase, and frequency, which are derived using a perceptual spectral distortion measure. Note that parts of this paper have been presented in [13] and [15]. The proposed quantizers minimize the expected perceptual distortion, while satisfying an entropy constraint. Note that in this paper, the rate needed to encode the masking threshold is included in the entropy constraint. The proposed scheme is denoted by entropy-constrained unrestricted spherical quantization (ECUSQ), where the term unrestricted refers to the fact that amplitude, phase, and frequency parameters are dependently quantized. Optimal ECUSQ quantizers are derived for both for the case of a single sinusoid, and for the more practically relevant case of multiple sinusoids distributed across multiple segments. In the single sinusoid case, the rate-distortion performance of the proposed ECUSQ scheme is compared to that of the theoretically optimal entropy-constrained vector quantization (ECVQ) and to that of entropy-constrained strictly spherical quantization (ECSSQ) in which the parameters are quantized independently. For these comparisons, we use the squared error measure. By doing so, implementing the ECVQ algorithm is feasible, in terms of computational complexity, which is not the case if the perceptual distortion measure is used. For the multiple sinusoids-multiple segments case, the performance of the proposed scheme is both objectively and subjectively compared to that of existing entropyconstrained quantization schemes. First, the proposed scheme is objectively compared to the ECUPQ+ scheme, which is a combination of the ECUPQ scheme (entropy-constrained unrestricted polar quantization only amplitude and phase quantization) and an optimal independent entropy-constrained frequency quantizer. The main advantage of the proposed method over ECUPQ+ is that the bit distribution between amplitude, phase, and frequency does not need to be determined beforehand, but follows as a result of the derived formulas, whereas in ECUPQ+ this needs to be chosen a priori. Second, a listening test was done in which the subjective performance of the proposed scheme is compared to that of an existing state-of-the-art sinusoidal quantization scheme using log-quantizers, in which the quantizers are fixed for all input signals. Note that these quantizers are also used in the standardized MPEG-4 SSC coder [6] to quantize births of sinusoidal tracks. In a practical sinusoidal coding scheme, the sinusoidal parameters are usually (time/frequency) differentially encoded. In this paper, we derive quantizers which quantize the sinusoidal parameters directly instead of differentially. However, in [7], it is shown that the proposed quantization scheme can be easily extended to include differential encoding as well. The remainder of this paper is organized as follows. In Section II, we discuss previous work concerning sinusoidal quantization and the perceptual distortion measure. In Section III, we discuss the single sinusoid case. The optimal ECUSQ quantizers and the optimal bit distribution are determined. Furthermore, the proposed scheme is compared to ECVQ and ECSSQ, using the squared error distortion measure. Section IV discusses the case of multiple sinusoids distributed across multiple segments. 
After developing the optimal ECUSQ quantizers, the proposed scheme is compared to ECUPQ+ (objectively) and the sinusoidal log-quantization scheme (subjectively). In Section V, we give some conclusions of our work. Finally, some proofs are included in the Appendix. II. PREVIOUS WORK The ECUSQ quantizers generalize and advance previous work in sinusoidal quantization and coding. Additionally, the ECUSQ derivations rely on an established perceptual distortion measure. In this section, we will discuss these two points. A. Sinusoidal Quantization and Coding In [9] [11], unrestricted polar quantization (UPQ) has been introduced, in which only amplitude and phase parameters are quantized. In this scheme, phase quantization depends on the input amplitude. The derivations in [9] [11] are done subject to a resolution constraint, i.e., a fixed number of quantization cells and a fixed rate. However, in some applications, an entropy constraint rather than a resolution constraint is of interest. In [12], entropy-constrained unrestricted polar quantization (ECUPQ) is introduced, and using high-resolution assumptions, analytical expressions for the optimal scalar ECUPQ amplitude and phase quantizers are derived. These quantizers minimize the expected distortion, while satisfying an entropy constraint. A shortcoming of this work, however, is that it does not consider quantization of frequency parameters. In [13] and [14], ECUPQ is generalized to include frequency quantization. In the first citation, this extended scheme is denoted by entropy-constrained unrestricted spherical quantization (ECUSQ). Analogously with ECUPQ, amplitude, phase, and frequency are quantized dependently in this scheme. In both citations, optimal scalar high-resolution amplitude, phase, and frequency quantizers are derived, so as to minimize a prespecified distortion, while satisfying an entropy constraint. Unlike the ECUPQ quantizers derived in [12], the quantizers in [13] and [14] are dependent on the frame-length and shape of the analysis/synthesis window (as one would expect). Such a framelength dependent quantization is important in coding schemes, where variable segment length analysis is used, see, e.g., [16] and [17]. In [13], a mean-squared-error distortion measure is used, whereas in [14] a perceptually weighted mean-squarederror distortion measure is used to account for psychoacoustical effects of the auditory system. Additionally, in [15] the work in [13] is extended to a perceptual spectral distortion measure. Hence, both methods in [14] and [15] account for perceptual effects; however, the method presented in [14] has a few restrictions in comparison with [15]. First, only phase quantization is made dependent of the individual perceptual weights, resulting in amplitude and frequency quantizers that do not account for auditory perception, whereas in [15], all three quantizers take perception into account. Second, the weights in [14] are considered fixed when computing the expectation of the total quantization distortion over all possible input signals, while varying the input signal should result in varying perceptual weights, as is the case in [15]. Finally, in [14], only one segment is used, defined by a rectangular window, whereas in [15], the more practically relevant situation of multiple segments defined by nonrectangular windows is considered. However, in [15], the rate needed to encode the masking threshold is not taken into account.
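To make the restricted/unrestricted distinction used in this section concrete, the following sketch contrasts the two polar strategies: strictly polar quantization uses fixed, independent grids for magnitude and phase, whereas in unrestricted polar quantization the phase step depends on the quantized magnitude (finer phase for larger magnitude). The step-size rules and names are illustrative placeholders, not the quantizers of [9]-[12].

```python
import numpy as np

def strictly_polar(a, phi, a_step=0.1, phi_step=0.1):
    """Magnitude and phase quantized independently of each other."""
    return np.round(a / a_step) * a_step, np.round(phi / phi_step) * phi_step

def unrestricted_polar(a, phi, a_step=0.1, c=0.01):
    """Phase step depends on the quantized magnitude: the larger the
    magnitude, the finer the phase quantization."""
    a_hat = np.round(a / a_step) * a_step
    phi_step = c / max(a_hat, 1e-6)
    return a_hat, np.round(phi / phi_step) * phi_step
```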

3 968 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 There exist two standardized sinusoidal coders, namely MPEG-4 HILN [18] and MPEG-4 SSC [6]. The HILN coder typically operates at bit rates lower than 16 kb/s, while the SSC coder operates at higher bit rates around 24 kb/s. These coders incorporate multiple signal model components as described in the introduction. Since in the proposed scheme we only focus on the quantization of the sinusoidal parameters, and not on the complete coder, we will not use these coders for benchmarking. B. Perceptual Distortion Measure The perceptual distortion measure used throughout this paper is introduced in [19] and is defined by where denotes the Fourier transform operation, and denotes the difference between the original signal and the quantized signal. Furthermore, is the analysis window used and is a weighting function representing the sensitivity of the human auditory system at a particular (normalized) frequency. Note that the perceptual distortion measure introduced in [19] has a rather different notational form than (1). However, in [17], it is proven that the two measures are equal if is selected to be the inverse of the masking threshold corresponding to the input signal. In this way, frequencies for which the auditory system is less sensitive will contribute less to the total distortion than frequencies for which the auditory system is more sensitive. Note that this perceptual model only accounts for spectral masking effects, and does not include temporal effects. III. ECUSQ OF A SINGLE SINUSOID In this section, we will derive optimal ECUSQ high-resolution quantizers for amplitude, phase, and frequency, for the case where the input signal is represented by one single sinusoid. Furthermore, objective comparisons will be made with several other quantization schemes using a special case of the perceptual distortion measure, the -squared-error measure. (1) where is the joint probability density function of amplitude, phase, and frequency, corresponding to distributions, and, respectively. Furthermore, and.in high-resolution theory, quantizers are described by quantization point density functions [20], [21], which when integrated over a region give the total number of quantization levels within. Thus, in the case of one-dimensional quantizers, the quantizer step sizes are simply given by the reciprocal values of the point density functions, that is,. Note that these point densities do not specify the location of the quantization points. In our scheme, we encounter point density functions for amplitude, phase, and frequency, denoted by,, and, respectively. Note that since we consider unrestricted quantization, the quantization point density functions are assumed to depend on all three parameters. To be able to reconstruct at the decoder, the masking threshold sample also has to be quantized and encoded, as will become clear later. Note that this sample is sufficient to reconstruct the sinusoid, i.e., we do not need to encode the entire masking threshold. Throughout this paper, we will quantize the masking threshold samples uniformly in the db-domain with a stepsize of 8 db, of which the effect is experimentally found to be inaudible and hence negligible for the perceptual distortion. However, encoding the masking threshold samples does give a contribution to the rate. We will discuss the extent of this contribution later. 
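A minimal sketch of the two ingredients just described, the perceptually weighted spectral distortion of Section II-B and the coarse dB-domain quantization of a masking-threshold sample, is given below. The discretization by an FFT, the normalization, the use of 10 log10, and all function names are assumptions made for illustration; the weighting is taken, as stated above, to be the reciprocal of the masking threshold.

```python
import numpy as np

def perceptual_distortion(error, window, masking_threshold):
    """Sketch of the weighted spectral distortion of a windowed error signal.

    error             -- original minus quantized signal (one segment)
    window            -- analysis window of the same length
    masking_threshold -- per-bin masking threshold; its reciprocal acts as
                         the perceptual weighting function
    """
    spectrum = np.fft.fft(window * error)
    weight = 1.0 / masking_threshold             # less audible bins weigh less
    return np.sum(weight * np.abs(spectrum) ** 2) / len(error)  # up to a constant

def quantize_threshold_db(m, step_db=8.0):
    """Uniform quantization of a masking-threshold sample in the dB domain."""
    index = int(np.round(10.0 * np.log10(m) / step_db))
    return index, 10.0 ** (index * step_db / 10.0)  # index and reconstruction
```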
In the remainder of this section, we derive a high-resolution approximation for the entropy of the quantization indices. Let,,, and denote the alphabets of amplitude, phase, frequency, and masking threshold quantization indices, respectively. The joint entropy of the quantization indices is equal to where is the rate needed to encode the masking threshold sample. Under high-resolution assumptions, can be approximated by (3) A. High-Resolution Approximations of Expected Distortion and Entropy In this section, we will derive high-resolution expressions for the expected perceptual distortion and entropy, for a single sinusoid. In the single sinusoid case, the input signal is approximated by one sinusoid, i.e.,, for. Here, and are amplitude, phase, and frequency respectively,, and is the frame-length. The masking threshold is denoted by. Furthermore, the quantization error signal is given by. Our goal is to minimize the expected perceptual distortion. In Appendix A, it is proven that under high-resolution assumptions is approximated by where has distribution and (4) (2)

4 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 969 is the joint differential entropy of amplitude, phase, and frequency, conditioned on. In, we used the high-resolution assumption that probability density functions are constant within a quantization cell to approximate and (10) are introduced for notational simplicity. Substituting (9) back into (6) (8), we find closed-form expressions for the optimal high-resolution ECUSQ quantizers that solve (5) Furthermore, we replaced sums by integrals and quantization step sizes by quantization point densities. Note that the integration over falls out in the three integrals expressions in (4) since is fully determined by,, and. B. Optimal Quantizers In this section, we will derive the quantization point densities that minimize the expected perceptual distortion, while satisfying an entropy constraint subject to (5) where is a prespecified target entropy. This constrained minimization problem can be solved using the method of Lagrange multipliers, turning it into an unconstrained minimization problem. In this method, the Lagrangian cost function is minimized, where is the Lagrangian multiplier, which should be chosen such that the entropy constraint is satisfied. A well-known theorem in variational analysis states that is minimized if the so-called Euler Lagrange equations for with respect to,, and, individually, are satisfied [22]. Solving these equations, the quantization point densities that minimize the cost function are found to be Substituting (6) (8) in the entropy constraint, using (4), we find the optimal value of the Lagrange multiplier, as shown by (9) at the bottom of the page, where (6) (7) (8) where (11) (12) (13) accounts for perceptual effects. Note that since is proportional to, the perceptually more important sinusoids are quantized more finely. Since is inversely proportional to the power of the sinusoid, the optimal amplitude density gives rise to a logarithmic amplitude quantizer. Both phase and frequency quantizers, however, are uniform for given amplitude. Commonly, logarithmic frequency quantization is used, which is based on psychoacoustical data measured for signal durations of about 1 s. However, in the proposed scheme, sinusoids are segmented into (short) frames. Hence, the errors introduced by the frequency quantization, which are noise-like, do not introduce a frequency error of the complete (long duration) sinusoid, but will have a noisy character due to the segmentation into relatively short time frames. Since the psychoacoustical model we use is developed for short-time prediction of errors, the logarithmic behavior will hardly occur, even for long duration signals. The distortion-rate relation for ECUSQ, concerning a single sinusoid, can now be found by substituting (11) (13) in (2), as shown by (14) at the bottom of the next page. It is easy to verify that all three parameters give exactly the same contribution to this distortion. Furthermore, it is not difficult to show that if is an even-symmetric window, the distortion (14) is minimal for. We assume this to be the case in the remainder of this paper. C. Implementation Issues As mentioned earlier, point density functions do not contain any information about the actual location of the quantization reconstruction points. In order to make a practical implementation of the derived quantizers, we will make some assumptions. (9)

5 970 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 First, we use the quantized amplitude value instead of the original amplitude value to obtain the optimal phase and frequency quantizers, i.e., and. Furthermore, the amplitude, phase, and frequency quantization points are assumed to be in the middle of the corresponding quantization intervals. The first amplitude quantization point is chosen to be at 0, and in each amplitude quantization level the first phase and frequency quantization point is selected to be 0 radians. Finally, in order to practically compute the derived quantizers, the term, defined in (10), needs to be calculated. The problem of how to compute this term is dealt with in the second part of this paper. D. Simulation Example In this section, we compare the theoretical high-resolution rate-distortion approximation derived in (14) to a practically obtained rate-distortion curve, which is constructed by generating a large number of realizations of single sinusoids, quantizing these sinusoids with the derived quantizers for different target entropies, and measuring the resulting average perceptual distortion and entropy of quantization indices. In order to generate these sinusoids, the input distributions of amplitude, phase, and frequency have to be given a priori. Assume that amplitude and frequency are Rayleigh distributed. This distribution has probability density function. We choose and. The phase is assumed to be uniformly distributed on. These distributions are close to the ones we observed in experiments with real audio data. Furthermore, we assume that,, and are independent. Knowing the distributions, a large number of triplets is generated, and subsequently quantized with the quantizers derived in (11) (13) for a given target entropy, where we use a Hanning window with length. In order to quantize the triplets, we need to estimate,, and. Knowing, we obtain, where is the Euler constant. Furthermore, is determined by quantizing the masking threshold sample for every triplet in the db-domain using a step size of 8 db, yielding. Since the three parameters are independently distributed, we have. Estimating the probability of each masking threshold quantization index and multiplying this probability with the differential entropy of the set of amplitudes, phases, and frequencies that gave rise to that index, and summing the result over all indices, we obtain an estimation of the conditional differential entropies Fig. 2. Theoretical versus practical distortion-rate performance for ECUSQ concerning a single sinusoid. where the differential entropies conditioned on a specific masking threshold quantization index are estimated by first determining the underlying probability density functions, using a variant of the nearest-neighbor method [23]. Using (1), the quantization distortion for each triplet is then determined, and averaged over all triplets. Second, after quantizing the triplets, the joint entropy of quantization indices is estimated. This is done by computing per individual triplet, where the entropies,, and are estimated by determining their corresponding conditional probability mass functions, using the known input distributions and step sizes. The final estimation of the joint entropy of quantization indices is then obtained by averaging over all triplets. Repeating this procedure for several different target entropies, we obtain a practical rate-distortion curve as plotted in Fig. 2, where we used and a frame-length. 
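The experiment just outlined can be summarized by the following sketch. The step-size rules only mimic the qualitative behaviour of (11)-(13), a logarithmic amplitude quantizer and, given the quantized amplitude, uniform phase and frequency quantizers with finer steps for larger amplitudes; the actual step sizes follow from the closed-form point densities and depend on the target entropy, the window, and the masking threshold. All names, distribution parameters, and constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_triplet(a, phi, omega, log_step=0.05, base_step=0.01):
    """Illustrative ECUSQ-like quantizer structure (not the paper's step sizes)."""
    # Logarithmic amplitude quantization (first reconstruction point at 0).
    a_hat = 0.0 if a <= 0 else float(np.exp(np.round(np.log(a) / log_step) * log_step))
    # Phase and frequency steps depend on the *quantized* amplitude.
    step = base_step / max(a_hat, 1e-6)
    phi_hat = np.round(phi / step) * step
    omega_hat = np.round(omega / step) * step
    return a_hat, phi_hat, omega_hat

# Monte-Carlo run in the spirit of Section III-D: Rayleigh amplitudes and
# frequencies, uniform phases (all distribution parameters are placeholders).
amplitudes = rng.rayleigh(scale=1.0, size=10000)
frequencies = rng.rayleigh(scale=0.5, size=10000)
phases = rng.uniform(0.0, 2.0 * np.pi, size=10000)
quantized = [quantize_triplet(a, p, w) for a, p, w in zip(amplitudes, phases, frequencies)]
```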
In the same figure, the theoretical high-resolution approximation of the average distortion given by (14) is plotted. It can clearly be seen that the curves converge towards each other, which verifies that (14) is indeed a valid approximation, already at practical low bit rates. At an entropy of 30 bits, the difference between the curves is only 0.1 db, and for higher rates this difference decreases, where we note that the (14)

6 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 971 practical rate for representing a triplet is in the range of 12 to 20 bits. For very low rates, where the high-resolution assumptions do not hold, it is clear that the approximation (14) is not valid anymore. E. Distribution of the Entropy Between Amplitude, Phase, and Frequency In this section, we will determine the distribution of the entropy between amplitude, phase, frequency, and the masking threshold corresponding to the optimal ECUSQ quantizers (11) (13), using the entropy chain rule. Since we consider unrestricted quantization, we begin with computing, and use this to compute. Then we are able to determine by applying the chain rule. Using high-resolution assumptions we obtain Fig. 3. Entropies of quantization indices as a function of frame-length for H =20. as derived in the previous section and substituting the optimal amplitude quantization point density (11) into the entropies (18), (17), and (15) then gives (15) We apply the conditional entropy chain rule to determine this we need.for where we used that Subtracting (15) from (16) we obtain (16) (17) Finally, we can determine by subtracting (17) and (15) from (4) (18) In the aforementioned derivations, denotes conditional differential entropy of the corresponding variables. To illustrate these formulas, we assume again that amplitude, phase, and frequency are independently distributed, where and are Rayleigh distributed with and, and is uniformly distributed over. Since the three parameters are independently distributed, we have and. Using the values for,,,, and Note that, which is exactly the entropy constraint imposed on the optimal quantizer design. For a fixed target entropy, these entropies only depend on the frame-length and the window (both through ). We see that in this example phase will always be assigned 2.65 bits more than amplitude. Furthermore, if the frame-length is increased, more bits will be assigned to frequency, and hence less to amplitude and phase. This can be expected since for increasing frame-length, the frequency quantization error grows more rapidly than the amplitude and phase quantization errors. Consequently, more bits will have to be assigned to the frequency quantizer in order to keep the distortion minimal. In Fig. 3, the entropies of the quantization indices are plotted as a function of for, where we used a Hanning window. F. Special Case: the -Squared-Error Measure In order to see how well the proposed scheme performs in a rate-distortion sense, we would like to compare its performance with that of ECVQ, which is, according to theory, the optimal scheme in our framework. However, this would imply computing the masking threshold for each training vector in the

7 972 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 Next, we will compare the distortion-rate performances of ECUSQ with that of ECSSQ and ECVQ, each using the -squared-error measure. 1) Entropy-Constrained Strictly Spherical Quantization: In strictly spherical quantization, amplitude, phase, and frequency are quantized independently of each other. Hence, the quantization point density functions can only depend on their corresponding parameter:,, and. In the same way as in the unrestricted case, we then minimize the expected distortion (2) (with ), with respect to these densities, under the entropy constraint, resulting in the following expression for the ECSSQ distortion-rate relation, where we assume independence of,, and Fig. 4. ECUSQ cells. (a) Phase quantization for fixed amplitude and frequency quantization level. (b) Phase-frequency quantization for fixed amplitude quantization level. vector quantization algorithm. Since the number of training vectors we use to compare both schemes at high rates is on the order of 2, this is not feasible. Therefore, instead we consider a special case of the perceptual distortion measure, the -squared-error measure, for which we objectively compare the performance of the proposed scheme with that of entropy-constrained strictly spherical quantization (ECSSQ), in which the parameters are quantized independently, and ECVQ. Taking in the perceptual distortion (1) gives where we used Parseval Plancherel s formula. This leads us back to the -squared-error measure, as considered in [13], where perception is not taken into account. The optimal ECUSQ quantizers can then also be simplified, since we have,, and, i.e., in the case the optimal amplitude quantizer is uniform, and both the optimal phase and frequency quantizer are uniform in phase and frequency and depend linearly on amplitude. Since in a practical implementation of these quantizers, the quantized amplitude value is used instead of the original amplitude value, this means that within any amplitude quantization level, phase and frequency quantization are uniform. Fig. 4(a) shows a few ECUSQ cells for this simplified case, for fixed amplitude and frequency quantization level, and phase between 0 and. Note that the quantization step sizes in this figure are chosen such that we obtain a clear illustration of the shape of the ECUSQ cells; they do not match the actual step sizes corresponding to the optimal quantizers. In Fig. 4(b), the phase-frequency quantization plane is plotted for a fixed amplitude quantization level. This plane can be thought of as an unfolded half sphere. The highlighted part corresponds to the frequency quantization level chosen in the first figure. For the simplified distortion measure, the distortion-rate relation (14) becomes (19) (20) where. The ratio between the distortions of ECUSQ and ECSSQ can now be obtained through (19) and (20) Note that this ratio only depends on the amplitude probability density function. If we choose a Rayleigh distribution for the amplitude, it is easy to verify that this ratio is equal to for all possible variances. This indicates that ECUSQ outperforms ECSSQ significantly. To obtain the same distortion as ECUSQ, the rate of ECSSQ has to be increased by bits. 2) Entropy-Constrained Vector Quantization: The optimal entropy-constrained vector-quantizer is designed using a variant of the LBG algorithm [24], applied to triplets, containing amplitude, phase, and frequency parameters. 
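For reference, the entropy-constrained codebook design referred to here, a Lagrangian variant of the LBG algorithm after [24], alternates between assigning each training triplet to the codeword minimizing distortion plus a Lagrange multiplier times the codeword length, and re-estimating centroids and codeword probabilities. The sketch below uses a plain squared error in the parameter domain for brevity (the paper measures the distortion of the reconstructed sinusoid) and illustrative parameter names; it is not the authors' implementation.

```python
import numpy as np

def ecvq_design(training, codebook_size, lam, iterations=50, seed=0):
    """Sketch of entropy-constrained VQ design on (amplitude, phase, frequency) triplets."""
    training = np.asarray(training, dtype=float)      # shape (num_vectors, 3)
    rng = np.random.default_rng(seed)
    codebook = training[rng.choice(len(training), codebook_size, replace=False)].copy()
    probs = np.full(codebook_size, 1.0 / codebook_size)
    for _ in range(iterations):
        # Modified nearest-neighbour rule: squared error + lam * codeword length.
        dist = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        cost = dist - lam * np.log2(np.maximum(probs, 1e-12))
        assign = np.argmin(cost, axis=1)
        # Centroid and codeword-probability update.
        for i in range(codebook_size):
            members = training[assign == i]
            if len(members) > 0:
                codebook[i] = members.mean(axis=0)
            probs[i] = len(members) / len(training)
    return codebook, probs
```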
Therefore, this scheme can be seen as a 3-D vector quantizer. Although ECVQ is optimal in our framework, the design of the ECVQ codebook turns out to be very computationally expensive, due to searching through the entire codebook for each training vector and each iteration. Applying the ECVQ algorithm to a training set containing triplets, we obtain a rate distortion curve for the ECVQ scheme, as plotted in Fig. 5. In this experiment, a codebook size of triplets is chosen, and the frame length is set at. We use a rectangular window and the same input distributions as used in Section III-D, from which the triplets are generated. In the same figure, both the practical and the theoretical curve corresponding to the ECUSQ scheme are plotted, for the same input settings, where we note that the difference between the ECUSQ curves at low rates is due to the fact that the theoretical high-resolution approximation is not valid for low rates. It can clearly be seen that for high rates, the two methods perform comparably, with ECVQ having a slight advantage, as theory predicts. However, generating the ECVQ curve took about one week of computation time on a 3-GHz CPU, 4-GB memory, using Matlab. This is due to the fact that the proposed scheme is only valid at high rates, and in order to compare both schemes at high rates, the codebook for ECVQ must be very large, which

8 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 973 the input signal within segment, which we denote by, is then approximated by Fig. 5. Rate-distortion performance of ECVQ and ECUSQ. makes the algorithm extremely computationally expensive. In contrast, the ECUSQ curves can be generated instantly, since we have derived closed-form analytical expressions for the optimal ECUSQ quantizers. IV. ECUSQ OF MULTIPLE SINUSOIDS IN MULTIPLE SEGMENTS In this section, we will derive optimal ECUSQ high-resolution quantizers for amplitude, phase, and frequency for the situations experienced in practice, where multiple sinusoids are distributed across multiple segments. Furthermore, objective and subjective comparisons will be made with existing state-of-the-art quantization schemes. A. High-Resolution Approximations of the Expected Distortion and Entropy In this section, we derive high-resolution approximations for the total expected perceptual distortion and the total entropy. In sinusoidal coding, the input signal is divided into a number of consecutive segments of variable length and each segment is then modeled as a sum of sinusoids. Let denote the number of segments, the number of sinusoidal components in segment, and the length of the analysis window used in segment. Let denote the th component in segment. The part of where, and are the amplitude, phase, and frequency of the th component in segment, respectively. The quantization distortion in a segment consists of the quantization distortion of the individual components plus a contribution due to the mutual interaction of the components. As shown in [25], this mutual interaction can be neglected if the sinusoids are spaced sufficiently far apart in the frequency domain. For practical purposes, this is the case if the sinusoids are estimated using the psychoacoustical matching pursuit algorithm [26]. Define,, and for. The masking threshold corresponding to segment is denoted by. Note that only the samples need to be encoded. Again, quantizing these values uniformly in the db-domain using a stepsize of 8 db was experimentally found to be inaudible. Furthermore, the rate needed to encode the samples can be measured and is approximately 2 to 2.5 bits per sample, depending on the input signal. The quantization error signal is given by. In Appendix B, it is shown that under high-resolution assumptions, the expected perceptual distortion can be approximated by (21), shown at the bottom of the page, where is the joint probability density function of all amplitudes, phases, and frequencies in segment, where,, and are the corresponding distributions. Furthermore,. The total expected perceptual distortion approximated by can then be (22) where we assume that components in different segments are statistically independent. (21)
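The claim above, that the mutual interaction between sinusoids can be neglected when they are well separated in frequency, is easy to check numerically: the spectral energy of a sum of two narrowband error components is then essentially the sum of their individual energies. The check below uses an unweighted spectrum and arbitrary test frequencies, purely as an illustration.

```python
import numpy as np

frame_length = 1024
n = np.arange(frame_length)
window = np.hanning(frame_length)

def windowed_energy(x):
    """Spectral energy of a windowed signal (unweighted, for illustration)."""
    return np.sum(np.abs(np.fft.fft(window * x)) ** 2)

# Two small 'quantization error' sinusoids, well separated in frequency.
e1 = 0.01 * np.cos(0.30 * n + 0.2)
e2 = 0.01 * np.cos(1.70 * n + 1.1)

separate = windowed_energy(e1) + windowed_energy(e2)
combined = windowed_energy(e1 + e2)
print(abs(combined - separate) / separate)   # tiny -> interaction negligible
```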

9 974 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 In the remainder of this section, we derive a high-resolution approximation for the total entropy of all quantization indices, over all segments. Let,, and denote the alphabets of amplitude, phase, and frequency quantization indices, respectively, corresponding to the components in segment. Additionally, the masking threshold samples have to be quantized for all components. Let denote the alphabet of masking threshold quantization indices in segment. The joint entropy of all quantization indices in segment is equal to We assume that entropies are additive over segments (25) Consequently, by summing (23) over all segments, and using (24), we obtain a high-resolution approximation for the total entropy of quantization indices. B. Optimal Quantizers In this section, we will derive the quantization point densities that minimize the total expected perceptual distortion, while satisfying an entropy constraint, i.e., subject to (26) (23) where is the rate needed to encode the masking threshold in segment. Using high-resolution assumptions, we approximate where is a prespecified target entropy. Using the derived high-resolution approximations for and, this constrained minimization problem can be solved as in the single sinusoid case, by applying the the method of Lagrange multipliers. The Lagrangian cost function we wish to minimize is given by, where the Lagrangian multiplier is chosen such that the entropy constraint is satisfied. Minimizing the cost function by evaluating the Euler Lagrange equations with respect to,, and yields (27) (28) where has distribution and. Furthermore (24) (29) Substituting (27) (29) in the entropy constraint, using (24) and (25), we find the optimal value of the Lagrange multiplier, as shown by (30) at the bottom of the page, where is the joint differential entropy of amplitude, phase, and frequency, corresponding to segment, conditioned on. Note that the integration over has been omitted in the three integral expressions in (24), since is fully determined by,, and. (30)

10 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 975 are introduced for notational simplicity. Substituting (30) back into (27) (29), we obtain analytical expressions for the optimal high-resolution ECUSQ quantizers that solve (26), as shown by (31) (33) at the bottom of the page, where to the simulation example for the single sinusoid case, using observed data from the input signal. The term can be practically computed in the following way. Denote, then we have incorporates the perceptual aspect of the quantizers. The distortion-rate relation for ECUSQ can now be found by substituting (31) (33) into (22), using (21), as shown by (34) at the bottom of the page. It is easy to verify that all components and all parameters give exactly the same contribution to this distortion. This means that, on average, all components give rise to the same distortion, adding up to (34). Furthermore, focusing on a single component, all three sinusoidal parameters contribute equally to the distortion corresponding to this component. C. Implementation Issues In order to practically implement the derived quantizers, we use the same assumptions as stated in Section III-C for the single sinusoid case. Moreover, we assume that amplitudes, phases, frequencies, and masking threshold samples are independently and identically distributed over all segments and all components, with independent distributions,,, and, respectively. This assumption is made for the following reason. We typically use a maximum of 75 components per segment, which is too little data to be able to accurately estimate differential entropies and other source dependent terms per segment, which are needed for quantization. Assuming independently and identically distributed variables allows us to take all components from the entire input signal together, resulting in sufficient data to make the mentioned estimations. We realize that this is not necessarily a valid assumption for real audio, as distributions can change over time. Under this assumption, we have, and secondly. The values of,,,, and are estimated similar (35) Since the masking threshold is inside the expectation operator, the expression in (35) cannot be computed analytically. However, since it is reasonable to assume that the random variables are statistically independent for all and, the Central Limit Theorem gives where denotes the variance, assuming that the individual variances of satisfy the Lindenberg conditions [27]. This implies that the expectation in (35) can be removed (36) for sufficiently large. The expression in (36) can be analytically computed. (31) (32) (33) (34)
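The concentration argument used to remove the expectation in (35) can be illustrated numerically: as the number of independent terms grows, a smooth function of their sum is increasingly well approximated by the same function evaluated at the sum of the means. The exponential distribution and the logarithm below are placeholders chosen only to make this point.

```python
import numpy as np

rng = np.random.default_rng(0)

def gap(num_terms, trials=2000):
    """Compare E[log(sum of X_i)] with log(sum of E[X_i]) for i.i.d. X_i."""
    x = rng.exponential(scale=1.0, size=(trials, num_terms))
    lhs = np.mean(np.log(x.sum(axis=1)))   # Monte-Carlo estimate of the expectation
    rhs = np.log(float(num_terms))         # same function of the expected sum
    return abs(lhs - rhs)

for m in (5, 50, 500):
    print(m, gap(m))   # the gap shrinks as the number of terms grows
```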

11 976 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 rate audio coder in [28] spends approximately 20 bits per parameter triplet. For very low bitrates, where the high-resolution assumptions are not valid, the approximation does not hold and the curves diverge. Fig. 6. Theoretical versus practical distortion-rate performance for ECUSQ, using a pop music fragment. E. Objective Comparison With ECUPQ+ In this section, the proposed ECUSQ scheme is objectively compared in terms of rate-distortion performance to the ECUPQ+ scheme, which is a combination of the ECUPQ scheme and an optimal entropy-constrained frequency quantizer. Entropy-constrained unrestricted polar quantization was introduced in [12], and as the term polar already implies, in this scheme only amplitude and phase quantization is considered. Since ECUPQ is unrestricted, amplitude and phase are quantized dependently. If frequency quantization is left out, it can be shown that the expected perceptual distortion (49), corresponding to the th component in segment evaluates to Finally, next to all quantization indices, the following side information has to be sent to the decoder in order to reconstruct the sinusoidal parameters: the segmentation of the input signal:, and ; the distribution of sinusoids: ; source-dependent variables:, and. This information has to be sent for each segment, and is sufficient to encode and decode the sinusoidal parameters. D. Simulation Example In this section, we compare the theoretical high-resolution distortion-rate relation derived in (34) to a practically obtained distortion-rate curve, which is generated by encoding a pop music fragment using the SiCAS coder [28]. In this sinusoidal coder, first the jointly optimal segmentation and distribution of sinusoids is determined, using segment lengths of 512, 768, 1024, or 1280 samples, a maximum number of 75 sinusoids per segment, and approximately 2000 sinusoids per second on average. The sample frequency used was 48 khz. In this way, we obtain a framework of segments and corresponding sinusoids. This framework is assumed a fixed input for the quantization scheme, independent of the used target entropy. Subsequently, the sinusoids are quantized one by one, using the derived optimal high-resolution ECUSQ quantizers, for a given target entropy. Using the perceptual distortion measure, we compute the distortion for each component in each segment. The total perceptual distortion is then obtained by summing over all segments and components. Repeating this procedure for several different target entropies, we obtain the practical rate-distortion curve plotted in Fig. 6, where we express the rate in terms of the target entropy per component.in the same figure, the theoretical high-resolution distortion-rate relation (34) is plotted. Since the curves clearly converge towards each other, the high-resolution approximation (34) for the total expected perceptual distortion is valid, even at bitrates as low as 14 bits per parameter triplet, a bit rate which is of great interest in practical applications. In comparison, the low where the frequencies are given and fixed. Furthermore, the entropy constraint reduces to, where, and is a prespecified target entropy. 
The entropy of amplitude, phase and masking threshold quantization indices in segment can under high-resolution assumptions be approximated by It is then straightforward to derive, using variational techniques, the amplitude and phase point density functions that minimize the total expected perceptual distortion (22) while satisfying the entropy constraint. These are given by (37) (39), shown at the bottom of the next page. To be able to compare the performances of the ECUPQ scheme and the proposed ECUSQ scheme, we need to design a frequency quantizer, for use with the ECUPQ amplitude and phase quantizers obtained above. This can be naturally done by considering the problem of independent quantization of frequency, i.e., amplitude and phase quantization are not taken into account, and deriving the corresponding optimal entropy-constrained frequency quantizer. Similar to previous derivations, we obtain the frequency point density function that

12 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 977 minimizes the total expected perceptual distortion (22), while satisfying the entropy constraint (40) where is the prespecified target entropy for frequency parameters. Since the rate needed to encode the masking threshold is already included in the entropy constraint for the amplitude and phase quantizers, it can be omitted in the frequency quantization. The quantizers (38) (40) form a quantization scheme, denoted by ECUPQ+. In ECUSQ the optimal quantizers are derived by jointly optimizing for all three parameters simultaneously at target entropy, while in ECUPQ+ the quantizers are found by first jointly optimizing for amplitude and phase at target entropy, and then separately optimizing for frequency at target entropy. Thus, in theory ECUSQ outperforms ECUPQ+. In order to compare the two schemes at the same total target entropy, we must have. An important advantage ECUSQ offers over the ECUPQ+ method is that in the proposed scheme, the bit distribution between amplitude, phase, and frequency follows directly from the derived formulas, as seen for the single sinusoid case in Section III-E. In contrast, ECUPQ+ does not specify the part of the total bit budget that is assigned to amplitude and phase parameters, or the part of the total bit budget that is assigned to frequency parameters. Hence, these values have to be chosen a priori, which makes it more difficult to find the optimal distribution of bits in this scheme, especially because this optimal distribution will turn out to be dependent on the total target rate. Encoding a pop music fragment with the SiCAS coder, using the same settings as in Section IV-D, we can objectively compare the performances of both quantization schemes by computing the total perceptual distortion for each scheme, for several different target entropies. The results are shown in Fig. 7, where the distortion in the ECUPQ+ scheme is plotted for several values of, the percentage of the bit budget that is assigned to frequency. These results indicate that in practice for every target rate, one can find a bit balance in the ECUPQ+ scheme, such that it performs very close to the ECUSQ scheme. However, choosing a nonoptimal bit balance can lead to a considerable performance difference Fig. 7. Rate-distortion performance of ECUSQ and ECUPQ+, p percentage of the bit budget that is assigned to frequency. denotes the as compared to ECUSQ. Note that the optimal bit balance in the ECUPQ+ scheme is dependent on the target entropy, i.e., no fixed distribution is optimal. Consequently, the optimal bit balance in ECUPQ+ has to be redetermined for every target rate, whereas in ECUSQ this balance follows naturally, which is due to jointly optimizing for all three parameters. Note that similar plots are obtained when encoding other types of audio. F. Subjective Comparisons: Listening Test Results A listening test was performed to compare the proposed ECUSQ scheme to a state-ofthe-art sinusoidal quantization scheme that is used in the SiCAS coder [28]. In contrast to ECUSQ, the quantizers in the SiCAS coder are fixed for all input signals. Amplitude and frequency quantization are logarithmic, where the relative step sizes are, and, respectively. Phase is uniformly quantized with a step size of. After entropy coding, the measured bit rate in the SiCAS quantization scheme is approximately 20 bits per component, where components are quantized/encoded without using differential techniques. 
Note that the described amplitude and frequency quantizers with these step sizes are also used in the standardized MPEG-4 SSC coder [6] to quantize births of sinusoidal tracks. We used five different excerpts in our listening test: jazz music, harpsichord, German male speech, classical music, and Eric Clapton (pop music), all sampled at 48 khz, and each with (37) (38) (39)

13 978 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 TABLE I LISTENING TEST RESULTS: RELATIVE PREFERENCES [%] FOR ECUSQ AND CORRESPONDING STATISTICAL SIGNIFICANCE a duration of approximately 10 s. First, using the SiCAS coder, the target signal is divided into variable-length segments, and sinusoids are distributed across the segments [28], using segment lengths of 512, 768, 1024, or 1280 samples, a maximum number of 75 sinusoids per segment, and 1000 sinusoids per second on average. This framework will be the fixed input for the quantizers. Second, the excerpts are quantized with both quantization schemes, using a specified target entropy in the ECUSQ scheme. Finally, for each excerpt, the error signal due to modeling by sinusoids is then added to the quantized signal, such that any artifacts are due to quantization of the sinusoidal parameters, as desired. We then obtain the final excerpts that are used in this listening test. For each excerpt, we presented the listeners three versions at a time: an excerpt quantized with the ECUSQ scheme for a specified target entropy, the same excerpt quantized with the SiCAS quantization scheme, and the original version. The participants were instructed to indicate whether the ECUSQ version had a worse or better/equal audio quality as compared to the SiCAS version, i.e., a binary choice. The excerpt quantized with the SiCAS quantization scheme is very close to transparent quality, which is mainly since the error signal due to modeling is added to the quantized signal in this test. For this reason, the distinction between equal and better is very small here. The described procedure was carried out for eight different ECUSQ target entropies per component, ranging from 12 to 19 bits, using increments of 1 bit, and for all five excerpts. Furthermore, every participant performed the entire test twice. A total of 11 listeners participated in the test and the authors were not included. The results are presented in Table I, where for each excerpt and each target entropy per component, we give the percentage of listeners that indicated the ECUSQ encoded version as being better/equal. Note that since the test was done twice by every listener, these percentages are based on 22 test results. In the last row, the percentages are averaged over the five excerpts. Furthermore, the statistical significance of the percentages is stated in brackets, where indicates that the ECUSQ scheme performs statistically significantly worse, and indicates that the ECUSQ scheme performs statistically significantly better or equal. Furthermore, (0) indicates that the corresponding percentage is not statistically significant. These results were obtained by applying the Wilcoxon matched-pairs signed-rank test of equality of medians [29], using a significance level of Clearly, for the lower target entropies at 12 and 13 bits the SiCAS quantization scheme performs better for all excerpts. At 14 bits, the SiCAS scheme performs better for the German male speech and the harpsichord excerpt, while the remaining percentages are statistically insignificant. At 15 bits, the ECUSQ scheme performs better or equal for the jazz and classical music excerpts. Since the SiCAS quantization scheme uses 20 bits per component, we gain the considerable amount of 5 bits per component for these two excerpts by using the ECUSQ scheme, maintaining the same subjective quality level as in the SiCAS scheme. 
For the German male speech excerpt this gain is 4 bits, and for the more critical harpsichord and Eric Clapton excerpt we gain 3 bits per component. Note that since the excerpts quantized with the SiCAS scheme are very close to transparent quality, so are the ECUSQ quantized excerpts at the mentioned bit rates. In conclusion, by applying the ECUSQ scheme as derived in this paper, we can gain up to 5 bits per component, which corresponds to a bit rate reduction of 25% for the settings in this experiment, and still achieve close to transparent subjective quality. V. CONCLUSION This paper presented a scheme for entropy-constrained quantization of sinusoidal parameters. In this scheme, which is called ECUSQ, all sinusoidal parameters amplitude, phase, and frequency are quantized dependently. Using high-resolution assumptions we derived analytical expressions for the optimal ECUSQ amplitude, phase, and frequency quantizers, which minimize the expected perceptual distortion while the corresponding quantization indices satisfy an entropy constraint. The perceptual distortion measure used in this work is based on psychoacoustical properties of the auditory system. The ECUSQ quantizers were derived both for the case of a single sinusoid, and for the more practically relevant case of multiple sinusoids distributed across multiple segments. As desired, the quantizers prove to be flexible and of low complexity, in the sense that they can be determined easily for varying bit rate requirements, without any sort of iterative retraining procedures. To measure the performance of the proposed scheme, it was compared both objectively and subjectively to several existing entropy-constrained quantization schemes. For the squared error distortion measure, we demonstrated that ECUSQ performs very close to the theoretically optimal entropy-constrained vector quantization, in terms of objective rate-distortion performance. Furthermore, for the perceptual distortion measure, it was shown that the ECUSQ scheme objectively outperforms an existing sinusoidal quantization scheme, where frequency quantization is done independent of amplitude/phase quantization. Finally, a subjective listening test was conducted, in which the proposed scheme is compared to an existing state-of-the-art sinusoidal quantization scheme with fixed quantizers for all input signals. An average bit rate reduction of 20% was achieved by the proposed scheme, at the same subjective quality level as the existing scheme.
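The flexibility claimed in the conclusion rests on a standard high-resolution property: for an entropy-constrained scalar quantizer, the index entropy behaves as the differential entropy of the source minus the base-2 logarithm of the step size, so re-targeting the coder only rescales step sizes and requires no retraining. A generic sketch of this rule of thumb (not the paper's exact expressions, which additionally involve the window, the frame length, and the masking threshold):

```python
def step_for_target_entropy(diff_entropy_bits, target_bits):
    """High-resolution rule of thumb: H ~ h(X) - log2(step), so step = 2**(h - H)."""
    return 2.0 ** (diff_entropy_bits - target_bits)

# Re-targeting is a closed-form recomputation; each extra bit halves the step.
for target in (4.0, 5.0, 6.0):
    print(target, step_for_target_entropy(2.0, target))
```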

14 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 979 APPENDIX A High-resolution Approximation of the Expected Distortion Single Sinusoid: In this section, we derive a high-resolution expression of the expected perceptual distortion. First, we evaluate the perceptual distortion (1) for the single sinusoid case. Let. Note that since a sinusoid can be written as a sum of two complex exponentials with opposite frequencies, the perceptual distortion measure (1) equals where is the joint probability density function of amplitude, phase, and frequency, corresponding to distributions, and, respectively. Let denote the USQ cell corresponding to amplitude, phase, and frequency quantization indices, and, respectively, and let, and denote their corresponding alphabets. For notational simplicity, we omit the mutual dependencies between the quantization indices. Summing over all USQ cells yields (41) For large, the power spectrum of the windowed error signal will converge to a sum of delta-functions at frequencies and and their opposites. Then, the sidelobes of can be neglected; furthermore, the widths of the main lobes are sufficiently small to assume the masking threshold to be constant across a main lobe. Since we can also assume that, due to high-resolution assumptions and the smooth masking curve, we can approximate (41) by (42) for sufficiently large. The contribution of the complex exponentials at negative frequencies to the integral in (42) can be neglected. Defining, this means for sufficiently large, where (43) (46) where. Here we used the high-resolution assumption that is approximately constant over each USQ cell. Since we then have a uniform distribution in each cell, the optimal quantization reconstruction points are centered in the quantization intervals, i.e., cell is defined by amplitude, phase, and frequency quantization intervals given by,, and, respectively, where denotes the length of the respective quantization interval. Using these boundaries, the integral in (46) can be evaluated, by substituting (43) and (44). For each cell, this is carried out in the same way, so we will focus on a single cell and leave out the quantization indices for notational simplicity, as shown by (47) at the bottom of the next page, where.in, we used the high-resolution assumption that the masking threshold can be considered flat within each USQ cell. Second, in, we substituted Taylor expansions of and, around and, respectively, neglecting terms of order higher than three. Substituting (47) back into (46), we obtain (44) and. The expected perceptual distortion is given by (45)

15 980 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 Here we again used high-resolution assumptions, and hence replaced sums by integrals and quantization step sizes by so-called quantization point densities. APPENDIX B High-Resolution Approximation of the Expected Distortion Multiple Sinusoids and Segments: In this section, we derive a high-resolution expression of the expected perceptual distortion corresponding to the th component in segment. Let. Using the approximation (43), the perceptual distortion (1) corresponding to the th component in segment evaluates to (48) for sufficiently large, where. The expected perceptual distortion corresponding to the th component in segment is given by. For notational simplicity, we number these cells by. Substituting (48) in (49), and using high-resolution assumptions, we then obtain (50) where,, and are quantized to cell. Applying (44), and using Taylor series, the integral in (50) can be approximated by (49) Here is the joint probability density function of all amplitudes, phases, and frequencies in segment, where,, and are the corresponding distributions. The integral in (49) can be evaluated by summing over all possible -dimensional quantization cells for (51) where and where the step sizes correspond to quantization cell. Furthermore, we used the fact that the optimal quantization reconstruction points are centered in the corresponding quantization inter- (47)

Substituting (51) into (50), and using the high-resolution assumptions to replace sums by integrals and step sizes by point density functions, we obtain the desired high-resolution expression for the expected distortion.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their useful comments.
Pim Korten received the M.Sc. degree in applied mathematics from Delft University of Technology, Delft, The Netherlands, in 2003, and is currently working towards the Ph.D. degree in electrical engineering in the Information and Communication Theory Group, Department of Mediamatics, Faculty of Electrical Engineering, Mathematics, and Computer Science (EEMCS), Delft University of Technology. His research interests include perceptual audio coding, sinusoidal modeling and quantization, high-resolution quantization theory, and rate-distortion optimization.

Jesper Jensen received the M.Sc. and Ph.D. degrees in electrical engineering from Aalborg University, Aalborg, Denmark, in 1996 and 2000, respectively. From 1996 to 2001, he was with the Center for PersonKommunikation (CPK), Aalborg University, as a Researcher, Ph.D. student, and Assistant Research Professor. In 1999, he was a Visiting Researcher at the Center for Spoken Language Research, University of Colorado, Boulder. He is currently an Assistant Professor at the Delft University of Technology, Delft, The Netherlands. His main research interests are digital speech and audio signal processing, including coding, synthesis, and enhancement.

Richard Heusdens received the M.Sc. and Ph.D. degrees from the Delft University of Technology, Delft, The Netherlands, in 1992 and 1997, respectively. Since 2002, he has been an Associate Professor in the Department of Mediamatics, Delft University of Technology. In the spring of 1992, he joined the Digital Signal Processing Group, Philips Research Laboratories, Eindhoven, The Netherlands. He has worked on various topics in the field of signal processing, such as image/video compression and VLSI architectures for image processing algorithms. In 1997, he joined the Circuits and Systems Group, Delft University of Technology, where he was a Postdoctoral Researcher.
In 2000, he moved to the Information and Communication Theory (ICT) Group, where he became an Assistant Professor responsible for the audio and speech processing activities within the ICT group. He is involved in research projects that cover subjects such as audio and speech coding, speech enhancement, and digital watermarking of audio.
