
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007

High-Resolution Spherical Quantization of Sinusoidal Parameters

Pim Korten, Jesper Jensen, and Richard Heusdens

Abstract—Sinusoidal coding is an often employed technique in low bit-rate audio coding. Therefore, methods for efficient quantization of sinusoidal parameters are of great importance. In this paper, we use high-resolution assumptions to derive analytical expressions for the optimal entropy-constrained unrestricted spherical quantizers for the amplitude, phase, and frequency parameters of the sinusoidal model. This is done both for the case of a single sinusoid, and for the more practically relevant case of multiple sinusoids distributed across multiple segments. To account for psychoacoustical effects of the auditory system, a perceptual distortion measure is used. The optimal quantizers minimize a high-resolution approximation of the expected perceptual distortion, while the corresponding quantization indices satisfy an entropy constraint. The quantizers turn out to be flexible and of low complexity, in the sense that they can be determined easily for varying bit rate requirements, without any sort of retraining or iterative procedures. In an objective comparison, it is shown that for the squared error distortion measure, the rate-distortion performance of the proposed method is very close to that of the theoretically optimal entropy-constrained vector quantization. Furthermore, for the perceptual distortion measure, the proposed scheme is shown to objectively outperform an existing sinusoidal quantization scheme, where frequency quantization is done independently. Finally, a subjective listening test, in which the proposed scheme is compared to an existing state-of-the-art sinusoidal quantization scheme with fixed quantizers for all input signals, indicates that the proposed scheme leads to an average bit rate reduction of 20%, at the same subjective quality level as the existing scheme.

Index Terms—High-resolution quantization, point density functions, sinusoidal coding, unrestricted spherical quantization.

Manuscript received June 3, 2005; revised June 13, The work was supported by STW, applied science division of NWO, and the technology program of the Dutch ministry of Economic Affairs. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Gerald Schuller. The authors are with the Department of Mediamatics, Delft University of Technology, 2628 CD Delft, The Netherlands (p.e.l.korten@tudelft.nl; j.jensen@tudelft.nl; r.heusdens@tudelft.nl).

I. INTRODUCTION

PARAMETRIC coding has proven to be very effective for representing audio signals at low bit rates [1]-[4]. Typically, a parametric coder is subdivided into several separate subcoders, each operating on different components of the input signal; these generally include a sinusoidal component and a noise component, and sometimes also include a transient component. For many audio signals, the sinusoidal component, represented by amplitude, phase, and frequency parameters, is perceptually the most important of the three [3]. Consequently, the main part of the bit budget is typically assigned to this component. Often, the bit budget available for encoding the sinusoidal component is allocated dynamically based on the bit needs of the other component subcoders. For this reason, it is desirable to have simple and flexible quantizers which can adapt easily to changing bit-rate requirements without any sort of iterative quantizer (re)design procedures. Developing an efficient quantization scheme for the sinusoidal component and its corresponding parameters is therefore critical.

Fig. 1. Sinusoidal coding.

Fig. 1 shows a block diagram of a typical sinusoidal subcoder, which models the input signal as a sum of sinusoids, each described by its amplitude, phase, and frequency. Note that the input signal for the sinusoidal coder may consist of the original signal or the output of the transient coder. The parameters are then quantized and the corresponding quantization indices are entropy encoded.

In this paper, we focus on quantizing the sinusoidal amplitude, phase, and frequency parameters efficiently. More specifically, we aim at minimizing the quantization distortion (as measured by an appropriate distortion measure), subject to an entropy constraint. The quantizers in this paper are derived under high-resolution assumptions, i.e., the input space is assumed to be covered by a very large number of quantization cells. Consequently, the probability density functions of the input variables can be assumed constant in each quantization cell [5]. Using these assumptions, considerable simplifications can be made in distortion and entropy formulas, resulting in analytically simple expressions for the optimal quantizers, which turn out to be valid already at practical low bit rates.
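For concreteness, the synthesis side of such a subcoder can be sketched as follows: each segment is rebuilt as a sum of sinusoids from its (amplitude, phase, frequency) triplets, and it is these triplets, not the waveform, that are quantized and entropy encoded. The cosine form, the function name, and the numeric values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def synthesize_segment(triplets, frame_length):
    """Illustrative sketch: reconstruct one segment as a sum of sinusoids.

    triplets     -- iterable of (amplitude, phase, frequency) tuples,
                    frequency in radians per sample
    frame_length -- number of samples in the segment
    """
    n = np.arange(frame_length)
    segment = np.zeros(frame_length)
    for amplitude, phase, frequency in triplets:
        segment += amplitude * np.cos(frequency * n + phase)
    return segment

# Example: two sinusoids in a 1024-sample segment.
example = synthesize_segment([(0.8, 0.3, 0.12), (0.2, 1.1, 0.45)], 1024)
```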

2 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 967 The proposed quantization scheme in this paper consists of optimal high-resolution entropy-constrained quantizers for amplitude, phase, and frequency, which are derived using a perceptual spectral distortion measure. Note that parts of this paper have been presented in [13] and [15]. The proposed quantizers minimize the expected perceptual distortion, while satisfying an entropy constraint. Note that in this paper, the rate needed to encode the masking threshold is included in the entropy constraint. The proposed scheme is denoted by entropy-constrained unrestricted spherical quantization (ECUSQ), where the term unrestricted refers to the fact that amplitude, phase, and frequency parameters are dependently quantized. Optimal ECUSQ quantizers are derived for both for the case of a single sinusoid, and for the more practically relevant case of multiple sinusoids distributed across multiple segments. In the single sinusoid case, the rate-distortion performance of the proposed ECUSQ scheme is compared to that of the theoretically optimal entropy-constrained vector quantization (ECVQ) and to that of entropy-constrained strictly spherical quantization (ECSSQ) in which the parameters are quantized independently. For these comparisons, we use the squared error measure. By doing so, implementing the ECVQ algorithm is feasible, in terms of computational complexity, which is not the case if the perceptual distortion measure is used. For the multiple sinusoids-multiple segments case, the performance of the proposed scheme is both objectively and subjectively compared to that of existing entropyconstrained quantization schemes. First, the proposed scheme is objectively compared to the ECUPQ+ scheme, which is a combination of the ECUPQ scheme (entropy-constrained unrestricted polar quantization only amplitude and phase quantization) and an optimal independent entropy-constrained frequency quantizer. The main advantage of the proposed method over ECUPQ+ is that the bit distribution between amplitude, phase, and frequency does not need to be determined beforehand, but follows as a result of the derived formulas, whereas in ECUPQ+ this needs to be chosen a priori. Second, a listening test was done in which the subjective performance of the proposed scheme is compared to that of an existing state-of-the-art sinusoidal quantization scheme using log-quantizers, in which the quantizers are fixed for all input signals. Note that these quantizers are also used in the standardized MPEG-4 SSC coder [6] to quantize births of sinusoidal tracks. In a practical sinusoidal coding scheme, the sinusoidal parameters are usually (time/frequency) differentially encoded. In this paper, we derive quantizers which quantize the sinusoidal parameters directly instead of differentially. However, in [7], it is shown that the proposed quantization scheme can be easily extended to include differential encoding as well. The remainder of this paper is organized as follows. In Section II, we discuss previous work concerning sinusoidal quantization and the perceptual distortion measure. In Section III, we discuss the single sinusoid case. The optimal ECUSQ quantizers and the optimal bit distribution are determined. Furthermore, the proposed scheme is compared to ECVQ and ECSSQ, using the squared error distortion measure. Section IV discusses the case of multiple sinusoids distributed across multiple segments. 
After developing the optimal ECUSQ quantizers, the proposed scheme is compared to ECUPQ+ (objectively) and the sinusoidal log-quantization scheme (subjectively). In Section V, we give some conclusions of our work. Finally, some proofs are included in the Appendix. II. PREVIOUS WORK The ECUSQ quantizers generalize and advance previous work in sinusoidal quantization and coding. Additionally, the ECUSQ derivations rely on an established perceptual distortion measure. In this section, we will discuss these two points. A. Sinusoidal Quantization and Coding In [9] [11], unrestricted polar quantization (UPQ) has been introduced, in which only amplitude and phase parameters are quantized. In this scheme, phase quantization depends on the input amplitude. The derivations in [9] [11] are done subject to a resolution constraint, i.e., a fixed number of quantization cells and a fixed rate. However, in some applications, an entropy constraint rather than a resolution constraint is of interest. In [12], entropy-constrained unrestricted polar quantization (ECUPQ) is introduced, and using high-resolution assumptions, analytical expressions for the optimal scalar ECUPQ amplitude and phase quantizers are derived. These quantizers minimize the expected distortion, while satisfying an entropy constraint. A shortcoming of this work, however, is that it does not consider quantization of frequency parameters. In [13] and [14], ECUPQ is generalized to include frequency quantization. In the first citation, this extended scheme is denoted by entropy-constrained unrestricted spherical quantization (ECUSQ). Analogously with ECUPQ, amplitude, phase, and frequency are quantized dependently in this scheme. In both citations, optimal scalar high-resolution amplitude, phase, and frequency quantizers are derived, so as to minimize a prespecified distortion, while satisfying an entropy constraint. Unlike the ECUPQ quantizers derived in [12], the quantizers in [13] and [14] are dependent on the frame-length and shape of the analysis/synthesis window (as one would expect). Such a framelength dependent quantization is important in coding schemes, where variable segment length analysis is used, see, e.g., [16] and [17]. In [13], a mean-squared-error distortion measure is used, whereas in [14] a perceptually weighted mean-squarederror distortion measure is used to account for psychoacoustical effects of the auditory system. Additionally, in [15] the work in [13] is extended to a perceptual spectral distortion measure. Hence, both methods in [14] and [15] account for perceptual effects; however, the method presented in [14] has a few restrictions in comparison with [15]. First, only phase quantization is made dependent of the individual perceptual weights, resulting in amplitude and frequency quantizers that do not account for auditory perception, whereas in [15], all three quantizers take perception into account. Second, the weights in [14] are considered fixed when computing the expectation of the total quantization distortion over all possible input signals, while varying the input signal should result in varying perceptual weights, as is the case in [15]. Finally, in [14], only one segment is used, defined by a rectangular window, whereas in [15], the more practically relevant situation of multiple segments defined by nonrectangular windows is considered. However, in [15], the rate needed to encode the masking threshold is not taken into account.
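To make the restricted/unrestricted distinction used in this section concrete, the following sketch contrasts the two polar strategies: strictly polar quantization uses fixed, independent grids for magnitude and phase, whereas in unrestricted polar quantization the phase step depends on the quantized magnitude (finer phase for larger magnitude). The step-size rules and names are illustrative placeholders, not the quantizers of [9]-[12].

```python
import numpy as np

def strictly_polar(a, phi, a_step=0.1, phi_step=0.1):
    """Magnitude and phase quantized independently of each other."""
    return np.round(a / a_step) * a_step, np.round(phi / phi_step) * phi_step

def unrestricted_polar(a, phi, a_step=0.1, c=0.01):
    """Phase step depends on the quantized magnitude: the larger the
    magnitude, the finer the phase quantization."""
    a_hat = np.round(a / a_step) * a_step
    phi_step = c / max(a_hat, 1e-6)
    return a_hat, np.round(phi / phi_step) * phi_step
```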

3 968 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 There exist two standardized sinusoidal coders, namely MPEG-4 HILN [18] and MPEG-4 SSC [6]. The HILN coder typically operates at bit rates lower than 16 kb/s, while the SSC coder operates at higher bit rates around 24 kb/s. These coders incorporate multiple signal model components as described in the introduction. Since in the proposed scheme we only focus on the quantization of the sinusoidal parameters, and not on the complete coder, we will not use these coders for benchmarking. B. Perceptual Distortion Measure The perceptual distortion measure used throughout this paper is introduced in [19] and is defined by where denotes the Fourier transform operation, and denotes the difference between the original signal and the quantized signal. Furthermore, is the analysis window used and is a weighting function representing the sensitivity of the human auditory system at a particular (normalized) frequency. Note that the perceptual distortion measure introduced in [19] has a rather different notational form than (1). However, in [17], it is proven that the two measures are equal if is selected to be the inverse of the masking threshold corresponding to the input signal. In this way, frequencies for which the auditory system is less sensitive will contribute less to the total distortion than frequencies for which the auditory system is more sensitive. Note that this perceptual model only accounts for spectral masking effects, and does not include temporal effects. III. ECUSQ OF A SINGLE SINUSOID In this section, we will derive optimal ECUSQ high-resolution quantizers for amplitude, phase, and frequency, for the case where the input signal is represented by one single sinusoid. Furthermore, objective comparisons will be made with several other quantization schemes using a special case of the perceptual distortion measure, the -squared-error measure. (1) where is the joint probability density function of amplitude, phase, and frequency, corresponding to distributions, and, respectively. Furthermore, and.in high-resolution theory, quantizers are described by quantization point density functions [20], [21], which when integrated over a region give the total number of quantization levels within. Thus, in the case of one-dimensional quantizers, the quantizer step sizes are simply given by the reciprocal values of the point density functions, that is,. Note that these point densities do not specify the location of the quantization points. In our scheme, we encounter point density functions for amplitude, phase, and frequency, denoted by,, and, respectively. Note that since we consider unrestricted quantization, the quantization point density functions are assumed to depend on all three parameters. To be able to reconstruct at the decoder, the masking threshold sample also has to be quantized and encoded, as will become clear later. Note that this sample is sufficient to reconstruct the sinusoid, i.e., we do not need to encode the entire masking threshold. Throughout this paper, we will quantize the masking threshold samples uniformly in the db-domain with a stepsize of 8 db, of which the effect is experimentally found to be inaudible and hence negligible for the perceptual distortion. However, encoding the masking threshold samples does give a contribution to the rate. We will discuss the extent of this contribution later. 
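A minimal sketch of the two ingredients just described, the perceptually weighted spectral distortion of Section II-B and the coarse dB-domain quantization of a masking-threshold sample, is given below. The discretization by an FFT, the normalization, the use of 10 log10, and all function names are assumptions made for illustration; the weighting is taken, as stated above, to be the reciprocal of the masking threshold.

```python
import numpy as np

def perceptual_distortion(error, window, masking_threshold):
    """Sketch of the weighted spectral distortion of a windowed error signal.

    error             -- original minus quantized signal (one segment)
    window            -- analysis window of the same length
    masking_threshold -- per-bin masking threshold; its reciprocal acts as
                         the perceptual weighting function
    """
    spectrum = np.fft.fft(window * error)
    weight = 1.0 / masking_threshold             # less audible bins weigh less
    return np.sum(weight * np.abs(spectrum) ** 2) / len(error)  # up to a constant

def quantize_threshold_db(m, step_db=8.0):
    """Uniform quantization of a masking-threshold sample in the dB domain."""
    index = int(np.round(10.0 * np.log10(m) / step_db))
    return index, 10.0 ** (index * step_db / 10.0)  # index and reconstruction
```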
In the remainder of this section, we derive a high-resolution approximation for the entropy of the quantization indices. Let,,, and denote the alphabets of amplitude, phase, frequency, and masking threshold quantization indices, respectively. The joint entropy of the quantization indices is equal to where is the rate needed to encode the masking threshold sample. Under high-resolution assumptions, can be approximated by (3) A. High-Resolution Approximations of Expected Distortion and Entropy In this section, we will derive high-resolution expressions for the expected perceptual distortion and entropy, for a single sinusoid. In the single sinusoid case, the input signal is approximated by one sinusoid, i.e.,, for. Here, and are amplitude, phase, and frequency respectively,, and is the frame-length. The masking threshold is denoted by. Furthermore, the quantization error signal is given by. Our goal is to minimize the expected perceptual distortion. In Appendix A, it is proven that under high-resolution assumptions is approximated by where has distribution and (4) (2)

4 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 969 is the joint differential entropy of amplitude, phase, and frequency, conditioned on. In, we used the high-resolution assumption that probability density functions are constant within a quantization cell to approximate and (10) are introduced for notational simplicity. Substituting (9) back into (6) (8), we find closed-form expressions for the optimal high-resolution ECUSQ quantizers that solve (5) Furthermore, we replaced sums by integrals and quantization step sizes by quantization point densities. Note that the integration over falls out in the three integrals expressions in (4) since is fully determined by,, and. B. Optimal Quantizers In this section, we will derive the quantization point densities that minimize the expected perceptual distortion, while satisfying an entropy constraint subject to (5) where is a prespecified target entropy. This constrained minimization problem can be solved using the method of Lagrange multipliers, turning it into an unconstrained minimization problem. In this method, the Lagrangian cost function is minimized, where is the Lagrangian multiplier, which should be chosen such that the entropy constraint is satisfied. A well-known theorem in variational analysis states that is minimized if the so-called Euler Lagrange equations for with respect to,, and, individually, are satisfied [22]. Solving these equations, the quantization point densities that minimize the cost function are found to be Substituting (6) (8) in the entropy constraint, using (4), we find the optimal value of the Lagrange multiplier, as shown by (9) at the bottom of the page, where (6) (7) (8) where (11) (12) (13) accounts for perceptual effects. Note that since is proportional to, the perceptually more important sinusoids are quantized more finely. Since is inversely proportional to the power of the sinusoid, the optimal amplitude density gives rise to a logarithmic amplitude quantizer. Both phase and frequency quantizers, however, are uniform for given amplitude. Commonly, logarithmic frequency quantization is used, which is based on psychoacoustical data measured for signal durations of about 1 s. However, in the proposed scheme, sinusoids are segmented into (short) frames. Hence, the errors introduced by the frequency quantization, which are noise-like, do not introduce a frequency error of the complete (long duration) sinusoid, but will have a noisy character due to the segmentation into relatively short time frames. Since the psychoacoustical model we use is developed for short-time prediction of errors, the logarithmic behavior will hardly occur, even for long duration signals. The distortion-rate relation for ECUSQ, concerning a single sinusoid, can now be found by substituting (11) (13) in (2), as shown by (14) at the bottom of the next page. It is easy to verify that all three parameters give exactly the same contribution to this distortion. Furthermore, it is not difficult to show that if is an even-symmetric window, the distortion (14) is minimal for. We assume this to be the case in the remainder of this paper. C. Implementation Issues As mentioned earlier, point density functions do not contain any information about the actual location of the quantization reconstruction points. In order to make a practical implementation of the derived quantizers, we will make some assumptions. (9)

5 970 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 First, we use the quantized amplitude value instead of the original amplitude value to obtain the optimal phase and frequency quantizers, i.e., and. Furthermore, the amplitude, phase, and frequency quantization points are assumed to be in the middle of the corresponding quantization intervals. The first amplitude quantization point is chosen to be at 0, and in each amplitude quantization level the first phase and frequency quantization point is selected to be 0 radians. Finally, in order to practically compute the derived quantizers, the term, defined in (10), needs to be calculated. The problem of how to compute this term is dealt with in the second part of this paper. D. Simulation Example In this section, we compare the theoretical high-resolution rate-distortion approximation derived in (14) to a practically obtained rate-distortion curve, which is constructed by generating a large number of realizations of single sinusoids, quantizing these sinusoids with the derived quantizers for different target entropies, and measuring the resulting average perceptual distortion and entropy of quantization indices. In order to generate these sinusoids, the input distributions of amplitude, phase, and frequency have to be given a priori. Assume that amplitude and frequency are Rayleigh distributed. This distribution has probability density function. We choose and. The phase is assumed to be uniformly distributed on. These distributions are close to the ones we observed in experiments with real audio data. Furthermore, we assume that,, and are independent. Knowing the distributions, a large number of triplets is generated, and subsequently quantized with the quantizers derived in (11) (13) for a given target entropy, where we use a Hanning window with length. In order to quantize the triplets, we need to estimate,, and. Knowing, we obtain, where is the Euler constant. Furthermore, is determined by quantizing the masking threshold sample for every triplet in the db-domain using a step size of 8 db, yielding. Since the three parameters are independently distributed, we have. Estimating the probability of each masking threshold quantization index and multiplying this probability with the differential entropy of the set of amplitudes, phases, and frequencies that gave rise to that index, and summing the result over all indices, we obtain an estimation of the conditional differential entropies Fig. 2. Theoretical versus practical distortion-rate performance for ECUSQ concerning a single sinusoid. where the differential entropies conditioned on a specific masking threshold quantization index are estimated by first determining the underlying probability density functions, using a variant of the nearest-neighbor method [23]. Using (1), the quantization distortion for each triplet is then determined, and averaged over all triplets. Second, after quantizing the triplets, the joint entropy of quantization indices is estimated. This is done by computing per individual triplet, where the entropies,, and are estimated by determining their corresponding conditional probability mass functions, using the known input distributions and step sizes. The final estimation of the joint entropy of quantization indices is then obtained by averaging over all triplets. Repeating this procedure for several different target entropies, we obtain a practical rate-distortion curve as plotted in Fig. 2, where we used and a frame-length. 
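The experiment just outlined can be summarized by the following sketch. The step-size rules only mimic the qualitative behaviour of (11)-(13), a logarithmic amplitude quantizer and, given the quantized amplitude, uniform phase and frequency quantizers with finer steps for larger amplitudes; the actual step sizes follow from the closed-form point densities and depend on the target entropy, the window, and the masking threshold. All names, distribution parameters, and constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_triplet(a, phi, omega, log_step=0.05, base_step=0.01):
    """Illustrative ECUSQ-like quantizer structure (not the paper's step sizes)."""
    # Logarithmic amplitude quantization (first reconstruction point at 0).
    a_hat = 0.0 if a <= 0 else float(np.exp(np.round(np.log(a) / log_step) * log_step))
    # Phase and frequency steps depend on the *quantized* amplitude.
    step = base_step / max(a_hat, 1e-6)
    phi_hat = np.round(phi / step) * step
    omega_hat = np.round(omega / step) * step
    return a_hat, phi_hat, omega_hat

# Monte-Carlo run in the spirit of Section III-D: Rayleigh amplitudes and
# frequencies, uniform phases (all distribution parameters are placeholders).
amplitudes = rng.rayleigh(scale=1.0, size=10000)
frequencies = rng.rayleigh(scale=0.5, size=10000)
phases = rng.uniform(0.0, 2.0 * np.pi, size=10000)
quantized = [quantize_triplet(a, p, w) for a, p, w in zip(amplitudes, phases, frequencies)]
```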
In the same figure, the theoretical high-resolution approximation of the average distortion given by (14) is plotted. It can clearly be seen that the curves converge towards each other, which verifies that (14) is indeed a valid approximation, already at practical low bit rates. At an entropy of 30 bits, the difference between the curves is only 0.1 db, and for higher rates this difference decreases, where we note that the (14)

6 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 971 practical rate for representing a triplet is in the range of 12 to 20 bits. For very low rates, where the high-resolution assumptions do not hold, it is clear that the approximation (14) is not valid anymore. E. Distribution of the Entropy Between Amplitude, Phase, and Frequency In this section, we will determine the distribution of the entropy between amplitude, phase, frequency, and the masking threshold corresponding to the optimal ECUSQ quantizers (11) (13), using the entropy chain rule. Since we consider unrestricted quantization, we begin with computing, and use this to compute. Then we are able to determine by applying the chain rule. Using high-resolution assumptions we obtain Fig. 3. Entropies of quantization indices as a function of frame-length for H =20. as derived in the previous section and substituting the optimal amplitude quantization point density (11) into the entropies (18), (17), and (15) then gives (15) We apply the conditional entropy chain rule to determine this we need.for where we used that Subtracting (15) from (16) we obtain (16) (17) Finally, we can determine by subtracting (17) and (15) from (4) (18) In the aforementioned derivations, denotes conditional differential entropy of the corresponding variables. To illustrate these formulas, we assume again that amplitude, phase, and frequency are independently distributed, where and are Rayleigh distributed with and, and is uniformly distributed over. Since the three parameters are independently distributed, we have and. Using the values for,,,, and Note that, which is exactly the entropy constraint imposed on the optimal quantizer design. For a fixed target entropy, these entropies only depend on the frame-length and the window (both through ). We see that in this example phase will always be assigned 2.65 bits more than amplitude. Furthermore, if the frame-length is increased, more bits will be assigned to frequency, and hence less to amplitude and phase. This can be expected since for increasing frame-length, the frequency quantization error grows more rapidly than the amplitude and phase quantization errors. Consequently, more bits will have to be assigned to the frequency quantizer in order to keep the distortion minimal. In Fig. 3, the entropies of the quantization indices are plotted as a function of for, where we used a Hanning window. F. Special Case: the -Squared-Error Measure In order to see how well the proposed scheme performs in a rate-distortion sense, we would like to compare its performance with that of ECVQ, which is, according to theory, the optimal scheme in our framework. However, this would imply computing the masking threshold for each training vector in the

7 972 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 Next, we will compare the distortion-rate performances of ECUSQ with that of ECSSQ and ECVQ, each using the -squared-error measure. 1) Entropy-Constrained Strictly Spherical Quantization: In strictly spherical quantization, amplitude, phase, and frequency are quantized independently of each other. Hence, the quantization point density functions can only depend on their corresponding parameter:,, and. In the same way as in the unrestricted case, we then minimize the expected distortion (2) (with ), with respect to these densities, under the entropy constraint, resulting in the following expression for the ECSSQ distortion-rate relation, where we assume independence of,, and Fig. 4. ECUSQ cells. (a) Phase quantization for fixed amplitude and frequency quantization level. (b) Phase-frequency quantization for fixed amplitude quantization level. vector quantization algorithm. Since the number of training vectors we use to compare both schemes at high rates is on the order of 2, this is not feasible. Therefore, instead we consider a special case of the perceptual distortion measure, the -squared-error measure, for which we objectively compare the performance of the proposed scheme with that of entropy-constrained strictly spherical quantization (ECSSQ), in which the parameters are quantized independently, and ECVQ. Taking in the perceptual distortion (1) gives where we used Parseval Plancherel s formula. This leads us back to the -squared-error measure, as considered in [13], where perception is not taken into account. The optimal ECUSQ quantizers can then also be simplified, since we have,, and, i.e., in the case the optimal amplitude quantizer is uniform, and both the optimal phase and frequency quantizer are uniform in phase and frequency and depend linearly on amplitude. Since in a practical implementation of these quantizers, the quantized amplitude value is used instead of the original amplitude value, this means that within any amplitude quantization level, phase and frequency quantization are uniform. Fig. 4(a) shows a few ECUSQ cells for this simplified case, for fixed amplitude and frequency quantization level, and phase between 0 and. Note that the quantization step sizes in this figure are chosen such that we obtain a clear illustration of the shape of the ECUSQ cells; they do not match the actual step sizes corresponding to the optimal quantizers. In Fig. 4(b), the phase-frequency quantization plane is plotted for a fixed amplitude quantization level. This plane can be thought of as an unfolded half sphere. The highlighted part corresponds to the frequency quantization level chosen in the first figure. For the simplified distortion measure, the distortion-rate relation (14) becomes (19) (20) where. The ratio between the distortions of ECUSQ and ECSSQ can now be obtained through (19) and (20) Note that this ratio only depends on the amplitude probability density function. If we choose a Rayleigh distribution for the amplitude, it is easy to verify that this ratio is equal to for all possible variances. This indicates that ECUSQ outperforms ECSSQ significantly. To obtain the same distortion as ECUSQ, the rate of ECSSQ has to be increased by bits. 2) Entropy-Constrained Vector Quantization: The optimal entropy-constrained vector-quantizer is designed using a variant of the LBG algorithm [24], applied to triplets, containing amplitude, phase, and frequency parameters. 
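For reference, the entropy-constrained codebook design referred to here, a Lagrangian variant of the LBG algorithm after [24], alternates between assigning each training triplet to the codeword minimizing distortion plus a Lagrange multiplier times the codeword length, and re-estimating centroids and codeword probabilities. The sketch below uses a plain squared error in the parameter domain for brevity (the paper measures the distortion of the reconstructed sinusoid) and illustrative parameter names; it is not the authors' implementation.

```python
import numpy as np

def ecvq_design(training, codebook_size, lam, iterations=50, seed=0):
    """Sketch of entropy-constrained VQ design on (amplitude, phase, frequency) triplets."""
    training = np.asarray(training, dtype=float)      # shape (num_vectors, 3)
    rng = np.random.default_rng(seed)
    codebook = training[rng.choice(len(training), codebook_size, replace=False)].copy()
    probs = np.full(codebook_size, 1.0 / codebook_size)
    for _ in range(iterations):
        # Modified nearest-neighbour rule: squared error + lam * codeword length.
        dist = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        cost = dist - lam * np.log2(np.maximum(probs, 1e-12))
        assign = np.argmin(cost, axis=1)
        # Centroid and codeword-probability update.
        for i in range(codebook_size):
            members = training[assign == i]
            if len(members) > 0:
                codebook[i] = members.mean(axis=0)
            probs[i] = len(members) / len(training)
    return codebook, probs
```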
Therefore, this scheme can be seen as a 3-D vector quantizer. Although ECVQ is optimal in our framework, the design of the ECVQ codebook turns out to be very computationally expensive, due to searching through the entire codebook for each training vector and each iteration. Applying the ECVQ algorithm to a training set containing triplets, we obtain a rate distortion curve for the ECVQ scheme, as plotted in Fig. 5. In this experiment, a codebook size of triplets is chosen, and the frame length is set at. We use a rectangular window and the same input distributions as used in Section III-D, from which the triplets are generated. In the same figure, both the practical and the theoretical curve corresponding to the ECUSQ scheme are plotted, for the same input settings, where we note that the difference between the ECUSQ curves at low rates is due to the fact that the theoretical high-resolution approximation is not valid for low rates. It can clearly be seen that for high rates, the two methods perform comparably, with ECVQ having a slight advantage, as theory predicts. However, generating the ECVQ curve took about one week of computation time on a 3-GHz CPU, 4-GB memory, using Matlab. This is due to the fact that the proposed scheme is only valid at high rates, and in order to compare both schemes at high rates, the codebook for ECVQ must be very large, which

8 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 973 the input signal within segment, which we denote by, is then approximated by Fig. 5. Rate-distortion performance of ECVQ and ECUSQ. makes the algorithm extremely computationally expensive. In contrast, the ECUSQ curves can be generated instantly, since we have derived closed-form analytical expressions for the optimal ECUSQ quantizers. IV. ECUSQ OF MULTIPLE SINUSOIDS IN MULTIPLE SEGMENTS In this section, we will derive optimal ECUSQ high-resolution quantizers for amplitude, phase, and frequency for the situations experienced in practice, where multiple sinusoids are distributed across multiple segments. Furthermore, objective and subjective comparisons will be made with existing state-of-the-art quantization schemes. A. High-Resolution Approximations of the Expected Distortion and Entropy In this section, we derive high-resolution approximations for the total expected perceptual distortion and the total entropy. In sinusoidal coding, the input signal is divided into a number of consecutive segments of variable length and each segment is then modeled as a sum of sinusoids. Let denote the number of segments, the number of sinusoidal components in segment, and the length of the analysis window used in segment. Let denote the th component in segment. The part of where, and are the amplitude, phase, and frequency of the th component in segment, respectively. The quantization distortion in a segment consists of the quantization distortion of the individual components plus a contribution due to the mutual interaction of the components. As shown in [25], this mutual interaction can be neglected if the sinusoids are spaced sufficiently far apart in the frequency domain. For practical purposes, this is the case if the sinusoids are estimated using the psychoacoustical matching pursuit algorithm [26]. Define,, and for. The masking threshold corresponding to segment is denoted by. Note that only the samples need to be encoded. Again, quantizing these values uniformly in the db-domain using a stepsize of 8 db was experimentally found to be inaudible. Furthermore, the rate needed to encode the samples can be measured and is approximately 2 to 2.5 bits per sample, depending on the input signal. The quantization error signal is given by. In Appendix B, it is shown that under high-resolution assumptions, the expected perceptual distortion can be approximated by (21), shown at the bottom of the page, where is the joint probability density function of all amplitudes, phases, and frequencies in segment, where,, and are the corresponding distributions. Furthermore,. The total expected perceptual distortion approximated by can then be (22) where we assume that components in different segments are statistically independent. (21)
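The claim above, that the mutual interaction between sinusoids can be neglected when they are well separated in frequency, is easy to check numerically: the spectral energy of a sum of two narrowband error components is then essentially the sum of their individual energies. The check below uses an unweighted spectrum and arbitrary test frequencies, purely as an illustration.

```python
import numpy as np

frame_length = 1024
n = np.arange(frame_length)
window = np.hanning(frame_length)

def windowed_energy(x):
    """Spectral energy of a windowed signal (unweighted, for illustration)."""
    return np.sum(np.abs(np.fft.fft(window * x)) ** 2)

# Two small 'quantization error' sinusoids, well separated in frequency.
e1 = 0.01 * np.cos(0.30 * n + 0.2)
e2 = 0.01 * np.cos(1.70 * n + 1.1)

separate = windowed_energy(e1) + windowed_energy(e2)
combined = windowed_energy(e1 + e2)
print(abs(combined - separate) / separate)   # tiny -> interaction negligible
```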

9 974 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 In the remainder of this section, we derive a high-resolution approximation for the total entropy of all quantization indices, over all segments. Let,, and denote the alphabets of amplitude, phase, and frequency quantization indices, respectively, corresponding to the components in segment. Additionally, the masking threshold samples have to be quantized for all components. Let denote the alphabet of masking threshold quantization indices in segment. The joint entropy of all quantization indices in segment is equal to We assume that entropies are additive over segments (25) Consequently, by summing (23) over all segments, and using (24), we obtain a high-resolution approximation for the total entropy of quantization indices. B. Optimal Quantizers In this section, we will derive the quantization point densities that minimize the total expected perceptual distortion, while satisfying an entropy constraint, i.e., subject to (26) (23) where is the rate needed to encode the masking threshold in segment. Using high-resolution assumptions, we approximate where is a prespecified target entropy. Using the derived high-resolution approximations for and, this constrained minimization problem can be solved as in the single sinusoid case, by applying the the method of Lagrange multipliers. The Lagrangian cost function we wish to minimize is given by, where the Lagrangian multiplier is chosen such that the entropy constraint is satisfied. Minimizing the cost function by evaluating the Euler Lagrange equations with respect to,, and yields (27) (28) where has distribution and. Furthermore (24) (29) Substituting (27) (29) in the entropy constraint, using (24) and (25), we find the optimal value of the Lagrange multiplier, as shown by (30) at the bottom of the page, where is the joint differential entropy of amplitude, phase, and frequency, corresponding to segment, conditioned on. Note that the integration over has been omitted in the three integral expressions in (24), since is fully determined by,, and. (30)

10 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 975 are introduced for notational simplicity. Substituting (30) back into (27) (29), we obtain analytical expressions for the optimal high-resolution ECUSQ quantizers that solve (26), as shown by (31) (33) at the bottom of the page, where to the simulation example for the single sinusoid case, using observed data from the input signal. The term can be practically computed in the following way. Denote, then we have incorporates the perceptual aspect of the quantizers. The distortion-rate relation for ECUSQ can now be found by substituting (31) (33) into (22), using (21), as shown by (34) at the bottom of the page. It is easy to verify that all components and all parameters give exactly the same contribution to this distortion. This means that, on average, all components give rise to the same distortion, adding up to (34). Furthermore, focusing on a single component, all three sinusoidal parameters contribute equally to the distortion corresponding to this component. C. Implementation Issues In order to practically implement the derived quantizers, we use the same assumptions as stated in Section III-C for the single sinusoid case. Moreover, we assume that amplitudes, phases, frequencies, and masking threshold samples are independently and identically distributed over all segments and all components, with independent distributions,,, and, respectively. This assumption is made for the following reason. We typically use a maximum of 75 components per segment, which is too little data to be able to accurately estimate differential entropies and other source dependent terms per segment, which are needed for quantization. Assuming independently and identically distributed variables allows us to take all components from the entire input signal together, resulting in sufficient data to make the mentioned estimations. We realize that this is not necessarily a valid assumption for real audio, as distributions can change over time. Under this assumption, we have, and secondly. The values of,,,, and are estimated similar (35) Since the masking threshold is inside the expectation operator, the expression in (35) cannot be computed analytically. However, since it is reasonable to assume that the random variables are statistically independent for all and, the Central Limit Theorem gives where denotes the variance, assuming that the individual variances of satisfy the Lindenberg conditions [27]. This implies that the expectation in (35) can be removed (36) for sufficiently large. The expression in (36) can be analytically computed. (31) (32) (33) (34)
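The concentration argument used to remove the expectation in (35) can be illustrated numerically: as the number of independent terms grows, a smooth function of their sum is increasingly well approximated by the same function evaluated at the sum of the means. The exponential distribution and the logarithm below are placeholders chosen only to make this point.

```python
import numpy as np

rng = np.random.default_rng(0)

def gap(num_terms, trials=2000):
    """Compare E[log(sum of X_i)] with log(sum of E[X_i]) for i.i.d. X_i."""
    x = rng.exponential(scale=1.0, size=(trials, num_terms))
    lhs = np.mean(np.log(x.sum(axis=1)))   # Monte-Carlo estimate of the expectation
    rhs = np.log(float(num_terms))         # same function of the expected sum
    return abs(lhs - rhs)

for m in (5, 50, 500):
    print(m, gap(m))   # the gap shrinks as the number of terms grows
```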

11 976 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 rate audio coder in [28] spends approximately 20 bits per parameter triplet. For very low bitrates, where the high-resolution assumptions are not valid, the approximation does not hold and the curves diverge. Fig. 6. Theoretical versus practical distortion-rate performance for ECUSQ, using a pop music fragment. E. Objective Comparison With ECUPQ+ In this section, the proposed ECUSQ scheme is objectively compared in terms of rate-distortion performance to the ECUPQ+ scheme, which is a combination of the ECUPQ scheme and an optimal entropy-constrained frequency quantizer. Entropy-constrained unrestricted polar quantization was introduced in [12], and as the term polar already implies, in this scheme only amplitude and phase quantization is considered. Since ECUPQ is unrestricted, amplitude and phase are quantized dependently. If frequency quantization is left out, it can be shown that the expected perceptual distortion (49), corresponding to the th component in segment evaluates to Finally, next to all quantization indices, the following side information has to be sent to the decoder in order to reconstruct the sinusoidal parameters: the segmentation of the input signal:, and ; the distribution of sinusoids: ; source-dependent variables:, and. This information has to be sent for each segment, and is sufficient to encode and decode the sinusoidal parameters. D. Simulation Example In this section, we compare the theoretical high-resolution distortion-rate relation derived in (34) to a practically obtained distortion-rate curve, which is generated by encoding a pop music fragment using the SiCAS coder [28]. In this sinusoidal coder, first the jointly optimal segmentation and distribution of sinusoids is determined, using segment lengths of 512, 768, 1024, or 1280 samples, a maximum number of 75 sinusoids per segment, and approximately 2000 sinusoids per second on average. The sample frequency used was 48 khz. In this way, we obtain a framework of segments and corresponding sinusoids. This framework is assumed a fixed input for the quantization scheme, independent of the used target entropy. Subsequently, the sinusoids are quantized one by one, using the derived optimal high-resolution ECUSQ quantizers, for a given target entropy. Using the perceptual distortion measure, we compute the distortion for each component in each segment. The total perceptual distortion is then obtained by summing over all segments and components. Repeating this procedure for several different target entropies, we obtain the practical rate-distortion curve plotted in Fig. 6, where we express the rate in terms of the target entropy per component.in the same figure, the theoretical high-resolution distortion-rate relation (34) is plotted. Since the curves clearly converge towards each other, the high-resolution approximation (34) for the total expected perceptual distortion is valid, even at bitrates as low as 14 bits per parameter triplet, a bit rate which is of great interest in practical applications. In comparison, the low where the frequencies are given and fixed. Furthermore, the entropy constraint reduces to, where, and is a prespecified target entropy. 
The entropy of amplitude, phase and masking threshold quantization indices in segment can under high-resolution assumptions be approximated by It is then straightforward to derive, using variational techniques, the amplitude and phase point density functions that minimize the total expected perceptual distortion (22) while satisfying the entropy constraint. These are given by (37) (39), shown at the bottom of the next page. To be able to compare the performances of the ECUPQ scheme and the proposed ECUSQ scheme, we need to design a frequency quantizer, for use with the ECUPQ amplitude and phase quantizers obtained above. This can be naturally done by considering the problem of independent quantization of frequency, i.e., amplitude and phase quantization are not taken into account, and deriving the corresponding optimal entropy-constrained frequency quantizer. Similar to previous derivations, we obtain the frequency point density function that

12 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 977 minimizes the total expected perceptual distortion (22), while satisfying the entropy constraint (40) where is the prespecified target entropy for frequency parameters. Since the rate needed to encode the masking threshold is already included in the entropy constraint for the amplitude and phase quantizers, it can be omitted in the frequency quantization. The quantizers (38) (40) form a quantization scheme, denoted by ECUPQ+. In ECUSQ the optimal quantizers are derived by jointly optimizing for all three parameters simultaneously at target entropy, while in ECUPQ+ the quantizers are found by first jointly optimizing for amplitude and phase at target entropy, and then separately optimizing for frequency at target entropy. Thus, in theory ECUSQ outperforms ECUPQ+. In order to compare the two schemes at the same total target entropy, we must have. An important advantage ECUSQ offers over the ECUPQ+ method is that in the proposed scheme, the bit distribution between amplitude, phase, and frequency follows directly from the derived formulas, as seen for the single sinusoid case in Section III-E. In contrast, ECUPQ+ does not specify the part of the total bit budget that is assigned to amplitude and phase parameters, or the part of the total bit budget that is assigned to frequency parameters. Hence, these values have to be chosen a priori, which makes it more difficult to find the optimal distribution of bits in this scheme, especially because this optimal distribution will turn out to be dependent on the total target rate. Encoding a pop music fragment with the SiCAS coder, using the same settings as in Section IV-D, we can objectively compare the performances of both quantization schemes by computing the total perceptual distortion for each scheme, for several different target entropies. The results are shown in Fig. 7, where the distortion in the ECUPQ+ scheme is plotted for several values of, the percentage of the bit budget that is assigned to frequency. These results indicate that in practice for every target rate, one can find a bit balance in the ECUPQ+ scheme, such that it performs very close to the ECUSQ scheme. However, choosing a nonoptimal bit balance can lead to a considerable performance difference Fig. 7. Rate-distortion performance of ECUSQ and ECUPQ+, p percentage of the bit budget that is assigned to frequency. denotes the as compared to ECUSQ. Note that the optimal bit balance in the ECUPQ+ scheme is dependent on the target entropy, i.e., no fixed distribution is optimal. Consequently, the optimal bit balance in ECUPQ+ has to be redetermined for every target rate, whereas in ECUSQ this balance follows naturally, which is due to jointly optimizing for all three parameters. Note that similar plots are obtained when encoding other types of audio. F. Subjective Comparisons: Listening Test Results A listening test was performed to compare the proposed ECUSQ scheme to a state-ofthe-art sinusoidal quantization scheme that is used in the SiCAS coder [28]. In contrast to ECUSQ, the quantizers in the SiCAS coder are fixed for all input signals. Amplitude and frequency quantization are logarithmic, where the relative step sizes are, and, respectively. Phase is uniformly quantized with a step size of. After entropy coding, the measured bit rate in the SiCAS quantization scheme is approximately 20 bits per component, where components are quantized/encoded without using differential techniques. 
Note that the described amplitude and frequency quantizers with these step sizes are also used in the standardized MPEG-4 SSC coder [6] to quantize births of sinusoidal tracks. We used five different excerpts in our listening test: jazz music, harpsichord, German male speech, classical music, and Eric Clapton (pop music), all sampled at 48 khz, and each with (37) (38) (39)

13 978 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 TABLE I LISTENING TEST RESULTS: RELATIVE PREFERENCES [%] FOR ECUSQ AND CORRESPONDING STATISTICAL SIGNIFICANCE a duration of approximately 10 s. First, using the SiCAS coder, the target signal is divided into variable-length segments, and sinusoids are distributed across the segments [28], using segment lengths of 512, 768, 1024, or 1280 samples, a maximum number of 75 sinusoids per segment, and 1000 sinusoids per second on average. This framework will be the fixed input for the quantizers. Second, the excerpts are quantized with both quantization schemes, using a specified target entropy in the ECUSQ scheme. Finally, for each excerpt, the error signal due to modeling by sinusoids is then added to the quantized signal, such that any artifacts are due to quantization of the sinusoidal parameters, as desired. We then obtain the final excerpts that are used in this listening test. For each excerpt, we presented the listeners three versions at a time: an excerpt quantized with the ECUSQ scheme for a specified target entropy, the same excerpt quantized with the SiCAS quantization scheme, and the original version. The participants were instructed to indicate whether the ECUSQ version had a worse or better/equal audio quality as compared to the SiCAS version, i.e., a binary choice. The excerpt quantized with the SiCAS quantization scheme is very close to transparent quality, which is mainly since the error signal due to modeling is added to the quantized signal in this test. For this reason, the distinction between equal and better is very small here. The described procedure was carried out for eight different ECUSQ target entropies per component, ranging from 12 to 19 bits, using increments of 1 bit, and for all five excerpts. Furthermore, every participant performed the entire test twice. A total of 11 listeners participated in the test and the authors were not included. The results are presented in Table I, where for each excerpt and each target entropy per component, we give the percentage of listeners that indicated the ECUSQ encoded version as being better/equal. Note that since the test was done twice by every listener, these percentages are based on 22 test results. In the last row, the percentages are averaged over the five excerpts. Furthermore, the statistical significance of the percentages is stated in brackets, where indicates that the ECUSQ scheme performs statistically significantly worse, and indicates that the ECUSQ scheme performs statistically significantly better or equal. Furthermore, (0) indicates that the corresponding percentage is not statistically significant. These results were obtained by applying the Wilcoxon matched-pairs signed-rank test of equality of medians [29], using a significance level of Clearly, for the lower target entropies at 12 and 13 bits the SiCAS quantization scheme performs better for all excerpts. At 14 bits, the SiCAS scheme performs better for the German male speech and the harpsichord excerpt, while the remaining percentages are statistically insignificant. At 15 bits, the ECUSQ scheme performs better or equal for the jazz and classical music excerpts. Since the SiCAS quantization scheme uses 20 bits per component, we gain the considerable amount of 5 bits per component for these two excerpts by using the ECUSQ scheme, maintaining the same subjective quality level as in the SiCAS scheme. 
For the German male speech excerpt this gain is 4 bits, and for the more critical harpsichord and Eric Clapton excerpt we gain 3 bits per component. Note that since the excerpts quantized with the SiCAS scheme are very close to transparent quality, so are the ECUSQ quantized excerpts at the mentioned bit rates. In conclusion, by applying the ECUSQ scheme as derived in this paper, we can gain up to 5 bits per component, which corresponds to a bit rate reduction of 25% for the settings in this experiment, and still achieve close to transparent subjective quality. V. CONCLUSION This paper presented a scheme for entropy-constrained quantization of sinusoidal parameters. In this scheme, which is called ECUSQ, all sinusoidal parameters amplitude, phase, and frequency are quantized dependently. Using high-resolution assumptions we derived analytical expressions for the optimal ECUSQ amplitude, phase, and frequency quantizers, which minimize the expected perceptual distortion while the corresponding quantization indices satisfy an entropy constraint. The perceptual distortion measure used in this work is based on psychoacoustical properties of the auditory system. The ECUSQ quantizers were derived both for the case of a single sinusoid, and for the more practically relevant case of multiple sinusoids distributed across multiple segments. As desired, the quantizers prove to be flexible and of low complexity, in the sense that they can be determined easily for varying bit rate requirements, without any sort of iterative retraining procedures. To measure the performance of the proposed scheme, it was compared both objectively and subjectively to several existing entropy-constrained quantization schemes. For the squared error distortion measure, we demonstrated that ECUSQ performs very close to the theoretically optimal entropy-constrained vector quantization, in terms of objective rate-distortion performance. Furthermore, for the perceptual distortion measure, it was shown that the ECUSQ scheme objectively outperforms an existing sinusoidal quantization scheme, where frequency quantization is done independent of amplitude/phase quantization. Finally, a subjective listening test was conducted, in which the proposed scheme is compared to an existing state-of-the-art sinusoidal quantization scheme with fixed quantizers for all input signals. An average bit rate reduction of 20% was achieved by the proposed scheme, at the same subjective quality level as the existing scheme.
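The flexibility claimed in the conclusion rests on a standard high-resolution property: for an entropy-constrained scalar quantizer, the index entropy behaves as the differential entropy of the source minus the base-2 logarithm of the step size, so re-targeting the coder only rescales step sizes and requires no retraining. A generic sketch of this rule of thumb (not the paper's exact expressions, which additionally involve the window, the frame length, and the masking threshold):

```python
def step_for_target_entropy(diff_entropy_bits, target_bits):
    """High-resolution rule of thumb: H ~ h(X) - log2(step), so step = 2**(h - H)."""
    return 2.0 ** (diff_entropy_bits - target_bits)

# Re-targeting is a closed-form recomputation; each extra bit halves the step.
for target in (4.0, 5.0, 6.0):
    print(target, step_for_target_entropy(2.0, target))
```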

14 KORTEN et al.: HIGH-RESOLUTION SPHERICAL QUANTIZATION OF SINUSOIDAL PARAMETERS 979 APPENDIX A High-resolution Approximation of the Expected Distortion Single Sinusoid: In this section, we derive a high-resolution expression of the expected perceptual distortion. First, we evaluate the perceptual distortion (1) for the single sinusoid case. Let. Note that since a sinusoid can be written as a sum of two complex exponentials with opposite frequencies, the perceptual distortion measure (1) equals where is the joint probability density function of amplitude, phase, and frequency, corresponding to distributions, and, respectively. Let denote the USQ cell corresponding to amplitude, phase, and frequency quantization indices, and, respectively, and let, and denote their corresponding alphabets. For notational simplicity, we omit the mutual dependencies between the quantization indices. Summing over all USQ cells yields (41) For large, the power spectrum of the windowed error signal will converge to a sum of delta-functions at frequencies and and their opposites. Then, the sidelobes of can be neglected; furthermore, the widths of the main lobes are sufficiently small to assume the masking threshold to be constant across a main lobe. Since we can also assume that, due to high-resolution assumptions and the smooth masking curve, we can approximate (41) by (42) for sufficiently large. The contribution of the complex exponentials at negative frequencies to the integral in (42) can be neglected. Defining, this means for sufficiently large, where (43) (46) where. Here we used the high-resolution assumption that is approximately constant over each USQ cell. Since we then have a uniform distribution in each cell, the optimal quantization reconstruction points are centered in the quantization intervals, i.e., cell is defined by amplitude, phase, and frequency quantization intervals given by,, and, respectively, where denotes the length of the respective quantization interval. Using these boundaries, the integral in (46) can be evaluated, by substituting (43) and (44). For each cell, this is carried out in the same way, so we will focus on a single cell and leave out the quantization indices for notational simplicity, as shown by (47) at the bottom of the next page, where.in, we used the high-resolution assumption that the masking threshold can be considered flat within each USQ cell. Second, in, we substituted Taylor expansions of and, around and, respectively, neglecting terms of order higher than three. Substituting (47) back into (46), we obtain (44) and. The expected perceptual distortion is given by (45)

15 980 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 Here we again used high-resolution assumptions, and hence replaced sums by integrals and quantization step sizes by so-called quantization point densities. APPENDIX B High-Resolution Approximation of the Expected Distortion Multiple Sinusoids and Segments: In this section, we derive a high-resolution expression of the expected perceptual distortion corresponding to the th component in segment. Let. Using the approximation (43), the perceptual distortion (1) corresponding to the th component in segment evaluates to (48) for sufficiently large, where. The expected perceptual distortion corresponding to the th component in segment is given by. For notational simplicity, we number these cells by. Substituting (48) in (49), and using high-resolution assumptions, we then obtain (50) where,, and are quantized to cell. Applying (44), and using Taylor series, the integral in (50) can be approximated by (49) Here is the joint probability density function of all amplitudes, phases, and frequencies in segment, where,, and are the corresponding distributions. The integral in (49) can be evaluated by summing over all possible -dimensional quantization cells for (51) where and where the step sizes correspond to quantization cell. Furthermore, we used the fact that the optimal quantization reconstruction points are centered in the corresponding quantization inter- (47)

Substituting (51) into (50), and using the high-resolution assumptions to replace sums by integrals and step sizes by point density functions, we obtain the desired high-resolution expression for the expected distortion.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their useful comments.
Pim Korten received the M.Sc. degree in applied mathematics from Delft University of Technology, Delft, The Netherlands, in 2003, and is currently working towards the Ph.D. degree in electrical engineering in the Information and Communication Theory Group, Department of Mediamatics, Faculty of Electrical Engineering, Mathematics, and Computer Science (EEMCS), Delft University of Technology. His research interests include perceptual audio coding, sinusoidal modeling and quantization, high-resolution quantization theory, and rate-distortion optimization.

Jesper Jensen received the M.Sc. and Ph.D. degrees in electrical engineering from Aalborg University, Aalborg, Denmark, in 1996 and 2000, respectively. From 1996 to 2001, he was with the Center for PersonKommunikation (CPK), Aalborg University, as a Researcher, Ph.D. student, and Assistant Research Professor. In 1999, he was a Visiting Researcher at the Center for Spoken Language Research, University of Colorado, Boulder. He is currently an Assistant Professor at the Delft University of Technology, Delft, The Netherlands. His main research interests are digital speech and audio signal processing, including coding, synthesis, and enhancement.

Richard Heusdens received the M.Sc. and Ph.D. degrees from the Delft University of Technology, Delft, The Netherlands, in 1992 and 1997, respectively. Since 2002, he has been an Associate Professor in the Department of Mediamatics, Delft University of Technology. In the spring of 1992, he joined the Digital Signal Processing Group, Philips Research Laboratories, Eindhoven, The Netherlands. He has worked on various topics in the field of signal processing, such as image/video compression and VLSI architectures for image processing algorithms. In 1997, he joined the Circuits and Systems Group, Delft University of Technology, where he was a Postdoctoral Researcher.
In 2000, he moved to the Information and Communication Theory (ICT) Group, where he became an Assistant Professor responsible for the audio and speech processing activities within the ICT group. He is involved in research projects that cover subjects such as audio and speech coding, speech enhancement, and digital watermarking of audio.
