Audio Denoising by Time-Frequency Block Thresholding

Guoshen Yu, Stéphane Mallat and Emmanuel Bacry
CMAP, Ecole Polytechnique, Palaiseau, France
March 27

Abstract

For audio denoising, diagonal thresholding estimators of spectrogram coefficients produce a musical noise that degrades audio perception. We introduce a block thresholding estimator that produces hardly any musical noise and improves the SNR compared to diagonal thresholding or Ephraim and Malah estimators. Spectrogram coefficients are grouped into blocks to compute attenuation factors. This block grouping regularizes the estimation, which removes the musical noise. The block size is adapted to the signal properties by minimizing a Stein unbiased estimator of the block thresholding risk.

Index Terms: Audio denoising, Block thresholding, Diagonal thresholding, Ephraim and Malah, SURE.

1 Introduction

Audio signals are often contaminated by background environment noise and by buzzing or humming noise from audio equipment. Audio denoising aims at attenuating the noise while retaining the underlying signals. Applications such as music and speech restoration are numerous.

Thresholding estimators [11] remove noise by setting to zero the small coefficients of an appropriate sparse signal representation. Thresholding wavelet coefficients is particularly effective at suppressing noise from images, and these estimators are used in many applications. For audio signals, despite interesting work on such thresholding estimators [8, 21, 24], the results are less convincing. Indeed, thresholding the spectrogram or the wavelet coefficients of a noisy audio signal produces a musical noise [6, 26]. This noise is a sum of localized time-frequency structures corresponding to isolated spectrogram or wavelet coefficients above the threshold. This superposition of musical noise contaminates the denoised sound and degrades the audio perception.
Currently, the audio denoising method most often used is the Ephraim and Malah noise suppression rule [12, 13] and its variants [25], applied to spectrograms. This technique introduces little musical noise and maintains a small-amplitude residual noise that masks this musical noise.

This paper introduces a block thresholding estimator that produces hardly any musical noise with no residual noise, by grouping spectrogram coefficients in time-frequency blocks [26]. A block thresholding restores spectrograms that are more regular, without the isolated coefficients responsible for musical noise. Taking advantage of the time-frequency regularity of audio sounds, it also improves the resulting SNR. Comparisons are made with Ephraim and Malah estimators.

Block thresholding estimators were first introduced by Cai and Silverman [3, 4, 5] to improve noise removal in orthonormal wavelet bases. Mathematical studies [15, 16, 17] proved the minimax optimality of wavelet block thresholding for certain classes of signals. For audio denoising, the grouping of spectrogram coefficients in blocks can be automatically adjusted to the signal content, by minimizing the resulting risk calculated with the Stein estimator [23].

We begin by reviewing conventional diagonal thresholding estimators and explain why they
produce musical noise for audio signals. Section 3 introduces the block thresholding estimators of Cai and Silverman [4] in the general context of orthogonal bases and frames. Block thresholding of spectrogram coefficients is studied for audio denoising, and comparisons are made with Ephraim and Malah methods. To adjust the size of the blocks that group spectrogram coefficients, Section 4 explains how to compute the Stein unbiased risk estimate [23] of a block thresholding algorithm and how to adjust the block size to minimize this risk estimate. A post-processing with an empirical Wiener shrinkage [14] is presented in Section 5 to further improve the estimation.

2 Diagonal Thresholding

Section 2.1 describes the properties of diagonal thresholding estimators both in orthogonal bases and in frames, and Section 2.2 explains why they produce musical noise when applied to audio spectrograms.

2.1 Properties of Diagonal Thresholding Estimators

Let $y$ be a noisy signal that is the sum of a clean signal $f$ and a noise $\epsilon$ of zero mean:
\[ y[n] = f[n] + \epsilon[n], \quad n = 0, 1, \ldots, N-1. \tag{1} \]
Thresholding estimators decompose noisy signals in a basis or in a frame and set to zero the small-amplitude coefficients. Let $F = \{g_m\}_{1 \le m \le N}$ be a family of vectors that defines an orthonormal basis of $\mathbb{R}^N$. Decomposing $y$ in $F$ yields
\[ y_F[m] = f_F[m] + \epsilon_F[m], \quad 1 \le m \le N, \tag{2} \]
with $y_F[m] = \langle y, g_m \rangle$, $f_F[m] = \langle f, g_m \rangle$ and $\epsilon_F[m] = \langle \epsilon, g_m \rangle$. A diagonal estimator in this basis modifies the amplitude of each coefficient $y_F[m]$ with a factor $a[m]$ and reconstructs
\[ \hat f = \sum_{m=1}^{N} D_m(y_F[m])\, g_m = \sum_{m=1}^{N} a[m]\, y_F[m]\, g_m. \tag{3} \]
To reduce the quadratic risk $E\{\|f - \hat f\|^2\}$, one can verify that the attenuation factor should satisfy $a[m] \le 1$. The estimator is said to be diagonal if $a[m]$ depends only upon $y_F[m]$.
For diagonal estimators, one can verify [11] that a lower bound of the quadratic risk $E\{\|f - \hat f\|^2\}$ is obtained by choosing
\[ a[m] = \frac{|f_F[m]|^2}{|f_F[m]|^2 + \sigma^2[m]}, \tag{4} \]
where $\sigma^2[m] = E\{|\epsilon_F[m]|^2\}$ is the variance of each noise coefficient. The resulting lower bound risk is
\[ R_o = \sum_{m=1}^{N} \frac{|f_F[m]|^2\, \sigma^2[m]}{|f_F[m]|^2 + \sigma^2[m]}. \tag{5} \]
This lower bound cannot be reached because the oracle attenuation factor (4) depends upon $f_F[m]$, which is unknown. A simple diagonal estimator is the empirical Wiener estimator [2] defined by
\[ D_m(x) = x \left( 1 - \frac{\sigma^2[m]}{|x|^2} \right)_+ , \tag{6} \]
where we write $(z)_+ = \max(z, 0)$. Donoho and Johnstone [11] have introduced better thresholding estimators that can produce a risk close to the oracle lower bound. A hard thresholding keeps the coefficients above a threshold $T_m = \lambda \sigma[m]$:
\[ D_m(x) = x\, 1_{\{|x| > \lambda \sigma[m]\}}, \tag{7} \]
in which case the attenuation factor $a[m]$ is 0 or 1. A soft thresholding reduces the amplitude of all coefficients:
\[ D_m(x) = x \left( 1 - \frac{\lambda \sigma[m]}{|x|} \right)_+ . \tag{8} \]
To minimize the risk, Donoho and Johnstone proved that the threshold $T_m$ should be proportional to the noise standard deviation and should depend upon the signal size. Asymptotically, an optimal choice is
\[ T_m = \sqrt{2 \log_e N}\; \sigma[m]. \tag{9} \]
When the noise $\epsilon$ is Gaussian and white, and hence $\sigma[m] = \sigma$ for all $1 \le m \le N$, Donoho and Johnstone [11] proved that for $N \ge 4$ the hard and soft thresholding risk is close to the minimum oracle risk:
\[ R_o \le E\{\|f - \hat f\|^2\} \le (2 \log_e N + 2.4)\,(\sigma^2 + R_o). \tag{10} \]
A frame is a family of $M \ge N$ vectors $F = \{g_m\}_{m \in \Gamma}$ that defines a redundant signal representation $f_F[m] = \langle f, g_m \rangle$. A tight frame satisfies an energy conservation like an orthogonal basis,
\[ \|f\|^2 = \frac{1}{A} \sum_{m \in \Gamma} |f_F[m]|^2 , \]
and as a result one can prove that [19]
\[ f = \frac{1}{A} \sum_{m \in \Gamma} f_F[m]\, g_m , \]
where $A$ is the frame bound. A thresholding estimator in a tight frame behaves similarly to an average of thresholding estimators in several orthonormal bases, which often improves the resulting SNR [9]. The thresholding risk in a frame can also be related to an oracle risk with an upper bound similar to (10). In numerical applications, thresholding estimators in tight frames are thus preferred to thresholding estimators in a single orthogonal basis.
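The diagonal rules (6) to (9) each act on one coefficient at a time, which a few lines of code make concrete. The following is a minimal sketch rather than the authors' implementation; the function names are hypothetical, and `sigma` stands for the noise standard deviation $\sigma[m]$ of the coefficient being processed.

```python
import math

def empirical_wiener(x, sigma):
    """Empirical Wiener rule (6): x * (1 - sigma^2 / |x|^2)_+ ."""
    energy = abs(x) ** 2
    if energy == 0.0:
        return 0.0 * x
    return x * max(1.0 - sigma ** 2 / energy, 0.0)

def hard_threshold(x, sigma, lam):
    """Hard thresholding (7): keep x only if |x| > lam * sigma."""
    return x if abs(x) > lam * sigma else 0.0 * x

def soft_threshold(x, sigma, lam):
    """Soft thresholding (8): shrink the amplitude of x by lam * sigma."""
    magnitude = abs(x)
    if magnitude == 0.0:
        return 0.0 * x
    return x * max(1.0 - lam * sigma / magnitude, 0.0)

def universal_threshold(n, sigma):
    """Asymptotic threshold (9): sqrt(2 log_e N) * sigma."""
    return sigma * math.sqrt(2.0 * math.log(n))
```

Because `abs` is used throughout, the same functions accept real or complex coefficients. Each output depends on a single input coefficient, which is exactly the property that makes these estimators diagonal.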
2.2 Audio Denoising by Diagonal Thresholding

Audio signal denoising can be implemented with a thresholding in a windowed Fourier frame. It amounts to a simple thresholding of the resulting spectrogram, but it produces a musical noise corresponding to isolated coefficients above the threshold. Let $w[n]$ be a window of size $R$ normalized to $\|w\|^2 = 1$. A windowed Fourier frame is defined by
\[ F = \{g_{l,r}[n]\} = \left\{ w[n - lu] \exp\!\left( \frac{i 2\pi r n}{R} \right) \right\}_{1 \le l \le N/u,\; 1 \le r \le R} , \]
where $u$ is the window shifting step, and $l$, $r$ are respectively the time and frequency indices. The resulting windowed Fourier coefficients are computed with an FFT for each translated window,
\[ f_F[l, r] = \langle f, g_{l,r} \rangle = \sum_{n=1}^{N} f[n]\, w[n - lu] \exp\!\left( \frac{-i 2\pi r n}{R} \right) , \]
and $\{|f_F[l, r]|^2\}_{1 \le l \le N/u,\; 1 \le r \le R}$ is the spectrogram. Thresholding windowed Fourier coefficients thus amounts to thresholding a spectrogram. If the window $w[n]$ is chosen so that
\[ \sum_{l} |w[n - lu]|^2 = \frac{A}{R}, \quad \forall n, \tag{11} \]
then one can prove [1] that the windowed Fourier frame is a tight frame with frame bound $A$. In the following, we use half-overlapping windows with $u = R/2$ and a window $w$ that is the square root of a Hanning window, which satisfies (11).

If the noise is stationary, then the noise variance $\sigma^2[l, r] = E\{|\epsilon_F[l, r]|^2\}$ depends only upon the frequency index $r$, and if it is white then it has a constant value $\sigma^2$. For an empirical Wiener diagonal estimator (6), the attenuation factor is
\[ a[l, r] = \left( 1 - \frac{\sigma^2[l, r]}{|y_F[l, r]|^2} \right)_+ , \]
which coincides with the square of the suppression rule for the method of power subtraction [1, 2, 18], and is known to produce musical noise.

To illustrate the musical noise produced by a spectrogram thresholding, Fig. 1 shows the denoising of a short recording of the Mozart oboe concerto with a white Gaussian noise. Fig. 1(a) and 1(b) show respectively the log-spectrograms $\log |f_F[l, r]|$ and $\log |y_F[l, r]|$ of the original signal $f$ and of its noisy version $y$. Thresholding $y_F[l, r]$ amounts to multiplying it by attenuation factors $a[l, r]$ equal to 0 or 1.
Fig. 1(c) shows this attenuation map, with black points corresponding to $a[l, r] = 1$. As can be observed in the zoom in Fig. 1(c'), this attenuation map includes many isolated black points. In the reconstruction process, these isolated coefficients restore isolated windowed Fourier vectors $g_{l,r}[n]$ that are perceived as a musical noise. A soft thresholding produces a similar phenomenon because each coefficient is also thresholded independently of its neighbors. To remove this musical noise, the next section uses a block thresholding estimator that takes into account the fact that the large spectrogram coefficients of most audio sounds are aggregated together in the time-frequency plane.
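The tight-frame condition (11) can be checked numerically for the half-overlapping square-root Hanning window used in this section. The sketch below is only an illustration under stated assumptions: it takes the periodic form of the Hanning window, places the shifted windows on a circular time axis to avoid border effects, and uses hypothetical function names.

```python
import math

def sqrt_hann_window(R):
    """Square root of a periodic Hanning window, scaled so that sum(w^2) = 1."""
    hann = [0.5 * (1.0 - math.cos(2.0 * math.pi * n / R)) for n in range(R)]
    norm = sum(hann)  # equals R / 2 for the periodic Hanning window
    return [math.sqrt(h / norm) for h in hann]

def overlap_energy(w, u, n_frames):
    """Sum over shifts l of |w[n - l*u]|^2, on a circular axis of length n_frames * u."""
    N = n_frames * u
    total = [0.0] * N
    for l in range(n_frames):
        for k, value in enumerate(w):
            total[(l * u + k) % N] += value ** 2
    return total

R = 64
w = sqrt_hann_window(R)
energy = overlap_energy(w, R // 2, n_frames=8)
A = R * energy[0]  # frame bound read off from (11)
```

With these assumptions the summed energy is the same at every position $n$ and equals $2/R$, so the frame bound is $A = 2$, which matches the factor-two redundancy of half-overlapping windows.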
Figure 1: Log-spectrogram of original and noisy Mozart and attenuation coefficients of hard thresholding and block thresholding. (a) Log-spectrogram of original Mozart. (b) Log-spectrogram of noisy Mozart. (c) Hard thresholding. (d) Adaptive block thresholding. (a')(b')(c')(d') are respectively zooms of the marked regions in (a)(b)(c)(d). Values of attenuation coefficients from 1 (black) to 0 (white).

3 Time-Frequency Block Thresholding

The block thresholding algorithm of Cai and Silverman [3, 4] regularizes diagonal thresholding estimations by grouping coefficients in blocks and computing a single attenuation factor for all coefficients in each block. We present this estimator in a general context of orthogonal bases and frames before applying it to spectrograms for audio denoising. By regularizing the thresholding estimation over blocks of coefficients, the musical noise is almost completely removed and the SNR is improved.
3.1 Block Thresholding in Bases and Frames

Let $F = \{g_m\}_{m \in \Gamma}$ be an orthonormal basis or a frame of $\mathbb{R}^N$. The set $\Gamma$ of all indices $m$ is segmented in $K$ blocks $B_k$ in which indices are grouped together. If $F$ is a windowed Fourier frame, then the time-frequency indices $m = (l, r)$ are grouped in time-frequency blocks $B_k$ whose shape may a priori be chosen arbitrarily. A block thresholding estimator multiplies all coefficients within $B_k$ by the same attenuation factor $a_k$:
\[ \hat f = \sum_{k=1}^{K} \sum_{m \in B_k} a_k\, y_F[m]\, g_m . \tag{12} \]
This estimator is not diagonal because the value of each $a_k$ may depend upon all coefficients $y_F[m]$ within $B_k$.

A lower bound of the risk $E\{\|\hat f - f\|^2\}$ is obtained with an oracle attenuation. Let $B_k^\#$ be the number of coefficients within a block $B_k$. The average signal and noise energies in this block are
\[ \bar f_{F,k}^2 = \frac{1}{B_k^\#} \sum_{m \in B_k} |f_F[m]|^2 \quad \text{and} \quad \bar\sigma_k^2 = \frac{1}{B_k^\#} \sum_{m \in B_k} \sigma^2[m] . \]
Similarly to the oracle attenuation factor (4), one can verify that a minimum risk is obtained by choosing
\[ a_k = \frac{\bar f_{F,k}^2}{\bar f_{F,k}^2 + \bar\sigma_k^2} = 1 - \frac{\bar\sigma_k^2}{\bar f_{F,k}^2 + \bar\sigma_k^2} , \tag{13} \]
and the resulting oracle block risk is
\[ R_{bo} = \sum_{k=1}^{K} B_k^\#\, \frac{\bar f_{F,k}^2\, \bar\sigma_k^2}{\bar f_{F,k}^2 + \bar\sigma_k^2} . \tag{14} \]
Clearly the oracle block attenuation factor $a_k$ in (13) cannot be calculated since it depends upon the values of $f_F[m]$. The goal is to find a block estimator whose risk $E\{\|\hat f - f\|^2\}$ is as close as possible to the lower bound $R_{bo}$.

Observe that the oracle risk with blocks $R_{bo}$ in (14) is never smaller than the oracle risk $R_o$ in (5) without blocks, because it is obtained through the same minimization but with fewer parameters, as attenuation factors remain constant over each block. Reducing the number of attenuation parameters with a block technique increases the oracle risk lower bound, but it regularizes the estimation when attenuation factors are computed from empirical coefficients.
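A small numerical comparison illustrates how the block oracle (13) relates to the diagonal oracle (4). This sketch assumes an orthonormal basis with uncorrelated noise, contiguous one-dimensional blocks of equal size, and strictly positive noise variances; the function names are hypothetical. The block risk is accumulated coefficient by coefficient, as the risk of applying one factor $a_k$ to the whole block.

```python
def oracle_risk(signal_energy, noise_var):
    """Diagonal oracle risk (5): sum of |f|^2 sigma^2 / (|f|^2 + sigma^2)."""
    return sum(f2 * s2 / (f2 + s2) for f2, s2 in zip(signal_energy, noise_var))

def block_oracle(signal_energy, noise_var, block_size):
    """Return (attenuation factors, risk) of the block oracle (13)."""
    factors, risk = [], 0.0
    for start in range(0, len(signal_energy), block_size):
        f_blk = signal_energy[start:start + block_size]
        s_blk = noise_var[start:start + block_size]
        f_mean = sum(f_blk) / len(f_blk)   # average signal energy in the block
        s_mean = sum(s_blk) / len(s_blk)   # average noise energy in the block
        a_k = f_mean / (f_mean + s_mean)
        factors.append(a_k)
        # Risk of applying the single factor a_k to every coefficient of the block.
        risk += sum((1.0 - a_k) ** 2 * f2 + a_k ** 2 * s2
                    for f2, s2 in zip(f_blk, s_blk))
    return factors, risk
```

With blocks of size 1 the two risks coincide. With larger blocks the block oracle risk can only grow, and it grows most when signal energy varies strongly inside a block.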
A direct calculation shows that
\[ R_{bo} - R_o = \sum_{k=1}^{K} \sum_{m \in B_k} \frac{\xi_{F,k}\, \xi_F[m]\, (\bar\sigma_k^2 - \sigma^2[m]) + (\bar f_{F,k}^2 - |f_F[m]|^2)}{(\xi_{F,k} + 1)(\xi_F[m] + 1)} , \tag{15} \]
where $\xi_{F,k} = \bar f_{F,k}^2 / \bar\sigma_k^2$ is the average SNR in block $B_k$ and $\xi_F[m] = |f_F[m]|^2 / \sigma^2[m]$ is the SNR of the coefficient corresponding to the index $m$. Equation (15) indicates that $R_{bo}$ is close to $R_o$ if both the noise
and the signal coefficients have little variation in each block. Consequently the risk of the block thresholding estimator is reduced by choosing the blocks so that in each block $B_k$ either (i) $f_F[m]$ and $\sigma^2[m]$ vary little; or (ii) $\xi_{F,k} \gg 1$, $\xi_F[m] \gg 1$ and $\sigma^2[m]$ varies little; or (iii) $\xi_{F,k} \ll 1$, $\xi_F[m] \ll 1$ and $f_F[m]$ varies little.

Cai and Silverman block thresholding operators [3, 4] use the James-Stein shrinkage rule [22]. We cannot compute the original signal energy in the block, but we can calculate the noisy signal energy
\[ \bar y_{F,k}^2 = \frac{1}{B_k^\#} \sum_{m \in B_k} |y_F[m]|^2 \]
and observe that
\[ E\{\bar y_{F,k}^2\} = \bar f_{F,k}^2 + \bar\sigma_k^2 . \tag{16} \]
The James-Stein shrinkage rule [22] is similar to the oracle formula (13), where $\bar f_{F,k}^2 + \bar\sigma_k^2$ is replaced by $\bar y_{F,k}^2$:
\[ a_k = \left( 1 - \frac{\lambda\, \bar\sigma_k^2}{\bar y_{F,k}^2} \right)_+ , \tag{17} \]
with a thresholding parameter $\lambda \ge 1$. For blocks of size 1, if $\lambda = 1$ then this shrinkage rule corresponds to the empirical diagonal Wiener estimator defined in (6).

If the noise $\epsilon$ is a Gaussian white noise then, as in the case of diagonal thresholding estimators, the resulting risk $E\{\|\hat f - f\|^2\}$ can be shown to be close to the oracle risk (14). The average noise energy over a block $B_k$,
\[ \bar\epsilon_{F,k}^2 = \frac{1}{B_k^\#} \sum_{m \in B_k} |\epsilon_F[m]|^2 , \tag{18} \]
has a $\chi^2_{B_k^\#}$ distribution with $B_k^\#$ degrees of freedom because each noise coefficient $\epsilon_F[m]$ is a Gaussian random variable of variance $\sigma^2$. If all blocks $B_k$ have the same size $B^\#$, then Cai [3] proved that
\[ R_{bo} \le E\{\|\hat f - f\|^2\} \le 2\lambda R_{bo} + 4 N \sigma^2\, \mathrm{Prob}\{\bar\epsilon_F^2 > \lambda \sigma^2\} , \tag{19} \]
where $\mathrm{Prob}\{\cdot\}$ is the probability measure and $\bar\epsilon_F^2$ is the average noise energy over a block of size $B^\#$. The second term $4 N \sigma^2\, \mathrm{Prob}\{\bar\epsilon_F^2 > \lambda \sigma^2\}$ in the risk upper bound (19) is a variance term corresponding to the probability of keeping pure noise coefficients, i.e., $f$ is zero ($y = \epsilon$) and $a_k \ne 0$ (cf. (17)). $\mathrm{Prob}\{\bar\epsilon_F^2 > \lambda \sigma^2\}$ is the probability of keeping a residual noise. The oracle risk and the variance terms in (19) are competing.
When $\lambda$ increases, the first term increases and the variance term decreases. Similarly, when the block size $B_k^\#$ increases, the oracle risk $R_{bo}$ increases whereas the variance decreases. Adjusting $\lambda$ and the block sizes $B_k^\#$ can be interpreted as an optimization between the bias and the variance of our block thresholding estimator. The parameters $\lambda$ and $B_k^\#$ are set by adjusting the residual noise probability
\[ \mathrm{Prob}\{\bar\epsilon_F^2 > \lambda \sigma^2\} = \delta , \tag{20} \]
where $\delta$ is the residual noise probability that one tolerates.
Cai [3] shows that choosing $B^\# = \log_e N$ and $\lambda = 4.505$ yields the following block oracle inequality from (19):
\[ R_{ba} \le 2\lambda R_{bo} + 2\sigma^2 , \tag{21} \]
where $R_{ba}$ denotes the block thresholding risk $E\{\|\hat f - f\|^2\}$.

A tight frame is similar to a union of several orthonormal bases, and the risk of a block thresholding estimator in a tight frame behaves similarly to the sum of the risks in several orthonormal bases. However, even if the noise is Gaussian and white, because of the redundancy between frame vectors, the average noise energy $\bar\epsilon_F^2$ over a block of size $B^\#$ no longer follows a $\chi^2_{B^\#}$ distribution.

3.2 Block Thresholding in Short-Time Fourier Frames

The time-frequency block thresholding can be applied directly with short-time Fourier frames. The choice of parameters is discussed below.

Choice of Block

We group time-frequency contiguous short-time Fourier coefficients in disjoint rectangular blocks. The block size is $B_k^\# = L_k \times W_k$, where $L_k$ and $W_k$ are respectively the block length in time and the block width in frequency. For simplicity, dyadic lengths $L_k = 8, 4, 2$ and widths $W_k = 16, 8, 4, 2, 1$ will be used (the unit being the time-frequency index in the spectrogram). In this section, a fixed block length and width are assigned to all the blocks, i.e., $L_k = L$, $W_k = W$ and $B_k^\# = B^\# = L \times W$, $\forall k$.

Choice of Thresholding Level λ

Given a choice of block size and the residual noise probability level $\delta$ that one tolerates, the thresholding level $\lambda$ is defined by (20). For each block width and length, $\lambda$ is estimated using a Monte Carlo simulation of $\bar\epsilon_F^2$. Table 1 shows the resulting $\lambda$ with $\delta = .1\%$. Note that for a block width $W > 1$, blocks that contain the same number of coefficients $B^\# = L \times W$ have close $\lambda$ values.

Table 1: Thresholding level λ calculated with different block sizes $B^\# = L \times W$ (columns W = 16, 8, 4, 2, 1; rows L = 8, 4, 2) and with $\delta = .1\%$.
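The Monte Carlo calibration of $\lambda$ from (20) can be sketched as follows. This is a simplified illustration, not the procedure used for Table 1: it draws independent real Gaussian coefficients of unit variance (the orthonormal-basis case), whereas the coefficients of a short-time Fourier frame are complex and correlated, so the values it returns are not expected to reproduce the table. The function name and its parameters are hypothetical.

```python
import random

def estimate_lambda(block_size, delta, trials=20000, seed=0):
    """Empirical lambda such that Prob{mean block noise energy > lambda} ~ delta.

    Assumes independent real Gaussian noise coefficients with sigma = 1.
    """
    rng = random.Random(seed)
    energies = []
    for _ in range(trials):
        total = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(block_size))
        energies.append(total / block_size)  # average noise energy of one block
    energies.sort()
    # (1 - delta) empirical quantile of the average block energy.
    index = min(int((1.0 - delta) * trials), trials - 1)
    return energies[index]

lam_small_block = estimate_lambda(block_size=1, delta=0.01)
lam_large_block = estimate_lambda(block_size=16, delta=0.01)
```

Larger blocks average more noise coefficients, so the same residual noise probability is reached with a smaller $\lambda$. A small $\delta$ such as the one used in the text requires many more trials than this example to estimate the quantile reliably.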
3.3 Block Thresholding and Ephraim and Malah

In the Ephraim and Malah methods [12, 13, 6] and their variants [7, 25], two factors contribute essentially to the elimination of musical noise: the recursive decision-directed a priori SNR estimator, which induces a temporal regularization in the estimator, and the suppression rules, which retain a uniform noise that efficiently masks the musical noise in denoised signals. We discuss a connection between the block thresholding estimation and the decision-directed a priori SNR
estimator. The masking noise technique is incorporated in the block thresholding estimator.

Ephraim and Malah Methods

Estimating the a priori SNR $\xi[l, r] = |f_F[l, r]|^2 / \sigma^2[l, r]$ is an important step of most noise suppression rules. In their milestone paper [12], Ephraim and Malah proposed a decision-directed estimator of the a priori SNR with a recursive procedure
\[ \hat\xi[l, r] = \alpha\, \frac{|\hat f_F[l-1, r]|^2}{\sigma^2[l-1, r]} + (1 - \alpha) \left( \frac{|y_F[l, r]|^2}{\sigma^2[l, r]} - 1 \right)_+ , \tag{22} \]
where $\alpha \in [0, 1]$ is a weighting parameter. In the first term, $\hat f_F[l-1, r]$ is the previously computed estimate of $f_F[l-1, r]$. The second term is a maximum likelihood estimate of the SNR of the current coefficient. The decision-directed SNR estimator is recursive and induces a temporal regularization on $\hat\xi[l, r]$ with a causal, exponentially decreasing smoothing window.

Based on an independent Gaussian distribution assumption for the signal coefficients $f_F[l, r]$, Ephraim and Malah proposed the noise suppression rule
\[ \hat f_F[l, r] = a[l, r]\, y_F[l, r] \tag{23} \]
with
\[ a[l, r] = \frac{\sqrt{\pi}}{2}\, \frac{\sqrt{v[l, r]}}{\gamma[l, r]} \exp\!\left( -\frac{v[l, r]}{2} \right) \left[ (1 + v[l, r])\, I_0\!\left( \frac{v[l, r]}{2} \right) + v[l, r]\, I_1\!\left( \frac{v[l, r]}{2} \right) \right] , \tag{24} \]
where $\gamma[l, r] = |y_F[l, r]|^2 / \sigma^2[l, r]$ is called the a posteriori SNR of $f_F[l, r]$, $v[l, r]$ is defined by $v[l, r] = \frac{\xi[l, r]}{\xi[l, r] + 1}\, \gamma[l, r]$, and $I_0(\cdot)$ and $I_1(\cdot)$ denote respectively the modified Bessel functions of zero and first order.

Fig. 2(b) shows the value of $a[l, r]$ as a function of $\xi[l, r]$ in dB for different values of $\gamma[l, r]$. Note that the curve corresponding to $\gamma - 1 = \xi$ is close to the average case, since $E\{\gamma\} - 1 = \xi$. The Ephraim and Malah suppression rule, compared with block thresholding in Fig. 2(a), performs a less severe attenuation when the a priori SNR $\xi[l, r]$ is very small; moreover, the attenuation decreases when the a posteriori SNR $\gamma[l, r]$ increases. As a result, the Ephraim and Malah suppression rule is able to retain some residual masking noise.
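The recursion (22) can be written in a few lines for a single frequency channel. In this sketch the noise is assumed stationary, so one variance is shared by all frames, and the estimate before the first frame is taken to be zero. To stay within the standard library, the example plugs in the Wiener gain $\xi/(1+\xi)$ as a stand-in for the suppression rule (24), which would require modified Bessel functions; all names are hypothetical.

```python
def decision_directed_snr(noisy_energy, noise_var, gain, alpha=0.98):
    """A priori SNR estimates (22) along time for one frequency channel.

    noisy_energy[l] holds |y_F[l, r]|^2, noise_var is sigma^2 (stationary noise),
    and gain maps an a priori SNR to an attenuation factor a[l, r].
    """
    xi_hat = []
    prev_clean_energy = 0.0  # assumption: no signal estimate before the first frame
    for y2 in noisy_energy:
        ml_term = max(y2 / noise_var - 1.0, 0.0)  # maximum likelihood SNR term
        xi = alpha * prev_clean_energy / noise_var + (1.0 - alpha) * ml_term
        xi_hat.append(xi)
        prev_clean_energy = gain(xi) ** 2 * y2  # |a * y|^2, reused at the next frame
    return xi_hat

def wiener_gain(xi):
    """Stand-in suppression rule (not rule (24)): Wiener gain xi / (1 + xi)."""
    return xi / (1.0 + xi)

xi_track = decision_directed_snr([1.0, 1.0, 101.0, 101.0], 1.0, wiener_gain, alpha=0.5)
```

When the noisy energy jumps, the estimate follows only partially at first and keeps rising at the next frame. This lag is the temporal regularization described above, and it becomes longer as `alpha` approaches 1.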
Block Thresholding

A block thresholding estimation (17) also depends upon an estimated a priori SNR calculated on each block:
\[ a_k = \left( 1 - \frac{\lambda\, \bar\sigma_k^2}{\bar y_{F,k}^2} \right)_+ = \left( 1 - \frac{\lambda}{\hat\xi_k + 1} \right)_+ , \tag{25} \]
where
\[ \hat\xi_k = \frac{\bar y_{F,k}^2}{\bar\sigma_k^2} - 1 \tag{26} \]
is an unbiased estimate of the a priori SNR $\xi[l, r]$ computed by averaging the coefficient energy in a block.
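The block rule (25) and (26) translates directly into code. The sketch below is an illustration with hypothetical names; `noise_var` stands for the average noise energy $\bar\sigma_k^2$ of the block, and the optional `floor` argument corresponds to the masking-noise floor of (27), with the default 0 giving back the positive part of (25).

```python
def block_attenuation(block_coeffs, noise_var, lam, floor=0.0):
    """Attenuation factor shared by all coefficients of one block.

    Implements (25)-(26); a positive floor gives the variant (27).
    """
    mean_energy = sum(abs(c) ** 2 for c in block_coeffs) / len(block_coeffs)
    if mean_energy == 0.0:
        return floor
    xi_hat = mean_energy / noise_var - 1.0        # block a priori SNR estimate (26)
    return max(1.0 - lam / (xi_hat + 1.0), floor)  # rule (25), floored as in (27)

strong_block = block_attenuation([3, 3j, -3, 3], noise_var=1.0, lam=1.5)
noise_block = block_attenuation([1, -1, 1, -1], noise_var=1.0, lam=1.5)
```

Every coefficient of the block is then multiplied by the returned factor, so an isolated large coefficient surrounded by weak ones is attenuated together with its neighbors instead of surviving alone.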
To retain a low-amplitude masking noise, a non-zero attenuation floor value is kept by modifying (25):
\[ a_k = \max\!\left( 1 - \frac{\lambda\, \bar\sigma_k^2}{\bar y_{F,k}^2},\; a \right) = \max\!\left( 1 - \frac{\lambda}{\hat\xi_k + 1},\; a \right) , \tag{27} \]
where $0 < a \le 1$ is a masking noise attenuation factor. The experiments show that with $a$ around .5, the small residual noise completely masks the remaining very weak musical noise. Fig. 2(a) plots the attenuation factor (27) of the block thresholding as a function of $\hat\xi_k$ for different $\lambda$ and $a$. Note that the curve with $\lambda = 1$ corresponds to the attenuation with oracle. The block thresholding applies a stronger attenuation than the Ephraim and Malah suppression rule when the a priori SNR is weak. This explains why the block thresholding is better at eliminating the noise (if $a$ is small) than the Ephraim and Malah suppression rule.

Figure 2: Attenuation factor (gain, in dB) versus a priori SNR ξ (in dB). (a) Block thresholding (27) for different thresholding parameters λ and masking noise attenuation factors a. (b) Ephraim and Malah suppression rule (24) for different a posteriori SNR γ.

3.4 Experiments and Results

The experiments presented below have been performed on various types of signals: Piano is a simple example that contains a single clear clavier stroke; Mozart and Centuria are musical excerpts that contain respectively quick notes played by a solo oboe and by some drums; Tête is a speech signal (in French). Centuria is sampled at 44 kHz and all the other signals are sampled at 11 kHz. They were corrupted by white Gaussian noise of different amplitudes. Short-time Fourier transforms with half-overlapping windows were used in the experiments. These windows are square roots of Hanning windows of size 5 ms for Piano and Mozart, 3 ms for Centuria and 2 ms for Tête.¹

¹ The audio denoising examples are available online at ?????.
3.4.1 Performance Comparison

Table 2 compares the performance in terms of SNR for block thresholding (block lengths and widths are discussed in the next section), the Ephraim and Malah suppression rule equipped with the decision-directed SNR estimator [12], and hard thresholding. Two levels of noise removal have been used for the block thresholding and the Ephraim and Malah method. For the partial noise removal level (P), both methods were calibrated to retain a residual noise of similar energy: we chose $a \approx .5$ in (27) for block thresholding and $\alpha \approx .98$ in (22) for the Ephraim and Malah method. To achieve the maximum noise removal level (M), we chose $a = 0$ and $\alpha \approx .999$. For hard thresholding, the threshold was set equal to $3\sigma$, where $\sigma^2$ is the noise variance.

Table 2: Performance comparison. Top: Mozart with different SNR. Bottom: Piano, Centuria and Tête with 1 dB SNR. From left to right: hard thresholding, block thresholding (with partial (P) and maximum (M) noise removal), Ephraim and Malah suppression rule equipped with the decision-directed SNR estimator (with partial (P) and maximum (M) noise removal levels).

With the partial noise removal level (P), the residual noise masks the musical noise in both methods; however, block thresholding introduces less signal distortion, as reflected by the systematic 2 dB SNR improvement. With the maximum noise removal level (M), the musical noise cannot be masked by the residual noise since there is nearly no residual noise left. Whereas block thresholding hardly produces any musical noise, the Ephraim and Malah method results in noticeable musical noise, especially when the SNR of the noisy signal is small (Mozart at 0 dB and 5 dB).
Note that the Ephraim and Malah method sometimes produces a resonance artifact, as if the sound were coming from far away. Such artifacts are especially strong for speech signals when $\alpha$ in the decision-directed SNR estimator (22) is close to 1, which leads to a temporal window that decreases very slowly. Block thresholding does not create such artifacts. Table 2 shows that a hard thresholding produces a smaller SNR than block thresholding (for both levels (P) and (M)). It also produces a very strong musical noise. Fig. 3 displays
the different attenuation coefficient maps for the Tête signal. It shows that the block thresholding coefficients (Fig. 3(c)) are closer to the oracle coefficients (Fig. 3(f)) than the hard thresholding coefficients (Fig. 3(b)). Moreover, the block thresholding coefficient map is much more regular than the hard thresholding one. This gives a visual confirmation that block thresholding produces less signal distortion than hard thresholding.

Note that the block thresholding scheme can also be implemented with half-overlapping blocks to further regularize the estimator. This is equivalent to computing 4 block thresholding estimators with blocks shifted by $L/2$ in time and/or by $W/2$ in frequency and then averaging the 4 signal estimations. It leads to a .2 dB SNR improvement over the standard block thresholding with non-overlapping blocks, which is not much given the significant increase in computational complexity.

Figure 3: (a) Log-spectrogram of noisy Tête. Attenuation coefficients of hard thresholding in (b), block thresholding in (c), adaptive block thresholding in (d), adaptive block thresholding with the empirical Wiener shrinkage as a post-processing in (e) and attenuation with oracle in (f). Values of attenuation coefficients from 1 (black) to 0 (white).
3.4.2 Block Sizes in Block Thresholding

The block thresholding results presented in Table 2 are obtained with optimal block sizes that maximize the SNR among block lengths $L = 8, 4, 2$ in time and block widths $W = 16, 8, 4, 2, 1$ in frequency. Optimal block sizes are respectively $(L, W) = (4, 1)$ for Piano, $(L, W) = (8, 1)$ for Mozart, $(L, W) = (8, 16)$ for Centuria and $(L, W) = (4, 8)$ for Tête. Since the noise is white and thus uniform in time and frequency, (15) shows that the optimal block size and shape depend upon the time-frequency spread of the signal components. Within the block size family previously mentioned, there is a difference of more than 2 dB SNR between the best and worst block sizes.

Block sizes could also be adapted to different signal parts. Fig. 4 zooms on the onset of the Mozart signal whose log-spectrogram is illustrated in Fig. 1(b). As shown in Figs. 4(a) and (b), at the beginning of the harmonics, blocks of large attenuation factors spread beyond the onset of the signal. Fig. 4(b') illustrates the horizontal blocks at the onsets marked in Figs. 4(a) and (b). This produces a pre-echo artifact² in the denoised signal. In the time interval where the blocks exceed the signal onset, little attenuation is performed and the noise is not eliminated; consequently a sound is heard before the very beginning of the original signal. A smaller block size would reduce this time interval and thus reduce this pre-echo artifact.

Figure 4: Zoom on the onset of Mozart. (a) Log-spectrogram. Attenuation coefficients of block thresholding in (b) and adaptive block thresholding in (c). Values of attenuation coefficients from 1 (black) to 0 (white). (b') and (c') illustrate respectively the block partition with block thresholding and adaptive thresholding at the onset marked in (b) and (c).
4 Adaptive Block Thresholding

An adaptive block thresholding adapts block sizes to the time-frequency signal properties by minimizing an estimate of the risk. Appropriate block sizes reduce pre-echo artifacts (as described in Section 3.4.2) and improve the SNR.

² We call this artifact pre-echo though, originally, pre-echo corresponds to a psychoacoustic phenomenon where an unusually noticeable artifact is heard in a sound recording from the energy of time domain transients smeared backwards in time after processing in the frequency domain due to the Gibbs phenomenon.
4.1 SURE of Block Thresholding Estimator

The best choice of block sizes minimizes the estimation risk $E\{\|\hat f - f\|^2\}$. This risk cannot be calculated since $f$ is unknown, but it can be estimated with a Stein Unbiased Risk Estimate (SURE) [23]. The best block sizes are computed by minimizing this estimated risk.

SURE is an estimate of the risk of an arbitrary estimator $\hat Y$ of the mean value vector $Y$ of a multivariate normal random vector $X$ having an identity covariance matrix. Since it is unbiased, $E\{\mathrm{SURE}\} = E\|\hat Y - Y\|^2$.

Theorem (Stein Unbiased Risk Estimate, SURE). Let $X = (x_1, \ldots, x_p)$ be a multivariate normal random vector of dimension $p$ with mean $Y$ and an identity covariance matrix. Let $X + h(X)$ be an estimate of $Y$, where $h = (h_1, \ldots, h_p) : \mathbb{R}^p \to \mathbb{R}^p$ is almost differentiable ($h_i : \mathbb{R}^p \to \mathbb{R}$, $\forall i$). Define $\nabla \cdot h = \sum_{i=1}^{p} \frac{\partial}{\partial x_i} h_i$. If $E\left\{ \sum_{i=1}^{p} \left| \frac{\partial}{\partial x_i} h_i(X) \right| \right\} < \infty$, then
\[ E\|X + h(X) - Y\|^2 = p + E\left\{ \|h(X)\|^2 + 2\, \nabla \cdot h(X) \right\} . \tag{28} \]
So
\[ \mathrm{SURE} := p + \|h(X)\|^2 + 2\, \nabla \cdot h(X) \tag{29} \]
is an unbiased estimate of the risk of $X + h(X)$, called the Stein Unbiased Risk Estimate (SURE) [23].

The proof of (28) is essentially based on the fact that $\phi'(y) = -y\,\phi(y)$, where $\phi(y)$ is the standard normal density [23].

Following the approach of Cai [3, 5], one can apply the SURE estimator to compute the risk of a block thresholding estimator. The Gaussian noise coefficients are uncorrelated and hence independent. Let us normalize the observed data, $z_F[m] = y_F[m] / \sigma[m]$, $\forall m \in \Gamma$, so that the normalized noise has an identity covariance matrix. Applying the SURE to the block thresholding estimator (17) on a block $B_k$ of size $p = B_k^\#$, one has
\[ h_m(X) = -\frac{\lambda}{\bar z_{F,k}^2}\, z_F[m]\, 1_{\{\bar z_{F,k}^2 > \lambda\}} - z_F[m]\, 1_{\{\bar z_{F,k}^2 \le \lambda\}}, \quad \forall m \in B_k, \tag{30} \]
where $\bar z_{F,k}^2 = \frac{1}{B_k^\#} \sum_{m \in B_k} |z_F[m]|^2$. Applying (29), one gets $\mathrm{SURE}_{B_k}$ for a block thresholding estimator:
\[ \mathrm{SURE}_{B_k} = B_k^\# + \frac{\lambda^2 B_k^\# - 2\lambda (B_k^\# - 2)}{\bar z_{F,k}^2}\, 1_{\{\bar z_{F,k}^2 > \lambda\}} + B_k^\# (\bar z_{F,k}^2 - 2)\, 1_{\{\bar z_{F,k}^2 \le \lambda\}} . \tag{31} \]
Since SURE is unbiased, $E\{\mathrm{SURE}_{B_k}\} = E\left\{ \sum_{m \in B_k} |f_F[m] - \hat f_F[m]|^2 \right\}$.
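Formula (31) is cheap to evaluate because it only needs the block size and the average energy of the normalized coefficients. The sketch below assumes real-valued coefficients with unit noise variance, as in the Stein theorem; applying it to complex short-time Fourier coefficients is a further approximation. The function name is hypothetical.

```python
def block_sure(z_block, lam):
    """SURE (31) of block thresholding for one block of normalised coefficients.

    z_block holds real values z = y / sigma, so the noise has unit variance.
    """
    size = len(z_block)
    mean_energy = sum(z * z for z in z_block) / size
    if mean_energy > lam:
        # The block is kept and shrunk by the factor 1 - lam / mean_energy.
        return size + (lam ** 2 * size - 2.0 * lam * (size - 2)) / mean_energy
    # The block is set to zero: the risk estimate is the block energy minus the noise.
    return size + size * (mean_energy - 2.0)
```

For a block that is set to zero the expression reduces to $B_k^\#(\bar z_{F,k}^2 - 1)$, the observed energy minus the expected noise energy, which is an unbiased estimate of the signal energy lost by zeroing the block.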
When the noise is Gaussian and white, its coefficients in an orthogonal basis are independent. For a tight frame this hypothesis is not valid, but (31) still applies approximately because a tight frame behaves similarly to a union of orthogonal bases.

One can verify that the variance of $\frac{1}{B_k^\#} \mathrm{SURE}_{B_k}$ is approximately proportional to $\frac{1}{B_k^\#}$. When the blocks are small, it is necessary to reduce this variance by averaging over several blocks $B_k$ inside a macroblock $M$: $\mathrm{SURE}_M = \sum_{B_k \subset M} \mathrm{SURE}_{B_k}$. Let $M^\#$ be the number of coefficients in all the blocks included in $M$; then $\frac{1}{M^\#} \mathrm{SURE}_M$ has a variance proportional to $\frac{1}{M^\#}$.
The adaptive block thresholding groups coefficients in blocks whose sizes are adjusted to minimize SURE, and it attenuates the coefficients in those blocks. The blocks $B_k$ are sets of coefficients that are not necessarily connected or rectangular. In the following, by block size we mean a choice of block shape and size among a collection of possibilities.

In this adaptive grouping procedure, neighboring coefficients $y_F[m]$ are grouped in disjoint macroblocks $M_j$, $j = 1, 2, \ldots, J$. A macroblock $M_j$ can be segmented in blocks $B_k$ of the same size $B^\#(j)$. Several such segmentations are possible, and we want to choose the one that leads to the smallest risk estimated with SURE. The optimal block size $B^\#(j)$ for the blocks $B_k$ in $M_j$ is calculated by minimizing the SURE in $M_j$, i.e.,
\[ B^\#(j) = \arg\min_{B^\#} \mathrm{SURE}_{M_j} = \arg\min_{B^\#} \sum_{B_k \subset M_j} \mathrm{SURE}_{B_k}, \quad j = 1, 2, \ldots, J. \tag{32} \]
To reduce its variance, SURE is calculated over blocks of identical size imposed in each macroblock. The macroblock size should not be too large, in order to maintain enough adaptivity in the size evolution of blocks. Once the block sizes are computed, the coefficients in each $B_k$ are attenuated with (17), where $\lambda$ is calculated with (20).

4.2 Adaptive Block Thresholding in Short-Time Fourier Frames

The time-frequency adaptive block thresholding is applied directly to short-time Fourier frames. In numerical experiments each macroblock is segmented with 15 possible block sizes $B^\# = L \times W$, with a combination of block length $L = 8, 4, 2$ and block width $W = 16, 8, 4, 2, 1$. The thresholding parameter $\lambda$ is calculated with (20). The size of the macroblocks is set equal to the maximum block size $B^\#_{\max} = 8 \times 16$. Fig. 5 illustrates different segmentations of these macroblocks into time-frequency blocks of the same size.
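The selection rule (32) can be illustrated on one macroblock of noise-normalized coefficients. The sketch below is a simplified illustration, not the authors' code: it assumes real-valued coefficients stored as a list of rows (time) by columns (frequency), candidate sizes that divide the macroblock shape, and a table `lam_for` of thresholding levels whose values here are placeholders rather than the calibrated values of Table 1. All names are hypothetical.

```python
def block_sure(z_block, lam):
    """SURE (31) for one block of real, noise-normalised coefficients."""
    size = len(z_block)
    mean_energy = sum(z * z for z in z_block) / size
    if mean_energy > lam:
        return size + (lam ** 2 * size - 2.0 * lam * (size - 2)) / mean_energy
    return size + size * (mean_energy - 2.0)

def best_block_size(macroblock, candidates, lam_for):
    """Pick the (L, W) block size minimising the summed SURE over a macroblock (32)."""
    n_time, n_freq = len(macroblock), len(macroblock[0])
    best, best_sure = None, float("inf")
    for (L, W) in candidates:
        total = 0.0
        for t0 in range(0, n_time, L):
            for f0 in range(0, n_freq, W):
                block = [macroblock[t][f]
                         for t in range(t0, t0 + L)
                         for f in range(f0, f0 + W)]
                total += block_sure(block, lam_for[(L, W)])
        if total < best_sure:
            best, best_sure = (L, W), total
    return best, best_sure

# Toy macroblock: one strong stationary harmonic at frequency index 5.
macroblock = [[10.0 if f == 5 else 0.5 for f in range(16)] for _ in range(8)]
lam_for = {(8, 1): 2.5, (2, 16): 1.8, (8, 16): 1.5}  # placeholder thresholds
choice, sure_value = best_block_size(macroblock, list(lam_for), lam_for)
```

On this toy macroblock the narrow horizontal blocks of size (8, 1) give the smallest estimated risk, in line with the behavior reported for the harmonic part of Mozart: long, narrow blocks isolate a stationary harmonic without mixing it with neighboring noise-only frequencies.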
Figure 5: Partition of macroblocks into blocks of different sizes.

Experiments have been performed on the same audio signals as in Subsection 3.4, with 1 dB SNR, with the same short-time Fourier frames and with the maximum noise removal level (M), i.e., with a = in (27). The first two columns of Table 3 compare the SNR obtained with the adaptive block thresholding and with the block thresholding using an optimal fixed block size obtained with an oracle. For three out of the four signals, the adaptive block thresholding improves the SNR relative to the optimal fixed-size block thresholding. With Piano the SNR improvement is as high as 0.5 dB. With Mozart, the result is the second best among the 15 block size candidates and 0.25 dB below the result obtained with the optimal block size.

As shown in Figs. 4(c)(c'), compared with Figs. 4(b)(b'), in the first part of Mozart the adaptive block method chooses blocks of shorter length L that hardly exceed the onset of the signal. This considerably reduces the pre-echo artifact discussed earlier. After the onset, the adaptive block method chooses narrow horizontal blocks, of the same width as the non-adaptive method, that are able to capture the harmonic structure of the signal.

Table 3: Performance comparison between the block thresholding with the optimal fixed block size, the adaptive block thresholding and the adaptive block thresholding with the empirical Wiener shrinkage as a post-processing. Columns: Block Thresholding with Optimal Fixed Size; Adaptive Block Thresholding; Adaptive Block Thresholding with Empirical Wiener Shrinkage as Post-processing. Rows: Piano, Mozart, Centuria, Tête.

5 Post-processing: Empirical Wiener Shrinkage

As a post-processing step, an empirical Wiener shrinkage [14] is cascaded after the adaptive block thresholding. It allows more flexible and accurate attenuation decisions while inheriting the time-frequency regularization of the estimate from the adaptive block thresholding. The basic idea is to use the denoised signal as if it were the clean signal. Let $\hat f$ denote the denoised signal obtained by the adaptive block thresholding algorithm, and let $\hat f_F[m] = \langle \hat f, g_m \rangle$. An empirical Wiener shrinkage is a diagonal thresholding with attenuation coefficients defined as in (4):

$$a[m] = \frac{|\hat f_F[m]|^2}{|\hat f_F[m]|^2 + \sigma^2}. \qquad (33)$$

Table 3 shows that the empirical Wiener shrinkage used as a post-processing brings an improvement of 0.25 dB SNR on average, and 0.5 dB on Mozart. The audio improvement due to the post-processing includes less distortion of the underlying signals and further removal of the musical noise.
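The attenuation rule (33) is simple enough to sketch in a few lines of Python. This is an illustration only: the function names are hypothetical, `sigma2` is assumed to be a known positive noise variance common to all coefficients, and the gain is assumed to multiply the noisy coefficients, as in a diagonal estimator.

```python
import numpy as np


def empirical_wiener_gain(denoised_coeffs, sigma2):
    """Attenuation a[m] = |f_hat[m]|^2 / (|f_hat[m]|^2 + sigma2), as in (33)."""
    power = np.abs(denoised_coeffs) ** 2
    return power / (power + sigma2)


def empirical_wiener_shrinkage(noisy_coeffs, denoised_coeffs, sigma2):
    """Apply the gain to the noisy coefficients (assumed diagonal estimator)."""
    return empirical_wiener_gain(denoised_coeffs, sigma2) * noisy_coeffs


# Toy example with three coefficients and noise variance 1.
denoised = np.array([0.0, 1.0, 3.0j])   # output of a first denoising pass
noisy = np.array([2.0, 2.0, 2.0])       # original noisy coefficients
gain = empirical_wiener_gain(denoised, sigma2=1.0)
shrunk = empirical_wiener_shrinkage(noisy, denoised, sigma2=1.0)
print(gain, shrunk)
```

Coefficients that the first pass set to zero stay at zero, while coefficients well above the noise level are left almost untouched, so the post-processing keeps the time-frequency regularity of the first estimate.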
Fig. 3(e) displays the attenuation coefficients map of the empirical Wiener shrinkage. It maintains the same time-frequency regularity as the adaptive block thresholding (Fig. 3(d)), and its coefficients are closer to the oracle coefficients (Fig. 3(f)).

6 Conclusion

A diagonal thresholding of spectrogram coefficients is unsuitable for audio signal denoising because it produces too much musical noise. This paper describes a time-frequency block thresholding which produces hardly any musical noise and improves the SNR relative to state-of-the-art methods such as Ephraim and Malah estimators. A block thresholding groups time-frequency signal coefficients in blocks and then attenuates the coefficients in each block. This block grouping regularizes the estimation and contributes to the elimination of the musical noise. The block size can also be adapted to the signal properties by minimizing a SURE estimator of the block thresholding risk. For audio signals this adaptation reduces distortions such as pre-echo artifacts.

References

[1] M. Berouti, R. Schwartz, J. Makhoul, Enhancement of speech corrupted by acoustic noise, Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP, Vol. 4.
[2] S. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoust., Speech, Signal Process., ASSP-27.
[3] T. Cai, Adaptive wavelet estimation: a block thresholding and oracle inequality approach, Ann. Statist., 27.
[4] T. Cai and B.W. Silverman, Incorporating information on neighboring coefficients into wavelet estimation, Sankhya, 63, 2001.
[5] T. Cai and H. Zhou, A data-driven block thresholding approach to wavelet estimation, Technical Report, Statistics Department, University of Pennsylvania, 2005.
[6] O. Cappe, Elimination of the musical noise phenomenon with the Ephraim and Malah Noise Suppressor, IEEE Trans. Speech and Audio Processing, vol. 2, Apr.
[7] I. Cohen, Speech enhancement using a noncausal a priori SNR estimator, Signal Processing Letters, IEEE, vol. 11, Issue 9, Sept. 2004.
[8] I. Cohen, Enhancement of Speech Using Bark-Scaled Wavelet Packet Decomposition, Eurospeech, 2001, Scandinavia.
[9] R.R. Coifman, D.L. Donoho, Translation-Invariant De-Noising.
[10] I. Daubechies, A. Grossmann, Y. Meyer, Painless nonorthogonal expansions, J. Math. Phys., Vol. 27, No. 5, 1986.
[11] D. Donoho and I. Johnstone, Ideal Spatial Adaptation via Wavelet Shrinkage, Biometrika, vol. 81.
[12] Y. Ephraim, D. Malah, Speech enhancement using a minimum mean square error short-time spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process., 32 (6), Dec.
[13] Y. Ephraim and D. Malah, Speech enhancement using a minimum mean square error log-spectral amplitude estimator, IEEE Trans. on Acoust., Speech, Signal Processing, vol. ASSP-33, Apr.
[14] S. Ghael, A. Sayeed and R. Baraniuk, Improved wavelet denoising via empirical Wiener filtering, Proceedings of SPIE, Mathematical Imaging, San Diego, July.
[15] P. Hall, G. Kerkyacharian and D. Picard, A note on the wavelet oracle, Statistics and Probability Letters, 43.
[16] P. Hall, G. Kerkyacharian and D. Picard, Block threshold rules for curve estimation using kernel and wavelet methods, Ann. Statist., 26.
[17] P. Hall, G. Kerkyacharian and D. Picard, On the minimax optimality of block thresholded wavelet estimators, Statistica Sinica, 9, 33-5.
[18] J.S. Lim and A.V. Oppenheim, Enhancement and bandwidth compression of noisy speech, Proc. of the IEEE, vol. 67, Dec.
[19] S. Mallat, A Wavelet Tour of Signal Processing, 2nd edition, New York Academic.
[20] R.J. McAulay and M.L. Malpass, Speech enhancement using soft decision noise suppression filter, IEEE Trans. Acoust., Speech, Signal Process., ASSP-28, 1980.
[21] H. Sheikhzadeh and H. R. Abutalebi, An improved wavelet-based speech enhancement system, EUROSPEECH, 2001.
[22] C. Stein and W. James, Estimation with quadratic loss, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1 (Berkeley, University of California Press).
[23] C. Stein, Estimation of the mean of a multivariate normal distribution, Ann. Statist., 198.
[24] J. S. Walker, Denoising Gabor Transforms, submitted.
[25] P. J. Wolfe and S. J. Godsill, Simple alternatives to the Ephraim and Malah suppression rule for speech enhancement, IEEE Workshop on Statistical Signal Processing, Aug. 2001.
[26] G. Yu, E. Bacry and S. Mallat, Audio Signal Denoising with Complex Wavelets and Adaptive block attenuation, to appear in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hawaii, 2007.
More informationPARAMETRIC coding has proven to be very effective
966 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 High-Resolution Spherical Quantization of Sinusoidal Parameters Pim Korten, Jesper Jensen, and Richard Heusdens
More informationSparse & Redundant Signal Representation, and its Role in Image Processing
Sparse & Redundant Signal Representation, and its Role in Michael Elad The CS Department The Technion Israel Institute of technology Haifa 3000, Israel Wave 006 Wavelet and Applications Ecole Polytechnique
More informationA Lower Bound Theorem. Lin Hu.
American J. of Mathematics and Sciences Vol. 3, No -1,(January 014) Copyright Mind Reader Publications ISSN No: 50-310 A Lower Bound Theorem Department of Applied Mathematics, Beijing University of Technology,
More informationMultivariate Bayes Wavelet Shrinkage and Applications
Journal of Applied Statistics Vol. 32, No. 5, 529 542, July 2005 Multivariate Bayes Wavelet Shrinkage and Applications GABRIEL HUERTA Department of Mathematics and Statistics, University of New Mexico
More informationWavelet Shrinkage for Nonequispaced Samples
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Wavelet Shrinkage for Nonequispaced Samples T. Tony Cai University of Pennsylvania Lawrence D. Brown University
More informationMULTI-SCALE IMAGE DENOISING BASED ON GOODNESS OF FIT (GOF) TESTS
MULTI-SCALE IMAGE DENOISING BASED ON GOODNESS OF FIT (GOF) TESTS Naveed ur Rehman 1, Khuram Naveed 1, Shoaib Ehsan 2, Klaus McDonald-Maier 2 1 Department of Electrical Engineering, COMSATS Institute of
More informationTHE quintessential goal of statistical estimation is to
I TRANSACTIONS ON INFORMATION THORY, VOL. 45, NO. 7, NOVMBR 1999 2225 On Denoising and Best Signal Representation Hamid Krim, Senior Member, I, Dewey Tucker, Stéphane Mallat, Member, I, and David Donoho
More informationWavelet Based Image Denoising Technique
(IJACSA) International Journal of Advanced Computer Science and Applications, Wavelet Based Image Denoising Technique Sachin D Ruikar Dharmpal D Doye Department of Electronics and Telecommunication Engineering
More informationA Data-Driven Block Thresholding Approach To Wavelet Estimation
A Data-Driven Block Thresholding Approach To Wavelet Estimation T. Tony Cai 1 and Harrison H. Zhou University of Pennsylvania and Yale University Abstract A data-driven block thresholding procedure for
More information