RECTIFIED LINEAR UNIT CAN ASSIST GRIFFIN LIM PHASE RECOVERY. Kohei Yatabe, Yoshiki Masuyama and Yasuhiro Oikawa


Department of Intermedia Art and Science, Waseda University, Tokyo, Japan

ABSTRACT

Phase recovery is an essential process for reconstructing a time-domain signal from the corresponding spectrogram when its phase is contaminated or unavailable. Recently, a phase recovery method using a deep neural network (DNN) was proposed, which interested us because the inverse short-time Fourier transform (inverse STFT) was utilized within the network. This inverse STFT converts a spectrogram into its time-domain counterpart, and then an activation function, the leaky rectified linear unit (ReLU), is applied. Such a nonlinear operation in the time domain resembles the speech enhancement method called harmonic regeneration noise reduction (HRNR). In HRNR, a time-domain nonlinearity, typically ReLU, is applied to assist in enhancing the higher-order harmonics. From this point of view, one question arose in our minds: can a time-domain ReLU alone assist phase recovery? Inspired by this curious connection between the recent DNN-based phase recovery method and HRNR in speech enhancement, the ReLU-assisted Griffin-Lim algorithm is proposed in this paper to investigate the above question. Through an experiment on speech denoising with the oracle Wiener filter, some positive effect of the time-domain nonlinearity is confirmed in terms of the scores of the short-time objective intelligibility (STOI) measure.

Index Terms: Spectrogram, redundancy, consistency, time-domain nonlinearity, harmonic regeneration.

1. INTRODUCTION

An important recent trend in signal processing and speech enhancement is phase recovery of audio signals.
Many popular acoustical processing methods are formulated in the time-frequency domain, obtained through the short-time Fourier transform (STFT), where the processing is usually implemented as a procedure that modifies the amplitude at each time-frequency bin. Although spectrograms are parametrized by both amplitude and phase, as they are expressed as collections of complex numbers, phase had been ignored for several decades until pioneering works demonstrated its importance. Recently, so-called phase-aware signal processing has gained considerable attention in the community, and a number of methodologies have been proposed [1-3]. This paper focuses on its branch called phase recovery, which aims to obtain a better phase spectrogram under a given amplitude (together with noisy phase in some applications such as speech denoising).

As usual in signal processing, phase recovery methods can be categorized by the amount of imposed prior knowledge. One of the most general algorithms is the Griffin-Lim algorithm [4-6], which retrieves the phase based only on the redundancy of the time-frequency representation. In the algorithm, the phase is modified only by the linear transformations between the time and time-frequency domains (STFT and its inverse), and no assumption is made on the structure of the data. Therefore, even though the Griffin-Lim algorithm might not achieve good performance due to the insufficiency of prior knowledge, it is utilized in a wide variety of applications. On the other hand, there are several phase recovery methods based on the structure of the data. For example, the harmonic structure of speech signals has been considered in model-based phase recovery [7-10], which can obtain a better result at the price of narrowing the range of applications. Very recently, a phase recovery method based on a deep neural network (DNN) was proposed [11], following the extraordinary successes of DNNs in the last decade.
Although it might not seem to make assumptions on the data, a DNN heavily relies on extremely rich prior knowledge, automatically learned from the training dataset, when it is applied as a signal processor. One thing about the DNN-based phase recovery in [11] that interested the authors is the use of the inverse STFT to obtain a time-domain signal within the network. As a DNN is a composition of affine and nonlinear functions [12], the time-domain signal obtained by the inverse-STFT layer was fed into nonlinear functions [11]. Such nonlinearity in the time domain reminds us of a speech enhancement method called harmonic regeneration noise reduction (HRNR) [13-15], which utilizes a time-domain nonlinear function, together with the inverse STFT, to recover the harmonic structure (especially in the high-frequency range) of speech signals. A typical choice of the nonlinear function in HRNR is the half-wave rectifier [13-15], which is equivalent to the quite popular activation function called the rectified linear unit (ReLU) in the DNN literature [16, 17]. Indeed, the DNN-based phase recovery method in [11] utilized a variant of ReLU in the time domain, namely the leaky ReLU.

This curious connection between DNN-based phase recovery and HRNR suggested one possibility: a time-domain nonlinearity may by itself contribute to phase recovery, without a network. To investigate this conjecture, a combination of a time-domain nonlinearity and a phase recovery algorithm is proposed, and its performance for speech enhancement is experimentally investigated in this paper. The Griffin-Lim algorithm is chosen as the baseline method because it is the standard phase recovery algorithm without any assumption on the structure of the data. ReLU is incorporated within its procedure, after the inverse STFT as in the DNN-based method, in order to artificially generate harmonic components as in HRNR.
This modified Griffin-Lim algorithm with the time-domain ReLU is compared to that without ReLU to see the effect of the time-domain nonlinearity. An experiment on speech denoising using the oracle Wiener filter is conducted with 200 speech signals obtained from the TIMIT database, and its result supports the above conjecture.

2. PHASE RECOVERY OF SPECTROGRAM

In this section, the standard time-frequency domain representation (spectrogram) of speech signals is briefly reviewed. The ordinary Griffin-Lim algorithm is also introduced here so that the proposed modification in the subsequent section becomes apparent.

2.1. Time-frequency representation of audio signals

Let the STFT of a signal x with a window w be defined as

(F_w x)[m, n] = \sum_{l=0}^{L-1} x[l + an] \, \overline{w[l]} \, e^{-2\pi i b m l / L},   (1)

where \overline{z} is the complex conjugate of z, i = \sqrt{-1} is the imaginary unit, L is the window length, n and m are the time and frequency indices, and a and b are the time and frequency shifting steps, respectively. By denoting the inverse STFT by F_{\tilde{w}}^{\dagger}, the reconstruction formula of the STFT can be represented as x = F_{\tilde{w}}^{\dagger} F_w x, where \tilde{w} is a suitable synthesis window associated with w, i.e., the dual window [18-21] of w. For the sake of simplicity, only the Parseval tight case is considered in this paper, i.e., the window is self-dual, \tilde{w} = w (the same window can be used in both analysis and synthesis to reconstruct the signal, x = F_w^{\dagger} F_w x). The spectrogram corresponding to x will be denoted by X[m, n] (= (F_w x)[m, n]) for convenience.

2.2. Speech enhancement based on amplitude restoration

One of the most popular strategies for enhancing audio signals is filtering in the time-frequency domain. By multiplying each bin of the spectrogram X[m, n] by a scalar, the so-called time-frequency mask M[m, n], and taking the inverse STFT, F_w^{\dagger}(M \odot X), a nonstationary filter can be approximately realized, where \odot represents element-wise multiplication. Ordinarily, in acoustical applications, this bin-wise scalar M[m, n] (which may also be called a spectral gain or Gabor multiplier) is treated as a nonnegative real number; that is, only the amplitude of the spectrogram is modified. This practice stems from multiple reasons, including optimality in the sense of minimum mean square error estimation [1]. However, every spectrogram consists of not only amplitude but also phase, which is essential for recovering the time-domain signal.
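As a minimal, self-contained sketch of this oracle time-frequency masking (not the authors' implementation; the test tone, noise level, and scipy-based STFT are illustrative assumptions, though the 32 ms Hann window with 16 ms hop matches the experiment described later):

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 220 * t)              # hypothetical 220 Hz test tone
noisy = clean + 0.3 * rng.standard_normal(fs)    # additive Gaussian noise

kw = dict(fs=fs, window="hann", nperseg=512, noverlap=256)  # 32 ms window, 16 ms hop

_, _, X = stft(noisy, **kw)    # F_w x: spectrogram of the noisy signal
_, _, S = stft(clean, **kw)    # oracle condition: the clean spectrogram is known
N = X - S                      # the STFT is linear, so this is the noise spectrogram

# Oracle Wiener-style mask M[m, n] built from the known signal and noise powers
M = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12)

# F_w^†(M ⊙ X): inverse STFT of the masked spectrogram. Only the amplitude is
# restored; the noisy phase is reused as-is (the gap that phase recovery targets).
_, enhanced = istft(M * X, **kw)
enhanced = enhanced[: len(noisy)]

err_noisy = np.mean((noisy - clean) ** 2)
err_enhanced = np.mean((enhanced - clean) ** 2)
```

Even with this perfect-amplitude mask, the output carries the observed noisy phase, which motivates the phase recovery discussed next.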
Amplitude-only restoration of a spectrogram results in a contaminated signal, even when the recovered amplitude is perfect, owing to the reconstruction of the time-domain signal by the inverse STFT using the noisy phase.

2.3. Phase recovery by the Griffin-Lim algorithm

Recently, the importance of restoring the phase spectrogram has gained considerable attention through the pioneering studies [1-3], from which the field of phase-aware signal processing and modeling of complex spectrograms has emerged [22-24]. One of the most famous algorithms for obtaining a better phase spectrogram from the corresponding amplitude is the Griffin-Lim algorithm [4]. This algorithm imposes two expectations upon the target spectrogram: the resulting spectrogram should (1) maintain the given amplitude and (2) have the minimum norm among all possible spectrograms corresponding to its time-domain counterpart. The latter condition is often called consistency, and therefore a method based on it is categorized as consistency-based phase recovery [4-6]. The Griffin-Lim algorithm implements the above expectations by alternating projections, with the hope of acquiring a better phase¹:

X^{[k+1]} = P_A(P_C(X^{[k]})),   (2)

¹ Note that, in general, those two expectations cannot be met simultaneously, and therefore the Griffin-Lim algorithm does not ensure optimality in those senses. Indeed, Eq. (2) can be interpreted as a projected gradient algorithm with a relaxed consistency criterion, and thus consistency is not guaranteed to be satisfied. In this paper, such details of the algorithm are omitted because the objective of the paper is to demonstrate the possibility of considering the time-domain nonlinearity in phase recovery, not to propose a new algorithm. For some examples of algorithmic investigation, see [10, 25].
where P_S is the metric projection onto a set S [26],

P_S(X) = \arg\min_{Y \in S} \|X - Y\|,   (3)

\|\cdot\| is the Euclidean norm, k is the iteration index, A is the set of spectrograms X whose amplitudes are equal to a given nonnegative value a[m, n] \ge 0, i.e., |X[m, n]| = a[m, n], and C is the set of consistent spectrograms, X = F_w F_w^{\dagger} X (the set of fixed points of F_w F_w^{\dagger}). The projections onto the sets C and A are given by

P_C(X) = F_w F_w^{\dagger} X,   (4)
P_A(X) = a \odot X \oslash |X|,   (5)

where |\cdot|, \odot and \oslash represent element-wise absolute value, multiplication and division, respectively, and the result of the division is replaced by zero when X[m, n] = 0.

While the Griffin-Lim algorithm has been successfully applied in a number of applications, its poor adaptability to specific situations might have restricted its practical performance. As the algorithm pays attention to consistency only, and no application-specific structure is considered in the projections, it is presumed that incorporating some data-specific structure can contribute to improving the quality of the estimated phase. In the next section, the time-domain ReLU is introduced to take the harmonic structure of audio signals into account within the Griffin-Lim algorithm.

3. GRIFFIN-LIM ALGORITHM ASSISTED BY RECTIFIED LINEAR UNIT

Some audio signals, including speech, have a specific harmonic structure. In this section, a combination of the Griffin-Lim algorithm and the time-domain ReLU is proposed with the hope of capturing a structure similar to that of speech signals.

3.1. Nonlinear harmonic regeneration

Spectrograms of speech and audio signals are often comprised of harmonic components whose frequencies are integer multiples of the fundamental frequency. This well-known structure, the harmonic structure, has been utilized in many signal processing methods, especially in speech enhancement.
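As a minimal numerical illustration of how a time-domain nonlinearity interacts with this harmonic structure (a hypothetical 100 Hz tone, not an experiment from the paper), half-wave rectifying a pure tone creates new spectral components at its harmonics:

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                   # 1 s of signal, so DFT bins are 1 Hz apart
tone = np.sin(2 * np.pi * 100 * t)       # pure 100 Hz tone: a single spectral line
rect = np.maximum(tone, 0.0)             # half-wave rectification (ReLU)

mag_tone = np.abs(np.fft.rfft(tone)) / fs
mag_rect = np.abs(np.fft.rfft(rect)) / fs
# The pure tone has essentially no energy at 200 Hz or 400 Hz, while the
# rectified tone gains components at the even harmonics (and at DC).
```

This matches the Fourier series of a half-wave rectified sine, which contains a DC term, the fundamental, and even harmonics with amplitudes decaying as 1/(n² − 1); the phases of these new components are tied to the fundamental, which is the alignment effect exploited below.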
One notable use of this structure is the harmonic regeneration technique [13-15], which artificially generates harmonics from an enhanced speech signal to obtain a better estimate of the a priori signal-to-noise ratio (SNR). To generate the harmonics artificially, a nonlinear function is applied in the time domain. The half-wave rectifier, namely ReLU, is a typical choice for such a time-domain nonlinearity [13-15]:

ReLU(x) = \max\{x, 0\},   (6)

where the maximum operator is evaluated element-wise. By clipping the negative components, harmonics are generated as illustrated in Fig. 1, where the horizontal green and light blue bands in the middle row represent the magnitudes of the generated harmonics. The important observation is that the phase spectrogram of the rectified sinusoid (bottom row of Fig. 1) has a certain structured pattern, corresponding to the generated harmonics, which does not exist in the original signal. That is, the phases of the harmonics can be aligned with the fundamental-frequency component through the time-domain nonlinear operation. Although the relationship between the phase of natural audio signals and this artificially generated pattern is not entirely clear, it might be possible to improve phase recovery because the fundamental-frequency component is often the largest component, which should contain better information for the recovery. The following question is then raised naturally: can an element-wise nonlinear operation in the time domain help a phase recovery algorithm to improve its performance? To investigate this question experimentally, a combination of the Griffin-Lim algorithm and ReLU is proposed.

Fig. 1. Illustration of the amplitude/phase spectrograms corresponding to a sinusoid and its rectified counterpart (from top to bottom: time-domain signal, amplitude spectrogram, and phase spectrogram).

3.2. Proposed ReLU-assisted Griffin-Lim algorithm

Here, the time-domain ReLU is incorporated into the procedure of the Griffin-Lim algorithm. To do so, the following projection onto the set related to time-domain rectified signals is introduced:

P_N(X) = F_w \, \mathrm{ReLU}(F_w^{\dagger} X),   (7)

where N is the set of consistent spectrograms whose time-domain counterparts are nonnegative. By replacing the projection corresponding to consistency, P_C in Eq. (2), with this ReLU-combined variation P_N, the ReLU-assisted Griffin-Lim algorithm (ReLU-GLA) is proposed as the following procedure:

X^{[k+1]} = P_A(P_N(X^{[k]})),   (8)

where the only difference from the original algorithm in Eq. (2) is the additional ReLU in Eq. (7), which does not exist in Eq. (4). This slight modification, which does not increase the computational complexity thanks to the extremely cheap evaluation of ReLU, can contribute to the quality of the recovered phase to some extent, as shown in the next section. Note that the additional nonlinear distortion imposed by this operation does not remain in each intermediate result X^{[k+1]} because the projection onto the given amplitude spectrogram, P_A, completely removes such distortion by replacing the amplitude with the predetermined values. That is, the generated harmonics contribute only to the phase, and therefore it is safe to choose any nonlinear operation in the time domain.
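The two iterations, Eq. (2) and Eq. (8), can be sketched side by side in a self-contained way. This is not the authors' implementation: the hand-rolled tight-window STFT (self-dual sqrt-Hann with 50% overlap), the synthetic nonnegative target, and the window length, hop, and iteration count are all illustrative assumptions.

```python
import numpy as np

L, hop = 256, 128
# sqrt-Hann with 50% overlap is self-dual (Parseval tight): the squared
# windows overlap-add to a constant, so analysis and synthesis share one window
w = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(L) / L))

def stft_(x):
    n = (len(x) - L) // hop + 1
    frames = np.stack([x[i * hop : i * hop + L] * w for i in range(n)])
    return np.fft.rfft(frames, axis=1).T           # shape (frequency, time)

def istft_(X, length):
    frames = np.fft.irfft(X.T, n=L, axis=1) * w    # synthesis with the same window
    x = np.zeros(length)
    for i, fr in enumerate(frames):
        x[i * hop : i * hop + L] += fr             # overlap-add
    return x

def gla(amp, length, use_relu, iters=50, seed=0):
    # Eq. (2): X <- P_A(P_C(X));  Eq. (8): X <- P_A(P_N(X))
    rng = np.random.default_rng(seed)
    X = amp * np.exp(2j * np.pi * rng.random(amp.shape))   # random initial phase
    for _ in range(iters):
        x = istft_(X, length)
        if use_relu:
            x = np.maximum(x, 0.0)                 # time-domain ReLU, Eq. (7)
        Y = stft_(x)                               # back to the T-F domain
        X = amp * Y / np.maximum(np.abs(Y), 1e-12) # P_A, Eq. (5): keep phase only
    return X

# Target: a nonnegative (rectified) tone, so both variants have a reachable goal
sig_len = 4096
target = np.maximum(np.sin(2 * np.pi * np.arange(sig_len) * 16 / L), 0.0)
amp = np.abs(stft_(target))                        # the given amplitude a[m, n]

def mismatch(X):
    # how far |STFT(inverse STFT(X))| is from the prescribed amplitude
    Y = stft_(istft_(X, sig_len))
    return np.linalg.norm(np.abs(Y) - amp) / np.linalg.norm(amp)

X0 = amp * np.exp(2j * np.pi * np.random.default_rng(0).random(amp.shape))
err0 = mismatch(X0)
err_gla = mismatch(gla(amp, sig_len, use_relu=False))
err_relu = mismatch(gla(amp, sig_len, use_relu=True))
```

Note that, as stated above, the ReLU never distorts the amplitude of the returned iterate: P_A restores the prescribed amplitude at every step, so the rectification influences only the phase.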
Here, ReLU was chosen merely as a representative example, and any other nonlinearity can be incorporated in exactly the same manner.

4. EXPERIMENT

In order to investigate the question raised in Section 3.1, a numerical experiment was performed. Test signals consisting of 100 male and 100 female speech signals, obtained from the TIMIT database [27], were corrupted by additive Gaussian noise whose amplitude was adjusted so that the SNRs of the simulated signals became 5, 10, 15 or 20 dB. The Gaussian noise was generated 10 times for each speech signal; thus, in total, 2000 noisy signals were utilized for each SNR. These noisy signals were enhanced by Wiener filters constructed in the oracle condition (both signal and noise power at each time-frequency bin were known), where the STFT was calculated with the canonical tight variant of the 32 ms Hann window shifted by 16 ms. The iterations of the Griffin-Lim and the proposed algorithms were started from the observed noisy phase, and the projection P_A enforces the amplitude spectrogram to be the Wiener-filtered one. For the evaluation, scores of the short-time objective intelligibility (STOI) measure [28] were calculated as a perceptual measure of the enhanced speech signals.²

Fig. 2. Average scores of STOI for each iteration. The blue lines indicate the scores of the ordinary Griffin-Lim algorithm, while the red lines are those of the proposed ReLU-GLA. The SNRs of the input signals corresponding to each line are written within the figure.

The experimental results, summarized by the average STOI scores of the 2000 noisy speech signals for each SNR, are illustrated in Fig. 2, where the blue lines indicate the conventional GLA and the red ones correspond to the proposed ReLU-GLA. For all four SNRs, the proposed ReLU-GLA attained better scores at the first iteration, and the difference between the scores of the two methods seems to decrease as the iteration number increases.
Although this result indicates that incorporating the time-domain ReLU into the phase recovery algorithm has some positive effect, the effect on each individual speech signal cannot be confirmed from this figure because each line represents the average over the 2000 trials. Therefore, the results are further illustrated by histograms to contrast the individual effects. Since the difference between the scores diminished for large iteration numbers, the results of the first and the 10th iteration are utilized to construct the histograms.

² STOI was chosen in this paper because the performance of phase recovery cannot be measured by a quantity sensitive to a constant phase difference, such as SNR. Other popular measures, including PESQ, were not calculated because, unfortunately, the first author had only extremely limited time before the deadlines of the initial and revised submissions.

Fig. 3. Histograms of the difference of STOI improvement. The STOI of the ordinary Griffin-Lim algorithm was subtracted from that of the proposed ReLU-assisted algorithm. Both algorithms were iterated once from the initial values, i.e., these results were obtained by the single-shot projections P_A(P_C(X_Wiener)) and P_A(P_N(X_Wiener)). The vertical red lines represent the position of 0, and therefore the bars on the right side of these red lines indicate the results where the proposed method was better than the conventional one.

Fig. 4. Histograms of the difference of STOI improvement. The algorithms were iterated 10 times from the initial values.

If the above discussion is correct, then one can consider a more sophisticated nonlinearity which shapes the waveform closer to the target signals, possibly by learning from a dataset, to obtain a better phase recovery method. The reason for the diminishing effect of the nonlinearity should be that the Griffin-Lim algorithm does not consider the observed phase within its procedure. As in Eq. (2), and also in Eq. (8), the phase spectrogram is modified without considering the observed phase. That is, the phase is close to the observed one only in the first few iterations, where the effect of the initial value remains, and the resulting phase after a number of iterations is not directly related to the observation. A phase recovery method that considers data fidelity to the phase, unlike the Griffin-Lim algorithm, might receive more benefit from the time-domain ReLU, or any other nonlinearity; seeking such an algorithm, together with an effective time-domain nonlinear function for harmonic regeneration, should be the next direction of this research.

The histograms of the difference of STOI scores are shown in Figs. 3 and 4.
For each signal, the score of the conventional GLA was subtracted from that of the proposed algorithm to clarify the difference between them. Therefore, the center of the horizontal axis (represented by the vertical red line) means that the STOI improvements achieved by both algorithms were the same. A positive value on the horizontal axis (right side of the red line) indicates that the proposed ReLU-GLA was better than the conventional one, and a negative value indicates the opposite situation. From Fig. 3, it can be confirmed that, at the first iteration, the proposed algorithm improved most of the test samples more than the conventional GLA did. That is, the single-shot projection P_A(P_N(X_Wiener)) improved STOI more than the conventional projection P_A(P_C(X_Wiener)), where X_Wiener represents the noisy spectrogram whose amplitude was enhanced by the Wiener filter. This result is important because projecting the Wiener-filtered data once may improve the intelligibility without the cost of iteration. Indeed, the STOI scores of all 8000 samples (2000 per SNR) were improved from those of the initial values X_Wiener with the observed phase. Although the effect of the time-domain nonlinearity diminished after some iterations, its positive effect can also be seen at the 10th iteration, as shown in Fig. 4. These results indicate that the time-domain ReLU can assist the Griffin-Lim algorithm in terms of STOI at the beginning of the iteration, and its effect remains for some iterations.

The reason for this positive effect of ReLU might be the pulse-train-like waveform of rectified signals, as in Fig. 1. As considered in the source-filter model, speech signals consist of a sequence of pulses. An appropriate phase for a speech signal should then recover such sequential pulses, while an inappropriate one may not correspond to pulses. The time-domain ReLU might have the power to align the phases of the harmonics so that the waveform in the time domain becomes more pulse-like, as in Fig. 1.
5. CONCLUSIONS

In this paper, inspired by DNN-based phase recovery and the harmonic regeneration technique for speech enhancement, a variant of the well-known Griffin-Lim algorithm combined with a time-domain ReLU was proposed. The effectiveness of the time-domain nonlinearity for speech denoising in terms of STOI was experimentally confirmed. The experimental results shed light on the possibility of utilizing such a time-domain nonlinear function within a signal reconstruction process (or, in DNN terms, utilizing an inverse-STFT layer within the network). Both ReLU and the Griffin-Lim algorithm are just examples of the possibilities, and searching for a better combination, as well as a DNN model containing a time-domain representation within the network, remains as future work.

6. ACKNOWLEDGMENT

The first author would like to thank Dr. Ryoichi Miyazaki for his support on prior works and helpful comments on the time-domain nonlinear operation in speech enhancement.

7. REFERENCES

[1] P. Mowlaee, J. Kulmer, J. Stahl, and F. Mayer, Single Channel Phase-Aware Signal Processing in Speech Communication: Theory and Practice, Wiley.
[2] T. Gerkmann, M. Krawczyk-Becker, and J. Le Roux, "Phase processing for single-channel speech enhancement: History and recent advances," IEEE Signal Process. Mag., vol. 32, no. 2, Mar. 2015.
[3] P. Mowlaee, R. Saeidi, and Y. Stylianou, "Advances in phase-aware signal processing in speech communication," Speech Commun., vol. 81, pp. 1-29, 2016.
[4] D. Griffin and J. Lim, "Signal estimation from modified short-time Fourier transform," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, Apr. 1984.
[5] J. Le Roux, H. Kameoka, N. Ono, and S. Sagayama, "Fast signal reconstruction from magnitude STFT spectrogram based on spectrogram consistency," in Int. Conf. Digital Audio Effects (DAFx-10), Sep. 2010.
[6] N. Perraudin, P. Balazs, and P. L. Søndergaard, "A fast Griffin-Lim algorithm," in IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), Oct. 2013.
[7] M. Krawczyk and T. Gerkmann, "STFT phase reconstruction in voiced speech for an improved single-channel speech enhancement," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, Dec. 2014.
[8] Y. Wakabayashi, T. Fukumori, M. Nakayama, T. Nishiura, and Y. Yamashita, "Single-channel speech enhancement with phase reconstruction based on phase distortion averaging," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 26, no. 9, Sep. 2018.
[9] P. Magron, R. Badeau, and B. David, "Model-based STFT phase recovery for audio source separation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 26, no. 6, June 2018.
[10] Y. Masuyama, K. Yatabe, and Y. Oikawa, "Model-based phase recovery of spectrograms via optimization on Riemannian manifolds," in Int. Workshop Acoust. Signal Enhance. (IWAENC), Sep. 2018.
[11] K. Oyamada, H. Kameoka, T. Kaneko, K. Tanaka, N. Hojo, and H. Ando, "Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms," arXiv preprint, Sep.
[12] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.
[13] C. Plapous, C. Marro, and P. Scalart, "Improved signal-to-noise ratio estimation for speech enhancement," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 6, Nov. 2006.
[14] M. Une and R. Miyazaki, "Evaluation of sound quality and speech recognition performance using harmonic regeneration for various noise reduction techniques," in RISP Int. Workshop Nonlinear Circuits, Commun. Signal Process. (NCSP), Mar. 2017.
[15] M. Une and R. Miyazaki, "Musical-noise-free speech enhancement with low speech distortion by biased harmonic regeneration technique," in Int. Workshop Acoust. Signal Enhance. (IWAENC), Sep. 2018.
[16] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. Fourteenth Int. Conf. Artif. Intell. Stat., Apr. 2011, vol. 15.
[17] S. Sonoda and N. Murata, "Neural network with unbounded activation functions is universal approximator," Appl. Comput. Harmon. Anal., vol. 43, no. 2, 2017.
[18] H. G. Feichtinger and T. Strohmer, Eds., Gabor Analysis and Algorithms: Theory and Applications, Birkhäuser Boston, Boston, MA.
[19] K. Gröchenig, Foundations of Time-Frequency Analysis, Birkhäuser Boston, Boston, MA, 2001.
[20] P. L. Søndergaard, "Gabor frames by sampling and periodization," Adv. Comput. Math., vol. 27, no. 4, 2007.
[21] O. Christensen, Frames and Bases: An Introductory Course, Birkhäuser.
[22] K. Yatabe and Y. Oikawa, "Phase corrected total variation for audio signals," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2018.
[23] K. Yatabe and D. Kitamura, "Determined blind source separation via proximal splitting algorithm," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2018.
[24] Y. Masuyama, K. Yatabe, and Y. Oikawa, "Low-rankness of complex-valued spectrogram and its application to phase-aware audio processing," (submitted).
[25] Y. Masuyama, K. Yatabe, and Y. Oikawa, "Griffin-Lim like phase recovery via alternating direction method of multipliers," (submitted).
[26] A. Cegielski, Iterative Methods for Fixed Point Problems in Hilbert Spaces, Springer, 2012.
[27] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CDROM, NIST, 1993.
[28] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "An algorithm for intelligibility prediction of time-frequency weighted noisy speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, Sep. 2011.


A NEURAL NETWORK ALTERNATIVE TO NON-NEGATIVE AUDIO MODELS. University of Illinois at Urbana-Champaign Adobe Research A NEURAL NETWORK ALTERNATIVE TO NON-NEGATIVE AUDIO MODELS Paris Smaragdis, Shrikant Venkataramani University of Illinois at Urbana-Champaign Adobe Research ABSTRACT We present a neural network that can

More information

On Spectral Basis Selection for Single Channel Polyphonic Music Separation

On Spectral Basis Selection for Single Channel Polyphonic Music Separation On Spectral Basis Selection for Single Channel Polyphonic Music Separation Minje Kim and Seungjin Choi Department of Computer Science Pohang University of Science and Technology San 31 Hyoja-dong, Nam-gu

More information

Consistent Wiener Filtering for Audio Source Separation

Consistent Wiener Filtering for Audio Source Separation MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Consistent Wiener Filtering for Audio Source Separation Le Roux, J.; Vincent, E. TR2012-090 October 2012 Abstract Wiener filtering is one of

More information

A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN Yu ang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London, UK Email: {yw09,

More information

arxiv: v1 [stat.ml] 31 Oct 2016

arxiv: v1 [stat.ml] 31 Oct 2016 Full-Capacity Unitary Recurrent Neural Networks arxiv:1611.00035v1 [stat.ml] 31 Oct 2016 Scott Wisdom 1, Thomas Powers 1, John R. Hershey 2, Jonathan Le Roux 2, and Les Atlas 1 1 Department of Electrical

More information

A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL

A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig 386 Braunschweig,

More information

Joint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data

Joint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data Joint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data Colin Vaz, Vikram Ramanarayanan, and Shrikanth Narayanan Ming Hsieh Department of Electrical Engineering University

More information

Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms

Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms Masahiro Nakano 1, Jonathan Le Roux 2, Hirokazu Kameoka 2,YuKitano 1, Nobutaka Ono 1,

More information

Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement

Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement Patrick J. Wolfe Department of Engineering University of Cambridge Cambridge CB2 1PZ, UK pjw47@eng.cam.ac.uk Simon J. Godsill

More information

Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics

Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud To cite this version: Xiaofei Li, Laurent Girin, Sharon Gannot,

More information

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator 1 Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator Israel Cohen Lamar Signal Processing Ltd. P.O.Box 573, Yokneam Ilit 20692, Israel E-mail: icohen@lamar.co.il

More information

Fast Angular Synchronization for Phase Retrieval via Incomplete Information

Fast Angular Synchronization for Phase Retrieval via Incomplete Information Fast Angular Synchronization for Phase Retrieval via Incomplete Information Aditya Viswanathan a and Mark Iwen b a Department of Mathematics, Michigan State University; b Department of Mathematics & Department

More information

Probabilistic Inference of Speech Signals from Phaseless Spectrograms

Probabilistic Inference of Speech Signals from Phaseless Spectrograms Probabilistic Inference of Speech Signals from Phaseless Spectrograms Kannan Achan, Sam T. Roweis, Brendan J. Frey Machine Learning Group University of Toronto Abstract Many techniques for complex speech

More information

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology Non-Negative Matrix Factorization And Its Application to Audio Tuomas Virtanen Tampere University of Technology tuomas.virtanen@tut.fi 2 Contents Introduction to audio signals Spectrogram representation

More information

Audible sound field visualization by using Schlieren technique

Audible sound field visualization by using Schlieren technique Audible sound field visualization by using Schlieren technique Nachanant Chitanont, Kohei Yatabe and Yuhiro Oikawa Department of Intermedia Art and Science, Weda University, Tokyo, Japan Paper Number:

More information

Tensor-Train Long Short-Term Memory for Monaural Speech Enhancement

Tensor-Train Long Short-Term Memory for Monaural Speech Enhancement 1 Tensor-Train Long Short-Term Memory for Monaural Speech Enhancement Suman Samui, Indrajit Chakrabarti, and Soumya K. Ghosh, arxiv:1812.10095v1 [cs.sd] 25 Dec 2018 Abstract In recent years, Long Short-Term

More information

Gaussian Processes for Audio Feature Extraction

Gaussian Processes for Audio Feature Extraction Gaussian Processes for Audio Feature Extraction Dr. Richard E. Turner (ret26@cam.ac.uk) Computational and Biological Learning Lab Department of Engineering University of Cambridge Machine hearing pipeline

More information

NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION. Julian M. Becker, Christian Sohn and Christian Rohlfing

NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION. Julian M. Becker, Christian Sohn and Christian Rohlfing NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION Julian M. ecker, Christian Sohn Christian Rohlfing Institut für Nachrichtentechnik RWTH Aachen University D-52056

More information

Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation

Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation Shigeki Miyabe 1B, Notubaka Ono 2, and Shoji Makino 1 1 University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki

More information

Sound Recognition in Mixtures

Sound Recognition in Mixtures Sound Recognition in Mixtures Juhan Nam, Gautham J. Mysore 2, and Paris Smaragdis 2,3 Center for Computer Research in Music and Acoustics, Stanford University, 2 Advanced Technology Labs, Adobe Systems

More information

Single Channel Signal Separation Using MAP-based Subspace Decomposition

Single Channel Signal Separation Using MAP-based Subspace Decomposition Single Channel Signal Separation Using MAP-based Subspace Decomposition Gil-Jin Jang, Te-Won Lee, and Yung-Hwan Oh 1 Spoken Language Laboratory, Department of Computer Science, KAIST 373-1 Gusong-dong,

More information

STRUCTURE-AWARE DICTIONARY LEARNING WITH HARMONIC ATOMS

STRUCTURE-AWARE DICTIONARY LEARNING WITH HARMONIC ATOMS 19th European Signal Processing Conference (EUSIPCO 2011) Barcelona, Spain, August 29 - September 2, 2011 STRUCTURE-AWARE DICTIONARY LEARNING WITH HARMONIC ATOMS Ken O Hanlon and Mark D.Plumbley Queen

More information

Modifying Voice Activity Detection in Low SNR by correction factors

Modifying Voice Activity Detection in Low SNR by correction factors Modifying Voice Activity Detection in Low SNR by correction factors H. Farsi, M. A. Mozaffarian, H.Rahmani Department of Electrical Engineering University of Birjand P.O. Box: +98-9775-376 IRAN hfarsi@birjand.ac.ir

More information

Estimation Error Bounds for Frame Denoising

Estimation Error Bounds for Frame Denoising Estimation Error Bounds for Frame Denoising Alyson K. Fletcher and Kannan Ramchandran {alyson,kannanr}@eecs.berkeley.edu Berkeley Audio-Visual Signal Processing and Communication Systems group Department

More information

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig

More information

Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition

Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition Seema Sud 1 1 The Aerospace Corporation, 4851 Stonecroft Blvd. Chantilly, VA 20151 Abstract

More information

Recovery of Compactly Supported Functions from Spectrogram Measurements via Lifting

Recovery of Compactly Supported Functions from Spectrogram Measurements via Lifting Recovery of Compactly Supported Functions from Spectrogram Measurements via Lifting Mark Iwen markiwen@math.msu.edu 2017 Friday, July 7 th, 2017 Joint work with... Sami Merhi (Michigan State University)

More information

EUSIPCO

EUSIPCO EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,

More information

A Log-Frequency Approach to the Identification of the Wiener-Hammerstein Model

A Log-Frequency Approach to the Identification of the Wiener-Hammerstein Model A Log-Frequency Approach to the Identification of the Wiener-Hammerstein Model The MIT Faculty has made this article openly available Please share how this access benefits you Your story matters Citation

More information

Spatially adaptive alpha-rooting in BM3D sharpening

Spatially adaptive alpha-rooting in BM3D sharpening Spatially adaptive alpha-rooting in BM3D sharpening Markku Mäkitalo and Alessandro Foi Department of Signal Processing, Tampere University of Technology, P.O. Box FIN-553, 33101, Tampere, Finland e-mail:

More information

Decompositions of frames and a new frame identity

Decompositions of frames and a new frame identity Decompositions of frames and a new frame identity Radu Balan a, Peter G. Casazza b, Dan Edidin c and Gitta Kutyniok d a Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540, USA; b Department

More information

Robust Sound Event Detection in Continuous Audio Environments

Robust Sound Event Detection in Continuous Audio Environments Robust Sound Event Detection in Continuous Audio Environments Haomin Zhang 1, Ian McLoughlin 2,1, Yan Song 1 1 National Engineering Laboratory of Speech and Language Information Processing The University

More information

IMPROVED MULTI-MICROPHONE NOISE REDUCTION PRESERVING BINAURAL CUES

IMPROVED MULTI-MICROPHONE NOISE REDUCTION PRESERVING BINAURAL CUES IMPROVED MULTI-MICROPHONE NOISE REDUCTION PRESERVING BINAURAL CUES Andreas I. Koutrouvelis Richard C. Hendriks Jesper Jensen Richard Heusdens Circuits and Systems (CAS) Group, Delft University of Technology,

More information

Deep Learning: Approximation of Functions by Composition

Deep Learning: Approximation of Functions by Composition Deep Learning: Approximation of Functions by Composition Zuowei Shen Department of Mathematics National University of Singapore Outline 1 A brief introduction of approximation theory 2 Deep learning: approximation

More information

SEPARATION OF ACOUSTIC SIGNALS USING SELF-ORGANIZING NEURAL NETWORKS. Temujin Gautama & Marc M. Van Hulle

SEPARATION OF ACOUSTIC SIGNALS USING SELF-ORGANIZING NEURAL NETWORKS. Temujin Gautama & Marc M. Van Hulle SEPARATION OF ACOUSTIC SIGNALS USING SELF-ORGANIZING NEURAL NETWORKS Temujin Gautama & Marc M. Van Hulle K.U.Leuven, Laboratorium voor Neuro- en Psychofysiologie Campus Gasthuisberg, Herestraat 49, B-3000

More information

EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)

EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in

More information

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 0-05 June 2008. Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic

More information

Image Denoising using Uniform Curvelet Transform and Complex Gaussian Scale Mixture

Image Denoising using Uniform Curvelet Transform and Complex Gaussian Scale Mixture EE 5359 Multimedia Processing Project Report Image Denoising using Uniform Curvelet Transform and Complex Gaussian Scale Mixture By An Vo ISTRUCTOR: Dr. K. R. Rao Summer 008 Image Denoising using Uniform

More information

A Probability Model for Interaural Phase Difference

A Probability Model for Interaural Phase Difference A Probability Model for Interaural Phase Difference Michael I. Mandel, Daniel P.W. Ellis Department of Electrical Engineering Columbia University, New York, New York {mim,dpwe}@ee.columbia.edu Abstract

More information

Scalable audio separation with light Kernel Additive Modelling

Scalable audio separation with light Kernel Additive Modelling Scalable audio separation with light Kernel Additive Modelling Antoine Liutkus 1, Derry Fitzgerald 2, Zafar Rafii 3 1 Inria, Université de Lorraine, LORIA, UMR 7503, France 2 NIMBUS Centre, Cork Institute

More information

JOINT ACOUSTIC AND SPECTRAL MODELING FOR SPEECH DEREVERBERATION USING NON-NEGATIVE REPRESENTATIONS

JOINT ACOUSTIC AND SPECTRAL MODELING FOR SPEECH DEREVERBERATION USING NON-NEGATIVE REPRESENTATIONS JOINT ACOUSTIC AND SPECTRAL MODELING FOR SPEECH DEREVERBERATION USING NON-NEGATIVE REPRESENTATIONS Nasser Mohammadiha Paris Smaragdis Simon Doclo Dept. of Medical Physics and Acoustics and Cluster of Excellence

More information

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood

More information

Approximately dual frames in Hilbert spaces and applications to Gabor frames

Approximately dual frames in Hilbert spaces and applications to Gabor frames Approximately dual frames in Hilbert spaces and applications to Gabor frames Ole Christensen and Richard S. Laugesen October 22, 200 Abstract Approximately dual frames are studied in the Hilbert space

More information

Over-enhancement Reduction in Local Histogram Equalization using its Degrees of Freedom. Alireza Avanaki

Over-enhancement Reduction in Local Histogram Equalization using its Degrees of Freedom. Alireza Avanaki Over-enhancement Reduction in Local Histogram Equalization using its Degrees of Freedom Alireza Avanaki ABSTRACT A well-known issue of local (adaptive) histogram equalization (LHE) is over-enhancement

More information

Application of the Tuned Kalman Filter in Speech Enhancement

Application of the Tuned Kalman Filter in Speech Enhancement Application of the Tuned Kalman Filter in Speech Enhancement Orchisama Das, Bhaswati Goswami and Ratna Ghosh Department of Instrumentation and Electronics Engineering Jadavpur University Kolkata, India

More information

Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs

Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs Paris Smaragdis TR2004-104 September

More information

Analysis of Communication Systems Using Iterative Methods Based on Banach s Contraction Principle

Analysis of Communication Systems Using Iterative Methods Based on Banach s Contraction Principle Analysis of Communication Systems Using Iterative Methods Based on Banach s Contraction Principle H. Azari Soufiani, M. J. Saberian, M. A. Akhaee, R. Nasiri Mahallati, F. Marvasti Multimedia Signal, Sound

More information

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT CONOLUTIE NON-NEGATIE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT Paul D. O Grady Barak A. Pearlmutter Hamilton Institute National University of Ireland, Maynooth Co. Kildare, Ireland. ABSTRACT Discovering

More information

EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING. İlker Bayram. Istanbul Technical University, Istanbul, Turkey

EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING. İlker Bayram. Istanbul Technical University, Istanbul, Turkey EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING İlker Bayram Istanbul Technical University, Istanbul, Turkey ABSTRACT Spectral audio denoising methods usually make use of the magnitudes of a time-frequency

More information

Denoising Gabor Transforms

Denoising Gabor Transforms 1 Denoising Gabor Transforms James S. Walker Abstract We describe denoising one-dimensional signals by thresholding Blackman windowed Gabor transforms. This method is compared with Gauss-windowed Gabor

More information

IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES

IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES Bere M. Gur Prof. Christopher Niezreci Prof. Peter Avitabile Structural Dynamics and Acoustic Systems

More information

Independent Component Analysis and Unsupervised Learning

Independent Component Analysis and Unsupervised Learning Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent

More information

Classification of Hand-Written Digits Using Scattering Convolutional Network

Classification of Hand-Written Digits Using Scattering Convolutional Network Mid-year Progress Report Classification of Hand-Written Digits Using Scattering Convolutional Network Dongmian Zou Advisor: Professor Radu Balan Co-Advisor: Dr. Maneesh Singh (SRI) Background Overview

More information

Dept. Electronics and Electrical Engineering, Keio University, Japan. NTT Communication Science Laboratories, NTT Corporation, Japan.

Dept. Electronics and Electrical Engineering, Keio University, Japan. NTT Communication Science Laboratories, NTT Corporation, Japan. JOINT SEPARATION AND DEREVERBERATION OF REVERBERANT MIXTURES WITH DETERMINED MULTICHANNEL NON-NEGATIVE MATRIX FACTORIZATION Hideaki Kagami, Hirokazu Kameoka, Masahiro Yukawa Dept. Electronics and Electrical

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

NONNEGATIVE MATRIX FACTORIZATION WITH TRANSFORM LEARNING. Dylan Fagot, Herwig Wendt and Cédric Févotte

NONNEGATIVE MATRIX FACTORIZATION WITH TRANSFORM LEARNING. Dylan Fagot, Herwig Wendt and Cédric Févotte NONNEGATIVE MATRIX FACTORIZATION WITH TRANSFORM LEARNING Dylan Fagot, Herwig Wendt and Cédric Févotte IRIT, Université de Toulouse, CNRS, Toulouse, France firstname.lastname@irit.fr ABSTRACT Traditional

More information

Phase-dependent anisotropic Gaussian model for audio source separation

Phase-dependent anisotropic Gaussian model for audio source separation Phase-dependent anisotropic Gaussian model for audio source separation Paul Magron, Roland Badeau, Bertrand David To cite this version: Paul Magron, Roland Badeau, Bertrand David. Phase-dependent anisotropic

More information

Finite Frame Quantization

Finite Frame Quantization Finite Frame Quantization Liam Fowl University of Maryland August 21, 2018 1 / 38 Overview 1 Motivation 2 Background 3 PCM 4 First order Σ quantization 5 Higher order Σ quantization 6 Alternative Dual

More information

arxiv: v1 [physics.optics] 5 Mar 2012

arxiv: v1 [physics.optics] 5 Mar 2012 Designing and using prior knowledge for phase retrieval Eliyahu Osherovich, Michael Zibulevsky, and Irad Yavneh arxiv:1203.0879v1 [physics.optics] 5 Mar 2012 Computer Science Department, Technion Israel

More information

Phoneme segmentation based on spectral metrics

Phoneme segmentation based on spectral metrics Phoneme segmentation based on spectral metrics Xianhua Jiang, Johan Karlsson, and Tryphon T. Georgiou Introduction We consider the classic problem of segmenting speech signals into individual phonemes.

More information

COMPLEX WAVELET TRANSFORM IN SIGNAL AND IMAGE ANALYSIS

COMPLEX WAVELET TRANSFORM IN SIGNAL AND IMAGE ANALYSIS COMPLEX WAVELET TRANSFORM IN SIGNAL AND IMAGE ANALYSIS MUSOKO VICTOR, PROCHÁZKA ALEŠ Institute of Chemical Technology, Department of Computing and Control Engineering Technická 905, 66 8 Prague 6, Cech

More information

THE task of identifying the environment in which a sound

THE task of identifying the environment in which a sound 1 Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification Victor Bisot, Romain Serizel, Slim Essid, and Gaël Richard Abstract In this paper, we study the usefulness of various

More information

MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION. Ikuo Degawa, Kei Sato, Masaaki Ikehara

MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION. Ikuo Degawa, Kei Sato, Masaaki Ikehara MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION Ikuo Degawa, Kei Sato, Masaaki Ikehara EEE Dept. Keio University Yokohama, Kanagawa 223-8522 Japan E-mail:{degawa,

More information

Real-Time Spectrogram Inversion Using Phase Gradient Heap Integration

Real-Time Spectrogram Inversion Using Phase Gradient Heap Integration Real-Time Spectrogram Inversion Using Phase Gradient Heap Integration Zdeněk Průša 1 and Peter L. Søndergaard 2 1,2 Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria 2 Oticon

More information

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?

More information

Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors

Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors IEEE SIGNAL PROCESSING LETTERS 1 Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors Nasser Mohammadiha, Student Member, IEEE, Rainer Martin, Fellow, IEEE, and Arne Leijon,

More information

SIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS

SIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS TO APPEAR IN IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 22 25, 23, UK SIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS Nasser Mohammadiha

More information

A State-Space Approach to Dynamic Nonnegative Matrix Factorization

A State-Space Approach to Dynamic Nonnegative Matrix Factorization 1 A State-Space Approach to Dynamic Nonnegative Matrix Factorization Nasser Mohammadiha, Paris Smaragdis, Ghazaleh Panahandeh, Simon Doclo arxiv:179.5v1 [cs.lg] 31 Aug 17 Abstract Nonnegative matrix factorization

More information

Improved Method for Epoch Extraction in High Pass Filtered Speech

Improved Method for Epoch Extraction in High Pass Filtered Speech Improved Method for Epoch Extraction in High Pass Filtered Speech D. Govind Center for Computational Engineering & Networking Amrita Vishwa Vidyapeetham (University) Coimbatore, Tamilnadu 642 Email: d

More information

Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise

Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Minru Bai(x T) College of Mathematics and Econometrics Hunan University Joint work with Xiongjun Zhang, Qianqian Shao June 30,

More information

Sparse Time-Frequency Transforms and Applications.

Sparse Time-Frequency Transforms and Applications. Sparse Time-Frequency Transforms and Applications. Bruno Torrésani http://www.cmi.univ-mrs.fr/~torresan LATP, Université de Provence, Marseille DAFx, Montreal, September 2006 B. Torrésani (LATP Marseille)

More information

Detection of Overlapping Acoustic Events Based on NMF with Shared Basis Vectors

Detection of Overlapping Acoustic Events Based on NMF with Shared Basis Vectors Detection of Overlapping Acoustic Events Based on NMF with Shared Basis Vectors Kazumasa Yamamoto Department of Computer Science Chubu University Kasugai, Aichi, Japan Email: yamamoto@cs.chubu.ac.jp Chikara

More information

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification Hafiz Mustafa and Wenwu Wang Centre for Vision, Speech and Signal Processing (CVSSP) University of Surrey,

More information

ALTERNATIVE OBJECTIVE FUNCTIONS FOR DEEP CLUSTERING

ALTERNATIVE OBJECTIVE FUNCTIONS FOR DEEP CLUSTERING ALTERNATIVE OBJECTIVE FUNCTIONS FOR DEEP CLUSTERING Zhong-Qiu Wang,2, Jonathan Le Roux, John R. Hershey Mitsubishi Electric Research Laboratories (MERL), USA 2 Department of Computer Science and Engineering,

More information

2 Regularized Image Reconstruction for Compressive Imaging and Beyond

2 Regularized Image Reconstruction for Compressive Imaging and Beyond EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement

More information

Improved noise power spectral density tracking by a MAP-based postprocessor

Improved noise power spectral density tracking by a MAP-based postprocessor Improved noise power spectral density tracking by a MAP-based postprocessor Aleksej Chinaev, Alexander Krueger, Dang Hai Tran Vu, Reinhold Haeb-Umbach University of Paderborn, Germany March 8th, 01 Computer

More information

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation CENTER FOR COMPUTER RESEARCH IN MUSIC AND ACOUSTICS DEPARTMENT OF MUSIC, STANFORD UNIVERSITY REPORT NO. STAN-M-4 Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

More information

Characterization of Gradient Dominance and Regularity Conditions for Neural Networks

Characterization of Gradient Dominance and Regularity Conditions for Neural Networks Characterization of Gradient Dominance and Regularity Conditions for Neural Networks Yi Zhou Ohio State University Yingbin Liang Ohio State University Abstract zhou.1172@osu.edu liang.889@osu.edu The past

More information

Chirp Transform for FFT

Chirp Transform for FFT Chirp Transform for FFT Since the FFT is an implementation of the DFT, it provides a frequency resolution of 2π/N, where N is the length of the input sequence. If this resolution is not sufficient in a

More information

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES Abstract March, 3 Mads Græsbøll Christensen Audio Analysis Lab, AD:MT Aalborg University This document contains a brief introduction to pitch

More information

ORTHOGONALITY-REGULARIZED MASKED NMF FOR LEARNING ON WEAKLY LABELED AUDIO DATA. Iwona Sobieraj, Lucas Rencker, Mark D. Plumbley

ORTHOGONALITY-REGULARIZED MASKED NMF FOR LEARNING ON WEAKLY LABELED AUDIO DATA. Iwona Sobieraj, Lucas Rencker, Mark D. Plumbley ORTHOGONALITY-REGULARIZED MASKED NMF FOR LEARNING ON WEAKLY LABELED AUDIO DATA Iwona Sobieraj, Lucas Rencker, Mark D. Plumbley University of Surrey Centre for Vision Speech and Signal Processing Guildford,

More information

Fast algorithms for informed source separation

Fast algorithms for informed source separation Fast algorithms for informed source separation Augustin Lefèvre augustin.lefevre@uclouvain.be September, 10th 2013 Source separation in 5 minutes Recover source estimates from a mixed signal We consider

More information

NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group

NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION M. Schwab, P. Noll, and T. Sikora Technical University Berlin, Germany Communication System Group Einsteinufer 17, 1557 Berlin (Germany) {schwab noll

More information

Sparse linear models

Sparse linear models Sparse linear models Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 2/22/2016 Introduction Linear transforms Frequency representation Short-time

More information