RECTIFIED LINEAR UNIT CAN ASSIST GRIFFIN-LIM PHASE RECOVERY

Kohei Yatabe, Yoshiki Masuyama and Yasuhiro Oikawa
Department of Intermedia Art and Science, Waseda University, Tokyo, Japan

ABSTRACT

Phase recovery is an essential process for reconstructing a time-domain signal from the corresponding spectrogram when its phase is contaminated or unavailable. Recently, a phase recovery method using a deep neural network (DNN) was proposed, which interested us because the inverse short-time Fourier transform (inverse STFT) was utilized within the network. This inverse STFT converts a spectrogram into its time-domain counterpart, after which the activation function, the leaky rectified linear unit (ReLU), is applied. Such a nonlinear operation in the time domain resembles the speech enhancement method called harmonic regeneration noise reduction (HRNR). In HRNR, a time-domain nonlinearity, typically ReLU, is applied to assist in enhancing the higher-order harmonics. From this point of view, one question arose in our minds: Can the time-domain ReLU solely assist phase recovery? Inspired by this curious connection between the recent DNN-based phase recovery method and HRNR in speech enhancement, the ReLU-assisted Griffin-Lim algorithm is proposed in this paper to investigate the above question. Through an experiment of speech denoising with the oracle Wiener filter, a positive effect of the time-domain nonlinearity is confirmed in terms of the scores of the short-time objective intelligibility (STOI) measure.

Index Terms: Spectrogram, redundancy, consistency, time-domain nonlinearity, harmonic regeneration.

1. INTRODUCTION

A recent important trend in signal processing and speech enhancement is phase recovery of audio signals.
Many popular acoustical processing methods are formulated in the time-frequency domain, obtained through the short-time Fourier transform (STFT), where the processing is usually implemented as a procedure of modifying the amplitude at each time-frequency bin. Although spectrograms are parametrized by both amplitude and phase, as they are expressed as a collection of complex numbers, phase had been ignored for several decades until pioneering works demonstrated its importance. Recently, so-called phase-aware signal processing has gained considerable attention in the community, and a number of methodologies have been proposed [1-3]. This paper focuses on its branch called phase recovery, which aims to obtain a better phase spectrogram under a given amplitude (together with noisy phase in some applications such as speech denoising). As usual in signal processing, phase recovery methods can be categorized by the amount of imposed prior knowledge. One of the most general algorithms is the Griffin-Lim algorithm [4-6], which retrieves the phase based only on the redundancy of the time-frequency representation. In the algorithm, the phase is modified only by the linear transformations between the time and time-frequency domains (STFT and its inverse), and no assumption is made upon the structure of the data. Therefore, even though the Griffin-Lim algorithm might not achieve good performance due to the insufficiency of the prior knowledge, it is utilized in a wide variety of applications. On the other hand, there are several phase recovery methods based on the structure of the data. For example, the harmonic structure of speech signals has been considered in model-based phase recovery [7-10], which can obtain a better result at the price of narrowing the range of applications. Very recently, a phase recovery method based on a deep neural network (DNN) was proposed [11], following the extraordinary success of DNNs in the last decade.
Although it might not seem to have assumptions on the data, a DNN heavily relies on extremely rich prior knowledge, which is automatically learned from the training dataset, when it is applied as a signal processor. One thing about the DNN-based phase recovery in [11] that interested the authors is the use of the inverse STFT to obtain the time-domain signal within the network. As a DNN is a composition of affine and nonlinear functions [12], this time-domain signal obtained by the inverse-STFT layer was fed into the nonlinear functions [11]. Such nonlinearity in the time domain reminds us of a speech enhancement method called harmonic regeneration noise reduction (HRNR) [13-15], which utilizes a time-domain nonlinear function, together with the inverse STFT, to recover the harmonic structure (especially in the high-frequency range) of speech signals. A typical choice of the nonlinear function in HRNR is the half-wave rectifier [13-15], which is equivalent to the quite popular activation called the rectified linear unit (ReLU) in the literature of DNNs [16, 17]. Indeed, the DNN-based phase recovery method in [11] utilized a variant of ReLU in the time domain, namely the leaky ReLU. This curious connection between the DNN-based phase recovery and HRNR suggested one possibility: time-domain nonlinearity can solely contribute to phase recovery without a network. To investigate this conjecture, a combination of a time-domain nonlinearity and a phase recovery algorithm is proposed, and its performance for speech enhancement is experimentally investigated in this paper. The Griffin-Lim algorithm is chosen as the baseline method because it is the standard phase recovery algorithm without any assumption on the structure of the data. ReLU is incorporated within its procedure, after the inverse STFT as in the DNN-based method, in order to artificially generate harmonic components as in HRNR.
This modified Griffin-Lim algorithm with the time-domain ReLU is compared to that without ReLU to see the effect of the time-domain nonlinearity. An experiment of speech denoising using the oracle Wiener filter is conducted with 200 speech signals obtained from the TIMIT database, and its result answers the above conjecture positively.

2. PHASE RECOVERY OF SPECTROGRAM

In this section, the standard time-frequency domain representation (spectrogram) of speech signals is briefly reviewed. The ordinary Griffin-Lim algorithm is also introduced here so that the proposed modification in the subsequent section becomes apparent.

© 2018 IEEE
2.1. Time-frequency representation of audio signals

Let the STFT of a signal x with a window w be defined as

    (F_w x)[m, n] = Σ_{l=0}^{L-1} x[l + an] w*[l] e^{-2πi bml/L},    (1)

where z* denotes the complex conjugate of z, i = √-1 is the imaginary unit, L is the window length, n and m are the time and frequency indices, and a and b are the time and frequency shifting steps, respectively. By denoting the inverse STFT with a window w̃ as F_w̃†, the reconstruction formula of the STFT can be represented as x = F_w̃† F_w x, where w̃ is a suitable synthesis window associated with w, or the dual window [18-21] of w. For the sake of simplicity, only the Parseval tight case is considered in this paper, i.e., the window is self-dual, w̃ = w (the same window can be used in both analysis and synthesis to reconstruct the signal, x = F_w† F_w x). A spectrogram corresponding to x will be denoted by X[m, n] (= (F_w x)[m, n]) for convenience.

2.2. Speech enhancement based on amplitude restoration

One of the most popular strategies for enhancing audio signals is filtering in the time-frequency domain. By multiplying some scalar, the so-called time-frequency mask M[m, n], with each bin of the spectrogram X[m, n] and taking the inverse STFT, F_w†(M ⊙ X), a nonstationary filter can be approximately realized, where ⊙ represents element-wise multiplication. Ordinarily, in acoustical applications, this bin-wise scalar M[m, n] (which may also be called a spectral gain or Gabor multiplier) is treated as a nonnegative real number, that is, only the amplitude of the spectrogram is modified. This practice stems from multiple reasons, including the optimality in the sense of minimum mean square error estimates [1]. However, every spectrogram consists of not only amplitude but also phase, which is essential for recovering the time-domain signal.
Amplitude-only restoration of a spectrogram results in a contaminated signal, even when the recovered amplitude is perfect, owing to the reconstruction of the time-domain signal by the inverse STFT using the noisy phase.

2.3. Phase recovery by the Griffin-Lim algorithm

Recently, the importance of restoring the phase spectrogram has gained considerable attention through pioneering studies [1-3], which gave rise to the field of phase-aware signal processing and the modeling of complex spectrograms [22-24]. One of the most famous algorithms for obtaining a better phase spectrogram from the corresponding amplitude is the Griffin-Lim algorithm [4]. This algorithm imposes two expectations upon the target spectrogram: the resulting spectrogram should (1) maintain the given amplitude and (2) have minimum norm among all possible spectrograms corresponding to their time-domain counterparts. The latter condition is often called consistency, and therefore a method based on it is categorized as consistency-based phase recovery [4-6]. The Griffin-Lim algorithm implements the above expectations by alternating projections, with the hope of acquiring a better phase:¹

    X^[k+1] = P_A(P_C(X^[k])),    (2)

¹ Note that, in general, those two expectations cannot be met simultaneously, and therefore the Griffin-Lim algorithm does not ensure optimality in those senses. Indeed, Eq. (2) can be interpreted as a projected gradient algorithm with a relaxed consistency criterion, and thus consistency is not supposed to be satisfied. In this paper, such details of the algorithm are omitted because the objective of the paper is to demonstrate the possibility of considering time-domain nonlinearity in phase recovery, not to propose a new algorithm. For some examples of algorithmic investigation, see [10, 25].
where P_S is the metric projection onto a set S [26],

    P_S(X) = arg min_{Y ∈ S} ‖X - Y‖,    (3)

‖·‖ is the Euclidean norm, k is the iteration index, A is the set of spectrograms X whose amplitudes are equal to a given nonnegative value a[m, n] ≥ 0, i.e., |X[m, n]| = a[m, n], and C is the set of consistent spectrograms, X = F_w F_w† X (the set of fixed points of F_w F_w†). The projections onto the sets C and A are given by

    P_C(X) = F_w F_w† X,    (4)
    P_A(X) = a ⊙ X ⊘ |X|,    (5)

where |·|, ⊙ and ⊘ represent element-wise absolute value, multiplication and division, respectively, and the result of the division is replaced by zero when X[m, n] = 0. While the Griffin-Lim algorithm has been successfully applied to a number of applications, its poor adaptability to a specific situation might have restricted its practical performance. As the algorithm pays attention to consistency only, and no application-specific structure is considered in the projections, it is presumed that incorporating some data-specific structure can contribute to improving the quality of the estimated phase. In the next section, the time-domain ReLU is introduced to consider the harmonic structure of audio signals within the Griffin-Lim algorithm.

3. GRIFFIN-LIM ALGORITHM ASSISTED BY RECTIFIED LINEAR UNIT

Some audio signals, including speech, have a specific structure of harmonics. In this section, a combination of the Griffin-Lim algorithm and the time-domain ReLU is proposed with the hope of capturing a structure similar to that of speech signals.

3.1. Nonlinear harmonic regeneration

Spectrograms of speech and audio signals are often comprised of harmonic components whose frequencies are integer multiples of the fundamental frequency. This well-known structure, the harmonic structure, has been utilized in many signal processing methods, especially in speech enhancement.
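The alternating projections of Eq. (2), built from P_C in Eq. (4) and P_A in Eq. (5), can be sketched with SciPy's STFT pair, which computes a suitable dual synthesis window internally. Function names, the window choice, and all parameters are illustrative, and P_A here keeps a zero phase (rather than a zero value) at bins where X = 0, so that the output amplitude equals a everywhere.

```python
import numpy as np
from scipy.signal import stft, istft

def P_C(X, nperseg=512, hop=256):
    """Eq. (4): projection onto consistent spectrograms, F_w F_w† X."""
    _, x = istft(X, window="hann", nperseg=nperseg, noverlap=nperseg - hop)
    _, _, Xc = stft(x, window="hann", nperseg=nperseg, noverlap=nperseg - hop)
    return Xc

def P_A(X, a):
    """Eq. (5): keep the phase of X, replace its amplitude by a."""
    return a * np.exp(1j * np.angle(X))

def griffin_lim(a, n_iter=30, seed=0):
    """Eq. (2): X[k+1] = P_A(P_C(X[k])), from a random initial phase."""
    rng = np.random.default_rng(seed)
    X = a * np.exp(2j * np.pi * rng.random(a.shape))
    for _ in range(n_iter):
        X = P_A(P_C(X), a)
    return X
```

Because P_A is applied last, every iterate satisfies the amplitude constraint exactly, while consistency is only approached through P_C.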
One notable use of such structure is the harmonic regeneration technique [13-15], which artificially generates harmonics from enhanced speech signals to obtain a better estimate of the a priori signal-to-noise ratio (SNR). To generate the harmonics artificially, a nonlinear function is applied in the time domain. The half-wave rectifier, namely ReLU, is a typical choice for such a time-domain nonlinearity [13-15]:

    ReLU(x) = max{x, 0},    (6)

where the maximum operator is evaluated element-wise. By clipping the negative components, harmonics are generated as illustrated in Fig. 1, where the horizontal green and light blue bands in the middle row represent the magnitude of the generated harmonics. The important observation is that the phase spectrogram of the rectified sinusoid (in the bottom row of Fig. 1) has a certain structured pattern, corresponding to the generated harmonics, which does not exist in the original signal. That is, the phase of the harmonics can be aligned based on the fundamental-frequency component through the time-domain nonlinear operation. Although the relationship between the phase of natural audio signals and this artificially generated pattern is not so clear, it might be possible to improve phase recovery.
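The harmonic-generation effect of Eq. (6) can be checked numerically: half-wave rectifying a pure sinusoid creates spectral lines at integer multiples of the fundamental, which is the effect exploited by HRNR. The sampling rate and fundamental frequency below are arbitrary illustrative values.

```python
import numpy as np

fs, f0 = 8000, 100
t = np.arange(fs) / fs                  # exactly 1 s, so df = 1 Hz per bin
x = np.sin(2 * np.pi * f0 * t)          # pure tone: a single spectral line
x_relu = np.maximum(x, 0.0)             # Eq. (6): half-wave rectification

spec = np.abs(np.fft.rfft(x)) / len(x)          # normalized magnitudes
spec_r = np.abs(np.fft.rfft(x_relu)) / len(x)

# Magnitude at the 2nd harmonic (bin index 2*f0 since df = 1 Hz):
# essentially zero before rectification, clearly nonzero after
second_harmonic_before = spec[2 * f0]
second_harmonic_after = spec_r[2 * f0]
```

The Fourier series of a half-wave rectified sine contains a DC term, the fundamental, and the even harmonics, so energy appears at 2 f0, 4 f0, and so on.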
This is because the fundamental-frequency component is often the largest component, which should contain better information for the recovery. Then, the following question arises naturally: Can an element-wise nonlinear operation in the time domain help a phase recovery algorithm to improve its performance? To experimentally investigate this question, a combination of the Griffin-Lim algorithm and ReLU is proposed.

Fig. 1. Illustration of the amplitude/phase spectrograms corresponding to a sinusoid and its rectified counterpart (from top to bottom: time-domain signal, amplitude spectrogram, and phase spectrogram).

3.2. Proposed ReLU-assisted Griffin-Lim algorithm

Here, the time-domain ReLU is incorporated into the procedure of the Griffin-Lim algorithm. To do so, the following projection onto the set related to time-domain rectified signals is introduced:

    P_N(X) = F_w ReLU(F_w† X),    (7)

where N is the set of consistent spectrograms whose time-domain counterparts are nonnegative. By replacing the projection corresponding to consistency, P_C in Eq. (2), with this ReLU-combined variant P_N, the ReLU-assisted Griffin-Lim algorithm (ReLU-GLA) is proposed as the following procedure:

    X^[k+1] = P_A(P_N(X^[k])),    (8)

where the only difference from the original algorithm in Eq. (2) is the additional ReLU term in Eq. (7), which does not exist in Eq. (4). This slight modification, which does not increase the computational complexity thanks to the extremely cheap evaluation of ReLU, can contribute to the quality of the recovered phase to some extent, as shown in the next section. Note that the additional nonlinear distortion imposed by this operation does not remain in each intermediate result X^[k+1], because the projection onto the given amplitude spectrogram, P_A, completely removes such distortion by replacing the amplitude with the predetermined values. That is, the generated harmonics contribute only to the phase, and therefore it is safe to choose any nonlinear operation in the time domain.
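The proposed iteration of Eq. (8) can be sketched by inserting the rectifier of Eq. (6) between the inverse and forward STFTs of the consistency projection, here using SciPy's STFT pair. Function names and parameters are illustrative, and P_A again keeps a zero phase at bins where X = 0 so that the amplitude constraint holds exactly.

```python
import numpy as np
from scipy.signal import stft, istft

def P_N(X, nperseg=512, hop=256):
    """Eq. (7): F_w ReLU(F_w† X) -- rectify the time-domain signal."""
    _, x = istft(X, window="hann", nperseg=nperseg, noverlap=nperseg - hop)
    x = np.maximum(x, 0.0)              # Eq. (6), the time-domain ReLU
    _, _, Xn = stft(x, window="hann", nperseg=nperseg, noverlap=nperseg - hop)
    return Xn

def P_A(X, a):
    """Eq. (5): keep the phase of X, replace its amplitude by a."""
    return a * np.exp(1j * np.angle(X))

def relu_gla(a, X0, n_iter=10):
    """Eq. (8): X[k+1] = P_A(P_N(X[k])), e.g. started from noisy phase X0."""
    X = X0
    for _ in range(n_iter):
        X = P_A(P_N(X), a)
    return X
```

Compared with the ordinary iteration, the only extra cost per step is the element-wise maximum, which matches the paper's remark that the modification is computationally negligible.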
Here, ReLU was chosen as just a representative example, and any other nonlinearity can be incorporated in totally the same manner.

4. EXPERIMENT

In order to investigate the question raised in Section 3.1, a numerical experiment was performed. Test signals consisting of 100 male and 100 female speech signals [1], obtained from the TIMIT database [27], were corrupted by additive Gaussian noise whose amplitude was adjusted so that the SNRs of the simulated signals became 5, 10, 15 or 20 dB. The Gaussian noise was generated 10 times for each speech signal, and thus, in total, 2000 noisy signals were utilized for each SNR. These noisy signals were enhanced by Wiener filters constructed in the oracle condition (both signal and noise power at each time-frequency bin were known), where the STFT was calculated by the canonical tight variant of the 32 ms Hann window shifted by 16 ms. The iterations of the Griffin-Lim and the proposed algorithms were started from the observed noisy phase, and the projection P_A enforces the amplitude spectrogram to be the Wiener-filtered one. For the evaluation, scores of the short-time objective intelligibility (STOI) [28] were calculated as a perceptual measure of the enhanced speech signals.²

Fig. 2. Average scores of STOI for each iteration. The blue lines indicate the scores of the ordinary Griffin-Lim algorithm, while the red lines are those of the proposed ReLU-GLA. The SNRs of the input signals corresponding to each line are written within the figure.

The experimental results, summarized by the average STOI scores of the 2000 noisy speech signals for each SNR, are illustrated in Fig. 2, where the blue lines indicate the conventional GLA and the red ones correspond to the proposed ReLU-GLA. For all four SNRs, the proposed ReLU-GLA attained better scores at the first iteration, and then the difference between the scores of both methods seems to decrease as the iteration number increases.
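The oracle Wiener mask used in this experiment can be sketched as follows: with the clean-speech and noise spectrograms S and N both known, the gain at each bin is |S|²/(|S|² + |N|²), and it is applied to the noisy observation, which retains the noisy phase. The signals below are synthetic stand-ins, not TIMIT data, and the exact construction in the experiment may differ in detail.

```python
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 200 * t)              # "speech" stand-in signal
n = 0.3 * rng.standard_normal(fs)            # additive Gaussian noise

_, _, S = stft(s, window="hann", nperseg=512, noverlap=256)
_, _, N = stft(n, window="hann", nperseg=512, noverlap=256)
_, _, Y = stft(s + n, window="hann", nperseg=512, noverlap=256)

# Oracle Wiener gain: signal and noise power at every bin are known
G = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-16)
X_wiener = G * Y     # enhanced amplitude, still carrying the noisy phase
```

This X_wiener plays the role of the initial value of the iterations: its amplitude defines the set A, while its noisy phase is what the (ReLU-assisted) Griffin-Lim iterations try to improve.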
Although this result indicates that some positive effect of incorporating the time-domain ReLU into the phase recovery algorithm exists, the effect for each speech signal cannot be confirmed from this figure because each line represents the average over the 2000 trials. Therefore, the results are further illustrated by histograms for contrasting the individual effects. Since the difference between the scores diminished for large iteration numbers, the results of the first and 10th iterations are utilized to construct the histograms.

² STOI was chosen in this paper because the performance of phase recovery cannot be measured by a quantity sensitive to the difference of constant phase, such as SNR. Other popular measures, including PESQ, were not calculated because, unfortunately, the first author only had extremely limited time before the deadlines of the initial and revised submissions.
Fig. 3. Histograms of the difference of STOI improvement. The STOI of the proposed ReLU-assisted algorithm was subtracted by that of the ordinary Griffin-Lim algorithm. Both algorithms were iterated once from the initial values, i.e., these results were obtained by the single-shot projections P_A(P_C(X_Wiener)) and P_A(P_N(X_Wiener)). The vertical red lines represent the position of 0, and therefore the bars at the right side of these red lines indicate the results where the proposed method was better than the conventional one.

Fig. 4. Histograms of the difference of STOI improvement. The algorithms were iterated 10 times from the initial values.

The histograms of the difference of STOI scores are shown in Figs. 3 and 4. For each signal, the score of the proposed algorithm was subtracted by that of the conventional GLA to clarify the difference between them. Therefore, the center of the horizontal axis (represented by the vertical red line) means that the improvements of STOI achieved by both algorithms were the same. A positive value on the horizontal axis (right side of the red line) indicates that the proposed ReLU-GLA was better than the conventional one, and a negative value indicates the opposite situation. From Fig. 3, it can be confirmed that, at the first iteration, the proposed algorithm improved most of the test samples more than the conventional GLA did. That is, the single-shot projection P_A(P_N(X_Wiener)) improved STOI more than the conventional projection P_A(P_C(X_Wiener)), where X_Wiener represents the noisy spectrogram whose amplitude was enhanced by the Wiener filter. This result is important because projecting the Wiener-filtered data once may improve the intelligibility without the pain of iteration. Indeed, the STOI scores of all 8000 samples (2000 per SNR) were improved from those of the initial values X_Wiener with the observed phase. Although the effect of the time-domain nonlinearity diminished after some iterations, its positive effect can also be seen at the 10th iteration, as shown in Fig. 4. These results indicate that the time-domain ReLU can assist the Griffin-Lim algorithm in terms of STOI at the beginning of the iteration, and its effect remains for some iterations.

The reason for this positive effect of ReLU might be the pulse-train-like waveform of rectified signals, as in Fig. 1. As considered in the source-filter model, speech signals consist of a sequence of pulses. Then, an appropriate phase for a speech signal should recover such sequential pulses, while an inappropriate one may not correspond to pulses. The time-domain ReLU might have the power to align the phases of the harmonics so that the waveform in the time domain becomes more pulse-like, as in Fig. 1. If the above discussion is correct, then one can consider a more sophisticated nonlinearity which shapes the waveform closer to the target signals, maybe by learning from a dataset, to obtain a better phase recovery method.

The reason for the diminishing effect of the nonlinearity should be that the Griffin-Lim algorithm does not consider the observed phase within its procedure. As in Eq. (2), and also in Eq. (8), the phase spectrogram is modified without considering the observed phase. That is, the phase is close to the observed one only in the first few iterations, where the effect of the initial value remains, and the resulting phase after a number of iterations is not directly related to the observation. A phase recovery method considering data fidelity to phase, unlike the Griffin-Lim algorithm, might receive more benefit from the time-domain ReLU, or any other nonlinearity, and therefore seeking such an algorithm together with an effective time-domain nonlinear function for harmonic regeneration should be the next direction of this research.

5. CONCLUSIONS

In this paper, inspired by the DNN-based phase recovery and the harmonic regeneration technique for speech enhancement, a variant of the well-known Griffin-Lim algorithm combined with the time-domain ReLU was proposed. The effectiveness of the time-domain nonlinearity for speech denoising in terms of STOI was experimentally confirmed. The experimental results shed light on the possibility of utilizing such a time-domain nonlinear function within a signal reconstruction process (or utilizing an inverse-STFT layer within the network, in the words of DNN). Both ReLU and the Griffin-Lim algorithm are just one example of the possibilities, and searching for a better combination, as well as a DNN model containing a time-domain representation within the network, remains as future work.

6. ACKNOWLEDGMENT

The first author would like to thank Dr. Ryoichi Miyazaki for his support on prior works and helpful comments on the time-domain nonlinear operation in speech enhancement.
7. REFERENCES

[1] P. Mowlaee, J. Kulmer, J. Stahl, and F. Mayer, Single Channel Phase-Aware Signal Processing in Speech Communication: Theory and Practice, Wiley.
[2] T. Gerkmann, M. Krawczyk-Becker, and J. Le Roux, "Phase processing for single-channel speech enhancement: History and recent advances," IEEE Signal Process. Mag., vol. 32, no. 2, Mar.
[3] P. Mowlaee, R. Saeidi, and Y. Stylianou, "Advances in phase-aware signal processing in speech communication," Speech Commun., vol. 81, pp. 1-29.
[4] D. Griffin and J. Lim, "Signal estimation from modified short-time Fourier transform," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, Apr.
[5] J. Le Roux, H. Kameoka, N. Ono, and S. Sagayama, "Fast signal reconstruction from magnitude STFT spectrogram based on spectrogram consistency," in Int. Conf. Digital Audio Effects (DAFx-10), Sep.
[6] N. Perraudin, P. Balazs, and P. L. Søndergaard, "A fast Griffin-Lim algorithm," in IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), Oct. 2013.
[7] M. Krawczyk and T. Gerkmann, "STFT phase reconstruction in voiced speech for an improved single-channel speech enhancement," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, Dec.
[8] Y. Wakabayashi, T. Fukumori, M. Nakayama, T. Nishiura, and Y. Yamashita, "Single-channel speech enhancement with phase reconstruction based on phase distortion averaging," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 26, no. 9, Sep.
[9] P. Magron, R. Badeau, and B. David, "Model-based STFT phase recovery for audio source separation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 26, no. 6, June.
[10] Y. Masuyama, K. Yatabe, and Y. Oikawa, "Model-based phase recovery of spectrograms via optimization on Riemannian manifolds," in Int. Workshop Acoust. Signal Enhance. (IWAENC), Sep.
[11] K. Oyamada, H. Kameoka, T. Kaneko, K. Tanaka, N. Hojo, and H. Ando, "Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms," arXiv preprint, Sep.
[12] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press.
[13] C. Plapous, C. Marro, and P. Scalart, "Improved signal-to-noise ratio estimation for speech enhancement," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 6, Nov.
[14] M. Une and R. Miyazaki, "Evaluation of sound quality and speech recognition performance using harmonic regeneration for various noise reduction techniques," in RISP Int. Workshop Nonlinear Circuits, Commun. Signal Process. (NCSP), Mar. 2017.
[15] M. Une and R. Miyazaki, "Musical-noise-free speech enhancement with low speech distortion by biased harmonic regeneration technique," in Int. Workshop Acoust. Signal Enhance. (IWAENC), Sep.
[16] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. Fourteenth Int. Conf. Artif. Intell. Stat., Apr. 2011, vol. 15.
[17] S. Sonoda and N. Murata, "Neural network with unbounded activation functions is universal approximator," Appl. Comput. Harmon. Anal., vol. 43, no. 2.
[18] H. G. Feichtinger and T. Strohmer, Eds., Gabor Analysis and Algorithms: Theory and Applications, Birkhäuser Boston, Boston, MA.
[19] K. Gröchenig, Foundations of Time-Frequency Analysis, Birkhäuser Boston, Boston, MA.
[20] P. L. Søndergaard, "Gabor frames by sampling and periodization," Adv. Comput. Math., vol. 27, no. 4.
[21] O. Christensen, Frames and Bases: An Introductory Course, Birkhäuser.
[22] K. Yatabe and Y. Oikawa, "Phase corrected total variation for audio signals," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2018.
[23] K. Yatabe and D. Kitamura, "Determined blind source separation via proximal splitting algorithm," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2018.
[24] Y. Masuyama, K. Yatabe, and Y. Oikawa, "Low-rankness of complex-valued spectrogram and its application to phase-aware audio processing," (submitted).
[25] Y. Masuyama, K. Yatabe, and Y. Oikawa, "Griffin-Lim like phase recovery via alternating direction method of multipliers," (submitted).
[26] A. Cegielski, Iterative Methods for Fixed Point Problems in Hilbert Spaces, Springer.
[27] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus CDROM, NIST.
[28] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "An algorithm for intelligibility prediction of time-frequency weighted noisy speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, Sep.
Joint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data Colin Vaz, Vikram Ramanarayanan, and Shrikanth Narayanan Ming Hsieh Department of Electrical Engineering University
More informationNonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms
Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms Masahiro Nakano 1, Jonathan Le Roux 2, Hirokazu Kameoka 2,YuKitano 1, Nobutaka Ono 1,
More informationBayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement
Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement Patrick J. Wolfe Department of Engineering University of Cambridge Cambridge CB2 1PZ, UK pjw47@eng.cam.ac.uk Simon J. Godsill
More informationNon-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics
Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud To cite this version: Xiaofei Li, Laurent Girin, Sharon Gannot,
More informationOptimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator
1 Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator Israel Cohen Lamar Signal Processing Ltd. P.O.Box 573, Yokneam Ilit 20692, Israel E-mail: icohen@lamar.co.il
More informationFast Angular Synchronization for Phase Retrieval via Incomplete Information
Fast Angular Synchronization for Phase Retrieval via Incomplete Information Aditya Viswanathan a and Mark Iwen b a Department of Mathematics, Michigan State University; b Department of Mathematics & Department
More informationProbabilistic Inference of Speech Signals from Phaseless Spectrograms
Probabilistic Inference of Speech Signals from Phaseless Spectrograms Kannan Achan, Sam T. Roweis, Brendan J. Frey Machine Learning Group University of Toronto Abstract Many techniques for complex speech
More informationNon-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology
Non-Negative Matrix Factorization And Its Application to Audio Tuomas Virtanen Tampere University of Technology tuomas.virtanen@tut.fi 2 Contents Introduction to audio signals Spectrogram representation
More informationAudible sound field visualization by using Schlieren technique
Audible sound field visualization by using Schlieren technique Nachanant Chitanont, Kohei Yatabe and Yuhiro Oikawa Department of Intermedia Art and Science, Weda University, Tokyo, Japan Paper Number:
More informationTensor-Train Long Short-Term Memory for Monaural Speech Enhancement
1 Tensor-Train Long Short-Term Memory for Monaural Speech Enhancement Suman Samui, Indrajit Chakrabarti, and Soumya K. Ghosh, arxiv:1812.10095v1 [cs.sd] 25 Dec 2018 Abstract In recent years, Long Short-Term
More informationGaussian Processes for Audio Feature Extraction
Gaussian Processes for Audio Feature Extraction Dr. Richard E. Turner (ret26@cam.ac.uk) Computational and Biological Learning Lab Department of Engineering University of Cambridge Machine hearing pipeline
More informationNMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION. Julian M. Becker, Christian Sohn and Christian Rohlfing
NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION Julian M. ecker, Christian Sohn Christian Rohlfing Institut für Nachrichtentechnik RWTH Aachen University D-52056
More informationEstimating Correlation Coefficient Between Two Complex Signals Without Phase Observation
Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation Shigeki Miyabe 1B, Notubaka Ono 2, and Shoji Makino 1 1 University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki
More informationSound Recognition in Mixtures
Sound Recognition in Mixtures Juhan Nam, Gautham J. Mysore 2, and Paris Smaragdis 2,3 Center for Computer Research in Music and Acoustics, Stanford University, 2 Advanced Technology Labs, Adobe Systems
More informationSingle Channel Signal Separation Using MAP-based Subspace Decomposition
Single Channel Signal Separation Using MAP-based Subspace Decomposition Gil-Jin Jang, Te-Won Lee, and Yung-Hwan Oh 1 Spoken Language Laboratory, Department of Computer Science, KAIST 373-1 Gusong-dong,
More informationSTRUCTURE-AWARE DICTIONARY LEARNING WITH HARMONIC ATOMS
19th European Signal Processing Conference (EUSIPCO 2011) Barcelona, Spain, August 29 - September 2, 2011 STRUCTURE-AWARE DICTIONARY LEARNING WITH HARMONIC ATOMS Ken O Hanlon and Mark D.Plumbley Queen
More informationModifying Voice Activity Detection in Low SNR by correction factors
Modifying Voice Activity Detection in Low SNR by correction factors H. Farsi, M. A. Mozaffarian, H.Rahmani Department of Electrical Engineering University of Birjand P.O. Box: +98-9775-376 IRAN hfarsi@birjand.ac.ir
More informationEstimation Error Bounds for Frame Denoising
Estimation Error Bounds for Frame Denoising Alyson K. Fletcher and Kannan Ramchandran {alyson,kannanr}@eecs.berkeley.edu Berkeley Audio-Visual Signal Processing and Communication Systems group Department
More informationA POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL
A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig
More informationEstimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition
Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition Seema Sud 1 1 The Aerospace Corporation, 4851 Stonecroft Blvd. Chantilly, VA 20151 Abstract
More informationRecovery of Compactly Supported Functions from Spectrogram Measurements via Lifting
Recovery of Compactly Supported Functions from Spectrogram Measurements via Lifting Mark Iwen markiwen@math.msu.edu 2017 Friday, July 7 th, 2017 Joint work with... Sami Merhi (Michigan State University)
More informationEUSIPCO
EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,
More informationA Log-Frequency Approach to the Identification of the Wiener-Hammerstein Model
A Log-Frequency Approach to the Identification of the Wiener-Hammerstein Model The MIT Faculty has made this article openly available Please share how this access benefits you Your story matters Citation
More informationSpatially adaptive alpha-rooting in BM3D sharpening
Spatially adaptive alpha-rooting in BM3D sharpening Markku Mäkitalo and Alessandro Foi Department of Signal Processing, Tampere University of Technology, P.O. Box FIN-553, 33101, Tampere, Finland e-mail:
More informationDecompositions of frames and a new frame identity
Decompositions of frames and a new frame identity Radu Balan a, Peter G. Casazza b, Dan Edidin c and Gitta Kutyniok d a Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540, USA; b Department
More informationRobust Sound Event Detection in Continuous Audio Environments
Robust Sound Event Detection in Continuous Audio Environments Haomin Zhang 1, Ian McLoughlin 2,1, Yan Song 1 1 National Engineering Laboratory of Speech and Language Information Processing The University
More informationIMPROVED MULTI-MICROPHONE NOISE REDUCTION PRESERVING BINAURAL CUES
IMPROVED MULTI-MICROPHONE NOISE REDUCTION PRESERVING BINAURAL CUES Andreas I. Koutrouvelis Richard C. Hendriks Jesper Jensen Richard Heusdens Circuits and Systems (CAS) Group, Delft University of Technology,
More informationDeep Learning: Approximation of Functions by Composition
Deep Learning: Approximation of Functions by Composition Zuowei Shen Department of Mathematics National University of Singapore Outline 1 A brief introduction of approximation theory 2 Deep learning: approximation
More informationSEPARATION OF ACOUSTIC SIGNALS USING SELF-ORGANIZING NEURAL NETWORKS. Temujin Gautama & Marc M. Van Hulle
SEPARATION OF ACOUSTIC SIGNALS USING SELF-ORGANIZING NEURAL NETWORKS Temujin Gautama & Marc M. Van Hulle K.U.Leuven, Laboratorium voor Neuro- en Psychofysiologie Campus Gasthuisberg, Herestraat 49, B-3000
More informationEE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)
EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in
More informationNoise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm
EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 0-05 June 2008. Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic
More informationImage Denoising using Uniform Curvelet Transform and Complex Gaussian Scale Mixture
EE 5359 Multimedia Processing Project Report Image Denoising using Uniform Curvelet Transform and Complex Gaussian Scale Mixture By An Vo ISTRUCTOR: Dr. K. R. Rao Summer 008 Image Denoising using Uniform
More informationA Probability Model for Interaural Phase Difference
A Probability Model for Interaural Phase Difference Michael I. Mandel, Daniel P.W. Ellis Department of Electrical Engineering Columbia University, New York, New York {mim,dpwe}@ee.columbia.edu Abstract
More informationScalable audio separation with light Kernel Additive Modelling
Scalable audio separation with light Kernel Additive Modelling Antoine Liutkus 1, Derry Fitzgerald 2, Zafar Rafii 3 1 Inria, Université de Lorraine, LORIA, UMR 7503, France 2 NIMBUS Centre, Cork Institute
More informationJOINT ACOUSTIC AND SPECTRAL MODELING FOR SPEECH DEREVERBERATION USING NON-NEGATIVE REPRESENTATIONS
JOINT ACOUSTIC AND SPECTRAL MODELING FOR SPEECH DEREVERBERATION USING NON-NEGATIVE REPRESENTATIONS Nasser Mohammadiha Paris Smaragdis Simon Doclo Dept. of Medical Physics and Acoustics and Cluster of Excellence
More informationIndependent Component Analysis and Unsupervised Learning. Jen-Tzung Chien
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood
More informationApproximately dual frames in Hilbert spaces and applications to Gabor frames
Approximately dual frames in Hilbert spaces and applications to Gabor frames Ole Christensen and Richard S. Laugesen October 22, 200 Abstract Approximately dual frames are studied in the Hilbert space
More informationOver-enhancement Reduction in Local Histogram Equalization using its Degrees of Freedom. Alireza Avanaki
Over-enhancement Reduction in Local Histogram Equalization using its Degrees of Freedom Alireza Avanaki ABSTRACT A well-known issue of local (adaptive) histogram equalization (LHE) is over-enhancement
More informationApplication of the Tuned Kalman Filter in Speech Enhancement
Application of the Tuned Kalman Filter in Speech Enhancement Orchisama Das, Bhaswati Goswami and Ratna Ghosh Department of Instrumentation and Electronics Engineering Jadavpur University Kolkata, India
More informationNon-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs Paris Smaragdis TR2004-104 September
More informationAnalysis of Communication Systems Using Iterative Methods Based on Banach s Contraction Principle
Analysis of Communication Systems Using Iterative Methods Based on Banach s Contraction Principle H. Azari Soufiani, M. J. Saberian, M. A. Akhaee, R. Nasiri Mahallati, F. Marvasti Multimedia Signal, Sound
More informationCONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT
CONOLUTIE NON-NEGATIE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT Paul D. O Grady Barak A. Pearlmutter Hamilton Institute National University of Ireland, Maynooth Co. Kildare, Ireland. ABSTRACT Discovering
More informationEMPLOYING PHASE INFORMATION FOR AUDIO DENOISING. İlker Bayram. Istanbul Technical University, Istanbul, Turkey
EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING İlker Bayram Istanbul Technical University, Istanbul, Turkey ABSTRACT Spectral audio denoising methods usually make use of the magnitudes of a time-frequency
More informationDenoising Gabor Transforms
1 Denoising Gabor Transforms James S. Walker Abstract We describe denoising one-dimensional signals by thresholding Blackman windowed Gabor transforms. This method is compared with Gauss-windowed Gabor
More informationIMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES
IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES Bere M. Gur Prof. Christopher Niezreci Prof. Peter Avitabile Structural Dynamics and Acoustic Systems
More informationIndependent Component Analysis and Unsupervised Learning
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent
More informationClassification of Hand-Written Digits Using Scattering Convolutional Network
Mid-year Progress Report Classification of Hand-Written Digits Using Scattering Convolutional Network Dongmian Zou Advisor: Professor Radu Balan Co-Advisor: Dr. Maneesh Singh (SRI) Background Overview
More informationDept. Electronics and Electrical Engineering, Keio University, Japan. NTT Communication Science Laboratories, NTT Corporation, Japan.
JOINT SEPARATION AND DEREVERBERATION OF REVERBERANT MIXTURES WITH DETERMINED MULTICHANNEL NON-NEGATIVE MATRIX FACTORIZATION Hideaki Kagami, Hirokazu Kameoka, Masahiro Yukawa Dept. Electronics and Electrical
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationNONNEGATIVE MATRIX FACTORIZATION WITH TRANSFORM LEARNING. Dylan Fagot, Herwig Wendt and Cédric Févotte
NONNEGATIVE MATRIX FACTORIZATION WITH TRANSFORM LEARNING Dylan Fagot, Herwig Wendt and Cédric Févotte IRIT, Université de Toulouse, CNRS, Toulouse, France firstname.lastname@irit.fr ABSTRACT Traditional
More informationPhase-dependent anisotropic Gaussian model for audio source separation
Phase-dependent anisotropic Gaussian model for audio source separation Paul Magron, Roland Badeau, Bertrand David To cite this version: Paul Magron, Roland Badeau, Bertrand David. Phase-dependent anisotropic
More informationFinite Frame Quantization
Finite Frame Quantization Liam Fowl University of Maryland August 21, 2018 1 / 38 Overview 1 Motivation 2 Background 3 PCM 4 First order Σ quantization 5 Higher order Σ quantization 6 Alternative Dual
More informationarxiv: v1 [physics.optics] 5 Mar 2012
Designing and using prior knowledge for phase retrieval Eliyahu Osherovich, Michael Zibulevsky, and Irad Yavneh arxiv:1203.0879v1 [physics.optics] 5 Mar 2012 Computer Science Department, Technion Israel
More informationPhoneme segmentation based on spectral metrics
Phoneme segmentation based on spectral metrics Xianhua Jiang, Johan Karlsson, and Tryphon T. Georgiou Introduction We consider the classic problem of segmenting speech signals into individual phonemes.
More informationCOMPLEX WAVELET TRANSFORM IN SIGNAL AND IMAGE ANALYSIS
COMPLEX WAVELET TRANSFORM IN SIGNAL AND IMAGE ANALYSIS MUSOKO VICTOR, PROCHÁZKA ALEŠ Institute of Chemical Technology, Department of Computing and Control Engineering Technická 905, 66 8 Prague 6, Cech
More informationTHE task of identifying the environment in which a sound
1 Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification Victor Bisot, Romain Serizel, Slim Essid, and Gaël Richard Abstract In this paper, we study the usefulness of various
More informationMULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION. Ikuo Degawa, Kei Sato, Masaaki Ikehara
MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION Ikuo Degawa, Kei Sato, Masaaki Ikehara EEE Dept. Keio University Yokohama, Kanagawa 223-8522 Japan E-mail:{degawa,
More informationReal-Time Spectrogram Inversion Using Phase Gradient Heap Integration
Real-Time Spectrogram Inversion Using Phase Gradient Heap Integration Zdeněk Průša 1 and Peter L. Søndergaard 2 1,2 Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria 2 Oticon
More informationDeep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści
Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?
More informationSpectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors
IEEE SIGNAL PROCESSING LETTERS 1 Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors Nasser Mohammadiha, Student Member, IEEE, Rainer Martin, Fellow, IEEE, and Arne Leijon,
More informationSIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS
TO APPEAR IN IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 22 25, 23, UK SIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS Nasser Mohammadiha
More informationA State-Space Approach to Dynamic Nonnegative Matrix Factorization
1 A State-Space Approach to Dynamic Nonnegative Matrix Factorization Nasser Mohammadiha, Paris Smaragdis, Ghazaleh Panahandeh, Simon Doclo arxiv:179.5v1 [cs.lg] 31 Aug 17 Abstract Nonnegative matrix factorization
More informationImproved Method for Epoch Extraction in High Pass Filtered Speech
Improved Method for Epoch Extraction in High Pass Filtered Speech D. Govind Center for Computational Engineering & Networking Amrita Vishwa Vidyapeetham (University) Coimbatore, Tamilnadu 642 Email: d
More informationAdaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise
Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Minru Bai(x T) College of Mathematics and Econometrics Hunan University Joint work with Xiongjun Zhang, Qianqian Shao June 30,
More informationSparse Time-Frequency Transforms and Applications.
Sparse Time-Frequency Transforms and Applications. Bruno Torrésani http://www.cmi.univ-mrs.fr/~torresan LATP, Université de Provence, Marseille DAFx, Montreal, September 2006 B. Torrésani (LATP Marseille)
More informationDetection of Overlapping Acoustic Events Based on NMF with Shared Basis Vectors
Detection of Overlapping Acoustic Events Based on NMF with Shared Basis Vectors Kazumasa Yamamoto Department of Computer Science Chubu University Kasugai, Aichi, Japan Email: yamamoto@cs.chubu.ac.jp Chikara
More informationSingle Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification
Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification Hafiz Mustafa and Wenwu Wang Centre for Vision, Speech and Signal Processing (CVSSP) University of Surrey,
More informationALTERNATIVE OBJECTIVE FUNCTIONS FOR DEEP CLUSTERING
ALTERNATIVE OBJECTIVE FUNCTIONS FOR DEEP CLUSTERING Zhong-Qiu Wang,2, Jonathan Le Roux, John R. Hershey Mitsubishi Electric Research Laboratories (MERL), USA 2 Department of Computer Science and Engineering,
More information2 Regularized Image Reconstruction for Compressive Imaging and Beyond
EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement
More informationImproved noise power spectral density tracking by a MAP-based postprocessor
Improved noise power spectral density tracking by a MAP-based postprocessor Aleksej Chinaev, Alexander Krueger, Dang Hai Tran Vu, Reinhold Haeb-Umbach University of Paderborn, Germany March 8th, 01 Computer
More informationDesign Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation
CENTER FOR COMPUTER RESEARCH IN MUSIC AND ACOUSTICS DEPARTMENT OF MUSIC, STANFORD UNIVERSITY REPORT NO. STAN-M-4 Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation
More informationCharacterization of Gradient Dominance and Regularity Conditions for Neural Networks
Characterization of Gradient Dominance and Regularity Conditions for Neural Networks Yi Zhou Ohio State University Yingbin Liang Ohio State University Abstract zhou.1172@osu.edu liang.889@osu.edu The past
More informationChirp Transform for FFT
Chirp Transform for FFT Since the FFT is an implementation of the DFT, it provides a frequency resolution of 2π/N, where N is the length of the input sequence. If this resolution is not sufficient in a
More informationLECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES
LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES Abstract March, 3 Mads Græsbøll Christensen Audio Analysis Lab, AD:MT Aalborg University This document contains a brief introduction to pitch
More informationORTHOGONALITY-REGULARIZED MASKED NMF FOR LEARNING ON WEAKLY LABELED AUDIO DATA. Iwona Sobieraj, Lucas Rencker, Mark D. Plumbley
ORTHOGONALITY-REGULARIZED MASKED NMF FOR LEARNING ON WEAKLY LABELED AUDIO DATA Iwona Sobieraj, Lucas Rencker, Mark D. Plumbley University of Surrey Centre for Vision Speech and Signal Processing Guildford,
More informationFast algorithms for informed source separation
Fast algorithms for informed source separation Augustin Lefèvre augustin.lefevre@uclouvain.be September, 10th 2013 Source separation in 5 minutes Recover source estimates from a mixed signal We consider
More informationNOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group
NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION M. Schwab, P. Noll, and T. Sikora Technical University Berlin, Germany Communication System Group Einsteinufer 17, 1557 Berlin (Germany) {schwab noll
More informationSparse linear models
Sparse linear models Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 2/22/2016 Introduction Linear transforms Frequency representation Short-time
More information