Integrating neural network based beamforming and weighted prediction error dereverberation
Interspeech 2018, September 2018, Hyderabad

Integrating neural network based beamforming and weighted prediction error dereverberation

Lukas Drude 1, Christoph Boeddeker 1, Jahn Heymann 1, Reinhold Haeb-Umbach 1, Keisuke Kinoshita 2, Marc Delcroix 2, Tomohiro Nakatani 2

1 Paderborn University, Department of Communications Engineering, Paderborn, Germany
2 NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan
{drude, boeddeker, heymann, haeb}@nt.upb.de, {kinoshita.k, marc.delcroix, nakatani.tomohiro}@lab.ntt.co.jp

Abstract

The weighted prediction error (WPE) algorithm has proven to be a very successful dereverberation method for the REVERB challenge. Likewise, neural network based mask estimation for beamforming demonstrated very good noise suppression in the CHiME 3 and CHiME 4 challenges. Recently, it has been shown that this estimator can also be trained to perform dereverberation and denoising jointly. However, up to now a comparison of a neural beamformer and WPE is still missing, as is an investigation into a combination of the two. Therefore, we here provide an extensive evaluation of both and consequently propose variants to integrate deep neural network based beamforming with WPE. For these integrated variants we identify a consistent word error rate (WER) reduction on two distinct databases. In particular, our study shows that deep learning based beamforming benefits from a model-based dereverberation technique (i.e. WPE) and vice versa. Our key findings are: (a) Neural beamforming yields lower WERs than WPE the more channels and noise are present. (b) Integration of WPE and a neural beamformer consistently outperforms all stand-alone systems.

Index Terms: speech recognition, speech enhancement, denoising, dereverberation

1. Introduction

Automatic speech recognition (ASR) has found wide-spread acceptance and works astonishingly well in close-talking scenarios.
Although a lot of research is devoted to far-field technologies and products are already commercially available, ASR in reverberant and noisy environments remains challenging. These impairments can be addressed with a dereverberating and/or denoising front-end. WPE is a compelling algorithm to blindly dereverberate acoustic signals based on long-term linear prediction. Proposed as early as 2008, it gained constant attention in subsequent years [1, 2]. Possibly most prominent is its use in the Google Home speech assistant hardware in online conditions [3, 4]. Signal distortions due to noise are addressed quite differently. Besides denoising auto-encoders [5, 6] and dictionary based approaches [7], beamforming is very successful [8, 9]. As of recently, deep neural networks (DNNs) are employed to more robustly estimate masks which are then used to calculate speech and noise covariance matrices for beamforming. Most notably, all top performing systems of the CHiME 4 challenge employed some form of neural beamforming [10, 11, 12, 13]. Interestingly, when the mask estimator is trained to distinguish between early arriving speech vs. late arriving speech and noise, the beamformer dereverberates and denoises the observation [14]. However, there is limited research on integrating dereverberation and beamforming. Delcroix et al. compare consecutive execution of either minimum variance distortionless response (MVDR) beamforming followed by WPE or vice versa [15] on REVERB challenge data [16] but do not consider any further integration. They estimate noise covariance matrices on the initial and final 10 frames of each utterance but do not use any kind of DNN or mask estimation. Kinoshita et al. evaluate how WPE profits from side information provided by a DNN [17]. They conclude that the DNN does not improve overall dereverberation performance over WPE but allows WPE to operate on much shorter block sizes without performance degradation. Cohen et al.
use WPE and MVDR beamforming in an interesting two-step system: WPE dereverberates the signal first and provides a distortion covariance matrix based on the estimate of the reverberation tail. Then, an MVDR beamformer uses the covariance matrix of the reverberation tail instead of a noise covariance matrix [18]. Ito et al. elegantly integrate WPE and model based blind source separation but again omit any discriminatively trained model for mask estimation [19]. To the best of our knowledge an integration of WPE and neural network based beamforming is still missing. Therefore we propose to combine neural network based generalized eigenvalue (GEV) beamforming and WPE dereverberation and assess consecutive and integrated combinations. To provide an in-depth analysis we evaluate all systems on the REVERB challenge data [16], which generally favors dereverberation algorithms, and on a database composed of Wall Street Journal (WSJ) utterances [20] with VoiceHome room impulse responses (RIRs) and noise [21, 22], which tends to favor beamforming approaches.

2. Scenario and signal model

Let an observed signal vector y_{t,f} in the short time Fourier transformation (STFT) domain cover D microphone channels, where t and f are the time frame index and frequency bin index, respectively. The observation may be impaired by (convolutive) reverberation and additive noise n_{t,f}. We here assume that the early part of the RIR is beneficial whereas the reverberation tail hinders understanding. Therefore, we consider the first 50 ms after the main peak of the RIR as h^(early) and the remaining part as h^(tail). Consequently, the mixing process in the STFT domain is modeled as follows:

    y_{t,f} = x_{t,f}^(early) + x_{t,f}^(tail) + n_{t,f},    (1)

where x^(early) and x^(tail) are the STFTs of the source signal convolved with the early RIR and with the late reflections, respectively. In essence, we explicitly allow RIRs longer than the length of a DFT window.
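As a concrete illustration, the time-domain analogue of Eq. (1) can be sketched in NumPy by splitting an RIR at 50 ms after its main peak. This is a minimal sketch with synthetic toy signals; all names, lengths and the exponential-decay RIR are illustrative, not the data used in the paper:

```python
import numpy as np

def split_rir(h, fs=16000, early_ms=50):
    """Split an RIR into the early part (main peak + 50 ms) and the tail."""
    peak = int(np.argmax(np.abs(h)))
    cut = peak + int(early_ms * fs / 1000)
    h_early, h_tail = h.copy(), h.copy()
    h_early[cut:] = 0.0   # keep only the first 50 ms after the main peak
    h_tail[:cut] = 0.0    # keep only the late reflections
    return h_early, h_tail

# toy example: a 1 s source, an exponentially decaying random RIR, white noise
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(fs)
h = np.exp(-np.arange(fs // 2) / 800.0) * rng.standard_normal(fs // 2)
h_early, h_tail = split_rir(h, fs)
x_early = np.convolve(s, h_early)[: len(s)]
x_tail = np.convolve(s, h_tail)[: len(s)]
n = 0.01 * rng.standard_normal(len(s))
y = x_early + x_tail + n   # Eq. (1), here in the time domain for one channel
```

Applying an STFT to y, x_early, x_tail and n per channel yields the quantities of Eq. (1).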
3. Baseline: WPE

The underlying idea of WPE is to estimate the reverberation tail of the signal and subtract it from the observation to obtain a maximum likelihood estimate of early arriving speech. Let us start by defining how to obtain a single channel estimate with given filter weights g_{τ,f,d,d'}:

    x̂^(early)_{t,f,d} = y_{t,f,d} − Σ_{τ=Δ}^{Δ+K−1} Σ_{d'} g*_{τ,f,d,d'} y_{t−τ,f,d'},    i.e.    x̂^(early)_{t,f} = y_{t,f} − G_f^H ỹ_{t−Δ,f},    (2)

where Δ > 0 is a minimum delay to avoid removing correlations caused by the speech source, K is the number of taps used for estimation and d is the sensor index. To simplify notation, G_f ∈ C^{DK×D} and ỹ_{t−Δ,f} ∈ C^{DK×1} are stacked representations of the filter weights and the observations. WPE maximizes the likelihood of a model under the assumption that each direct signal is a realization of a zero-mean complex (proper) Gaussian with an unknown time-varying variance λ_{t,f}:

    p(x^(early)_{t,f,d}; λ_{t,f}) = CN(x^(early)_{t,f,d}; 0, λ_{t,f}).    (3)

The maximum likelihood optimization does not lead to a closed form solution. However, an iterative procedure alternates between estimating the filter coefficients G_f and the time-varying variance λ_{t,f}:

    Step 1)  λ_{t,f} = 1 / ((1 + 2δ) D) · Σ_{τ=t−δ}^{t+δ} Σ_d |x̂^(early)_{τ,f,d}|²,    (4)

    Step 2)  R_f = Σ_t ỹ_{t−Δ,f} ỹ^H_{t−Δ,f} / λ_{t,f} ∈ C^{DK×DK},    (5)
             P_f = Σ_t ỹ_{t−Δ,f} y^H_{t,f} / λ_{t,f} ∈ C^{DK×D},    (6)
             G_f = R_f^{−1} P_f ∈ C^{DK×D}.    (7)

Here, we use a context of ±δ frames to improve the variance estimate as proposed by Nakatani et al. [23].

4. Baseline: Neural beamforming

Neural beamforming combines a DNN-based mask estimator with an analytic formulation to obtain a beamforming vector for speech enhancement [14]. Firstly, a mask estimation network is trained on single channel magnitude spectra to guarantee independence from the microphone array configuration. The training minimizes a cross entropy loss with an oracle speech and distortion mask. For denoising, the mask estimator is trained to differentiate between speech and noise. As demonstrated in an earlier study [14], the neural beamformer given the right training targets is also able to perform denoising and dereverberation simultaneously to some degree.
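The alternating updates of Eqs. (4)-(7) can be sketched per frequency bin as follows. This is a minimal NumPy sketch under illustrative assumptions (function name, default parameters, simple edge handling at the utterance boundaries), not the authors' implementation:

```python
import numpy as np

def wpe_per_frequency(Y, taps=7, delay=3, delta=1, iterations=3, eps=1e-10):
    """Multi-channel WPE for one frequency bin, following Eqs. (2)-(7).
    Y: (D, T) complex STFT of one frequency bin."""
    D, T = Y.shape
    X = Y.copy()                                 # start from the observation
    # stacked delayed observations y~_{t-Delta,f}, shape (D*taps, T)
    Y_tilde = np.zeros((D * taps, T), dtype=Y.dtype)
    for k in range(taps):
        shift = delay + k
        Y_tilde[k * D:(k + 1) * D, shift:] = Y[:, :T - shift]
    for _ in range(iterations):
        # Step 1: time-varying variance with +-delta context, Eq. (4)
        power = np.abs(X) ** 2
        lam = np.empty(T)
        for t in range(T):
            sl = slice(max(t - delta, 0), min(t + delta + 1, T))
            lam[t] = power[:, sl].mean()         # mean over channels and context
        lam = np.maximum(lam, eps)
        # Step 2: correlation matrices and filter update, Eqs. (5)-(7)
        Yw = Y_tilde / lam                       # divide each frame by lambda_t
        R = Yw @ Y_tilde.conj().T                # (D*taps, D*taps)
        P = Yw @ Y.conj().T                      # (D*taps, D)
        G = np.linalg.solve(R, P)                # R^{-1} P
        X = Y - G.conj().T @ Y_tilde             # Eq. (2)
    return X
```

Running the function over all frequency bins of a multi-channel STFT yields the dereverberated estimate x̂^(early).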
Therefore, we here opt to train it to distinguish between the early arriving speech on the one hand and the late arriving speech and noise on the other hand:

    M^(s/n)_{t,f,d} = 1 if |x^(early)_{t,f,d}|² / |x^(tail)_{t,f,d} + n_{t,f,d}|² ≷ th^(s/n), and 0 else,    (8)

where the comparison is > for the speech mask (s) and < for the noise mask (n), and th^(s/n) are threshold values to improve robustness (for details regarding the thresholds see [14]). The mask estimator consists of a bidirectional long short-term memory (BLSTM) layer with 512 forward and 512 backward units, two linear layers with 1024 units and ELU activation functions [24] and a final linear layer with a sigmoid activation function [14]. Thus, the final layer yields the two masks for each channel. The masks for each channel are pooled with a median operation to obtain M̂^(s) and M̂^(n). The GEV beamformer has proven to be robust with respect to numerical instabilities and yields great improvements in terms of both signal to noise ratio (SNR) gain and WER reduction, while often outperforming the frequently used MVDR beamformer [8, 14]. It optimizes the expected output SNR:

    w_f = argmax_w (w^H Φ^(s)_f w) / (w^H Φ^(n)_f w),    (9)

where the spatial covariance matrices are obtained by a weighted mean of dyadic products:

    Φ^(s/n)_f = Σ_t M̂^(s/n)_{t,f} y_{t,f} y^H_{t,f}.    (10)

The enhanced signal is then given by x̂^(early)_{t,f} = w_f^H y_{t,f}. It is naturally constrained to linear effects and leaves all further nonlinear enhancements to the acoustic model (AM). To reduce distortions caused by arbitrary scaling of each beamforming vector, we opt to apply blind analytic normalization (BAN) [8].

5. Proposed systems

We propose and analyze three speech enhancement front-ends which can then be used with any single-channel back-end. The first two combine WPE with neural beamforming but avoid any interaction between the two algorithms. Both can be seen as the logical extension of [15] with a neural network mask estimator for the beamforming step.
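Given estimated masks, the maximizer of Eq. (9) with the covariances of Eq. (10) is a largest generalized eigenvector, which for a full-rank Φ^(n) reduces to an ordinary eigenproblem of Φ^(n)⁻¹ Φ^(s). A hedged NumPy sketch for a single frequency bin (illustrative names; a small diagonal loading is added for stability; BAN is omitted):

```python
import numpy as np

def gev_beamformer(Y, mask_s, mask_n, eps=1e-10):
    """GEV beamformer for one frequency bin.
    Y: (D, T) complex STFT bin; mask_s, mask_n: (T,) real masks in [0, 1]."""
    D = Y.shape[0]
    # mask-weighted spatial covariance matrices, Eq. (10)
    Phi_s = (mask_s * Y) @ Y.conj().T
    Phi_n = (mask_n * Y) @ Y.conj().T + eps * np.eye(D)
    # the maximizer of Eq. (9) is the eigenvector of Phi_n^{-1} Phi_s
    # belonging to the largest eigenvalue
    vals, vecs = np.linalg.eig(np.linalg.solve(Phi_n, Phi_s))
    w = vecs[:, np.argmax(vals.real)]
    return w, w.conj() @ Y        # beamforming vector, enhanced bin w^H y
```

By construction w maximizes the Rayleigh quotient in Eq. (9), so its output SNR criterion is at least as large as that of any fixed reference vector.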
The third is a real integration of both algorithms with a feedback loop.

5.1. Neural beamforming followed by WPE

One option is to perform neural beamforming first as depicted in Fig. 1. A DNN estimates masks which are then used to obtain speech and distortion covariance matrices with Eq. (10). GEV filter coefficients are then obtained with Eq. (9) and applied to the observation to obtain an intermediate enhanced single-channel signal x̂_{t,f} = w_f^H y_{t,f}. Subsequently, three iterations of single channel WPE according to Eqs. (4)-(7) (omitting the summation over d) are performed to obtain an estimate x̂^(early)_{t,f}. Using single channel WPE is significantly faster, but cross-channel information cannot be exploited for further dereverberation. In this constellation a context of δ > 0 is particularly important for a robust power estimate.

Figure 1: Neural beamforming followed by WPE. Power in each time-frequency bin is estimated in the λ block. WPE can just operate on a single channel but is exposed to less noise.
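In the scheme above, the WPE stage degenerates to the single-channel case (D = 1, no summation over d) applied to the beamformed signal. A minimal self-contained sketch for one frequency bin; function name and defaults are illustrative:

```python
import numpy as np

def single_channel_wpe(x, taps=7, delay=3, delta=1, iterations=3, eps=1e-10):
    """Single-channel WPE, Eqs. (4)-(7) with the channel sum omitted.
    x: (T,) complex STFT bin of the beamformed signal w^H y."""
    T = len(x)
    X = x.copy()
    # stacked delayed observations, shape (taps, T)
    x_tilde = np.zeros((taps, T), dtype=x.dtype)
    for k in range(taps):
        x_tilde[k, delay + k:] = x[:T - delay - k]
    for _ in range(iterations):
        power = np.abs(X) ** 2
        # Eq. (4) with +-delta context; with one channel the robustness of
        # the power estimate hinges on delta > 0
        lam = np.array([power[max(t - delta, 0):t + delta + 1].mean()
                        for t in range(T)])
        lam = np.maximum(lam, eps)
        xw = x_tilde / lam
        R = xw @ x_tilde.conj().T     # Eq. (5), now (taps, taps)
        p = xw @ x.conj()             # Eq. (6), now (taps,)
        g = np.linalg.solve(R, p)     # Eq. (7)
        X = x - g.conj() @ x_tilde    # Eq. (2)
    return X
```

Note the reduced parameter count: only K filter coefficients per bin instead of DK × D in the multi-channel case.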
5.2. WPE followed by neural beamforming

When applying WPE first as in Fig. 2, the power estimation is initialized with the observed signal plugged into Eq. (4). Consecutively, three WPE iterations are performed. The algorithm has to estimate more parameters on each utterance but can potentially use cross-channel information for dereverberation. Then, the end result is obtained by applying the beamforming vector w_f to the dereverberated signal, although the mask estimator trained to select early arriving speech just saw the observation y_{t,f}. It is possible to perform mask estimation on the dereverberated signal instead, which yields further improvements.

Figure 2: WPE followed by neural beamforming. The beamformer receives a dereverberated version of the observation. Power in each time-frequency bin is estimated in the λ block.

5.3. Integration of neural beamforming and WPE

Alternatively, the beamforming step can be integrated into the WPE loop as shown in Fig. 3. This way, the power estimates λ_{t,f} for WPE are computed from the beamforming result and are already more precise. Assuming a number of three WPE iterations, the mask estimator has to run only once whereas beamforming is performed four times. Since the GEV beamforming is much faster than the neural network, additional beamforming steps are cheap. It is possible to run the mask estimator on the dereverberated signal in each iteration, which improves the WERs but is computationally more expensive.

Figure 3: Proposed way to integrate neural beamforming and WPE dereverberation. Masks for beamforming are provided by a DNN. Power in each time-frequency bin is estimated in the λ block. After the first iteration, beamforming is applied to the dereverberated estimate instead of the input y_{t,f}.

6. Acoustic model

To train an acoustic model for each database, we first extracted alignments by processing the early arriving speech x^(early) using a triphone GMM-HMM recognizer. We then extracted MFCC features from the reverberated and noisy observations with deltas and delta-deltas to train a 7+1 layer DNN acoustic model with a context of neighboring frames.
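The feedback loop of Sec. 5.3 (Fig. 3) can be sketched by alternating a GEV step, a power estimate taken from the beamformed signal, and a multi-channel WPE filter update. A hedged, self-contained NumPy sketch with illustrative names; the masks are assumed given (one DNN run), and the ±δ context smoothing of Eq. (4) is omitted for brevity:

```python
import numpy as np

def integrated_wpe_gev(Y, mask_s, mask_n, taps=7, delay=3, iterations=3,
                       eps=1e-10):
    """Integrated WPE + GEV for one frequency bin (Sec. 5.3 sketch).
    Y: (D, T) complex STFT bin; mask_s, mask_n: (T,) masks. Runs
    `iterations` WPE updates and `iterations` + 1 beamforming steps."""
    D, T = Y.shape
    # stacked delayed observations for the WPE filter, cf. Eq. (2)
    Y_tilde = np.zeros((D * taps, T), dtype=Y.dtype)
    for k in range(taps):
        Y_tilde[k * D:(k + 1) * D, delay + k:] = Y[:, :T - delay - k]

    def gev(X):
        # mask-weighted covariances and the GEV vector, Eqs. (9)-(10)
        Phi_s = (mask_s * X) @ X.conj().T
        Phi_n = (mask_n * X) @ X.conj().T + eps * np.eye(D)
        vals, vecs = np.linalg.eig(np.linalg.solve(Phi_n, Phi_s))
        return vecs[:, np.argmax(vals.real)]

    X = Y.copy()                                  # current dereverberated estimate
    for _ in range(iterations):
        w = gev(X)
        x_bf = w.conj() @ X                       # beamform the current estimate
        lam = np.maximum(np.abs(x_bf) ** 2, eps)  # power from the beamformed signal
        # WPE filter update on the raw observation, Eqs. (5)-(7)
        Yw = Y_tilde / lam
        G = np.linalg.solve(Yw @ Y_tilde.conj().T, Yw @ Y.conj().T)
        X = Y - G.conj().T @ Y_tilde              # Eq. (2)
    w = gev(X)
    return w.conj() @ X                           # fourth and final beamforming step
```

After the first pass, beamforming operates on the dereverberated estimate rather than on the raw input, mirroring Fig. 3.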
The training recipe adopts the mechanics of the CHiME 3 baseline training recipe (master/egs/chime3/s5/local/run_dnn.sh).

7. Evaluation

To assess the advantages and disadvantages of each approach we evaluate the proposed combinations and the stand-alone systems in terms of WERs on two distinct databases: (a) The REVERB challenge database is very reverberated with very little noise. However, the reverberation is realistic, since the database contains real recordings in reverberant rooms. (b) The WSJ+VoiceHome database is fairly noisy. The room impulse responses are recorded and convolved with the utterances, granting more control over the simulation setup. All presented WERs are obtained using the standard trigram language model available with the WSJ corpus. In all presented results the language model weight ∈ {4, ..., 15}, the number of filter taps K ∈ {1, ..., 20}, the delay Δ ∈ {1, 2, 3} and the context δ ∈ {0, 1} for power estimation were optimized on the development set of each database. The DFT window size was set to 1024 (64 ms) while the shift was set to 256 (16 ms) for all reported results.

7.1. Results on REVERB challenge data

The REVERB challenge dataset [16] contains simulated and real utterances. The training data only consists of simulated recordings while we used only the real development and test recordings for cross-validation and test, respectively. For simulated data WSJCAM0 utterances [25] are convolved with measured RIRs. Noise is added with approximately 20 dB SNR. Reverberation times (T60) are in the range of several hundred ms. The real dataset consists of utterances from the MC-WSJ-AV corpus [26] which are recorded in a noisy reverberant room with a reverberation time of approximately 700 ms. First, we analyzed the effect of different numbers of filter taps K on the WER. Previous studies found that longer utterances allow higher numbers of filter taps since the parameters can be more precisely estimated.
For the given test dataset with an average utterance duration of 6.5 s, around 7 taps (160 ms field of view) turned out to be optimal as visualized in Fig. 4. The integration as well as WPE show a similar trend for higher K. However, the steep WER increase for K < 4 is gone. Already a very small number of taps yields an improvement over the dereverberating beamformer itself. One possible interpretation is that reduction of very late reverberation is already done by the dereverberating beamformer and WPE just needs to cover shorter time differences. The combination of beamforming followed by WPE is fairly constant in the range of 0 to 20 taps since this combination just needs to estimate K instead of DK × D coefficients.

Figure 4: WER depending on the number of taps K for different systems (unprocessed, WPE, integration) on the REVERB real evaluation set with 8 channels. Integrated methods tend to work better for a lower number of taps.

Table 1: WERs in % for all systems evaluated on the REVERB real evaluation dataset with different numbers of channels. Many systems coincide for the single channel track.

Table 2: WERs in % for all systems evaluated on the WSJ+VoiceHome dataset with different numbers of channels. Many systems coincide for the single channel track.

Tbl. 1 summarizes WERs for all systems. The WER for unprocessed signals coincides with GEV beamforming when just one channel is available since a BAN filter is used. In single-channel combinations, the WER reduction is achieved due to WPE. For up to two channels WPE outperforms a dereverberating beamformer. The best performance with 8 channels is achieved when combining WPE with the neural beamformer, with a relative WER reduction of 51 % over the unprocessed system and 25 % over WPE alone. Nevertheless, WPE causes a relative WER reduction of 13 % over a dereverberating GEV beamformer and is therefore crucial when striving for best performance.

7.2. Results on WSJ with VoiceHome RIRs and noise

Similar to the simulation setup proposed by Bertin et al. [22], WSJ utterances (test eval92 5k) are convolved with VoiceHome RIRs and VoiceHome background noise [21] with reverberation times (T60) in the range of several hundred ms. Worth noting, the RIRs are recorded in three different houses, such that training, cross-validation and test can use disjunct RIRs to ensure generalization.

Figure 5: Comparison of WERs for different SNR conditions with 8 channels. The WSJ+VoiceHome test dataset was recreated for each SNR condition. All parameters for each datapoint are obtained on the development dataset.

The VoiceHome background noise is very dynamic and contains, e.g.,
vacuum cleaner, dish washing or interviews on television, as typically found in households. On the WSJ+VoiceHome dataset the best WERs were obtained without BAN. However, to ease comparison with the REVERB results, we opted to present results with BAN. Since [15] states that WPE shows some noise robustness although the original derivation does not explicitly model additive noise, we first analyze the effect of different SNR conditions. To do so, we sweep the SNR of every utterance of the test data in Fig. 5. It turns out that WPE is able to improve WERs in all noise conditions. However, beamforming alone is the significant driver on this noisy database. Particularly in noisy conditions the integration of neural network based beamforming into the WPE processing loop yields the best performance. Tbl. 2 summarizes WERs for each system with a fixed random SNR in the range of 0 to 10 dB. The single-channel GEV result does not coincide with the unprocessed signal, since our implementation may introduce phase changes when performing an eigenvalue decomposition on a scalar. The integrated system works best for any number of channels D > 1.

8. Future directions

We are interested in analyzing how a single neural network can provide a power estimate for WPE similar to [17] as well as a mask estimate for neural beamforming, thus allowing to get rid of the WPE iteration in a noise robust integrated system.

9. Conclusions

Across both databases and a wide range of parameter combinations, a combination of WPE and neural beamforming consistently improves WERs over (a) a dereverberating neural beamformer and (b) WPE dereverberation. Neural beamforming is particularly important when many channels are available and the observations are very noisy. However, WPE is crucial to obtain the best WERs when combined with neural beamforming.

10. Acknowledgements

Calculations leading to the results presented here were performed on resources provided by the Paderborn Center for Parallel Computing.
11. References

[1] T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B.-H. Juang, "Blind speech dereverberation with multi-channel linear prediction based on short time Fourier transform representation," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] T. Yoshioka and T. Nakatani, "Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening," IEEE Transactions on Audio, Speech, and Language Processing.
[3] B. Li, T. Sainath, A. Narayanan, J. Caroselli, M. Bacchiani, A. Misra, I. Shafran, H. Sak, G. Pundak, K. Chin et al., "Acoustic modeling for Google Home," in Interspeech.
[4] J. Caroselli, I. Shafran, A. Narayanan, and R. Rose, "Adaptive multichannel dereverberation for automatic speech recognition," in Interspeech.
[5] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning. ACM.
[6] A. Maas, Q. Le, T. O'Neil, O. Vinyals, P. Nguyen, and A. Ng, "Recurrent neural networks for noise reduction in robust ASR," in Thirteenth Annual Conference of the International Speech Communication Association.
[7] D. Lee and H. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature.
[8] E. Warsitz and R. Haeb-Umbach, "Blind acoustic beamforming based on generalized eigenvalue decomposition," IEEE Transactions on Audio, Speech, and Language Processing.
[9] M. Souden, J. Benesty, and S. Affes, "A study of the LCMV and MVDR noise reduction filters," IEEE Transactions on Signal Processing.
[10] J. Du, Y.-H. Tu, L. Sun, F. Ma, H.-K. Wang, J. Pan, C. Liu, J.-D. Chen, and C.-H. Lee, "The USTC-iFlytek system for CHiME-4 challenge," in CHiME-4 workshop.
[11] T. Menne, J. Heymann, A. Alexandridis, K. Irie, A. Zeyer, M. Kitza, P. Golik, I. Kulikov, L. Drude, R. Schlüter et al., "The RWTH/UPB/FORTH system combination for the 4th CHiME challenge evaluation," in CHiME-4 workshop.
[12] H. Erdogan, T. Hayashi, J. Hershey, T. Hori, C. Hori, W.-N. Hsu, S. Kim, J. Le Roux, Z. Meng, and S. Watanabe, "Multi-channel speech recognition: LSTMs all the way through," in CHiME-4 workshop.
[13] J. Heymann, L. Drude, and R. Haeb-Umbach, "Wide residual BLSTM network with discriminative speaker adaptation for robust speech recognition," in CHiME-4 workshop.
[14] ——, "A generic neural acoustic beamforming architecture for robust multi-channel speech processing," Computer Speech & Language.
[15] M. Delcroix, T. Yoshioka, A. Ogawa, Y. Kubo, M. Fujimoto, N. Ito, K. Kinoshita, M. Espi, S. Araki, T. Hori et al., "Strategies for distant speech recognition in reverberant environments," EURASIP Journal on Advances in Signal Processing.
[16] K. Kinoshita, M. Delcroix, T. Yoshioka, T. Nakatani, A. Sehr, W. Kellermann, and R. Maas, "The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech," in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[17] K. Kinoshita, M. Delcroix, H. Kwon, T. Mori, and T. Nakatani, "Neural network-based spectrum estimation for online WPE dereverberation," in Interspeech.
[18] A. Cohen, G. Stemmer, S. Ingalsuo, and S. Markovich-Golan, "Combined weighted prediction error and minimum variance distortionless response for dereverberation," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] N. Ito, S. Araki, T. Yoshioka, and T. Nakatani, "Relaxed disjointness based clustering for joint blind source separation and dereverberation," in International Workshop on Acoustic Signal Enhancement (IWAENC).
[20] J. Garofolo, D. Graff, D. Paul, and D. Pallett, "CSR-I (WSJ0) complete LDC93S6A," Web Download. Philadelphia: Linguistic Data Consortium. [Online]. Available: https://catalog.ldc.upenn.edu/ldc93s6a
[21] N. Bertin, E. Camberlein, E. Vincent, R. Lebarbenchon, S. Peillon, E. Lamand, S. Sivasankaran, F. Bimbot, I. Illina, A. Tom, S. Fleury, and E. Jamet, "A French corpus for distant-microphone speech processing in real homes," in Interspeech.
[22] S. Sivasankaran, E. Vincent, and I. Illina, "A combined evaluation of established and new approaches for speech recognition in varied reverberation conditions," Computer Speech & Language.
[23] T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B.-H. Juang, "Speech dereverberation based on variance-normalized delayed linear prediction," IEEE Transactions on Audio, Speech, and Language Processing.
[24] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (ELUs)," arXiv preprint.
[25] T. Robinson, J. Fransen, D. Pye, J. Foote, and S. Renals, "WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
[26] M. Lincoln, I. McCowan, J. Vepa, and H. K. Maganti, "The multichannel Wall Street Journal audio visual corpus (MC-WSJ-AV): Specification and initial experiments," in IEEE Workshop on Automatic Speech Recognition and Understanding.
Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, and Shoji Makino University
More informationEstimation and detection of a periodic signal
Estimation and detection o a periodic signal Daniel Aronsson, Erik Björnemo, Mathias Johansson Signals and Systems Group, Uppsala University, Sweden, e-mail: Daniel.Aronsson,Erik.Bjornemo,Mathias.Johansson}@Angstrom.uu.se
More informationExperiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition
Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition ABSTRACT It is well known that the expectation-maximization (EM) algorithm, commonly used to estimate hidden
More informationA Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement
A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,
More informationarxiv: v2 [cs.sd] 15 May 2017
FREQUENCY DOMAIN SINGULAR VALUE DECOMPOSITION FOR EFFICIENT SPATIAL AUDIO CODING Sina Zamani, Tejaswi Nanjundaswamy, Kenneth Rose Department o Electrical and Computer Engineering, University o Caliornia
More informationBLACK BOX OPTIMIZATION FOR AUTOMATIC SPEECH RECOGNITION. Shinji Watanabe and Jonathan Le Roux
2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) BLACK BOX OPTIMIZATION FOR AUTOMATIC SPEECH RECOGNITION Shinji Watanabe and Jonathan Le Roux Mitsubishi Electric Research
More informationSeq2Tree: A Tree-Structured Extension of LSTM Network
Seq2Tree: A Tree-Structured Extension of LSTM Network Weicheng Ma Computer Science Department, Boston University 111 Cummington Mall, Boston, MA wcma@bu.edu Kai Cao Cambia Health Solutions kai.cao@cambiahealth.com
More informationEstimating Correlation Coefficient Between Two Complex Signals Without Phase Observation
Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation Shigeki Miyabe 1B, Notubaka Ono 2, and Shoji Makino 1 1 University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki
More informationTwo-step self-tuning phase-shifting interferometry
Two-step sel-tuning phase-shiting intererometry J. Vargas, 1,* J. Antonio Quiroga, T. Belenguer, 1 M. Servín, 3 J. C. Estrada 3 1 Laboratorio de Instrumentación Espacial, Instituto Nacional de Técnica
More informationCHAPTER 8 ANALYSIS OF AVERAGE SQUARED DIFFERENCE SURFACES
CAPTER 8 ANALYSS O AVERAGE SQUARED DERENCE SURACES n Chapters 5, 6, and 7, the Spectral it algorithm was used to estimate both scatterer size and total attenuation rom the backscattered waveorms by minimizing
More informationTLT-5200/5206 COMMUNICATION THEORY, Exercise 3, Fall TLT-5200/5206 COMMUNICATION THEORY, Exercise 3, Fall Problem 1.
TLT-5/56 COMMUNICATION THEORY, Exercise 3, Fall Problem. The "random walk" was modelled as a random sequence [ n] where W[i] are binary i.i.d. random variables with P[W[i] = s] = p (orward step with probability
More informationAn Ensemble Kalman Smoother for Nonlinear Dynamics
1852 MONTHLY WEATHER REVIEW VOLUME 128 An Ensemble Kalman Smoother or Nonlinear Dynamics GEIR EVENSEN Nansen Environmental and Remote Sensing Center, Bergen, Norway PETER JAN VAN LEEUWEN Institute or Marine
More informationALTERNATIVE OBJECTIVE FUNCTIONS FOR DEEP CLUSTERING
ALTERNATIVE OBJECTIVE FUNCTIONS FOR DEEP CLUSTERING Zhong-Qiu Wang,2, Jonathan Le Roux, John R. Hershey Mitsubishi Electric Research Laboratories (MERL), USA 2 Department of Computer Science and Engineering,
More informationSINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS
204 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) SINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS Hajar Momeni,2,,
More informationAdditional exercises in Stationary Stochastic Processes
Mathematical Statistics, Centre or Mathematical Sciences Lund University Additional exercises 8 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
More information"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction"
"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction" Francesco Nesta, Marco Matassoni {nesta, matassoni}@fbk.eu Fondazione Bruno Kessler-Irst, Trento (ITALY) For contacts:
More informationAcoustic MIMO Signal Processing
Yiteng Huang Jacob Benesty Jingdong Chen Acoustic MIMO Signal Processing With 71 Figures Ö Springer Contents 1 Introduction 1 1.1 Acoustic MIMO Signal Processing 1 1.2 Organization of the Book 4 Part I
More informationBLSTM-HMM HYBRID SYSTEM COMBINED WITH SOUND ACTIVITY DETECTION NETWORK FOR POLYPHONIC SOUND EVENT DETECTION
BLSTM-HMM HYBRID SYSTEM COMBINED WITH SOUND ACTIVITY DETECTION NETWORK FOR POLYPHONIC SOUND EVENT DETECTION Tomoki Hayashi 1, Shinji Watanabe 2, Tomoki Toda 1, Takaaki Hori 2, Jonathan Le Roux 2, Kazuya
More informationSINGLE-CHANNEL BLIND ESTIMATION OF REVERBERATION PARAMETERS
SINGLE-CHANNEL BLIND ESTIMATION OF REVERBERATION PARAMETERS Clement S. J. Doire, Mike Brookes, Patrick A. Naylor, Dave Betts, Christopher M. Hicks, Mohammad A. Dmour, Søren Holdt Jensen Electrical and
More informationShinji Watanabe Mitsubishi Electric Research Laboratories (MERL) Xiong Xiao Nanyang Technological University. Marc Delcroix
Shinji Watanabe Mitsubishi Electric Research Laboratories (MERL) Xiong Xiao Nanyang Technological University Marc Delcroix NTT Communication Science Laboratories 2 1. 2. 3. 4. 3 List of abbreviations ASR
More informationAlternative Learning Algorithm for Stereophonic Acoustic Echo Canceller without Pre-Processing
1958 PAPER Special Section on Digital Signal Processing Alternative Learning Algorithm or Stereophonic Acoustic Echo Canceller without Pre-Processing Akihiro HIRANO a),kenjinakayama, Memers, Daisuke SOMEDA,
More information2. ETA EVALUATIONS USING WEBER FUNCTIONS. Introduction
. ETA EVALUATIONS USING WEBER FUNCTIONS Introduction So ar we have seen some o the methods or providing eta evaluations that appear in the literature and we have seen some o the interesting properties
More informationRobust Residual Selection for Fault Detection
Robust Residual Selection or Fault Detection Hamed Khorasgani*, Daniel E Jung**, Gautam Biswas*, Erik Frisk**, and Mattias Krysander** Abstract A number o residual generation methods have been developed
More informationThe achievable limits of operational modal analysis. * Siu-Kui Au 1)
The achievable limits o operational modal analysis * Siu-Kui Au 1) 1) Center or Engineering Dynamics and Institute or Risk and Uncertainty, University o Liverpool, Liverpool L69 3GH, United Kingdom 1)
More informationRecent advances in distant speech recognition
Interspeech 2016 tutorial: Recent advances in distant speech recognition Marc Delcroix NTT Communication Science Laboratories Shinji Watanabe Mitsubishi Electric Research Laboratories (MERL) Table of contents
More informationUltra Fast Calculation of Temperature Profiles of VLSI ICs in Thermal Packages Considering Parameter Variations
Ultra Fast Calculation o Temperature Proiles o VLSI ICs in Thermal Packages Considering Parameter Variations Je-Hyoung Park, Virginia Martín Hériz, Ali Shakouri, and Sung-Mo Kang Dept. o Electrical Engineering,
More informationHighway-LSTM and Recurrent Highway Networks for Speech Recognition
Highway-LSTM and Recurrent Highway Networks for Speech Recognition Golan Pundak, Tara N. Sainath Google Inc., New York, NY, USA {golan, tsainath}@google.com Abstract Recently, very deep networks, with
More informationIMPROVED NOISE CANCELLATION IN DISCRETE COSINE TRANSFORM DOMAIN USING ADAPTIVE BLOCK LMS FILTER
SANJAY KUMAR GUPTA* et al. ISSN: 50 3676 [IJESAT] INTERNATIONAL JOURNAL OF ENGINEERING SCIENCE & ADVANCED TECHNOLOGY Volume-, Issue-3, 498 50 IMPROVED NOISE CANCELLATION IN DISCRETE COSINE TRANSFORM DOMAIN
More informationCHAPTER 1: INTRODUCTION. 1.1 Inverse Theory: What It Is and What It Does
Geosciences 567: CHAPTER (RR/GZ) CHAPTER : INTRODUCTION Inverse Theory: What It Is and What It Does Inverse theory, at least as I choose to deine it, is the ine art o estimating model parameters rom data
More informationGaussian Process Regression Models for Predicting Stock Trends
Gaussian Process Regression Models or Predicting Stock Trends M. Todd Farrell Andrew Correa December 5, 7 Introduction Historical stock price data is a massive amount o time-series data with little-to-no
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationSpeaker recognition by means of Deep Belief Networks
Speaker recognition by means of Deep Belief Networks Vasileios Vasilakakis, Sandro Cumani, Pietro Laface, Politecnico di Torino, Italy {first.lastname}@polito.it 1. Abstract Most state of the art speaker
More informationUnfolded Recurrent Neural Networks for Speech Recognition
INTERSPEECH 2014 Unfolded Recurrent Neural Networks for Speech Recognition George Saon, Hagen Soltau, Ahmad Emami and Michael Picheny IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598 gsaon@us.ibm.com
More informationITERATED SRINKAGE ALGORITHM FOR BASIS PURSUIT MINIMIZATION
ITERATED SRINKAGE ALGORITHM FOR BASIS PURSUIT MINIMIZATION Michael Elad The Computer Science Department The Technion Israel Institute o technology Haia 3000, Israel * SIAM Conerence on Imaging Science
More informationOPTIMAL PLACEMENT AND UTILIZATION OF PHASOR MEASUREMENTS FOR STATE ESTIMATION
OPTIMAL PLACEMENT AND UTILIZATION OF PHASOR MEASUREMENTS FOR STATE ESTIMATION Xu Bei, Yeo Jun Yoon and Ali Abur Teas A&M University College Station, Teas, U.S.A. abur@ee.tamu.edu Abstract This paper presents
More informationResidual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition
INTERSPEECH 017 August 0 4, 017, Stockholm, Sweden Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition Jaeyoung Kim 1, Mostafa El-Khamy 1, Jungwon Lee 1 1 Samsung Semiconductor,
More informationA Priori SNR Estimation Using Weibull Mixture Model
A Priori SNR Estimation Using Weibull Mixture Model 12. ITG Fachtagung Sprachkommunikation Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach Department of Communications Engineering Paderborn University
More informationAcoustic Vector Sensor based Speech Source Separation with Mixed Gaussian-Laplacian Distributions
Acoustic Vector Sensor based Speech Source Separation with Mixed Gaussian-Laplacian Distributions Xiaoyi Chen, Atiyeh Alinaghi, Xionghu Zhong and Wenwu Wang Department of Acoustic Engineering, School of
More informationEXPLOITING LSTM STRUCTURE IN DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
EXPLOITING LSTM STRUCTURE IN DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION Tianxing He Jasha Droppo Key Lab. o Shanghai Education Commission or Intelligent Interaction and Cognitive Engineering SpeechLab,
More informationModeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks
Modeling Time-Frequency Patterns with LSTM vs Convolutional Architectures for LVCSR Tasks Tara N Sainath, Bo Li Google, Inc New York, NY, USA {tsainath, boboli}@googlecom Abstract Various neural network
More informationSupplementary material for Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values
Supplementary material or Continuous-action planning or discounted ininite-horizon nonlinear optimal control with Lipschitz values List o main notations x, X, u, U state, state space, action, action space,
More informationEstimation of Sample Reactivity Worth with Differential Operator Sampling Method
Progress in NUCLEAR SCIENCE and TECHNOLOGY, Vol. 2, pp.842-850 (2011) ARTICLE Estimation o Sample Reactivity Worth with Dierential Operator Sampling Method Yasunobu NAGAYA and Takamasa MORI Japan Atomic
More informationMULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES. Mingzi Li,Israel Cohen and Saman Mousazadeh
MULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES Mingzi Li,Israel Cohen and Saman Mousazadeh Department of Electrical Engineering, Technion - Israel
More informationarxiv: v1 [cs.it] 12 Mar 2014
COMPRESSIVE SIGNAL PROCESSING WITH CIRCULANT SENSING MATRICES Diego Valsesia Enrico Magli Politecnico di Torino (Italy) Dipartimento di Elettronica e Telecomunicazioni arxiv:403.2835v [cs.it] 2 Mar 204
More informationDept. Electronics and Electrical Engineering, Keio University, Japan. NTT Communication Science Laboratories, NTT Corporation, Japan.
JOINT SEPARATION AND DEREVERBERATION OF REVERBERANT MIXTURES WITH DETERMINED MULTICHANNEL NON-NEGATIVE MATRIX FACTORIZATION Hideaki Kagami, Hirokazu Kameoka, Masahiro Yukawa Dept. Electronics and Electrical
More informationFeature-Space Structural MAPLR with Regression Tree-based Multiple Transformation Matrices for DNN
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Feature-Space Structural MAPLR with Regression Tree-based Multiple Transformation Matrices for DNN Kanagawa, H.; Tachioka, Y.; Watanabe, S.;
More informationOBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL
OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL Dionisio Bernal, Burcu Gunes Associate Proessor, Graduate Student Department o Civil and Environmental Engineering, 7 Snell
More informationIndependent Component Analysis and Unsupervised Learning
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent
More informationIndependent Component Analysis and Unsupervised Learning. Jen-Tzung Chien
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood
More informationGMM-based classification from noisy features
GMM-based classification from noisy features Alexey Ozerov, Mathieu Lagrange and Emmanuel Vincent INRIA, Centre de Rennes - Bretagne Atlantique STMS Lab IRCAM - CNRS - UPMC alexey.ozerov@inria.fr, mathieu.lagrange@ircam.fr,
More informationTinySR. Peter Schmidt-Nielsen. August 27, 2014
TinySR Peter Schmidt-Nielsen August 27, 2014 Abstract TinySR is a light weight real-time small vocabulary speech recognizer written entirely in portable C. The library fits in a single file (plus header),
More informationSOUND-source enhancement has been studied for many. DNN-based Source Enhancement to Increase Objective Sound Quality Assessment Score
DNN-based Source Enhancement to Increase Objective Sound Quality Assessment Score Yuma Koizumi Member, IEEE,, Kenta Niwa Member, IEEE,, Yusuke Hioka Senior Member, IEEE, Kazunori Kobayashi and Yoichi Haneda
More informationEstimation of Human Emotions Using Thermal Facial Information
Estimation o Human Emotions Using Thermal acial Inormation Hung Nguyen 1, Kazunori Kotani 1, an Chen 1, Bac Le 2 1 Japan Advanced Institute o Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, Japan
More informationON ADVERSARIAL TRAINING AND LOSS FUNCTIONS FOR SPEECH ENHANCEMENT. Ashutosh Pandey 1 and Deliang Wang 1,2. {pandey.99, wang.5664,
ON ADVERSARIAL TRAINING AND LOSS FUNCTIONS FOR SPEECH ENHANCEMENT Ashutosh Pandey and Deliang Wang,2 Department of Computer Science and Engineering, The Ohio State University, USA 2 Center for Cognitive
More informationTemporal Modeling and Basic Speech Recognition
UNIVERSITY ILLINOIS @ URBANA-CHAMPAIGN OF CS 498PS Audio Computing Lab Temporal Modeling and Basic Speech Recognition Paris Smaragdis paris@illinois.edu paris.cs.illinois.edu Today s lecture Recognizing
More informationSegmental Recurrent Neural Networks for End-to-end Speech Recognition
Segmental Recurrent Neural Networks for End-to-end Speech Recognition Liang Lu, Lingpeng Kong, Chris Dyer, Noah Smith and Steve Renals TTI-Chicago, UoE, CMU and UW 9 September 2016 Background A new wave
More informationIn many diverse fields physical data is collected or analysed as Fourier components.
1. Fourier Methods In many diverse ields physical data is collected or analysed as Fourier components. In this section we briely discuss the mathematics o Fourier series and Fourier transorms. 1. Fourier
More informationExploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics
Interspeech 2018 2-6 September 2018, Hyderabad Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics Pavlos Papadopoulos, Colin Vaz, Shrikanth Narayanan Signal
More informationThe Deutsch-Jozsa Problem: De-quantization and entanglement
The Deutsch-Jozsa Problem: De-quantization and entanglement Alastair A. Abbott Department o Computer Science University o Auckland, New Zealand May 31, 009 Abstract The Deustch-Jozsa problem is one o the
More informationConvolutive Transfer Function Generalized Sidelobe Canceler
IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, VOL. XX, NO. Y, MONTH 8 Convolutive Transfer Function Generalized Sidelobe Canceler Ronen Talmon, Israel Cohen, Senior Member, IEEE, and Sharon
More informationImproving Reverberant VTS for Hands-free Robust Speech Recognition
Improving Reverberant VTS for Hands-free Robust Speech Recognition Y.-Q. Wang, M. J. F. Gales Cambridge University Engineering Department Trumpington St., Cambridge CB2 1PZ, U.K. {yw293, mjfg}@eng.cam.ac.uk
More informationPower Spectral Analysis of Elementary Cellular Automata
Power Spectral Analysis o Elementary Cellular Automata Shigeru Ninagawa Division o Inormation and Computer Science, Kanazawa Institute o Technology, 7- Ohgigaoka, Nonoichi, Ishikawa 92-850, Japan Spectral
More information(3) where the mixing vector is the Fourier transform of are the STFT coefficients of the sources I. INTRODUCTION
1830 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 7, SEPTEMBER 2010 Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model Ngoc Q.
More informationMAXIMUM LIKELIHOOD BASED MULTI-CHANNEL ISOTROPIC REVERBERATION REDUCTION FOR HEARING AIDS
MAXIMUM LIKELIHOOD BASED MULTI-CHANNEL ISOTROPIC REVERBERATION REDUCTION FOR HEARING AIDS Adam Kuklasiński, Simon Doclo, Søren Holdt Jensen, Jesper Jensen, Oticon A/S, 765 Smørum, Denmark University of
More informationANALYSIS OF p-norm REGULARIZED SUBPROBLEM MINIMIZATION FOR SPARSE PHOTON-LIMITED IMAGE RECOVERY
ANALYSIS OF p-norm REGULARIZED SUBPROBLEM MINIMIZATION FOR SPARSE PHOTON-LIMITED IMAGE RECOVERY Aramayis Orkusyan, Lasith Adhikari, Joanna Valenzuela, and Roummel F. Marcia Department o Mathematics, Caliornia
More informationPhysics 5153 Classical Mechanics. Solution by Quadrature-1
October 14, 003 11:47:49 1 Introduction Physics 5153 Classical Mechanics Solution by Quadrature In the previous lectures, we have reduced the number o eective degrees o reedom that are needed to solve
More informationAn exploration of dropout with LSTMs
An exploration of out with LSTMs Gaofeng Cheng 1,3, Vijayaditya Peddinti 4,5, Daniel Povey 4,5, Vimal Manohar 4,5, Sanjeev Khudanpur 4,5,Yonghong Yan 1,2,3 1 Key Laboratory of Speech Acoustics and Content
More information3. Several Random Variables
. Several Random Variables. Two Random Variables. Conditional Probabilit--Revisited. Statistical Independence.4 Correlation between Random Variables. Densit unction o the Sum o Two Random Variables. Probabilit
More information