Efficient Target Activity Detection Based on Recurrent Neural Networks


D. Gerber, S. Meier, and W. Kellermann
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)

Motivation
Figure: acoustic scene with a target source at direction $\varphi_{\mathrm{tar}}$, two interferers, and noise.
Goal: Detect time frames m with dominant target source → Target Activity Detection (TAD)

Motivation
Proposed method: Artificial neural networks (ANNs) [Meier and Kellermann (2016)]
Feature vector f → Hidden layers → Decision
Questions:
How to define the feature vector?
What network topology?
How to incorporate memory?

Outline
Motivation
Features for TAD
ANN-based feature combination
Experiments

Features for TAD (1)
Feature 1: Beamforming-based SNR estimate
Target source components equalized accounting for measured HRTFs
Add up equalized mic signals → Beamformer towards target → $\hat{\sigma}_s^2$
Subtract equalized (frontal) mic signals → Nullformer (nullsteering beamformer towards target) → $\hat{\sigma}_n^2$
SNR estimate as feature: $f_{\mathrm{SNR}}(t) = \hat{\sigma}_s^2(t) / \hat{\sigma}_n^2(t)$
Feature vector so far: $f_t = [f_{\mathrm{SNR}}(t)]^T$
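The sum/difference idea behind Feature 1 can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: it assumes two time-domain microphone signals that have already been equalized for the target direction, and the function names, the non-overlapping frame length, and the `eps` regularizer are hypothetical choices.

```python
import numpy as np

def frame_variance(x, frame_len):
    """Short-time power of x in non-overlapping frames (assumed framing)."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)

def snr_feature(x1_eq, x2_eq, frame_len=256, eps=1e-12):
    """Frame-wise ratio of beamformer power to nullformer power.

    x1_eq, x2_eq: microphone signals after target equalization, so the
    target component is (ideally) identical in both.
    """
    beam = 0.5 * (x1_eq + x2_eq)  # sum: target adds up coherently
    null = 0.5 * (x1_eq - x2_eq)  # difference: target cancels
    var_s = frame_variance(beam, frame_len)
    var_n = frame_variance(null, frame_len)
    return var_s / (var_n + eps)

# Usage: a dominant target gives a large ratio, pure noise a ratio near 1.
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
n1 = 0.1 * rng.standard_normal(4096)
n2 = 0.1 * rng.standard_normal(4096)
f_target = snr_feature(s + n1, s + n2)
f_noise = snr_feature(n1, n2)
```

In this toy setup the ratio separates target-dominated frames from noise-only frames; real recordings would need the HRTF-based equalization mentioned on the slide.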

Features for TAD (2)
Feature 2: Crosscorrelation ratio
Figure: crosscorrelation $r_{13}(k, t)$ over lag $k \in [-K, +K]$, with the target peak $r_{13}(k_T(t), t)$ and the strongest other peak $\max_{k \neq k_T(t)} r_{13}(k, t)$ marked.
Target source creates peak at TDOA $k_T(t)$
Ratio with strongest peak at $k \neq k_T(t)$ as feature: $f_{\mathrm{corr}}(t) = r_{13}(k_T(t), t) / \max_{k \neq k_T(t)} r_{13}(k, t)$
Interpretation: Power ratio between target and strongest interferer
Feature vector so far: $f_t = [f_{\mathrm{SNR}}(t), f_{\mathrm{corr}}(t)]^T$
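A rough sketch of the crosscorrelation ratio, assuming a single frame of two microphone signals and a known target lag in samples. The function names, the use of magnitudes, and the small guard interval around the target lag are assumptions made for this illustration; the slides do not specify them.

```python
import numpy as np

def xcorr_lags(x1, x3, max_lag):
    """Crosscorrelation r_13(k) = mean_n x1[n + k] * x3[n] for |k| <= max_lag."""
    n = len(x1)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            r[i] = np.dot(x1[k:], x3[: n - k])
        else:
            r[i] = np.dot(x1[: n + k], x3[-k:])
    return lags, r / n

def corr_ratio_feature(x1, x3, k_target, max_lag, guard=1, eps=1e-12):
    """Peak at the target lag divided by the strongest peak elsewhere."""
    lags, r = xcorr_lags(x1, x3, max_lag)
    r = np.abs(r)                          # assumption: compare magnitudes
    target_idx = k_target + max_lag        # index of lag k_target
    others = np.abs(lags - k_target) > guard  # assumption: small guard band
    return r[target_idx] / (np.max(r[others]) + eps)

# Usage: white-noise source arriving 3 samples later at microphone 1.
rng = np.random.default_rng(1)
s = rng.standard_normal(4096)
x3 = s
x1 = np.roll(s, 3)
f_right = corr_ratio_feature(x1, x3, k_target=3, max_lag=20)
f_wrong = corr_ratio_feature(x1, x3, k_target=0, max_lag=20)
```

With the correct target lag the ratio is well above 1; with a wrong lag the dominant peak lands in the denominator and the ratio drops below 1.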

Features for TAD (3)
Feature 3: Adaptive differential beamformer
Adaptive differential beamformer [Elko and Pong (1995)] steers null towards dominant sources
Direction $\varphi_{\mathrm{diff}}$ as feature: $f_{\mathrm{diff}}(t) = [\cos(\varphi_{\mathrm{diff}}(t)), \sin(\varphi_{\mathrm{diff}}(t))]^T$
Feature vector so far: $f_t = [f_{\mathrm{SNR}}(t), f_{\mathrm{corr}}(t), f_{\mathrm{diff}}(t)^T]^T$

Features for TAD (4)
Feature 4: Microphone signal variances
Detect overall powers and unilateral scenarios
$f_{\sigma^2}(t) = [\sigma_{v_1}^2(t), \sigma_{v_3}^2(t)]^T$
Feature 5: Target source DoA
Complementing $f_{\mathrm{diff}}(t)$
$f_{\varphi}(t) = [\cos(\varphi_{\mathrm{tar}}(t)), \sin(\varphi_{\mathrm{tar}}(t))]^T$
Complete feature vector: $f_t = [f_{\mathrm{SNR}}(t), f_{\mathrm{corr}}(t), f_{\mathrm{diff}}(t)^T, f_{\sigma^2}(t)^T, f_{\varphi}(t)^T]^T$
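To make the dimensions concrete, the five features can be stacked into one vector per frame. The sketch below uses hypothetical function and argument names and assumes angles in radians; it only mirrors the structure of the feature vector given on the slides (1 + 1 + 2 + 2 + 2 = 8 entries).

```python
import numpy as np

def angle_feature(phi_rad):
    """Encode an angle as [cos, sin], which has no jump at the 0 / 2*pi wrap."""
    return np.array([np.cos(phi_rad), np.sin(phi_rad)])

def build_feature_vector(f_snr, f_corr, phi_diff, var_v1, var_v3, phi_tar):
    """Stack SNR, correlation ratio, null direction, mic variances, target DoA."""
    return np.concatenate((
        [f_snr],
        [f_corr],
        angle_feature(phi_diff),
        [var_v1, var_v3],
        angle_feature(phi_tar),
    ))

# Usage with made-up values for a single frame.
v = build_feature_vector(2.0, 1.5, 0.0, 0.1, 0.2, np.pi / 2)
```

One practical property of the cos/sin encoding is that angles just below 2π and just above 0 map to nearby points, which a raw angle value would not.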

ANN-based feature combination
Mapping of the feature vector $f_t$ to a decision $y_t$
Feedforward Neural Network (FNN) [Meier and Kellermann (2016)]
Subsequent decisions dependent → How to incorporate memory?
Sequential FNNs (inputs $f_t$, $f_{t-1}$, $f_{t-2}$)
Recurrent Neural Networks (RNNs)
Long Short-Term Memory (LSTM) networks (→ longer memory)
Gated Recurrent Unit (GRU) networks (→ less complex)
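How a recurrent network carries memory from frame to frame can be illustrated with a single-layer vanilla RNN forward pass. This is a generic textbook sketch, not the network used in the talk: the tanh hidden activation, the sigmoid output, and all parameter names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(features, W_in, W_rec, b_h, w_out, b_out):
    """Map a feature sequence (T, D) to per-frame decisions y_t in (0, 1).

    The hidden state h depends on the current features and on the previous
    hidden state, which is what gives the network memory.
    """
    h = np.zeros(W_rec.shape[0])
    y = np.empty(features.shape[0])
    for t, f_t in enumerate(features):
        h = np.tanh(W_in @ f_t + W_rec @ h + b_h)
        y[t] = sigmoid(w_out @ h + b_out)
    return y

# Usage with random (untrained) weights: 8 features, 4 hidden nodes.
rng = np.random.default_rng(0)
D, H, T = 8, 4, 10
W_in = rng.standard_normal((H, D))
W_rec = rng.standard_normal((H, H))
b_h = np.zeros(H)
w_out = rng.standard_normal(H)
feats = np.tile(rng.standard_normal(D), (T, 1))  # same input in every frame
y_rec = rnn_forward(feats, W_in, W_rec, b_h, w_out, 0.0)
y_ff = rnn_forward(feats, W_in, np.zeros((H, H)), b_h, w_out, 0.0)
```

Setting the recurrent weights to zero reduces the model to a feedforward mapping, so `y_ff` is constant for a constant input, whereas `y_rec` may change over time because of the hidden state.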

Experiments
Table: Investigated network types.
Feed-forward:
  FNN α=0      Non-smoothed features
  FNN α=0.7    Recursively smoothed features (α = 0.7)
  FNN seq      Sequential features
Recurrent:
  RNN          Vanilla RNNs
  LSTM         Long Short-Term Memory
  GRU          Gated Recurrent Unit
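The two memory-by-preprocessing variants for FNNs can be sketched as follows. The slides give only the smoothing factor α = 0.7, so the first-order recursion used here, the initialization with the first frame, and the choice of two past frames are assumptions; all names are hypothetical.

```python
import numpy as np

def smooth_recursive(features, alpha=0.7):
    """First-order recursive smoothing (assumed form):
    s_t = alpha * s_{t-1} + (1 - alpha) * f_t, initialized with the first frame."""
    out = np.empty_like(features, dtype=float)
    prev = features[0].astype(float)
    for t, f_t in enumerate(features):
        prev = alpha * prev + (1.0 - alpha) * f_t
        out[t] = prev
    return out

def stack_sequential(features, n_past=2):
    """Concatenate [f_t, f_{t-1}, ..., f_{t-n_past}] per frame.

    Frames before the start of the sequence are replaced by the first frame.
    """
    T, _ = features.shape
    padded = np.vstack([np.repeat(features[:1], n_past, axis=0), features])
    return np.hstack(
        [padded[n_past - d : n_past - d + T] for d in range(n_past + 1)]
    )

# Usage on a toy sequence with one feature per frame.
seq = np.arange(5, dtype=float).reshape(5, 1)
smoothed = smooth_recursive(seq, alpha=0.7)
stacked = stack_sequential(seq, n_past=2)
```

With α = 0 the smoothing leaves the features unchanged, which matches the "FNN α=0" row of the table.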

Experiments
Data set consisting of 38 recordings:
1 target speaker
1-4 interferers (same level as target speaker)
Babble noise (SNR: 10 dB)
Various source positions, living room-like environment (T ms)
Recordings with hearing aids on a KEMAR head
Training set: 29 scenarios (20 s each), test set: 9 scenarios (10 s each)
Sampling rate: 16 kHz
Ground truth: oracle SINR > 10 dB
Implementation in Python (Theano/Lasagne)
Regularization: dropout layers for FNNs, synaptic noise for RNNs
Hardware: Intel i7-920 CPU, GeForce GTX 970 GPU
Number of layers $L \in [1, 6]$
Number of nodes per layer $N \in [1, 32]$
Best network topology chosen for each network type individually
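The ground-truth rule (oracle SINR above 10 dB) amounts to a per-frame threshold. The sketch below assumes that frame-wise powers of the target and of the sum of interferers and noise are available separately, as they would be in a simulated or separately recorded mixture; how these powers were obtained in the experiments is not stated on the slides, and the names are hypothetical.

```python
import numpy as np

def oracle_labels(target_power, interference_power, threshold_db=10.0, eps=1e-12):
    """Label a frame as target-dominant when its oracle SINR exceeds the threshold."""
    target_power = np.asarray(target_power, dtype=float)
    interference_power = np.asarray(interference_power, dtype=float)
    sinr_db = 10.0 * np.log10((target_power + eps) / (interference_power + eps))
    return sinr_db > threshold_db

# Usage: SINRs of 20 dB, 0 dB, and about 7 dB.
labels = oracle_labels([100.0, 1.0, 5.0], [1.0, 1.0, 1.0])
```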

Evaluation measures
Matthews Correlation Coefficient (MCC):
$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
with TP: True Positives, TN: True Negatives, FP: False Positives, FN: False Negatives
Area Under Curve (AUC): area under the receiver operating characteristic (ROC) curve, which plots TP rate over FP rate
High TP rate at low FP rate → AUC close to 1
Perfect detection: MCC = 1, AUC = 1
Random detection: MCC = 0, AUC = 0.5
Total disagreement: MCC = -1, AUC = 0
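Both measures are easy to compute from labels and network outputs. The following NumPy sketch follows the MCC definition above and computes the AUC through the equivalent rank statistic (the fraction of positive/negative pairs that are ordered correctly, ties counted half); the function names are hypothetical, and returning 0 for a zero denominator is a convention chosen for this sketch.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # Convention for this sketch: 0 when a row or column of the table is empty.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Usage on a tiny example with two target frames and two non-target frames.
y = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.2, 0.1]
decisions = [s > 0.5 for s in scores]
```

The three reference cases listed on the slide (perfect, random, total disagreement) can be reproduced by passing the scores as given, constant scores, or negated scores.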

Results
Table columns: Performance (ACC, AUC, MCC) and Complexity (N, L, RTT)
Table rows (network types): FNN α=0, FNN α=0.7, FNN (seq), RNN, LSTM, GRU
(The numerical entries are not preserved in this transcription.)
Legend:
N: number of nodes per layer
L: number of layers
ACC: accuracy, $(TP + TN) / (TP + TN + FP + FN)$
RTT: relative testing time

Summary
ANN-based feature combination leads to good detection of target dominance intervals
Exploiting memory is beneficial for TAD
Vanilla RNNs outperform sequential FNNs even with smaller network depth
LSTMs and GRUs do not lead to significant improvements → no benefit from long-term memory

Thank you for your attention!

References
Elko, G. W. and Pong, A.-T. N. (1995). A simple adaptive first-order differential microphone. In Proc. IEEE Workshop Applications Signal Process. Audio Acoustics (WASPAA).
Meier, S. and Kellermann, W. (2016). Artificial neural network-based feature combination for spatial voice activity detection. In Proc. Annual Conf. Int. Speech Communication Assoc. (Interspeech), San Francisco, USA.

Appendix: Complete results
Performance table columns: Network type, ACC, AUC, MCC
Complexity table columns: Network type, N, L, P / P, RRT, RTT
Table rows (network types): FNN (nos), FNN (smo), FNN (seq), RNN, LSTM, GRU
(The numerical entries are not preserved in this transcription.)


More information

Albenzio Cirillo INFOCOM Dpt. Università degli Studi di Roma, Sapienza

Albenzio Cirillo INFOCOM Dpt. Università degli Studi di Roma, Sapienza Albenzio Cirillo INFOCOM Dpt. Università degli Studi di Roma, Sapienza albenzio.cirillo@uniroma1.it http://ispac.ing.uniroma1.it/albenzio/index.htm ET2010 XXVI Riunione Annuale dei Ricercatori di Elettrotecnica

More information

Sound Source Tracking Using Microphone Arrays

Sound Source Tracking Using Microphone Arrays Sound Source Tracking Using Microphone Arrays WANG PENG and WEE SER Center for Signal Processing School of Electrical & Electronic Engineering Nanayang Technological Univerisy SINGAPORE, 639798 Abstract:

More information

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as

More information

arxiv: v1 [cs.lg] 4 Aug 2016

arxiv: v1 [cs.lg] 4 Aug 2016 An improved uncertainty decoding scheme with weighted samples for DNN-HMM hybrid systems Christian Huemmer 1, Ramón Fernández Astudillo 2, and Walter Kellermann 1 1 Multimedia Communications and Signal

More information

Boundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks

Boundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks INTERSPEECH 2014 Boundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks Ryu Takeda, Naoyuki Kanda, and Nobuo Nukaga Central Research Laboratory, Hitachi Ltd., 1-280, Kokubunji-shi,

More information

Moving Average Rules to Find. Confusion Matrix. CC283 Intelligent Problem Solving 05/11/2010. Edward Tsang (all rights reserved) 1

Moving Average Rules to Find. Confusion Matrix. CC283 Intelligent Problem Solving 05/11/2010. Edward Tsang (all rights reserved) 1 Machine Learning Overview Supervised Learning Training esting Te Unseen data Data Observed x 1 x 2... x n 1.6 7.1... 2.7 1.4 6.8... 3.1 2.1 5.4... 2.8... Machine Learning Patterns y = f(x) Target y Buy

More information

Musical noise reduction in time-frequency-binary-masking-based blind source separation systems

Musical noise reduction in time-frequency-binary-masking-based blind source separation systems Musical noise reduction in time-frequency-binary-masing-based blind source separation systems, 3, a J. Čermá, 1 S. Arai, 1. Sawada and 1 S. Maino 1 Communication Science Laboratories, Corporation, Kyoto,

More information

CHAPTER 5 EEG SIGNAL CLASSIFICATION BASED ON NN WITH ICA AND STFT

CHAPTER 5 EEG SIGNAL CLASSIFICATION BASED ON NN WITH ICA AND STFT 69 CHAPTER 5 EEG SIGNAL CLASSIFICATION BASED ON NN WITH ICA AND STFT 5.1 OVERVIEW A novel approach is proposed for Electroencephalogram signal classification using Artificial Neural Network based on Independent

More information

"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction"

Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction "Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction" Francesco Nesta, Marco Matassoni {nesta, matassoni}@fbk.eu Fondazione Bruno Kessler-Irst, Trento (ITALY) For contacts:

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement

More information

Ensemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan

Ensemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan Ensemble Methods NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan How do you make a decision? What do you want for lunch today?! What did you have last night?! What are your favorite

More information

An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments

An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments Michael I. Mandel, Daniel P. W. Ellis LabROSA, Dept. of Electrical Engineering Columbia University New York, NY {mim,dpwe}@ee.columbia.edu

More information

Segmental Recurrent Neural Networks for End-to-end Speech Recognition

Segmental Recurrent Neural Networks for End-to-end Speech Recognition Segmental Recurrent Neural Networks for End-to-end Speech Recognition Liang Lu, Lingpeng Kong, Chris Dyer, Noah Smith and Steve Renals TTI-Chicago, UoE, CMU and UW 9 September 2016 Background A new wave

More information

Improved noise power spectral density tracking by a MAP-based postprocessor

Improved noise power spectral density tracking by a MAP-based postprocessor Improved noise power spectral density tracking by a MAP-based postprocessor Aleksej Chinaev, Alexander Krueger, Dang Hai Tran Vu, Reinhold Haeb-Umbach University of Paderborn, Germany March 8th, 01 Computer

More information

Smart Home Health Analytics Information Systems University of Maryland Baltimore County

Smart Home Health Analytics Information Systems University of Maryland Baltimore County Smart Home Health Analytics Information Systems University of Maryland Baltimore County 1 IEEE Expert, October 1996 2 Given sample S from all possible examples D Learner L learns hypothesis h based on

More information

Introduction to RNNs!

Introduction to RNNs! Introduction to RNNs Arun Mallya Best viewed with Computer Modern fonts installed Outline Why Recurrent Neural Networks (RNNs)? The Vanilla RNN unit The RNN forward pass Backpropagation refresher The RNN

More information

Source localization in an ocean waveguide using supervised machine learning

Source localization in an ocean waveguide using supervised machine learning Source localization in an ocean waveguide using supervised machine learning Haiqiang Niu, Emma Reeves, and Peter Gerstoft Scripps Institution of Oceanography, UC San Diego Part I Localization on Noise09

More information

Recurrent neural networks

Recurrent neural networks 12-1: Recurrent neural networks Prof. J.C. Kao, UCLA Recurrent neural networks Motivation Network unrollwing Backpropagation through time Vanishing and exploding gradients LSTMs GRUs 12-2: Recurrent neural

More information

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig

More information

EE-559 Deep learning Recurrent Neural Networks

EE-559 Deep learning Recurrent Neural Networks EE-559 Deep learning 11.1. Recurrent Neural Networks François Fleuret https://fleuret.org/ee559/ Sun Feb 24 20:33:31 UTC 2019 Inference from sequences François Fleuret EE-559 Deep learning / 11.1. Recurrent

More information

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise 334 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 11, NO 4, JULY 2003 A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Yi Hu, Student Member, IEEE, and Philipos C

More information

<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)

<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation) Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation

More information

ON THE LIMITATIONS OF BINAURAL REPRODUCTION OF MONAURAL BLIND SOURCE SEPARATION OUTPUT SIGNALS

ON THE LIMITATIONS OF BINAURAL REPRODUCTION OF MONAURAL BLIND SOURCE SEPARATION OUTPUT SIGNALS th European Signal Processing Conference (EUSIPCO 12) Bucharest, Romania, August 27-31, 12 ON THE LIMITATIONS OF BINAURAL REPRODUCTION OF MONAURAL BLIND SOURCE SEPARATION OUTPUT SIGNALS Klaus Reindl, Walter

More information

Sensor Tasking and Control

Sensor Tasking and Control Sensor Tasking and Control Sensing Networking Leonidas Guibas Stanford University Computation CS428 Sensor systems are about sensing, after all... System State Continuous and Discrete Variables The quantities

More information

BLSTM-HMM HYBRID SYSTEM COMBINED WITH SOUND ACTIVITY DETECTION NETWORK FOR POLYPHONIC SOUND EVENT DETECTION

BLSTM-HMM HYBRID SYSTEM COMBINED WITH SOUND ACTIVITY DETECTION NETWORK FOR POLYPHONIC SOUND EVENT DETECTION BLSTM-HMM HYBRID SYSTEM COMBINED WITH SOUND ACTIVITY DETECTION NETWORK FOR POLYPHONIC SOUND EVENT DETECTION Tomoki Hayashi 1, Shinji Watanabe 2, Tomoki Toda 1, Takaaki Hori 2, Jonathan Le Roux 2, Kazuya

More information

Development of a Deep Recurrent Neural Network Controller for Flight Applications

Development of a Deep Recurrent Neural Network Controller for Flight Applications Development of a Deep Recurrent Neural Network Controller for Flight Applications American Control Conference (ACC) May 26, 2017 Scott A. Nivison Pramod P. Khargonekar Department of Electrical and Computer

More information

arxiv: v1 [eess.as] 12 Jul 2018

arxiv: v1 [eess.as] 12 Jul 2018 OPTIMAL BINAURAL LCMV BEAMFORMING IN COMPLEX ACOUSTIC SCENARIOS: TEORETICAL AND PRACTICAL INSIGTS Nico Gößling 1 Daniel Marquardt 1 Ivo Merks Tao Zhang Simon Doclo 1 1 University of Oldenburg Department

More information

Tensor-Train Long Short-Term Memory for Monaural Speech Enhancement

Tensor-Train Long Short-Term Memory for Monaural Speech Enhancement 1 Tensor-Train Long Short-Term Memory for Monaural Speech Enhancement Suman Samui, Indrajit Chakrabarti, and Soumya K. Ghosh, arxiv:1812.10095v1 [cs.sd] 25 Dec 2018 Abstract In recent years, Long Short-Term

More information

Reduced-cost combination of adaptive filters for acoustic echo cancellation

Reduced-cost combination of adaptive filters for acoustic echo cancellation Reduced-cost combination of adaptive filters for acoustic echo cancellation Luis A. Azpicueta-Ruiz and Jerónimo Arenas-García Dept. Signal Theory and Communications, Universidad Carlos III de Madrid Leganés,

More information

TTIC 31230, Fundamentals of Deep Learning David McAllester, April Vanishing and Exploding Gradients. ReLUs. Xavier Initialization

TTIC 31230, Fundamentals of Deep Learning David McAllester, April Vanishing and Exploding Gradients. ReLUs. Xavier Initialization TTIC 31230, Fundamentals of Deep Learning David McAllester, April 2017 Vanishing and Exploding Gradients ReLUs Xavier Initialization Batch Normalization Highway Architectures: Resnets, LSTMs and GRUs Causes

More information

Analysis of the Learning Process of a Recurrent Neural Network on the Last k-bit Parity Function

Analysis of the Learning Process of a Recurrent Neural Network on the Last k-bit Parity Function Analysis of the Learning Process of a Recurrent Neural Network on the Last k-bit Parity Function Austin Wang Adviser: Xiuyuan Cheng May 4, 2017 1 Abstract This study analyzes how simple recurrent neural

More information

Temporal Backpropagation for FIR Neural Networks

Temporal Backpropagation for FIR Neural Networks Temporal Backpropagation for FIR Neural Networks Eric A. Wan Stanford University Department of Electrical Engineering, Stanford, CA 94305-4055 Abstract The traditional feedforward neural network is a static

More information

Spatial sound. Lecture 8: EE E6820: Speech & Audio Processing & Recognition. Columbia University Dept. of Electrical Engineering

Spatial sound. Lecture 8: EE E6820: Speech & Audio Processing & Recognition. Columbia University Dept. of Electrical Engineering EE E6820: Speech & Audio Processing & Recognition Lecture 8: Spatial sound 1 Spatial acoustics 2 Binaural perception 3 Synthesizing spatial audio 4 Extracting spatial sounds Dan Ellis

More information

Proc. of NCC 2010, Chennai, India

Proc. of NCC 2010, Chennai, India Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay

More information

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Speaker Representation and Verification Part II. by Vasileios Vasilakakis Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation

More information

BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION. Robert Rehr, Timo Gerkmann

BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION. Robert Rehr, Timo Gerkmann BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION Robert Rehr, Timo Gerkmann Speech Signal Processing Group, Department of Medical Physics and Acoustics

More information

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood

More information

Sensitivity Considerations in Compressed Sensing

Sensitivity Considerations in Compressed Sensing Sensitivity Considerations in Compressed Sensing Louis L. Scharf, 1 Edwin K. P. Chong, 1,2 Ali Pezeshki, 2 and J. Rockey Luo 2 1 Department of Mathematics, Colorado State University Fort Collins, CO 8523,

More information

Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition

Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition INTERSPEECH 017 August 0 4, 017, Stockholm, Sweden Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition Jaeyoung Kim 1, Mostafa El-Khamy 1, Jungwon Lee 1 1 Samsung Semiconductor,

More information

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

More information

An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis

An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis ICASSP 07 New Orleans, USA An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis Xin WANG, Shinji TAKAKI, Junichi YAMAGISHI National Institute of Informatics, Japan 07-03-07

More information

Multiple Speaker Tracking with the Factorial von Mises- Fisher Filter

Multiple Speaker Tracking with the Factorial von Mises- Fisher Filter Multiple Speaker Tracking with the Factorial von Mises- Fisher Filter IEEE International Workshop on Machine Learning for Signal Processing Sept 21-24, 2014 Reims, France Johannes Traa, Paris Smaragdis

More information

Recurrent Neural Network

Recurrent Neural Network Recurrent Neural Network Xiaogang Wang xgwang@ee..edu.hk March 2, 2017 Xiaogang Wang (linux) Recurrent Neural Network March 2, 2017 1 / 48 Outline 1 Recurrent neural networks Recurrent neural networks

More information