Efficient Target Activity Detection Based on Recurrent Neural Networks
1 Efficient Target Activity Detection Based on Recurrent Neural Networks D. Gerber, S. Meier, and W. Kellermann Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
2-6 Motivation: Target source at direction φ_tar, one or more interferers, and background noise. Goal: Detect time frames m with a dominant target source, i.e., Target Activity Detection (TAD). 1 / 15
7-8 Motivation: Proposed method: Artificial neural networks (ANNs) [Meier and Kellermann (2016)] mapping a feature vector f through hidden layers to a decision. Questions: How to define the feature vector? What network topology? How to incorporate memory? 2 / 15
9 Outline: Motivation, Features for TAD, ANN-based feature combination, Experiments. 3 / 15
10-14 Features for TAD (1): Feature 1: Beamforming-based SNR estimate. Target source components are equalized accounting for measured HRTFs. A beamformer towards the target (add up the equalized mic signals) yields σ̂²_s, a nullsteering beamformer towards the target (subtract the equalized frontal mic signals) yields σ̂²_n. SNR estimate as feature: f_SNR(t) = σ̂²_s(t) / σ̂²_n(t). Feature vector so far: f_t = [f_SNR(t)]^T. 4 / 15
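Below is a minimal numpy sketch of how such a frame-wise SNR feature could be computed, assuming the HRTF-equalized microphone signals are already available as x_eq_1 and x_eq_2 (hypothetical names); the frame length, hop size, and the simple sum/difference beamformer and nullformer are illustrative assumptions, not the exact processing used in the talk.

```python
import numpy as np

def snr_feature(x_eq_1, x_eq_2, frame_len=512, hop=256, eps=1e-12):
    """Frame-wise beamforming-based SNR feature f_SNR(t).

    x_eq_1, x_eq_2: HRTF-equalized microphone signals (1-D arrays),
    so that the target components are aligned across channels.
    """
    beam = x_eq_1 + x_eq_2   # beamformer towards the target: coherent sum
    null = x_eq_1 - x_eq_2   # nullformer: cancels the (aligned) target
    n_frames = 1 + (len(beam) - frame_len) // hop
    f_snr = np.empty(n_frames)
    for t in range(n_frames):
        s = slice(t * hop, t * hop + frame_len)
        sigma2_s = np.var(beam[s])   # target-dominated power estimate
        sigma2_n = np.var(null[s])   # noise/interference power estimate
        f_snr[t] = sigma2_s / (sigma2_n + eps)
    return f_snr
```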
15-20 Features for TAD (2): Feature 2: Crosscorrelation ratio. [Figure: crosscorrelation r_13(k, t) over lags k ∈ [−K, +K] with a peak at the target TDOA k_T(t).] The target source creates a peak at the TDOA k_T(t). Ratio with the strongest peak at k ≠ k_T(t) as feature: f_corr(t) = r_13(k_T(t), t) / max_{k ≠ k_T(t)} r_13(k, t). Interpretation: power ratio between the target and the strongest interferer. Feature vector so far: f_t = [f_SNR(t), f_corr(t)]^T. 5 / 15
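A rough sketch of the crosscorrelation-ratio feature is shown below, assuming the target TDOA k_target (in samples, derived from the known target direction) is given; the lag range, frame parameters, and the exclusion window around the target lag are assumptions made for illustration.

```python
import numpy as np

def corr_ratio_feature(x1, x3, k_target, max_lag=32, frame_len=512, hop=256,
                       exclude=2, eps=1e-12):
    """Crosscorrelation ratio f_corr(t): peak of r_13 at the target TDOA
    divided by the strongest peak at other lags."""
    n_frames = 1 + (len(x1) - frame_len) // hop
    lags = np.arange(-max_lag, max_lag + 1)
    f_corr = np.empty(n_frames)
    for t in range(n_frames):
        s = slice(t * hop, t * hop + frame_len)
        a, b = x1[s], x3[s]
        r_full = np.correlate(a, b, mode="full")        # full crosscorrelation
        center = len(a) - 1                             # index of lag 0
        r = r_full[center - max_lag: center + max_lag + 1]
        peak_target = r[lags == k_target][0]
        mask = np.abs(lags - k_target) > exclude        # exclude region around k_T
        peak_other = np.max(r[mask])
        f_corr[t] = peak_target / (peak_other + eps)
    return f_corr
```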
21-24 Features for TAD (3): Feature 3: Adaptive differential beamformer. [Figure: beampattern with a null steered towards direction φ_diff.] The adaptive differential beamformer [Elko and Pong (1995)] steers a null towards the dominant sources. Direction φ_diff as feature: f_diff(t) = [cos(φ_diff(t)), sin(φ_diff(t))]^T. Feature vector so far: f_t = [f_SNR(t), f_corr(t), f_diff(t)^T]^T. 6 / 15
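The adaptive beamformer itself is not detailed on the slide; the sketch below uses a non-adaptive null scan over candidate directions as a stand-in for the adaptive differential beamformer of Elko and Pong (1995): for each candidate angle a delay-and-subtract null is formed, and the angle minimizing the output power is taken as φ_diff. The microphone spacing d, the scan grid, and the far-field delay model are assumptions.

```python
import numpy as np

def dominant_source_direction(x1, x2, fs=16000, d=0.15, c=343.0):
    """Null-scan stand-in for the adaptive differential beamformer: return the
    direction phi_diff of the currently dominant source (the angle whose
    delay-and-subtract null minimizes the output power) together with the
    feature encoding [cos(phi_diff), sin(phi_diff)]."""
    angles_deg = np.arange(0, 181, 5)                  # candidate null directions
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    freqs = np.fft.rfftfreq(len(x1), d=1.0 / fs)
    powers = []
    for phi in np.deg2rad(angles_deg):
        tau = d * np.cos(phi) / c                      # far-field inter-mic delay
        # advance channel 2 by tau so a source from direction phi cancels
        Y = X1 - X2 * np.exp(2j * np.pi * freqs * tau)
        powers.append(np.sum(np.abs(Y) ** 2))
    phi_diff = np.deg2rad(angles_deg[int(np.argmin(powers))])
    return phi_diff, np.array([np.cos(phi_diff), np.sin(phi_diff)])
```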
25-26 Features for TAD (4): Feature 4: Microphone signal variances, to detect overall powers and unilateral scenarios: f_σ²(t) = [σ²_v1(t), σ²_v3(t)]^T. Feature 5: Target source DoA, complementing f_diff(t): f_φ(t) = [cos(φ_tar(t)), sin(φ_tar(t))]^T. Complete feature vector: f_t = [f_SNR(t), f_corr(t), f_diff(t)^T, f_σ²(t)^T, f_φ(t)^T]^T. 7 / 15
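Assembling the complete per-frame feature vector is then straightforward; the following sketch stacks the five features into the 8-dimensional f_t (function and argument names are hypothetical).

```python
import numpy as np

def build_feature_vector(f_snr, f_corr, phi_diff, var_v1, var_v3, phi_tar):
    """Assemble the per-frame TAD feature vector
    f_t = [f_SNR, f_corr, f_diff^T, f_sigma2^T, f_phi^T]^T (8-dimensional)."""
    f_diff = [np.cos(phi_diff), np.sin(phi_diff)]   # null direction of the
                                                    # adaptive differential beamformer
    f_sigma2 = [var_v1, var_v3]                     # left/right mic signal variances
    f_phi = [np.cos(phi_tar), np.sin(phi_tar)]      # known target DoA
    return np.array([f_snr, f_corr, *f_diff, *f_sigma2, *f_phi])
```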
27-32 ANN-based feature combination: Mapping of the feature vector f_t to a decision y_t. Baseline: Feedforward Neural Network (FNN) [Meier and Kellermann (2016)]. Subsequent decisions are dependent; how to incorporate memory? Options: sequential FNNs (stacked inputs f_t, f_{t-1}, f_{t-2}), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks (longer memory), Gated Recurrent Unit (GRU) networks (less complex). 8 / 15
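As a concrete illustration, a recurrent feature combination could look like the following PyTorch sketch (the original implementation used Theano/Lasagne, so this is only a stand-in, and the layer sizes are placeholders); swapping nn.GRU for nn.LSTM or nn.RNN gives the other recurrent variants from the slide.

```python
import torch
import torch.nn as nn

class GRUTad(nn.Module):
    """Recurrent feature combination for TAD: maps the feature sequence
    f_1..f_T (8-dim per frame) to per-frame target-activity probabilities y_t."""
    def __init__(self, n_features=8, n_hidden=16, n_layers=2):
        super().__init__()
        self.rnn = nn.GRU(n_features, n_hidden, n_layers, batch_first=True)
        self.out = nn.Linear(n_hidden, 1)

    def forward(self, f):                 # f: (batch, T, n_features)
        h, _ = self.rnn(f)                # h: (batch, T, n_hidden)
        return torch.sigmoid(self.out(h)).squeeze(-1)   # (batch, T)

# training sketch: binary cross-entropy against the oracle frame labels
model = GRUTad()
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```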
33 Experiments. Table: Investigated network types.
Feed-forward:
  FNN_α=0: non-smoothed features
  FNN_α=0.7: recursively smoothed features (α = 0.7)
  FNN_seq: sequential features
Recurrent:
  RNN: vanilla RNNs
  LSTM: Long Short-Term Memory
  GRU: Gated Recurrent Unit
9 / 15
34-36 Experiments. Data set consisting of 38 recordings: 1 target speaker, 1-4 interferers (same level as the target speaker), babble noise (SNR: 10 dB), various source positions, living room-like environment (T ms), recordings with hearing aids on a KEMAR head. Training set: 29 scenarios (20 s each), test set: 9 scenarios (10 s each). Sampling rate: 16 kHz. Ground truth: oracle SINR > 10 dB. Implementation in Python (Theano/Lasagne). Regularization: dropout layers for FNNs, synaptic noise for RNNs. Hardware: Intel i7-920 CPU, GeForce GTX 970 GPU. Number of layers L ∈ [1, 6], number of nodes per layer N ∈ [1, 32]; the best network topology is chosen for each network type individually. 10 / 15
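The oracle labels can be derived frame-wise from the separately available target and interference-plus-noise components; the sketch below illustrates this thresholding, with the frame parameters as assumptions.

```python
import numpy as np

def oracle_labels(target, interference_plus_noise, frame_len=512, hop=256,
                  threshold_db=10.0, eps=1e-12):
    """Frame-wise oracle TAD labels: 1 where the oracle SINR exceeds the
    threshold, 0 otherwise (requires the separated signal components)."""
    n_frames = 1 + (len(target) - frame_len) // hop
    labels = np.zeros(n_frames, dtype=int)
    for t in range(n_frames):
        s = slice(t * hop, t * hop + frame_len)
        sinr_db = 10 * np.log10((np.sum(target[s] ** 2) + eps) /
                                (np.sum(interference_plus_noise[s] ** 2) + eps))
        labels[t] = int(sinr_db > threshold_db)
    return labels
```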
37-45 Evaluation measures. Matthews Correlation Coefficient:
MCC = (TP · TN − FP · FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)),
with TP, TN, FP, FN the numbers of true positives, true negatives, false positives, and false negatives. Area Under Curve (AUC): area under the receiver operating characteristic (ROC) curve, i.e., TP rate over FP rate; a high TP rate at a low FP rate yields AUC close to 1. Perfect detection: MCC = 1, AUC = 1. Random detection: MCC = 0, AUC = 0.5. Total disagreement: MCC = −1, AUC = 0. 11 / 15
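Both measures are available off the shelf; a small sketch using scikit-learn (with made-up label and score arrays) illustrates how ACC, MCC, and AUC would be computed from the frame-wise decisions.

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef, roc_auc_score, accuracy_score

# y_true: oracle frame labels, y_score: network outputs y_t in [0, 1]
y_true = np.array([0, 0, 1, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6])
y_pred = (y_score > 0.5).astype(int)            # hard decision at threshold 0.5

print("ACC:", accuracy_score(y_true, y_pred))   # (TP+TN)/(TP+TN+FP+FN)
print("MCC:", matthews_corrcoef(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_score))   # area under the ROC curve
```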
46-50 Results. Table: performance (ACC, AUC, MCC) and complexity (N, L, RTT) per network type: FNN_α=0, FNN_α=0.7, FNN_seq, RNN, LSTM, GRU. Legend: ACC: accuracy, (TP + TN) / (TP + TN + FP + FN); N: number of nodes per layer; L: number of layers; RTT: relative testing time. 12 / 15
51 Summary: ANN-based feature combination leads to good detection of target dominance intervals. Exploiting memory is beneficial for TAD. Vanilla RNNs outperform sequential FNNs even with smaller network depth. LSTMs and GRUs do not lead to significant improvements, i.e., no benefit from long-term memory. 13 / 15
52 Thank you for your attention! 14 / 15
53 References. Elko, G. W. and Pong, A.-T. N. (1995). A simple adaptive first-order differential microphone. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Meier, S. and Kellermann, W. (2016). Artificial neural network-based feature combination for spatial voice activity detection. In Proc. Annual Conf. of the Int. Speech Communication Assoc. (Interspeech), San Francisco, USA. 15 / 15
54 Appendix: Complete results. Performance (ACC, AUC, MCC) and complexity (N, L, P, RTT) per network type: FNN (nos), FNN (smo), FNN (seq), RNN, LSTM, GRU. B 1