A Survey on Voice Activity Detection Methods
e-ISSN, Volume 2, Issue 4, April 2016, pp., Scientific Journal Impact Factor:
Shabeeba T. K.1, Anand Pavithran2
1,2 Department of Computer Science and Engineering, MES College of Engineering, Kuttippuram, Kerala, India

Abstract
Voice Activity Detection (VAD) is a technique used in speech processing to detect the presence or absence of human speech. It can facilitate speech processing, and it can also be used to deactivate some processes during the non-speech sections of an audio session. Various VAD algorithms have been developed that offer different features and different trade-offs between latency, sensitivity, accuracy and computational cost. VAD methods formulate the decision rule on a frame-by-frame basis, using instantaneous measures of the divergence distance between speech and noise. Measures used in VAD methods include spectral slope, correlation coefficients, log likelihood ratio, cepstral, weighted cepstral and modified distance measures. More recently, statistical and machine learning methods have been applied to VAD. This paper surveys several of these VAD methods.

Keywords: Voice Activity Detection, Deep Belief Network

I. INTRODUCTION
Determining the beginning and the end of speech in the presence of background noise is a complicated problem. A voice activity detector (VAD) tries to separate speech signals from background noise. The result of a VAD decision is a binary value that indicates either the presence of speech in the input signal (for example, an output value of 1) or the presence of noise only (for example, an output value of 0). In automatic speech recognition, endpoint detection is required to isolate the speech of interest so that a speech pattern or template can be created. A VAD algorithm is an integral part of a variety of speech communication systems, such as speech recognition, speech coding in mobile phones, and IP telephony.
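To make the frame-by-frame binary decision concrete, the following is a minimal energy-threshold detector. It is not one of the surveyed methods, only an illustrative sketch under stated assumptions: the first few frames are assumed to contain noise only, and a frame is labelled speech (1) when its log energy exceeds that noise floor by a fixed margin. The function names, the 25 ms frame / 10 ms hop at 16 kHz (400 and 160 samples) and the 6 dB margin are hypothetical choices.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    idx = starts[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


def energy_vad(x, frame_len=400, hop=160, margin_db=6.0, noise_frames=10):
    """Return one binary decision per frame: 1 = speech present, 0 = noise only.

    Illustrative assumptions: the first `noise_frames` frames are noise only,
    and a frame is labelled speech when its log energy exceeds the estimated
    noise floor by more than `margin_db` decibels.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    # Feature: log energy of each frame (small constant avoids log(0)).
    energy_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    # Noise floor estimated from the leading frames.
    noise_floor_db = np.mean(energy_db[:noise_frames])
    # Classification rule: a fixed threshold above the noise floor.
    return (energy_db > noise_floor_db + margin_db).astype(int)


# Usage: 1 s of low-level noise at 16 kHz, with a tone standing in for speech
# in the second half.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
x = 0.01 * rng.standard_normal(fs)
x[fs // 2:] += 0.5 * np.sin(2 * np.pi * 440 * t[fs // 2:])
decisions = energy_vad(x)
```

A fixed noise floor and threshold is exactly what the statistical and learning-based methods below try to improve on, since both break down when the noise is nonstationary or the SNR is low.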
In telecommunication systems, an effective VAD algorithm plays an important role, especially in automatic speech recognition (ASR) systems. It can be used to deactivate some processes during the non-speech sections of an audio session. It is also used to reduce computation by eliminating unnecessary transmission and processing of non-speech segments, and to reduce potential misrecognition errors in those segments. In several speech communication scenarios it is useful to employ discontinuous transmission (DTX). In the wireless (cell phone) case, avoiding transmission during speech pauses prolongs battery life in portable units and reduces interference to other wireless users (users in nearby cells using the same frequencies) [7]. In conversational speech, each side normally talks less than 50% of the time.

The typical design of a VAD algorithm is as follows:
1) There may be a noise reduction stage.
2) Some features or quantities are calculated from a section of the input signal.
3) A classification rule is applied to classify the section as speech or non-speech.

All rights Reserved 668

VADs can be classified into the following categories [1]:
1) VADs in standard speech processing
2) Statistical signal processing based VADs
3) Supervised machine learning based VADs
4) Unsupervised machine learning based VADs

This paper focuses on the study of different VAD methods and on the evaluation of their performance. Five VAD methods are considered, and their advantages and disadvantages are listed.

II. VAD METHODS
Research is continually being conducted to improve the efficiency and accuracy of voice activity detection. This section briefly presents some effective approaches to voice activity detection.

A. Discriminative Training for Multiple Observation Likelihood Ratio Based Voice Activity Detection
Compared with decisions made from a single instantaneous observation, VAD decisions made from multiple observations reduce the miss-hit rate in speech-offset regions and the false-alarm rate in nonstationary-noise regions, because they take advantage of the strong correlation between consecutive time frames of speech. The method in [2] is a supervised machine learning based VAD in which two discriminative training methods are studied for effectively combining multiple-observation likelihood ratios (LRs), evaluated in terms of misclassification errors and receiver operating characteristic (ROC) curves.

1) Signal Model and Single Observation LLR: Assume that the speech is degraded by uncorrelated additive noise. Under the two hypotheses H0 (speech-pause) and H1 (speech-active), the observation in the short-time Fourier transform (STFT) domain can be written as

$$H_0\ (\text{speech-pause}):\ x_{k,t} = n_{k,t}, \qquad H_1\ (\text{speech-active}):\ x_{k,t} = s_{k,t} + n_{k,t} \tag{2.1}$$

where k and t are the frequency-bin and time-frame indices, respectively.

2) Multiple Observation LLRs: Incorporating contextual information into the decision rule increases the robustness of VAD in noisy conditions.
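The single-observation LLR of subsection 1) is not written out in this transcription. As a hedged sketch, the widely used Gaussian statistical model of Sohn et al. (1999) gives, for each frequency bin, log Λ_k = γ_k ξ_k / (1 + ξ_k) − log(1 + ξ_k), where γ_k is the a posteriori SNR and ξ_k the a priori SNR; averaging over bins yields one LLR per frame. Whether [2] uses exactly this form is an assumption here, as is the premise that the noise variance and the a priori SNR have already been estimated (for example, by the decision-directed approach). The function name is hypothetical.

```python
import numpy as np


def sohn_llr(power_spec, noise_var, prior_snr):
    """Single-observation frame LLR under the Gaussian model of Sohn et al.

    power_spec : |x_{k,t}|^2 for every frequency bin k of frame t, shape (K,)
    noise_var  : noise variance per bin, assumed already estimated, shape (K,)
    prior_snr  : a priori SNR xi_k per bin, assumed already estimated, shape (K,)
    """
    gamma = np.asarray(power_spec, dtype=float) / np.asarray(noise_var, dtype=float)
    xi = np.asarray(prior_snr, dtype=float)
    # Per-bin log likelihood ratio: gamma*xi/(1+xi) - log(1+xi)
    per_bin = gamma * xi / (1.0 + xi) - np.log1p(xi)
    # Average over bins (the log of the geometric mean of the per-bin ratios)
    return float(np.mean(per_bin))


# Usage: a frame whose power is 10x the noise variance scores well above zero,
# while a frame at the noise level scores below zero.
speech_like = sohn_llr(np.full(4, 10.0), np.ones(4), np.ones(4))
noise_like = sohn_llr(np.ones(4), np.ones(4), np.ones(4))
```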
Suppose that a collection of M sequential LLRs ending at the current time frame t, denoted as $\mathbf{l}_t = [l_t, l_{t-1}, \ldots, l_{t-M+1}]^T$, is used to make the VAD decision for frame t. A new statistic that reflects the dependence on the current time frame as well as on its previous M − 1 time frames can be expressed as

$$\Lambda_t = \mathbf{w}^T \mathbf{l}_t = \sum_{m=1}^{M} w_m\, l_{t-m+1}$$

where $\mathbf{w} = [w_1, w_2, \ldots, w_M]^T$ is a vector of combination weights for the different time frames. The decision rule can then be established as

$$\Lambda_t \underset{H_0}{\overset{H_1}{\gtrless}} \eta$$

where η is the decision threshold.

3) Discriminative Training: In discriminative training, VAD performance is tied directly to a designed objective function, which is optimized during training.
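The combined statistic, the threshold decision and a discriminative training loop can be sketched together. This is a hedged illustration rather than the procedure of [2]: the sigmoid-smoothed error count used as the loss is a generic MCE-style criterion assumed here (the exact loss of [2] is not reproduced in this transcription), the threshold η is held fixed, and the helper names, learning rate, step count and synthetic LLRs are hypothetical.

```python
import numpy as np


def stack_llrs(llr, M):
    """Row t is l_t = [l_t, l_{t-1}, ..., l_{t-M+1}], zero-padded before frame 0."""
    padded = np.concatenate([np.zeros(M - 1), np.asarray(llr, dtype=float)])
    return np.stack([padded[t:t + M][::-1] for t in range(len(llr))])


def decide(L, w, eta):
    """Decision rule: 1 (H1, speech-active) if w^T l_t > eta, else 0 (H0)."""
    return (L @ w > eta).astype(int)


def mce_loss_and_grad(L, labels, w, eta, alpha=1.0):
    """Sigmoid-smoothed error count (a generic MCE-style loss, assumed form)."""
    score = L @ w - eta
    sign = np.where(labels == 1, -1.0, 1.0)   # flips the score for speech frames
    d = sign * score                          # misclassification measure: > 0 means an error
    s = 1.0 / (1.0 + np.exp(-alpha * d))      # smooth stand-in for the 0/1 error
    coef = alpha * s * (1.0 - s) * sign       # chain rule: ds/dscore
    grad = (coef[:, None] * L).mean(axis=0)
    return s.mean(), grad


def train_weights(L, labels, eta=0.0, lr=0.5, steps=200):
    """Plain gradient descent on the combination weights w (eta held fixed)."""
    w = np.full(L.shape[1], 1.0 / L.shape[1])  # start from a simple average
    for _ in range(steps):
        _, grad = mce_loss_and_grad(L, labels, w, eta)
        w = w - lr * grad
    return w


# Usage on synthetic LLRs: positive on average in speech, negative in pauses.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 0, 1], 50)
llr = (2.0 * labels - 1.0) + 0.5 * rng.standard_normal(labels.size)
L = stack_llrs(llr, M=3)
w = train_weights(L, labels)
vad = decide(L, w, eta=0.0)
```

Smoothing the 0/1 error with a sigmoid is what makes the error count differentiable with respect to w, which is the usual motivation for MCE-style objectives.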
1) Minimal Classification Error Training: Suppose there is a set of labeled LLRs for training, denoted as $\mathcal{L} = \{\mathcal{L}_s, \mathcal{L}_p\}$, where $\mathcal{L}_s = \{\mathbf{l}_i^s,\ i = 1, 2, \ldots, n_s\}$ and $\mathcal{L}_p = \{\mathbf{l}_j^p,\ j = 1, 2, \ldots, n_p\}$ represent the portions of the training set containing all the LLRs labeled as speech-active and speech-pause, respectively. Minimum classification error (MCE) training is a well-known discriminative training approach that aims at minimizing the misclassification errors over the entire training set. The MCE loss function is defined over the labeled training set $\mathcal{L}$ [2]. Minimizing the MCE loss improves VAD performance by reducing both types of error (i.e., miss-hit errors and false-alarm errors).

2) Maximal Area Under the ROC Curve Training: ROC curves are frequently used to describe VAD performance completely. An ROC curve is drawn by varying the decision threshold and plots the speech-hit rate (HR1), defined as the fraction of all actual speech frames that are correctly classified as speech-active, against the false-alarm rate (FAR0), defined as the fraction of all actual speech-pause (i.e., noise-only) frames that are incorrectly classified as speech. As illustrated in Fig. 1, the closer the ROC curve is to the upper left corner, the better the classifier's ability to discriminate between the two classes. Thus, the area under the ROC curve (AUC) is a general, robust measure of classifier discrimination performance, regardless of the decision threshold, which may be unknown, may change over time, or may vary depending on how the classifier is used in practical applications.

Fig. 1. Illustration of ROC curve and AUC

B. Support Vector Machine Based VAD
The SVM-based VAD [3] employs effective feature vectors, namely the a posteriori SNR, the a priori SNR and a predicted SNR, as principal parameters.

1) Feature Vector Extraction: A noise signal d is added to a speech signal s, and their sum is denoted by x.
By taking the discrete Fourier transform (DFT), the noise spectrum D, the clean speech spectrum S and the noisy speech spectrum X satisfy

$$X_k(n) = S_k(n) + D_k(n)$$
where k is the frequency-bin index (k = 0, 1, ..., M − 1) and n is the frame index (n = 0, 1, ...). Assuming that speech is degraded by uncorrelated additive noise, two hypotheses are considered for each frame:

$$H_0\ (\text{speech absent}):\ X(n) = D(n), \qquad H_1\ (\text{speech present}):\ X(n) = S(n) + D(n) \tag{2.6}$$

in which X(n), D(n) and S(n) denote the DFT coefficients of the noisy speech, the noise and the clean speech at the n-th frame, respectively. The first feature is the a posteriori SNR $\gamma_k(n)$, derived as the ratio of the power of the input signal $X_k(n)$ to the variance $\lambda_{d,k}(n)$ of the noise signal $D_k(n)$, which is updated during periods of speech absence:

$$\gamma_k(n) = \frac{|X_k(n)|^2}{\lambda_{d,k}(n)}$$

The second feature is the a priori SNR, calculated using the decision-directed approach, and the third feature is the predicted SNR, which is estimated from the long-term smoothed power spectra of the background noise and of the speech. The noise and speech power spectra used for the predicted-SNR estimation are long-term smoothed estimates of $\lambda_{d,k}(n)$ and $\lambda_{s,k}(n)$, with experimentally chosen smoothing parameters δ_d (= 0.98) for $D_k(n)$ and δ_s (= 0.98) for $S_k(n)$.

2) VAD Based on SVM: The SVM builds an optimal hyperplane that separates the training data without error while maximizing the distance between the hyperplane and the closest vectors. Given training data consisting of N-dimensional patterns $\mathbf{x}_i$ and the corresponding class labels $z_i$, $(\mathbf{x}_1, z_1), \ldots, (\mathbf{x}_l, z_l)$, $\mathbf{x} \in \mathbb{R}^N$, $z \in \{+1, -1\}$, the equation for the hyperplane is

$$\langle \mathbf{w}, \mathbf{x} \rangle + b = 0$$

where w is the weight vector, b is the bias and ⟨u, v⟩ denotes the inner product between u and v. Training the SVM yields the support vectors $\mathbf{x}_i^*$ (i = 1, ..., k) and the optimized bias $b^*$, and the output function of the linear SVM for an input vector x is then

$$f(\mathbf{x}) = \sum_{i=1}^{k} \alpha_i^* z_i \langle \mathbf{x}_i^*, \mathbf{x} \rangle + b^*$$

where $\alpha_i^*$ are the optimized Lagrange multipliers and $z_i$ are the labels of the support vectors.
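The SVM output function can be evaluated directly once the support vectors, multipliers and bias are known. In the sketch below these are toy values chosen by hand, not a trained model, and the three-element feature vector (a posteriori, a priori and predicted SNR, for example averaged over frequency bins) is an assumed layout. The RBF kernel is included only to illustrate replacing the inner product with a nonlinear kernel; its width parameter is a hypothetical choice. A positive output is read as speech (H1).

```python
import numpy as np


def linear_kernel(u, v):
    """Inner product <u, v> used by the linear SVM."""
    return float(np.dot(u, v))


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel, one possible nonlinear replacement for <u, v>."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def svm_output(x, support_vectors, alphas, labels, bias, kernel=linear_kernel):
    """f(x) = sum_i alpha_i* z_i K(x_i*, x) + b*; f(x) > 0 is read as speech (H1)."""
    return sum(a * z * kernel(sv, x)
               for sv, a, z in zip(support_vectors, alphas, labels)) + bias


# Toy "trained" model (hand-picked values, not learned), features ordered as
# [a posteriori SNR, a priori SNR, predicted SNR].
support_vectors = [np.array([4.0, 3.0, 3.0]), np.array([0.5, 0.2, 0.3])]
alphas = [0.5, 0.5]
labels = [+1, -1]
bias = -5.0
x_speech = np.array([5.0, 4.0, 4.0])
x_noise = np.array([0.4, 0.1, 0.2])
f_speech = svm_output(x_speech, support_vectors, alphas, labels, bias)  # 14.75
f_noise = svm_output(x_noise, support_vectors, alphas, labels, bias)    # negative
```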
Nonlinear kernel functions can be used in place of the linear kernel (the inner product) so that input spaces that are not linearly separable can be handled. Evaluating the kernel is often cheaper than explicitly computing the corresponding high-dimensional feature mapping.

C. Maximum Margin Clustering Based Statistical VAD With Multiple Observation Compound Feature
Maximum margin clustering based VAD (MMC-based VAD) [4] extends the idea of the SVM, which aims at finding a maximum-margin hyperplane, to unlabeled data: the maximum-margin hyperplane found in the feature space is the one expected to lead to the minimal classification error.

1) Feature Extraction: A new feature called the multiple observation compound feature (MO-CF) is proposed. It combines the advantages of the statistical model and of multiple observation techniques. Specifically, it consists of two sub-features. The first sub-feature of MO-CF is the multiple observation signal-to-noise ratio (MO-SNR) feature ρ_MO, which is derived from the single-observation SNR (SO-SNR). It gives better control over the randomness of the SNR estimation and leads to a better speech detection rate (SD) than SO-SNR. However, MO-SNR also increases the false alarm rate (FA). To overcome this drawback, the multiple observation maximum probability (MO-MP) φ is included as the second sub-feature. The φ vector is derived from the revised MO-LRT (RMO-LRT) and inherits the good FA performance of RMO-LRT. The major difference between MO-MP and RMO-LRT is that MO-MP consists of the LRT scores of all DFT bins under the maximum-probability global hypotheses, whereas RMO-LRT is the sum of the LRT scores; the former is therefore more informative than the latter. Although MO-MP can yield a higher SD than RMO-LRT, it is still inferior to MO-SNR in SD. To combine the merits of the two sub-features, the MO-CF is formed from ρ_MO and φ, where β balances the contributions of the two sub-features for the best overall performance.

D. VAD Based on Unsupervised Learning Framework
VADs are generally characterized by their acoustic features and classifiers. The method in [5] selects the smoothed subband logarithmic energy as the acoustic feature. The input signal is grouped into several Mel subbands in the frequency domain. The logarithmic energy of each subband is then calculated as the logarithm of the sum of the absolute magnitudes in that subband. Finally, it is smoothed to form an envelope for classification. Two Gaussian models are employed as the classifier to describe the logarithmic energy distributions of speech and nonspeech, respectively. These two models are incorporated into a two-component Gaussian mixture model (GMM) whose parameters are estimated in an unsupervised way. Speech/nonspeech classification is first conducted in each subband, and the decisions of all subbands are then combined by a voting procedure.

1) Modeling Logarithmic Energy Distribution With GMM: Assuming that both the speech and the nonspeech log energies obey Gaussian distributions, the bimodal distribution can be fitted by a two-component GMM, in which the component with the smaller mean is identified as the nonspeech mode and the other component as the speech mode. This model is described by the following equations. Let $x_k$ denote the logarithmic energy of a subband at time k, and let $z \in \{0, 1\}$ be the speech/nonspeech label, where 0 denotes nonspeech and 1 denotes speech. According to Bayes' rule, the posterior probability is

$$p(z \mid x_k, \lambda) = \frac{p(x_k \mid z, \lambda)\, p(z)}{\sum_{z'=0}^{1} p(x_k \mid z', \lambda)\, p(z')}$$
where p(z) is the prior probability of speech/nonspeech, which equals the weight coefficient $w_z$ ($w_0 + w_1 = 1$), and $p(x_k \mid z, \lambda)$ is the likelihood of $x_k$ given the speech/nonspeech model:

$$p(x_k \mid z, \lambda) = \frac{1}{\sqrt{2\pi K_z}} \exp\!\left(-\frac{(x_k - \mu_z)^2}{2K_z}\right)$$

where $\mu_z$ and $K_z$ denote the mean and variance, respectively, and $\lambda = \{\mu_z, K_z, w_z \mid z = 0, 1\}$ is the parameter set of the GMM. Interestingly, the mean difference $\mu_1 - \mu_0$ represents the a posteriori SNR (on a logarithmic scale), because $\mu_1$ and $\mu_0$ are the averaged logarithmic energies of speech and nonspeech, respectively.

E. Deep Belief Networks Based VAD
The DBN-based VAD first concatenates multiple acoustic features of an observation into a long feature vector, which is used as the visible layer (i.e., the input) of the DBN [1]. A new feature is then extracted by passing the long feature vector through multiple nonlinear hidden layers. Finally, the class of the observation is predicted by a linear classifier (i.e., the softmax output layer) of the DBN, with the new feature as its input. Because VAD involves only two classes (i.e., K = 2), the prediction function of the DBN-based VAD decides between the speech hypothesis H1 and the noise hypothesis H0 by comparing a score built from the output-layer quantities $s_k$ with a tunable decision threshold ε, which is usually set to 0 [1]. The quantity $s_k$ is computed from $d_k$, which is obtained by passing the input feature vector $\{x_r\}_r$ through the hidden layers, where $g^{(L)}(\cdot)$ is the activation function of the L-th hidden layer and the weights connect the i-th unit of the L-th layer to the j-th unit of the (L − 1)-th layer.

1) Deep Belief Networks: A DBN is a type of deep neural network that, if trained successfully, can achieve strong generalization ability with little training data. It is a probabilistic generative model that consists of multiple hidden layers of stochastic latent variables. The top two layers of a DBN have undirected, symmetric connections and form an associative memory, while the other hidden layers form a top-down directed acyclic graph [6]. The units in the lowest layer are called visible units and represent an input feature vector. Each pair of successively connected layers forms a constituent module of the DBN, called a restricted Boltzmann machine (RBM); a DBN is therefore a stack of RBMs.

The training process of a DBN consists of two phases [1]. First, a greedy layer-wise unsupervised pre-training phase of the stacked RBMs finds initial parameters that are close to a good solution of the deep neural network. Then, a supervised back-propagation training phase fine-tunes the initial parameters. The key to the success of the DBN is the greedy layer-wise unsupervised pre-training of the RBM models, which acts as a regularizer for the supervised training phase and prevents the DBN from over-fitting to the training set. Because this pre-training is so important, the RBM is introduced below.

Fig. 2. An RBM with I visible units and J hidden units.

An RBM is an energy-based, two-layer, bipartite, undirected stochastic graphical model, as shown in Fig. 2. One layer of the RBM is composed of visible units v, and the other is composed of hidden units h. There are symmetric connections between the two layers and no connections within a layer. The connection weights can be represented by a weight matrix W. This paper considers only the Bernoulli (visible)-Bernoulli (hidden) RBM, i.e., $v_i \in \{0, 1\}$ and $h_j \in \{0, 1\}$.
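The prediction path of the DBN-based VAD described in subsection E (concatenated features, nonlinear hidden layers, a softmax output layer and a threshold ε) can be sketched as a plain forward pass. This is a minimal illustration under stated assumptions, not the implementation of [1]: the hidden activation g(·) is assumed to be the logistic sigmoid, the decision is assumed to take the form s_1 − s_0 > ε, and the layer sizes and random weights are hypothetical (in practice they would come from RBM pre-training followed by back-propagation fine-tuning).

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def dbn_vad_predict(x, weights, biases, W_out, b_out, eps=0.0):
    """Forward pass of a DBN-style classifier; returns 1 (H1, speech) or 0 (H0, noise).

    x            : concatenated multi-feature vector (the visible layer)
    weights      : hidden-layer weight matrices, e.g. initialised by RBM pre-training
    biases       : matching hidden-layer bias vectors
    W_out, b_out : softmax output layer with two rows (0 = noise, 1 = speech)
    """
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)          # assumed activation g(.) = logistic sigmoid
    d = W_out @ h + b_out               # one output activation per class
    s = np.exp(d - d.max())             # numerically stable softmax
    s /= s.sum()
    return int(s[1] - s[0] > eps)       # assumed form of the thresholded decision


# Usage with hypothetical sizes: 6 input features -> 4 -> 3 hidden units -> 2 classes.
rng = np.random.default_rng(0)
sizes = [6, 4, 3]
weights = [rng.normal(0, 0.1, (n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
W_out, b_out = rng.normal(0, 0.1, (2, sizes[-1])), np.zeros(2)
label = dbn_vad_predict(rng.normal(size=6), weights, biases, W_out, b_out)
```

With ε = 0 the rule reduces to choosing the class with the larger softmax output; a nonzero ε trades miss-hits against false alarms.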
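An RBM tries to find a model that maximizes the likelihood of v, which is equivalent to the following optimization problem:

$$\max_{\mathbf{W}} \; \log P(\mathbf{v}; \mathbf{W})$$

where the marginal distribution $P(\mathbf{v}; \mathbf{W})$ is defined as

$$P(\mathbf{v}; \mathbf{W}) = \frac{1}{Z} \sum_{\mathbf{h}} e^{-\mathrm{Energy}(\mathbf{v}, \mathbf{h}; \mathbf{W})}, \qquad Z = \sum_{\mathbf{v}} \sum_{\mathbf{h}} e^{-\mathrm{Energy}(\mathbf{v}, \mathbf{h}; \mathbf{W})}$$

with Z denoted as the partition function or normalization factor, and the energy model is given by

$$\mathrm{Energy}(\mathbf{v}, \mathbf{h}; \mathbf{W}) = -\mathbf{b}^T \mathbf{v} - \mathbf{c}^T \mathbf{h} - \mathbf{h}^T \mathbf{W} \mathbf{v} \tag{2.22}$$

where b and c are the bias terms of the visible and hidden layers, respectively.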
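For the Bernoulli-Bernoulli RBM of (2.22), the conditional distributions of h given v (and of v given h) factorize into logistic sigmoids, and the sum over h in P(v; W) can be carried out in closed form. The sketch below evaluates the energy, computes the exact log P(v; W) by enumerating Z (feasible only for toy sizes), and performs one contrastive-divergence (CD-1) step. CD-1 is named here as the standard approximation to the likelihood gradient used for RBM pre-training, not as the exact procedure of [1]; W is stored as a J × I (hidden × visible) matrix to match $\mathbf{h}^T \mathbf{W} \mathbf{v}$, and the function names and toy parameters are hypothetical.

```python
from itertools import product

import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def energy(v, h, W, b, c):
    """Energy(v, h; W) = -b^T v - c^T h - h^T W v, with W of shape (J, I)."""
    return -(b @ v) - (c @ h) - (h @ W @ v)


def log_unnormalised(v, W, b, c):
    """log sum_h exp(-Energy(v, h; W)), summing the binary h units in closed form."""
    return b @ v + np.sum(np.logaddexp(0.0, W @ v + c))


def log_marginal(v, W, b, c):
    """Exact log P(v; W); Z is found by enumerating all 2^I visible vectors (toy sizes only)."""
    all_v = [np.array(p, dtype=float) for p in product([0, 1], repeat=len(b))]
    log_Z = np.logaddexp.reduce([log_unnormalised(u, W, b, c) for u in all_v])
    return log_unnormalised(v, W, b, c) - log_Z


def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) step; W, b and c are updated in place."""
    rng = np.random.default_rng() if rng is None else rng
    ph0 = sigmoid(W @ v0 + c)                          # P(h_j = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sampled hidden state
    v1 = sigmoid(W.T @ h0 + b)                         # P(v_i = 1 | h0), mean-field reconstruction
    ph1 = sigmoid(W @ v1 + c)
    W += lr * (np.outer(ph0, v0) - np.outer(ph1, v1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)


# Usage: a toy RBM with I = 2 visible units and J = 1 hidden unit.
W = np.array([[2.0, 3.0]])
b = np.array([0.5, 0.5])
c = np.array([1.0])
e = energy(np.array([1.0, 0.0]), np.array([1.0]), W, b, c)   # -0.5 - 1.0 - 2.0 = -3.5
```

Brute-force enumeration of Z grows as 2^I, which is why practical RBM training relies on sampling-based approximations such as CD instead of the exact likelihood gradient.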
III. PERFORMANCE ANALYSIS
The previous section described several voice activity detection methods, including statistical, supervised machine learning, unsupervised machine learning and DBN-based methods. A comparative study of these methods is conducted here, and their advantages and disadvantages are listed.

The multiple observation likelihood ratio based VAD is robust in noisy conditions, but its disadvantage is high computational complexity. The SVM-based VAD makes use of time-varying signal-to-noise ratio features and the kernel trick; however, choosing the kernel is a complex task. MMC-based VAD requires no labeled training data, but because the labels must be inferred during training, its computational complexity is high; even so, it performs well at low SNRs. The VAD based on an unsupervised learning framework does not rely on the signal beginning with a nonspeech segment, but it uses only a simple acoustic feature for classification, so its accuracy is limited. The DBN-based VAD combines the advantages of multiple acoustic features so that the variations of the features can be described. However, as the number of features increases, the complexity of the network also increases, and voice activity detection takes more time.

IV. CONCLUSION
A voice activity detector (VAD) tries to separate speech signals from background noise. There are various approaches to VAD, such as statistical, supervised machine learning based and unsupervised machine learning based methods. In this work, several VAD methods were studied and their performance was evaluated; among them, the DBN-based VAD outperforms the others. The DBN-based VAD aims to extract a new feature that can fully express the advantages of all acoustic features. However, the complexity of the DBN grows with the number of features, so it would be advantageous to achieve the same accuracy with a less complex network.
REFERENCES
[1] Xiao-Lei Zhang and Ji Wu, "Deep Belief Networks Based Voice Activity Detection," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 4, April.
[2] Tao Yu and John H. L. Hansen, "Discriminative Training for Multiple Observation Likelihood Ratio Based Voice Activity Detection," IEEE Signal Processing Letters, vol. 17, no. 11, November.
[3] Ji Wu and Xiao-Lei Zhang, "VAD based on statistical models and machine learning approaches," Computer Speech and Language, Elsevier.
[4] Ji Wu and Xiao-Lei Zhang, "Maximum Margin Clustering Based Statistical VAD With Multiple Observation Compound Feature," IEEE Signal Processing Letters, vol. 18, no. 5, May.
[5] Dongwen Ying, Yonghong Yan, Jianwu Dang, and Frank K. Soong, "Voice Activity Detection Based on an Unsupervised Learning Framework," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 8, November.
[6] D. Yu and L. Deng, "Deep learning and its applications to signal and information processing," IEEE Signal Processing Magazine, vol. 28, no. 1, Jan.
[7] Lawrence R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Pearson Education, Jan.
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3
More informationPattern Recognition and Machine Learning. Perceptrons and Support Vector machines
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3
More informationSinger Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers
Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers Kumari Rambha Ranjan, Kartik Mahto, Dipti Kumari,S.S.Solanki Dept. of Electronics and Communication Birla
More informationEstimation of Relative Operating Characteristics of Text Independent Speaker Verification
International Journal of Engineering Science Invention Volume 1 Issue 1 December. 2012 PP.18-23 Estimation of Relative Operating Characteristics of Text Independent Speaker Verification Palivela Hema 1,
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationSpeaker recognition by means of Deep Belief Networks
Speaker recognition by means of Deep Belief Networks Vasileios Vasilakakis, Sandro Cumani, Pietro Laface, Politecnico di Torino, Italy {first.lastname}@polito.it 1. Abstract Most state of the art speaker
More informationIndependent Component Analysis and Unsupervised Learning
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent
More informationDigital Signal Processing
Digital Signal Processing 0 (010) 157 1578 Contents lists available at ScienceDirect Digital Signal Processing www.elsevier.com/locate/dsp Improved minima controlled recursive averaging technique using
More informationIntro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation
Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor
More informationAugmented Statistical Models for Speech Recognition
Augmented Statistical Models for Speech Recognition Mark Gales & Martin Layton 31 August 2005 Trajectory Models For Speech Processing Workshop Overview Dependency Modelling in Speech Recognition: latent
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationDetection of Anomalies in Texture Images using Multi-Resolution Features
Detection of Anomalies in Texture Images using Multi-Resolution Features Electrical Engineering Department Supervisor: Prof. Israel Cohen Outline Introduction 1 Introduction Anomaly Detection Texture Segmentation
More informationarxiv: v2 [cs.ne] 22 Feb 2013
Sparse Penalty in Deep Belief Networks: Using the Mixed Norm Constraint arxiv:1301.3533v2 [cs.ne] 22 Feb 2013 Xanadu C. Halkias DYNI, LSIS, Universitè du Sud, Avenue de l Université - BP20132, 83957 LA
More informationLearning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014
Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of
More informationHow to do backpropagation in a brain
How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationSupport Vector Machines using GMM Supervectors for Speaker Verification
1 Support Vector Machines using GMM Supervectors for Speaker Verification W. M. Campbell, D. E. Sturim, D. A. Reynolds MIT Lincoln Laboratory 244 Wood Street Lexington, MA 02420 Corresponding author e-mail:
More informationRestricted Boltzmann Machines for Collaborative Filtering
Restricted Boltzmann Machines for Collaborative Filtering Authors: Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton Benjamin Schwehn Presentation by: Ioan Stanculescu 1 Overview The Netflix prize problem
More informationLecture 3: Machine learning, classification, and generative models
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Machine learning, classification, and generative models 1 Classification 2 Generative models 3 Gaussian models Michael Mandel
More informationMathematical Formulation of Our Example
Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More information10-701/ Machine Learning, Fall
0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training
More informationPerformance Evaluation and Comparison
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation
More informationModel-based unsupervised segmentation of birdcalls from field recordings
Model-based unsupervised segmentation of birdcalls from field recordings Anshul Thakur School of Computing and Electrical Engineering Indian Institute of Technology Mandi Himachal Pradesh, India Email:
More informationCOMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines COMP9444 17s2 Boltzmann Machines 1 Outline Content Addressable Memory Hopfield Network Generative Models Boltzmann Machine Restricted Boltzmann
More informationSelf Supervised Boosting
Self Supervised Boosting Max Welling, Richard S. Zemel, and Geoffrey E. Hinton Department of omputer Science University of Toronto 1 King s ollege Road Toronto, M5S 3G5 anada Abstract Boosting algorithms
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationLearning Deep Architectures for AI. Part II - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model
More informationUnsupervised Learning
CS 3750 Advanced Machine Learning hkc6@pitt.edu Unsupervised Learning Data: Just data, no labels Goal: Learn some underlying hidden structure of the data P(, ) P( ) Principle Component Analysis (Dimensionality
More informationWhen Dictionary Learning Meets Classification
When Dictionary Learning Meets Classification Bufford, Teresa 1 Chen, Yuxin 2 Horning, Mitchell 3 Shee, Liberty 1 Mentor: Professor Yohann Tendero 1 UCLA 2 Dalhousie University 3 Harvey Mudd College August
More informationSINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan
SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS Emad M. Grais and Hakan Erdogan Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationPILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol
More informationDeep Learning Architecture for Univariate Time Series Forecasting
CS229,Technical Report, 2014 Deep Learning Architecture for Univariate Time Series Forecasting Dmitry Vengertsev 1 Abstract This paper studies the problem of applying machine learning with deep architecture
More informationDensity functionals from deep learning
Density functionals from deep learning Jeffrey M. McMahon Department of Physics & Astronomy March 15, 2016 Jeffrey M. McMahon (WSU) March 15, 2016 1 / 18 Kohn Sham Density-functional Theory (KS-DFT) The
More informationWhat Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1
What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1 Multi-layer networks Steve Renals Machine Learning Practical MLP Lecture 3 7 October 2015 MLP Lecture 3 Multi-layer networks 2 What Do Single
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1
More informationLearning Methods for Linear Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2011/2012 Lesson 20 27 April 2012 Contents Learning Methods for Linear Detectors Learning Linear Detectors...2
More informationSINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS
204 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) SINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS Hajar Momeni,2,,
More informationVOICE ACTIVITY DETECTION IN PRESENCE OF TRANSIENT NOISE USING SPECTRAL CLUSTERING AND DIFFUSION KERNELS
2014 IEEE 28-th Convention of Electrical and Electronics Engineers in Israel VOICE ACTIVITY DETECTION IN PRESENCE OF TRANSIENT NOISE USING SPECTRAL CLUSTERING AND DIFFUSION KERNELS Oren Rosen, Saman Mousazadeh
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu November 3, 2015 Methods to Learn Matrix Data Text Data Set Data Sequence Data Time Series Graph
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationIntelligent Systems Statistical Machine Learning
Intelligent Systems Statistical Machine Learning Carsten Rother, Dmitrij Schlesinger WS2014/2015, Our tasks (recap) The model: two variables are usually present: - the first one is typically discrete k
More informationIntroduction to SVM and RVM
Introduction to SVM and RVM Machine Learning Seminar HUS HVL UIB Yushu Li, UIB Overview Support vector machine SVM First introduced by Vapnik, et al. 1992 Several literature and wide applications Relevance
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationDetection theory 101 ELEC-E5410 Signal Processing for Communications
Detection theory 101 ELEC-E5410 Signal Processing for Communications Binary hypothesis testing Null hypothesis H 0 : e.g. noise only Alternative hypothesis H 1 : signal + noise p(x;h 0 ) γ p(x;h 1 ) Trade-off
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Support Vector Machine (SVM) Hamid R. Rabiee Hadi Asheri, Jafar Muhammadi, Nima Pourdamghani Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Introduction
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More information