THE MARKOV SELECTION MODEL FOR CONCURRENT SPEECH RECOGNITION

Paris Smaragdis, Adobe Systems Inc., Cambridge, MA, USA
Bhiksha Raj, Walt Disney Research, Pittsburgh, PA, USA (also with the Language Technologies Institute, School of Computer Science, Carnegie Mellon University)

ABSTRACT

In this paper we introduce a new Markov model that is capable of recognizing speech from recordings of simultaneously speaking, a priori known speakers. This work builds on recent work on non-negative representations of spectrograms, which have been shown to be very effective in source separation problems. We extend these approaches to design a Markov selection model that is able to recognize sequences even when they are presented mixed together, and we do so without the need to perform separation on the signals. Unlike factorial Markov models, which have been used similarly in the past, this approach features a low computational complexity in the number of sources and Markov states, which makes it a highly efficient alternative. We demonstrate the use of this framework in recognizing speech from mixtures of known speakers.

1. INTRODUCTION

Speech recognition of concurrent speakers is a significantly hard task. Although contrived as a scenario, its solution can be of great use to noise-robust speech recognition and can provide insight into how to deal with some of the hardest problems in acoustic sensing. Current models for speech recognition cannot be easily extended to deal with additive interference, and often need to be complemented with a separation algorithm that preprocesses the data before recognition takes place. This is often a risky combination, since the output of a separation algorithm is not guaranteed to be recognizable speech, at least not by a speech recognition system. A different, temporally sensitive approach characterizes the speech from all concurrent sources (speakers) by HMMs. The sum of the speech is then characterized by a factorial HMM, which is essentially a product of the HMMs representing the individual sources. Inference can be run on this factorial HMM to determine what was spoken by individual speakers. The problem, of course, is computational complexity: the number of states in the factorial HMM is the product of the states in the HMMs for the individual sources, i.e. it is polynomial of order equal to the number of concurrent sources. The time taken for inference is polynomial of order twice the number of sources, requiring complicated variational methods to make it tractable.

In this paper we introduce a Markov selection model coupled with a probabilistic decomposition that allows us to recognize additive speech mixtures in time that is linear in the number of concurrent sources. We make use of speaker-dependent models which, when used on speech mixtures, can obtain reliable estimates of all the spoken utterances. In the following sections we describe an additive model which has been used in the past for source separation, then describe how it can be incorporated into an HMM-based speech recognition framework to form the proposed Markov selection model. Finally, we present results from experiments based on the ICSLP 2006 speech separation challenge data [1].

2. ADDITIVE MODELS OF SOUNDS

Recently we have seen wide use of non-negative factorization methods with applications to source separation from single-channel recordings ([2, 3, 4]). In this paper we will make use of such a model and adapt it for use in a speech recognition system. Let us begin by reviewing how these models work, and also set up the notation used henceforth.
Many modern source separation methods use prior knowledge of the sources in a mixture. A common scenario is one where, for two speakers a and b, we have training recordings x_a(t) and x_b(t), and a mixture m(t) = y_a(t) + y_b(t). Our goal is then to use the information extracted from x_a(t) and x_b(t) to estimate y_a(t) and y_b(t) by observing only m(t). A very efficient and capable approach to this task is to use non-negative spectrum factorization methods. Here we will use a probabilistic version of these techniques, which allows us to later incorporate it into a Markov model. In this setting, we extract the spectral magnitude of the observed signals at regularly sampled analysis frames:

    X_t(f) = |DFT( x(T(t-1)+1, ..., Tt) )|,    (1)

where T is the size of the analysis frame we choose to use. Doing so we can obtain X_{a,t} and X_{b,t}, the magnitude spectra for signals from speakers a and b.
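As a concrete illustration of Equation 1, the framing and magnitude-spectrum computation can be sketched in a few lines of NumPy. This is a minimal sketch under assumed choices (non-overlapping rectangular frames, a real FFT, and a frame size T of 512 samples); it is not the exact front end used in the paper.

    import numpy as np

    def magnitude_spectra(x, T=512):
        """Equation 1: cut x into consecutive frames of length T and return the
        magnitude spectrum |DFT(.)| of each frame (rows = frames t, columns = f)."""
        n_frames = len(x) // T
        frames = x[: n_frames * T].reshape(n_frames, T)
        return np.abs(np.fft.rfft(frames, axis=1))

With this helper, magnitude_spectra(x_a) and magnitude_spectra(x_b) would play the roles of X_{a,t} and X_{b,t} above.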

We model magnitude spectra as histograms drawn from a mixture of multinomial distributions. This leads to the following latent variable model:

    X_t(f) ~ \sum_z P(f|z) P_t(z),    (2)

where the symbol ~ represents drawing from a distribution, P(f|z) represents the z-th component multinomial, and P_t(z) is the probability with which it is mixed to produce X_t, the magnitude spectrum vector for the t-th analysis frame. M is the total number of component multinomials. The component multinomials (which we will refer to as multinomial bases) P(f|z) for any speaker and the corresponding mixture weights P_t(z) for each spectral vector can now be estimated using an Expectation-Maximization algorithm. This is essentially a simplified PLSI model [5], but looking past its probabilistic formulation we note that P(f|z) is in fact a normalized spectrum. The set of all multinomials can thus be viewed as a dictionary of spectral bases, Equation 2 can be viewed as an algebraic decomposition, and M as the rank of this decomposition. P_t(z) can be seen as weights that tell us how to put the dictionary elements together to approximate the input at hand. Thus Equation 2 can be written as:

    X_t(f) \approx \hat{X}_t(f) = g_t \sum_{z=1}^{M} P(f|z) P_t(z),    (3)

where g_t = \sum_f X_t(f). The scalar g_t ensures that the eventual approximation is scaled appropriately to match the input. This can also be thought of as a non-negative matrix factorization [6] in which P(f|z) and P_t(z) correspond to the two non-negative factors.

There are two key observations we need to make in order to be able to extract y_a(t) and y_b(t) from m(t). The first one is that, in general, it will hold that:

    M_t(f) \approx Y_{a,t}(f) + Y_{b,t}(f).    (4)

This means that the magnitude spectrogram of the mixture of the two sources will be approximately equal to the sum of the magnitude spectrograms of the two sources. Although due to phase cancellations we cannot achieve exact equality, this assumption has been used with great success so far and is largely correct for most practical purposes. The second observation is that the multinomial bases P_a(f|z), which we can estimate from X_a, describe Y_a better than the bases P_b(f|z) estimated from X_b, and vice versa, i.e.:

    D_{KL}( Y_{a,t} \,\|\, g_t \sum_z P_a(f|z) P_t(z) ) < D_{KL}( Y_{a,t} \,\|\, g_t \sum_z P_b(f|z) P_t(z) ),    (5)

and vice versa, where D_{KL}(\cdot \| \cdot) denotes the Kullback-Leibler divergence, P_a(f|z) and P_b(f|z) are the dictionaries learned from x_a and x_b, and each P_t(z) is the optimal weight distribution given each of the two dictionaries. These two observations then allow us to assume that the observed mixture M_t(f) can be explained well using both dictionaries P_a(f|z) and P_b(f|z):

    M_t(f) \approx g_t^{(a)} \sum_z P_a(f|z) P_t(z) + g_t^{(b)} \sum_z P_b(f|z) P_t(z),    (6)

for two optimally selected instances of P_t(z). In addition to being well explained, we would expect most of the energy of each speaker to be represented by the part of this summation that includes the multinomial bases for that speaker.

For both the dictionary learning and the weight estimation parts, we can use the Expectation-Maximization algorithm to estimate any of the quantities in the above equations.

Fig. 1. (a) A graphical representation of a conventional HMM. The state at each time is dependent on the state at the previous time and generates the observation. Dotted arrows indicate injection of parameters. (b) The proposed Markov selection model. The state selects the multinomial bases that generate the observation. The bases are mixed by a weight w. This additional dependence is highlighted by the dotted outline.

The update equations for any dictionary element P(f|z) and its corresponding weight P_t(z) for an input X_t(f) are:

    P_t(z) = \frac{\sum_f P_t(z|f) X_t(f)}{\sum_{z'} \sum_f P_t(z'|f) X_t(f)},    (7)

    P(f|z) = \frac{\sum_t P_t(z|f) X_t(f)}{\sum_{f'} \sum_t P_t(z|f') X_t(f')},    (8)

where:

    P_t(z|f) = \frac{P(f|z) P_t(z)}{\sum_{z'} P(f|z') P_t(z')}.    (9)

The dictionary of multinomial bases for each of the sources may be learned from separate training data. These can then be used to decompose mixed recordings (i.e., to find the mixture weights P_t(z) for all bases). Once the decomposition in Equation 6 is achieved, we can recompose Y_{a,t}(f) and Y_{b,t}(f) separately and invert them back to the time domain to obtain our separated estimates of y_a(t) and y_b(t). The performance of this approach can vary depending on various details we do not present here, but in its better forms it has been shown to achieve strong suppression of unwanted speakers [2, 4].
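The EM updates of Equations 7-9 can be sketched directly in NumPy. The sketch below learns a dictionary from a magnitude spectrogram, or, when a dictionary is passed in and held fixed, estimates only the frame-wise weights, which is how a mixture is decomposed against stacked speaker dictionaries as in Equation 6. The number of bases, iteration count and random initialization are illustrative assumptions, not values taken from the paper.

    import numpy as np

    def plsi_decompose(X, M=20, n_iter=200, P_fz=None, seed=0):
        """EM updates of Equations 7-9 on a magnitude spectrogram X (frames x freqs).

        If P_fz (freqs x bases) is given it is held fixed and only the weights
        P_t(z) are estimated; otherwise a dictionary of M bases is learned."""
        T, F = X.shape
        rng = np.random.default_rng(seed)
        learn_dict = P_fz is None
        if learn_dict:
            P_fz = rng.random((F, M)) + 1e-3
            P_fz /= P_fz.sum(axis=0, keepdims=True)
        Z = P_fz.shape[1]
        P_tz = np.full((T, Z), 1.0 / Z)
        for _ in range(n_iter):
            # Equation 9: posterior P_t(z|f), shape (frames, freqs, bases)
            joint = P_tz[:, None, :] * P_fz[None, :, :]
            post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
            weighted = post * X[:, :, None]          # P_t(z|f) X_t(f)
            # Equation 7: weight update P_t(z), normalized over z for every frame
            P_tz = weighted.sum(axis=1)
            P_tz /= P_tz.sum(axis=1, keepdims=True) + 1e-12
            if learn_dict:
                # Equation 8: dictionary update P(f|z), normalized over f per basis
                P_fz = weighted.sum(axis=0)
                P_fz /= P_fz.sum(axis=0, keepdims=True) + 1e-12
        return P_fz, P_tz

Under these assumptions, decomposing a mixture spectrogram X_mix against two pre-learned dictionaries amounts to plsi_decompose(X_mix, P_fz=np.hstack([P_a, P_b])), after which the first block of weight columns accounts for speaker a and the rest for speaker b.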
However, since the objective of this paper is to recognize rather than to separate, it is more beneficial to perform recognition directly with this model instead of separating and then recognizing. To do so we will incorporate this model into a hidden Markov model structure, as shown in the following section.

3. THE MARKOV SELECTION MODEL

3.1. Model definition

In this section we introduce an application of the model and the observations made in the previous section to temporal data. A conventional Hidden Markov Model is a doubly-stochastic model comprising an underlying Markov chain and observation probability densities at each state in the chain.

The parameters characterizing the model are the initial state probabilities \Pi = {P(s) \forall s}, representing the probabilities of beginning a Markov chain at each state; a transition matrix T = {P(s_i|s_j) \forall s_i, s_j}, representing the set of all transition probabilities between every pair of states; and a set of state output distributions B = {P(x|s) \forall s}, representing the probability of generating observations from each of the states. The graphical model for the HMM is shown in Figure 1(a).

The model we propose extends this conventional model as shown in Figure 1(b). Instead of states generating observations directly, they generate the labels z_s = {z} of the sets of multinomial bases that will produce observations. Thus, the output distributions of the HMM are B = {P(z_s|s) \forall s}. To generate observations, the multinomial bases in z_s are mixed according to weights w. The vector of weights for all bases, w, which actually represents a multinomial over z, is drawn from a distribution which, for this paper, we will assume is uniform. Only the bases selected by the state (and their weights, appropriately normalized) are used to generate the final observation. Since the underlying Markov process contributes to data generation primarily by selecting bases, we refer to this as the Markov Selection Model. Figure 2 illustrates the generation process with an example.

Fig. 2. A two-state left-to-right Markov model is shown at the top. Each state selects one pair of multinomial bases. The two bases that describe each state are shown left and right as P(f|z_i). The bottom of the figure displays the input X_t(f) that this model can best describe: the left part is best described as a mixture of P(f|z_1) and P(f|z_2), and the right part by P(f|z_3) and P(f|z_4).

A key aspect of the above model is that the weights w are not fixed but are themselves drawn for every observation. Further, the draw of the weights is not dependent on the state in any manner, but is independent. The actual probability of an observation depends on the mixture weights. Thus, in order to compute the complete likelihood of an observation we must integrate the product of the weight-dependent likelihood of the observation and the probability of drawing the mixture weight vector over the entire probability simplex on which w resides.

We note that the primary use of this model is inferring the underlying state sequence. To do so, it is sufficient to determine the Markov-chain-independent a posteriori probabilities P_{ind}(s|x) of the states and utilize those for estimating the state sequence; the actual observation probability P(x|s) is not required. Indeed, this observation is also utilized in several approaches to HMM-based speech recognition where the Markov-chain-independent a posteriori probabilities of states are obtained through models such as neural networks [7] for inference of the underlying word sequence.

As a first step, instead of explicitly integrating over the space of all weights to obtain the likelihood of the observation, we will use the Markov-chain-independent a posteriori state probability for all inference and learning of Markov chain parameters. Secondly, we will approximate the a posteriori state probability by the sum of the a posteriori most likely mixture weights for the multinomial bases selected by any state, i.e. we use the approximation:

    \hat{P}(z|X_t) = [\arg\max_w P(w|X_t)]_z = P_t(z),    (10)

    P_{ind}(s|X_t) = \sum_{z \in s} \hat{P}(z|X_t) = \sum_{z \in s} P_t(z),    (11)

where P_t(z) is the same value referred to in Equation 7. In other words, we derive the mixture weights that maximize the likelihood of the portion of the graph enclosed by the dashed outline in Figure 1(b). We do so without reference to the Markov chain, and utilize them to compute the Markov-chain-independent conditional probabilities for the states, which will be used in the inference. This effectively factors the observation dependency and the state dependency of the model.
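To make the approximation of Equations 10 and 11 concrete, the following sketch turns the frame-wise weights P_t(z) (as estimated by the EM sketch of Section 2) into Markov-chain-independent state scores. Representing the selection sets as lists of basis indices is an assumed data layout, not something prescribed by the paper.

    import numpy as np

    def state_scores(P_tz, select):
        """Equation 11: P_ind(s|X_t) is the sum of the weights P_t(z) of the bases
        selected by state s.  P_tz: (frames, bases); select[s]: list of indices of
        the bases chosen by state s."""
        scores = np.zeros((P_tz.shape[0], len(select)))
        for s, idx in enumerate(select):
            scores[:, s] = P_tz[:, idx].sum(axis=1)
        return scores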
Thus, one of the advantages conferred by the approximation is that the model of Figure 1(b) gets factored into two parts. The first (enclosed by the dashed outline in the figure) is essentially a PLSA model that obtains w_ml and thereby P_t(z). The second, given the P_t(z) computed from the first part, is effectively an HMM with P_{ind}(s|X_t) as state output densities. Inference and learning can run largely independently in the two components, with the PLSA component employed to learn its parameters, while the HMM can use the standard Baum-Welch training procedure [8] to learn the Markov chain parameters \Pi and T. The two components must, however, combine for learning the multinomial bases P(f|z).

3.2. Parameter estimation

We train this structure by appropriately adapting the Baum-Welch training procedure [8] to use the new state model. Due to space constraints in this format, we present the learning process at a superficial level. In the first step we compute the emission probability terms for each state. Since this is locally also a maximum likelihood estimate, we estimate an intermediate value of the optimal weight vector by:

    P_t(z|f) = \frac{P(f|z) P_t(z)}{\sum_{z'} P(f|z') P_t(z')},    (12)

    P_t(z) = \frac{\sum_f P_t(z|f) X_t(f)}{\sum_{z'} \sum_f P_t(z'|f) X_t(f)}.    (13)

Note that the above estimation does not refer to the underlying Markov chain or its states; all computations are local to the components within the dotted outline of Figure 1(b). Once P_t(z) has been obtained, we compute the posterior state probability P_{ind}(s|X_t) using Equation 11. The forward-backward algorithm can then be employed as in a conventional HMM. Forward probabilities \alpha, backward probabilities \beta and state posteriors \gamma are given by the recursions:

    \alpha_t(s) = \sum_{s'} \alpha_{t-1}(s') T_{s',s} P(X_t|s),
    \beta_t(s) = \sum_{s'} \beta_{t+1}(s') T_{s,s'} P(X_{t+1}|s'),
    \gamma_t(s) = \frac{\alpha_t(s) \beta_t(s)}{\sum_{s'} \alpha_t(s') \beta_t(s')}.    (14)
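A sketch of the recursions of Equation 14, with the Markov-chain-independent scores of Equation 11 standing in for the emission terms P(X_t|s). The per-frame rescaling is a standard numerical-stability device added in this sketch; it is not part of the equations above.

    import numpy as np

    def forward_backward(pi, T_mat, emit):
        """Equation 14.  pi: initial probabilities (S,); T_mat[i, j] = P(s_j|s_i);
        emit: emission scores per frame and state, shape (frames, S).
        Returns alpha, beta and the state posteriors gamma."""
        n_frames, S = emit.shape
        alpha = np.zeros((n_frames, S))
        beta = np.zeros((n_frames, S))
        alpha[0] = pi * emit[0]
        alpha[0] /= alpha[0].sum() + 1e-12
        for t in range(1, n_frames):
            alpha[t] = (alpha[t - 1] @ T_mat) * emit[t]
            alpha[t] /= alpha[t].sum() + 1e-12      # rescale to avoid underflow
        beta[-1] = 1.0
        for t in range(n_frames - 2, -1, -1):
            beta[t] = T_mat @ (emit[t + 1] * beta[t + 1])
            beta[t] /= beta[t].sum() + 1e-12
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True) + 1e-12
        return alpha, beta, gamma

The returned gamma are the \gamma_t(s) that weight the dictionary update in the Maximization step described next.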

Fig. 3. Model for producing a mixture of two sources. Each source follows its own Markov chain. Each source's state selects some subset of multinomial bases. The mixture weights for all bases (including both sources) are independently drawn. The final observation is generated by the weighted mixture of bases selected by both sources.

In the Maximization step we need to estimate all the dictionary elements P(f|z). To do so we use the state posteriors to appropriately weight Equation 8 and obtain:

    P(f|z) = \frac{\sum_t \sum_{s: z \in s} \gamma_t(s) P_t(z|f) X_t(f)}{\sum_{f'} \sum_t \sum_{s': z \in s'} \gamma_t(s') P_t(z|f') X_t(f')},    (15)

where s : z \in s represents the set of states which can select basis z. Update rules for the transition matrix T and the initial state probabilities are the same as with traditional HMM models.

The most prevalent problem we encounter during training is that of strong local optima, which can cause convergence towards a poor solution. This can happen, for example, when the multinomial bases of the terminal state adapt faster towards explaining the first few input time points. Since this is a potentially serious problem, we need to ensure that convergence of the dictionary elements is not too rapid, so that there is a significant likelihood that dictionary elements can switch across states if needed. This can be easily achieved by imposing an anti-sparsity prior on the activation of the dictionary elements. We do so by using a Dirichlet prior over the mixture weights of all P(f|z), with hyper-parameters \alpha_j that are slowly annealed during training. This measure ensures that we obtain consistent results over multiple runs and that we do not converge on blatantly wrong local optima.

3.3. Sequence Estimation

The procedure for computing the optimal state sequence, given all model parameters, is straightforward. For each observation we compute the emission probability for each state through the EM estimation of Equations 12 and 13, and Equation 11. The Viterbi algorithm can then be used to find the optimal state sequence. The two examples in Figure 4 illustrate the results obtained, both from the learning and from the state sequence estimation. For each example, a three-state model of the proposed architecture is learned. We then obtain the optimal state sequence for each data sequence using the model estimated from it. The state segmentations obtained are shown in the bottom plots of Figure 4. As we can observe, the segmentation obtained is intuitive: each of the states captures a locally consistent region of the data.
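For completeness, the Viterbi decoding step can be sketched as below. This is a generic log-domain implementation that operates on the same state scores used above; it is included only so that the later decoding sketches are self-contained, and is not claimed to be the authors' implementation.

    import numpy as np

    def viterbi(pi, T_mat, emit):
        """Most likely state sequence for emission scores emit (frames, S),
        initial probabilities pi (S,) and transition matrix T_mat (S, S).
        Computed in the log domain; small constants guard against log(0)."""
        n_frames, S = emit.shape
        log_T = np.log(T_mat + 1e-300)
        delta = np.log(pi + 1e-300) + np.log(emit[0] + 1e-300)
        back = np.zeros((n_frames, S), dtype=int)
        for t in range(1, n_frames):
            cand = delta[:, None] + log_T        # cand[i, j]: best score ending in i, then i -> j
            back[t] = cand.argmax(axis=0)
            delta = cand.max(axis=0) + np.log(emit[t] + 1e-300)
        path = np.zeros(n_frames, dtype=int)
        path[-1] = delta.argmax()
        for t in range(n_frames - 2, -1, -1):
            path[t] = back[t + 1, path[t + 1]]
        return path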
4. MODELING MIXTURES OF SOUNDS

The main advantage of the proposed model becomes apparent when we use it to analyze the sum of the outputs of two separate processes. Let X_{a,t}(f) and X_{b,t}(f) be two data sequences obtained separately from two sources that are well modeled by the model of Section 3, and let the actual observation be X_t(f) = X_{a,t}(f) + X_{b,t}(f). Then it is relatively straightforward to show that the statistical model for X_t(f) is given by Figure 3. As before, each of the two sources follows its own independent Markov chain. The state output distributions for each source are selector functions, as in the single-source case. The primary difference lies in the manner in which the summed data are generated. An independent process now draws a mixture weight vector that includes mixture weights for all bases of both sources. The final observation is obtained by mixing the bases selected by the states of both sources using the drawn mixture weights.

The problem of estimating the state sequences of the individual sources is now easily solved. Using the same approximation as in Section 3, we first compute the optimal weights for all bases using iterations of Equations 12 and 13. These iterations compute the P_t(z) for all bases from all sources. Once these are computed, the Markov-chain-independent a posteriori state probabilities for each of the states of the Markov models of both sources are computed using Equation 11 as follows:

    P(s|X_t^{(i)}) = \sum_{z \in s} P_t(z),    (16)

where X_t^{(i)} is the i-th source at time step t, s is any state in the Markov model for the i-th source, and the sum runs over the set of bases selected by that state. Remarkably, the model permits us to compute the state emission probabilities for the individual sources given only the sum of their outputs. The optimal state sequences for the individual sources can now be obtained independently by the Viterbi algorithm. As a result, the complexity of this process, given K sources each modeled by N states, is O(KN^2), equivalent to performing K independent Viterbi decodes. This is in contrast to the conventional factorial approach to modeling the mixture of multiple sources, where the resulting model has N^K states and Viterbi estimation of the optimal state sequence requires O(N^{2K}) operations, necessitating complex variational approximations to simplify the problem.

Figure 5 illustrates the proposed procedure. The top plot is a mixed data sequence composed as the sum of the two sequences in Figure 4. Ordinarily in this situation we would have to use a factorial Markov model, which would consider all twelve possible combinations between both models' states, and then obtain the most likely state paths using a 2-D Viterbi search. Using the proposed approach we obtain the individual emission scores for the states of the individual HMMs at every time instant and obtain the optimal state sequence independently for both sources. The obtained state sequences are shown in the bottom plots of Figure 5. We note that they are identical to the state sequences obtained from the isolated sequences in Figure 4. Multiple runs over various inputs provide similar results, with only occasional and minor differences between the states extracted from the isolated sequences and from their sum.
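Putting the pieces together, decoding a mixture of K known sources reduces to one shared weight estimation followed by K independent Viterbi decodes, as described above. The sketch below reuses the helpers from the earlier sketches (plsi_decompose, state_scores, viterbi) and assumes each source model is a plain dict holding its dictionary, selection sets, initial probabilities and transition matrix; this layout is an illustrative assumption, not the authors' data structure.

    import numpy as np

    def decode_mixture(X_mix, models):
        """Equation 16 followed by independent Viterbi decoding per source.

        X_mix: magnitude spectrogram of the mixture, shape (frames, freqs).
        models: list of dicts with keys 'P_fz', 'select', 'pi', 'T'.
        Returns one state path per source; cost is O(K N^2) per frame."""
        # Shared weight estimation over the union of all sources' bases
        P_all = np.hstack([m['P_fz'] for m in models])
        _, P_tz = plsi_decompose(X_mix, P_fz=P_all)
        paths, offset = [], 0
        for m in models:
            Z = m['P_fz'].shape[1]
            # Equation 16: per-source state scores from this source's slice of weights
            emit = state_scores(P_tz[:, offset:offset + Z], m['select'])
            paths.append(viterbi(m['pi'], m['T'], emit))
            offset += Z
        return paths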

5. RECOGNIZING MIXTURES OF SPEAKERS

In this section we present two experiments that demonstrate the use of this model in speech recognition applications. We present a small experiment that performs digit recognition on simultaneous digit utterances by the same speaker, and a larger experiment based on the speech separation challenge data [1].

Fig. 4. Two input patterns (top figures) and their corresponding state sequences (bottom figures) as discovered by the proposed model.

Fig. 5. Sum of the patterns in Figure 4 and the extracted state sequences as computed by the models learned on the isolated patterns.

5.1. A small scale experiment

In this experiment we use digit data from the speech separation challenge corpus to illustrate the ability of this model to discover sequences from speech mixtures. We chose ten utterances of five different digits from one speaker and trained an instance of the proposed Markov model for each digit. For simplicity we used four states for all digits and three frequency distributions for each state. As input we used pre-emphasized magnitude spectra computed over short analysis windows. We then used an additional unknown utterance of each digit to construct a set of sound mixtures containing a pair of digits each. We analyzed these mixtures using the pre-learned digit models and examined their estimated likelihoods in order to discover which utterances were spoken in the mixture. A representative example of these results is shown for four mixture cases in Figure 6. The log likelihoods of the spoken digits were significantly higher than those of the non-spoken digits, and from them we can easily deduce the contents of the recording. In the next section we generalize this idea to a much larger and more thorough experiment.

Fig. 6. Estimated model log likelihoods evaluated on mixtures of digits. Each subplot contains the log likelihoods of each digit model for a different digit mixture, as denoted above each plot. As shown here, the two highest log likelihoods coincide with the digits that were spoken in that mixture, thereby providing us with the digit recognition.

5.2. A large scale experiment

In this section we describe an experiment using the speaker separation challenge data set [1]. We present results both on the task set forth for this challenge and on full word recognition. The data in this challenge are composed of mixture recordings of two speakers simultaneously uttering sentences of a predefined structure. In the first case the task we evaluate is to identify a specific word in the sentence uttered by the primary speaker, whereas in the second we attempt to recognize all words of both utterances.

The features we used were magnitude spectral features, computed with a short analysis frame and a smaller frame advance. The magnitude spectra were pre-emphasized so that the higher-frequency content was more pronounced. We trained the proposed model for each word and each speaker, using the number-of-states guidelines provided by the dataset documentation. We used one frequency distribution per state and trained each model for 500 iterations. The resulting models from each speaker were then combined to form a larger Markov model which can model an entire target sentence, with equiprobable jumps between all candidate words at each section. For each mixture sentence the speaker identities were provided in advance, and the two Markov models describing all the possible utterances were used to estimate the most likely state sequence for each speaker, as described in the previous section. The results of these simulations are shown in Tables 1 and 2: Table 1 for the proposed challenge task, and Table 2 for the word recognition rates of both simultaneous utterances.
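As an illustration of the likelihood ranking used in the small-scale experiment above (Figure 6), one plausible scoring, reusing the helpers from the earlier sketches, is to run a scaled forward pass per digit model on the weights obtained from the mixture and rank the models by the accumulated log normalizers. This is a sketch of one reasonable procedure under assumed data layouts, not the authors' exact implementation.

    import numpy as np

    def rank_models(P_tz_per_model, models):
        """Rank pre-learned digit models on a mixture by an approximate forward
        log-likelihood.  P_tz_per_model[name]: frame-wise weights for that model's
        bases; models[name]: dict with 'pi', 'T', 'select' (assumed layout)."""
        scores = {}
        for name, m in models.items():
            emit = state_scores(P_tz_per_model[name], m['select'])
            alpha = m['pi'] * emit[0]
            log_score = np.log(alpha.sum() + 1e-300)
            alpha /= alpha.sum() + 1e-300
            for t in range(1, emit.shape[0]):
                alpha = (alpha @ m['T']) * emit[t]
                log_score += np.log(alpha.sum() + 1e-300)
                alpha /= alpha.sum() + 1e-300
            scores[name] = log_score
        # The two top-ranked models indicate the two digits present in the mixture.
        return sorted(scores, key=scores.get, reverse=True)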
The SNR column describes the amplitude difference between the primary and the secondary speakers. As expected, the louder the primary speaker is, the better the results we achieve. The Same speaker column shows the results when the two utterances were recorded from the same speaker. This is the worst-case scenario, since the dictionary elements in our model will have maximal overlap and the state posterior probabilities will be unreliable. We clearly see that in this case the results are the lowest we obtain.

The Same gender column describes the results when the two speakers were of the same gender. This is a somewhat better situation, since there will be less overlap between the state dictionary elements, and the results reflect that by being higher. Finally, the even better case is when the two speakers are of different gender, in which case there is a high likelihood that dictionary elements will not overlap significantly, and thus we obtain the best results. The final two columns present the average results of our proposed model and, for a baseline comparison, the GHMM Avg. column, which presents the average results we obtained using the same representation and a Gaussian-state HMM while treating the secondary speaker as noise. The overall results we obtain rank high in terms of previously achieved results for this task [1], and come at a significantly lower computational cost than other approaches due to the efficient decoding scheme we introduce.

SNR     Same speaker   Same gender   Diff gender   Avg.     GHMM Avg.
6 dB    58.%           68.%          69.8%         65.%     8.0%
3 dB    6.%            6.%           6.7%          58.0%    7.%
0 dB    .7%            5.9%          60.5%         8.6%     9.%
-3 dB   .7%            .8%           5.0%          9.%      0.8%
-6 dB   .6%            6.0%          5.7%          .%       5.5%
-9 dB   8.7%           .5%           7.0%          5.%      .%

Table 1. Detailed task results using the proposed model.

SNR     Same speaker   Same gender   Diff gender   Avg.
Clean   N/A            N/A           N/A           88%
6 dB    68% / %        80% / 59%     8% / 70%      77% / 5%
3 dB    57% / %        77% / 67%     80% / 76%     7% / 6%
0 dB    6% / 5%        68% / 75%     76% / 80%     6% / 69%
-3 dB   5% / 65%       6% / 80%      7% / 8%       55% / 76%
-6 dB   6% / 7%        5% / 8%       6% / 86%      7% / 8%
-9 dB   % / 80%        8% / 87%      57% / 87%     % / 8%

Table 2. Overall word recognition results using the proposed model. Left percentages denote the correct recognition rate for the primary speaker's words; right percentages, for the secondary speaker's.

An interesting point to make here is that the representation we used balances a tradeoff between mixture modeling and recognition. The fine frequency resolution and linear amplitude scale that we use aid in discriminating the two speakers and facilitate the additivity assumption, but they also impede recognition, since they highlight pitch and amplitude variances. In contrast, a speech recognition system would use a lower frequency resolution that conceals pitch information but maintains spectral shape, and would also use that representation in the log-amplitude domain, so that subtle amplitude patterns are easier to detect. Selecting the proper representation involves trading off the ability to discriminate sources against the ability to recognize, something which is application dependent.

Once the state transitions have been estimated from a mixture, it is also trivial to perform separation of the constituent sources. Since this operation is out of the scope of this paper, we defer the presentation of this experiment to future publications.

6. CONCLUSIONS

In this paper we introduced the Markov Selection Model, a new statistical model which combines models used for source separation with a Markov structure. We demonstrated how this model can learn and recognize sequences, and also perform recognition and state estimation even when presented with mixed signals. Superficially this model is similar to the one in [9], but on closer inspection it provides a more extensive basis model and is substantially different in estimation and inference. We formulate this model in such a way that it allows us to perform state estimation on mixed sequences with linear complexity in the number of sources. This is a significant computational improvement as compared to similarly employed factorial Markov models, and one that does not sacrifice performance by a noticeable amount.
This structure can also be represented as a Conditional Random Field, which can result in additional structural possibilities and a more straightforward inference formulation, and it can also be used for solving source separation and denoising problems. We anticipate addressing these possibilities in future work.

Acknowledgements

The authors acknowledge Gautham Mysore's help in the preparation of this manuscript.

7. REFERENCES

[1] martin/SpeechSeparationChallenge.htm
[2] Raj, B. and P. Smaragdis. Latent variable decomposition of spectrograms for single channel speaker separation, in 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2005).
[3] Virtanen, T. and A. T. Cemgil. Mixtures of Gamma priors for non-negative matrix factorization based speech separation, in 8th International Conference on Independent Component Analysis and Signal Separation (ICA 2009).
[4] Smaragdis, P., M. Shashanka, and B. Raj. A sparse non-parametric approach for single channel separation of known sounds, Neural Information Processing Systems (NIPS) 2009.
[5] Hofmann, T. Probabilistic latent semantic indexing, in 1999 ACM SIGIR Special Interest Group on Information Retrieval Conference (SIGIR 1999).
[6] Lee, D. D. and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature 401, 1999.
[7] Bourlard, H. and N. Morgan. Hybrid HMM/ANN systems for speech recognition: Overview and new research directions, LNCS vol. 1387, Springer Berlin, 1998.
[8] Rabiner, L. R. and B. H. Juang. An introduction to hidden Markov models. IEEE Acoustics, Speech and Signal Processing (ASSP) Magazine, 3(1): 4-16, 1986.
[9] Ozerov, A., C. Févotte and M. Charbit. Factorial scaled hidden Markov model for polyphonic audio representation and source separation, in 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2009).


Bayesian Hierarchical Modeling for Music and Audio Processing at LabROSA

Bayesian Hierarchical Modeling for Music and Audio Processing at LabROSA Bayesian Hierarchical Modeling for Music and Audio Processing at LabROSA Dawen Liang (LabROSA) Joint work with: Dan Ellis (LabROSA), Matt Hoffman (Adobe Research), Gautham Mysore (Adobe Research) 1. Bayesian

More information

REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES

REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES Kedar Patki University of Rochester Dept. of Electrical and Computer Engineering kedar.patki@rochester.edu ABSTRACT The paper reviews the problem of

More information

Information retrieval LSI, plsi and LDA. Jian-Yun Nie

Information retrieval LSI, plsi and LDA. Jian-Yun Nie Information retrieval LSI, plsi and LDA Jian-Yun Nie Basics: Eigenvector, Eigenvalue Ref: http://en.wikipedia.org/wiki/eigenvector For a square matrix A: Ax = λx where x is a vector (eigenvector), and

More information

Statistical Sequence Recognition and Training: An Introduction to HMMs

Statistical Sequence Recognition and Training: An Introduction to HMMs Statistical Sequence Recognition and Training: An Introduction to HMMs EECS 225D Nikki Mirghafori nikki@icsi.berkeley.edu March 7, 2005 Credit: many of the HMM slides have been borrowed and adapted, with

More information

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT CONOLUTIE NON-NEGATIE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT Paul D. O Grady Barak A. Pearlmutter Hamilton Institute National University of Ireland, Maynooth Co. Kildare, Ireland. ABSTRACT Discovering

More information

Inference and estimation in probabilistic time series models

Inference and estimation in probabilistic time series models 1 Inference and estimation in probabilistic time series models David Barber, A Taylan Cemgil and Silvia Chiappa 11 Time series The term time series refers to data that can be represented as a sequence

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu

More information

Hidden Markov Models. Dr. Naomi Harte

Hidden Markov Models. Dr. Naomi Harte Hidden Markov Models Dr. Naomi Harte The Talk Hidden Markov Models What are they? Why are they useful? The maths part Probability calculations Training optimising parameters Viterbi unseen sequences Real

More information

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named We Live in Exciting Times ACM (an international computing research society) has named CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Apr. 2, 2019 Yoshua Bengio,

More information

OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL

OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL Dionisio Bernal, Burcu Gunes Associate Proessor, Graduate Student Department o Civil and Environmental Engineering, 7 Snell

More information

A State-Space Approach to Dynamic Nonnegative Matrix Factorization

A State-Space Approach to Dynamic Nonnegative Matrix Factorization 1 A State-Space Approach to Dynamic Nonnegative Matrix Factorization Nasser Mohammadiha, Paris Smaragdis, Ghazaleh Panahandeh, Simon Doclo arxiv:179.5v1 [cs.lg] 31 Aug 17 Abstract Nonnegative matrix factorization

More information

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013 More on HMMs and other sequence models Intro to NLP - ETHZ - 18/03/2013 Summary Parts of speech tagging HMMs: Unsupervised parameter estimation Forward Backward algorithm Bayesian variants Discriminative

More information

Speech Recognition Lecture 8: Expectation-Maximization Algorithm, Hidden Markov Models.

Speech Recognition Lecture 8: Expectation-Maximization Algorithm, Hidden Markov Models. Speech Recognition Lecture 8: Expectation-Maximization Algorithm, Hidden Markov Models. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.com This Lecture Expectation-Maximization (EM)

More information

1 What is a hidden Markov model?

1 What is a hidden Markov model? 1 What is a hidden Markov model? Consider a Markov chain {X k }, where k is a non-negative integer. Suppose {X k } embedded in signals corrupted by some noise. Indeed, {X k } is hidden due to noise and

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information