Investigating Mixed Discrete/Continuous Dynamic Bayesian Networks with Application to Automatic Speech Recognition


Bertrand Mesot
IDIAP Research Institute
P.O. Box 59, CH-9 Martigny, Switzerland

Thesis Proposal
September 5

Submitted to: The Swiss Federal Institute of Technology of Lausanne (EPFL), School of Engineering Sciences (STI), Signal Processing Institute
Thesis Director: Prof. Hervé Bourlard
Supervisor: Dr David Barber

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

Abbreviations

HMM       Hidden Markov Model
Seg-HMM   Segmental HMM
SAR-HMM   Switching Autoregressive HMM
SLDS      Switching Linear Dynamical System
DBN       Dynamical Bayesian Network
PDF       Probability Distribution Function
EM        Expectation Maximisation
ML        Maximum Likelihood

Notation

s_t        The state of a discrete (switch) hidden variable at time t
h_t        The state of a continuous hidden variable at time t
o_t        A feature vector at time t
v_t        A sample of the speech signal at time t
x_{1:T}    Shorthand for x_1, x_2, ..., x_T
φ          A particular setting of the HMM parameters

1 Introduction

Speech is fundamentally a mixture of discrete and continuous effects. On the one hand, we have the words or sub-word units like phonemes; on the other hand, there is the waveform, which comes from the oscillations of the vocal cords and is modulated by the vocal tract. Current HMM-based recognition systems are mainly centred on the modelling of the discrete part and assume that the information carried by the continuous component can be compressed into a set of features which encompass static as well as dynamic information. While this approach has been very successful, from a formal point of view it has the drawback of making the model inconsistent. This is mainly because the introduction of dynamical features induces correlation between the observations at various time steps, while an HMM assumes that there is no correlation between observations coming from the same state.

Two models that address this issue are the Segmental HMM (Seg-HMM) and the Switching Autoregressive HMM (SAR-HMM). The Seg-HMM tries to capture the short-term correlations that exist between consecutive features, while the SAR-HMM does not use features at all and rather models the raw speech signal by means of a set of autoregressive processes. Both models can be seen as a step towards a better modelling of the continuous component of the speech signal, but they still do not address the issue completely. Indeed, in a Seg-HMM continuity is broken at segment boundaries and, in a SAR-HMM, the AR processes are defined on the noisy signal, which makes the model quite sensitive to noise.

The goal of this thesis is to go a step further in the modelling of the continuous component of the speech signal by proposing a more general model that belongs to the class of Switching Linear Dynamical Systems (SLDSs). SLDSs are particularly suited for the modelling of the speech signal because they combine both continuous and discrete variables in a single consistent model. Contrary to the SAR-HMM, the continuity of the signal is directly inherited from the continuity of the hidden variable. This approach has the benefit of clearly separating the underlying dynamics of the modelled signal from the noise. Furthermore, compared to the Seg-HMM, the continuous hidden variable is not integrated out over a segment, but is shared by all time steps. Therefore segments are no longer required and the problem of discontinuities at segment boundaries disappears. Another advantage of a SLDS over an HMM is that interactions between discrete and continuous hidden variables can be easily handled. This is particularly useful for modelling state duration, for example, since one can condition the discrete transition probability on the state of the continuous variable in order to constrain state transitions to occur only at a particular moment. Finally, the use of a continuous hidden variable makes the introduction of knowledge about the structure of the signal straightforward whilst maintaining the consistency of the model. As we will show, the characteristic shape and dynamics of the harmonic stacking structures observed in speech utterances can be efficiently encoded into an SLDS by considering a Fourier type representation. This approach is particularly interesting because it allows knowledge about the structure of the signal to be integrated in a more natural way than with an AR process and also considerably reduces the number of parameters that need to be trained.
For example, the inherent structure of a phoneme does not change whether it is spoken by a male or a female speaker; the pitch, however, changes and must be considered as a parameter.

Ultimately our model will represent the speech waveform as a dynamical system which is controlled by a set of parameters whose values depend on the state of a discrete switch variable, which itself represents a sub-word unit such as a phoneme. With this structure it is also possible to integrate higher level knowledge, like a word model, in a hierarchical way.

The following section starts by briefly presenting the basic principles of speech recognition; it then describes in more detail the Seg-HMM and the SAR-HMM and shows what we believe are their deficits. We then introduce SLDSs and explain how we plan to apply them to speech recognition. Section 4 presents preparatory work that we carried out in order to demonstrate the validity of our model. It discusses the difficulties of doing inference in SLDSs and shows that, even though inference in SLDSs is intractable, it can be efficiently approximated by an algorithm that we developed. In this section we also compare the robustness to noise of a SAR-HMM and a SLDS. Finally, Section 6 presents a detailed research plan.

2 Background and Motivation

2.1 General Framework

Current state-of-the-art automatic speech recognition (ASR) systems are based on the principle of statistical pattern matching. A speech waveform is first converted into a sequence of acoustic (or feature) vectors o_{1:T} and the corresponding sentence (or utterance) is split into a sequence of words w_{1:N}. The job of the ASR system is to find the most likely sequence of words \hat{w}_{1:N}, given the observed sequence of acoustic vectors:

    \hat{w}_{1:N} = \arg\max_{w_{1:N}} p(w_{1:N} \mid o_{1:T}).    (1)

To do so, each word is split into simple units called phones and each of those phones is modelled by an HMM. The HMMs are then concatenated according to the position of the phones in the utterance to form the corresponding model; if a phone appears more than once, the parameters of the HMM are shared between all the instances. The probability of the observed sequence of acoustic vectors given the sequence of words, p(o_{1:T} \mid w_{1:N}), is then computed and Bayes' rule is used to calculate p(w_{1:N} \mid o_{1:T}):

    p(w_{1:N} \mid o_{1:T}) = \frac{p(o_{1:T} \mid w_{1:N}) p(w_{1:N})}{p(o_{1:T})}    (2)

where the a priori probability of the sequence of words p(w_{1:N}) is given by a language model. This procedure is repeated for all possible sequences of words and the most likely one is considered to be the actual transcription of the speech waveform.
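As a minimal sketch of the decision rule in Equations (1) and (2), consider the following Python fragment. The hypothesis scorers acoustic_loglik and language_logprob are illustrative placeholders rather than part of the proposal, and since p(o_{1:T}) is common to every hypothesis it is simply dropped.

```python
import numpy as np

def best_word_sequence(candidates, acoustic_loglik, language_logprob):
    """Pick the most likely word sequence, Equations (1) and (2).

    candidates       : list of candidate word sequences w_{1:N}
    acoustic_loglik  : function w -> log p(o_{1:T} | w_{1:N})   (assumed given)
    language_logprob : function w -> log p(w_{1:N})             (language model)

    p(o_{1:T}) is identical for every hypothesis, so it cancels and the
    comparison can be carried out in the log domain.
    """
    scores = [acoustic_loglik(w) + language_logprob(w) for w in candidates]
    return candidates[int(np.argmax(scores))]
```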

The underlying behaviour of the recognition process is actually much more complicated than what has been presented so far. The speech signal is generally pre-emphasised before being cut into small, usually 5 ms long, overlapping windows on which a Fourier analysis is carried out to obtain slices of the short-term spectrogram. Various transformations are then applied on those slices in order to generate acoustic vectors. Each phone is then usually modelled by a hidden Markov chain of three states. Phone models are combined to form word models, which are combined together to form a single composite HMM representing all possible utterances. Finding the most likely word sequence \hat{w}_{1:N} is then reduced to finding the most likely state sequence \hat{s}_{1:T}, given the observations:

    \hat{s}_{1:T} = \arg\max_{s_{1:T}} p(s_{1:T} \mid o_{1:T}).    (3)

This problem can be efficiently solved by using the Viterbi algorithm, while finding the most likely word sequence, as given by Equation (1), is intractable.

Before being used for recognition, the parameters of an HMM must be adjusted in order to fit the model to the observations. Usually one wants to find the parameter setting \hat{\phi} that maximises the log-likelihood of the observations, given the word sequence:

    \hat{\phi} = \arg\max_{\phi} \log p(o_{1:T} \mid w_{1:N}, \phi).    (4)

This training procedure is carried out by the Baum-Welch algorithm, which is a specialisation of the expectation maximisation (EM) algorithm to the HMM. The Baum-Welch algorithm itself relies on the Forward-Backward algorithm, which computes for each time step the posterior probability of the state given the whole sequence of observations, p(s_t \mid o_{1:T}). This procedure is called inference and its complexity is directly related to the structure of the model. An HMM, for example, can be represented by the Dynamical Bayesian Network (DBN) of Figure 1 and defines the following joint probability:

    p(s_{1:T}, o_{1:T}) = p(o_1 \mid s_1) p(s_1) \prod_{t=2}^{T} p(o_t \mid s_t) p(s_t \mid s_{t-1})    (5)

where p(s_1) is the state prior, p(s_t \mid s_{t-1}) the transition probability and p(o_t \mid s_t) the emission probability. The actual values of those probabilities are the parameters of the HMM. (The interested reader may refer to [] for a detailed explanation.)

Figure 1: Dynamical Bayesian network representation of an HMM. The time is symbolised by t, the discrete state variable by s and the continuous visible variable by o.
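As a concrete illustration of the inference step just described, here is a minimal sketch (not the proposal's implementation) of the Forward-Backward computation of p(s_t | o_{1:T}) for the HMM of Equation (5); the array layout is an assumption made for the example.

```python
import numpy as np

def forward_backward(prior, trans, emit_loglik):
    """Posterior state marginals p(s_t | o_{1:T}) for the HMM of Equation (5).

    prior       : (S,)   state prior p(s_1)
    trans       : (S, S) transition matrix, trans[i, j] = p(s_t = j | s_{t-1} = i)
    emit_loglik : (T, S) log emission probabilities log p(o_t | s_t)
    """
    T, S = emit_loglik.shape
    # Per-time rescaling of the emissions; the constant cancels after normalisation.
    emit = np.exp(emit_loglik - emit_loglik.max(axis=1, keepdims=True))
    alpha = np.zeros((T, S))               # scaled forward messages
    beta = np.zeros((T, S))                # scaled backward messages
    alpha[0] = prior * emit[0]
    alpha[0] /= alpha[0].sum()             # rescale to avoid numerical underflow
    for t in range(1, T):
        alpha[t] = emit[t] * (alpha[t - 1] @ trans)
        alpha[t] /= alpha[t].sum()
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    posterior = alpha * beta
    return posterior / posterior.sum(axis=1, keepdims=True)
```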

HMM training based on maximum log-likelihood (ML) may not be adequate since the discriminative power of the model is not enforced. For this reason, people have tried various alternative approaches like, for example, conditional log-likelihood maximisation of the state sequence p(s_{1:T} \mid o_{1:T}) [7], maximum mutual information (MMI) [, 9] or minimum word/phone error (MWE/MPE) []. Discriminative training is more expensive than ML training and approximations are usually required. It has nevertheless been successfully used for connected digit recognition [] as well as large vocabulary speech recognition [8, ].

2.2 Segmental HMMs

The idea behind the Seg-HMM [9, ] is that recognition performance would be improved if it were possible to model more accurately the relation that exists between consecutive feature vectors. To do so, the observation sequence o_{1:T} is assumed to be the concatenation of an a priori unknown number of segments, where each one belongs to a certain a priori unknown state. The length of the segments as well as the state to which they belong are discrete hidden variables that need to be inferred. Formally speaking, the segmental HMM is just a plain HMM where the probability distribution function (PDF) of a single acoustic vector has been replaced by the PDF of a sequence of vectors. Segments are thus assumed to be independent and, given a state sequence s_{1:N} of length N and the duration of each segment l_{1:N}, the probability of the entire sequence of observations o_{1:T} is given by the product of the probabilities of each segment:

    p(o_{1:T} \mid l_{1:N}, s_{1:N}) = \prod_{i=1}^{N} p(o_{t_i : t_i + l_i} \mid l_i, s_i)    (6)

where t_i is the starting time of the ith segment and l_i the length of that segment. The probability of the observations given the state sequence only is obtained by summing over all possible segment lengths:

    p(o_{1:T} \mid s_{1:N}) = \sum_{l_{1:N}} p(o_{1:T} \mid l_{1:N}, s_{1:N}) p(l_{1:N} \mid s_{1:N})    (7)

with

    p(l_{1:N} \mid s_{1:N}) = p(l_1 \mid s_1) \prod_{i=2}^{N} p(l_i \mid s_i, l_{i-1}, s_{i-1}),    (8)

where the lengths of the segments l_i are assumed to be conditionally independent. The introduction of the additional variable l makes the Seg-HMM slightly more complex than a plain HMM, since this variable must be inferred as well.
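To make Equations (6) and (8) concrete, here is a minimal sketch that scores one fixed segmentation. The per-segment density segment_loglik and the duration model duration_logprob are hypothetical callables, and for simplicity each length is conditioned on its own state only.

```python
import numpy as np

def segmentation_loglik(obs, states, lengths, segment_loglik, duration_logprob):
    """log p(o_{1:T}, l_{1:N} | s_{1:N}) for one segmentation, Equations (6) and (8).

    obs              : (T, D) sequence of feature vectors
    states           : list of N segment states s_i
    lengths          : list of N segment lengths l_i (summing to T)
    segment_loglik   : function (segment, state, length) -> log p(o_seg | l, s)
    duration_logprob : function (length, state) -> log p(l | s)
    """
    total, t = 0.0, 0
    for s, l in zip(states, lengths):
        # One independent segment: its emission term times its duration term.
        total += segment_loglik(obs[t:t + l], s, l) + duration_logprob(l, s)
        t += l
    return total
```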

Various approaches exist for modelling observation dependency inside a segment. They can be classified into two families, the stationary and non-stationary models []. We will concentrate on the non-stationary family, because it contains all the stationary models as a special case and it is the closest to our ultimate model. The basic idea is to condition the PDF of the visible variable on an additional hidden variable h which encodes some prior knowledge that one has about the dynamics of the observed segment. For example, Digalakis [] models the evolution of the feature vectors with a stochastic linear dynamical system of the form:

    h_t = A(s_i) h_{t-1} + \eta^h(s_i)    with    \eta^h(s_i) \sim \mathcal{N}(\mu^h(s_i), \Sigma^h(s_i))
    o_t = B(s_i) h_t + \eta^v(s_i)        with    \eta^v(s_i) \sim \mathcal{N}(\mu^v(s_i), \Sigma^v(s_i))    (9)

where A(s_i) is a transition matrix that encodes the dynamics and B(s_i) is a projection matrix. The DBN representation of such a non-stationary Seg-HMM is depicted in Figure 2.

Figure 2: DBN representation of a segmental HMM. The segment number is symbolised by i and its length by l_i. s is a discrete hidden variable, h is a continuous hidden variable and o is a continuous visible variable.

Note that since the continuous hidden variable is local to a segment, the emission probability is easily obtained by integrating out the hidden variable:

    p(o_{t_i : t_i + l_i} \mid l_i, s_i) = \int_{h_{1:l_i}} p(h_1 \mid l_i, s_i) \prod_{t=1}^{l_i} p(o_{t_i + t} \mid h_t, l_i, s_i) p(h_t \mid h_{t-1}, l_i, s_i).    (10)

There are two drawbacks to such an approach. First, although continuity is efficiently modelled inside a segment, discontinuities are still present at segment boundaries. In this respect the Seg-HMM represents only a step towards a (in our view) more natural model, for which the continuous hidden variable, instead of being integrated out, is kept across segment boundaries. Second, the use of feature vectors complicates the encoding of knowledge that we have about the dynamics of the observations. We will see in Section 3 that structures like the harmonic stacking can be very easily encoded into the model if the speech waveform is used instead of feature vectors. Note that a similar approach has also been envisaged in [], where a Seg-HMM is used for pitch tracking, voiced/unvoiced detection and timescale modification of the speech signal.
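Before moving on, a small sketch of the segment-level dynamics of Equation (9): it simply draws one segment of feature vectors for a given state s_i, with parameter names chosen for the example rather than taken from the proposal.

```python
import numpy as np

def sample_segment(A, B, mu_h, Sigma_h, mu_v, Sigma_v, h0, length, rng):
    """Draw one segment of feature vectors from the dynamics of Equation (9)."""
    h, observations = h0, []
    for _ in range(length):
        h = A @ h + rng.multivariate_normal(mu_h, Sigma_h)                 # hidden trajectory
        observations.append(B @ h + rng.multivariate_normal(mu_v, Sigma_v))  # noisy projection
    return np.stack(observations)

# Example usage with arbitrary, purely illustrative parameters:
# rng = np.random.default_rng(0)
# segment = sample_segment(np.eye(2), np.eye(2), np.zeros(2), 0.1 * np.eye(2),
#                          np.zeros(2), 0.01 * np.eye(2), np.zeros(2), 20, rng)
```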

2.3 Switching Autoregressive HMMs

The autoregressive HMM [] and its recent extension, the switching autoregressive HMM (SAR-HMM) [6], are two well-known attempts to model the speech waveform directly. The latter model assumes that the sequence of samples can be represented by a switching autoregressive process, i.e., the sample v_t at each time step can be predicted by a stochastic linear equation of the form:

    v_t = \sum_{r=1}^{R} c_r(s_t) v_{t-r} + \eta(s_t)    with    \eta(s_t) \sim \mathcal{N}(0, \sigma^2(s_t))    (11)

where c_r(s_t) is an autoregressive coefficient and \sigma^2(s_t) the variance of the innovation \eta(s_t). The DBN representation of such a model is shown in Figure 3.

Figure 3: DBN representation of an AR-HMM. The time is symbolised by t, the discrete state variable by s and the continuous visible variable by v.

As can be seen by comparing Figures 1 and 3, the only difference between an HMM and a SAR-HMM is the dependency between the observations at various time steps. The interesting thing about the SAR-HMM approach is that the parameters of the AR process are controlled by the discrete variable. Contrary to the HMM, where no continuity exists between the observations, the dynamics produced by the SAR-HMM is still influenced by the discrete variable, but smoothness is conserved. In this respect the SAR-HMM is considerably more powerful than the Seg-HMM, where the continuity is lost when the state changes.

A drawback of Equation (11) is that the predicted observation v_t depends on the previous noisy observations. This is unfortunate since, in situations where considerable noise is present in the observations, the prediction will be poor. An alternative way to encourage observation continuity and smoothness is to use a continuous hidden variable h, forming an autoregressive process in the hidden space. The sample v_t can then be made to depend only on the cleaner hidden variable. Predictions in noisy environments should thus be inherently more robust, since the observation is seen as a corrupted version of a clean, constrained and controlled hidden dynamics. This can be modelled and indeed motivates our interest in the SLDS, as described in Section 3.
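A minimal sketch of drawing a waveform from the switching AR process of Equation (11), given a fixed switch path; the array shapes are assumptions made for the example.

```python
import numpy as np

def sample_sar(switch_path, coeffs, variances, rng):
    """Draw a waveform from the switching AR process of Equation (11).

    switch_path : (T,)   discrete switch states s_t
    coeffs      : (S, R) AR coefficients c_r(s), coeffs[s, r-1] = c_r(s)
    variances   : (S,)   innovation variances sigma^2(s)
    """
    R = coeffs.shape[1]
    v = np.zeros(len(switch_path) + R)       # R leading zeros as initial context
    for t, s in enumerate(switch_path, start=R):
        past = v[t - R:t][::-1]              # v_{t-1}, ..., v_{t-R}
        v[t] = coeffs[s] @ past + rng.normal(0.0, np.sqrt(variances[s]))
    return v[R:]
```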

2.4 Explicit State Duration

Another weakness of HMMs is their failure to effectively model phone duration. Indeed, due to the Markov assumption, the probability of staying in the same phone for n consecutive time steps is given by a geometric distribution, whereas the empirical phone duration distribution is rather Poisson or Gaussian [, 5, 6]. (We are assuming here that no extra time-counting hidden variables are introduced; it is possible to obtain more realistic phone durations using temporal counters, but the vast increase in the number of hidden states is prohibitive.) State-of-the-art ASR systems partially resolve this limitation by modelling a phone with two or three states with identical emission PDF. Although this works quite well in practice, this is a clear limitation of the model and various solutions have been proposed [7,, 5, 6]. Explicit state duration is not used in state-of-the-art ASR systems, possibly because it generally requires an additional hidden variable and the computational overhead introduced is not deemed to be worth the potential performance improvement. Explicit state duration is nevertheless a mandatory component of the Seg-HMM, since the length of the segment emitted by a state must be known a priori: each state is associated with a hidden variable which conditions the length of the generated sequence. This approach has the benefit of being completely generic, since any arbitrary PDF can be used to model duration. A drawback, however, is that the length of a segment is given a priori and thus cannot be influenced by the hidden variable. For example, if the hidden variable is used to model a linear trajectory, to preserve continuity it could be useful to allow segment boundaries to occur only when the trajectory is at a specific position. This approach has been used for example in [] to model speech waveforms with a Seg-HMM. To reduce the number of possible segmentations, they only consider those where segments begin and end on zero crossings of the signal.

3 Proposed Solution

The previous section showed how the SAR-HMM and the Seg-HMM are limited in the way they can model the discrete and continuous nature of speech at the same time. We want to go a step further by formulating a model which combines the continuous hidden variable found in the non-stationary Seg-HMM with the discrete switching process found in the SAR-HMM. Contrary to the Seg-HMM, we prefer to model the speech waveform directly since this maintains the consistency of the model and allows straightforward encoding of structures like the harmonic stacking. The proposed model belongs to the class of Switching Linear Dynamical Systems (SLDSs), whose DBN representation is shown in Figure 4.

Figure 4: DBN representation of an SLDS. The time is symbolised by t, the discrete state variable by s, the continuous state variable by h and the continuous visible variable by v.

This model is fundamentally different from the previously discussed SAR-HMM in the sense that the knowledge that one has about the dynamics of the signal is encoded into a continuous hidden variable and not as a correlation between the samples.

The resulting model should thus be far less sensitive to noise than the SAR-HMM. It is also different from the Seg-HMM since the continuous hidden variable is not local to a segment, but is shared between all time steps. Consequently, the continuous hidden variable cannot be simply integrated out like in a Seg-HMM, but needs to be inferred along with the discrete switch variable. This has the negative effect of making inference intractable. Fortunately, inference can still be quite efficiently approximated thanks to various approximation algorithms that we will present in Section 4.1.

Formally speaking, the SLDS, as shown in Figure 4, defines the following joint distribution:

    p(v_{1:T}, s_{1:T}, h_{1:T}) = \prod_t p(v_t \mid s_t, h_t) p(h_t \mid h_{t-1}, s_t, s_{t-1}) p(s_t \mid s_{t-1}, h_{t-1}).    (12)

The continuous transition PDF p(h_t \mid h_{t-1}, s_t, s_{t-1}) is the main novelty introduced by the SLDS. This term actually contains the knowledge that we have of the dynamics of the signal, by assuming that h_t follows a stochastic linear equation of the form:

    h_t = A(s_t, s_{t-1}) h_{t-1} + \eta^h(s_t, s_{t-1})    (13)

with

    \eta^h(s_t, s_{t-1}) \sim \mathcal{N}(\mu^h(s_t, s_{t-1}), \Sigma^h(s_t, s_{t-1}))    (14)

where the transition matrix A(s_t, s_{t-1}) encodes the dynamics and \eta^h(s_t, s_{t-1}) is a source of innovation. For example, a single cosine wave with frequency \omega can be obtained by setting

    A(s_t, s_{t-1}) \equiv \Theta(\omega) = \begin{pmatrix} \cos(\omega) & \sin(\omega) \\ -\sin(\omega) & \cos(\omega) \end{pmatrix}    (15)

in Equation (13). The continuous hidden variable can then be interpreted as a two-dimensional rotating vector. This Fourier type representation is particularly attractive, because it is very handy for encoding prior knowledge that we have about the speech signal, like for example the harmonic stacking structure of a phoneme. Those structures contain a certain number of particularly active frequencies whose dynamics carry information about what is actually spoken. Therefore, being able to efficiently track those frequencies may significantly improve the recognition accuracy. For that we can define a block diagonal transition matrix of the form:

    A(s_t) = \begin{pmatrix} \rho_1(s_t)\Theta(s_t, \omega_1) & & & \\ & \rho_2(s_t)\Theta(s_t, \omega_2) & & \\ & & \ddots & \\ & & & \rho_N(s_t)\Theta(s_t, \omega_N) \end{pmatrix}    (16)

where \omega_1, ..., \omega_N are N frequencies that we want to track and \rho_1, ..., \rho_N are N damping factors. The harmonic stacking effect can then be achieved by constraining the frequencies \omega_2, ..., \omega_N to be pre-defined multiples of the base frequency \omega_1.
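A minimal sketch, assuming the plain harmonic series \omega_k = k\omega_1, of how the rotation blocks of Equation (15) and the block-diagonal matrix of Equation (16) could be assembled; it is an illustration, not the proposal's code.

```python
import numpy as np
from scipy.linalg import block_diag

def rotation_block(omega):
    """Theta(omega) of Equation (15): a 2x2 rotation advancing the phase by omega."""
    c, s = np.cos(omega), np.sin(omega)
    return np.array([[c, s], [-s, c]])

def harmonic_transition(omega1, dampings):
    """Block-diagonal A of Equation (16) for the harmonics k * omega1, k = 1..N.

    omega1   : base (fundamental) angular frequency, in radians per sample
    dampings : (N,) damping factors rho_1, ..., rho_N, one per harmonic
    """
    blocks = [rho * rotation_block(k * omega1)
              for k, rho in enumerate(dampings, start=1)]
    return block_diag(*blocks)
```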

If we furthermore assume that some relation exists between the damping factors \rho_i(s_t) (for example, in the case of a vibrating string, the damping factors of the harmonics scale approximately geometrically with respect to that of the fundamental frequency [9]), then the number of free parameters (those which will be learned) is considerably reduced. By making these free parameters dependent on the switch state s_t, it is then possible to represent different kinds of dynamics which nevertheless possess the same underlying harmonic stacking structure. For example, there is a difference between the fundamental frequencies of a male and a female speaker, or between the characteristic frequencies of different phonemes. That kind of approach has been successfully used in music transcription [, ] and, we think, may be useful for speech recognition as well.

The emission distribution p(v_t \mid s_t, h_t) specifies the transformation that must be applied to the continuous hidden variable in order to generate the waveform. Similarly to Equation (13), v_t is assumed to follow a stochastic linear equation of the form:

    v_t = B(s_t) h_t + \eta^v(s_t)    with    \eta^v(s_t) \sim \mathcal{N}(\mu^v(s_t), \Sigma^v(s_t))    (17)

where B(s_t) is the projection matrix and \eta^v(s_t) a source of noise. The projection in our case is straightforward since it reduces to summing all the cosine components of the hidden variable. This is achieved by the following 1 x 2N matrix:

    B = ( 1  0  1  0  \cdots  1  0 ).    (18)

Contrary to the SAR-HMM, where the innovation and the noise are mixed together, a SLDS clearly separates the innovation process from the noise process. The benefit of this approach is considerable when the environmental conditions are not well controlled, because it is possible, by adapting \eta^v, to choose the level of noise that is filtered out of the signal before the latter is processed by the hidden layer.

The discrete transition distribution p(s_t \mid s_{t-1}, h_{t-1}) specifies the dynamics of the model parameters, i.e., the hidden continuous dynamics can switch between various regimes whose parameters depend on the value of the discrete switch variable s. The state of this switch variable is what we are ultimately interested in, because it represents a particular thing that we want to recognise, like for example a phoneme or a gender. A novelty of our model is the conditioning of the discrete transition distribution on the state of the continuous hidden variable. This additional link is particularly useful for modelling state duration, for example by forbidding transitions as long as h is not close to zero.

Finally, similarly to the HMM, recognition is achieved by finding the most likely sequence of states \hat{s}_{1:T}, given the sequence of samples v_{1:T}:

    \hat{s}_{1:T} = \arg\max_{s_{1:T}} p(s_{1:T} \mid v_{1:T})    (19)

where p(s_{1:T} \mid v_{1:T}) is obtained by integrating over all possible hidden trajectories:

    p(s_{1:T} \mid v_{1:T}) = \int_{h_{1:T}} p(s_{1:T}, h_{1:T} \mid v_{1:T}).    (20)
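To tie Equations (12), (13), (17) and (18) together, here is a minimal generative sketch. It simplifies the switch transition to p(s_t | s_{t-1}) (ignoring the conditioning on h_{t-1}) and uses isotropic noise, so it illustrates the structure rather than reproducing the proposed model.

```python
import numpy as np

def sample_slds(T, A, B, switch_trans, switch_prior, innov_std, obs_std, rng):
    """Generate v_{1:T} from an SLDS in the spirit of Equation (12).

    A            : list (or dict) mapping switch state s -> transition matrix A(s)
    B            : (1, H) projection matrix of Equation (18)
    switch_trans : (S, S) switch transition matrix p(s_t | s_{t-1})
    switch_prior : (S,)   switch prior p(s_1)
    """
    S = len(switch_prior)
    H = B.shape[1]
    s = rng.choice(S, p=switch_prior)
    h = rng.normal(size=H)                                   # arbitrary initial hidden state
    samples, switches = [], []
    for _ in range(T):
        h = A[s] @ h + rng.normal(0.0, innov_std, size=H)    # Equation (13), isotropic innovation
        v = (B @ h).item() + rng.normal(0.0, obs_std)        # Equation (17), scalar waveform sample
        samples.append(v)
        switches.append(s)
        s = rng.choice(S, p=switch_trans[s])                 # move to the next regime
    return np.array(samples), np.array(switches)
```

The transition matrices A[s] could, for instance, be built with the harmonic_transition sketch above, and B would then be the row vector [1, 0, 1, 0, ...] of Equation (18).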

It is important to note that, although exact decoding and training are formally possible, the number of computations required grows exponentially with the number of samples considered; this is explained in Section 4.1.1. A second problem is that current approximation techniques tend to be numerically unstable, which prevents their use on long sequences (more than a thousand time steps). Fortunately, work done during the preparation of this proposal led to the development of the expectation correction (EC) algorithm [], a new stable approximation algorithm for doing inference in SLDSs. Furthermore, training of SLDSs with the maximum likelihood algorithm has already been tested on real data []. We also wish to apply discriminative training methods to SLDSs, since they may be expected to improve recognition performance.

4 Preparatory Work

The complexity of a model is measured by the complexity of the corresponding inference process. Doing inference on a SAR-HMM is not more difficult than with a plain HMM. In a Seg-HMM the inference process is slightly more complicated since there are two additional hidden variables. It nevertheless stays tractable, because the additional continuous variable is integrated out, while the second one, the length of the segment, is discrete. With a SLDS, however, the complexity of the inference process is exponential in the number of samples considered. Given the fundamental role played by the inference process during training, it is therefore extremely important to have an accurate and stable approximation algorithm. The following section describes a new approximation technique that we developed and compares its performance to other well-known approximation techniques.

An important aspect of our proposed model is that, contrary to the SAR-HMM, it makes a clear distinction between the actual process that generates the signal and the noise that gets added to it. In Section 3 we claimed that this approach is particularly useful when the environmental conditions change, because the noise is filtered out from the signal before it reaches the hidden layer. To demonstrate that point, we implemented the SAR-HMM and compared its performance with that of an autoregressive SLDS (AR-SLDS). Details about the experiments that we carried out as well as the results are given in Section 4.2.

4.1 Approximate Inference in SLDSs

4.1.1 The Problem

Inference in SLDSs, like in the SAR-HMM and the Seg-HMM, is generally done in two steps. The first one proceeds forward in time and computes, for each time t, a message \rho(s_t, h_t) that carries information coming from the past. The second one proceeds backward in time and computes, for each time t, a message \lambda(s_t, h_t) that carries information coming from the future. The forward and backward messages are then combined to obtain the joint posterior distribution of the hidden variables:

    p(s_t, h_t \mid v_{1:T}) = \rho(s_t, h_t) \lambda(s_t, h_t).    (21)

The forward message, also called the filtered estimate, is computed by the following recursive equation:

    \rho(s_t, h_t) \propto p(v_t \mid s_t, h_t) \sum_{s_{t-1}} \int_{h_{t-1}} p(h_t \mid h_{t-1}, s_t) p(s_t \mid s_{t-1}) \rho(s_{t-1}, h_{t-1}).    (22)

A problem with this expression is that, since the initial message \rho(s_1, h_1) is a mixture of S Gaussians (S is the number of switch states), due to the sum over s_{t-1} the next message will be a mixture of S^2 Gaussians. One then easily deduces that the number of mixture components at time t will be S^t and therefore the number of computations that need to be carried out at each time step will grow exponentially. A solution to this problem is to collapse the mixture obtained at each time step to a mixture with fewer components. This corresponds to the so-called Gaussian Sum Approximation [], which is a form of Assumed Density Filtering (ADF) [].

Contrary to the forward message, which is equivalent to p(s_t, h_t \mid v_{1:t}), the backward message does not correspond to a probability distribution, and it is unclear how one should perform the collapse. The Expectation Propagation (EP) algorithm [] addresses this issue by defining the backward message as the division of the posterior distribution by the forward message:

    \lambda(s_t, h_t) = \frac{\sum_{s_{t+1}} \int_{h_{t+1}} p(s_t, h_t, s_{t+1}, h_{t+1} \mid v_{1:T})}{\rho(s_t, h_t)}.    (23)

The main drawback of such an approach is that, because we do not know how to divide by a mixture of Gaussians, this latter expression can only be computed if the forward message is collapsed to a single Gaussian. Note that a similar formula is used to find the forward message and thus implies the same constraint on the backward message. Another weakness of EP is its sensitivity to numerical instabilities. In order to improve on that point we devised a new formulation of the backward pass which, thanks to an auxiliary-variable trick that will not be detailed here, allows the backward message to be considered as a probability. As we will see later, the new derivation will prove to be far more stable than the original one.

Being forced to restrict ourselves to approximating messages by single Gaussians is a bit frustrating, because it seems intuitively clear that, in the cases where a high dimensional continuous hidden variable gets projected to a comparatively low dimensional visible variable, multi-modalities in the distribution of h are likely to occur. Indeed, knowing the value of the visible variable may not provide enough information to precisely infer the value of the hidden variable. To cope with that problem we developed a new approximation algorithm called Expectation Correction, where the backward pass directly computes the posterior distribution p(s_t, h_t \mid v_{1:T}) by correcting the forward estimation, hence the name of the algorithm. The recursion for the backward message is given by:

    \lambda(s_t, h_t) \propto \sum_{s_{t+1}} \int_{h_{t+1}} p(s_t, h_t, s_{t+1}, h_{t+1} \mid v_{1:T})    (24)
                      \approx \sum_{s_{t+1}} \int_{h_{t+1}} p(h_t \mid s_t, s_{t+1}, v_{1:T}) p(s_t \mid s_{t+1}, h_{t+1}, v_{1:t}) \lambda(s_{t+1}, h_{t+1}).

(The interested reader may refer to [6] for a detailed explanation.)
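Both the forward collapse of Equation (22) and the backward recursions above rely on reducing a Gaussian mixture to fewer components. A minimal moment-matching sketch of the single-Gaussian case, written here as a generic illustration of the ADF-style collapse rather than as the proposal's implementation, is:

```python
import numpy as np

def collapse_to_single_gaussian(weights, means, covs):
    """Moment-match a Gaussian mixture to a single Gaussian.

    weights : (K,)      mixture weights
    means   : (K, D)    component means
    covs    : (K, D, D) component covariances
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mean = np.einsum('k,kd->d', w, means)
    diff = means - mean
    # Total covariance = average within-component covariance + between-component spread.
    cov = np.einsum('k,kij->ij', w, covs) + np.einsum('k,ki,kj->ij', w, diff, diff)
    return mean, cov
```

Collapsing to a mixture of more than one Gaussian can be done along the same lines, for example by retaining the heaviest components and moment-matching the remainder.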

Similarly to Equation (22), this expression defines a mixture of Gaussians whose number of components grows exponentially with time and which can be collapsed to a mixture with a fewer number of components. In order to evaluate this latter expression two other approximations are required: (a) dropping of a dependence, p(h_{t+1} \mid s_t, s_{t+1}, v_{1:T}) \approx p(h_{t+1} \mid s_{t+1}, v_{1:T}); (b) approximation of the integral of p(s_t \mid s_{t+1}, h_{t+1}, v_{1:t}) with respect to \lambda(s_{t+1}, h_{t+1}). Compared to the Generalised Pseudo Bayes (GPB) method, which discards all future information by replacing (b) with p(s_t \mid s_{t+1}, v_{1:t}) \lambda(s_{t+1}, h_{t+1}), the approach taken by EC is more conservative. Indeed, as we will see, simply evaluating the distribution p(s_t \mid s_{t+1}, h_{t+1}, v_{1:t}) at the mean of \lambda(h_{t+1} \mid s_{t+1}) is sufficient to improve on GPB.

Other popular approaches to filtering and smoothing are based on sequential Monte Carlo [5]. Whilst potentially powerful, these non-analytic methods typically suffer in high-dimensional latent/hidden spaces since they are often based on naive importance sampling, which restricts their practical use. Implementations of Rao-Blackwellisation (see for example []) may not help in difficult problems where the continuous posterior is highly non-Gaussian, and we are unaware of methods that have addressed this.

4.1.2 Experiments

We would like to test our Expectation Correction smoothing method in a problem with a reasonably long temporal sequence. An obvious difficulty arises here in that, since the exact computation is exponential in the number T of samples considered, a formally exact evaluation of the method is infeasible. A reasonable approach under these circumstances is to suppose that the generated switch states will be close to the most probable state of the true posterior p(s_t \mid v_{1:T}). That is, we sample a hidden state s_1 and h_1 from the prior, and then a visible observation v_1. Then, sequentially, we generate hidden states and visible observations for the next time steps. The task for smoothing inference is, given only the parameters of the model and the visible observations (but not any of the hidden states h_{1:T} and s_{1:T}), to infer p(h_t \mid v_{1:T}) and p(s_t \mid v_{1:T}). A simple performance measure is to assume that the original sampled states s_{1:T} are the correct inferences, and to check how our most probable posterior smoothed estimates \arg\max_{s_t} p(s_t \mid v_{1:T}) compare with the assumed correct s_t. The reader should bear in mind, of course, that this is just a tractable surrogate for comparing our estimate of p(s_t \mid v_{1:T}) with the exact value of p(s_t \mid v_{1:T}).

We look at two sets of experiments, one relatively easy and the other relatively hard, both on time series of length T with S switch states. In both sets of experiments, we compared methods using a single Gaussian and methods using multiple Gaussians. The number of Gaussians used was set to S throughout. From the viewpoint of classical signal processing, both experiments are extremely difficult in the sense that they cannot be solved by short-time Fourier methods, since changes occur in the dynamics at a much higher rate than the typical frequencies in the signal; see Figure 5. In the easy experiments, we used a small hidden dimension H, with a moderate amount of transition and observation noise. As can be seen from Figure 6a, Particle Filtering performs reasonably well, although its performance is enhanced by Rao-Blackwellisation (RBPF). Assumed Density Filtering using a single Gaussian and with a mixture performed roughly the same as RBPF, as did the methods

based on Generalised Pseudo Bayes, using either the ADF single Gaussian results or the Gaussian mixture results. A standard implementation of Expectation Propagation, even in this case, suffers from many numerical stability problems, but is improved somewhat by our own more stable implementation. The Expectation Correction method using a single Gaussian dramatically improves on the ADF single Gaussian filtered estimate. Using a small number of mixture components in Expectation Correction improves the situation further.

Figure 5: Results on a typical example from our hard problem for the methods of Expectation Propagation (EP), Assumed Density Filtering using a mixture of four Gaussians (ADFM), Expectation Correction using a Single Gaussian (ECS), and Expectation Correction using a mixture of four Gaussians (ECM). Plotted is the one dimensional visible signal, with a marker coloured by the most probable posterior estimated switch variable, which can be one of two states. A cross indicates a switch variable inference error. Only methods which have mixture representations of the posterior succeed; indeed, the ECM method gives no errors.

In the hard case, Figure 6b, we used a larger hidden dimension H, with a small amount of transition noise and a large amount of observation noise. We chose these parameters since this will most likely result in highly multi-modal posteriors. In this case, only those methods that used a mixture of Gaussians performed well; otherwise, the methods were little better than random guessing. Expectation Correction with a small number of mixture components gives dramatically better, almost perfect performance, apart from a small number of errors. Readers interested in Particle Filters may wonder why Rao-Blackwellisation does not seem to perform well. Our explanation is that the standard implementation we used [] still makes the assumption that a single Gaussian is adequate to describe the posterior filtered estimate p(h_t \mid s_t, v_{1:t}). In our hard experiment, any method which does not deal with multi-modality of the posterior is doomed.

Thanks to EC, inference in SLDSs can now be efficiently and correctly carried out, thus greatly improving the quality of training and decoding as well. Indeed, stability and accuracy are essential if one wants to apply a SLDS to real data, since speech utterances, even downsampled to 8kHz, contain more than ten thousand samples.

Figure 6: Histograms of the number of errors over experiments, i.e., the x-axis gives the number of errors and the y-axis the number of experiments that resulted in a certain number of errors. Particle Filter (PF), Rao-Blackwellised Particle Filter (RBPF), Assumed Density Filtering with a Single Gaussian (ADFS), Generalised Pseudo Bayes (GPB), Assumed Density Filtering with Multiple Gaussians (ADFM), Generalised Pseudo Bayes with Multiple Gaussians (GPBM), Expectation Propagation (EP), our implementation of Expectation Propagation (EP), Expectation Correction Smoothing with a Single Gaussian (ECSS), Expectation Correction Smoothing with Multiple Gaussians (ECSM).

Unfortunately, training a SLDS is still difficult, partly because of the high number of parameters involved and the sensitivity to initial conditions, but mainly because of numerical instabilities that are still present in the update process. One of the goals of this thesis is to improve the training of SLDSs.

4.2 Experiments on Noise Robustness

Of the two models presented in Section 2, the SAR-HMM is the one which is the closest to a SLDS. Indeed, by getting rid of the feature vectors, the SAR-HMM is, in our view, the most elaborate attempt so far to build a consistent model that tries to capture the inherent continuity of the speech signal while retaining the idea of an underlying discrete switching process. However, as mentioned in Section 2.3, in a SAR-HMM the predicted observation depends exclusively on the previous noisy observations. Therefore, if the noise level in the test set is different from that of the training set, the performance may drop significantly. On the contrary, the performance of a SLDS should not be much affected by the level of noise because this is effectively filtered out before it reaches the hidden layer.

In order to demonstrate this point we first implemented the SAR-HMM described in [6] and trained one model for each of the digits of the TI-DIGITS database. The training set for each digit was composed of utterances downsampled to 8kHz, the models were composed of ten states with a left-right transition matrix, and the AR processes were of fixed order. The models were then evaluated on a test set composed of utterances of each digit, and recognition was simply achieved by selecting the model with the highest likelihood. The models were initialised using uniform segmentation; Viterbi alignment has also been tested, but did not bring any improvement.

As mentioned in the previous section, efficient training of SLDSs is still a subject of research. For this reason, instead of training a SLDS, we simply embedded the trained SAR-HMM into the hidden layer of a SLDS. The resulting model, called AR-SLDS, has the benefit of separating the part of the signal that comes from the innovation from the part that comes from the noise. Indeed, the hidden variance (which in this case comes from the trained SAR-HMM) represents the amount of innovation that is introduced into the system at each time step, while the visible variance represents the amount of noise that gets added onto the resulting clean hidden signal. We therefore expect that, by setting the visible variance to the actual noise variance, the performance of the AR-SLDS will be roughly the same as that of the SAR-HMM tested on the clean utterances. For the purpose of the demonstration we set the noise variance by hand in our experiments, although, ultimately, we intend to learn the noise level automatically.

In order to compare our results with those obtained by a standard, feature-based HMM, we trained an HMM on the same training set that we used for the SAR-HMM. We chose the same setup as that used to get the baseline performance on the Aurora database, namely 8 states, a left-right transition matrix, mixtures of Gaussians and 9 MFCC features including first and second temporal derivatives as well as energy.
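For reference, a minimal sketch of how a noisy test condition like those below could be produced. The actual experiments used the NIST stnr program to estimate the SNR; the simple empirical ratio computed here is only an illustrative stand-in.

```python
import numpy as np

def corrupt(clean, noise_variance, rng):
    """Add zero-mean Gaussian noise of a given variance to a clean waveform and
    return the noisy signal together with a simple empirical SNR estimate in dB."""
    noise = rng.normal(0.0, np.sqrt(noise_variance), size=clean.shape)
    snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))
    return clean + noise, snr_db
```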

The recognition accuracies achieved by the three models are shown in Table 1.

Table 1: Comparison of the recognition accuracies of the HMM, the SAR-HMM and the AR-SLDS for various levels of Gaussian noise. The SNR was estimated with the NIST stnr program.

Since we used a generic SLDS code for those experiments, the time required to evaluate the whole test set with this model was particularly long; we therefore built a different test set for each level of noise by randomly selecting utterances per digit. The HMM is quite robust, since it is still able to reach 95% with added noise of variance 6, while the performance of the SAR-HMM is already close to that of a random guess, i.e., 9.%. However, as soon as the noise variance gets above 6 the performance of the HMM drops very quickly. Indeed, the robustness of the HMM mainly depends on the efficiency of the feature extraction, which is done regardless of the model. Once the features have been computed, the HMM only serves as a means to calculate the likelihood of the corresponding sequence of observations. With the AR-SLDS, the robustness comes from the additional continuous hidden variable which encodes the knowledge that we have about the clean speech signal, and the most likely filtered signal is computed along with the most likely state sequence. As expected, the AR-SLDS is less sensitive to changes in noise conditions compared to the SAR-HMM and is even more robust than the HMM at high variances.

It is important to notice that, due to the embedding of the SAR-HMM, the performance of the AR-SLDS is bounded above by that of the SAR-HMM. Indeed, an SAR-HMM accuracy of 98.6% is reported in [6], but we have not been able to reproduce this performance. Although we are lacking results to support this, we expect that a better SAR-HMM performance would improve the accuracy of the AR-SLDS as well.

Another interesting fact concerning the AR-SLDS is that, for some values of the noise variance, the accuracy is significantly improved compared to that obtained on the clean utterances. This improvement can be explained by looking at Figure 7, where the noise correlation of the original signal is compared to that of some corrupted signals. As one can see, as soon as the variance of the additional noise is of the same order as that of the original correlated noise, the resulting noise looks completely uncorrelated. This effect is particularly useful when an AR-SLDS is used because that uncorrelated noise is implicitly and efficiently removed from the signal. Therefore the signal seen by the hidden layer is cleaner than that

seen when a lower level of noise is used, and hence the better performance.

Figure 7: Correlation of two consecutive samples of a silence utterance which has been corrupted by different levels of noise.

Of course, when the level of noise is too high, the accuracy decreases because the filtering throws away parts of the signal that are useful for recognition. To illustrate that effect, we compare in Figures 8 and 9 the power spectra of the corrupted signal with those of the filtered signal, i.e., the signal seen by the hidden layer. For the lowest noise variances the correlated noise is still dominant and the SAR-HMM performs as well as the AR-SLDS. When the noise variance reaches 9, the noise correlation is lower, but still quite close to what it is in the training set; the performance of the SAR-HMM is still fine, while that of the AR-SLDS increases because the noise starts to be filtered out. With a noise variance of 8, the signal is clearly degraded and a significant drop in the SAR-HMM performance is seen. On the contrary, the performance of the AR-SLDS is even better, because the uncorrelated noise can be efficiently filtered out, leaving a clean filtered signal which is easily recognised by the embedded SAR-HMM. Finally, as the noise variance further increases, the performance of the AR-SLDS decreases because it becomes more and more difficult to remove the noise and to reconstruct the signal. The accuracy of the AR-SLDS is nevertheless still quite high compared to that of the SAR-HMM.

These experiments demonstrate that using a continuous hidden variable is profitable from the point of view of noise robustness since, although the SLDS was not trained directly, a simple embedding of a trained SAR-HMM into an

SLDS gave better results than the SAR-HMM alone and, for high noise variance, better than the HMM. This demonstrates that, once the model has been trained, the variance of the visible noise can be automatically adapted to match new environmental conditions that did not exist in the training set. This is particularly useful because, instead of having to train a model on a lot of different noise conditions, we have a versatile model that can adapt itself to unknown situations. The accuracy of the model could be further improved by introducing more specific knowledge that we have about the structure of the speech signal. Indeed, the SAR-HMM is a rather generic model for which encoding of structures like harmonic stacking is cumbersome. A Fourier type representation is more adequate in this case, and motivates the model proposed in Section 3.

5 Summary

The work done during the thesis will hopefully be beneficial to at least two fields: speech recognition and machine learning. We are aware that our proposed approach to speech recognition is quite different to what is generally found in the literature, but we nevertheless think that it is an approach worth trying, for the following reasons. Current HMM-based speech recognition systems, by including feature vectors which encompass time derivatives, are inconsistent. Although it has been recently suggested that this can be corrected by explicitly modelling the relationship between static and dynamic features [8, 7, 8], we think that this approach makes the modelling of the continuous component of the speech signal more difficult than it actually is. We also think that, in order to improve speech recognition, it is necessary to introduce as much knowledge as we can about the dynamics of the signal. For this reason, we choose to model the speech waveform directly by means of a Fourier type representation, which makes the encoding of structures like the harmonic stacking straightforward. Although this approach has already been used for music transcription, its application to speech recognition is, as far as we know, totally new. We tried to show with the preparatory work that evidence of a potential improvement in recognition accuracy exists. The ultimate goal of the thesis is indeed to demonstrate that state-of-the-art speech recognition can be improved by a better understanding and modelling of the continuous component of speech.

The thesis will also be beneficial to the field of machine learning. For example, the recent development of the EC algorithm is a major step towards more stable and accurate training algorithms for SLDSs. In that respect, the work done during the preparation of this proposal should lead to the publication of at least one journal paper and is already described in various research reports [, 5, 6]. More generally, the research done throughout the thesis will be mainly centred around the improvement of state-of-the-art techniques in speech recognition as well as in machine learning. It is thus expected to lead to various publications in the best journals and conferences in those fields.

Figure 8: (a) Original power spectra of a "one" utterance corrupted by various levels of noise, and (b) power spectra of the corresponding filtered signal.

Figure 9: (a) Original power spectra of a "one" utterance corrupted by various levels of noise, and (b) power spectra of the corresponding filtered signal.

6 Research Plan

This section describes a number of tasks that have to be carried out during this thesis.

6.1 Task 1: AR-HMM vs SLDS

The goal of this task is to compare the recognition performance of the SAR-HMM [6] and an SLDS on single-digit recognition. Contrary to what was presented in Section 4.2, the SLDS will not be limited to an AR-SLDS and will be fully trained. Its structure will be similar to the DBN shown in Figure 4, but augmented with a discrete variable to model the deterministic counter used in the SAR-HMM. The SLDS will first be trained and tested on the single digit utterances of the TI-DIGITS database in order to compare the results with those obtained by Ephraim and Roberts. As previously mentioned, our implementation of the SAR-HMM does not reach the same accuracy as that reported in [6], and we will thus also have to analyse the reasons for this deficiency. Other tests will also be done on the whole set of utterances of the TI-DIGITS database and on the Numbers database as well. The results are expected to demonstrate that modelling dynamics at the hidden space level is more robust than what is done in a SAR-HMM. If this assumption is confirmed, noise robustness will eventually be tested on the AURORA task.

6.2 Task 2: Explicit State Duration Modelling

The goal of this task is to evaluate various explicit state duration models. The deterministic counter used by the SLDS of Task 1 will be replaced by an explicit state duration distribution to model explicitly temporal dependencies over potentially several hundred milliseconds. The new SLDS will then be trained and tested on the same databases as in Task 1. The results should show whether the additional computational overhead caused by the use of a stochastic state duration model is worth the performance improvement.

6.3 Task 3: Discriminative Training

Tasks 1 and 2 assume that an efficient training procedure exists for the proposed SLDS. Maximum likelihood is an obvious objective function and will be used as a baseline. As an alternative we will formulate a discriminative training algorithm for the SLDS. Different approximations will be considered: (a) log-likelihood ratio and (b) conditional likelihood maximisation. In (b) novel inference and learning techniques will need to be developed. The goal of this task is to explore the possibility of applying discriminative training procedures to the SLDS and to evaluate the performance of the training algorithm on Tasks 1 and 2. In that respect, correspondence with the authors is ongoing.


Hidden Markov models

Hidden Markov models Hidden Markov models Charles Elkan November 26, 2012 Important: These lecture notes are based on notes written by Lawrence Saul. Also, these typeset notes lack illustrations. See the classroom lectures

More information

Sound Recognition in Mixtures

Sound Recognition in Mixtures Sound Recognition in Mixtures Juhan Nam, Gautham J. Mysore 2, and Paris Smaragdis 2,3 Center for Computer Research in Music and Acoustics, Stanford University, 2 Advanced Technology Labs, Adobe Systems

More information

HMM and IOHMM Modeling of EEG Rhythms for Asynchronous BCI Systems

HMM and IOHMM Modeling of EEG Rhythms for Asynchronous BCI Systems HMM and IOHMM Modeling of EEG Rhythms for Asynchronous BCI Systems Silvia Chiappa and Samy Bengio {chiappa,bengio}@idiap.ch IDIAP, P.O. Box 592, CH-1920 Martigny, Switzerland Abstract. We compare the use

More information

Expectation Correction for Smoothed Inference in Switching Linear Dynamical Systems

Expectation Correction for Smoothed Inference in Switching Linear Dynamical Systems Expectation CorrectionBarber Expectation Correction for Smoothed Inference in Switching Linear Dynamical Systems David Barber IDIAP Research Institute Rue du Simplon 4 CH-1920 Martigny Switzerland david.barber@idiap.ch

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence

More information

Hidden Markov Models and Gaussian Mixture Models

Hidden Markov Models and Gaussian Mixture Models Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 25&29 January 2018 ASR Lectures 4&5 Hidden Markov Models and Gaussian

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Slides mostly from Mitch Marcus and Eric Fosler (with lots of modifications). Have you seen HMMs? Have you seen Kalman filters? Have you seen dynamic programming? HMMs are dynamic

More information

How to do backpropagation in a brain

How to do backpropagation in a brain How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep

More information

CS 5522: Artificial Intelligence II

CS 5522: Artificial Intelligence II CS 5522: Artificial Intelligence II Particle Filters and Applications of HMMs Instructor: Wei Xu Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley.] Recap: Reasoning

More information

ECE521 Lecture 19 HMM cont. Inference in HMM

ECE521 Lecture 19 HMM cont. Inference in HMM ECE521 Lecture 19 HMM cont. Inference in HMM Outline Hidden Markov models Model definitions and notations Inference in HMMs Learning in HMMs 2 Formally, a hidden Markov model defines a generative process

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu

More information

CS 5522: Artificial Intelligence II

CS 5522: Artificial Intelligence II CS 5522: Artificial Intelligence II Particle Filters and Applications of HMMs Instructor: Alan Ritter Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley. All materials

More information

Hidden Markov Models in Language Processing

Hidden Markov Models in Language Processing Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications

More information

Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi

Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 41 Pulse Code Modulation (PCM) So, if you remember we have been talking

More information

Augmented Statistical Models for Classifying Sequence Data

Augmented Statistical Models for Classifying Sequence Data Augmented Statistical Models for Classifying Sequence Data Martin Layton Corpus Christi College University of Cambridge September 2006 Dissertation submitted to the University of Cambridge for the degree

More information

Hidden Markov Models Part 2: Algorithms

Hidden Markov Models Part 2: Algorithms Hidden Markov Models Part 2: Algorithms CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Hidden Markov Model An HMM consists of:

More information

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech

More information

Hidden Markov Models and other Finite State Automata for Sequence Processing

Hidden Markov Models and other Finite State Automata for Sequence Processing To appear in The Handbook of Brain Theory and Neural Networks, Second edition, (M.A. Arbib, Ed.), Cambridge, MA: The MIT Press, 2002. http://mitpress.mit.edu The MIT Press Hidden Markov Models and other

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Proc. of NCC 2010, Chennai, India

Proc. of NCC 2010, Chennai, India Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay

More information

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels? Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity

More information

Lecture 3: ASR: HMMs, Forward, Viterbi

Lecture 3: ASR: HMMs, Forward, Viterbi Original slides by Dan Jurafsky CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 3: ASR: HMMs, Forward, Viterbi Fun informative read on phonetics The

More information

An Evolutionary Programming Based Algorithm for HMM training

An Evolutionary Programming Based Algorithm for HMM training An Evolutionary Programming Based Algorithm for HMM training Ewa Figielska,Wlodzimierz Kasprzak Institute of Control and Computation Engineering, Warsaw University of Technology ul. Nowowiejska 15/19,

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models Statistical NLP Spring 2009 The Noisy Channel Model Lecture 10: Acoustic Models Dan Klein UC Berkeley Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Statistical NLP Spring The Noisy Channel Model

Statistical NLP Spring The Noisy Channel Model Statistical NLP Spring 2009 Lecture 10: Acoustic Models Dan Klein UC Berkeley The Noisy Channel Model Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given

More information

9 Multi-Model State Estimation

9 Multi-Model State Estimation Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 9 Multi-Model State

More information

Expectation Correction for Smoothed Inference in Switching Linear Dynamical Systems

Expectation Correction for Smoothed Inference in Switching Linear Dynamical Systems Expectation Correction Barber Expectation Correction for Smoothed Inference in Switching Linear Dynamical Systems David Barber IDIAP Research Institute Rue du Simplon 4 CH-1920 Martigny Switzerland david.barber@idiap.ch

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed

More information

Engineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics

Engineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics Engineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics Phil Woodland: pcw@eng.cam.ac.uk Lent 2013 Engineering Part IIB: Module 4F11 What is Speech Recognition?

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA Contents in latter part Linear Dynamical Systems What is different from HMM? Kalman filter Its strength and limitation Particle Filter

More information

Lecture 21: Spectral Learning for Graphical Models

Lecture 21: Spectral Learning for Graphical Models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation

More information

1. Markov models. 1.1 Markov-chain

1. Markov models. 1.1 Markov-chain 1. Markov models 1.1 Markov-chain Let X be a random variable X = (X 1,..., X t ) taking values in some set S = {s 1,..., s N }. The sequence is Markov chain if it has the following properties: 1. Limited

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Dr Philip Jackson Centre for Vision, Speech & Signal Processing University of Surrey, UK 1 3 2 http://www.ee.surrey.ac.uk/personal/p.jackson/isspr/ Outline 1. Recognizing patterns

More information

Machine Learning Overview

Machine Learning Overview Machine Learning Overview Sargur N. Srihari University at Buffalo, State University of New York USA 1 Outline 1. What is Machine Learning (ML)? 2. Types of Information Processing Problems Solved 1. Regression

More information

Doctoral Course in Speech Recognition. May 2007 Kjell Elenius

Doctoral Course in Speech Recognition. May 2007 Kjell Elenius Doctoral Course in Speech Recognition May 2007 Kjell Elenius CHAPTER 12 BASIC SEARCH ALGORITHMS State-based search paradigm Triplet S, O, G S, set of initial states O, set of operators applied on a state

More information

Deep Learning for Speech Recognition. Hung-yi Lee

Deep Learning for Speech Recognition. Hung-yi Lee Deep Learning for Speech Recognition Hung-yi Lee Outline Conventional Speech Recognition How to use Deep Learning in acoustic modeling? Why Deep Learning? Speaker Adaptation Multi-task Deep Learning New

More information

An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition

An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition Samy Bengio Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) CP 592, rue du Simplon 4, 1920 Martigny, Switzerland

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

Dynamic Bayesian Networks with Deterministic Latent Tables

Dynamic Bayesian Networks with Deterministic Latent Tables Dynamic Bayesian Networks with Deterministic Latent Tables David Barber Institute for Adaptive and Neural Computation Edinburgh University 5 Forrest Hill, Edinburgh, EH1 QL, U.K. dbarber@anc.ed.ac.uk Abstract

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Roman Barták Department of Theoretical Computer Science and Mathematical Logic Summary of last lecture We know how to do probabilistic reasoning over time transition model P(X t

More information

HMM part 1. Dr Philip Jackson

HMM part 1. Dr Philip Jackson Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. HMM part 1 Dr Philip Jackson Probability fundamentals Markov models State topology diagrams Hidden Markov models -

More information

CS 136a Lecture 7 Speech Recognition Architecture: Training models with the Forward backward algorithm

CS 136a Lecture 7 Speech Recognition Architecture: Training models with the Forward backward algorithm + September13, 2016 Professor Meteer CS 136a Lecture 7 Speech Recognition Architecture: Training models with the Forward backward algorithm Thanks to Dan Jurafsky for these slides + ASR components n Feature

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 8: Sequence Labeling Jimmy Lin University of Maryland Thursday, March 14, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Natural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay

Natural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay Natural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay Lecture - 21 HMM, Forward and Backward Algorithms, Baum Welch

More information

Discriminative models for speech recognition

Discriminative models for speech recognition Discriminative models for speech recognition Anton Ragni Peterhouse University of Cambridge A thesis submitted for the degree of Doctor of Philosophy 2013 Declaration This dissertation is the result of

More information

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY MengSun,HugoVanhamme Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Bus

More information

Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore

Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore Lecture No. # 33 Probabilistic methods in earthquake engineering-2 So, we have

More information

A New OCR System Similar to ASR System

A New OCR System Similar to ASR System A ew OCR System Similar to ASR System Abstract Optical character recognition (OCR) system is created using the concepts of automatic speech recognition where the hidden Markov Model is widely used. Results

More information

CS229 Project: Musical Alignment Discovery

CS229 Project: Musical Alignment Discovery S A A V S N N R R S CS229 Project: Musical Alignment iscovery Woodley Packard ecember 16, 2005 Introduction Logical representations of musical data are widely available in varying forms (for instance,

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

CEPSTRAL analysis has been widely used in signal processing

CEPSTRAL analysis has been widely used in signal processing 162 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 2, MARCH 1999 On Second-Order Statistics and Linear Estimation of Cepstral Coefficients Yariv Ephraim, Fellow, IEEE, and Mazin Rahim, Senior

More information

Robust Speaker Identification

Robust Speaker Identification Robust Speaker Identification by Smarajit Bose Interdisciplinary Statistical Research Unit Indian Statistical Institute, Kolkata Joint work with Amita Pal and Ayanendranath Basu Overview } } } } } } }

More information

LEARNING DYNAMIC SYSTEMS: MARKOV MODELS

LEARNING DYNAMIC SYSTEMS: MARKOV MODELS LEARNING DYNAMIC SYSTEMS: MARKOV MODELS Markov Process and Markov Chains Hidden Markov Models Kalman Filters Types of dynamic systems Problem of future state prediction Predictability Observability Easily

More information

Machine Recognition of Sounds in Mixtures

Machine Recognition of Sounds in Mixtures Machine Recognition of Sounds in Mixtures Outline 1 2 3 4 Computational Auditory Scene Analysis Speech Recognition as Source Formation Sound Fragment Decoding Results & Conclusions Dan Ellis

More information

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Hidden Markov Models: All the Glorious Gory Details

Hidden Markov Models: All the Glorious Gory Details Hidden Markov Models: All the Glorious Gory Details Noah A. Smith Department of Computer Science Johns Hopkins University nasmith@cs.jhu.edu 18 October 2004 1 Introduction Hidden Markov models (HMMs, hereafter)

More information

On the Influence of the Delta Coefficients in a HMM-based Speech Recognition System

On the Influence of the Delta Coefficients in a HMM-based Speech Recognition System On the Influence of the Delta Coefficients in a HMM-based Speech Recognition System Fabrice Lefèvre, Claude Montacié and Marie-José Caraty Laboratoire d'informatique de Paris VI 4, place Jussieu 755 PARIS

More information

Independent Component Analysis and Unsupervised Learning

Independent Component Analysis and Unsupervised Learning Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent

More information

Evaluation of the modified group delay feature for isolated word recognition

Evaluation of the modified group delay feature for isolated word recognition Evaluation of the modified group delay feature for isolated word recognition Author Alsteris, Leigh, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium on Signal Processing and

More information

Approximate Inference

Approximate Inference Approximate Inference Simulation has a name: sampling Sampling is a hot topic in machine learning, and it s really simple Basic idea: Draw N samples from a sampling distribution S Compute an approximate

More information

Inference and estimation in probabilistic time series models

Inference and estimation in probabilistic time series models 1 Inference and estimation in probabilistic time series models David Barber, A Taylan Cemgil and Silvia Chiappa 11 Time series The term time series refers to data that can be represented as a sequence

More information

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood

More information

Development of Stochastic Artificial Neural Networks for Hydrological Prediction

Development of Stochastic Artificial Neural Networks for Hydrological Prediction Development of Stochastic Artificial Neural Networks for Hydrological Prediction G. B. Kingston, M. F. Lambert and H. R. Maier Centre for Applied Modelling in Water Engineering, School of Civil and Environmental

More information

Machine Learning & Data Mining Caltech CS/CNS/EE 155 Hidden Markov Models Last Updated: Feb 7th, 2017

Machine Learning & Data Mining Caltech CS/CNS/EE 155 Hidden Markov Models Last Updated: Feb 7th, 2017 1 Introduction Let x = (x 1,..., x M ) denote a sequence (e.g. a sequence of words), and let y = (y 1,..., y M ) denote a corresponding hidden sequence that we believe explains or influences x somehow

More information

Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer

Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer Gábor Gosztolya, András Kocsor Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University

More information

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg Temporal Reasoning Kai Arras, University of Freiburg 1 Temporal Reasoning Contents Introduction Temporal Reasoning Hidden Markov Models Linear Dynamical Systems (LDS) Kalman Filter 2 Temporal Reasoning

More information

Why do we care? Examples. Bayes Rule. What room am I in? Handling uncertainty over time: predicting, estimating, recognizing, learning

Why do we care? Examples. Bayes Rule. What room am I in? Handling uncertainty over time: predicting, estimating, recognizing, learning Handling uncertainty over time: predicting, estimating, recognizing, learning Chris Atkeson 004 Why do we care? Speech recognition makes use of dependence of words and phonemes across time. Knowing where

More information

DETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION. Alexandre Iline, Harri Valpola and Erkki Oja

DETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION. Alexandre Iline, Harri Valpola and Erkki Oja DETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION Alexandre Iline, Harri Valpola and Erkki Oja Laboratory of Computer and Information Science Helsinki University of Technology P.O.Box

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information