Learning Dynamic Audio/Visual Mapping with Input-Output Hidden Markov Models

Learning Dynamic Audio/Visual Mapping with Input-Output Hidden Markov Models

Yan Li and Heung-Yeung Shum
Microsoft Research China, Beijing, P.R. China

Abstract

In this paper we formulate the problem of synthesizing facial animation from an input audio sequence (a.k.a. video rewrite, voice puppetry) as dynamic audio/visual mapping. We propose that audio/visual mapping should be modeled with an input-output hidden Markov model, or IOHMM. An IOHMM is an HMM for which the emission and transition probabilities are conditional on the input sequence. We train IOHMMs using the expectation-maximization (EM) algorithm, with a novel architecture that explicitly models the relationship between each transition probability and the input using a neural network. Given an input sequence, an output sequence is synthesized by maximum likelihood estimation. Experimental results demonstrate that IOHMMs can generate good-quality and natural facial animation sequences from input audio.

1 Introduction

Dynamic audio/visual mapping (or vocal/facial mapping) has recently received much attention as a powerful alternative to traditional facial animation techniques [6, 5, 7, 4, 9]. Instead of directly animating facial expression, a sequence of audio is used to drive the facial motion. While voice is generated by the vocal cords and facial expressions are formed by facial skin and muscles, there exists a great deal of mutual information between audio and visual signals. Representative recent projects from the graphics community on learning dynamic audio/visual mappings include video rewrite and voice puppetry. A good survey of the importance and difficulties of audio/visual mapping can be found in Section 2 of [4].

Sequential dynamics of audio or visual signals can be modeled effectively using hidden Markov models (HMMs) [10, 12]. HMMs have been widely used for speech recognition and for gesture recognition from video sequences. They employ hidden states to carry contextual information forward and backward in time, and model the contextual information by state observation probabilities and state transition probabilities. HMMs can be useful for representing dynamic audio/visual mappings as well, as advocated by [5] and [4]. For instance, co-articulation, i.e., the context before and after the current frame, can be taken into account in the HMMs for dynamic mapping.

Unfortunately, different audio signals can correspond to a single facial expression, while many facial expressions correspond to the same audio track. To deal with this many-to-many mapping between audio signals and visual signals, previous approaches assume that either the audio or the video can be modeled by a hidden Markov model (HMM). For example, the video rewrite technique [5] recognizes different phonemes from the input audio signal. Animation is generated by re-ordering captured video frames which share similar phonemes with the training video. Co-articulation from the preceding and the subsequent phonemes is considered by the use of a triphone model. But the training video may not have enough triphone samples. Moreover, long-range co-articulation effects are not captured in triphone models. On the other hand, the voice puppetry technique [4] trains an HMM model for the visual signal.

Then the corresponding audio signal is analyzed with respect to the learnt visual models (i.e., facial gestures). A remapping process is employed to give each state a dual mapping into both audio and visual signals. Given a novel audio signal, another analysis step is needed to combine the audio signal with the HMM model and the remapping results to generate the most likely visual state sequence. Finally, animation is generated from the most likely visual states and the learnt visual output distribution to obtain an optimal trajectory in the visual configuration space.

There is a reason why the cumbersome remapping and analysis steps are needed in voice puppetry. Although the HMM has been shown to be a powerful tool for modeling dynamic processes, it is quite sub-optimal for synthesis. Traditionally, for recognition, an HMM aims to model the dynamics of one kind of signal. For synthesis, we need to explore the mapping relationship between different signals, each of which might have a different probabilistic model. Moreover, the model parameters in conventional HMMs are fixed after training, which results in a homogeneous Markov chain. On the other hand, when our observations are two related input and output sequences, and the output sequence conditionally depends on the input sequence, the expected model should be inhomogeneous, i.e., able to adapt to the input.

In this paper, we propose that dynamic audio/visual mapping should be learnt by an input-output hidden Markov model, or IOHMM. The IOHMM, a.k.a. conditional HMM, originally introduced by Bengio [3, 1] for sequence processing, can be stated as follows: an IOHMM is an HMM for which the emission and transition distributions are conditional on the input sequence. Specifically, in this paper we present novel algorithms to tackle the following two problems: learning IOHMMs for dynamic vocal/facial mapping from synchronized audio and visual signals, and synthesizing facial expressions from input audio and the learnt IOHMMs.

Our approach is inspired by the voice puppetry work. While the voice puppetry work highlights a new learning algorithm which determines the compact structure of HMMs, our work emphasizes the essence of learning dynamic input/output mapping with IOHMMs.

Our approach is much less complex because the input and the output are put together for training and synthesis, thus eliminating the need for the remapping and analysis steps used in voice puppetry. As a consequence, we can generate good-quality and natural synthesis results.

The remainder of this paper is organized as follows. The IOHMM model is introduced in Section 2, where we explain why the HMM needs to be augmented to an IOHMM for the synthesis task. The audio/visual mapping is studied in Section 3. Experimental results are presented in Section 4. Finally, we conclude our paper in Section 5.

2 IOHMM for synthesis

2.1 HMM

HMMs are statistical models of sequential data that have been used successfully in many applications, e.g., speech recognition. A Bayesian network [11] graphically representing the independence assumptions of an HMM is shown in Figure 1(a). The relationship between the observed (output) sequence y_1^t = (y_1, ..., y_t) and the hidden state sequence q_1^t = (q_1, ..., q_t) satisfies the following conditional first-order independence assumptions:

P(y_t \mid q_1^t, y_1^{t-1}) = P(y_t \mid q_t)    (1)

P(q_{t+1} \mid q_1^t, y_1^t) = P(q_{t+1} \mid q_t)    (2)

Therefore, the joint distribution of the hidden and observed variables can be simplified as

P(y_1^T, q_1^T) = P(q_1) \prod_{t=1}^{T-1} P(q_{t+1} \mid q_t) \prod_{t=1}^{T} P(y_t \mid q_t)    (3)

The joint distribution is therefore completely specified by the initial state probabilities \pi = P(q_1), the transition probabilities A = P(q_t \mid q_{t-1}) and the emission probabilities B = P(y_t \mid q_t). Thus, a compact HMM model is \lambda = (A, B, \pi). Details of solving the three fundamental problems of HMMs, including the Viterbi algorithm and the Baum-Welch algorithm, can be found in the literature such as the HMM tutorial paper [12].
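As a concrete illustration of Equation (3) and of the forward computation underlying the Baum-Welch machinery, the sketch below evaluates the joint likelihood of a state/observation pair and the marginal likelihood P(y_1^T) for a small discrete HMM λ = (A, B, π). This is our own minimal NumPy example with made-up dimensions and random parameters, not code from the paper.

```python
import numpy as np

# A toy discrete HMM lambda = (A, B, pi): N hidden states, M output symbols.
# All numbers here are illustrative only.
N, M = 3, 4
rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(N), size=N)      # A[i, j] = P(q_{t+1}=j | q_t=i)
B = rng.dirichlet(np.ones(M), size=N)      # B[i, k] = P(y_t=k | q_t=i)
pi = rng.dirichlet(np.ones(N))             # pi[i]  = P(q_1=i)

def joint_log_likelihood(y, q):
    """log P(y_1^T, q_1^T) as factored in Equation (3)."""
    ll = np.log(pi[q[0]]) + np.log(B[q[0], y[0]])
    for t in range(1, len(y)):
        ll += np.log(A[q[t - 1], q[t]]) + np.log(B[q[t], y[t]])
    return ll

def forward_likelihood(y):
    """Marginal P(y_1^T) by the forward recursion (sums over all state paths)."""
    alpha = pi * B[:, y[0]]
    for t in range(1, len(y)):
        alpha = (alpha @ A) * B[:, y[t]]
    return alpha.sum()

y = [0, 2, 1, 3]
q = [0, 1, 1, 2]
print(joint_log_likelihood(y, q), forward_likelihood(y))
```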

The conventional HMM can be extended for the purpose of dynamic input/output mapping. The Bayesian network shown in Figure 1(b) illustrates that the HMM learnt from the observed output sequence can be remapped to the input sequence (in dotted lines). This is exactly the approach adopted in [4] for synthesizing facial gestures from voice. Although compelling results have been shown, this technique has two problems.

First, at the synthesis step, vocal signals are only used to generate the most likely state sequence. Animation is generated by solving for a global trajectory in the visual state space, which obliterates the relationship between the vocal and visual signals. This problem can be partially addressed by enforcing the local input/output relationship, i.e., adding a direct arc from input to output, as shown in Figure 1(c). At the synthesis stage, from the state sequence and the input signal, we can generate the output sequence with the help of the local input/output mapping. Introducing a local model into the HMM is necessary for the synthesis problem because we expect to obtain a continuous output, not to classify the input into a specific state (as expected in a recognition problem). If we have no prior knowledge of the relationship between input and output, the local mapping model can be obtained by some regression method such as a neural network. Generally speaking, a more explicit and compact distribution of the output can be learnt by introducing some prior knowledge or assumptions about the input and output signals.

Second, and more significantly, a remapping process is required to map the occupancy matrix (obtained from the HMM model for the output sequence) to the synchronized input so that each state has a dual mapping with both input and output. The underlying assumption made in the remapping process is that the input sequence shares the dynamic behavior exhibited in the HMM trained from the output. As a result, the learnt model is homogeneous for all input sequences.

These problems are addressed in the Bayesian network shown in Figure 1(d), where input and output are put together for training. The model proposed in Figure 1(d) is called an IOHMM or conditional HMM because the model configuration is conditionally dependent on the input sequence. This is illustrated by the arc from the input to the state (x_t to q_t) in Figure 1(d) having a direction reverse from that in Figure 1(b). It indicates the causal effect from the input to the output.

Figure 1: Bayesian networks for several hidden Markov models. (a) Conventional HMM where no input is in the network. (b) HMM remapped (with dotted lines) to the input sequence as well. (c) Remapped HMM (b) plus a direct connection between input and output. (d) Input-output HMM. Dotted lines from q_t to x_t in (b)(c) indicate a remapping process, not a causal effect. The solid line from x_t to q_t in (d) shows that q_t and the transition from q_t to q_{t+1} are conditional on x_t.

For reference, the notation used for IOHMMs is summarized in Appendix A.

2.2 IOHMM

The main difference between standard HMMs and IOHMMs is that the former represents the distribution P(y_1^T) of output sequences, whereas the latter represents the conditional distribution P(y_1^T \mid x_1^T) of the output sequence given the input sequence x_1^T = (x_1, x_2, ..., x_T). IOHMMs are trained by maximizing the conditional likelihood P(y_1^T \mid x_1^T). This is a supervised learning problem, since the output y_1^T plays the role of a desired output in response to the input x_1^T. The Bayesian network for HMMs (Figure 1(a)) can be obtained by simply removing the input nodes and arcs from the IOHMM in Figure 1(d). The arc from x_t to y_t in Figure 1(d) indicates that IOHMMs represent a conditional distribution of a (desired) output sequence when an (observed) input sequence is given. And the arc from x_t to q_t implies that in an IOHMM, transition probabilities are conditional on the input and thus depend on time, resulting in an inhomogeneous Markov chain. In comparison, standard HMMs are based on homogeneous Markov chains. Therefore, IOHMMs are better suited than HMMs for learning to represent long-range context. These properties make IOHMMs more suitable than traditional HMMs for synthesis.
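To make the distinction concrete, the sketch below evaluates the conditional likelihood P(y_1^T | x_1^T) for a toy IOHMM in which both the transition matrix and the Gaussian emission mean are functions of the current scalar input. The helper names and the particular functions of x are ours, chosen only for illustration; any parameterization of the input-conditioned distributions could be substituted.

```python
import numpy as np

N = 2  # number of hidden states

def transition(x):
    """A(x)[i, j] = P(q_t=j | q_{t-1}=i, x_t); here an arbitrary smooth function of x."""
    logits = np.array([[x, -x], [-x, x]])
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def emission(y, x):
    """b_i(y | x): Gaussian density whose mean depends on the input x."""
    means = np.array([x, -x])            # one input-dependent mean per state
    return np.exp(-0.5 * (y - means) ** 2) / np.sqrt(2 * np.pi)

def conditional_likelihood(xs, ys, pi=np.array([0.5, 0.5])):
    """P(y_1^T | x_1^T) via an input-conditioned forward recursion."""
    alpha = pi * emission(ys[0], xs[0])
    for x, y in zip(xs[1:], ys[1:]):
        alpha = (alpha @ transition(x)) * emission(y, x)
    return alpha.sum()

xs = [0.2, 1.0, -0.5, 0.7]
ys = [0.1, 0.9, -0.4, 0.8]
print(conditional_likelihood(xs, ys))
```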

2.3 An example

We illustrate the difference between HMMs and IOHMMs in training and synthesis with the toy problem below.

2.3.1 Problem description

The input and output sequences shown in Figures 2(a) and (b) have the following properties. At any time instant, the input signal is assumed to move along one of the two concentric circles: clockwise along the outer circle, but counterclockwise along the inner one, as indicated by the circles with arrows. Gaussian noise proportional to the circle radius is further added to the point positions. The output signal is synchronous with the input signal and moves along one of the two corresponding diamonds: clockwise along the outer diamond, counterclockwise along the inner one. The input can only jump to the adjacent circle from point J_1 to J_2, or vice versa, as shown in Figure 2(a); the output jumps accordingly.

Figure 2: A toy problem to map circles to squares. (a) Distribution of the input signal; (b) distribution of the output signal. Solid lines and curves are the paths along which the data move; dots are the actual samples perturbed by noise.

Our objective is to learn the dynamic mapping between the input and the output. Furthermore, given a new input sequence, we would like to synthesize the most likely output sequence that best fits the learnt model.
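For reference, a data set with the properties just described can be generated along the following lines. This is our own sketch; the radii, angular step, noise level, jump probability, and the way the junction is modeled are made-up parameters, not the paper's.

```python
import numpy as np

def make_toy_sequence(T=500, radii=(1.0, 2.0), step=0.1, noise=0.05,
                      jump_prob=0.02, seed=0):
    """Input: noisy points on two concentric circles (inner CCW, outer CW).
    Output: the synchronous points on the two corresponding diamonds.
    Jumps between circles are only allowed near a junction angle (theta ~ 0)."""
    rng = np.random.default_rng(seed)
    theta, circle = 0.0, 0          # circle 0 = inner (CCW), 1 = outer (CW)
    xs, ys = [], []
    for _ in range(T):
        # jump to the adjacent circle only near the junction point
        if abs(np.sin(theta)) < step and rng.random() < jump_prob:
            circle = 1 - circle
        r = radii[circle]
        # input point on the circle, noise proportional to the radius
        x = r * np.array([np.cos(theta), np.sin(theta)])
        x += rng.normal(scale=noise * r, size=2)
        # synchronous output point on the diamond |u| + |v| = r at the same angle
        d = np.array([np.cos(theta), np.sin(theta)])
        y = r * d / np.abs(d).sum()
        xs.append(x)
        ys.append(y)
        theta += step if circle == 0 else -step   # CCW on inner, CW on outer
    return np.array(xs), np.array(ys)

X, Y = make_toy_sequence()
print(X.shape, Y.shape)
```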

2.3.2 HMM

To simplify the training problem, we assume four states in our HMM. It has been shown in [4] that the minimum entropy principle can be used to learn the number of states and the structure of HMMs. Applying the standard HMM to the output sequence, we obtain the four states shown in Figure 3(a), each of which represents the data distribution along a specific side of the diamond.

Figure 3: The learnt HMM from the output sequence of Figure 2(b). (a) Four states of the HMM; (b) fixed transitions between HMM states. Solid lines in (a) indicate that the model parameters are fully specified. Different shapes at the four states represent different distributions.

HMMs (with remapping from the output to the input as shown in Figure 1(b)) are inappropriate for synthesis for the following two reasons.

First, HMMs do not represent any dynamics at a finer scale than a state. This causes blurring and muting of the output, and eliminates the fine-scale noise and texture that are expected for synthesis. For example, we might be able to recognize that the output is in state 0 in Figure 3(a), but we cannot determine whether it is on the inner or the outer diamond. Although the expressive power of the model can be improved by adding more states, e.g., using 8 or 16 states for this toy problem, the complexity of the state machine increases (imagine that we have 100 concentric diamonds of different sizes for the output).

Second, as shown in Figure 3(b), the transition probabilities of an HMM are fixed after training. A transition probability represents an average transition behavior between two states. The amount of uncertainty in the transition is, however, not modeled. Therefore, an HMM cannot distinguish whether a transition probability is highly volatile or fixed. With the fixed transition probability matrix, the HMM in Figure 3(a) cannot synthesize the correct change from one diamond boundary to another.

To apply HMMs for synthesis, the emission and transition probabilities must therefore depend on the input. The importance of input-output dependency can be shown by another example of synthesizing visual speech from audio. Suppose that we have obtained two states A and B that represent the closed mouth and the open mouth, respectively. The transition probability from A to B, a_AB, is equal to that from B to A, a_BA. However, a_AB should be much larger than a_BA when the energy of the input vocal signal is increasing (e.g., when starting to speak).

2.3.3 IOHMM

For the IOHMM, we again use four states to model the output data distribution, as shown in Figure 4(a). Dotted lines for the states in Figure 4(a) indicate that the specific forms of the emission probability and the transition probability are not fully determined unless an input is given.

Figure 4: The learnt IOHMM to map circles to diamonds. (a) Four states of the IOHMM; (b) four points from the input; (c)-(f) the corresponding transition matrices for the four points A, B, C, D shown in (b). Dotted lines in (a) indicate that the model parameters are not fully specified unless the input is given.

Training an IOHMM is much more complex than training an HMM because the emission and transition probabilities are conditional on the input. In particular, for each entry in the transition matrix, its conditional distribution given the input may not have an analytical form. Therefore, the mapping from the input to the transition matrix should be trained by neural networks, as suggested by Bengio [2]. In general, the emission probabilities can be learnt using neural networks as well, but often they can be modeled by radial basis functions (RBFs) such as Gaussian distributions, given some prior knowledge of the input/output relationship. Obviously, if we add more prior knowledge, we can obtain more compact and explicit output distributions. At the extreme, the training process degenerates to a regression problem between the input and the output.

In this experiment, we simplify the emission distribution at each state to a Gaussian whose mean and variance are determined by the input data. At each state S_i, the emission probability is given by

b_i = G(\mu_i \varphi(x_t), \Sigma_i \varphi^2(x_t))    (4)

where \mu_i is a vector, \Sigma_i is a matrix, and \varphi(x_t) is the distance of the input signal from the origin. Learning the emission probability is then simplified to determining the values of \mu_i and \Sigma_i.
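A minimal sketch of the emission model in Equation (4), assuming a 2-D output and taking φ(x_t) to be the distance of the input from the origin as stated above; the state parameters used in the example call are illustrative, not values from the paper.

```python
import numpy as np

def emission_prob(y_t, x_t, mu_i, Sigma_i):
    """b_i(y_t | x_t) = G(mu_i * phi(x_t), Sigma_i * phi(x_t)^2), Equation (4),
    where phi(x_t) is the distance of the input from the origin."""
    phi = np.linalg.norm(x_t)
    mean = mu_i * phi
    cov = Sigma_i * phi ** 2
    diff = y_t - mean
    d = len(y_t)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

# Illustrative state parameters: a unit-scale direction and a small isotropic covariance.
mu_0 = np.array([0.5, 0.5])
Sigma_0 = 0.01 * np.eye(2)
print(emission_prob(np.array([1.0, 1.0]), np.array([1.4, 1.4]), mu_0, Sigma_0))
```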

We have developed a training algorithm for IOHMMs. We follow Bengio's approach [3] to train IOHMMs under the EM framework. What is novel in our algorithm is the process of training the transition matrix with neural networks. Each entry in the transition matrix is trained with an independent neural network, after the M-step at each iteration. For this toy problem, each network has a single hidden layer with six nodes, two input nodes (the 2-D coordinates of the input data), and one output node (the transition probability from state S_i to state S_j). A bias node is further added to the input and hidden layers. Details of our training algorithm can be found in Appendix B.

The learnt transition probabilities are clearly dependent on the input, as shown in Figure 4(c)-(f) for four different points. For example, point A, which is located at the boundary of state S_0 and state S_1, has the transition matrix shown in Figure 4(c). It shows that the next output can stay at either state S_0 or S_1, but not at S_2 or S_3, because a_22 = a_33 = 0. In other words, there is a strong tendency to transit to these two states no matter what the current state is, as long as the input falls at location A. As we move from A to B, it becomes more likely to transit to state S_1 than to S_0, as shown in Figure 4(d). When the input point C is at the mean (of the local observation distribution) of state S_1, the transition matrix simplifies to a zero matrix except for the column corresponding to state S_1, whose entries are all equal to 1, as shown in Figure 4(e). This implies that the next state must be S_1 after the transition. Figure 4(f) shows the transition matrix when the input is at point D; it has a similar structure to Figure 4(d), except that the next output is more likely to be in state S_0.

Because of the significant constraint imposed by the transition matrix, the synthesis will most likely yield correct state transitions even if the output is sampled from a wrong state at some time instant. From the input data in Figure 5(a), we obtain the synthesis result shown in Figure 5(b). As expected, the output is distributed around one of the two diamonds, similar to the training data. Moreover, transitions between different states are correct, as shown by the arrows in Figure 5(b). Depending on whether the input is on the inner or the outer circle, the output samples form two Gaussian distributions that belong to the same state S_2 (the dotted blue circle in Figure 5(b)). Because temporal information is not used in training and only four states are used, the synthesized output trajectory does not follow the two diamonds exactly.

Figure 5: The synthesis results using the IOHMM. (a) The input sequence: the dots are actual samples, curly lines with arrows show the moving trajectory, and the dotted line indicates a jump. (b) The output: the dots are the sampled output signal, solid ellipses are the local distributions fully specified given the input, the dotted circle indicates state 2, to which two local distributions belong, and arrows show the transitions between states and local distributions.
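The per-entry transition network described in Section 2.3.3 (one hidden layer with six nodes, two inputs for the 2-D point, one output, with bias nodes on the input and hidden layers) can be sketched as follows. This is our own NumPy forward pass with a sigmoid output, not the authors' code; the row-wise renormalization across the N x N network outputs is an assumption we add so the assembled matrix is stochastic, and training against the normalized targets of Appendix B is not shown.

```python
import numpy as np

class TransitionNet:
    """One small network per transition entry a_ij: 2 inputs -> 6 hidden -> 1 output,
    with bias terms on the input and hidden layers and a sigmoid output unit."""
    def __init__(self, rng):
        self.W1 = rng.normal(scale=0.5, size=(6, 3))   # hidden weights (incl. input bias)
        self.W2 = rng.normal(scale=0.5, size=(1, 7))   # output weights (incl. hidden bias)

    def __call__(self, x):
        h = np.tanh(self.W1 @ np.append(x, 1.0))       # hidden layer
        z = self.W2 @ np.append(h, 1.0)                # output layer
        return 1.0 / (1.0 + np.exp(-z[0]))             # sigmoid -> (0, 1)

def transition_matrix(nets, x_t):
    """Assemble A(x_t) from the N x N per-entry networks and renormalize each row."""
    N = len(nets)
    A = np.array([[nets[i][j](x_t) for j in range(N)] for i in range(N)])
    return A / A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N = 4
nets = [[TransitionNet(rng) for _ in range(N)] for _ in range(N)]
print(transition_matrix(nets, np.array([0.7, -0.7])))
```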

3 Synthesizing facial animation from audio

We apply IOHMMs to synthesize facial animation from audio.

3.1 Audio-visual signal representation

In our system, we use the trajectories of 3D points on the face of an actor, together with his voice, as the training data. In total, 150 points are tracked. We use principal component analysis (PCA) to compress the 450-dimensional feature vector into a 15-dimensional feature vector that covers 97% of the variance. Figure 6 shows the data points at different poses.

We use an 18-dimensional feature vector to represent the vocal signal. Instead of the traditional phoneme-viseme mapping, we use low-level acoustic features such as MFCCs and energy as the input. The input audio sequence is blocked into frames of the same size as the captured video frames. In order to capture more of the dynamics in the vocal features, we also calculate the delta parameters for the MFCCs and energy. Speech energy is an important vocal feature because it plays an important role in controlling facial expressions.
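The 450-to-15 dimensional compression described above is a standard variance-thresholded PCA; a generic sketch is given below with random stand-in data (the function name and data are ours, not the authors' pipeline).

```python
import numpy as np

def pca_compress(V, var_target=0.97):
    """Project frames V (T x D) onto the fewest principal components
    that together cover at least var_target of the variance."""
    mean = V.mean(axis=0)
    Vc = V - mean
    U, s, Wt = np.linalg.svd(Vc, full_matrices=False)
    var = s ** 2 / (s ** 2).sum()
    k = int(np.searchsorted(np.cumsum(var), var_target)) + 1
    coeffs = Vc @ Wt[:k].T            # T x k low-dimensional features
    return coeffs, Wt[:k], mean, k

# Stand-in data: T frames of 150 tracked 3-D points flattened to 450 dimensions.
V = np.random.default_rng(0).normal(size=(1000, 450))
coeffs, basis, mean, k = pca_compress(V)
reconstruction = coeffs @ basis + mean
print(k, coeffs.shape, reconstruction.shape)
```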

Figure 6: Training data used in our experiments. (Courtesy of Dr. Brian Guenter at Microsoft.) (a) An original image with marked feature points; (b)(c) tracked 3D points at two different poses.

3.2 Training

In our application, both the input (vocal feature vector) and the output (facial expression) are continuous high-dimensional random variables. Furthermore, we have no prior knowledge about the mapping relation between the input and the output. Therefore, training such an IOHMM is much more complex than in the toy problem of the preceding section. In Bengio's work [3], the local mapping model and the state transition probabilities are modeled as neural networks. Although this architecture can be trained using the generalized EM (GEM) algorithm [8], training so many neural networks is non-trivial.

To simplify the training process, we first quantize the input into K classes, each of which has its own mean and variance. A new audio frame a is classified by computing the Mahalanobis distance:

a \in \text{class } m    (5)

m = \arg\min_i (a - \mu_{a_i})^T \Sigma_{a_i}^{-1} (a - \mu_{a_i}), \quad i = 1, ..., K    (6)

where \mu_{a_i} and \Sigma_{a_i} are the mean and variance for class i. For each state, the conditional output distribution is then modeled by K Gaussians, each of which corresponds to a specific audio class.
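Equations (5)-(6) amount to a nearest-class rule under the Mahalanobis distance. The sketch below is our own illustration: the class means and covariances are estimated from stand-in data with a crude random partition standing in for proper clustering, which the paper does not detail.

```python
import numpy as np

def fit_audio_classes(frames, K, seed=0):
    """Crude K-class quantization of audio frames: random assignment followed by
    per-class mean/covariance estimates (a placeholder for a real clustering step)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(K, size=len(frames))
    mus = np.array([frames[labels == k].mean(axis=0) for k in range(K)])
    sigmas = np.array([np.cov(frames[labels == k].T) + 1e-6 * np.eye(frames.shape[1])
                       for k in range(K)])
    return mus, sigmas

def classify(a, mus, sigmas):
    """Equation (6): assign frame a to the class with minimum Mahalanobis distance."""
    d = [(a - mu) @ np.linalg.solve(sig, a - mu) for mu, sig in zip(mus, sigmas)]
    return int(np.argmin(d))

# Stand-in 18-dimensional vocal feature frames (MFCCs + energy + their deltas).
frames = np.random.default_rng(1).normal(size=(2000, 18))
mus, sigmas = fit_audio_classes(frames, K=15)
print(classify(frames[0], mus, sigmas))
```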

Figure 7: Training an IOHMM from input audio signals and corresponding visual signals. The model consists of three parts: a state machine, an emission probability conditional on the input, and a transition probability matrix conditional on the input.

The emission probability for state i can then be represented by

b_i(t) = G(\mu_{v_{ik}}, \Sigma_{v_{ik}})    (7)

if a_t belongs to class k. Note that \mu_{v_{ik}} and \Sigma_{v_{ik}} are the mean and variance of the output distribution for class k at state i; these parameters need to be learnt. In our system, the transition probabilities are modeled by N x N neural networks similar to those used in the toy problem of the last section.

The training algorithm given in Appendix B can be applied to our system with little modification. In the E-step, the emission probability should be computed by Equation (7). In the M-step, the emission probability parameters are updated by

\mu_{v_{ik}} = \frac{\sum_{t=1,\, a_t \in \text{class } k}^{T} \gamma_t(i)\, v_t}{\sum_{t=1,\, a_t \in \text{class } k}^{T} \gamma_t(i)}    (8)

\Sigma_{v_{ik}} = \frac{\sum_{t=1,\, a_t \in \text{class } k}^{T} \gamma_t(i)\, (v_t - \mu_{v_{ik}})(v_t - \mu_{v_{ik}})^T}{\sum_{t=1,\, a_t \in \text{class } k}^{T} \gamma_t(i)}    (9)

Figure 7 shows our training algorithm. The trained model consists of three parts: a state machine, an emission probability for each state conditional on the input, and a transition matrix conditional on the input.
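A sketch of the per-class emission updates in Equations (8)-(9), assuming the occupancy matrix gamma (T x N) and the frame-level audio class labels are already available; all array shapes and the stand-in data are ours.

```python
import numpy as np

def update_emissions(V, gamma, audio_class, N, K, eps=1e-10):
    """Equations (8)-(9): class- and state-specific output means and covariances.
    V: T x d visual frames, gamma: T x N occupancies, audio_class: T class labels."""
    T, d = V.shape
    mu = np.zeros((N, K, d))
    Sigma = np.zeros((N, K, d, d))
    for i in range(N):
        for k in range(K):
            mask = (audio_class == k)
            w = gamma[mask, i]                      # gamma_t(i) for frames of class k
            Vm = V[mask]
            denom = w.sum() + eps
            mu[i, k] = (w[:, None] * Vm).sum(axis=0) / denom
            diff = Vm - mu[i, k]
            Sigma[i, k] = (w[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0) / denom
    return mu, Sigma

# Stand-in numbers: 200 frames, 4 states, 5 audio classes, 15-D visual features.
T, N, K, d = 200, 4, 5, 15
rng = np.random.default_rng(0)
V = rng.normal(size=(T, d))
gamma = rng.dirichlet(np.ones(N), size=T)
audio_class = rng.integers(K, size=T)
mu, Sigma = update_emissions(V, gamma, audio_class, N, K)
print(mu.shape, Sigma.shape)
```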

Figure 8: Synthesizing visual signals from an IOHMM and input audio signals. Steps 1 and 2 are the initialization. Steps 3, 4 and 5 are iterated until convergence.

3.3 Synthesis

Given a new audio sequence, we can apply the model to synthesize the most likely visual sequence. In an IOHMM, the state output probabilities and transition probabilities are conditionally dependent on the input. Therefore, the synthesized sequence is the most likely one satisfying

V^* = \arg\max_V P(V \mid A, \lambda)    (10)

where V is the visual sequence and A the audio sequence. There are three steps in the synthesis process, as shown in Figure 8:

Initialization. At time 1 we choose q_1 according to the model's prior state probabilities \pi. Then at time t we choose (randomly sample) q_t according to P(q_t \mid x_t, q_{t-1}) (where the right-hand side is known), and we randomly sample y_t according to P(y_t \mid x_t, q_t). We obtain an initial estimate of the output sequence by repeating this process, V_1 = (v_{11}, v_{12}, ..., v_{1T}).

Iteration. The observation is complete after we obtain the initial output sequence. In each iteration we run a forward-backward process, after which an occupancy matrix γ_t(i) can be obtained; γ_t(i) represents the probability of being in state i at time t, given the input/output sequence and the model. The synthesized output is then updated by Equation (23).

Termination. Given the fixed model parameters and input sequence, the most likely output sequence is obtained when the change in the likelihood falls below a threshold.

We prove in Appendix C that the above iterative algorithm converges to an optimal solution under the EM framework. In our experiments, we found that the synthesized sequence tends to converge to the means of the states. This can be explained by the blurring and muting effects in HMMs. However, since we have K distributions for a given state, corresponding to the K audio classes, the expressive power is sufficient. In fact, the fine details expected in the synthesis are supplied mainly by the local mapping (one output distribution for each class at each state).
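Taken together, the three steps amount to the loop sketched below: ancestral sampling of an initial output sequence, then repeated forward-backward passes followed by the Equation (23) update. This is a minimal sketch under simplifying assumptions of our own (generic callables for the input-conditioned transition matrix and per-state Gaussian parameters, no scaling in the forward-backward pass, a fixed iteration count instead of a likelihood threshold), not the authors' implementation.

```python
import numpy as np

def synthesize(xs, pi, transition, emission_params, n_iter=20, seed=0):
    """Figure 8 in outline: sample an initial output sequence, then repeatedly run
    forward-backward on (x, v) and update v_t by Equation (23).
    transition(x) -> N x N matrix, emission_params(i, x) -> (mean, cov) for state i."""
    rng = np.random.default_rng(seed)
    N, T = len(pi), len(xs)

    def gauss(v, mean, cov):
        diff = v - mean
        return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / \
               np.sqrt((2 * np.pi) ** len(v) * np.linalg.det(cov))

    # Initialization: ancestral sampling of states and outputs.
    q = rng.choice(N, p=pi)
    V = []
    for t, x in enumerate(xs):
        if t > 0:
            q = rng.choice(N, p=transition(x)[q])
        mean, cov = emission_params(q, x)
        V.append(rng.multivariate_normal(mean, cov))
    V = np.array(V)

    # Iteration: forward-backward to obtain gamma, then the Equation (23) update.
    for _ in range(n_iter):
        B = np.array([[gauss(V[t], *emission_params(i, xs[t])) for i in range(N)]
                      for t in range(T)])
        alpha = np.zeros((T, N))
        beta = np.ones((T, N))
        alpha[0] = pi * B[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ transition(xs[t])) * B[t]
        for t in range(T - 2, -1, -1):
            beta[t] = transition(xs[t + 1]) @ (B[t + 1] * beta[t + 1])
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        means = np.array([[emission_params(i, xs[t])[0] for i in range(N)]
                          for t in range(T)])                       # T x N x d
        V = (gamma[:, :, None] * means).sum(axis=1)                 # Equation (23)
    return V

# Tiny stand-in model: 2 states, 1-D input, 2-D output.
pi = np.array([0.5, 0.5])
transition = lambda x: (np.array([[0.8, 0.2], [0.3, 0.7]]) if x < 0.5
                        else np.array([[0.4, 0.6], [0.1, 0.9]]))
emission_params = lambda i, x: (np.array([x, float(i)]), 0.05 * np.eye(2))
print(synthesize(np.linspace(0, 1, 6), pi, transition, emission_params).shape)
```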

4 Experimental results

In our experiment, the training video consists of 189 short sentences. The input audio is clustered into 15 classes. The training process takes about 5 minutes to converge on a mid-level PC.

We first apply the learnt IOHMM to synthesize facial expressions from the audio in the training set. The audio sequence used for comparison is not used for training the IOHMM. Table 1 shows the comparison with ground truth. To compute the reconstruction error, we calculate the minimum distance of each feature point to the face model reconstructed after PCA (with 97% of the variance covered). The error shown in the table is the summed error normalized by the number of feature points. We can conclude from the table that the reconstruction quality is good because the reconstruction errors (with the two different initialization schemes) are of the same order as the error between the original and the PCA-reconstructed model.

Table 1: A comparison between the synthesized result and ground truth (errors for S1-B, S2-B and A-B). A is the ground truth, B is the PCA-reconstructed result (also ground truth), S1 is the synthesis result initialized by B, and S2 is the synthesis result initialized by random sampling.

Figure 9 shows the result of animating a single picture using our model. Several frames from the synthesized sequence of Dr. Martin Luther King's famous speech, "I have a dream", are shown in the figure. The sequence shows significant facial movement.

Figure 9: A few frames from a synthesized video sequence of Dr. King's speech. The synthesis uses a single picture.

Using several cartoon templates with different poses and expressions, we can also animate a long cartoon sequence. Several animated cartoon frames are shown in Figure 10. Because we use a 3D model with 150 feature points, we clearly observe facial expressions over the whole face in the animation sequences (please see the accompanying video for the complete demo).^1

Figure 10: Several frames of a synthesized cartoon video sequence. Several cartoon template images are used in the animation.

^1 We apologize to the reviewers that the video can only be played in Microsoft Media Player.
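The error metric reported in Table 1 can be computed along the following lines. This is our own simplified sketch: the per-point distance here is the distance to the nearest reconstructed point rather than a true point-to-surface distance, and the stand-in data and function name are not from the paper.

```python
import numpy as np

def reconstruction_error(points, recon_points):
    """Summed distance of each feature point to its nearest point in the reconstructed
    model, normalized by the number of feature points (a simplified stand-in for the
    point-to-model distance used in Table 1)."""
    d = np.linalg.norm(points[:, None, :] - recon_points[None, :, :], axis=2)
    return d.min(axis=1).sum() / len(points)

# Stand-in data: 150 tracked 3-D feature points and a perturbed PCA reconstruction.
rng = np.random.default_rng(0)
points = rng.normal(size=(150, 3))
recon_points = points + rng.normal(scale=0.01, size=(150, 3))
print(reconstruction_error(points, recon_points))
```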

We have encountered some difficulties when using the IOHMM for synthesis. The most difficult problem is synthesizing facial expression when the character is silent: there is no clear mapping from silence to facial expression, and some high-level knowledge must be applied to tackle this problem. Similarly, we have difficulty synthesizing facial expressions that do not correspond to vocal signals, such as frowning.

5 Conclusion and future work

In this paper we have studied the problem of dynamic audio/visual mapping, specifically by formulating audio/visual mapping as an IOHMM problem. A key observation is that the IOHMM is better suited than the conventional HMM for synthesis because it can synthesize structures that are finer than states. Moreover, because the transition probabilities in an IOHMM are conditional on the input, it is more likely that the synthesized state sequence will be correct. The IOHMM is trained under the EM framework, where each transition probability is modeled by a single neural network and updated at each iteration. Given the input audio signal, a facial animation sequence is generated by the maximum likelihood principle. Our experimental results, from a single image and from a sequence of cartoon template images, demonstrate that our synthesis results are of good quality.

An interesting idea is to drive facial animation by emotions. While we have studied synthesizing facial expressions from audio in this paper, the very idea of the IOHMM is also applicable to other dynamic input/output mappings. We plan to build a complete cartoon video rewrite system by combining cartoon animations from different poses/emotions.

References

[1] Y. Bengio. Markovian models for sequential data. Neural Computing Surveys, 2, 1999.

[2] Y. Bengio. Personal communication.

[3] Y. Bengio and P. Frasconi. Input/output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5), 1996.

[4] M. Brand. Voice puppetry. In Proc. SIGGRAPH 99, pages 21-28, 1999.

[5] C. Bregler, M. Covell, and M. Slaney. Video rewrite: Driving visual speech with audio. In Proc. SIGGRAPH 97, 1997.

[6] T. Chen and R. Rao. Audio-visual integration in multimodal communication. Proceedings of the IEEE, May 1998.

[7] K. H. Choi and J. N. Hwang. Baum-Welch hidden Markov model inversion for reliable audio-to-visual conversion. In IEEE 3rd Workshop on Multimedia Signal Processing, 1999.

[8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39:1-38, 1977.

[9] B. Guenter, C. Grimm, D. Wood, H. Malvar, and F. Pighin. Making faces. In Proc. SIGGRAPH 98, pages 55-66, 1998.

[10] S. Levinson, L. Rabiner, and M. Sondhi. An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell System Technical Journal, 64(4), 1983.

[11] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[12] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, February 1989.

Appendix

A. A list of symbols used in the IOHMM

\lambda = (A, B, \pi): the model parameters.

A = {a_{ij}(t)} = {P(q_t = S_j \mid q_{t-1} = S_i, x_t)}: the state transition probabilities; a distribution conditional on x_t.

B = {b_i(t)} = {P(y_t \mid q_t = S_i, x_t)}: the emission probability for state i; a distribution conditional on x_t.

\pi_i = P(q_0 = S_i), 1 \le i \le N: the initial probability of state S_i.

S = {S_1, S_2, ..., S_N}: the set of N states.

X = x_1, x_2, ..., x_T: the input sequence.

Y = y_1, y_2, ..., y_T: the output sequence.

N: the number of states in the IOHMM.

T: the length of the synchronized input/output sequence.

\alpha_t(i) = P(y_1^t, q_t = S_i \mid x_1^t, \lambda): the probability of the joint event that y_1 y_2 ... y_t are observed and the state at time t is S_i, given the input sequence x_1 x_2 ... x_t and the model \lambda.

\beta_t(i) = P(y_{t+1}^T \mid q_t = S_i, x_{t+1}^T, \lambda): the probability of the partial observation sequence from t+1 to the end, given state S_i at time t, the input sequence x_{t+1} x_{t+2} ... x_T and the model \lambda.

\gamma_t(i) = P(q_t = S_i \mid x_1^T, y_1^T, \lambda): the probability of being in S_i at time t, given the input/output sequence and the model \lambda.

\xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j \mid x_1^T, y_1^T, \lambda): the probability of being in S_i at time t and S_j at time t+1, given the input/output sequence and the model \lambda.

B. Training algorithm for the IOHMM

In our experiments, we use back-propagation neural networks to model the transition probabilities. The transition probability from state S_i to state S_j given the current input x_t is computed by running the corresponding network:

a_{ij}(t) = N_{ij}(x_t)    (11)

Under this configuration, the IOHMM can be trained under the EM framework.

Initialization. We randomly initialize the values of \mu_i and \Sigma_i for each state S_i. The initial state probabilities and transition probabilities are initialized uniformly. The N x N neural networks for the transition matrix are also initialized.

E-step. In the E-step, we calculate the forward variables \alpha_t(i) and backward variables \beta_t(i):

\alpha_t(i) = P(y_1^t, q_t = S_i \mid x_1^t, \lambda)    (12)
            = \left[ \sum_{j=1}^{N} \alpha_{t-1}(j) a_{ji}(t) \right] b_i(y_t \mid x_t)    (13)

\beta_t(i) = P(y_{t+1}^T \mid q_t = S_i, x_{t+1}^T, \lambda)    (14)
           = \sum_{j=1}^{N} a_{ij}(t) b_j(y_{t+1} \mid x_{t+1}) \beta_{t+1}(j)    (15)

We also compute \xi_t(i, j), the probability of being in state S_i at time t and state S_j at time t+1, given the model and the input and output signals. It is calculated by

\xi_t(i, j) = \frac{\alpha_t(i) a_{ij}(t) b_j(y_{t+1} \mid x_{t+1}) \beta_{t+1}(j)}{P(y_1^T \mid x_1^T, \lambda)}    (16)

where

P(y_1^T \mid x_1^T, \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) a_{ij}(t) b_j(y_{t+1} \mid x_{t+1}) \beta_{t+1}(j)    (17)

Let \gamma_t(i) be the probability of being in state S_i at time t, given the input and output sequences and the model \lambda. We can then form an occupancy matrix by calculating

\gamma_t(i) = P(q_t = S_i \mid y_1^T, x_1^T, \lambda) = \frac{\alpha_t(i) \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i) \beta_t(i)}    (18)

It can be seen that the forward-backward procedure of the Baum-Welch algorithm used in conventional HMMs can be applied to IOHMM training as well. However, the emission probability and transition probability at time t must be updated according to the input at that time.
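A sketch of the E-step recursions (Equations 12-18), assuming the per-frame transition matrices A_t = A(x_t) and the emission values b_i(y_t | x_t) have already been evaluated. This is our own illustration: scaling is omitted, so it is only suitable for short sequences, and we adopt the convention that A_t[t] governs the transition into time t (matching Equation 13), so the backward pass and xi index the transition matrix at t+1.

```python
import numpy as np

def e_step(A_t, B_t, pi):
    """Unscaled forward-backward pass for an IOHMM.
    A_t: T x N x N input-conditioned transition matrices, B_t: T x N emission values,
    pi: initial state probabilities. Returns gamma (T x N) and xi (T-1 x N x N)."""
    T, N = B_t.shape
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B_t[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A_t[t]) * B_t[t]          # Eq. (13)
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A_t[t + 1] @ (B_t[t + 1] * beta[t + 1])    # Eq. (15)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)                # Eq. (18)
    xi = alpha[:-1, :, None] * A_t[1:] * (B_t[1:] * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)                 # Eq. (16)
    return gamma, xi

# Stand-in model: 8 frames, 4 states, random row-stochastic transition matrices.
T, N = 8, 4
rng = np.random.default_rng(0)
A_t = rng.dirichlet(np.ones(N), size=(T, N))
B_t = rng.random((T, N))
gamma, xi = e_step(A_t, B_t, rng.dirichlet(np.ones(N)))
print(gamma.shape, xi.shape)
```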

M-step. The M-step in IOHMM training should adapt to the representation of the emission and transition probabilities. In this case, the parameters are updated as follows:

\hat{\mu}_i = \frac{\sum_{t=1}^{T} \gamma_t(i)\, (y_t / \varphi(x_t))}{\sum_{t=1}^{T} \gamma_t(i)}    (19)

\hat{\Sigma}_i = \frac{\sum_{t=1}^{T} \gamma_t(i)\, ((y_t - \hat{\mu}_i)(y_t - \hat{\mu}_i)^T / \varphi^2(x_t))}{\sum_{t=1}^{T} \gamma_t(i)}    (20)

NN-step. In order to train the transition matrix (each entry of which is a neural network), we first normalize \xi_t(i, j) by

\hat{\xi}_t(i, j) = \frac{\xi_t(i, j)}{\sum_{j=1}^{N} \xi_t(i, j)}, \quad i, j = 1, ..., N    (21)

This normalization is necessary to ensure that each row of \hat{\xi}_t(i, j) sums to one, i.e., \sum_{j=1}^{N} \hat{\xi}_t(i, j) = 1, even though \xi_t(i, j) has already been normalized over the whole matrix in Equation (16) in the E-step. Each network N_{ij} can then be updated given the T training samples {x_t, \hat{\xi}_t(i, j)}, t = 1, ..., T. The networks are initialized randomly in the first iteration. At each new step, training begins with the node weights converged to in the previous iteration. The whole model converges after several steps.
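A sketch of the NN-step target preparation in Equation (21): row-normalizing xi_t so that the N x N per-entry networks are regressed toward properly normalized transition probabilities. The per-entry networks themselves are of the kind sketched after Section 2.3.3; their back-propagation training is not shown here, and the stand-in xi values below are random.

```python
import numpy as np

def nn_step_targets(xi, eps=1e-12):
    """Equation (21): renormalize each row of xi_t(i, .) to sum to one, producing
    the targets {x_t -> xi_hat_t(i, j)} used to retrain the N x N transition networks."""
    return xi / (xi.sum(axis=2, keepdims=True) + eps)

rng = np.random.default_rng(0)
xi = rng.random((9, 4, 4))                 # stand-in xi_t(i, j), t = 1..T-1
xi_hat = nn_step_targets(xi)
print(np.allclose(xi_hat.sum(axis=2), 1.0))
```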

C. Proof of the synthesis algorithm

In synthesis, we seek the optimal visual sequence V given the input audio A. Using the EM algorithm, the optimal visual sequence can be obtained by maximizing the auxiliary function Q(V' \mid V), i.e.,

V' = \arg\max_{V'} Q(V' \mid V)    (22)

where V and V' denote the visual sequence before and after each iteration, respectively. We have

Q(V' \mid V) = E[\ln P(V', S \mid A, \lambda)]
             = \sum_S P(V, S \mid A, \lambda) \ln P(V', S \mid A, \lambda)
             = \sum_S P(V, S \mid A, \lambda) \left( \ln \pi_{q_0} + \sum_{t=1}^{T-1} \ln a_{q_t q_{t+1}} + \sum_{t=1}^{T} \ln b_{q_t}(v'_t) \right)

where S is the state sequence and \lambda is the learnt model parameter. Setting the derivative of the above function with respect to v'_t to zero, and noting that the state observation is modeled as a Gaussian, we have

\frac{\partial Q(V' \mid V)}{\partial v'_t} = \sum_S P(V, S \mid A, \lambda) \frac{\partial}{\partial v'_t} \ln b_{q_t}(v'_t)
 = \sum_{i=1}^{N} P(V, q_t = S_i \mid A, \lambda) \frac{\partial}{\partial v'_t} \ln b_{S_i}(v'_t)
 = \sum_{i=1}^{N} P(V, q_t = S_i \mid A, \lambda) \Sigma_i^{-1} (v'_t - \mu_i)

where \mu_i and \Sigma_i are the mean vector and covariance matrix for state S_i. The ML estimate of the visual sequence is then obtained by

v'_t = \frac{\sum_{i=1}^{N} \gamma_t(i) \mu_i}{\sum_{i=1}^{N} \gamma_t(i)}    (23)

where \gamma_t(i) = P(V, q_t = S_i \mid A, \lambda) can be computed by the forward-backward process.

It should be noted that a similar algorithm, named inverted HMM, has been proposed in [7]. Here, however, we assume that the visual signal conditionally depends on the audio input, instead of modeling them by a joint distribution as in [7].


HIDDEN MARKOV MODELS IN SPEECH RECOGNITION HIDDEN MARKOV MODELS IN SPEECH RECOGNITION Wayne Ward Carnegie Mellon University Pittsburgh, PA 1 Acknowledgements Much of this talk is derived from the paper "An Introduction to Hidden Markov Models",

More information

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named We Live in Exciting Times ACM (an international computing research society) has named CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Apr. 2, 2019 Yoshua Bengio,

More information

Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391

Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Parameters of an HMM States: A set of states S=s 1, s n Transition probabilities: A= a 1,1, a 1,2,, a n,n

More information

CS 188: Artificial Intelligence Fall 2011

CS 188: Artificial Intelligence Fall 2011 CS 188: Artificial Intelligence Fall 2011 Lecture 20: HMMs / Speech / ML 11/8/2011 Dan Klein UC Berkeley Today HMMs Demo bonanza! Most likely explanation queries Speech recognition A massive HMM! Details

More information

We Prediction of Geological Characteristic Using Gaussian Mixture Model

We Prediction of Geological Characteristic Using Gaussian Mixture Model We-07-06 Prediction of Geological Characteristic Using Gaussian Mixture Model L. Li* (BGP,CNPC), Z.H. Wan (BGP,CNPC), S.F. Zhan (BGP,CNPC), C.F. Tao (BGP,CNPC) & X.H. Ran (BGP,CNPC) SUMMARY The multi-attribute

More information

Statistical NLP: Hidden Markov Models. Updated 12/15

Statistical NLP: Hidden Markov Models. Updated 12/15 Statistical NLP: Hidden Markov Models Updated 12/15 Markov Models Markov models are statistical tools that are useful for NLP because they can be used for part-of-speech-tagging applications Their first

More information

Hidden Markov Models (HMMs)

Hidden Markov Models (HMMs) Hidden Markov Models (HMMs) Reading Assignments R. Duda, P. Hart, and D. Stork, Pattern Classification, John-Wiley, 2nd edition, 2001 (section 3.10, hard-copy). L. Rabiner, "A tutorial on HMMs and selected

More information

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods

More information

Hidden Markov Models Part 2: Algorithms

Hidden Markov Models Part 2: Algorithms Hidden Markov Models Part 2: Algorithms CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Hidden Markov Model An HMM consists of:

More information

Hidden Markov models

Hidden Markov models Hidden Markov models Charles Elkan November 26, 2012 Important: These lecture notes are based on notes written by Lawrence Saul. Also, these typeset notes lack illustrations. See the classroom lectures

More information

Design and Implementation of Speech Recognition Systems

Design and Implementation of Speech Recognition Systems Design and Implementation of Speech Recognition Systems Spring 2013 Class 7: Templates to HMMs 13 Feb 2013 1 Recap Thus far, we have looked at dynamic programming for string matching, And derived DTW from

More information

Bayesian Networks Inference with Probabilistic Graphical Models

Bayesian Networks Inference with Probabilistic Graphical Models 4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning

More information

Template-Based Representations. Sargur Srihari

Template-Based Representations. Sargur Srihari Template-Based Representations Sargur srihari@cedar.buffalo.edu 1 Topics Variable-based vs Template-based Temporal Models Basic Assumptions Dynamic Bayesian Networks Hidden Markov Models Linear Dynamical

More information

Mixtures of Gaussians with Sparse Structure

Mixtures of Gaussians with Sparse Structure Mixtures of Gaussians with Sparse Structure Costas Boulis 1 Abstract When fitting a mixture of Gaussians to training data there are usually two choices for the type of Gaussians used. Either diagonal or

More information

Modeling Timing Structure in Multimedia Signals

Modeling Timing Structure in Multimedia Signals Modeling Timing Structure in Multimedia Signals Hiroaki Kawashima, Kimitaka Tsutsumi, and Takashi Matsuyama Kyoto University, Yoshida-Honmachi Sakyo, Kyoto 6068501, JAPAN, {kawashima,tm}@i.kyoto-u.ac.jp,

More information

Hidden Markov Models

Hidden Markov Models 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 19 Nov. 5, 2018 1 Reminders Homework

More information

Hidden Markov Models Part 1: Introduction

Hidden Markov Models Part 1: Introduction Hidden Markov Models Part 1: Introduction CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Modeling Sequential Data Suppose that

More information

STATS 306B: Unsupervised Learning Spring Lecture 5 April 14

STATS 306B: Unsupervised Learning Spring Lecture 5 April 14 STATS 306B: Unsupervised Learning Spring 2014 Lecture 5 April 14 Lecturer: Lester Mackey Scribe: Brian Do and Robin Jia 5.1 Discrete Hidden Markov Models 5.1.1 Recap In the last lecture, we introduced

More information

Conditional Random Fields: An Introduction

Conditional Random Fields: An Introduction University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science 2-24-2004 Conditional Random Fields: An Introduction Hanna M. Wallach University of Pennsylvania

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

CS534 Machine Learning - Spring Final Exam

CS534 Machine Learning - Spring Final Exam CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the

More information

CS Machine Learning Qualifying Exam

CS Machine Learning Qualifying Exam CS Machine Learning Qualifying Exam Georgia Institute of Technology March 30, 2017 The exam is divided into four areas: Core, Statistical Methods and Models, Learning Theory, and Decision Processes. There

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood

More information

p(d θ ) l(θ ) 1.2 x x x

p(d θ ) l(θ ) 1.2 x x x p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods Some figures

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

Christian Mohr

Christian Mohr Christian Mohr 20.12.2011 Recurrent Networks Networks in which units may have connections to units in the same or preceding layers Also connections to the unit itself possible Already covered: Hopfield

More information

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu

More information

Hidden Markov Models and other Finite State Automata for Sequence Processing

Hidden Markov Models and other Finite State Automata for Sequence Processing To appear in The Handbook of Brain Theory and Neural Networks, Second edition, (M.A. Arbib, Ed.), Cambridge, MA: The MIT Press, 2002. http://mitpress.mit.edu The MIT Press Hidden Markov Models and other

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

Lecture 5: GMM Acoustic Modeling and Feature Extraction

Lecture 5: GMM Acoustic Modeling and Feature Extraction CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 5: GMM Acoustic Modeling and Feature Extraction Original slides by Dan Jurafsky Outline for Today Acoustic

More information

ASR using Hidden Markov Model : A tutorial

ASR using Hidden Markov Model : A tutorial ASR using Hidden Markov Model : A tutorial Samudravijaya K Workshop on ASR @BAMU; 14-OCT-11 samudravijaya@gmail.com Tata Institute of Fundamental Research Samudravijaya K Workshop on ASR @BAMU; 14-OCT-11

More information