ENHANCEMENTS OF MAXIMUM LIKELIHOOD EIGEN-DECOMPOSITION USING FUZZY LOGIC CONTROL FOR EIGENVOICE-BASED SPEAKER ADAPTATION.

International Journal of Innovative Computing, Information and Control, ICIC International, Volume 7, Number 7(B), July 2011, pp. 4207-4222

Ing-Jr Ding
Department of Electrical Engineering, National Formosa University
No. 64, Wunhua Rd., Huwei Township, Yunlin County 632, Taiwan
ingjr@nfu.edu.tw

Received May 2010; revised November 2010

Abstract. This paper presents a fuzzy logic control (FLC) mechanism for the popular eigenvoice-based speaker adaptation scheme. The proposed mechanism regulates the influence of maximum likelihood eigen-decomposition (MLED) when the training data from a new speaker is inadequate. The FLC-MLED method works by accounting for the amount of adaptation data when estimating the linear combination coefficients for eigenvector decomposition, which keeps speaker adaptation robust against data scarcity. The proposed mechanism is conceptually simple, effective and computationally inexpensive. Experimental results indicate that FLC-MLED outperforms conventional MLED, especially when adaptation data is insufficient, and performs better than maximum a posteriori eigen-decomposition (MAPED) at a much lower computing cost.

Keywords: Speech recognition, Speaker adaptation, Takagi-Sugeno fuzzy logic controller, Maximum likelihood eigen-decomposition, Maximum a posteriori eigen-decomposition

1. Introduction. Computing techniques for automatic speech recognition (ASR) have existed for years [1-4]. As they have matured, these techniques have found more and more applications in everyday life [5]. Nevertheless, the recognition performance of any speech recognition system ever built remains inferior to that of a human listener [6]. During recognition, the system encounters speech variations that are either unknown to it or represented only poorly in its models. These variations often cause a mismatch between the pre-established reference templates and the testing template, compromising recognition performance. Speaker adaptation (SA), sometimes referred to as model-based adaptation, can reduce this mismatch. Speaker adaptation is the process of transforming a speaker-independent (SI) speech recognition system into a speaker-dependent (SD) system. It achieves SD-like performance by adjusting the acoustic parameters of the SI speech model, typically in the form of hidden Markov models (HMMs), with speech samples acquired from a target speaker.

There are three major types of speaker-adaptive techniques: Bayesian-based adaptation, transformation-based adaptation and speaker-clustering-based adaptation. Bayesian-based model adaptation directly re-estimates the acoustic model parameters using maximum a posteriori (MAP) adaptation [7,8]; the Bayesian reasoning framework is an example of this approach. Transformation-based model adaptation, such as maximum likelihood linear regression (MLLR) [9] and maximum a posteriori linear regression (MAPLR) [10,11], derives appropriate transformations from a set of adaptation utterances from a new speaker and then applies them to clusters of HMM parameters.

Figure 1. Eigenvoice-based adaptation

Eigenvoice-based adaptation [12-23] is a relatively new member of the speaker adaptation family, first appearing around 2000 [12]. This approach is also known as speaker-clustering-based adaptation because it creates an SD speech model for every member in a group of speakers. The method extracts feature vectors, called eigenvoices, from these models to build an eigenvoice speech model (an eigenvoice vector space) for a new speaker. Adaptation of the speech model can then be carried out when adaptation data is available, as Figure 1 shows.

The basic concept of eigenvoice-based adaptation is to employ a priori knowledge about inter-speaker variation obtained by analyzing the training speakers. The method applies principal component analysis (PCA) to supervectors derived from the SD speaker models to construct the eigenvoice space [12]. These principal components are then used to build speaker-adaptive models for a new speaker through maximum likelihood eigen-decomposition (MLED), in which the linear combination coefficients for eigenvector decomposition are estimated via the maximum likelihood (ML) criterion. The MLED adaptation scheme proposed by Kuhn et al. [12] plays a key role in this type of eigenvoice adaptation technique and has proven effective in many speech recognition applications. However, given insufficient adaptation utterances from a new speaker, the performance of MLED is questionable because the linear combination coefficients are estimated inaccurately. In this case, the recognition rate may fall below the baseline, i.e., be worse than no adaptation at all (as the experiments in this study show).

After Kuhn et al., researchers proposed a series of variants of the MLED scheme in an attempt to improve the quality of the estimated linear combination coefficients given insufficient adaptation data. However, these approaches for enhancing the robustness of MLED are complicated and computationally time consuming, preventing on-line adaptation applications. For example, the MAPED scheme [23], which estimates the linear combination coefficients by maximizing the posterior density using maximum a posteriori (MAP) theory [7,8], is a classic variant of MLED, but it spends much more time estimating the linear combination coefficients than the MLED scheme.

This study proposes a fuzzy control mechanism to tackle the unreliable MLED estimation of the linear combination coefficients under insufficient training data, without incurring the high cost of MAPED-like approaches. Based on the amount of adaptation utterances available, the MLED approach is regulated so as to exploit the speed of MLED in calculating the linear combination coefficients when the amount of training data allows, while simultaneously alleviating the undesired effect of poorly estimated coefficients. The resulting implementation is called FLC-MLED, where FLC stands for fuzzy logic control and indicates the fuzzy mechanism incorporated into the conventional MLED. The use of an FLC mechanism in estimating acoustic parameters for eigenvoice-based speaker adaptation has rarely been attempted. However, an adaptation method supported by FLC has several advantages over those without: better performance in ordinary cases, robustness against the scarcity of training data, and less computation in parameter estimation than other MLED-enhancement methods (e.g., the typical MAPED). Fuzzy control has been applied to a wide range of applications with great success [24], including speech recognition. The Takagi-Sugeno (T-S) fuzzy model is conceptually simple and straightforward [25,26], and has appeared in the control of systems as complicated as an electric power plant [27-29]. Therefore, this study employs the T-S fuzzy model in eigenvoice speaker adaptation.

The rest of the paper is organized as follows. Section 2 briefly describes the theoretical formulation of MLED and MAPED, introduces the concept of incremental MLED eigenvoice adaptation under fuzzy regulation, and then formulates the T-S fuzzy mechanism for model adaptation in this study. Section 2 also presents a complexity analysis of the proposed scheme and describes possible future improvements of the FLC-MLED approach. Section 3 presents the experimental results, which compare the effectiveness and performance of FLC-MLED with conventional MLED and MAPED. Finally, Section 4 provides some concluding remarks.

2. FLC-MLED Eigenvoice Adaptation. The basic idea of eigenvoice adaptation is to build a number of speaker clusters in advance, and then represent the model of the current speaker as an interpolated form, i.e., a weighted sum, of the speaker clusters. Kuhn et al. [12] first proposed eigenvoice adaptation, in which a priori knowledge concerning the variations among all training speakers is represented as a set of SD model parameters in the form of eigenvectors named eigenvoices. A new speaker model is then expressed as a linear combination of the set of eigenvoices. The eigenvoice approach greatly reduces the number of parameters to be estimated while remaining capable of capturing the variation between speakers. The eigenvoice approach involves two tasks: eigenvoice construction (the training phase) and coefficient estimation (the adaptation phase). As Figure 1 shows, in the eigenvoice construction phase a set of N well-trained SD models from N speakers must be established first. The model parameters of each SD model are then vectorized, forming a set of N supervectors.
Space dimension reduction techniques, such as PCA, are then applied to the set of N supervectors to obtain N eigenvectors of dimension D, also called eigenvoices. In general, the higher-order eigenvoices are discarded and only the first K (K < N) eigenvoices are kept. These eigenvoices are significant because they retain most of the information in the speech data and are thus capable of representing the variations under consideration. Finally, using these K eigenvoices, an accurate speaker space, the K-space, can be spanned.
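As a concrete illustration of the eigenvoice construction phase just described, the following Python sketch (a minimal illustration under assumed data layouts, not the author's implementation) stacks the vectorized SD models into supervectors, runs PCA, and keeps the first K eigenvoices:

    import numpy as np

    def build_eigenvoice_space(sd_supervectors, K):
        """PCA on N speaker-dependent supervectors of dimension D.

        sd_supervectors: array of shape (N, D), one vectorized SD model per row.
        Returns the mean supervector e(0) and the first K eigenvoices e(1..K).
        """
        X = np.asarray(sd_supervectors, dtype=float)
        e0 = X.mean(axis=0)                       # e(0): mean of the N supervectors
        # PCA via SVD of the centered supervector matrix; the rows of Vt are the
        # principal directions ordered by decreasing variance.
        _, _, Vt = np.linalg.svd(X - e0, full_matrices=False)
        return e0, Vt[:K]                         # keep only the first K eigenvoices

    def synthesize_supervector(e0, eigenvoices, w):
        # New-speaker supervector as in Equation (1): mu = e(0) + sum_k w(k) e(k)
        return e0 + w @ eigenvoices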

The coefficient estimation phase then performs adaptation to determine the location of a new speaker in K-space. Let the supervector \mu of the new speaker be constructed in K-space as

\mu = e(0) + w(1)e(1) + \cdots + w(K)e(K) = e(0) + \sum_{k=1}^{K} w(k)\,e(k),    (1)

where e(0) is the mean vector of the N supervectors. The problem here is to estimate the weights {w(k), k = 1, 2, ..., K} that correspond to the K eigenvectors {e(k), k = 1, 2, ..., K}, i.e., to find a weighted combination of eigenvoices. In general, a classical eigen-decomposition scheme, such as MLED or MAPED [12,23], can derive the set of weight coefficients {w(k), k = 1, 2, ..., K} using speaker-specific adaptation data X. The following subsection briefly describes the theoretical formulations of MLED and MAPED.

2.1. MLED and MAPED. The MLED method estimates the weight coefficients by solving [12]

\hat{w}_{ML} = \arg\max_{w} P(X \mid w).    (2)

\hat{w}_{ML} in Equation (2) can be obtained by the expectation-maximization (E-M) algorithm [30]. In the E-step, the expectation is determined as

Q(\hat{w} \mid w) = E\left[\log P(X, S, M \mid \hat{w}) \mid X, w\right] \propto -\frac{1}{2}\sum_{s}\sum_{m}\sum_{t}\gamma_{m}^{(s)}(t)\left[n\log(2\pi) + \log\left|C_{m}^{(s)}\right| + h(x_{t}, s, m)\right],    (3)

where \gamma_{m}^{(s)}(t) = P(s_{t} = s, m_{t} = m \mid X, w) is the occupation probability that the observation x_{t} stays at state s and mixture m, and

h(x_{t}, s, m) = \left(\hat{\mu}_{m}^{(s)} - x_{t}\right)^{T}\left(C_{m}^{(s)}\right)^{-1}\left(\hat{\mu}_{m}^{(s)} - x_{t}\right).    (4)

\hat{\mu}_{m}^{(s)} in Equation (4) can be replaced with the corresponding linear combination of eigenvoices:

\hat{\mu}_{m}^{(s)} = \sum_{k=1}^{K}\hat{w}(k)\,e_{m}^{(s)}(k).    (5)

The M-step then maximizes Q(\hat{w} \mid w). To do so, set \partial Q(\hat{w} \mid w)/\partial\hat{w}(j) = 0 for j = 1, 2, ..., K. For each j, one obtains

\sum_{s}\sum_{m}\sum_{t}\gamma_{m}^{(s)}(t)\left(e_{m}^{(s)}(j)\right)^{T}\left(C_{m}^{(s)}\right)^{-1}x_{t} = \sum_{s}\sum_{m}\sum_{t}\gamma_{m}^{(s)}(t)\left\{\sum_{k=1}^{K}\hat{w}(k)\left(e_{m}^{(s)}(k)\right)^{T}\left(C_{m}^{(s)}\right)^{-1}e_{m}^{(s)}(j)\right\}.    (6)

There are K such equations to solve for the K unknown weights \hat{w}(1), \hat{w}(2), ..., \hat{w}(K).
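To make the M-step concrete, the sketch below (an illustrative Python example under assumed data structures, not the paper's code) accumulates the two sides of Equation (6) into a K-dimensional linear system and solves it for the MLED weights:

    import numpy as np

    def mled_weights(stats, eigenvoice_means, inv_covs):
        """Solve the K normal equations of Equation (6) for the MLED weights.

        stats: iterable of (gamma, x, key) triples, one per frame and Gaussian,
               where gamma is the occupation probability gamma_m^(s)(t), x is the
               observation vector x_t, and key identifies the pair (s, m).
        eigenvoice_means[key]: array of shape (K, n), the sub-vectors e_m^(s)(k).
        inv_covs[key]: array of shape (n, n), the inverse covariance of (s, m).
        """
        K = next(iter(eigenvoice_means.values())).shape[0]
        A = np.zeros((K, K))   # coefficients of the unknown weights w_hat(k)
        b = np.zeros(K)        # left-hand side of Equation (6)
        for gamma, x, key in stats:
            E = eigenvoice_means[key]      # (K, n)
            P = E @ inv_covs[key]          # rows: e(j)^T C^{-1}
            b += gamma * (P @ x)           # gamma * e(j)^T C^{-1} x_t
            A += gamma * (P @ E.T)         # gamma * e(j)^T C^{-1} e(k)
        return np.linalg.solve(A, b)       # w_hat(1), ..., w_hat(K)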

Given minimal adaptation data, the combination coefficients estimated by MLED will be inaccurate. Huang et al. presented the MAPED technique to make the estimate of the combination coefficients robust against insufficient adaptation data [23]. MAPED takes a prior density into account in the estimation of the combination coefficients using a MAP criterion:

\hat{w}_{MAP} = \arg\max_{\hat{w}} R(\hat{w} \mid w).    (7)

The auxiliary function R(\hat{w} \mid w) in Equation (7) is defined as

R(\hat{w} \mid w) \propto -\frac{1}{2}\left\{\sum_{s=1}^{S}\sum_{m=1}^{M_{s}}\sum_{t=1}^{T}\gamma_{m}^{(s)}(t)\left[n\log(2\pi) + \log\left|C_{m}^{(s)}\right| + h(x_{t}, s, m)\right] + \sum_{j=1}^{K}\left[\log(2\pi) + 2\log\sigma_{w(j)} + \frac{(\hat{w}(j) - \mu_{w(j)})^{2}}{\sigma_{w(j)}^{2}}\right]\right\},    (8)

where the coefficient w(j) is modeled by a Gaussian distribution with mean \mu_{w(j)} and variance \sigma_{w(j)}^{2}. To maximize R(\hat{w} \mid w), set \partial R(\hat{w} \mid w)/\partial\hat{w}(j) = 0 for j = 1, 2, ..., K. The combination coefficients \hat{w}(1), \hat{w}(2), ..., \hat{w}(K) are then derived from the following K equations:

\frac{\mu_{w(j)}}{\sigma_{w(j)}^{2}} + \sum_{s=1}^{S}\sum_{m=1}^{M_{s}}\sum_{t=1}^{T}\gamma_{m}^{(s)}(t)\left(e_{m}^{(s)}(j)\right)^{T}\left(C_{m}^{(s)}\right)^{-1}x_{t} = \sum_{k=1}^{K}\hat{w}(k)\left\{\sum_{s=1}^{S}\sum_{m=1}^{M_{s}}\sum_{t=1}^{T}\gamma_{m}^{(s)}(t)\left(e_{m}^{(s)}(k)\right)^{T}\left(C_{m}^{(s)}\right)^{-1}e_{m}^{(s)}(j) + \frac{\delta(k-j)}{\sigma_{w(j)}^{2}}\right\}.    (9)

Solving Equation (9) for the combination coefficients is obviously more time consuming than using MLED because of the additional parameters {\mu_{w(j)}, \sigma_{w(j)}^{2}} of the prior distribution [23]. Though it enhances the robustness of MLED, the MAPED scheme is more complicated and requires much more computation than the MLED scheme.

2.2. Incremental MLED eigenvoice adaptation. The coefficient estimation phase performs eigenvoice speaker adaptation using an eigen-decomposition algorithm such as MLED or MAPED to estimate a set of weights, i.e., a weighted combination of eigenvoices, for the new speaker. Given sufficient adaptation data, the eigenvoice adaptation method is effective. Given insufficient adaptation data, however, the accuracy of the estimated combination coefficients, especially those derived by the MLED approach, is dubious. Poor estimation of the combination coefficients in turn leads to incorrect positioning in the speaker space. The problem of scarce adaptation data can be alleviated by the MAPED scheme if heavy computation is permissible.

Given insufficient training data, it is necessary to be more conservative in using the combination coefficients thus derived. In other words, the effect of the adaptation should be restricted so that the adapted mean vector does not rely too heavily on combination coefficients derived from insufficient training data. Therefore, this study proposes the following incremental MLED eigenvoice adaptation approach [7,8]:

\hat{\mu}_{m}^{(s)} = \sum_{k=1}^{K}\left[\lambda\,w(k) + (1-\lambda)\,\mu_{w(k)}\right]e_{m}^{(s)}(k), \qquad 0 \le \lambda \le 1,    (10)

where w(k) is the combination coefficient calculated by MLED and \mu_{w(k)} is the prior mean of the combination coefficient. The linear combination coefficients for eigenvector decomposition are thus not taken directly from the maximum likelihood criterion; instead, this approach uses a weighted sum of the maximum likelihood estimate and the prior mean of the combination coefficient. The form of incremental MLED eigenvoice adaptation in Equation (10) is very similar to MAP adaptation [7,8], and is essentially a MAP-like adaptation. The weight parameter λ governs the balance between w(k) and \mu_{w(k)}, mimicking the role of the adaptation speed parameter in MAP adaptation [7,8]. With an appropriate weighting λ, satisfactory adaptation performance can be achieved even when only a small amount of training data is available for eigen-decomposition.
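As a small illustration of Equation (10), the following Python sketch (a hypothetical helper, not taken from the paper) blends the MLED estimate with the prior mean of each coefficient before rebuilding the adapted mean vector of a Gaussian component:

    import numpy as np

    def incremental_mled_mean(w_mled, w_prior_mean, eigenvoice_means, lam):
        """Adapted mean vector of Equation (10) for one Gaussian (s, m).

        w_mled:           MLED estimates w(k), shape (K,)
        w_prior_mean:     prior means mu_w(k), shape (K,)
        eigenvoice_means: e_m^(s)(k) stacked row-wise, shape (K, n)
        lam:              weight lambda in [0, 1]; a small lambda trusts the prior,
                          a large lambda trusts the MLED estimate
        """
        blended = lam * np.asarray(w_mled) + (1.0 - lam) * np.asarray(w_prior_mean)
        return blended @ eigenvoice_means   # sum_k [lam*w(k) + (1-lam)*mu_w(k)] e_m^(s)(k)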

Note that the weight λ varies depending on how much confidence one has in the combination coefficient derived from MLED. An estimate of the combination coefficient that is likely to be poor because of insufficient adaptation data should preferably go with λ approaching 0, so that the biased estimate of w(k) is restricted. Conversely, a λ approaching 1 can take full advantage of fast adaptation from sufficient adaptation data.

2.3. Fuzzy model and eigenvoice speaker adaptation. This section presents the FLC-MLED approach, which performs the incremental MLED estimate of the combination coefficient using fuzzy logic control. Depending on the adaptation data size, the FLC-MLED method adjusts the weight parameter λ, moving the coefficient for eigen-decomposition closer to the side of w(k) or to the side of \mu_{w(k)} when estimating the adapted mean vector of a new speaker in the speaker space. When the combination coefficient w(k) is reliable as a result of abundant adaptation samples, λ should be large. Conversely, λ should be smaller when the quality of w(k) is in doubt as a result of few adaptation samples. To fulfill these requirements, a rule base with three implications governs the regulation of λ given N training samples (in terms of acoustic frames) observed for all Gaussian mixture components:

Rule 1: If N is small, then λ is set to small,
Rule 2: If N is medium, then λ is set to medium,
Rule 3: If N is large, then λ is set to large.

Fuzzy techniques are naturally suitable for translating such linguistic statements into quantitative expressions for computation. This study employs the specific type of fuzzy logic control mechanism proposed by Takagi and Sugeno (T-S hereafter) [25].

2.3.1. T-S fuzzy control mechanism. The T-S fuzzy design procedure presents a systematic framework of fuzzy modeling for a complex system [25]. The system comprises a set of subsystems whose local behaviors are identified by expressing the input-output mapping in terms of a fuzzy implication (or rule), where the inputs are specified in the antecedent part and the output is a linear combination of the associated inputs. The overall system output is then a function of the subsystem outputs, which could be as simple as a linear combination, where the coefficient handling takes care of the fuzziness of the system behaviors, or of a more elaborate form. In the T-S fuzzy model, a generic system is formulated as a set of fuzzy implications together with a system output determined by the consequences in the set of implications. For a system of n inputs and l implications, the representation adopts the form

Rule 1: IF x(1) is A_1^1 and ... and x(n) is A_n^1, THEN y^1 = a_0^1 + a_1^1 x(1) + ... + a_n^1 x(n),
...
Rule i: IF x(1) is A_1^i and ... and x(n) is A_n^i, THEN y^i = a_0^i + a_1^i x(1) + ... + a_n^i x(n),    (11)
...
Rule l: IF x(1) is A_1^l and ... and x(n) is A_n^l, THEN y^l = a_0^l + a_1^l x(1) + ... + a_n^l x(n),

with the system output

y = \frac{\sum_{i=1}^{l} w^{i} y^{i}}{\sum_{i=1}^{l} w^{i}}, \qquad \text{where} \quad w^{i} = \prod_{p=1}^{n} A_{p}^{i}(x(p)).    (12)

Note that A_p^i, p = 1, ..., n, are fuzzy sets and A_p^i(x(p)) denotes the fuzzy value of the membership function associated with A_p^i for the input x(p); a_p^i, p = 0, 1, ..., n, are consequent parameters through which the i-th consequence y^i is expressed as a linear combination of the n inputs.
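The following Python sketch (an illustrative toy implementation of Equations (11) and (12); the product form of the rule firing strength follows Equation (12)) shows the generic T-S computation: each rule fires with the product of its membership values, and the output is the firing-strength-weighted average of the linear consequents:

    import numpy as np

    def ts_output(x, memberships, consequents):
        """Generic Takagi-Sugeno system output, Equations (11) and (12).

        x:            input vector of length n
        memberships:  list over rules; memberships[i][p] is the membership
                      function A^i_p, a callable returning a value in [0, 1]
        consequents:  array of shape (l, n + 1); row i holds (a^i_0, a^i_1, ..., a^i_n)
        """
        x = np.asarray(x, dtype=float)
        firing, outputs = [], []
        for A_i, a_i in zip(memberships, np.asarray(consequents, dtype=float)):
            w_i = np.prod([A_ip(xp) for A_ip, xp in zip(A_i, x)])  # rule firing strength
            y_i = a_i[0] + a_i[1:] @ x                             # linear consequent
            firing.append(w_i)
            outputs.append(y_i)
        firing = np.asarray(firing)
        return float(firing @ np.asarray(outputs) / firing.sum())  # weighted average of rule outputs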

2.3.2. FLC-MLED formulation. For the specific problem in this study, the simple rules governing the regulation of λ, given N adaptation samples observed for all Gaussian mixture components, can be formulated as follows:

Rule 1: If N is small, then λ is small,
Rule 2: If N is medium, then λ is medium,
Rule 3: If N is large, then λ is large.

Let M_1(N), M_2(N) and M_3(N) be membership functions associated respectively with small, medium and large amounts of training data available for adaptation, as Figure 2 shows, and let λ_S, λ_M and λ_L be the small, medium and large values of λ determined respectively by the functions f_1(N), f_2(N) and f_3(N) in each of the three cases. The previous set of rules can then be further clarified as:

Rule 1: If N is M_1(N), then λ_S = f_1(N),
Rule 2: If N is M_2(N), then λ_M = f_2(N),
Rule 3: If N is M_3(N), then λ_L = f_3(N),

where

M_1(N) = \begin{cases} 1, & N \le N_1, \\ \frac{N_2 - N}{N_2 - N_1}, & N_1 \le N \le N_2, \\ 0, & N \ge N_2, \end{cases} \qquad
M_2(N) = \begin{cases} 0, & N \le N_1 \ \text{or} \ N \ge N_3, \\ \frac{N - N_1}{N_2 - N_1}, & N_1 < N \le N_2, \\ \frac{N_3 - N}{N_3 - N_2}, & N_2 \le N < N_3, \end{cases} \qquad
M_3(N) = \begin{cases} 0, & N \le N_2, \\ \frac{N - N_2}{N_3 - N_2}, & N_2 < N < N_3, \\ 1, & N \ge N_3, \end{cases}

together with the implication functions

f_1(N) = a_1 N + b_1, \qquad f_2(N) = a_2 N + b_2, \qquad f_3(N) = a_3 N + b_3,

and the final system output [25]

\lambda = \frac{\sum_{i=1}^{3} M_i(N)\,f_i(N)}{\sum_{i=1}^{3} M_i(N)}.    (13)

Equation (13) shows that for N < N_1, λ is solely determined by f_1(N), i.e., λ = λ_S, whereas for N > N_3, λ is determined by f_3(N) alone. When N is approximately N_2, λ is essentially determined by f_2(N), since M_2(N) is then much greater than M_1(N) and M_3(N). The system has nine hyperparameters (a_1, a_2, a_3, b_1, b_2, b_3, N_1, N_2 and N_3) to be fixed, for which the following iterative process is developed:

Figure 2. Membership functions of the FLC for FLC-MLED

Step 1: Let N_1 : N_2 : N_3 = 1 : 2 : 3 and initialize N_1.

Step 2: Estimate the parameters a_1 and b_1 under the condition N < N_1, wherein M_1(N) = 1, M_2(N) = M_3(N) = 0 and

\lambda = \frac{M_1(N) f_1(N)}{M_1(N)} = f_1(N) = a_1 N + b_1.

The procedure for fixing a_1 and b_1 is shown in Figure 3 as a pseudo-code sequence.

Step 3: Estimate the parameters a_3 and b_3 under the condition N > N_3, wherein M_1(N) = M_2(N) = 0, M_3(N) = 1 and

\lambda = \frac{M_3(N) f_3(N)}{M_3(N)} = f_3(N) = a_3 N + b_3.

The determination of a_3 and b_3 follows the same process as for a_1 and b_1, with the initial condition R_0 = R_q taken from Step 2.

Step 4: Estimate the parameters a_2 and b_2 under the condition N_1 ≤ N ≤ N_2, wherein M_1(N) = (N_2 - N)/(N_2 - N_1), M_2(N) = (N - N_1)/(N_2 - N_1), M_3(N) = 0 and

\lambda = \frac{M_1(N) f_1(N) + M_2(N) f_2(N)}{M_1(N) + M_2(N)} = \frac{(N_2 - N)(a_1 N + b_1) + (N - N_1)(a_2 N + b_2)}{N_2 - N_1}.

With a_1 and b_1 already obtained in Step 2, the parameters a_2 and b_2 are determined through the same tuning process as in Step 2, with the initial condition R_0 = R_q taken from Step 3, again tuning for the best recognition rate R_q.

Figure 3. A procedure to fix the FLC hyperparameters a_1 and b_1

Step 5: Re-estimate the parameter N_3 under the condition N_2 ≤ N ≤ N_3, wherein M_1(N) = 0, M_2(N) = (N_3 - N)/(N_3 - N_2), M_3(N) = (N - N_2)/(N_3 - N_2) and

\lambda = \frac{M_2(N) f_2(N) + M_3(N) f_3(N)}{M_2(N) + M_3(N)} = \frac{(N_3 - N)(a_2 N + b_2) + (N - N_2)(a_3 N + b_3)}{N_3 - N_2}.

Since a_2 and b_2, together with a_3 and b_3, have already been determined in Steps 4 and 3 respectively, a new value of N_3 can now be obtained by tuning for a higher R_q value than in Step 4.

Step 6: Given the new estimate of N_3 from Step 5, update N_1 and N_2 such that N_1 : N_2 : N_3 = 1 : 2 : 3, compute

\delta = \frac{|R_q - R|}{R},

where R is the desired recognition rate, set R_0 = R_q, and repeat from Step 2 until δ is less than a predefined threshold.
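Putting Equation (13) and the nine tuned hyperparameters together, the sketch below (illustrative only; the function name and the final clamping of λ to [0, 1] are assumptions, with the triangular membership shapes following Figure 2 as described above) computes λ from the observed frame count N:

    def flc_lambda(N, a, b, N1, N2, N3):
        """Takagi-Sugeno output of Equation (13) for the FLC-MLED weight lambda.

        a, b:          consequent slopes (a1, a2, a3) and offsets (b1, b2, b3)
        N1 < N2 < N3:  breakpoints of the triangular membership functions of Figure 2
        """
        a1, a2, a3 = a
        b1, b2, b3 = b
        # Triangular membership functions M1, M2, M3
        M1 = 1.0 if N <= N1 else (N2 - N) / (N2 - N1) if N <= N2 else 0.0
        M3 = 0.0 if N <= N2 else (N - N2) / (N3 - N2) if N < N3 else 1.0
        if N <= N1 or N >= N3:
            M2 = 0.0
        elif N <= N2:
            M2 = (N - N1) / (N2 - N1)
        else:
            M2 = (N3 - N) / (N3 - N2)
        # Linear rule consequents f_i(N) and the weighted average of Equation (13)
        f = (a1 * N + b1, a2 * N + b2, a3 * N + b3)
        M = (M1, M2, M3)
        lam = sum(Mi * fi for Mi, fi in zip(M, f)) / sum(M)
        return min(max(lam, 0.0), 1.0)   # keep lambda within [0, 1], as required by Equation (10)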

2.4. Time complexity analysis of FLC-MLED. Compared with conventional MLED, the computation overhead of FLC-MLED adaptation for calculating λ is relatively minor, since at most 4 extra multiplications are required. The analysis is straightforward. For N < N_1, λ = a_1 N + b_1, which takes only 1 multiplication, as does the case N > N_3, where λ = a_3 N + b_3. For the case N_1 ≤ N ≤ N_2,

\lambda = \frac{M_1(N) f_1(N) + M_2(N) f_2(N)}{M_1(N) + M_2(N)} = \frac{N^2 (a_2 - a_1) + N (a_1 N_2 - a_2 N_1 + b_2 - b_1) + b_1 N_2 - b_2 N_1}{N_2 - N_1} = p\,(c_1 N^2 + c_2 N + c_3),

the computation of which involves 4 multiplications, as does the case N_2 ≤ N ≤ N_3, where

\lambda = \frac{M_2(N) f_2(N) + M_3(N) f_3(N)}{M_2(N) + M_3(N)} = \frac{N^2 (a_3 - a_2) + N (a_2 N_3 - a_3 N_2 + b_3 - b_2) + b_2 N_3 - b_3 N_2}{N_3 - N_2} = q\,(d_1 N^2 + d_2 N + d_3).

Thus, calculating λ does not increase the time complexity, and the computation of Equation (10) is of the same order as that of Equation (5). Therefore, computing FLC-MLED is much less expensive than computing MAPED.

2.5. Improvements and future directions for FLC-MLED. This study develops a complete concept of a fuzzy logic control mechanism for eigenvoice speaker adaptation applications. Nevertheless, several improvements to the proposed FLC-MLED approach are possible before it covers all needs, and there are several ways in which the FLC mechanism could be extended in future work. Eigenvoice speaker adaptation based on neural networks (NN), support vector machines (SVM) and genetic algorithms (GA) could incorporate FLC mechanisms wherever plausible. Given insufficient training data from a new speaker, the proposed FLC-MLED regulates the influence of maximum likelihood eigen-decomposition by considering the amount of adaptation data when estimating the linear combination coefficients for eigenvector decomposition, which keeps speaker adaptation robust against data scarcity. However, the effectiveness and performance of speaker adaptation also depend strongly on the quality, not just the quantity, of the adaptation data. A previous study [31] presents a hybrid scheme of a support vector machine and fuzzy logic control incorporated into MAP speaker adaptation to address this point in speaker adaptation design. How to incorporate this hybrid SVM and FLC mechanism into the eigenvoice process is a challenging issue, and an SVM-FLC-MLED would seem to be a promising subject for future research. The adaptation capability of eigenvoice speaker adaptation under such a hybrid SVM-FLC framework would be further strengthened, especially in its robustness against scarce or improper adaptation data. Another key issue for future research is to enhance the FLC design of FLC-MLED. The FLC design must account for potential variations in the process itself, making the use of time-variant parameters in the FLC design unavoidable. In other words, the FLC of FLC-MLED should adapt to the time-varying process. Such adaptation can be performed by modifying the rule sets or the fuzzy sets, resulting in two classes of FLCs: the self-organizing and the self-tuning FLC [32].

As a final remark, the T-S FLC mechanism is only one choice among many fuzzy formulations for control by computation. For example, the Mamdani (linguistic) fuzzy model [33] is an alternative that could be used in place of the T-S FLC in the proposed FLC-MLED.

3. Experiments. This study presents experiments with FLC-MLED adaptation to compare its recognition performance with MLED and MAPED adaptation under different amounts of adaptation data. The following subsections present the experimental settings and results of the proposed FLC-MLED adaptation algorithm.

3.1. Databases and experimental design. Experiments on the recognition of 30 world-famous city names in Mandarin were run in three parts: (1) establishing the initial SI models and the eigenspace, (2) the training phase for fixing the FLC hyperparameters, and (3) the recognition phase for evaluating the performance of the FLC-based tuning of the weight λ (FLC-MLED). An 8 kHz sampling rate was used for speech signal acquisition. The analysis frames were 30 ms wide with a 15 ms overlap. A 24-dimensional feature vector was extracted for each frame, made up of a 12-dimensional mel-cepstral vector and a 12-dimensional delta-mel-cepstral vector.

The MAT400 sub-database DB3 [34] was used to train the initial SI models as a set of HMM parameters. This study adopted Initial/Final HMMs. A syllable in Mandarin comprises an initial part and a final part. The modeling of Mandarin syllables assumes that the initial part is right-dependent on the beginning phone of the following final part, while the final part is context independent [35]. A Mandarin utterance consists of one to several syllables. The HMM of a syllable comprises an HMM with 3 states for the initial part and an HMM with 6 states for the final part; the HMM of an utterance consists of the HMMs of all its constituent syllables. Each state has 4 Gaussian mixture components. An SD model was generated for each training speaker in the database by adjusting the SI model, and the resulting SD models were then used to build the eigenspace bases.

The training phase collected training data for tuning the hyperparameters of the FLC from 15 speakers. Each of the 15 speakers uttered 10 city names (picked from the 30 cities) to generate the adaptation data, and then uttered 60 names (two utterances for each city) to generate FLC parameter tuning data (used in the follow-up observations); all utterances were recorded with an ordinary microphone. The training phase experiment procedure is described in the pseudo-code sequence below.

    R_0 = baseline recognition rate;
    t = 0;
    Repeat {
        t++;
        R_t^2  = 2_utterances_training(eigenvectors, hyperparameters);
        R_t^4  = 4_utterances_training(eigenvectors, hyperparameters);
        R_t^6  = 6_utterances_training(eigenvectors, hyperparameters);
        R_t^8  = 8_utterances_training(eigenvectors, hyperparameters);
        R_t^10 = 10_utterances_training(eigenvectors, hyperparameters);
        R_t = (sum over i = 1..5 of R_t^{2i}) / 5;
        delta_R_t = |R_t - R_{t-1}|;
    } until delta_R_t < threshold;

Here, 2i_utterances_training(·), i = 1, 2, 3, 4, 5, is the procedure that uses 2i adaptation utterances from the 15 speakers to fix the 9 FLC hyperparameters defined in Section 2.3.2, and thus returns a better-than-baseline overall recognition rate R_t^{2i} for the 15 training speakers, as explained in the code-like sequence below.

    2i_utterances_training(eigenvectors, hyperparameters)   // i = 1, 2, 3, 4, 5
    {
        k = 0;
        R_{2i}^0 = baseline recognition rate;
        Repeat {
            k++;
            R_{(2i)1}^k  = speaker_training(eigenvectors, test_data_1,  hyperparameters, 2i_utterances_1);
            ...
            R_{(2i)j}^k  = speaker_training(eigenvectors, test_data_j,  hyperparameters, 2i_utterances_j);
            ...
            R_{(2i)15}^k = speaker_training(eigenvectors, test_data_15, hyperparameters, 2i_utterances_15);
            R_{2i}^k = (sum over j = 1..15 of R_{(2i)j}^k) / 15;
            delta_R_{2i}^k = |R_{2i}^k - R_{2i}^{k-1}|;
        } until delta_R_{2i}^k < threshold_1;
        return R_{2i}^k;
    }

where 2i_utterances_j and test_data_j denote, respectively, the adaptation utterances (2, 4, 6, 8 or 10 of them) and the 60 test utterances from the j-th speaker, 1 ≤ j ≤ 15, used for tuning the 9 hyperparameters of the proposed FLC mechanism. speaker_training(·) is the procedure that incrementally performs adaptation by appropriate settings of the T-S FLC hyperparameters, as already described in Section 2.3.2, such that the adaptation does not jeopardize the recognition rate given 2i utterances.

    speaker_training(eigenvectors, test_data_j, hyperparameters, 2i_utterances_j)   // j = 1, 2, ..., 15
    {
        Derive w(k) from 2i_utterances_j;
            // w(k), k = 1, 2, ..., K, denoting the combination coefficients from MLED
        R_{(2i)j} = Iterative_process(w(k), eigenvectors, test_data_j, hyperparameters);
            // the iterative process of Section 2.3.2, maximizing the recognition rate R_{(2i)j}
        return R_{(2i)j};
    }

As a result, a set of FLC hyperparameters {a_1, a_2, a_3, b_1, b_2, b_3, N_1, N_2, N_3} was determined.

The recognition phase involved a new group of 15 speakers. Each speaker was asked to generate 10 and 60 utterances for adaptation and recognition, respectively. The weight λ was calculated using the hyperparameters acquired in the training stage. For the recognition experiment with FLC-MLED adaptation, five adapted models were constructed using 2, 4, 6, 8 and 10 adaptation utterances from each of the 15 speakers, and the λ for each of the 5 adaptations was calculated by Equation (13) with N_utterances = 2, 4, 6, 8 or 10 and the FLC hyperparameters already determined in the training phase.
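To show how the pieces of the recognition-phase setup fit together, a hypothetical driver in Python (reusing the illustrative helpers sketched earlier; names such as mled_weights, flc_lambda and incremental_mled_mean are assumptions, not the paper's code) might look like:

    def adapt_new_speaker(adapt_stats, n_frames, eigenvoices_per_gaussian,
                          inv_covs, prior_w_mean, flc_params):
        """Illustrative FLC-MLED adaptation for one new speaker.

        adapt_stats: sufficient statistics collected from the adaptation utterances
        n_frames:    total number of adaptation frames N observed for all mixtures
        flc_params:  ((a1, a2, a3), (b1, b2, b3), N1, N2, N3) from the training phase
        """
        a, b, N1, N2, N3 = flc_params
        w_mled = mled_weights(adapt_stats, eigenvoices_per_gaussian, inv_covs)  # Equation (6)
        lam = flc_lambda(n_frames, a, b, N1, N2, N3)                            # Equation (13)
        adapted_means = {
            key: incremental_mled_mean(w_mled, prior_w_mean, E, lam)            # Equation (10)
            for key, E in eigenvoices_per_gaussian.items()
        }
        return adapted_means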

For performance comparison, 5 MLED-adapted and 5 MAPED-adapted models, using 2, 4, 6, 8 and 10 adaptation utterances respectively, were also constructed. Then, 60 utterances from each of the 15 speakers were fed into the five adapted models to evaluate their recognition rates.

Figure 4. The curve of the training values of λ

3.2. Experiment results. The training phase produced some interesting experimental results and observations. The weight λ increased as the number of adaptation utterances increased. As Figure 4 shows, λ rose noticeably when the number of utterances increased from 2 to 6, and then ascended gradually and more or less stabilized as the number of utterances increased further. The λ curve exhibited the same tendency as that intended by the precursory fuzzy rule base design (Section 2.3.2).

This study also used various numbers of adaptation utterances to compare the recognition performance of the proposed FLC-MLED utilizing a T-S FLC, the MLED without reference to any prior knowledge, and the MAPED. As Figure 5 shows, the recognition rate improved as the number of adaptation utterances increased for all three adaptations. With limited adaptation utterances, the performance of the MLED and MAPED methods fell below the baseline recognition rate, which indicates the potential inaccuracy or unreliability of MLED- and MAPED-adapted models built from insufficient adaptation data. The performance of the FLC-MLED method remained above the baseline even when only 2 utterances were available for adaptation. In all testing cases, the proposed FLC-MLED adaptation achieved the best recognition, followed by MAPED adaptation and then MLED adaptation. FLC-MLED performed better than MLED and MAPED especially when the training data was quite limited; note that MAPED tends to catch up with FLC-MLED as the amount of training data increases.

Finally, Figures 6 and 7 show the effects of varying λ on the recognition performance of MLED under two extreme cases of training data availability. Figure 6 shows that when the training data is scarce, e.g., 2 utterances, the performance falls below the baseline once λ exceeds 0.3, because the model adaptation is then largely determined by the combination coefficients w(k) derived from MLED, which are very likely poorly estimated. With a small value of λ, the influence of w(k) on the adaptation is reduced and the recognition rate stays above the baseline. Conversely, given sufficient training data, 10 utterances for instance, full advantage of adaptation by w(k) should be exploited by using a large λ value for good performance, as Figure 7 shows.

Figure 5. The performance curves of FLC-MLED, conventional MAPED and conventional MLED in the recognition testing experiments

Figure 6. Number of adaptation utterances = 2 (MLED testing experiments)

Figure 7. Number of adaptation utterances = 10 (MLED testing experiments)

4. Conclusions. This paper presents an FLC-MLED scheme with a weight control parameter λ determined by a fuzzy logic controller. The fuzzy mechanism regulates λ according to the amount of adaptation data. The proposed FLC-MLED enhances the eigen-decomposition of eigenvoice speaker adaptation and accurately identifies the HMM acoustic parameters of a new speaker. Experimental results show that FLC-MLED outperforms MLED and even MAPED in recognition performance, regardless of the amount of adaptation data available.

The behavior of λ with respect to the variation in available adaptation data follows the requirements of the FLC design. FLC-MLED is an adaptive learning method that is more robust against data insufficiency than conventional MLED and incurs a much lower computation cost than MAPED.

Acknowledgment. This research is partially supported by the National Science Council (NSC) in Taiwan under grant NSC E. The author also gratefully acknowledges the helpful comments and suggestions of the reviewers, which have improved the presentation.

REFERENCES

[1] B. H. Juang and L. R. Rabiner, Automatic speech recognition - A brief history of the technology development, Encyclopedia of Language and Linguistics, 2nd Edition, Elsevier.
[2] M. Nakayama and S. Ishimitsu, Speech support system using body-conducted speech recognition for disorders, International Journal of Innovative Computing, Information and Control, vol.5, no.11(B).
[3] X. Wang, J. Lin, Y. Sun, H. Gan and L. Yao, Applying feature extraction of speech recognition on VoIP auditing, International Journal of Innovative Computing, Information and Control, vol.5, no.7.
[4] T. Guan and Q. Gong, A study on the effects of spectral information encoding in Mandarin speech recognition in white noise, ICIC Express Letters, vol.3, no.3(A).
[5] L. R. Rabiner, The power of speech, Science, vol.301.
[6] R. P. Lippmann, Speech recognition by machines and humans, Speech Communication, vol.22, pp.1-15.
[7] C. H. Lee, C. H. Lin and B. H. Juang, A study on speaker adaptation of the parameters of continuous density hidden Markov models, IEEE Trans. on Acoustics, Speech and Signal Processing, vol.39.
[8] J. L. Gauvain and C. H. Lee, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Trans. on Speech and Audio Processing, vol.2, no.2.
[9] C. J. Leggetter and P. C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Computer Speech and Language, vol.9.
[10] C. Chesta, O. Siohan and C. H. Lee, Maximum a posteriori linear regression for hidden Markov model adaptation, Proc. of the European Conference on Speech Communication and Technology (EUROSPEECH).
[11] W. Chou, Maximum a posteriori linear regression with elliptically symmetric matrix priors, Proc. of the European Conference on Speech Communication and Technology (EUROSPEECH), pp.1-4.
[12] R. Kuhn, J.-C. Junqua, P. Nguyen and N. Niedzielski, Rapid speaker adaptation in eigenvoice space, IEEE Trans. on Speech and Audio Processing, vol.8, no.6.
[13] K. T. Chen, W. W. Liau, H. M. Wang and L. S. Lee, Fast speaker adaptation using eigenspace-based maximum likelihood linear regression, Proc. of the International Conference on Spoken Language Processing.
[14] K. T. Chen and H. M. Wang, Eigenspace-based maximum a posteriori linear regression for rapid speaker adaptation, Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] B. Mak, S. Ho and J. T. Kwok, Speedup of kernel eigenvoice speaker adaptation by embedded kernel PCA, Proc. of the International Conference on Spoken Language Processing.
[16] B. Mak and R. Hsiao, Improving eigenspace-based MLLR adaptation by kernel PCA, Proc. of the International Conference on Spoken Language Processing, pp.13-16.
[17] R. Hsiao and B. Mak, Kernel eigenspace-based MLLR adaptation using multiple regression classes, Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing.
[18] B. Mak, J. T. Kwok and S. Ho, Kernel eigenvoice speaker adaptation, IEEE Trans. on Speech and Audio Processing, vol.13, no.5.
[19] B. Zhou and J. Hansen, Rapid discriminative acoustic model based on eigenspace mapping for fast speaker adaptation, IEEE Trans. on Speech and Audio Processing, vol.13, no.4, 2005.
[20] B. Mak and S. Ho, Various reference speakers determination methods for embedded kernel eigenvoice speaker adaptation, Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] B. Mak, R. Hsiao, S. Ho and J. T. Kwok, Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting, IEEE Trans. on Audio, Speech, and Language Processing, vol.14, no.4.
[22] B. Mak and R. Hsiao, Kernel eigenspace-based MLLR adaptation, IEEE Trans. on Audio, Speech, and Language Processing, vol.15, no.3.
[23] C.-H. Huang, J.-T. Chien and H.-M. Wang, A new eigenvoice approach to speaker adaptation, Proc. of the IEEE International Symposium on Chinese Spoken Language Processing.
[24] R. Yager and D. Filev, Essentials of Fuzzy Modeling and Control, Wiley, New York.
[25] T. Takagi and M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. on Systems, Man, and Cybernetics, vol.15.
[26] C. Li, J. Yi and D. Zhao, Design of interval type-2 fuzzy logic system using sampled data and prior knowledge, ICIC Express Letters, vol.3, no.3(B).
[27] J. Yen, R. Langari and L. A. Zadeh, Industrial Applications of Fuzzy Logic and Intelligent Systems, IEEE Press, New York.
[28] S. Kermiche, M. L. Saidi, H. A. Abbassi and H. Ghodbane, Takagi-Sugeno based controller for mobile robot navigation, Journal of Applied Science, vol.6, no.8.
[29] M. C. M. Teixeira, G. S. Deaecto, R. Gaino, E. Assunção, A. A. Carvalho and U. C. Farias, Design of a fuzzy Takagi-Sugeno controller to vary the joint knee angle of paraplegic patients, Proc. of the International Conference on Neural Information Processing.
[30] A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, vol.39, pp.1-38.
[31] I.-J. Ding, MAP speaker adaptation by hybrid SVM-FLC for speech recognition, ICIC Express Letters, vol.5, no.2.
[32] H.-J. Zimmermann, Fuzzy Set Theory and Its Applications, 3rd Edition, Kluwer Academic.
[33] E. H. Mamdani, Application of fuzzy logic to approximate reasoning using linguistic systems, Fuzzy Sets and Systems, vol.26.
[34] H. C. Wang, MAT - A project to collect Mandarin speech data through telephone networks in Taiwan, Comput. Linguist. Chinese Lang. Process., vol.2, pp.73-89.
[35] C. H. Lin, C. H. Wu, P. Y. Ting and H. M. Wang, Frameworks for recognition of Mandarin syllables with tones using sub-syllabic units, Speech Communication, vol.18, no.2, 1996.


More information

ABSTRACT INTRODUCTION

ABSTRACT INTRODUCTION ABSTRACT Presented in this paper is an approach to fault diagnosis based on a unifying review of linear Gaussian models. The unifying review draws together different algorithms such as PCA, factor analysis,

More information

A FUZZY TIME SERIES-MARKOV CHAIN MODEL WITH AN APPLICATION TO FORECAST THE EXCHANGE RATE BETWEEN THE TAIWAN AND US DOLLAR.

A FUZZY TIME SERIES-MARKOV CHAIN MODEL WITH AN APPLICATION TO FORECAST THE EXCHANGE RATE BETWEEN THE TAIWAN AND US DOLLAR. International Journal of Innovative Computing, Information and Control ICIC International c 2012 ISSN 1349-4198 Volume 8, Number 7(B), July 2012 pp. 4931 4942 A FUZZY TIME SERIES-MARKOV CHAIN MODEL WITH

More information

Forward algorithm vs. particle filtering

Forward algorithm vs. particle filtering Particle Filtering ØSometimes X is too big to use exact inference X may be too big to even store B(X) E.g. X is continuous X 2 may be too big to do updates ØSolution: approximate inference Track samples

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Gaussian Mixture Model Uncertainty Learning (GMMUL) Version 1.0 User Guide

Gaussian Mixture Model Uncertainty Learning (GMMUL) Version 1.0 User Guide Gaussian Mixture Model Uncertainty Learning (GMMUL) Version 1. User Guide Alexey Ozerov 1, Mathieu Lagrange and Emmanuel Vincent 1 1 INRIA, Centre de Rennes - Bretagne Atlantique Campus de Beaulieu, 3

More information

Session Variability Compensation in Automatic Speaker Recognition

Session Variability Compensation in Automatic Speaker Recognition Session Variability Compensation in Automatic Speaker Recognition Javier González Domínguez VII Jornadas MAVIR Universidad Autónoma de Madrid November 2012 Outline 1. The Inter-session Variability Problem

More information

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Speaker Representation and Verification Part II. by Vasileios Vasilakakis Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation

More information

Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data

Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data Distribution A: Public Release Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data Bengt J. Borgström Elliot Singer Douglas Reynolds and Omid Sadjadi 2

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

Estimating Gaussian Mixture Densities with EM A Tutorial

Estimating Gaussian Mixture Densities with EM A Tutorial Estimating Gaussian Mixture Densities with EM A Tutorial Carlo Tomasi Due University Expectation Maximization (EM) [4, 3, 6] is a numerical algorithm for the maximization of functions of several variables

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning

Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning Zaihu Pang 1, Xihong Wu 1, and Lei Xu 1,2 1 Speech and Hearing Research Center, Key Laboratory of Machine

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Eigenvoice Modeling With Sparse Training Data

Eigenvoice Modeling With Sparse Training Data IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 3, MAY 2005 345 Eigenvoice Modeling With Sparse Training Data Patrick Kenny, Member, IEEE, Gilles Boulianne, Member, IEEE, and Pierre Dumouchel,

More information

Intelligent Systems and Control Prof. Laxmidhar Behera Indian Institute of Technology, Kanpur

Intelligent Systems and Control Prof. Laxmidhar Behera Indian Institute of Technology, Kanpur Intelligent Systems and Control Prof. Laxmidhar Behera Indian Institute of Technology, Kanpur Module - 2 Lecture - 4 Introduction to Fuzzy Logic Control In this lecture today, we will be discussing fuzzy

More information

Environmental Sound Classification in Realistic Situations

Environmental Sound Classification in Realistic Situations Environmental Sound Classification in Realistic Situations K. Haddad, W. Song Brüel & Kjær Sound and Vibration Measurement A/S, Skodsborgvej 307, 2850 Nærum, Denmark. X. Valero La Salle, Universistat Ramon

More information

Short-Time ICA for Blind Separation of Noisy Speech

Short-Time ICA for Blind Separation of Noisy Speech Short-Time ICA for Blind Separation of Noisy Speech Jing Zhang, P.C. Ching Department of Electronic Engineering The Chinese University of Hong Kong, Hong Kong jzhang@ee.cuhk.edu.hk, pcching@ee.cuhk.edu.hk

More information

Dynamic Time-Alignment Kernel in Support Vector Machine

Dynamic Time-Alignment Kernel in Support Vector Machine Dynamic Time-Alignment Kernel in Support Vector Machine Hiroshi Shimodaira School of Information Science, Japan Advanced Institute of Science and Technology sim@jaist.ac.jp Mitsuru Nakai School of Information

More information

A Generative Model Based Kernel for SVM Classification in Multimedia Applications

A Generative Model Based Kernel for SVM Classification in Multimedia Applications Appears in Neural Information Processing Systems, Vancouver, Canada, 2003. A Generative Model Based Kernel for SVM Classification in Multimedia Applications Pedro J. Moreno Purdy P. Ho Hewlett-Packard

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Jorge Silva and Shrikanth Narayanan, Senior Member, IEEE. 1 is the probability measure induced by the probability density function

Jorge Silva and Shrikanth Narayanan, Senior Member, IEEE. 1 is the probability measure induced by the probability density function 890 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Average Divergence Distance as a Statistical Discrimination Measure for Hidden Markov Models Jorge Silva and Shrikanth

More information

FEATURE PRUNING IN LIKELIHOOD EVALUATION OF HMM-BASED SPEECH RECOGNITION. Xiao Li and Jeff Bilmes

FEATURE PRUNING IN LIKELIHOOD EVALUATION OF HMM-BASED SPEECH RECOGNITION. Xiao Li and Jeff Bilmes FEATURE PRUNING IN LIKELIHOOD EVALUATION OF HMM-BASED SPEECH RECOGNITION Xiao Li and Jeff Bilmes Department of Electrical Engineering University. of Washington, Seattle {lixiao, bilmes}@ee.washington.edu

More information

Automated Segmentation of Low Light Level Imagery using Poisson MAP- MRF Labelling

Automated Segmentation of Low Light Level Imagery using Poisson MAP- MRF Labelling Automated Segmentation of Low Light Level Imagery using Poisson MAP- MRF Labelling Abstract An automated unsupervised technique, based upon a Bayesian framework, for the segmentation of low light level

More information

CSC411: Final Review. James Lucas & David Madras. December 3, 2018

CSC411: Final Review. James Lucas & David Madras. December 3, 2018 CSC411: Final Review James Lucas & David Madras December 3, 2018 Agenda 1. A brief overview 2. Some sample questions Basic ML Terminology The final exam will be on the entire course; however, it will be

More information

IBM Research Report. Training Universal Background Models for Speaker Recognition

IBM Research Report. Training Universal Background Models for Speaker Recognition RC24953 (W1003-002) March 1, 2010 Other IBM Research Report Training Universal Bacground Models for Speaer Recognition Mohamed Kamal Omar, Jason Pelecanos IBM Research Division Thomas J. Watson Research

More information

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY MengSun,HugoVanhamme Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Bus

More information

A Novel Low-Complexity HMM Similarity Measure

A Novel Low-Complexity HMM Similarity Measure A Novel Low-Complexity HMM Similarity Measure Sayed Mohammad Ebrahim Sahraeian, Student Member, IEEE, and Byung-Jun Yoon, Member, IEEE Abstract In this letter, we propose a novel similarity measure for

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute

More information

THE presence of missing values in a dataset often makes

THE presence of missing values in a dataset often makes 1 Efficient EM Training of Gaussian Mixtures with Missing Data Olivier Delalleau, Aaron Courville, and Yoshua Bengio arxiv:1209.0521v1 [cs.lg] 4 Sep 2012 Abstract In data-mining applications, we are frequently

More information

Comparing linear and non-linear transformation of speech

Comparing linear and non-linear transformation of speech Comparing linear and non-linear transformation of speech Larbi Mesbahi, Vincent Barreaud and Olivier Boeffard IRISA / ENSSAT - University of Rennes 1 6, rue de Kerampont, Lannion, France {lmesbahi, vincent.barreaud,

More information

Hierarchical Clustering of Dynamical Systems based on Eigenvalue Constraints

Hierarchical Clustering of Dynamical Systems based on Eigenvalue Constraints Proc. 3rd International Conference on Advances in Pattern Recognition (S. Singh et al. (Eds.): ICAPR 2005, LNCS 3686, Springer), pp. 229-238, 2005 Hierarchical Clustering of Dynamical Systems based on

More information

Self Supervised Boosting

Self Supervised Boosting Self Supervised Boosting Max Welling, Richard S. Zemel, and Geoffrey E. Hinton Department of omputer Science University of Toronto 1 King s ollege Road Toronto, M5S 3G5 anada Abstract Boosting algorithms

More information

Boundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks

Boundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks INTERSPEECH 2014 Boundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks Ryu Takeda, Naoyuki Kanda, and Nobuo Nukaga Central Research Laboratory, Hitachi Ltd., 1-280, Kokubunji-shi,

More information

A Systematic and Simple Approach for Designing Takagi-Sugeno Fuzzy Controller with Reduced Data

A Systematic and Simple Approach for Designing Takagi-Sugeno Fuzzy Controller with Reduced Data A Systematic and Simple Approach for Designing Takagi-Sugeno Fuzzy Controller with Reduced Data Yadollah Farzaneh, Ali Akbarzadeh Tootoonchi Department of Mechanical Engineering Ferdowsi University of

More information

Weighted Finite-State Transducers in Computational Biology

Weighted Finite-State Transducers in Computational Biology Weighted Finite-State Transducers in Computational Biology Mehryar Mohri Courant Institute of Mathematical Sciences mohri@cims.nyu.edu Joint work with Corinna Cortes (Google Research). 1 This Tutorial

More information

Noise Compensation for Subspace Gaussian Mixture Models

Noise Compensation for Subspace Gaussian Mixture Models Noise ompensation for ubspace Gaussian Mixture Models Liang Lu University of Edinburgh Joint work with KK hin, A. Ghoshal and. enals Liang Lu, Interspeech, eptember, 2012 Outline Motivation ubspace GMM

More information

The effect of speaking rate and vowel context on the perception of consonants. in babble noise

The effect of speaking rate and vowel context on the perception of consonants. in babble noise The effect of speaking rate and vowel context on the perception of consonants in babble noise Anirudh Raju Department of Electrical Engineering, University of California, Los Angeles, California, USA anirudh90@ucla.edu

More information

A New OCR System Similar to ASR System

A New OCR System Similar to ASR System A ew OCR System Similar to ASR System Abstract Optical character recognition (OCR) system is created using the concepts of automatic speech recognition where the hidden Markov Model is widely used. Results

More information

Hidden Markov Modelling

Hidden Markov Modelling Hidden Markov Modelling Introduction Problem formulation Forward-Backward algorithm Viterbi search Baum-Welch parameter estimation Other considerations Multiple observation sequences Phone-based models

More information

TinySR. Peter Schmidt-Nielsen. August 27, 2014

TinySR. Peter Schmidt-Nielsen. August 27, 2014 TinySR Peter Schmidt-Nielsen August 27, 2014 Abstract TinySR is a light weight real-time small vocabulary speech recognizer written entirely in portable C. The library fits in a single file (plus header),

More information

A Survey on Voice Activity Detection Methods

A Survey on Voice Activity Detection Methods e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 668-675 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com A Survey on Voice Activity Detection Methods Shabeeba T. K. 1, Anand Pavithran 2

More information

SEC: Stochastic ensemble consensus approach to unsupervised SAR sea-ice segmentation

SEC: Stochastic ensemble consensus approach to unsupervised SAR sea-ice segmentation 2009 Canadian Conference on Computer and Robot Vision SEC: Stochastic ensemble consensus approach to unsupervised SAR sea-ice segmentation Alexander Wong, David A. Clausi, and Paul Fieguth Vision and Image

More information

Rule-Based Fuzzy Model

Rule-Based Fuzzy Model In rule-based fuzzy systems, the relationships between variables are represented by means of fuzzy if then rules of the following general form: Ifantecedent proposition then consequent proposition The

More information

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian

More information

Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji

Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient

More information

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,

More information

Acoustic Modeling for Speech Recognition

Acoustic Modeling for Speech Recognition Acoustic Modeling for Speech Recognition Berlin Chen 2004 References:. X. Huang et. al. Spoken Language Processing. Chapter 8 2. S. Young. The HTK Book (HTK Version 3.2) Introduction For the given acoustic

More information