MODEL SELECTION CRITERIA FOR ACOUSTIC SEGMENTATION

Mauro Cettolo and Marcello Federico
ITC-irst - Centro per la Ricerca Scientifica e Tecnologica, I-38050 Povo, Trento, Italy

ABSTRACT

Robust acoustic segmentation has become a critical issue in applying speech recognition to audio streams with variable acoustic content, e.g. radio programs. Many techniques in the literature base segmentation on statistical model selection, by applying the Bayesian Information Criterion. This work reviews alternative model selection criteria and presents comparative experiments both under controlled conditions and on a broadcast news corpus.

1. INTRODUCTION

The problem of acoustic segmentation and classification has become crucial to the application of automatic speech recognition to audio stream processing. For instance, in order to generate transcripts of broadcast news programs, it is necessary to isolate and filter out portions of the signal which do not contain speech, e.g. jingles and signature tunes. Moreover, transcription accuracy can be significantly improved by using condition-dependent acoustic models, if the speech signal is segmented and classified according to bandwidth, speaker gender, and speaker identity.

In recent years, several algorithms have been presented which use a statistical decision criterion to detect spectral changes (SCs) within the feature space of the signal. Assuming that data are generated by a Gaussian process, SCs are detected within a sliding window through a model selection method. The most likely SC is tested by comparing two hypotheses: (i) the window contains data generated by the same distribution; (ii) the left and right semi-windows, with respect to the SC point, contain data drawn from two different distributions. The test is performed with a likelihood ratio that, besides the maximum likelihood of each hypothesis, takes into account the different sizes of the corresponding models. Usually, the Bayesian Information Criterion (BIC) [11] is applied to select the simplest and best fitting model. This paper reviews alternative model selection criteria and presents comparative experiments both on synthetic data and on real audio data.

2. SEGMENTATION PROBLEM

Acoustic segmentation can be seen as a particular instance of the more general problem of partitioning data into distinct homogeneous regions [2]. The data partitioning problem arises in all applications which require splitting data into chunks, e.g. image processing, data mining, text processing, etc.

The problem can be formulated as follows. Let x_1, x_2, ..., x_n be an ordered sample of data in the R^d space. We assume that the data are generated by a Gaussian process with at most c transitions. The problem of segmentation is that of detecting all the transition points in the data set. The general problem can be approached, without loss of generality, by first considering the simplest case c = 1.

Single Transition Detection

The search for one potential transition point goes through the definition of n different statistical models:

• n-1 two-segment models M_t (t = 1, ..., n-1), each of them assuming:

    x_1, \dots, x_t \sim \text{iid } N_d(\mu_1, \Sigma_1)    (1)
    x_{t+1}, \dots, x_n \sim \text{iid } N_d(\mu_2, \Sigma_2)    (2)

• one single-segment model M_n, which assumes:

    x_1, x_2, \dots, x_n \sim N_d(\mu, \Sigma)    (3)

The basic idea is to choose the model (M_t : t = 1, ..., n) that best fits the observations.
The application of the maximum likelihood principle, however, would invariably lead to choosing one of the two-segment models, and hence to hypothesizing a break point at some t = 1, ..., n-1, as they have a higher number of free parameters than the one-segment model. In order to take into account the notion of dimension of the model, the following extension to the maximum likelihood principle was first proposed by Akaike [1]. The AIC (Akaike's Information Criterion) suggests maximizing the likelihood of each model i separately, obtaining say L_i = L_i(x_1, x_2, ..., x_n), and then choosing the model for which \log L_i - k_i is largest, where k_i is the dimension of the model.

Computations

Given a sample x_1, x_2, ..., x_n \sim iid N_d(\mu, \Sigma), the likelihood function achieves the maximum value [12]:

    L(x_1, x_2, \dots, x_n) = (2\pi)^{-nd/2} \, |\hat{\Sigma}|^{-n/2} \, e^{-nd/2}    (4)

at \hat{\mu} = \bar{x}, the sample mean, and

    \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'    (5)

the maximum-likelihood estimate of the covariance matrix.
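
As a quick numerical check of (4) and (5), here is a minimal sketch, not part of the original paper, assuming NumPy and SciPy are available: the closed form is compared against a direct evaluation of the Gaussian log-density at the maximum likelihood estimates.

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(0)
    n, d = 300, 5
    x = rng.normal(size=(n, d))

    mu_hat = x.mean(axis=0)                          # sample mean
    sigma_hat = (x - mu_hat).T @ (x - mu_hat) / n    # ML covariance, eq. (5)

    # Direct evaluation of the log-likelihood at the ML estimates
    direct = multivariate_normal(mu_hat, sigma_hat).logpdf(x).sum()

    # Logarithm of the closed form (4)
    _, logdet = np.linalg.slogdet(sigma_hat)
    closed = -0.5 * n * d * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * n * d

    print(direct, closed)   # the two values coincide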

    Name   Author             Year  Reference  Penalty
    AIC    Akaike             1972  [1]        k
    BIC    Schwarz            1978  [11]       (k/2) \log n
    CAIC   Bozdogan           1987  [3]        (k/2) \log n + k/2
    CAICF  Bozdogan           1987  [3]        k + (k/2) \log n + (1/2) \log |I(\theta)|
    MDL    Rissanen           1987  [10]       (k/2) \log n + (k+1) \log(k+2)
    MML    Wallace & Freeman  1987  [14]       (d/2)(1 + \log \kappa_d) + (1/2) \log |I(\theta)|

    Notation:
    k           number of free parameters in the model
    n           size of the data sample
    d           dimension of the data space
    I(\theta)   Fisher information matrix of the model
    \kappa_d    constant of the optimal d-dimensional quantizing lattice [6]

    Table 1: Model Dimension Estimates.

The number of free parameters of a multivariate normal distribution is equal to the dimension of the mean plus the number of variances and covariances to be estimated. For a full covariance matrix it is:

    k = d + \frac{d(d+1)}{2}    (6)

Decision Rule

Several model selection criteria have been proposed in the literature that can be applied to Akaike's framework of model selection. In general, each criterion proposes a penalty function P that takes into account the model dimension. By computing the likelihood function of each model, the following decision rule can be derived. Look for the best two-segment model for the data:

    \log L_t - P_t = \max_{t=1,\dots,n-1} \left\{ -\frac{t}{2} \log |\hat{\Sigma}_1| - \frac{n-t}{2} \log |\hat{\Sigma}_2| - P_t \right\}    (7)

then take the one-segment model function:

    \log L_n - P_n = -\frac{n}{2} \log |\hat{\Sigma}| - P_n    (8)

and choose to segment the data at point t if and only if:

    (\log L_t - \log L_n) - (P_t - P_n) > 0    (9)

In the experimental part it will be shown that the performance of the rule can be tuned by replacing the zero threshold with a value to be empirically estimated.
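
The single-transition rule (7)-(9) is compact enough to sketch directly. The following Python fragment is an illustrative sketch, not the authors' code, assuming NumPy; it scores every candidate split with a pluggable penalty function, the BIC entry of Table 1 being one choice.

    import numpy as np

    def seg_loglik(x):
        """Maximized log-likelihood of one Gaussian segment (eq. 4),
        dropping the additive constant common to all competing models."""
        n = len(x)
        mu = x.mean(axis=0)
        sigma = (x - mu).T @ (x - mu) / n         # ML covariance, eq. (5)
        return -0.5 * n * np.linalg.slogdet(sigma)[1]

    def bic_penalty(k, n):
        """Schwarz penalty from Table 1."""
        return 0.5 * k * np.log(n)

    def detect_single_change(x, penalty=bic_penalty):
        """Best split t and the score of rule (9); hypothesize a SC iff score > 0."""
        n, d = x.shape
        k = d + d * (d + 1) // 2                  # free parameters per Gaussian, eq. (6)
        best_t, best = None, -np.inf
        for t in range(d + 1, n - d):             # keep both covariances estimable
            two_seg = seg_loglik(x[:t]) + seg_loglik(x[t:]) - penalty(2 * k, n)
            if two_seg > best:                    # maximization of eq. (7)
                best, best_t = two_seg, t
        one_seg = seg_loglik(x) - penalty(k, n)   # eq. (8)
        return best_t, best - one_seg             # left side of eq. (9)

Setting penalty to a function returning zero yields the pure maximum likelihood test discussed in Section 3; since the two-segment family contains the single-segment model as a special case, the score is then never negative, which is exactly the overfitting problem the penalty terms correct.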
Multiple Transition Detection

The extension of the method to an arbitrarily large number of potential segments requires considering a number of competing models that grows combinatorially with n and c. In general, application-dependent simplifications are introduced to reduce the complexity of the problem. For acoustic segmentation, the audio signal can be processed through a sliding window. By keeping the window size sufficiently large to reliably apply the method, and sufficiently short to avoid multiple transitions, a segmentation algorithm can be devised that relies on the basic c = 1 case.

In Figure 1 an algorithm is proposed [5] that was derived from the one described in [7]. The main idea is to have a shifting, variable-size window in which a SC can be hypothesized according to (9). To reduce computation, the maximization (7) is not performed over all points 1, ..., n-1, but at a lower resolution rate. The resolution rate is increased when a potential SC is detected, in order to validate it and refine its time position.

Let us refer to Figure 1. The starting window (WinMin) has to be small enough to contain no more than one SC, but large enough to allow reliable statistics of the criterion to be computed. It is located at the beginning of the input audio stream. Values of the criterion are computed at a low resolution rate (ResLow), e.g. every 30 observations (step 2). The window is enlarged (step 11) until a potential SC is detected (step 3), or a maximum size (WinMax) is reached (step 10). In the first case, the potential SC is validated by computing the criterion values on the window centered around the candidate, using a higher resolution rate (ResHigh) (step 4). In the second case, the window is shifted to the right (step 13). If the potential SC is validated (steps 5, 7), it is stored (step 6); then the window is set to the minimum size (step 8) and placed just after the detected SC (step 9). Steps 2-13 are repeated until all the input audio data have been processed (step 1).

3. MODEL SELECTION CRITERIA

Several model selection criteria have been proposed starting from the early 70s. As mentioned before, the seminal work of Akaike tried to extend the maximum likelihood principle with a term that estimates the dimension, or complexity, of the considered statistical model. Refinements to Akaike's Information Criterion (AIC) were proposed by Schwarz [11], with the Bayesian Information Criterion (BIC), and by Bozdogan [3], with the Consistent AIC (CAIC) and the Consistent AIC with Fisher information (CAICF).

Parameters:
    WinMin:   minimum window size
    WinMax:   maximum window size
    WinDelta: window increase step
    WinStep:  window shift step
    ResLow:   low resolution
    ResHigh:  high resolution
    N:        input data length
    Thresh:   threshold for the used criterion

Variables:
    WinStart: left boundary of the window
    WinSize:  current window size
    SC:       detected spectral changes

Subroutine:
    MaxSearch(WinStart, WinSize, Res): returns the best potential SC and its
    score, computed by a given criterion, inside the specified window at the
    given resolution Res.

Initialization:
    WinStart = 1; WinSize = WinMin; SC = ()

Algorithm:
     1. while (WinStart + WinSize < N)
     2.     (max, t) = MaxSearch(WinStart, WinSize, ResLow)
     3.     if (max > Thresh)
     4.         (max, t) = MaxSearch(t - WinSize/2, WinSize, ResHigh)
     5.         if (max > Thresh)
     6.             push SC t
     7.     if (max > Thresh)
     8.         WinSize = WinMin
     9.         WinStart = t
    10.     else if (WinSize < WinMax)
    11.         WinSize = WinSize + WinDelta
    12.     else
    13.         WinStart = WinStart + WinStep

Figure 1: Algorithm for detecting multiple spectral changes.

By following an information and coding theory approach to statistical modeling and stochastic complexity, Rissanen [10] and Wallace and Freeman [14] proposed in the 80s two different criteria, respectively called Minimum Description Length (MDL) and Minimum Message Length (MML). Without going into the details of each method, which would require too much space, the penalty terms derived from each of the mentioned criteria are given in Table 1. For the sake of comparison, a version of Hotelling's T^2 test and the maximum likelihood method are also considered.

Hotelling's Test

The Hotelling T^2 (T2) test [13] computes the maximum likelihood estimate of a change point of the mean in the sample by:

    \hat{t} = \arg\max_{t=1,\dots,n-1} T_t^2
            = \arg\max_{t=1,\dots,n-1} \frac{t(n-t)}{n} (\bar{x}_1 - \bar{x}_2)' S_p^{-1} (\bar{x}_1 - \bar{x}_2)    (10)

where S_p is the pooled variance:

    S_p = \frac{1}{n-2} \left( t \hat{\Sigma}_1 + (n-t) \hat{\Sigma}_2 \right)    (11)

and (\bar{x}_1, \hat{\Sigma}_1) and (\bar{x}_2, \hat{\Sigma}_2) are, respectively, the sample mean and covariance estimates on x_1, ..., x_t and x_{t+1}, ..., x_n. The hypothesis of a change point can then be accepted with confidence level (1 - \alpha) if:

    \frac{n-d-1}{(n-2)d} T_t^2 \geq F_{d,n-d-1;\alpha}    (12)

where F_{d,n-d-1;\alpha} is the upper 100\alpha% point of the F-distribution with (d, n-d-1) degrees of freedom.

Maximum Likelihood Test

The Maximum Likelihood (ML) criterion corresponds to a model selection criterion with a zero penalty function. Hence, a SC is detected whenever the two-segment model fits the data better than the single-segment model.
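
For reference, the Hotelling test just described admits an equally short sketch (again illustrative, not from the paper, assuming NumPy and SciPy; the scatter matrices t \hat{\Sigma}_1 and (n-t) \hat{\Sigma}_2 are accumulated directly):

    import numpy as np
    from scipy.stats import f as f_dist

    def hotelling_change(x, alpha=0.05):
        """Most likely mean change point (eq. 10) with the F-test of eq. (12)."""
        n, d = x.shape
        best_t, best_T2 = None, -np.inf
        for t in range(d + 1, n - d):
            x1, x2 = x[:t], x[t:]
            diff = x1.mean(axis=0) - x2.mean(axis=0)
            s1 = (x1 - x1.mean(axis=0)).T @ (x1 - x1.mean(axis=0))  # t * Sigma_1
            s2 = (x2 - x2.mean(axis=0)).T @ (x2 - x2.mean(axis=0))  # (n-t) * Sigma_2
            sp = (s1 + s2) / (n - 2)                                # eq. (11)
            T2 = t * (n - t) / n * diff @ np.linalg.solve(sp, diff)
            if T2 > best_T2:
                best_T2, best_t = T2, t
        stat = (n - d - 1) / ((n - 2) * d) * best_T2
        accept = stat >= f_dist.ppf(1 - alpha, d, n - d - 1)        # eq. (12)
        return best_t, accept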
4. EVALUATION METRICS

Segmenting an audio stream, like a broadcast news program, requires in general the detection of spectral changes regarding:

• acoustic sources, i.e. female/male speech, music;
• acoustic channels, i.e. wide/narrow band.

According to [9], performance of automatic SC detection should be calculated with respect to a set of target SCs. To each target SC there is usually associated a time interval [S_SC, E_SC], rather than a single point. This is because silence or other non-speech events may occur between changes. Tolerances in detecting SCs can be introduced by extending such intervals. Hence, a hypothesized SC is considered correct if it falls inside one of the augmented target intervals [S_SC - tol, E_SC + tol], where tol is the admitted tolerance.

For comparing target and hypothesized SCs, one can adopt the recall and precision measures:

    \text{recall} = 100 \cdot \frac{a}{a+c}    (13)

    \text{precision} = 100 \cdot \frac{a}{a+b}    (14)

where a is the number of hypothesized SCs that fall inside the target SC intervals, b is the number of hypothesized SCs that do not fall inside any target SC interval, and c is the number of target SC intervals inside which no hypothesized SC falls.
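
Given the above definitions of a, b and c, the scoring itself is a few lines of plain Python (an illustrative sketch; targets are the (S_SC, E_SC) intervals, hypotheses are time points):

    def precision_recall(targets, hyps, tol):
        """Eqs. (13)-(14); returns percentages."""
        intervals = [(s - tol, e + tol) for s, e in targets]   # augmented targets
        a = sum(any(lo <= h <= hi for lo, hi in intervals) for h in hyps)
        b = len(hyps) - a                                      # false alarms
        c = sum(not any(lo <= h <= hi for h in hyps)
                for lo, hi in intervals)                       # missed targets
        return 100.0 * a / (a + b), 100.0 * a / (a + c)        # precision, recall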

5. EXPERIMENTS UNDER CONTROLLED CONDITIONS

Comparison of the model selection criteria was first performed under controlled conditions. Random samples of size n = 300 were generated according to different multivariate normal distributions, for values of the dimension d = 1, 5, 10. In particular, random samples were generated either by shifting the mean or by scaling the variances of a standard normal distribution.

Mean Shifting

Random samples of size n = 300 were generated according to the following scheme:

    x_1, \dots, x_{n/2} \sim N_d(0, I)    (15)
    x_{n/2+1}, \dots, x_n \sim N_d(\alpha \mathbf{1}, I)    (16)

with \alpha = 0.1, 0.2, ..., 0.5, where \mathbf{1} denotes the all-ones vector.

Variance Scaling

Random samples of size n = 300 were generated according to the following scheme:

    x_1, \dots, x_{n/2} \sim N_d(0, I)    (17)
    x_{n/2+1}, \dots, x_n \sim N_d(0, \alpha I)    (18)

with \alpha = 0.5, 0.6, ..., 0.9.

Experimental Conditions

The basic segmentation algorithm (c = 1) was applied to the above problems with a slight variation: the two-segment models (7) were only evaluated on the central third of the data set, i.e. n/3 <= t <= 2n/3, in order to reliably compute the model parameters. For each focus condition, multiple data samples were generated. Since for each condition the correct model has a diagonal covariance matrix, the number of free parameters k was set equal to 2d. Performance in terms of precision/recall was computed, for each condition, by considering transition detections correct if they fell within the interval [140, 160]. Moreover, each method was also evaluated on homogeneous data samples generated according to a standard normal distribution; the statistic b of equation (14) was then estimated by counting the number of hypothesized transition points found in the homogeneous samples. Finally, the T2 method was only applied to detect mean shifts in the data, with a confidence level 1 - \alpha = 0.95.

Experimental Results

Experimental results are reported in Figure 2. The three vertical plots on the left side correspond to experiments applying mean shifts, while the three plots on the right correspond to experiments applying variance scalings. Increasing values of the dimension d of the data are considered going from the top to the bottom plots. Each plot shows the precision versus recall performance of each criterion under different shifting/scaling conditions. Vertical slices correspond, going rightward, to easier segmentation tasks. According to the definition of the precision/recall measures, the best performing methods are those closest to the top-right corner of a slice.

Looking at the two upper plots, which correspond to dimension d = 1, it turns out that the best two performing criteria are  and . MML follows with a lower recall, which gets closer to the best methods as the task gets easier. With a slightly better precision, but much lower recall, , T2 (just for mean shifting), and  follow in that order. CAICF often keeps abreast of the best methods in terms of recall, but scores much lower in terms of precision.

Results change significantly for dimensions d = 5 and d = 10:  performs significantly better than , especially in the mean shifting case. MML worsens significantly and gets close to the best methods only in the easiest variance scaling case. T2 provides the best precision/recall trade-off under the mean shifting conditions.  shows a good precision-recall trade-off for both dimensions and conditions; in particular, it yields the highest precision under the most difficult conditions (leftmost plot slices).
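
For concreteness, the two sampling schemes (15)-(18) and the basic detector can be wired together as follows. This is a sketch assuming NumPy and the detect_single_change function sketched in Section 2; the restriction of the search to the central third of the sample is omitted for brevity.

    import numpy as np

    rng = np.random.default_rng(1)

    def mean_shift_sample(n, d, alpha):
        """Scheme (15)-(16): shift every component of the mean by alpha."""
        return np.vstack([rng.normal(size=(n // 2, d)),
                          rng.normal(loc=alpha, size=(n - n // 2, d))])

    def variance_scale_sample(n, d, alpha):
        """Scheme (17)-(18): scale the variances by alpha."""
        return np.vstack([rng.normal(size=(n // 2, d)),
                          rng.normal(scale=np.sqrt(alpha), size=(n - n // 2, d))])

    n, d = 300, 5
    x = mean_shift_sample(n, d, alpha=0.3)
    t_hat, score = detect_single_change(x)   # from the Section 2 sketch
    print(t_hat, score > 0)                  # a hit if t_hat falls within the
                                             # tolerance interval around n/2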
6. EXPERIMENTS ON REAL DATA

Experiments with all the segmentation criteria were performed on audio data coming from a broadcast news database. The aim is to detect spectral changes occurring within the signal that are mainly due to channel and source switches.

The IBNC Corpus

For testing purposes, data from the IBNC (Italian Broadcast News Corpus) database, developed at ITC-irst [8], were employed. The IBNC consists of 30 hours of radio news recordings, which were manually transcribed, segmented and labeled.

The test set consists of six radio news programs that were selected as a representative sample of the whole corpus with respect to all the issues concerning automatic broadcast news transcription [4]. Table 2 reports statistics on the segments of the test set. A segment is defined as a contiguous portion of audio signal, homogeneous in terms of acoustic source and channel.

[Table 2: Statistics of segments in the test set; per-class counts and average durations in seconds for music and speech segments. Numeric entries not recoverable from the transcription.]

The test set contains a total of 212 SCs (218 segments distributed among the six news programs).

Experimental Conditions

Multivariate observations of dimension 13 were used, i.e. 12 mel-scaled cepstral coefficients and the log-energy. SC detection was performed using a tolerance value of 500 ms. Multiple SC detection was performed by means of the algorithm shown in Figure 1. Moreover, in order to compute a precision/recall operating curve for each method, an empirical threshold was introduced in the decision criteria (9) and (12); the threshold can in fact be seen as an empirically estimated additional penalty for the method. Different values of the threshold were tested and the resulting precision/recall statistics were computed.
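
Since the threshold acts as an additive penalty, sweeping it only changes the final comparison of rule (9). The sketch below reuses the earlier illustrative helpers (detect_single_change with its default BIC penalty); in the full system the decision is applied window by window through the algorithm of Figure 1.

    import numpy as np

    rng = np.random.default_rng(2)

    # Ten windows: five homogeneous, five with a mean shift halfway through
    windows = [rng.normal(size=(300, 5)) for _ in range(5)]
    windows += [np.vstack([rng.normal(size=(150, 5)),
                           rng.normal(loc=0.8, size=(150, 5))])
                for _ in range(5)]

    # The eq. (9) score of each window, computed once
    scores = [detect_single_change(w)[1] for w in windows]

    # Each threshold value yields one precision/recall operating point
    for thresh in (0.0, 10.0, 20.0, 40.0):
        print(thresh, sum(s > thresh for s in scores))
        # higher thresholds suppress weaker (often spurious) change points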

Experimental Results

Precision vs. recall points of each method are shown in Figure 3. As a reference, complete curves are plotted for the  and T2 methods. The leftmost points of all the model selection criteria correspond to setting the threshold to its original value, i.e. zero. By looking at Figure 3, the following can be observed:

• straightforward application of the methods to audio data provides high recall but very low precision;
• by suitably tuning the threshold value of each single method, much better performance can be achieved;
• optimal values of the threshold make all methods, with the exception of T2, perform comparably well;
• T2 performs significantly worse than all other methods; moreover, no improvement was achieved even by using a universal pooled variance estimated as suggested in [15];
• , , and  confirm to be among the best performing methods;
• the pure ML method, empirically tuned, performs as well as the best model selection methods.

7. CONCLUSIONS

Several model selection methods for acoustic segmentation were presented and tested, both on synthetic and on real audio data. Tangible differences among the methods appeared in the experiments performed under controlled conditions. In particular, methods with simple penalty functions proved to perform better with multivariate data. Methods based on the Fisher information (i.e. MML and CAICF) did not prove competitive against simpler methods, at least on the segmentation problems considered here.

Application of any method to real audio data requires introducing an empirical threshold on the decision criterion. Tuning the threshold of each method permits significantly better retrieval performance to be achieved. Almost all the considered methods reached very similar optimal performance. Besides, the methods which performed best on the synthetic data sets also worked well on the audio data.

To conclude, a major point in applying the considered methods to audio data concerns their robustness with respect to the normality assumption on the data source. With the introduction of an empirical threshold in the decision criterion, all the tested selection criteria proved to be reasonably robust. Future work will be devoted to the development and evaluation of non-parametric methods for the acoustic segmentation problem.

ACKNOWLEDGMENTS

The work presented here was carried out within the European project CORETEX (IST-1999-11876). The authors thank R. A. Baxter and D. Giuliani for their help and useful suggestions.

REFERENCES

[1] H. Akaike. On entropy maximization principle. In P. R. Krishnaiah, editor, Applications of Statistics, pages 27-41. North-Holland, Amsterdam, The Netherlands, 1977.
[2] R. A. Baxter. Minimum Message Length Inference: Theory and Applications. PhD thesis, Department of Computer Science, Monash University, Clayton, Victoria, Australia.
[3] H. Bozdogan. Model selection and Akaike's information criterion (AIC): the general theory and its analytical extensions. Psychometrika, 52(3):345-370, 1987.
[4] F. Brugnara, M. Cettolo, M. Federico, and D. Giuliani. A system for the segmentation and transcription of Italian radio news. In Proceedings of RIAO, Content-Based Multimedia Information Access, Paris, France, 2000.
[5] M. Cettolo. Segmentation, classification and clustering of an Italian broadcast news corpus. In Proceedings of the RIAO International Conference, Paris, France, 2000.
[6] J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices and Groups. Springer-Verlag, Berlin, Germany.
[7] P. Delacourt, D. Kryze, and C. J. Wellekens. Speaker-based segmentation for audio data indexing. In Proceedings of the ESCA ETRW Workshop on Accessing Information in Spoken Audio, Cambridge, UK, 1999.
[8] M. Federico, D. Giordani, and P. Coletti. Development and evaluation of an Italian broadcast news corpus. In Proceedings of the Second International Conference on Language Resources and Evaluation (LREC), Athens, Greece, 2000.
[9] D. Liu and F. Kubala. Fast speaker change detection for broadcast news transcription and indexing. In Proceedings of the 6th European Conference on Speech Communication and Technology, Budapest, Hungary, 1999.
[10] J. Rissanen. Stochastic complexity. Journal of the Royal Statistical Society, Series B, 49(3):223-239, 1987.
[11] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464, 1978.
[12] G. A. F. Seber. Multivariate Observations. John Wiley & Sons, New York, NY, 1984.
[13] M. S. Srivastava and E. M. Carter. An Introduction to Applied Multivariate Statistics. North-Holland, New York, NY, 1983.
[14] C. S. Wallace and P. R. Freeman. Estimation and inference by compact coding. Journal of the Royal Statistical Society, Series B, 49(3):240-265, 1987.
[15] S. Wegmann, P. Zhan, and L. Gillick. Progress in broadcast news transcription at Dragon Systems. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume I, pages 33-36, Phoenix, AZ, 1999.

[Figure: six precision vs. recall plots, one per combination of condition (mean shifting, \alpha = 0.1, ..., 0.5; variance scaling, \alpha = 0.9, ..., 0.5) and dimension (d = 1, 5, 10).]

Figure 2: Results of experiments under controlled conditions.

[Figure: precision vs. recall operating points and curves for the aic, bic, caic, caicf, mdl, ml, mml and t2 methods.]

Figure 3: Precision vs. recall curve by the different methods on an audio segmentation task.
