MODEL SELECTION CRITERIA FOR ACOUSTIC SEGMENTATION

Mauro Cettolo and Marcello Federico
ITC-irst - Centro per la Ricerca Scientifica e Tecnologica
I-38050 Povo, Trento, Italy

ABSTRACT

Robust acoustic segmentation has become a critical issue in applying speech recognition to audio streams with variable acoustic content, e.g. radio programs. Many techniques in the literature base segmentation on statistical model selection, by applying the Bayesian Information Criterion. This work reviews alternative model selection criteria and presents comparative experiments both under controlled conditions and on a broadcast news corpus.

1. INTRODUCTION

The problem of acoustic segmentation and classification has become crucial to the application of automatic speech recognition to audio stream processing. For instance, in order to generate transcripts of broadcast news programs, it is necessary to isolate and filter out portions of the signal which do not contain speech, e.g. jingles and signature tunes. Moreover, transcription accuracy can be significantly improved by using condition-dependent acoustic models, if the speech signal is segmented and classified according to bandwidth, speaker gender, and speaker identity.

In recent years, several algorithms have been presented which use a statistical decision criterion to detect spectral changes (SCs) within the feature space of the signal. Assuming that data are generated by a Gaussian process, SCs are detected within a sliding window through a model selection method. The most likely SC is tested by comparing two hypotheses: (i) the window contains data generated by the same distribution; (ii) the left and right semi-windows, with respect to the SC point, contain data drawn from two different distributions. The test is performed with a likelihood ratio that, besides the maximum likelihood of each hypothesis, takes into account the different sizes of the corresponding models.
Usually, the Bayesian Information Criterion (BIC) [11] is applied to select the simplest and best fitting model. This paper reviews alternative model selection criteria and presents comparative experiments both on synthetic data and real audio data.

2. SEGMENTATION PROBLEM

Acoustic segmentation can be seen as a particular instance of the more general problem of partitioning data into distinct homogeneous regions [2]. The data partitioning problem arises in all applications which require partitioning data into chunks, e.g. image processing, data mining, text processing, etc. The problem can be formulated as follows. Let $x_1, \ldots, x_n$ be an ordered sample of data in the space $\mathbb{R}^d$. We assume that the data are generated by a Gaussian process with at most $k$ transitions. The problem of segmentation is that of detecting all the transition points in the data set. The general problem can be approached, without loss of generality, by first considering the simplest case, $k = 1$.

Single Transition Detection

The search for one potential transition point goes through the definition of $n$ different statistical models:

- $n-1$ two-segment models $M_i$ ($i = 1, \ldots, n-1$), each of them assuming:

$$x_1, \ldots, x_i \ \text{i.i.d.} \sim N_d(\mu_1, \Sigma_1) \qquad (1)$$

$$x_{i+1}, \ldots, x_n \ \text{i.i.d.} \sim N_d(\mu_2, \Sigma_2) \qquad (2)$$

- one single-segment model $M_0$ which assumes:

$$x_1, \ldots, x_n \ \text{i.i.d.} \sim N_d(\mu, \Sigma) \qquad (3)$$

The basic idea is to choose the model $M_i$ ($0 \le i \le n-1$) that best fits the observations. The application of the maximum likelihood principle would however invariably lead to choosing one of the two-segment models, and hence to hypothesizing a break point at some $i$, as they have a higher number of free parameters than the one-segment model. In order to take into account the notion of dimension of the model, the following extension to the maximum likelihood principle was first proposed by Akaike [1]. The AIC (Akaike's Information Criterion) suggests maximizing the likelihood for each model $M$ separately, obtaining say $L_M$, and then choosing the model for which $\log L_M - K_M$ is largest, where $K_M$ is the dimension of the model.

Computations

Given a sample $x_1, \ldots, x_n$ i.i.d. $\sim N_d(\mu, \Sigma)$, the likelihood function achieves the maximum value [12]:

$$L(\hat\mu, \hat\Sigma) = (2\pi)^{-nd/2}\, |\hat\Sigma|^{-n/2}\, e^{-nd/2} \qquad (4)$$

at $\hat\mu = \bar{x}$, the sample mean, and

$$\hat\Sigma = \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})' \qquad (5)$$

the maximum-likelihood estimate of the covariance matrix. The number of free parameters of a multivariate normal distribution is equal to the dimension of the mean plus the number of variances and covariances to be estimated. For a full covariance matrix it is:

$$K = d + \frac{d(d+1)}{2} \qquad (6)$$

Table 1: Model Dimension Estimates.

Name   Author             Year  Reference  Penalty
AIC    Akaike             1972  [1]        $K$
BIC    Schwarz            1978  [11]       $\frac{K}{2}\log n$
CAIC   Bozdogan           1987  [3]        $\frac{K}{2}\log n + \frac{K}{2}$
CAICF  Bozdogan           1987  [3]        $K + \frac{K}{2}\log n + \frac{1}{2}\log|I(\hat\theta)|$
MDL    Rissanen           1987  [10]       $\frac{K}{2}\log n + \left(\frac{K}{2}+1\right)\log(K+2)$
MML    Wallace & Freeman  1987  [14]       $\frac{K}{2}\left(1+\log\kappa_K\right) + \frac{1}{2}\log|I(\hat\theta)|$

Notation:
$K$: number of free parameters in the model.
$n$: size of the data sample.
$d$: dimension of the data space.
$I(\hat\theta)$: Fisher information matrix of the model.
$\kappa_K$: constant of the optimal $K$-dimensional quantizing lattice [6].

Decision Rule

Several model selection criteria have been proposed in the literature that can be applied to Akaike's framework of model selection. In general, each criterion proposes a penalty function $P$ that takes into account the model dimension. By computing the likelihood function of each model, the following decision rule can be derived. In the expressions below, the term $-\frac{nd}{2}(1 + \log 2\pi)$ of the logarithm of (4) is omitted because it is common to all models. Look for the best two-segment model for the data:

$$\log L_{\hat\imath} - P_{\hat\imath} = \max_{i=1,\ldots,n-1} \left[ -\frac{i}{2}\log|\hat\Sigma_1| - \frac{n-i}{2}\log|\hat\Sigma_2| \right] - P_{\hat\imath} \qquad (7)$$

where $\hat\Sigma_1$ and $\hat\Sigma_2$ are the estimates (5) computed on $x_1, \ldots, x_i$ and $x_{i+1}, \ldots, x_n$, and $\hat\imath$ is the maximizing point. Then, take the one-segment model function:

$$\log L_0 - P_0 = -\frac{n}{2}\log|\hat\Sigma| - P_0 \qquad (8)$$

and choose to segment the data at point $\hat\imath$ if and only if:

$$(\log L_{\hat\imath} - \log L_0) - (P_{\hat\imath} - P_0) > 0 \qquad (9)$$

In the experimental part it will be shown that the performance of the rule can be tuned by replacing the zero threshold with a value to be estimated empirically.

Multiple Transition Detection

The extension of the method to an arbitrarily large number of potential segments requires considering a number of competing models that grows combinatorially with $n$ and $k$. In general, application-dependent simplifications are introduced to reduce the complexity of the problem.
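The single-transition rule (4)-(9) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the restriction to the penalties that need no Fisher information, the assumption that the two-segment model is penalized with $2K$ parameters, and the `d + 1` margin that keeps both covariance estimates non-singular are all choices made here for illustration.

```python
import numpy as np


def max_log_likelihood(x):
    """Logarithm of eq. (4): maximized Gaussian likelihood of an (n, d) sample."""
    n, d = x.shape
    centered = x - x.mean(axis=0)
    sigma = centered.T @ centered / n          # ML covariance estimate, eq. (5)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * n * (d * np.log(2.0 * np.pi) + logdet + d)


def penalty(criterion, k, n):
    """Penalty terms of Table 1 that do not need the Fisher information."""
    penalties = {
        "ml": 0.0,
        "aic": float(k),
        "bic": 0.5 * k * np.log(n),
        "caic": 0.5 * k * (np.log(n) + 1.0),
    }
    return penalties[criterion]


def detect_single_change(x, criterion="bic", threshold=0.0):
    """Return (best split index, score, decision) following eqs. (7)-(9).

    The split index i means that x[:i] and x[i:] are the two segments.
    Assumption: the two-segment model has 2K free parameters.
    """
    n, d = x.shape
    k = d + d * (d + 1) // 2                   # eq. (6), full covariance
    margin = d + 1                             # assumption: avoid singular estimates
    log_l0 = max_log_likelihood(x)
    best_i, best_log_l = None, -np.inf
    for i in range(margin, n - margin + 1):
        log_l = max_log_likelihood(x[:i]) + max_log_likelihood(x[i:])
        if log_l > best_log_l:
            best_i, best_log_l = i, log_l
    score = (best_log_l - log_l0) - (penalty(criterion, 2 * k, n) - penalty(criterion, k, n))
    return best_i, score, bool(score > threshold)
```

Passing a non-zero `threshold` corresponds to the empirical tuning of the decision rule mentioned in the text.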
For acoustic segmentation, the audio signal can be segmented through a sliding window. By keeping the window size sufficiently large to reliably apply the method, and sufficiently short to avoid multiple transitions, a segmentation algorithm can be devised that relies on the basic $k = 1$ case. In Figure 1 an algorithm is proposed [5] that was derived from the one described in [7]. The main idea is to have a shifting variable-size window in which an SC can be hypothesized according to (9). To reduce computations, the maximization (7) is not computed over all points $i$, but at a lower resolution rate. The resolution rate is increased when a potential SC is detected, in order to validate it and refine its time position.

Let us refer to Figure 1. The starting window (WinMin) has to be small enough to contain no more than one SC, but large enough to allow reliable statistics of the criterion to be computed. It is located at the beginning of the input audio stream. Values of the criterion are computed with a low resolution rate (ResLow), e.g. every 3 observations (step 2.). The window is enlarged (11.) until a potential SC is detected (3.), or a maximum size (WinMax) is reached (10.). In the first case, the potential SC is validated by computing the criterion values on the window centered around the candidate, using a higher resolution rate (ResHigh) (4.). In the second case, the window is shifted to the right (13.). If the potential SC is validated (5., 7.) it is stored (6.); then the window is set to the minimum size (8.) and placed just after the detected SC (9.). Steps 2-13 are repeated until all the input audio data have been processed (1.).

3. MODEL SELECTION CRITERIA

Several model selection criteria have been proposed starting from the early 1970s. As mentioned before, the seminal work of Akaike tried to extend the maximum likelihood principle with a term that estimates the dimension or complexity of the considered statistical model.
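The window-growing procedure of Section 2 can be sketched in Python as follows. This is an illustrative reading of the steps, not the authors' code: the BIC-style score inside `max_search`, the default parameter values, the `margin` argument, the 0-based indexing, and the clamping of the validation window to the current window start (added here so that the loop always makes progress) are assumptions.

```python
import numpy as np


def _half_n_logdet(x):
    """-(n/2) log|Sigma_hat| for an (n, d) sample; constant terms cancel in the score."""
    n = len(x)
    centered = x - x.mean(axis=0)
    _, logdet = np.linalg.slogdet(centered.T @ centered / n)
    return -0.5 * n * logdet


def max_search(x, start, size, res, margin):
    """Best candidate change in x[start:start+size], scored with a BIC penalty.

    Returns (score, t) where t is the index of the first sample after the change,
    or (-inf, None) when the window is too short to be split.
    """
    window = x[start:start + size]
    n, d = window.shape
    if n < 2 * margin:
        return -np.inf, None
    k = d + d * (d + 1) // 2
    base = _half_n_logdet(window) + 0.5 * k * np.log(n)
    best_score, best_t = -np.inf, None
    for i in range(margin, n - margin + 1, res):
        score = _half_n_logdet(window[:i]) + _half_n_logdet(window[i:]) - base
        if score > best_score:
            best_score, best_t = score, start + i
    return best_score, best_t


def segment(x, win_min=100, win_max=400, win_delta=50, win_step=50,
            res_low=10, res_high=1, thresh=0.0, margin=10):
    """Detect multiple spectral changes with a growing and shifting window."""
    n = len(x)
    start, size, changes = 0, win_min, []
    while start + size <= n:
        score, t = max_search(x, start, size, res_low, margin)
        validated = False
        if score > thresh:
            # Validate on a window centered on the candidate, at high resolution.
            lo = max(start, t - size // 2)
            score, t = max_search(x, lo, min(size, n - lo), res_high, margin)
            validated = score > thresh
        if validated:
            changes.append(t)
            size = win_min
            start = t                      # window restarts at the new segment
        elif size < win_max:
            size += win_delta              # enlarge the window
        else:
            start += win_step              # shift the window to the right
    return changes
```

For example, `segment(features, thresh=20.0)` returns the list of detected change indices for an `(N, d)` array `features`; a positive `thresh` plays the role of the empirically tuned threshold.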
Refinements to Akaike's Information Criterion (AIC) were proposed by Schwarz [11], with the Bayesian Information Criterion (BIC), and by Bozdogan [3], with the Consistent AIC (CAIC) and the Consistent AIC with Fisher information (CAICF). By following an information and coding theory approach to statistical modeling and stochastic complexity, Rissanen [10] and Wallace and Freeman [14] proposed in the 1980s two different criteria, respectively called Minimum Description Length (MDL) and Minimum Message Length (MML). Without going into the details of each method, which would require too much space, the penalty terms derived by each of the mentioned criteria are given in Table 1. For the sake of comparison, a version of Hotelling's test and the maximum likelihood method are also considered.

Parameters:
  WinMin: minimum window size
  WinMax: maximum window size
  WinDelta: window increase step
  WinStep: window shift step
  ResLow: low resolution
  ResHigh: high resolution
  N: input data length
  Thresh: threshold for the used criterion
Variables:
  WinStart: left boundary of the window
  WinSize: current window size
  SC: detected spectral changes
Subroutine:
  MaxSearch(WinStart, WinSize, Res): returns the best potential SC and its score, computed by a given criterion, inside the specified window at the given resolution Res.
Initialization:
  WinStart = 1
  WinSize = WinMin
  SC = ()
Algorithm:
   1. while (WinStart + WinSize <= N)
   2.   (max, t) = MaxSearch(WinStart, WinSize, ResLow)
   3.   if (max > Thresh)
   4.     (max, t) = MaxSearch(t - WinSize/2, WinSize, ResHigh)
   5.   if (max > Thresh)
   6.     push SC t
   7.   if (max > Thresh)
   8.     WinSize = WinMin
   9.     WinStart = t + 1
  10.   else if (WinSize < WinMax)
  11.     WinSize = WinSize + WinDelta
  12.   else
  13.     WinStart = WinStart + WinStep

Figure 1: Algorithm for detecting multiple spectral changes.

Hotelling's Test

Hotelling's $T^2$ test (T2) [13] computes the maximum likelihood estimate of a changing point of the mean in the sample by:

$$\hat\imath = \arg\max_{i=1,\ldots,n-1} T_i^2 = \arg\max_{i=1,\ldots,n-1} \frac{i\,(n-i)}{n}\, (\bar{x}_1 - \bar{x}_2)'\, S^{-1}\, (\bar{x}_1 - \bar{x}_2) \qquad (10)$$

where $S$ is the pooled variance:

$$S = \frac{1}{n-2}\left( i\,\hat\Sigma_1 + (n-i)\,\hat\Sigma_2 \right) \qquad (11)$$

and $(\bar{x}_1, \hat\Sigma_1)$ and $(\bar{x}_2, \hat\Sigma_2)$ are, respectively, the sample means and covariance estimates on $x_1, \ldots, x_i$ and $x_{i+1}, \ldots, x_n$.
The hypothesis of a changing point can again be accepted with a confidence level $1 - \alpha$ if:

$$\frac{n-d-1}{d\,(n-2)}\, T_{\hat\imath}^2 > F_{d,\,n-d-1,\,\alpha} \qquad (12)$$

where $T_{\hat\imath}^2$ is the maximum attained in (10) and $F_{d,\,n-d-1,\,\alpha}$ is the upper $100\alpha\%$ point of the F-distribution with $(d, n-d-1)$ degrees of freedom.

Maximum Likelihood Test

The Maximum Likelihood (ML) criterion corresponds to a model selection criterion with a zero penalty function. Hence, an SC is detected if the two-segment model fits the data better than the single-segment model.

4. EVALUATION METRICS

Segmenting an audio stream, like a broadcast news program, requires in general detecting spectral changes regarding:

- acoustic sources, i.e. female/male speech, music;
- acoustic channels, i.e. wide/narrow band.

According to [9], the performance of automatic SC detection should be calculated with respect to a set of target SCs. Each target SC is usually associated with a time interval $[t_s, t_e]$, rather than with a single point. This is because silence or other non-speech events may occur between changes. Tolerances in detecting SCs can be introduced by extending such intervals. Hence, a hypothesized SC is considered correct if it falls inside one of the augmented target intervals $[t_s - \mathit{tol},\ t_e + \mathit{tol}]$, where $\mathit{tol}$ is the admitted tolerance. For comparing target and hypothesized SCs, one can adopt the recall and precision measures:

$$\text{recall} = \frac{N_c}{N_c + N_m} \qquad (13)$$

$$\text{precision} = \frac{N_c}{N_c + N_f} \qquad (14)$$

where $N_c$ is the number of hypothesized SCs that fall inside the target SC intervals, $N_f$ is the number of hypothesized SCs that do not fall inside any target SC interval, and $N_m$ is the number of target SC intervals which no hypothesized SC falls inside.

5. EXPERIMENTS UNDER CONTROLLED CONDITIONS

Comparison of model selection criteria was first performed under controlled conditions. Random samples of size $n$ were generated according to different multivariate normal distributions, for several values of the dimension $d$. In particular, random samples were generated either by shifting the mean or by scaling the variances of a standard normal distribution.
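The recall and precision measures (13) and (14) of Section 4 can be computed as in the following minimal sketch. The function name and the representation of target SCs as `(start, end)` pairs are assumptions made for illustration; the counting follows the definitions in the text literally, so two hypotheses falling in the same target interval both count as correct.

```python
def precision_recall(hypotheses, targets, tol=0.0):
    """Precision and recall of hypothesized SC times against target intervals.

    hypotheses: iterable of hypothesized change times.
    targets: iterable of (start, end) target intervals.
    tol: admitted tolerance added on both sides of every target interval.
    """
    widened = [(start - tol, end + tol) for start, end in targets]
    hit = [False] * len(widened)
    n_correct = 0        # hypotheses inside some augmented target interval
    n_false = 0          # hypotheses outside every augmented target interval
    for h in hypotheses:
        inside = [j for j, (start, end) in enumerate(widened) if start <= h <= end]
        if inside:
            n_correct += 1
            for j in inside:
                hit[j] = True
        else:
            n_false += 1
    n_missed = hit.count(False)   # target intervals with no hypothesis inside
    recall = n_correct / (n_correct + n_missed) if n_correct + n_missed else 0.0
    precision = n_correct / (n_correct + n_false) if n_correct + n_false else 0.0
    return precision, recall
```

Sweeping a decision threshold and calling this function at each value gives one point of a precision/recall operating curve per threshold.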

Mean Shifting

Random samples of size $n$ = 3[...] were generated according to the following scheme:

$$x_1, \ldots, x_{n/2} \ \text{i.i.d.} \sim N_d(\mathbf{0}, I) \qquad (15)$$

$$x_{n/2+1}, \ldots, x_n \ \text{i.i.d.} \sim N_d(\mathbf{0} + \alpha\mathbf{1}, I) \qquad (16)$$

with $\alpha = 0.1, \ldots, 0.5$.

Variance Scaling

Random samples of size $n$ = 3[...] were generated according to the following scheme:

$$x_1, \ldots, x_{n/2} \ \text{i.i.d.} \sim N_d(\mathbf{0}, I) \qquad (17)$$

$$x_{n/2+1}, \ldots, x_n \ \text{i.i.d.} \sim N_d(\mathbf{0}, \alpha^2 I) \qquad (18)$$

with $\alpha = 0.5, 0.6, \ldots, 0.9$.

Experimental Conditions

The basic segmentation algorithm ($k = 1$) was applied to the above problems with a slight variation. Two-segment models (7) were only evaluated on the central third of the data set, i.e. $n/3 \le i \le 2n/3$, so that the model parameters could be computed reliably. [...] data samples were generated for each condition. Finally, since for each condition the correct model has a diagonal covariance matrix, the number of free parameters $K$ was set equal to $2d$. Performance in terms of precision/recall was computed, for each condition, by assuming transition detections correct if they fall within the interval [...]. Moreover, each method was also evaluated on homogeneous data samples generated according to a standard normal distribution. Hence, the number of false detections in equation (14) was estimated by counting the number of hypothesized transition points found in the homogeneous samples. Finally, the T2 method was only applied to detect mean shifts in the data, with a confidence level $\alpha$ = [...].

Experimental Results

Experimental results are reported in Figure 2. The three vertical plots on the left side correspond to experiments applying mean shifts, while the three plots on the right correspond to experiments applying variance scalings. Increasing values of the dimension of the data are considered going from the top to the bottom plots. Each single plot shows precision versus recall performance of each criterion, under different shifting/scaling conditions. Vertical slices correspond, going rightward, to easier segmentation tasks. According to the definition of the precision/recall measures, the best performing methods are those which are closest to the top-right corner of a slice.
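The two synthetic-data schemes of this section can be reproduced approximately with the sketch below. It is only an illustration under stated assumptions: the function names are hypothetical, the sample size is left to the caller because it is not recoverable from the text, and the second scheme takes the variance factor applied to the identity covariance as an explicit argument.

```python
import numpy as np


def mean_shift_sample(n, d, alpha, rng):
    """First half from N_d(0, I), second half with every mean component shifted by alpha."""
    half = n // 2
    left = rng.standard_normal((half, d))
    right = rng.standard_normal((n - half, d)) + alpha
    return np.vstack([left, right])


def variance_scale_sample(n, d, variance_factor, rng):
    """First half from N_d(0, I), second half from N_d(0, variance_factor * I)."""
    half = n // 2
    left = rng.standard_normal((half, d))
    right = np.sqrt(variance_factor) * rng.standard_normal((n - half, d))
    return np.vstack([left, right])


def homogeneous_sample(n, d, rng):
    """Sample with no transition, usable for counting false detections."""
    return rng.standard_normal((n, d))
```

A generator created with `np.random.default_rng(seed)` can be passed as `rng` to make the experiments repeatable.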
Looking at the two upper plots, which correspond to dimension $d = 1$, the two best performing criteria are [...] and [...]. MML follows with a lower recall, which gets closer to that of the best methods as the task gets easier. With a slightly better precision, but much lower recall, [...], T2 (just for mean shifting), and [...] follow in that order. CAICF often keeps abreast of the best methods in terms of recall, but scores much lower in terms of precision. Results change significantly at the two higher dimensions. [...] performs significantly better than [...], especially for the mean shifting case. MML worsens significantly and gets close to the best methods only in the easiest variance scaling case. T2 provides the best precision/recall trade-off on the mean shifting conditions. [...] shows a good precision/recall trade-off on both dimensions and conditions. In particular, [...] shows the highest precision on the most difficult conditions (leftmost plot slices).

6. EXPERIMENTS ON REAL DATA

Experiments with all the segmentation criteria were performed on audio data coming from a broadcast news database. The aim is to detect spectral changes within the signal that are mainly due to channel and source switches.

The IBNC Corpus

For testing purposes, data from the IBNC (Italian Broadcast News Corpus) database, developed at ITC-irst [8], were employed. The IBNC consists of 30 hours of radio news recordings, which were manually transcribed, segmented and labeled. The test set consists of six radio news programs (about [...] minutes of audio signal) that were selected as a representative sample of the whole corpus, with respect to all the issues concerning automatic broadcast news transcription [4]. Table 2 reports statistics on the segments of the test set. A segment is defined as a contiguous portion of audio signal, homogeneous in terms of acoustic source and channel.

                  #     average duration (s)
music segments    17    2.
speech segments   201   22.3

Table 2: Statistics of segments in the test set.
The test set contains a total of 212 SCs (218 segments distributed among six news programs).

Experimental Conditions

Multivariate observations of dimension 13 were used, i.e. 12 mel-scaled cepstral coefficients and the log-energy. SC detection was performed by using a tolerance value of 5ms. Multiple SC detection was performed by means of the algorithm shown in Figure 1. Moreover, in order to compute a precision/recall operating curve for each method, an empirical threshold was introduced in the decision criteria (9) and (12). In fact, the threshold can be seen as an empirically estimated additional penalty for the method. Different values of the threshold were tested and the resulting precision/recall statistics were computed.

Experimental Results

Precision vs. recall points of each method are shown in Figure 3. As a reference, complete curves are plotted for the [...] and T2 methods. The leftmost points of all the model selection criteria correspond to setting the threshold to the original value, i.e. zero. Looking at Figure 3, the following can be observed:

- straightforward application of the methods to audio data provides high recall but very low precision;
- by suitably tuning the threshold value of each single method, much better performance can be achieved;
- optimal values of the threshold make all methods, with the exception of T2, perform comparably well;
- T2 performs significantly worse than all other methods; moreover, no improvement was achieved even by using a universal pooled variance estimated as suggested in [15];
- [...], [...], and [...] are confirmed to be among the best performing methods;
- the pure empirically tuned ML method performs as well as the best model selection methods.

7. CONCLUSIONS

Several model selection methods for acoustic segmentation were presented and tested, both on synthetic and real audio data. Tangible differences among the methods appeared in experiments performed under controlled conditions. In particular, methods with simple penalty functions performed better with multivariate data. Methods based on the Fisher information (i.e. MML and CAICF) were not competitive with simpler methods, at least on the segmentation problems considered here. Application of any method to real audio data requires introducing an empirical threshold in the decision criterion. Tuning the threshold for each method yields significantly better retrieval performance. Almost all the considered methods reached very similar optimal performance. Besides, the methods which performed best on the synthetic data sets also worked well on the audio data. To conclude, a major point in applying the considered methods to audio data concerns their robustness with respect to the normality assumption on the data source. With the introduction of an empirical threshold in the decision criterion, all the tested selection criteria proved to be reasonably robust.
Future work will be devoted to the development and evaluation of non-parametric methods for the acoustic segmentation problem.

8. ACKNOWLEDGMENTS

The work presented here was carried out within the European project CORETEX (IST-1999-11876). The authors thank R. A. Baxter and D. Giuliani for their help and useful suggestions.

REFERENCES

[1] H. Akaike. On entropy maximization principle. In P. R. Krishnaiah, editor, Applications of Statistics, pages 27-41. North-Holland, Amsterdam, Netherlands, 1977.
[2] R. A. Baxter. Minimum Message Length Inference: Theory and Applications. PhD thesis, Department of Computer Science, Monash University, Clayton, Victoria, Australia, 1996.
[3] H. Bozdogan. Model selection and Akaike's information criterion (AIC): the general theory and its analytical extensions. Psychometrika, 52(3):345-370, 1987.
[4] F. Brugnara, M. Cettolo, M. Federico, and D. Giuliani. A system for the segmentation and transcription of Italian radio news. In Proceedings of RIAO, Content-Based Multimedia Information Access, Paris, France, 2000.
[5] M. Cettolo. Segmentation, classification and clustering of an Italian broadcast news corpus. In Proceedings of the RIAO International Conference, Paris, France, 2000.
[6] J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices and Groups. Springer-Verlag, Berlin, Germany, 1988.
[7] P. Delacourt, D. Kryze, and C. J. Wellekens. Speaker-based segmentation for audio data indexing. In Proceedings of the ESCA ETRW workshop Accessing Information in Spoken Audio, Cambridge, UK, 1999.
[8] M. Federico, D. Giordani, and P. Coletti. Development and Evaluation of an Italian Broadcast News Corpus. In Proceedings of the Second International Conference on Language Resources and Evaluation (LREC), Athens, Greece, 2000.
[9] D. Liu and F. Kubala. Fast speaker change detection for broadcast news transcription and indexing. In Proceedings of the 6th European Conference on Speech Communication and Technology, pages 1031-1034, Budapest, Hungary, 1999.
[10] J. Rissanen. Stochastic complexity. Journal of the Royal Statistical Society, B, 49(3):223-239, 1987.
[11] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464, 1978.
[12] G. A. F. Seber. Multivariate Observations. John Wiley & Sons, New York, NY, 1984.
[13] M. S. Srivastava and E. M. Carter. An Introduction to Applied Multivariate Statistics. North-Holland, New York, NY, 1988.
[14] C. S. Wallace and P. R. Freeman. Estimation and inference by compact coding. Journal of the Royal Statistical Society, B, 49(3):240-265, 1987.
[15] S. Wegmann, P. Zhan, and L. Gillick. Progress in broadcast news transcription at Dragon Systems. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume I, pages 33-36, Phoenix, AZ, 1999.

[Figure: six precision vs. recall panels. Left column: mean shifting, slices α = 0.1 to 0.5. Right column: variance scaling, slices α = 0.9 to 0.5. One row of panels per data dimension d.]

Figure 2: Results of experiments under controlled conditions.

[Figure: recall (vertical axis) vs. precision (horizontal axis) for the methods aic, bic, caic, caicf, mdl, ml, mml, t2.]

Figure 3: Precision vs. recall curve by different methods on an audio segmentation task.