Phoneme segmentation based on spectral metrics
Xianhua Jiang, Johan Karlsson, and Tryphon T. Georgiou

Partially supported by the National Science Foundation and the Swedish Research Council.
Xianhua Jiang: Dept. of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455; jiang8@ece.umn.edu
Johan Karlsson: Department of Mathematics, Division of Optimization and Systems Theory, Royal Institute of Technology, 100 44 Stockholm, Sweden; johan.karlsson@math.kth.se
Tryphon T. Georgiou: Dept. of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455; tryphon@ece.umn.edu

Introduction

We consider the classic problem of segmenting speech signals into individual phonemes. This is a key step in current-day speech recognition applications, but can be equally well motivated based on speech coding and data compression. The mathematical problem is that of identifying time instances where a non-stationary waveform undergoes significant changes. It is customary to identify and employ a variety of features and cues for knowledge-based recognition and, accordingly, classification of sounds. There is a substantial literature on the subject, for which we refer to [1, 7, 8, 9, 10] and the references therein.

The purpose of this paper is to explore the potential of certain natural metrics for comparing power spectra. Thus, we consider signals as consisting of stationary segments where second-order statistics contain most useful information, we estimate the power spectra based on sliding windows, and we base our segmentation process on the relative distance between power spectra of neighboring windows. We utilize a (new) metric which quantifies differences in predictive qualities of two power spectra, and compare this with two other natural notions of distance (which can be motivated by analogous rationale). We also compare these with the well-known Itakura-Saito distance, which has similarities with the aforementioned metric. We present results on a particular voice signal which has been marked by a human specialist ("expert"). These data come from the American-English DARPA-TIMIT database of [2]. Interestingly, the result of our rather direct usage of metrics is very similar to the marking by the expert.

Approach

We first motivate a metric between power spectra which was introduced in [3]. Our starting point is the degradation of the variance of the prediction error when the design of the predictor is based on the wrong choice between two alternatives. To this end, let $f_0$, $f_1$ represent spectral density functions of discrete-time zero-mean stochastic processes $u_{f_i}(k)$ for $i \in \{0, 1\}$ and $k \in \mathbb{Z}$, and let $p_{f_i}(l)$ ($l \in \{1, 2, 3, \ldots\}$) represent values for the coefficients that minimize the linear prediction error variance

$$E\Big\{ \Big| u_{f_i}(0) - \sum_{l=1}^{\infty} p(l)\, u_{f_i}(-l) \Big|^2 \Big\}.$$

The optimal set of coefficients depends on the power spectral density function of the process. It is reasonable to take as a distance between $f_0$ and $f_1$ the degradation of predictive error variance when the coefficients $p(l)$ are selected assuming one of the two, and then used to predict the other process. The ratio of the degraded predictive error variance over the optimal variance is

$$\rho(f_0, f_1) := \frac{E\big\{ \big| u_{f_0}(0) - \sum_{l=1}^{\infty} p_{f_1}(l)\, u_{f_0}(-l) \big|^2 \big\}}{E\big\{ \big| u_{f_0}(0) - \sum_{l=1}^{\infty} p_{f_0}(l)\, u_{f_0}(-l) \big|^2 \big\}} \qquad (1)$$

and turns out to be

$$\rho(f_0, f_1) = \frac{\displaystyle\int_{-\pi}^{\pi} \frac{f_0(\theta)}{f_1(\theta)} \frac{d\theta}{2\pi}}{\displaystyle\exp\left( \int_{-\pi}^{\pi} \log\Big( \frac{f_0(\theta)}{f_1(\theta)} \Big) \frac{d\theta}{2\pi} \right)}, \qquad (2)$$

see [3]. The derivation is based on the Szegö-Kolmogorov theory [5, 6]. The right-hand side of (2) is the ratio of the arithmetic mean of $f_0/f_1$ over its geometric mean. Since the above ratio is greater than or equal to 1,

$$\rho(f_0, f_1) - 1 =: \gamma(f_0, f_1) \qquad (3a)$$

takes values in $[0, \infty]$ and can be used to quantify dissimilarity between the shapes of $f_0$ and $f_1$. If we consider the distance $\gamma(f, f + \Delta)$ between a nominal power spectral density $f$ and a perturbation $f + \Delta$, and eliminate cubic terms and beyond, we are led (modulo a scaling factor of 2) to the Riemannian pseudo-metric

$$g_f(\Delta) := \int_{-\pi}^{\pi} \Big( \frac{\Delta(\theta)}{f(\theta)} \Big)^2 \frac{d\theta}{2\pi} - \left( \int_{-\pi}^{\pi} \frac{\Delta(\theta)}{f(\theta)} \frac{d\theta}{2\pi} \right)^2 \qquad (4)$$

on density functions.
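The second-order relation between the degradation measure and the pseudo-metric can be checked numerically. The following Python sketch is an illustration under stated assumptions, not the authors' code: it approximates the normalized integrals by averages over a uniform frequency grid, evaluates the arithmetic-over-geometric-mean ratio for a nominal spectrum and a small perturbation of it, and compares the result with one half of the quadratic form. The grid size, the example spectrum, the perturbation direction, and all function names are hypothetical choices.

```python
import math

def grid_mean(values):
    """Approximate (1/2pi) * integral over [-pi, pi] by a uniform grid average."""
    values = list(values)
    return sum(values) / len(values)

def rho(f0, f1):
    """Arithmetic mean of f0/f1 divided by its geometric mean (always >= 1)."""
    ratio = [a / b for a, b in zip(f0, f1)]
    return grid_mean(ratio) / math.exp(grid_mean(math.log(r) for r in ratio))

def gamma(f0, f1):
    """Degradation measure rho - 1, which is zero when f1 is a scalar multiple of f0."""
    return rho(f0, f1) - 1.0

def g_metric(f, delta):
    """Quadratic form: variance of delta/f under the normalized measure."""
    x = [d / v for d, v in zip(delta, f)]
    return grid_mean(v * v for v in x) - grid_mean(x) ** 2

# Assumed example data: an AR(1)-type spectrum and a smooth perturbation.
n = 512
theta = [-math.pi + 2 * math.pi * (k + 0.5) / n for k in range(n)]
f = [1.0 / (1.25 - math.cos(t)) for t in theta]
delta = [math.cos(2 * t) + 0.3 for t in theta]

eps = 1e-3
perturbed = [v + eps * d for v, d in zip(f, delta)]
lhs = gamma(f, perturbed)                   # exact degradation measure
rhs = 0.5 * eps ** 2 * g_metric(f, delta)   # second-order approximation
print(lhs, rhs)
```

For small `eps` the two printed values should agree up to terms of order `eps` cubed, which is the sense in which the pseudo-metric captures the local behavior of the degradation measure.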
It turns out that geodesic paths $f_\tau$ ($\tau \in [0, 1]$) connecting spectral densities $f_0$, $f_1$ and having minimal length can be explicitly computed [3]. Furthermore, the length along such geodesics can also be explicitly computed in terms of the end points and becomes

$$\sqrt{ \int_{-\pi}^{\pi} \Big( \log \frac{f_1(\theta)}{f_0(\theta)} \Big)^2 \frac{d\theta}{2\pi} - \left( \int_{-\pi}^{\pi} \log \frac{f_1(\theta)}{f_0(\theta)} \frac{d\theta}{2\pi} \right)^2 }. \qquad (5)$$

It is seen that this is a "standard-deviation"-like expression of the difference $\log(f_1) - \log(f_0)$. It is interesting to observe that the logarithm maps spectra into a linear space where the $L_2$-norm relates to optimal prediction as above. Earlier distance measures, such as the Itakura-Saito distance [4], as well as other distances between the spectral logarithms, have similar qualities and will also be used in our case study below.

For the experimental component of this work, we apply the geodesic distance to speech signal analysis. We compare neighboring sliding windows and compute the distance between the corresponding power spectra. We use the following four distances:

geodesic distance: $\sqrt{ \int_{-\pi}^{\pi} \big( \log \frac{f_1(\theta)}{f_0(\theta)} \big)^2 \frac{d\theta}{2\pi} - \big( \int_{-\pi}^{\pi} \log \frac{f_1(\theta)}{f_0(\theta)} \frac{d\theta}{2\pi} \big)^2 }$
$L_2$-log distance: $\| \log(f_0) - \log(f_1) \|_2$
$L_1$-log distance: $\| \log(f_0) - \log(f_1) \|_1$
Itakura-Saito distance: $\big\| \frac{f_0}{f_1} - 1 - \log\big( \frac{f_0}{f_1} \big) \big\|_1$

The demarcation corresponding to maximal distance is taken as the separation between nearby phonemes. The segmentation is compared with that drawn by an expert, and it was often found to be remarkably accurate.

The speech signal and phoneme segmentation results are shown in Figure 1. The speech segment corresponds to the sentence "She has your dark suit in greasy wash water all year." In the top sub-figure, diamonds mark phoneme boundaries as decided by a human expert. Asterisks, circles, squares and crosses indicate the boundaries between phonemes that were obtained by comparing the geodesic distance, $L_2$-log distance, $L_1$-log distance and the Itakura-Saito distance, respectively, between spectra of neighboring windows. The remaining sub-figures are plots of the distance between such neighboring windows obtained by the respective metrics, as a function of the border between neighboring windows. Local maxima suggest boundaries between phonemes.
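The four distances used in the comparison can be sketched in a few lines of Python for spectra sampled on a common frequency grid. This is a minimal illustration, assuming that all norms are taken with respect to the normalized measure (so integrals become grid averages); the paper does not state the normalization of the norms, and a constant factor does not move the location of maxima. Function names and the example spectra are hypothetical.

```python
import math

def _mean(values):
    """Grid average, standing in for the normalized integral over frequency."""
    values = list(values)
    return sum(values) / len(values)

def geodesic_distance(f0, f1):
    """Standard deviation of log(f1/f0) across frequency."""
    d = [math.log(b / a) for a, b in zip(f0, f1)]
    var = _mean(v * v for v in d) - _mean(d) ** 2
    return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors

def l2_log_distance(f0, f1):
    """L2 norm of log(f0) - log(f1)."""
    return math.sqrt(_mean((math.log(a) - math.log(b)) ** 2 for a, b in zip(f0, f1)))

def l1_log_distance(f0, f1):
    """L1 norm of log(f0) - log(f1)."""
    return _mean(abs(math.log(a) - math.log(b)) for a, b in zip(f0, f1))

def itakura_saito_distance(f0, f1):
    """Mean of f0/f1 - 1 - log(f0/f1); every term is nonnegative."""
    return _mean(a / b - 1.0 - math.log(a / b) for a, b in zip(f0, f1))

# Assumed example: an AR(1)-type spectrum and a copy scaled by a constant gain.
n = 256
theta = [-math.pi + 2 * math.pi * (k + 0.5) / n for k in range(n)]
f0 = [1.0 / (1.25 - math.cos(t)) for t in theta]
f_scaled = [3.0 * v for v in f0]

print(geodesic_distance(f0, f_scaled))       # close to 0: insensitive to overall gain
print(l2_log_distance(f0, f_scaled))         # close to log(3)
print(itakura_saito_distance(f0, f_scaled))  # positive, and not symmetric in general
```

The example highlights one design difference: the geodesic distance compares only the shapes of the two spectra, whereas the log-norm distances and the Itakura-Saito distance also respond to a change in overall signal level.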
Because the Itakura-Saito distance fluctuates considerably, we plot it on a logarithmic scale. Figure 2 shows in more detail the part of the sentence that corresponds to "she has". We can see that the segmentation markers produced by all these methods are consistent with those drawn by the expert. The segmentation in this last figure separates phonemes corresponding to the sounds S, i, h, a, s, e.
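The overall procedure (estimate a power spectrum in each of two neighboring windows, compute a distance, and look for local maxima as the border slides along the signal) can be sketched as follows. This is only an illustration under explicit assumptions: the paper does not specify the spectral estimator, window length, or hop size, so the averaged Hann-windowed periodogram, the values `window=256`, `hop=64`, `block=32`, the detection threshold, and the synthetic two-segment test signal are all hypothetical choices.

```python
import cmath
import math
import random

def averaged_periodogram(x, block=32, floor=1e-8):
    """Average Hann-windowed periodograms over non-overlapping sub-blocks."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / block) for n in range(block)]
    bins = block // 2 + 1
    acc = [0.0] * bins
    count = 0
    for start in range(0, len(x) - block + 1, block):
        seg = [x[start + n] * hann[n] for n in range(block)]
        for k in range(bins):
            s = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / block)
                    for n in range(block))
            acc[k] += abs(s) ** 2
        count += 1
    # The small floor keeps the logarithm finite for near-empty bins.
    return [a / count + floor for a in acc]

def geodesic_distance(f0, f1):
    """Standard deviation of log(f1/f0) across frequency bins."""
    d = [math.log(b / a) for a, b in zip(f0, f1)]
    m = sum(d) / len(d)
    return math.sqrt(max(sum(v * v for v in d) / len(d) - m * m, 0.0))

def distance_curve(x, window=256, hop=64):
    """Distance between the windows left and right of each candidate border."""
    borders, dists = [], []
    for b in range(window, len(x) - window + 1, hop):
        left = averaged_periodogram(x[b - window:b])
        right = averaged_periodogram(x[b:b + window])
        borders.append(b)
        dists.append(geodesic_distance(left, right))
    return borders, dists

def local_maxima(borders, dists, threshold):
    """Borders where the distance peaks above a (hypothetical) threshold."""
    return [borders[i] for i in range(1, len(dists) - 1)
            if dists[i] > threshold
            and dists[i] >= dists[i - 1] and dists[i] > dists[i + 1]]

# Synthetic stand-in for speech: an AR(1) process whose coefficient flips sign,
# so the spectrum changes from low-pass to high-pass at sample `change`.
rng = random.Random(0)
change = 768
x, prev = [], 0.0
for n in range(1536):
    a = 0.9 if n < change else -0.9
    prev = a * prev + rng.gauss(0.0, 1.0)
    x.append(prev)

borders, dists = distance_curve(x)
best = borders[dists.index(max(dists))]
print(best, local_maxima(borders, dists, threshold=1.0))
```

On this synthetic signal the largest distance is expected near the true change point; on real speech the window length trades time resolution against the variance of the spectral estimate, and any threshold would need tuning.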
Figure 1. Speech sentence and phoneme segmentation. [Figure not reproduced; panels below the speech waveform are titled Geodesic Distance, L2-log Distance, L1-log Distance, and Itakura-Saito Distance.] In the top sub-figure, diamonds designate segmentation by an expert. Asterisks, circles, squares and crosses correspond to segmentation based on the geodesic distance, L2-log distance, L1-log distance and the Itakura-Saito distance, respectively.

Figure 2. Part of the sentence and phoneme segmentation results. [Figure not reproduced; panels are titled Geodesic Distance, L2-log Distance, L1-log Distance, and Itakura-Saito Distance.] In the top sub-figure, diamonds indicate markings by an expert. Asterisks, circles, squares and crosses are marked based on the geodesic distance, L2-log distance, L1-log distance and the Itakura-Saito distance, respectively.
Bibliography

[1] A.M. Abdelatty Ali, J. Van der Spiegel, P. Mueller, G. Haentjens, and J. Berman, An Acoustic-Phonetic feature-based system for automatic phoneme recognition in continuous speech, in Proc. ISCAS, volume 3, pages 11811, May
[2] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, DARPA TIMIT acoustic-phonetic continuous speech corpus, U.S. Department of Commerce, NIST, Gaithersburg, Md, USA, February
[3] T.T. Georgiou, Distances between power spectral densities, IEEE Trans. on Signal Processing, 55(8), 2007.
[4] R. Gray, A. Buzo, A. Gray, and Matsuyama, Distortion measures for speech processing, IEEE Trans. on Acoustics, Speech, and Signal Proc., 8(): , Aug
[5] U. Grenander and G. Szegö, Toeplitz Forms and their Applications, Chelsea.
[6] P. Stoica and R. Moses, Introduction to Spectral Analysis, Prentice Hall, 2005.
[7] D.B. Grayden and M.S. Scordilis, Phonemic segmentation of fluent speech, in Proc. ICASSP, vol. 1, I/73-I/76, Apr 199.
[8] B. Ziółko, S. Manandhar, and R.C. Wilson, Phoneme segmentation of speech, in Proc. ICPR, vol., 8-85, 2006.
[9] C.J. Weinstein, S.S. McCandless, L.F. Mondshein, and V.W. Zue, System for Acoustic-Phonetic Analysis Continuous Speech, IEEE Trans. on Acoustics, Speech, and Signal Proc., 3(1): 5-67, Feb
[10] K. Demuynck and T. Laureys, A Comparison of Different Approaches to Automatic Speech Segmentation, Proceedings of the 5th International Conference on Text, Speech and Dialogue, 77-8.
More informationRECTIFIED LINEAR UNIT CAN ASSIST GRIFFIN LIM PHASE RECOVERY. Kohei Yatabe, Yoshiki Masuyama and Yasuhiro Oikawa
RECTIFIED LINEAR UNIT CAN ASSIST GRIFFIN LIM PHASE RECOVERY Kohei Yatabe, Yoshiki Masuyama and Yasuhiro Oikawa Department of Intermedia Art and Science, Waseda University, Tokyo, Japan ABSTRACT Phase recovery
More informationEvaluation of the modified group delay feature for isolated word recognition
Evaluation of the modified group delay feature for isolated word recognition Author Alsteris, Leigh, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium on Signal Processing and
More informationSIGNALS with sparse representations can be recovered
IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 9, SEPTEMBER 2015 1497 Cramér Rao Bound for Sparse Signals Fitting the Low-Rank Model with Small Number of Parameters Mahdi Shaghaghi, Student Member, IEEE,
More informationCS 188: Artificial Intelligence Fall 2011
CS 188: Artificial Intelligence Fall 2011 Lecture 20: HMMs / Speech / ML 11/8/2011 Dan Klein UC Berkeley Today HMMs Demo bonanza! Most likely explanation queries Speech recognition A massive HMM! Details
More informationPRESENT day signal processing is firmly rooted in the
212 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 52, NO 2, FEBRUARY 2007 The Carathéodory Fejér Pisarenko Decomposition Its Multivariable Counterpart Tryphon T Georgiou, Fellow, IEEE Abstract When a covariance
More informationEIGENFILTERS FOR SIGNAL CANCELLATION. Sunil Bharitkar and Chris Kyriakakis
EIGENFILTERS FOR SIGNAL CANCELLATION Sunil Bharitkar and Chris Kyriakakis Immersive Audio Laboratory University of Southern California Los Angeles. CA 9. USA Phone:+1-13-7- Fax:+1-13-7-51, Email:ckyriak@imsc.edu.edu,bharitka@sipi.usc.edu
More informationAN ALGORITHM FOR COMPUTING FUNDAMENTAL SOLUTIONS OF DIFFERENCE OPERATORS
AN ALGORITHM FOR COMPUTING FUNDAMENTAL SOLUTIONS OF DIFFERENCE OPERATORS HENRIK BRANDÉN, AND PER SUNDQVIST Abstract We propose an FFT-based algorithm for computing fundamental solutions of difference operators
More informationRobust Speaker Identification
Robust Speaker Identification by Smarajit Bose Interdisciplinary Statistical Research Unit Indian Statistical Institute, Kolkata Joint work with Amita Pal and Ayanendranath Basu Overview } } } } } } }
More informationAdapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017
Adapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017 I am v Master Student v 6 months @ Music Technology Group, Universitat Pompeu Fabra v Deep learning for acoustic source separation v
More informationA New Approach for Computation of Timing Jitter in Phase Locked Loops
A New Approach for Computation of Timing Jitter in Phase ocked oops M M. Gourary (1), S. G. Rusakov (1), S.. Ulyanov (1), M.M. Zharov (1),.. Gullapalli (2), and B. J. Mulvaney (2) (1) IPPM, Russian Academy
More informationA Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise
334 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 11, NO 4, JULY 2003 A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Yi Hu, Student Member, IEEE, and Philipos C
More informationTHE problem of parameter estimation in linear models is
138 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 1, JANUARY 2006 Minimax MSE Estimation of Deterministic Parameters With Noise Covariance Uncertainties Yonina C. Eldar, Member, IEEE Abstract In
More informationQuantum algorithms for computing short discrete logarithms and factoring RSA integers
Quantum algorithms for computing short discrete logarithms and factoring RSA integers Martin Ekerå, Johan Håstad February, 07 Abstract In this paper we generalize the quantum algorithm for computing short
More informationNONINVASIVE temperature estimation continues to attract
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 52, NO. 2, FEBRUARY 2005 221 Noninvasive Estimation of Tissue Temperature Via High-Resolution Spectral Analysis Techniques Ali Nasiri Amini*, Student Member,
More informationAn EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments
An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments Michael I. Mandel, Daniel P. W. Ellis LabROSA, Dept. of Electrical Engineering Columbia University New York, NY {mim,dpwe}@ee.columbia.edu
More information