A Source Localization/Separation/Respatialization System Based on Unsupervised Classification of Interaural Cues
- Regina Harvey
- 5 years ago
Transcription
1 A Source Localization/Separation/Respatialization System Based on Unsupervised Classification of Interaural Cues Joan Mouba and Sylvain Marchand SCRIME LaBRI, University of Bordeaux 1 firstname.name@labri.fr
2 Outline 1 Overview 2 Backgrounds 3 CASA-EM Methods 4 Results 5 Summary and Future Works
4 Overview
Given binaural audio mixtures, the system: detects more than 4 sources; localizes each source (azimuth); reconstructs each source.
Given a mono audio source, the system: generates a stereo source; positions the source at any location.
Based on interaural cues (ILD, ITD) and an Expectation-Maximization approach.
6 Motivation
Why? Binaural manipulation of a source in a mix; underdetermined (degenerate) case.
Applications: virtual reality, hearing aids, live music...
CASA-EM: subject independent; automatic processing; time-frequency processing.
7 Problem Statement I
Hypothesis: sources do not overlap in the t-f plane (Windowed Disjoint Orthogonality):
$S_i(l, f)\, S_j(l, f) = 0 \quad \forall\, i \neq j, \quad i, j = 1, \dots, K$
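The hypothesis can be checked numerically on two known source spectrograms. The sketch below is only an illustration: the function name `wdo_overlap` and the normalization are assumptions, not part of the slides. It returns 0 when the two STFTs never share a t-f bin and 1 when they coincide.

```python
import numpy as np

def wdo_overlap(S_i, S_j):
    """Normalized overlap of two STFTs (hypothetical helper, not from the slides).

    0.0 means the product S_i(l, f) * S_j(l, f) vanishes in every bin,
    i.e. the sources are disjoint in the t-f plane; 1.0 means full overlap.
    """
    num = np.sum(np.abs(S_i * S_j))
    den = np.sqrt(np.sum(np.abs(S_i) ** 2) * np.sum(np.abs(S_j) ** 2))
    return float(num / den) if den > 0 else 0.0
```

Real mixtures rarely reach exactly 0, so a small value is read as "approximately disjoint".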
8 Problem Statement II Consequences Detection/Localization of phantom sources Cumulate energy spreading Interferences and distortions
9 Related Works
DUET [Rickard (2002)]: computes ILD(l, f), ITD(l, f); 2-dimensional power histogram over (ITD, ILD).
[Viste (2003, 2004)]: estimates azimuth θ given interaural cues; 1-dimensional power histogram (θ).
[Avendano (2003)]: interchannel metric (panning index); separation based on a Gaussian window.
[Kameoka (2004)]: spectrum density with tied Gaussian mixture; separation of harmonic structures.
10 Head Model
ILD with shadow cast [Viste & Evangelista (2003)]: $L(\theta, f) = \alpha_f \sin\theta$
ITD with shadow cast [Viste & Evangelista (2003)]: $T(\theta, f) = \beta_f \, \dfrac{r\,(\sin\theta + \theta)}{c}$
r: head radius; c: speed of sound
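As a rough illustration of the head model, the two cues can be evaluated as follows, reading the ILD as α_f sin θ and the ITD as β_f r (sin θ + θ) / c. The default head radius and speed of sound are assumed values, and `alpha_f` and `beta_f` stand for the frequency-dependent scaling factors, whose actual values are not given in the slides.

```python
import numpy as np

HEAD_RADIUS = 0.0875     # metres, assumed typical value
SPEED_OF_SOUND = 343.0   # metres per second, assumed

def ild_model(theta, alpha_f):
    """ILD in dB for azimuth theta (radians): alpha_f * sin(theta)."""
    return alpha_f * np.sin(theta)

def itd_model(theta, beta_f, r=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """ITD in seconds for azimuth theta (radians): beta_f * r * (sin(theta) + theta) / c."""
    return beta_f * r * (np.sin(theta) + theta) / c
```

Both cues vanish at θ = 0 (source straight ahead) and grow monotonically towards ±π/2.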
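11 Source Localization
Computes interaural cues:
$\mathrm{ILD}(t, f) = 20 \log_{10} \left| \dfrac{X_R(t, f)}{X_L(t, f)} \right|$; $\mathrm{ITD}_p(t, f) = \dfrac{1}{2\pi f} \left( \angle \dfrac{X_R(t, f)}{X_L(t, f)} + 2\pi p \right)$
Computes azimuth based on ILD and ITD:
$\theta_L(t, f) = \arcsin\!\left( \dfrac{\mathrm{ILD}(t, f)}{\alpha_f} \right)$; $\theta_{T,p}(t, f) = \Pi\!\left( \dfrac{c \, \mathrm{ITD}_p(t, f)}{r \, \beta_f} \right)$
with $\Pi$ a truncated power series (terms in $x$, $x^3$, $x^5$) that inverts $\theta \mapsto \sin\theta + \theta$.
Finds p that minimizes: $\theta(t, f) = \theta_{T,m}(t, f)$ with $m = \arg\min_p \left| \theta_L(t, f) - \theta_{T,p}(t, f) \right|$
Cumulates the power in a histogram using a binary mask: $h(\theta) = \sum_f M_\theta(t, f) \left| X_L(t, f)\, X_R(t, f) \right|$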
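The localization steps can be sketched as follows, under several stated assumptions. The inverse Π is replaced by a Newton iteration on sin θ + θ = x (a swapped-in technique, not the polynomial of the slides). The cues use the right-over-left ratio. The binary mask M_θ is realized as plain histogram binning, accumulated over every bin passed in. Function names, the candidate range for p, and the default r and c are hypothetical.

```python
import numpy as np

def invert_sin_plus_theta(x, iters=30):
    """Solve sin(theta) + theta = x by Newton iteration (stand-in for Pi)."""
    x_max = 1.0 + np.pi / 2.0          # value of sin(theta) + theta at theta = pi/2
    x = np.clip(x, -x_max, x_max)      # keep the solution inside [-pi/2, pi/2]
    theta = x / 2.0                    # small-angle starting point
    for _ in range(iters):
        theta = theta - (np.sin(theta) + theta - x) / (np.cos(theta) + 1.0)
    return theta

def estimate_azimuth(X_L, X_R, freqs, alpha, beta, r=0.0875, c=343.0,
                     p_values=(-2, -1, 0, 1, 2)):
    """Per-bin azimuth; X_L, X_R have shape (F, T), freqs has shape (F,), freqs > 0."""
    eps = 1e-12                                    # guards against empty bins
    ild = 20.0 * np.log10((np.abs(X_R) + eps) / (np.abs(X_L) + eps))
    phase = np.angle(X_R * np.conj(X_L))           # angle of X_R / X_L
    theta_L = np.arcsin(np.clip(ild / alpha, -1.0, 1.0))
    f = freqs[:, None]
    # One ITD-based candidate per phase-unwrapping index p
    candidates = np.stack([
        invert_sin_plus_theta(c * (phase + 2.0 * np.pi * p) / (2.0 * np.pi * f) / (r * beta))
        for p in p_values])
    # Keep, in each bin, the candidate closest to the ILD-based estimate
    best = np.argmin(np.abs(candidates - theta_L[None]), axis=0)
    return np.take_along_axis(candidates, best[None], axis=0)[0]

def azimuth_histogram(theta, X_L, X_R, n_bins=181):
    """Power histogram h(theta): each bin adds |X_L X_R| to the cell of its azimuth."""
    h, edges = np.histogram(theta, bins=n_bins, range=(-np.pi / 2, np.pi / 2),
                            weights=np.abs(X_L * X_R))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, h
```

The ILD-based estimate is unambiguous but noisy, while the ITD-based one is more precise but ambiguous in p, which is why the former is only used to select among the latter.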
13 Source Localization/Separation
Method: build histogram h(θ); binomial smoothing and thresholding; local maxima search.
Outputs: mixture order estimate (K); locations of detected sources (θ_1, θ_2, ..., θ_K).
Example (2-source mixture): K = 6 before threshold; K = 2 after threshold.
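A minimal sketch of the smoothing, thresholding and peak search might look like this. The number of smoothing passes and the relative threshold are assumed values, since the slides do not state them, and `detect_sources` is a hypothetical name.

```python
import numpy as np

def detect_sources(h, smooth_passes=4, rel_threshold=0.1):
    """Return indices of histogram peaks and the smoothed histogram.

    Binomial smoothing is done by repeated convolution with [1, 2, 1] / 4;
    peaks below rel_threshold * max are discarded as spurious.
    """
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    s = np.asarray(h, dtype=float)
    for _ in range(smooth_passes):
        s = np.convolve(s, kernel, mode="same")
    threshold = rel_threshold * s.max()
    peaks = [i for i in range(1, len(s) - 1)
             if s[i] > s[i - 1] and s[i] >= s[i + 1] and s[i] >= threshold]
    return peaks, s
```

The number of peaks gives the mixture order estimate K, and the bin centres at those indices give the azimuths θ_1, ..., θ_K.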
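14 Gaussian Mixture Model (GMM)
Data: $\Theta = \{\theta_1, \dots, \theta_N\}$. Each source is associated with a Gaussian.
Gaussian mix: $\Gamma = \{\mu_j, \sigma_j, \pi_j \mid j = 1, \dots, K\}$: mean, standard deviation, weight for source j.
$f_K(\theta \mid \Gamma) = \sum_{j=1}^{K} \pi_j \, \phi_j(\theta \mid \gamma_j)$, fitted to $h(\theta)$, with $\sum_{j=1}^{K} \pi_j = 1$
Find Γ that best matches the data (Maximum Likelihood, Expectation Maximization).
Objective: $\Gamma^{(t+1)} = \arg\max_\Gamma \mathcal{L}(\Gamma \mid \Theta)$, with $\mathcal{L}(\Gamma^{(t+1)} \mid \Theta) \geq \mathcal{L}(\Gamma^{(t)} \mid \Theta)$.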
15 EM Updates
Example (2-order mix): table of original azimuth θ_ori, estimated azimuth θ_est and error θ_err for each source s.
E-step: $P_K(k \mid \theta, \Gamma) = \dfrac{P_K(\theta, k \mid \Gamma)}{P_K(\theta \mid \Gamma)}$
M-step:
$\pi_k \leftarrow \dfrac{\sum_\theta h(\theta)\, P_K(k \mid \theta, \Gamma)}{\sum_\theta h(\theta)}$
$\mu_k \leftarrow \dfrac{\sum_\theta h(\theta)\, \theta\, P_K(k \mid \theta, \Gamma)}{\sum_\theta h(\theta)\, P_K(k \mid \theta, \Gamma)}$
$\sigma_k^2 \leftarrow \dfrac{\sum_\theta h(\theta)\, (\theta - \mu_k)^2\, P_K(k \mid \theta, \Gamma)}{\sum_\theta h(\theta)\, P_K(k \mid \theta, \Gamma)}$
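The updates can be sketched as a histogram-weighted EM loop, in which each azimuth cell θ counts with weight h(θ). This is an illustration under assumptions: the function names, the fixed iteration count and the floor on σ are not from the slides, and the initial means would typically come from the detected histogram peaks.

```python
import numpy as np

def gaussian(theta, mu, sigma):
    """Normal density phi(theta | mu, sigma)."""
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def em_histogram_gmm(theta, h, mu, sigma, pi, n_iter=100, min_sigma=1e-3):
    """Fit a K-component GMM to a histogram h sampled at the azimuths theta."""
    theta = np.asarray(theta, dtype=float)
    h = np.asarray(h, dtype=float)
    mu, sigma, pi = (np.array(v, dtype=float) for v in (mu, sigma, pi))
    for _ in range(n_iter):
        # E-step: posterior P(k | theta, Gamma), shape (K, N)
        joint = pi[:, None] * gaussian(theta[None, :], mu[:, None], sigma[:, None])
        post = joint / np.maximum(joint.sum(axis=0, keepdims=True), 1e-300)
        # M-step: histogram-weighted updates of weight, mean and variance
        w = h[None, :] * post
        w_k = np.maximum(w.sum(axis=1), 1e-300)
        pi = w_k / h.sum()
        mu = (w * theta[None, :]).sum(axis=1) / w_k
        var = (w * (theta[None, :] - mu[:, None]) ** 2).sum(axis=1) / w_k
        sigma = np.sqrt(np.maximum(var, min_sigma ** 2))   # avoid collapsing components
    return mu, sigma, pi
```

The variance update here uses the freshly updated mean, which is the usual choice for Gaussian mixtures.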
19 Unmixing with Probabilistic t-f Mask
Philosophy: each t-f bin belongs to all K sources.
Build a probabilistic mask for each source k: $M_k(t, f) = P_K(k \mid \theta(t, f), \Gamma)$
Energy allocation according to posterior probability: $S_L(t, f) = M_k(t, f)\, X_L(t, f)$; $S_R(t, f) = M_k(t, f)\, X_R(t, f)$
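Given fitted mixture parameters and a per-bin azimuth map θ(t, f), the masks and the unmixed spectra can be sketched as below. The names `probabilistic_masks` and `unmix` are hypothetical, and the arrays are assumed to be shaped (F, T).

```python
import numpy as np

def gaussian(theta, mu, sigma):
    """Normal density phi(theta | mu, sigma)."""
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def probabilistic_masks(theta_tf, mu, sigma, pi):
    """Masks M_k(t, f) = P(k | theta(t, f), Gamma), shape (K, F, T)."""
    mu, sigma, pi = (np.asarray(v, dtype=float)[:, None, None] for v in (mu, sigma, pi))
    joint = pi * gaussian(np.asarray(theta_tf, dtype=float)[None], mu, sigma)
    return joint / np.maximum(joint.sum(axis=0, keepdims=True), 1e-300)

def unmix(X_L, X_R, masks):
    """Apply every mask to both channels; returns arrays of shape (K, F, T)."""
    return masks * X_L[None], masks * X_R[None]
```

Because the masks sum to one over k in each bin, the separated spectra add back up to the mixture.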
20 Binaural Spatialization, Method 1
$\mathrm{HRTF}_{\text{subject}}(\rho, \theta, \phi, f)$ depends on: subject, position, frequency.
CIPIC HRTF database (45 subjects) [Algazi et al (2001)].
Spatialization: $x_L = s \ast \text{mean-hrtf}_L(\theta)$; $x_R = s \ast \text{mean-hrtf}_R(\theta)$
Disk space: table of reals; interpolation not trivial...
21 Binaural Spatialization, Method 2
Processing chain: x(t), windowed by w(t) → FFT → X(t, f) → spatialization with the spatial cues ILD(θ, f) and ITD(θ, f) for azimuth θ → X_L(t, f), X_R(t, f) → IFFT + overlap-add → x_L(t), x_R(t)
$X_L(t, f) = X(t, f) \cdot 10^{-a/2}\, e^{-j\phi/2}$; $X_R(t, f) = X(t, f) \cdot 10^{+a/2}\, e^{+j\phi/2}$
with $a = \mathrm{ILD}(\theta, f) / (20\ \mathrm{dB})$ and $\phi = \mathrm{ITD}(\theta, f) \cdot 2\pi f$
Disk space: array of 202 reals; geometrical interpolation.
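The per-bin spatialization step can be sketched as follows, assuming the left channel takes the negative half-exponents and the right channel the positive ones. The STFT analysis and overlap-add resynthesis are left out, and `spatialize_bins` is a hypothetical name.

```python
import numpy as np

def spatialize_bins(X, freqs, ild_db, itd_s):
    """Split a mono spectrum X (one value per frequency) into left/right spectra.

    ild_db and itd_s are the target ILD (dB) and ITD (seconds) per frequency,
    e.g. taken from the head model for the desired azimuth.
    """
    a = np.asarray(ild_db, dtype=float) / 20.0
    phi = np.asarray(itd_s, dtype=float) * 2.0 * np.pi * np.asarray(freqs, dtype=float)
    X_L = X * 10.0 ** (-a / 2.0) * np.exp(-1j * phi / 2.0)
    X_R = X * 10.0 ** (+a / 2.0) * np.exp(+1j * phi / 2.0)
    return X_L, X_R
```

Splitting the level and phase differences symmetrically between the two channels means the ratio X_R / X_L carries exactly the requested ILD and ITD.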
23 Source Separation Results: Signals
Figure: waveforms (amplitude vs. samples, ×10^4) of xylophone (55) (top) and horn (30).
Rhythm respected; shape preserved; unmix similar to original.
24 Source Separation Results: Listening Tests
2-source mix: mix; original and unmixed eguitar (-80); original and unmixed saxo (80).
3-source mix: mix; original and unmixed piano (-30); original and unmixed xylo (0); original and unmixed trumpet (30).
Mean Opinion Score: 3 on 5 levels.
25 Source Spatialization Results
ReSPA: xylo -45, fhorn 80, saxo -30, tuba 0, eguitar -80.
Mean HRTF: xylo -45, fhorn 80, saxo -30, tuba 0, eguitar -80.
MHRTF: better lateralization. SSPA: good enough. MHRTF sounds more natural.
27 Summary and Future Works
Summary: source localization (azimuth); source separation; source spatialization.
Future works: study the localization of moving sources; implement the system in a real-time environment; improve source separation with processing inside each bin; study the brightness of spectra to weight distance; conduct further MOS listening tests for spatialization.
28 References
J. Blauert: Spatial Hearing, MIT Press.
H. Viste, G. Evangelista: Binaural Source Localization, PhD Thesis.
O. Yilmaz and S. Rickard: Blind Separation of Speech Mixtures via Time-Frequency Masking, IEEE Transactions on Signal Processing, vol. 52, no. 7, July.
V.R. Algazi, R.O. Duda, D.P. Thompson: The CIPIC HRTF Database, Proc. IEEE WASPAA01, NY.
A. Dempster, N. Laird and D. Rubin: Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
More informationMachine Learning Techniques for Computer Vision
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM
More informationEstimation of Relative Operating Characteristics of Text Independent Speaker Verification
International Journal of Engineering Science Invention Volume 1 Issue 1 December. 2012 PP.18-23 Estimation of Relative Operating Characteristics of Text Independent Speaker Verification Palivela Hema 1,
More informationSession 1: Pattern Recognition
Proc. Digital del Continguts Musicals Session 1: Pattern Recognition 1 2 3 4 5 Music Content Analysis Pattern Classification The Statistical Approach Distribution Models Singing Detection Dan Ellis
More informationGaussian Processes for Audio Feature Extraction
Gaussian Processes for Audio Feature Extraction Dr. Richard E. Turner (ret26@cam.ac.uk) Computational and Biological Learning Lab Department of Engineering University of Cambridge Machine hearing pipeline
More informationSEC: Stochastic ensemble consensus approach to unsupervised SAR sea-ice segmentation
2009 Canadian Conference on Computer and Robot Vision SEC: Stochastic ensemble consensus approach to unsupervised SAR sea-ice segmentation Alexander Wong, David A. Clausi, and Paul Fieguth Vision and Image
More informationIMPROVED MULTI-MICROPHONE NOISE REDUCTION PRESERVING BINAURAL CUES
IMPROVED MULTI-MICROPHONE NOISE REDUCTION PRESERVING BINAURAL CUES Andreas I. Koutrouvelis Richard C. Hendriks Jesper Jensen Richard Heusdens Circuits and Systems (CAS) Group, Delft University of Technology,
More informationA Comparison of Computational Precedence Models for Source Separation in Reverberant Environments
A Comparison of Computational Precedence Models for Source Separation in Reverberant Environments CHRISTOPHER HUMMERSONE, 1 AES Member, RUSSELL MASON, 1 AES Member, AND c.hummersone@surrey.ac.uk r.mason@surrey.ac.uk
More informationMinimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions
Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions Parthan Kasarapu & Lloyd Allison Monash University, Australia September 8, 25 Parthan Kasarapu
More information