From Bases to Exemplars, from Separation to Understanding
Paris Smaragdis
1 From Bases to Exemplars, from Separation to Understanding
Paris Smaragdis
CS & ECE Depts., University of Illinois at Urbana-Champaign
2 Some Motivation
- Why do we separate signals? I don't really know.
- Is there an all-conquering algorithm? I suspect not, really.
- So why are you working on both of the above, Paris? It's a good exercise for what's to come.
3 Outline
- Low-rank models: learning to listen
- Nearest subspace approaches: using data, not fancy algorithms
- Taking advantage of semantic information: explaining mixtures, not decomposing them
4 How do we deal with mixtures?
- We find coherent structure
- We can mimic humans, or we can use...
[Figure: spectrogram, frequency vs. time]
5 Learning what to separate
- We cannot make up data; we need something to learn from.
[Figure: unknown data (original speech, original interference), observed data (mixture), and training data (training speech, training interference)]
6 Describing sounds
- A probabilistic interpretation of the spectrum
- Why? We don't care for scale and phase, and it allows us to perform sophisticated reasoning.
[Figure: probability vs. frequency bin]
7 The space we deal with
- These distributions live in a simplex.
[Figure: simplex with vertices {1,0,0}, {0,1,0}, {0,0,1}]
8 The space we deal with
- These distributions live in a simplex.
[Figure: points on the simplex with axes P(f1), P(f2), P(f3), e.g. [1, 0, 0], [1/2, 1/2, 0], [0.1, 0.8, 0.1]]
9 Modeling one sound
- Use a dictionary representation: P_t(f) = Σ_z P(f|z) P_t(z)
  (input spectra = dictionary elements × weights)
- z is the index of the dictionary element
- Everything is a distribution
- We can estimate the dictionary/weights using EM
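As a rough illustration of this model, here is a minimal PLCA-style EM sketch; the data sizes and the toy spectrogram are hypothetical, not the talk's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def plca(V, n_components, n_iter=500):
    """Estimate P_t(f) ~= sum_z P(f|z) P_t(z) with EM-style
    multiplicative updates.

    V : (n_freq, n_frames) spectrogram whose columns sum to 1.
    Returns W (columns are P(f|z)) and H (columns are P_t(z)).
    """
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components))
    H = rng.random((n_components, n_frames))
    W /= W.sum(axis=0, keepdims=True)
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        R = W @ H + 1e-12                  # current reconstruction
        W *= (V / R) @ H.T                 # reweight dictionary by posterior evidence
        W /= W.sum(axis=0, keepdims=True)  # keep each P(f|z) a distribution
        R = W @ H + 1e-12
        H *= W.T @ (V / R)                 # same for the weights P_t(z)
        H /= H.sum(axis=0, keepdims=True)
    return W, H

# Toy demo: a spectrogram that truly is a 3-element dictionary mix.
W_true = rng.random((40, 3)); W_true /= W_true.sum(axis=0, keepdims=True)
H_true = rng.random((3, 60)); H_true /= H_true.sum(axis=0, keepdims=True)
V = W_true @ H_true
W, H = plca(V, 3)
```

Because everything stays normalized, the learned columns of W and H remain distributions, which is the point of the probabilistic reading of the spectrum.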
10 For the matrix inclined
- It's a linear transform: P_t(f) = Σ_z P(f|z) P_t(z)
- In matrix form: [f × t] = [f × z] · [z × t] (input spectra = dictionary elements × weights)
11 Huh?
[Figure: spectrogram, frequency bin vs. time frame]
12 Represented as frequency distributions
- Each column is normalized; each column is now P_t(f)
[Figure: normalized spectra, frequency bin vs. time frame]
13 A 2-element dictionary approximation
[Figure: P_t(f) approximated by two dictionary elements P(f|z) and their weights P_t(z)]
14 Complex sounds = large dictionaries
- Frequency distributions capture spectral character
15 Or we can see this as...
- Different areas of the simplex are different sounds
- Learned elements form convex hulls around them
[Figure: simplex with sources a and b and the dictionary elements for each]
16 Modeling mixtures
- A mix of two normed spectra lies on the connecting subspace
[Figure: two-source mix on the simplex, with and without dictionaries]
17 Huh again?
- Learn a speech dictionary and a chime dictionary from training data; keep them fixed
- On the mixture of speech and chimes, learn only the weights
[Figure: mixture of speech and chimes, extracted speech, extracted chimes]
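A sketch of this recipe, with hypothetical toy dictionaries standing in for the learned speech and chime dictionaries; the Wiener-style mask used for the final split is an assumption, since the slide does not spell out that step:

```python
import numpy as np

rng = np.random.default_rng(1)

def learn_weights(V, W, n_iter=300):
    """With the dictionaries W held fixed, learn only the weights H."""
    H = rng.random((W.shape[1], V.shape[1]))
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        R = W @ H + 1e-12
        H *= W.T @ (V / R)
        H /= H.sum(axis=0, keepdims=True)
    return H

# Hypothetical pre-learned dictionaries for two sources ("speech", "chimes").
k = 4
Wa = rng.random((30, k)); Wa /= Wa.sum(axis=0, keepdims=True)
Wb = rng.random((30, k)); Wb /= Wb.sum(axis=0, keepdims=True)

# A mixture of the two sources (columns still sum to 1).
Ha = rng.random((k, 50)); Ha /= 2 * Ha.sum(axis=0, keepdims=True)
Hb = rng.random((k, 50)); Hb /= 2 * Hb.sum(axis=0, keepdims=True)
V = Wa @ Ha + Wb @ Hb

# Explain the mixture, then split it with a Wiener-style mask.
H = learn_weights(V, np.hstack([Wa, Wb]))
Sa, Sb = Wa @ H[:k], Wb @ H[k:]
est_a = V * Sa / (Sa + Sb + 1e-12)
est_b = V * Sb / (Sa + Sb + 1e-12)
```

Because only H is estimated while W stays fixed, this inner problem is much better behaved than learning both factors at once.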
18 A problem
- Convex hulls are a bad idea; sounds can overlap
[Figure: overlapping convex hulls for sources A and B, the mixture, the estimates for A and B, and the approximation of the mixture]
19 Nearest subspace search
- Search over all possible solutions given the training data, i.e. exemplars
[Figure: training exemplars for sources x and y, and mixture m]
20 The bad news
- Very high computational complexity: M^N searches per query, for N sources and M training data points
- 8 min of audio, 5 sources, 206,719 training data points: 206,719^5 = 377,486,980,238,462,848,824,329,599 searches, for each input spectrum!
- Approximate algorithms: somewhat faster search, but unrealistic memory requirements (a few petabytes)
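The count on the slide can be checked directly:

```python
# An exhaustive nearest-subspace search over N sources with M exemplars
# each scores M**N candidate combinations for every input spectrum.
M, N = 206_719, 5
candidates = M ** N
print(f"{candidates:,}")  # 377,486,980,238,462,848,824,329,599
```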
21 Avoiding the search
- We can still use the previous model: P_t(f) = Σ_z P(f|z) P_t(z)
  (input spectra = consolidated training data × sparse weights)
- If we force the weights P_t(z) to be sparse, we approximate the nearest subspace search
22 Enforcing sparsity
- The hard way: entropic priors. We can tune each weight vector's entropy; for sparse P_t(z) we minimize its entropy. A pain to work with.
- The easy way: maximum l2-norm. Since 0 ≤ P_t(z) ≤ 1, maximizing the l2-norm results in sparsity; this corresponds to Simpson's diversity index.
- Both plug seamlessly into EM estimation.
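A tiny numeric illustration of why maximizing the l2 norm on the simplex favors sparsity (the example distributions are made up):

```python
import numpy as np

# On the simplex (0 <= p, sum(p) = 1) the l2 norm is smallest for the
# uniform distribution (1/sqrt(K)) and largest (1) when all mass sits
# on one element, so maximizing it pushes the weights toward sparsity.
def l2_norm(p):
    return float(np.sqrt(np.sum(np.asarray(p, dtype=float) ** 2)))

uniform = np.full(4, 0.25)               # maximally spread weights
mixed = np.array([0.7, 0.2, 0.1, 0.0])   # somewhat sparse
onehot = np.array([1.0, 0.0, 0.0, 0.0])  # one active dictionary element

# sum(p**2) is the quantity behind Simpson's diversity index,
# which is the connection the slide alludes to.
```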
23 Computation gains
- The proposed method is substantially faster for a realistic number of training data (> 1,000)
[Figure: computation required (FLOPS) vs. number of training samples per source, for 2- to 5-source searches]
24 How this looks
- Finds the points whose connecting subspace passes closest to the observed mixture point
[Figure: simplex plots using learned dictionaries (convex hulls) vs. using exemplars; sources A and B, mixture, estimates for A and B, and approximation of the mixture]
25 And some results
- TIMIT speech-on-speech mixes: ~20 dB SIR on average, ~30 dB SIR with post-processing
- Exemplars beat dictionaries, by a lot!
[Figure: signal-to-interference and signal-to-distortion ratios; extracted female and male speech]
26 A practical extension
- We can't know all sources, but we usually know one (target or interference)
- All mixing problems are binary: target vs. all else
- We need to learn extra bases that describe all that we don't know; a straightforward extension
[Figure: target dictionary P(f|z), observed mixtures P_t(f), implied competing sources' subspace]
27 In practice
- Selective parameter updates:
  P_t(f) = P_t(s1) Σ_z P(f|z,s1) P_t(z|s1) + P_t(s2) Σ_z P(f|z,s2) P_t(z|s2)
- Keep the known source's elements fixed and learn the weights; learn extra frequency elements that explain the other sounds
[Figure: target extraction and noise removal examples; mixture recording, target or interference example, extracted source]
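A simplified sketch of these selective updates, as a flat variant without the explicit source priors P_t(s); the data and sizes are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def explain_mixture(V, W_target, n_extra, n_iter=500):
    """Selective updates: keep the target dictionary fixed, and learn
    extra bases for the competing sounds plus all of the weights."""
    n_freq, n_frames = V.shape
    k = W_target.shape[1]
    W_extra = rng.random((n_freq, n_extra))
    W_extra /= W_extra.sum(axis=0, keepdims=True)
    H = rng.random((k + n_extra, n_frames))
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        W = np.hstack([W_target, W_extra])
        R = W @ H + 1e-12
        H *= W.T @ (V / R)                    # weights: always updated
        H /= H.sum(axis=0, keepdims=True)
        R = W @ H + 1e-12
        W_extra *= (V / R) @ H[k:].T          # only the extra bases move
        W_extra /= W_extra.sum(axis=0, keepdims=True)
    return W_extra, H

# Toy setup: a known target dictionary plus unknown interference.
Wt = rng.random((30, 4)); Wt /= Wt.sum(axis=0, keepdims=True)
Wi = rng.random((30, 3)); Wi /= Wi.sum(axis=0, keepdims=True)
Ht = rng.random((4, 40)); Ht /= 2 * Ht.sum(axis=0, keepdims=True)
Hi = rng.random((3, 40)); Hi /= 2 * Hi.sum(axis=0, keepdims=True)
V = Wt @ Ht + Wi @ Hi
W_extra, H = explain_mixture(V, Wt, 3)
```

The extra bases only need to soak up "all else"; their individual accuracy does not matter, which is exactly the point of the binary target-vs-rest framing.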
28 Fun things to do
[Audio demos: original drum loop and extracted layers (no tambourine, no congas, congas!); a remixer separating music and voice layers; selective pitch shifting of a piano + soprano recording into piano and soprano layers, then remixed]
29 More fun things
30 But the objective is not to separate!!
- Source separation is a useless pursuit; there is almost never a reason to separate
- The real holy grail: understand mixtures, don't separate them
- A harder proposition, and rather unexplored
31 Making Direct Use of Exemplars
- Polyphonic pitch tracking: a difficult mixture problem
- Some observations: it can be learned, it shouldn't be user-specified; we can adapt what we've done to do so; and we should avoid separating!
32 Mono pitch tracking by example
[Figure: training data, spectrogram f_x1(t) with pitch track p_x1(t); data to pitch track, spectrogram f_y1(t) with unknown pitch]
33 Representation and matching
- Normalized warped spectrograms: provide gain invariance, clarify harmonic structure
- Nearest-neighbor match: find the closest training spectrum, use the neighbor's pitch tag
[Figure: training spectra with training pitch tags; input spectrum matched to its nearest neighbor]
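A minimal nearest-neighbor sketch of this matching step, with random toy vectors standing in for the normalized warped spectrograms and made-up pitch tags:

```python
import numpy as np

rng = np.random.default_rng(2)

def nn_pitch_track(train_spectra, train_pitch, inputs):
    """Tag every input frame with the pitch of its nearest training frame.

    train_spectra : (n_freq, n_train) normalized training spectra
    train_pitch   : (n_train,) pitch tag per training frame
    inputs        : (n_freq, n_in) normalized input spectra
    """
    # all pairwise squared Euclidean distances, input frames x training frames
    d = ((inputs ** 2).sum(axis=0)[:, None]
         - 2.0 * inputs.T @ train_spectra
         + (train_spectra ** 2).sum(axis=0)[None, :])
    return train_pitch[np.argmin(d, axis=1)]

# Toy data: random "spectra" with random pitch tags.
train = rng.random((64, 200)); train /= train.sum(axis=0, keepdims=True)
pitch = rng.uniform(80.0, 400.0, 200)

# Inputs are lightly perturbed copies of three training frames.
idx = np.array([5, 40, 123])
test = np.abs(train[:, idx] + 1e-4 * rng.standard_normal((64, 3)))
tracked = nn_pitch_track(train, pitch, test)
```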
34 How well does that work?
- Proposed pitch tracking accuracy: error mean μ = 0.02 Hz, error deviation σ = 1.1 Hz
- With popular pitch trackers (Praat): error mean μ = 0.1 Hz, error deviation σ = 1.2 Hz
- So we're on to something, but...
35 The polyphonic case
- Nearest neighbors are insensitive to additivity, and therefore can't resolve mixture sounds
- For mixtures we have to search for the nearest subspaces. Aha! We know how to do that!
[Figure: nearest neighbor search vs. nearest subspace search, untagged input against pitch-tagged data]
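For two sources, the exhaustive nearest subspace search can be sketched as follows; unconstrained least squares is used for brevity (an assumption here, since the model would keep the combination weights non-negative), and the exemplars are toy data:

```python
import numpy as np

rng = np.random.default_rng(3)

def nearest_subspace_pair(m, A, B):
    """Exhaustive nearest-subspace search for a two-source mixture:
    for every exemplar pair (A[:, i], B[:, j]) find the least-squares
    combination closest to m, and keep the best-fitting pair."""
    best_r, best_ij = np.inf, None
    for i in range(A.shape[1]):
        for j in range(B.shape[1]):
            P = np.stack([A[:, i], B[:, j]], axis=1)
            w, *_ = np.linalg.lstsq(P, m, rcond=None)  # project m onto span
            r = np.linalg.norm(m - P @ w)              # distance to subspace
            if r < best_r:
                best_r, best_ij = r, (i, j)
    return best_ij, best_r

# Toy exemplars for two sources, and a mixture built from one known pair.
A = rng.random((20, 15))
B = rng.random((20, 15))
m = 0.6 * A[:, 4] + 0.4 * B[:, 9]
pair, resid = nearest_subspace_pair(m, A, B)
```

The two nested loops make the M^N blow-up of the earlier slide concrete: even this two-source toy already scores every exemplar pair for a single input vector.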
36 Dealing with a duet
- Training on two instruments
[Figure: training spectrograms f_x1(t), f_x2(t) with pitch tracks p_x1(t), p_x2(t); mixture f_y1(t) + f_y2(t) to pitch track]
37 Duet results
- Still works well. Pitch error stats: μ = ? Hz, σ = 2.05 Hz
[Figure: estimated pitch tracks p̂_y1(t), p̂_y2(t) vs. true pitch and Praat estimates, with misses marked]
38 A more beefy example
- Wind quintet recording: bassoon, clarinet, flute, horn, oboe
- Training data: 7m41s per source, 198,535 training vectors; removing unpitched vectors leaves 50,000 training vectors
- Test data: 1m10s of simultaneous performance, 6,000 input spectra
- Data tested as duet, trio, quartet and quintet
39 Ratio of true vs. estimated pitch
[Figure: ratio p_y / p̂_y and run time per case. Duet: σ = 89.5 Hz, t = 37 sec. Trio: t = 49 sec. Quartet: σ = 1.6 Hz, t = 60 sec. Quintet: t = 114 sec.]
40 Zooming in
- Most errors are human
- Raw method problems: occasional confusion with other instruments
- Fixes: median filtering and note averaging (correct over the majority of a note's samples)
[Figure: true vs. estimated pitch with misses, before and after smoothing]
41 What happened here?
- Input: some listening experience, and a mixture of five sounds
- Output: pitch values for each instrument (elements used), kind of instrument (elements again), amplitude of each source (presence of these elements)
- What more is there to do? No need to separate
42 Human-ish side effects
- Graceful degradation with an increasing number of sources: duets easier than trios, easier than quartets, ...
- Can pitch track pitch-less sounds: inharmonic, quasi-periodic, etc.
- The more you know, the better you do
43 A more realistic take
- Just as before, we can't know everything, but we know something
- Semi-exemplar learning: mix the exemplar model with a basis decomposition
- Applies to target/background cases, which are most of the interesting cases anyway
44 Step 1. Learn the target source
- Like before, each exemplar comes with feature labels; in this case pitch, but it can also be phoneme, stress, etc.
- We also learn temporal dynamics, i.e. how exemplars/bases appear: use a transition matrix for z, P(z_{t+1}|z_t). The same can apply to features.
[Figure: target dictionary P(f|z) and transition matrix P(z_{t+1}|z_t)]
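A toy illustration of such a transition matrix and of how it moves a distribution over dictionary elements forward in time (the values are hypothetical):

```python
import numpy as np

# P(z_{t+1} | z_t) over three dictionary elements: rows are z_t,
# columns are z_{t+1}, and every row is a distribution.
T = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])

def propagate(p, T, n_steps=1):
    """Push a distribution over elements forward through the chain."""
    p = np.asarray(p, dtype=float)
    for _ in range(n_steps):
        p = p @ T
    return p

one_step = propagate([1.0, 0.0, 0.0], T)     # leaving element 0
steady = propagate([1.0, 0.0, 0.0], T, 500)  # long-run distribution
```

In estimation, such a matrix penalizes weight trajectories that jump between elements in ways the training data never did.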
45 Step 2. Learn the rest from a mixture
- Keep the target exemplars fixed; adapt a new set of bases, while obeying the transitions
- Explain the mixture as target + rest; we don't care about the accuracy of the rest
- Use the pitch from the exemplars, same as before
[Figure: target dictionary P(f|z), transition matrix P(z_{t+1}|z_t), observed mixtures P_t(f), implied competing sources' subspace]
46 Example pitch tracking
- Results in very accurate target following, in a challenging and highly correlated case
[Figure: training data, mixture, estimated weights, and expected pitch (Hz) over time (sec)]
47 Delving deeper into temporal dynamics
- The previous model was a linear predictor of sorts: short-term effects, minimal structure
- Extending this idea to stricter models: a hidden Markov model formulation
- Can come in many flavors: Markov model selection, non-negative HMMs
48 One last example
- Structured speech mixtures: each speaker follows a language model, i.e. we hear words in sentences that make sense
- Use an HMM, of course, to encode domain structure knowledge; the structured model replaces the exemplars
49 The non-negative HMM
- A temporal model using exemplars/bases
[Figure: two HMM states with dictionaries P(f|z_1), P(f|z_2) and P(f|z_3), P(f|z_4), transitions T_{1,1}, T_{1,2}, T_{2,2}, and observed spectra X_τ(f)]
50 Non-factorial learning
- State model additivity results in decoupled chains; fast state estimation doesn't require a factorial model
- The previous extensions apply
[Figure: mixture, consolidated state dictionaries, estimated state probabilities]
51 Results on the Speech Separation Challenge
- Yes, we can separate, but we don't have to! The HMM state paths transcribe the speech
- Results are quite competitive
[Figure: results at 0 dB, -3 dB, -6 dB, -9 dB and clean conditions vs. baseline, for speakers 1 and 2]
52 My parting messages
- Don't separate! Separation algorithms are laying the foundation for mixed signal processing and analysis; treat them as such!
53 My parting messages
- Don't separate! Separation algorithms are laying the foundation for mixed signal processing and analysis; treat them as such!
- Keep separating! We're learning a ton of new things, and that's great! :)
54 References
- Smaragdis, P. Approximate nearest subspace for sound mixtures. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Prague, Czech Republic, May 2011.
- Mysore, G., P. Smaragdis, and B. Raj. Non-negative hidden Markov modeling of audio with application to source separation. In 9th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA). St. Malo, France, September 2010.
- Smaragdis, P. and B. Raj. The Markov selection model for concurrent speech recognition. In IEEE International Workshop on Machine Learning for Signal Processing (MLSP). Kittilä, Finland, August 2010.
- Smaragdis, P., M. Shashanka, and B. Raj. A sparse non-parametric approach for single channel separation of known sounds. In Neural Information Processing Systems (NIPS). Vancouver, BC, Canada, December 2009.
- Smaragdis, P. User guided audio selection from complex sound mixtures. In the 22nd ACM Symposium on User Interface Software and Technology (UIST '09). Victoria, BC, Canada, October 2009.
- Shashanka, M.V., B. Raj, and P. Smaragdis. Probabilistic latent variable models as non-negative factorizations. In special issue on Advances in Non-negative Matrix and Tensor Factorization, Computational Intelligence and Neuroscience Journal, May 2008.
- Shashanka, M.V., B. Raj, and P. Smaragdis. Sparse overcomplete latent variable decomposition of counts data. In Neural Information Processing Systems (NIPS). Vancouver, BC, Canada, December 2007.
- Smaragdis, P., B. Raj, and M.V. Shashanka. A probabilistic latent variable model for acoustic modeling. Advances in Models for Acoustic Processing Workshop, NIPS 2006.
- Smaragdis, P. Component based techniques for monophonic speech separation and recognition. In S. Makino, T-W. Lee and H. Sawada (eds.), Blind Speech Separation, Springer.
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationSingle Channel Signal Separation Using MAP-based Subspace Decomposition
Single Channel Signal Separation Using MAP-based Subspace Decomposition Gil-Jin Jang, Te-Won Lee, and Yung-Hwan Oh 1 Spoken Language Laboratory, Department of Computer Science, KAIST 373-1 Gusong-dong,
More informationBlind Machine Separation Te-Won Lee
Blind Machine Separation Te-Won Lee University of California, San Diego Institute for Neural Computation Blind Machine Separation Problem we want to solve: Single microphone blind source separation & deconvolution
More informationLatent Variable Framework for Modeling and Separating Single-Channel Acoustic Sources
COGNITIVE AND NEURAL SYSTEMS, BOSTON UNIVERSITY Boston, Massachusetts Latent Variable Framework for Modeling and Separating Single-Channel Acoustic Sources Committee Chair: Readers: Prof. Daniel Bullock
More informationInteres'ng- Phrase Mining for Ad- Hoc Text Analy'cs
Interes'ng- Phrase Mining for Ad- Hoc Text Analy'cs Klaus Berberich (kberberi@mpi- inf.mpg.de) Joint work with: Srikanta Bedathur, Jens DiIrich, Nikos Mamoulis, and Gerhard Weikum (originally presented
More informationExercises The Origin of Sound (page 515) 26.2 Sound in Air (pages ) 26.3 Media That Transmit Sound (page 517)
Exercises 26.1 The Origin of (page 515) Match each sound source with the part that vibrates. Source Vibrating Part 1. violin a. strings 2. your voice b. reed 3. saxophone c. column of air at the mouthpiece
More informationBLSTM-HMM HYBRID SYSTEM COMBINED WITH SOUND ACTIVITY DETECTION NETWORK FOR POLYPHONIC SOUND EVENT DETECTION
BLSTM-HMM HYBRID SYSTEM COMBINED WITH SOUND ACTIVITY DETECTION NETWORK FOR POLYPHONIC SOUND EVENT DETECTION Tomoki Hayashi 1, Shinji Watanabe 2, Tomoki Toda 1, Takaaki Hori 2, Jonathan Le Roux 2, Kazuya
More informationCOMS 4721: Machine Learning for Data Science Lecture 20, 4/11/2017
COMS 4721: Machine Learning for Data Science Lecture 20, 4/11/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SEQUENTIAL DATA So far, when thinking
More informationOVERLAPPING ANIMAL SOUND CLASSIFICATION USING SPARSE REPRESENTATION
OVERLAPPING ANIMAL SOUND CLASSIFICATION USING SPARSE REPRESENTATION Na Lin, Haixin Sun Xiamen University Key Laboratory of Underwater Acoustic Communication and Marine Information Technology, Ministry
More informationQuan&fying Uncertainty. Sai Ravela Massachuse7s Ins&tute of Technology
Quan&fying Uncertainty Sai Ravela Massachuse7s Ins&tute of Technology 1 the many sources of uncertainty! 2 Two days ago 3 Quan&fying Indefinite Delay 4 Finally 5 Quan&fying Indefinite Delay P(X=delay M=
More informationAuto-Encoders & Variants
Auto-Encoders & Variants 113 Auto-Encoders MLP whose target output = input Reconstruc7on=decoder(encoder(input)), input e.g. x code= latent features h encoder decoder reconstruc7on r(x) With bo?leneck,
More informationNONNEGATIVE MATRIX FACTORIZATION WITH TRANSFORM LEARNING. Dylan Fagot, Herwig Wendt and Cédric Févotte
NONNEGATIVE MATRIX FACTORIZATION WITH TRANSFORM LEARNING Dylan Fagot, Herwig Wendt and Cédric Févotte IRIT, Université de Toulouse, CNRS, Toulouse, France firstname.lastname@irit.fr ABSTRACT Traditional
More informationCOMP 562: Introduction to Machine Learning
COMP 562: Introduction to Machine Learning Lecture 20 : Support Vector Machines, Kernels Mahmoud Mostapha 1 Department of Computer Science University of North Carolina at Chapel Hill mahmoudm@cs.unc.edu
More informationDept. Electronics and Electrical Engineering, Keio University, Japan. NTT Communication Science Laboratories, NTT Corporation, Japan.
JOINT SEPARATION AND DEREVERBERATION OF REVERBERANT MIXTURES WITH DETERMINED MULTICHANNEL NON-NEGATIVE MATRIX FACTORIZATION Hideaki Kagami, Hirokazu Kameoka, Masahiro Yukawa Dept. Electronics and Electrical
More informationProc. of NCC 2010, Chennai, India
Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay
More informationMatrix Factorization for Speech Enhancement
Matrix Factorization for Speech Enhancement Peter Li Peter.Li@nyu.edu Yijun Xiao ryjxiao@nyu.edu 1 Introduction In this report, we explore techniques for speech enhancement using matrix factorization.
More informationCS 5522: Artificial Intelligence II
CS 5522: Artificial Intelligence II Particle Filters and Applications of HMMs Instructor: Wei Xu Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley.] Recap: Reasoning
More informationDeep Learning for Speech Recognition. Hung-yi Lee
Deep Learning for Speech Recognition Hung-yi Lee Outline Conventional Speech Recognition How to use Deep Learning in acoustic modeling? Why Deep Learning? Speaker Adaptation Multi-task Deep Learning New
More informationSupervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization Nasser Mohammadiha*, Student Member, IEEE, Paris Smaragdis,
More informationCompressed Sensing and Neural Networks
and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationCS 6140: Machine Learning Spring 2017
CS 6140: Machine Learning Spring 2017 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis@cs Assignment
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2010 Lecture 22: Nearest Neighbors, Kernels 4/18/2011 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein Announcements On-going: contest (optional and FUN!)
More informationNatural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 18 Maximum Entropy Models I Welcome back for the 3rd module
More informationMULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION. Hirokazu Kameoka
MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION Hiroazu Kameoa The University of Toyo / Nippon Telegraph and Telephone Corporation ABSTRACT This paper proposes a novel
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Particle Filters and Applications of HMMs Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro
More informationFounda'ons of Large- Scale Mul'media Informa'on Management and Retrieval. Lecture #3 Machine Learning. Edward Chang
Founda'ons of Large- Scale Mul'media Informa'on Management and Retrieval Lecture #3 Machine Learning Edward Y. Chang Edward Chang Founda'ons of LSMM 1 Edward Chang Foundations of LSMM 2 Machine Learning
More information