Multiple Speaker Tracking with the Factorial von Mises-Fisher Filter
IEEE International Workshop on Machine Learning for Signal Processing, Sept 21-24, 2014, Reims, France
Johannes Traa, Paris Smaragdis
University of Illinois at Urbana-Champaign

Outline
Motivation
Sequential Bayesian Inference
Linear Dynamical System
o Kalman Filter
Directional Statistics
o von Mises-Fisher distribution
o 3D Rotations
Spherical Dynamical System
o Particle Filter
o von Mises-Fisher Filter (vMFF)
o Factorial vMFF (FvMFF)
Experiments

Motivation
Tracking on the sphere
o Human-computer interfaces with compact arrays

Motivation
Sequential Bayesian Inference
o Powerful framework for tracking
Traditional techniques
o Kalman Filter: ignores topology of sphere
o Particle Filter: computationally intensive
Proposed method
o von Mises-Fisher Filter: accurate, deterministic

Sequential Bayesian Inference
Dynamic Bayesian Network (DBN)
o State transition: $x_t \sim p(x_t \mid x_{t-1})$
o Measurement: $y_t \sim p(y_t \mid x_t)$
[Figure: DBN graph with hidden chain x_{t-1} → x_t → x_{t+1}, each state x_t emitting a measurement y_t]

Sequential Bayesian Inference
Bayesian Filtering Equations (BFE)
o Predict (convolution):
\[ p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1}) \, p(x_{t-1} \mid y_{1:t-1}) \, dx_{t-1} \]
o Correct (Bayes' rule):
\[ p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \, p(x_t \mid y_{1:t-1}) \]
[Figure: the predict step moves x_{t-1} to x_t; the correct step incorporates y_t]
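
To make the two filtering equations concrete, here is a minimal sketch on a discrete state space, where the predict integral becomes a matrix-vector product. The three-state transition matrix and likelihood values are made up for illustration and do not come from the talk.

```python
import numpy as np

def predict(prior, transition):
    """Predict step: sum over x_{t-1} of p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}).

    transition[i, j] = p(x_t = i | x_{t-1} = j), so each column sums to 1.
    """
    return transition @ prior

def correct(predicted, likelihood):
    """Correct step: multiply by p(y_t | x_t) and renormalize (Bayes' rule)."""
    unnormalized = likelihood * predicted
    return unnormalized / unnormalized.sum()

# Hypothetical 3-state example (values chosen only for illustration).
transition = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])
prior = np.array([1.0, 1.0, 1.0]) / 3.0
likelihood = np.array([0.9, 0.05, 0.05])  # p(y_t | x_t) evaluated at the observed y_t

predicted = predict(prior, transition)
posterior = correct(predicted, likelihood)
```

On a continuous state space the same two steps apply, but the sum becomes the integral above, which is why closed-form families (Normal, vMF) or particles are needed.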
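
Linear Dynamical System (LDS)
Probabilistic model
\[ x_t \sim \mathcal{N}(A x_{t-1}, \Sigma_x), \qquad y_t \sim \mathcal{N}(B x_t, \Sigma_y) \]
Kalman Filter
o Optimal on-line inference for LDS
[Figure: DBN graph with hidden chain x_{t-1} → x_t → x_{t+1}, each state x_t emitting a measurement y_t]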
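
For reference, the standard Kalman filter recursion for this model can be sketched as follows. The function names and the scalar example are illustrative assumptions; `Q` and `R` play the roles of the state and measurement noise covariances.

```python
import numpy as np

def kf_predict(m, P, A, Q):
    """Propagate the filtered mean m and covariance P through x_t = A x_{t-1} + noise."""
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    return m_pred, P_pred

def kf_correct(m_pred, P_pred, y, B, R):
    """Condition the predicted Normal on the measurement y_t = B x_t + noise."""
    S = B @ P_pred @ B.T + R                 # innovation covariance
    K = np.linalg.solve(S, B @ P_pred).T     # Kalman gain, K = P B^T S^{-1}
    m = m_pred + K @ (y - B @ m_pred)
    P = P_pred - K @ S @ K.T
    return m, P

# Hypothetical scalar example (values chosen only for illustration).
A = np.array([[1.0]])
B = np.array([[1.0]])
Q = np.array([[0.0]])
R = np.array([[1.0]])
m0, P0 = np.array([0.0]), np.array([[1.0]])

m_pred, P_pred = kf_predict(m0, P0, A, Q)
m1, P1 = kf_correct(m_pred, P_pred, np.array([2.0]), B, R)
```

With equal prior and measurement variance, the corrected mean lands halfway between the prediction and the measurement.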

Kalman Filter
KF ignores unique topology of the direction-of-arrival (DOA) manifold
o 3D state tracking
o Posterior lies off of manifold

Directional Statistics
von Mises-Fisher (vMF) distribution
o Unit sphere:
\[ S^2 = \{ x : x \in \mathbb{R}^3, \ \|x\|_2 = 1 \} \]
o Probability density function:
\[ p(x \mid \mu, \kappa) = \frac{\kappa}{4\pi \sinh(\kappa)} \, e^{\kappa x^\top \mu} \]
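
The vMF density on the unit sphere can be evaluated in log form as sketched below. Computing log sinh(κ) as κ + log(1 - e^{-2κ}) - log 2 is an implementation choice made here to avoid overflow at large κ; the function name is hypothetical.

```python
import numpy as np

def vmf_logpdf(x, mu, kappa):
    """Log density of vMF(mu, kappa) on S^2 at unit vector(s) x, shape (3,) or (n, 3)."""
    # log sinh(kappa), written so that large kappa does not overflow.
    log_sinh = kappa + np.log1p(-np.exp(-2.0 * kappa)) - np.log(2.0)
    log_norm = np.log(kappa) - np.log(4.0 * np.pi) - log_sinh
    return log_norm + kappa * (x @ mu)

mu = np.array([0.0, 0.0, 1.0])
kappa = 5.0
log_p_mode = vmf_logpdf(mu, mu, kappa)   # density is highest at x = mu
log_p_anti = vmf_logpdf(-mu, mu, kappa)  # and lowest at the antipode
```

Larger κ concentrates the mass around µ; as κ → 0 the density approaches the uniform value 1/(4π).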
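
Directional Statistics
Rotations on the sphere
o Axis (unit vector): $a$
o Angle: $\theta$
o Rotation matrix:
\[ R(a, \theta) = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix} \sin(\theta) + \left( I - a a^\top \right) \cos(\theta) + a a^\top \]
o Rotation = linear transformation:
\[ x' = R(a, \theta) \, x = R(\theta a) \, x \]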
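
The rotation matrix above (Rodrigues' formula) translates directly into code. This sketch takes the rotation vector r = θa as its single argument; the function names and the small-angle cutoff are choices made here, not taken from the slides.

```python
import numpy as np

def skew(a):
    """Cross-product matrix of a, so that skew(a) @ x equals np.cross(a, x)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def rotation_matrix(r):
    """Rotation by angle ||r|| about the axis r / ||r||."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:          # no rotation: the axis is undefined, return identity
        return np.eye(3)
    a = r / theta
    aaT = np.outer(a, a)
    return skew(a) * np.sin(theta) + (np.eye(3) - aaT) * np.cos(theta) + aaT

# A quarter turn about the z-axis applied to the x-axis.
R = rotation_matrix(np.array([0.0, 0.0, np.pi / 2]))
x_rotated = R @ np.array([1.0, 0.0, 0.0])
```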
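
Spherical Dynamical System (SDS)
Probabilistic Model
o DOA state transition:
\[ x_t \mid x_{t-1}, r_{t-1} \sim \mathrm{vMF}\left( R(r_{t-1}) \, x_{t-1}, \ \kappa_x \right) \]
o Rotation state transition:
\[ r_t \mid r_{t-1} \sim \mathcal{N}(A r_{t-1}, \Sigma_r) \]
o Measurement:
\[ y_t \mid x_t \sim \mathrm{vMF}(x_t, \kappa_y) \]
o Rotation vector: $r = \theta a$
o Full state vector: $s_t = \begin{bmatrix} x_t \\ r_t \end{bmatrix}$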
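
A short simulation may help in reading the model as a generative process. The sketch below assumes A = I and an isotropic rotation noise covariance σ_r² I, and draws vMF samples on S² by inverting the CDF of the cosine of the angle to the mean; the starting state and all parameter values are illustrative.

```python
import numpy as np

def rotation_matrix(r):
    """Rodrigues' formula for the rotation vector r = theta * a."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    a = r / theta
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    aaT = np.outer(a, a)
    return K * np.sin(theta) + (np.eye(3) - aaT) * np.cos(theta) + aaT

def sample_vmf(mu, kappa, rng):
    """Draw one sample from vMF(mu, kappa) on S^2."""
    u = rng.uniform()
    # w = cos(angle to mu), obtained by inverting its CDF (density proportional to exp(kappa * w)).
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    phi = rng.uniform(0.0, 2.0 * np.pi)
    # Orthonormal basis (e1, e2) of the tangent plane at mu.
    helper = np.array([1.0, 0.0, 0.0]) if abs(mu[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(mu, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(mu, e1)
    s = np.sqrt(max(0.0, 1.0 - w * w))
    x = s * np.cos(phi) * e1 + s * np.sin(phi) * e2 + w * mu
    return x / np.linalg.norm(x)

def simulate_sds(T, kappa_x, kappa_y, sigma_r, rng):
    """Simulate T steps of the SDS, assuming A = I and Sigma_r = sigma_r^2 I."""
    x = np.array([1.0, 0.0, 0.0])      # illustrative initial DOA
    r = np.array([0.0, 0.0, 0.05])     # illustrative initial rotation vector
    xs, ys = [], []
    for _ in range(T):
        x = sample_vmf(rotation_matrix(r) @ x, kappa_x, rng)  # uses r_{t-1}
        r = r + sigma_r * rng.standard_normal(3)              # random-walk rotation
        ys.append(sample_vmf(x, kappa_y, rng))
        xs.append(x)
    return np.array(xs), np.array(ys)

rng = np.random.default_rng(0)
xs, ys = simulate_sds(T=50, kappa_x=200.0, kappa_y=30.0, sigma_r=0.01, rng=rng)
```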

Particle Filter
Approximate inference for the SDS
o Stochastic
o Sequential variant of Monte Carlo
o Approximate BFEs with particles (weighted point estimates):
\[ S_t = \left\{ s_t^{(l)}, \ w_t^{(l)} \right\} \]
o Maintain particle representation of filtered state distribution:
\[ p(s_t \mid y_{1:t}) \approx \sum_{l=1}^{L} w_t^{(l)} \, \delta\left( s_t - s_t^{(l)} \right) \]
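
As a rough illustration of the weighted-particle representation, the sketch below reweights DOA particles by the vMF measurement likelihood and summarizes them by their weighted mean direction. It covers only the DOA part of the state, and the multinomial resampling step is a common particle-filter ingredient included here as an assumption, since the slides do not say which scheme was used.

```python
import numpy as np

def pf_correct(particles, weights, y, kappa_y):
    """Reweight particles by the vMF likelihood, which is proportional to exp(kappa_y * y^T x)."""
    log_w = np.log(weights) + kappa_y * (particles @ y)
    log_w -= log_w.max()                 # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def mean_direction(particles, weights):
    """Weighted mean of the particles, projected back onto the unit sphere."""
    m = weights @ particles
    return m / np.linalg.norm(m)

def resample(particles, weights, rng):
    """Multinomial resampling: draw L particles with probability equal to their weight."""
    L = len(weights)
    idx = rng.choice(L, size=L, p=weights)
    return particles[idx], np.full(L, 1.0 / L)

# Three illustrative particles along the coordinate axes, initially equally weighted.
particles = np.eye(3)
weights = np.full(3, 1.0 / 3.0)
y = np.array([1.0, 0.0, 0.0])

weights = pf_correct(particles, weights, y, kappa_y=30.0)
estimate = mean_direction(particles, weights)
rng = np.random.default_rng(0)
new_particles, new_weights = resample(particles, weights, rng)
```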
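
von Mises-Fisher Filter (vMFF)
Factored representation (position factor times rotation factor):
\[ p(s_{t-1} \mid y_{1:t-1}) = \mathrm{vMF}\left( x_{t-1}; \ \hat{\mu}_{t-1}, \hat{\kappa}_{t-1} \right) \, \mathcal{N}\left( r_{t-1}; \ \hat{\rho}_{t-1}, \hat{\Sigma}_{t-1} \right) \]
where $\hat{\rho}_{t-1}$ and $\hat{\Sigma}_{t-1}$ denote the mean and covariance of the rotation estimate.
Predict step
o DOA state → approximate via convolution of 2D wrapped Normals
Convolution:
\[ p(x_t \mid y_{1:t-1}) = \int_{S^2} p(x_t \mid s_{t-1}) \, p(x_{t-1} \mid y_{1:t-1}) \, dx_{t-1} \]
Mean:
\[ \hat{\mu}_t = R\left( \hat{\rho}_{t-1} \right) \hat{\mu}_{t-1} \]
Concentration:
\[ A(\hat{\kappa}_t) \approx A(\hat{\kappa}_{t-1}) \, A(\kappa_x) \, A(\cdot), \qquad A(\kappa) = \frac{1}{\tanh(\kappa)} - \frac{1}{\kappa} \]
The factor $A(\kappa_x)$ accounts for position noise; the last factor accounts for rotation noise, with an argument derived from the rotation covariance.
o Rotation state → regular KF prediction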
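
The concentration update multiplies mean resultant lengths A(κ) and then maps the product back to a concentration, which requires inverting A. The sketch below does the inversion by bisection, which is one possible choice and not necessarily the one used in the talk. The rotation-noise term is left as a hypothetical `rotation_factor` argument (default 1, i.e. no rotation noise), because its exact argument is not spelled out here.

```python
import numpy as np

def A(kappa):
    """Mean resultant length of a vMF on S^2: coth(kappa) - 1/kappa."""
    return 1.0 / np.tanh(kappa) - 1.0 / kappa

def A_inv(rbar, lo=1e-3, hi=1e6, iters=100):
    """Invert A by bisection on a log scale (A is increasing in kappa)."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if A(mid) < rbar:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

def predict_concentration(kappa_prev, kappa_x, rotation_factor=1.0):
    """Predicted concentration from A(k_t) ~ A(k_{t-1}) * A(kappa_x) * rotation_factor."""
    return A_inv(A(kappa_prev) * A(kappa_x) * rotation_factor)

# Illustrative values: the prediction is less concentrated than either input.
kappa_pred = predict_concentration(100.0, 200.0)
```

Because every factor is below 1, the predicted concentration is always smaller than the previous one, which mirrors the covariance growth in a Kalman predict step.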
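
von Mises-Fisher Filter (vMFF)
Correct step
o DOA state → closed-form solution for vMF
o Rotation state → approximate via auxiliary observation
Bayes' rule:
\[ p(r_t \mid y_{1:t}) \propto p(y_t \mid r_t) \, p(r_t \mid y_{1:t-1}) \]
The emission density $p(y_t \mid r_t)$ is not explicitly defined in the SDS.
Auxiliary observation: the rotation vector required for $\hat{\mu}_t = y_t$,
\[ y_t^r = \underbrace{\cos^{-1}\left( \hat{\mu}_{t-1}^\top y_t \right)}_{\text{degree of rotation}} \ \underbrace{\frac{\hat{\mu}_{t-1} \times y_t}{\left\| \hat{\mu}_{t-1} \times y_t \right\|_2}}_{\text{axis of rotation}} \]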
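
Both parts of the correct step can be sketched in a few lines. For the DOA state, the code assumes the closed form is the usual product of two vMF-shaped factors in x_t, whose natural parameters κµ simply add. For the rotation state, it computes the auxiliary rotation vector that carries the previous mean onto the measurement. Function names are hypothetical.

```python
import numpy as np

def vmf_correct(mu_pred, kappa_pred, y, kappa_y):
    """Posterior vMF from predicted vMF(mu_pred, kappa_pred) and likelihood vMF(y; x, kappa_y)."""
    eta = kappa_pred * mu_pred + kappa_y * y   # natural parameters add
    kappa = np.linalg.norm(eta)
    return eta / kappa, kappa

def auxiliary_rotation(mu_prev, y):
    """Rotation vector (angle times axis) that rotates mu_prev onto y."""
    axis = np.cross(mu_prev, y)
    norm = np.linalg.norm(axis)
    if norm < 1e-12:                 # parallel vectors: axis undefined, treat as no rotation
        return np.zeros(3)
    angle = np.arccos(np.clip(mu_prev @ y, -1.0, 1.0))
    return angle * axis / norm

mu_prev = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])

mu_post, kappa_post = vmf_correct(mu_prev, 10.0, y, 10.0)
y_r = auxiliary_rotation(mu_prev, y)
```

The auxiliary vector `y_r` can then serve as a measurement of the rotation state in an ordinary Kalman correct step.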

Factorial vMFF (FvMFF)
Probabilistic Data Association (PDA)
o Probabilistic assignment of observations to speakers
o EM-like de-coupling of correct steps for the speakers
o Include component to handle clutter
[Figure: observations on the sphere attributed to Source 1, Source 2, and an outlier distribution]
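
One way to picture the probabilistic assignment is as the E-step of a mixture model: each observation receives a responsibility for every speaker plus a clutter component. The sketch below assumes a uniform density 1/(4π) on the sphere for clutter and fixed mixing weights; both are illustrative assumptions, since the slides do not specify the outlier distribution or the priors.

```python
import numpy as np

def vmf_logpdf(Y, mu, kappa):
    """Log density of vMF(mu, kappa) on S^2 at the rows of Y."""
    log_sinh = kappa + np.log1p(-np.exp(-2.0 * kappa)) - np.log(2.0)
    return np.log(kappa) - np.log(4.0 * np.pi) - log_sinh + kappa * (Y @ mu)

def responsibilities(Y, mus, kappas, priors):
    """Rows: observations. Columns: [clutter, source 1, ..., source K]."""
    K = len(mus)
    log_p = np.empty((Y.shape[0], K + 1))
    log_p[:, 0] = np.log(priors[0]) - np.log(4.0 * np.pi)   # assumed uniform clutter
    for k in range(K):
        log_p[:, k + 1] = np.log(priors[k + 1]) + vmf_logpdf(Y, mus[k], kappas[k])
    log_p -= log_p.max(axis=1, keepdims=True)               # stabilize
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# Two illustrative sources and three observations; the last one is far from both.
mus = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
kappas = [50.0, 50.0]
priors = [0.1, 0.45, 0.45]
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, -1.0]])

gamma = responsibilities(Y, mus, kappas, priors)
```

Each speaker's correct step can then use the observations weighted by its own column of `gamma`, which is what decouples the per-speaker updates.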

Experiments
Accuracy
o Comparison between: 3D Kalman filter (KF); particle filter on SDS (vMFPF), 50 particles; proposed method (vMFF)
o Average angular error:
\[ E = \frac{1}{T} \sum_{t=1}^{T} \cos^{-1}\left( x_t^\top \hat{\mu}_t \right) \]
where $x_t$ is the true DOA and $\hat{\mu}_t$ is the mean of the filtered distribution.
[Figure: three panels ($\kappa_x$ = 100, 200, 500) plotting error (radians) against $\kappa_y$ from 10 to 50 for KF, vMFPF, and vMFF]
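
The error metric is simple to compute from two arrays of unit vectors. In this sketch the clipping guards against rounding pushing a dot product slightly outside [-1, 1]; the function name and example data are illustrative.

```python
import numpy as np

def average_angular_error(true_doas, estimates):
    """Mean angle (radians) between corresponding rows of two (T, 3) unit-vector arrays."""
    cosines = np.sum(true_doas * estimates, axis=1)
    return np.mean(np.arccos(np.clip(cosines, -1.0, 1.0)))

# Two time steps: a perfect estimate and one that is off by 90 degrees.
true_doas = np.array([[1.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0]])
estimates = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
error = average_angular_error(true_doas, estimates)
```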

Experiments
Number of particles and run time
o Particle filter matches vMFF performance with 150 particles
o However, it runs 60x more slowly
[Figure: error (radians) against number of particles (20, 50, 100, 200, 500) for KF, vMFPF, and vMFF at $\kappa_x = 200$, $\kappa_y = 30$; a second axis gives the vMFPF computation time as 2.8, 6.5, 12.6, 25, and 63.7 milliseconds per iteration, respectively]

Experiments
Measurement extraction
o Microphones 1, ..., M record time-domain frames $x_t^{(1)}[1 \ldots N], \ldots, x_t^{(M)}[1 \ldots N]$
o Discrete Fourier Transform gives frequency-domain frames $X^{(1)}_{1:N,t}, \ldots, X^{(M)}_{1:N,t}$
o Inter-channel features: Inter-Channel Time Differences (ITDs), indexed $1,t$ through $N/2,t$
o Map ITDs to DOAs, giving DOA measurements $y_{1,t}, \ldots, y_{N/2,t}$
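
The ITD stage of this pipeline can be sketched for a single microphone pair: the phase of the cross-spectrum at each frequency bin, divided by the angular frequency, gives a per-bin delay. This is a generic phase-difference estimator, offered as an assumption about what the inter-channel feature step involves. It skips the DC and Nyquist bins (so it returns N/2 - 1 values) and is only unambiguous while the phase stays within ±π. The subsequent ITD-to-DOA mapping depends on array geometry and is not shown.

```python
import numpy as np

def itds_from_frames(frame_a, frame_b, fs):
    """Per-bin delay (seconds) of frame_b relative to frame_a; positive if b lags a."""
    N = len(frame_a)
    Xa = np.fft.rfft(frame_a)
    Xb = np.fft.rfft(frame_b)
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    k = np.arange(1, N // 2)                       # skip DC and Nyquist bins
    phase = np.angle(Xa[k] * np.conj(Xb[k]))       # cross-spectrum phase
    return phase / (2.0 * np.pi * freqs[k])

# Illustrative test signal: white noise and a copy delayed by 30 microseconds.
fs, N, tau = 16000.0, 512, 30e-6
rng = np.random.default_rng(0)
frame_a = rng.standard_normal(N)
freqs_full = np.fft.rfftfreq(N, d=1.0 / fs)
delayed_spectrum = np.fft.rfft(frame_a) * np.exp(-2j * np.pi * freqs_full * tau)
frame_b = np.fft.irfft(delayed_spectrum, n=N)

itds = itds_from_frames(frame_a, frame_b, fs)
```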

Experiments
Multiple speaker tracking
o Square array with M = 4 microphones (side length = 2 centimeters)
o 2- to 3-second sentences from TSP corpus
o Speakers moved around array
o ~0 dB mixture
o T60 reverb time: 100 milliseconds
o Successful tracking requires gating procedure to avoid model mismatch
[Figure: true DOA trajectories and the vMFF 1 and vMFF 2 tracks]
Thank you