Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji

Size: px

Start display at page:

Download "Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji"

Martin Fleming
5 years ago
Views:

1 Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji

2 Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient annotations Large intra-class variations Complex dynamics Summary 2 / 51

3 Overview Speech Processing Computer Vision Dynamic Data Natural Language Processing Neurology 3 / 51

4 Major Tasks Analyze Dynamic Data Collection Annotation Preprocessing Modeling Analysis Modeling Provide a mathematical description of the dynamic process Analysis Prediction: Forecast future values of dynamic data Regression: Estimate a target value given dynamic data Classification: Divide dynamic data into different categories Synthesis: Generate new dynamic data 4 / 51

5 Challenges Insufficient annotation Uncertainty in data and model Intra-class variation Complex dynamics 5 / 51

6 Related Work A taxonomy of modeling techniques Directed PGM HMM LDS DBN Probabilistic approaches Undirected PGM DRF RBM Modeling dynamic data Deterministic approaches Gaussian Process Neural Networks Dynamic Feature RNN LSTM 6 / 51

7 Part 1: Dynamic Pattern Localization Problem: Temporally localize the dynamic pattern in a time series by determining its starting and ending time Irrelevant segment Irrelevant segment Inlier subsequence Applications: Brain computer interface (BCI) Speech recognition Event recognition Our Solution: Combine dynamic model (HMM) with robust estimation (RANSAC) 7 / 51

8 Methods Overview Set of Complete Sequences HMM HMM s sth subset Set of Subsequences p: Probability of selecting all inliers ε: Proportion of outliers K: Total number complete sequences 8 / 51

9 Methods Step 1: Divide complete sequences into overlapping subsequences L M + 1 L M 9 / 51

10 Methods Step 2: Randomly select a subsequence from each complete sequence to train HMM K HMM Learning: EM algorithm 10 / 51

11 Methods Step 3: Vote for the learned HMM using remaining subsequences K HMM θ Votes: 11 / 51

12 Methods Step 1: Divide complete sequences into overlapping subsequences Step 2: Randomly select a subsequence from each complete sequence to train HMM Step 3: Vote for the learned HMM using remaining subsequences Step 4: Go back to Step 2 and repeat for enough times Step 5: Find the HMM with the highest vote and use it to identify the inlier subsequences Step 6: Retrain the HMM using inlier subsequences Step 7: Identify inlier subsequences and merge to get signals of interest 12 / 51

IJCAI2013 Preprocessing & Feature Extraction: Spectrum

13 Experiments Electrocorticographic (ECoG) data Data collection Experiment protocol Wang et al. IJCAI2013 Preprocessing & Feature Extraction: Spectrum filter (70-170Hz) + Hilbert transform Stage 1 State 2 Stage 3 13 / 51

14 Experiments Data Statistics 4 subjects Each has 10 trials, each of length 800 Average hit time Classification Results: 14 / 51

15 Part 2: Dynamic Regression under Insufficient Annotation Problem: Given sequential data x 1, x T, we want to compute a regression function y t = f(x t ), for some target value y t Challenge: Only part of the sequential data are annotated Applications: Facial expression intensity estimation Sensor fault detection Part-of-speech tagging Our solution: Incorporate temporal information as additional constraints 15 / 51

16 Problem Statement Goal: Given input set with partial labels, find a regression function from x to y Example: Peak x 1, y 1 x 2 x 3 x 4 (x 5, y 5 ) y y onset offset V = 1,5 E = 1,2, 1,3, 1,4, 1,5, (2,3) 2,4, 2,5, 3,4, 3,5, (4,5) 16 / 51

17 Ordinal Support Vector Regression (OSVR) Regression model Dataset with weak labels: Optimization problem Objective = Regularization + Regression Loss + with parameters Ordinal Loss Optimization method: alternating direction method of multipliers (ADMM) 17 / 51

18 Experiments: Facial Expression Intensity Estimation Overview Dataset: PAIN Feature: Gabor features, landmark, LBP + PCA Evaluation Criteria: Pearson correlation coefficient (PCC) Intra-class correlation (ICC) Mean absolute error (MAE) 18 / 51

19 Experiments PAIN dataset Experiments under different annotation settings Partial labels: about 8.8% of the total number of frames Use of ordinal information is very helpful 19 / 51

20 Experiments PAIN dataset Comparison with state-of-the-art 20 / 51

Part 3: Classification and Synthesis under Large Intra-class Variation Challenge: Same underlying dynamic pattern can manifest significant intra-class variation

21 Part 3: Classification and Synthesis under Large Intra-class Variation Challenge: Same underlying dynamic pattern can manifest significant intra-class variation Applications: Human action recognition Human motion synthesis Speech recognition Language translation Our Solution: Hidden Semi-Markov Model Bayesian Hierarchical Modeling 21 / 51

22 Methods Hidden Markov Model (HMM) A π Z 1 Z 2 Z 3 Z T ψ X 1 X 2 X 3 X T initial transition emission 22 / 51

23 Methods Hidden Semi-Markov Model (HSMM) C D 1 D 2 D 3 D T A π Z 1 Z 2 Z 3 Z T ψ X 1 X 2 X 3 X T initial transition duration emission 23 / 51

24 Methods Bayesian Hierarchical Dynamic Model (HDM) ξ C D 1 D 2 D 3 D T η A η 0 π Z 1 Z 2 Z 3 Z T λ ψ α θ X 1 X 2 X 3 X T likelihood prior 24 / 51

25 Methods Learning: estimating hyperparameter α Overall objective Intractable Approximate objective An alternating strategy Optimization methods: MAP-EM ML 25 / 51

26 Methods Bayesian Inference: predictive likelihood Advantage: reduce overfitting and improve generalization Posterior inference: Method: Gibbs Sampling Classification: 26 / 51

27 Experiments: Action Recognition Overview: Dataset: MSR-Action3D, UTD-MHAD, G3D, Penn Feature: Joint position and motion in 3D or 2D 27 / 51

28 Experiments: Action Recognition Individual dataset: comparison with different baselines 28 / 51

29 Experiments: Action Recognition Individual dataset: comparison with different state-of-the-art 29 / 51

30 Experiments: Action Recognition Cross dataset classification accuracy A: MSR B: UTD C: G3D 30 / 51

31 Methods: Adversarial Learning Adversarial Learning: a better criterion for data synthesis Overall objective 31 / 51

32 Methods: Adversarial Inference Bayesian adversarial inference Sample both generator θ and discriminator φ Discriminator Generator α d Z, D α g X + φ X θ H + H 32 / 51

33 Methods: Adversarial Inference Bayesian adversarial inference Sample generator θ conditioned on φ Generator Z, D α g φ X θ likelihood prior H Inference method: Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) 33 / 51

34 Methods: Adversarial Inference Bayesian adversarial inference Sample discriminator φ conditioned on θ Discriminator X + α d φ X Likelihood H + H prior Inference method: Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) 34 / 51

35 Methods Bayesian adversarial inference: data synthesis Overall synthesis target: Steps: Posterior sampling Z, D α g Generate new data using all the {θ m } X θ D + M 35 / 51

36 Experiments: Motion Synthesis Dataset CMU Motion capture Berkeley Motion capture Feature: Joint angles Qualitative Results Real data Synthetic data 36 / 51

37 Experiments: Motion Synthesis Quantitative Results Metric: BLEU score: fidelity CMU Berkeley 37 / 51

38 Experiments: Motion Synthesis Quantitative Results Inception score: diversity MMD: distribution-level similarity 38 / 51

39 Part 4: Modeling Complex Dynamics Challenge: Structural dependency Long-term temporal dependency Uncertainty and large variations Application: Action recognition Our solution: Bayesian Graph Convolution LSTM 39 / 51

40 Methods Overall framework: Training data Bayesian GC-LSTM Δt Classifier GCU LSTM Testing data Discriminator 40 / 51

41 Methods GC-LSTM Overall architecture 41 / 51

k (l) D l D l+1 + b k (l) N: number of nodes D: dimension

42 Methods GC-LSTM Graph convolution: generalize convolution to arbitrary graph structured data F k N N H (l) N D l W k (l) D l D l+1 + b k (l) N: number of nodes D: dimension of each node Graph kernel Data Kernel parameters 42 / 51

43 Methods GC-LSTM Long short-term memory (LSTM) network: modeling long-term temporal dynamics Input X 1 X 2 X T Hidden state Z 1 Z 2 Z T Output y 1 y 2 y T 43 / 51

44 Methods Bayesian GC-LSTM Extend GC-LSTM to a probabilistic model: θ~p(θ α θ ) Infer the posterior distribution of parameters likelihood prior Training data X + Bayesian GC-LSTM Δt Classifier GCU LSTM 44 / 51

45 Methods Bayesian GC-LSTM Adversarial Prior: use additional discriminator to regularize parameters Intuition: promote a feature representation to be invariant of subject Training data X + Bayesian GC-LSTM Δt Classifier GCU LSTM Testing data X Discriminator 45 / 51

46 Methods Bayesian Inference Classification: 46 / 51

47 Experiments Ablation study Effect of graph convolution Effect of Bayesian inference R: random rotation N: random noise 47 / 51

48 Experiments Comparison with state-of-the-art MSR Action3D SYSU UTD MHAD 48 / 51

49 Experiments Generalization across different datasets 49 / 51

50 Thesis Summary Localization of dynamic pattern Propose a method that combines robust estimation and dynamic model for localization Dynamic pattern regression under insufficient annotation Incorporate ordinal information as addition constraints for model learning Develop an optimization algorithm for parameter estimation Dynamic pattern classification and synthesis under large intraclass variation Propose a Bayesian hierarchical model Develop two Bayesian inference algorithms Modeling complex dynamics Propose a Bayesian neural network model Develop a Bayesian inference algorithm 50 / 51

51 Thank You! Related Publications: Rui Zhao, Quan Gan, Shangfei Wang and Qiang Ji, Facial Expression Intensity Estimation Using Ordinal Information, CVPR Rui Zhao, Md Ridwan Al Iqbal, Kristin Bennett and Qiang Ji, Wind Turbine Fault Prediction Using Soft Label SVM, ICPR Rui Zhao, Gerwin Schalk, Qiang Ji, Robust Signal Identification for Dynamic Pattern Classification, ICPR Rui Zhao, Qiang Ji, An Adversarial Hierarchical Hidden Markov Model for Human Pose Modeling and Generation, AAAI Rui Zhao, Gerwin Schalk, Qiang Ji, Temporal Pattern Localization using Mixed Integer Linear Programming, ICPR Rui Zhao, Qiang Ji, An Empirical Evaluation of Bayesian Inference Methods for Bayesian Neural Networks, NIPS Workshop 2018 (To appear) Rui Zhao, Wanru Xu, Hui Su, Qiang Ji, Bayesian Hierarchical Dynamic Model for Human Action Recognition (Under review) Rui Zhao, Hui Su, Qiang Ji, Bayesian Adversarial Human Motion Synthesis (Under review) Rui Zhao, Kang Wang, Hui Su, Qiang Ji, Bayesian Graph Convolution LSTM for Skeleton based Action Recognition (Under review) 51 / 51

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous