Juergen Gall Analyzing Human Behavior in Video Sequences
09.10.2017 Juergen Gall, Institute of Computer Science III, Computer Vision Group
Analyzing Human Behavior. Diagram labels: Analyzing Human Behavior; Human Pose; Low-level features, e.g., gradients, optical flow; Videos
21 Actions from HMDB: HMDB51 (Kuehne et al., ICCV 2011), 928 clips, 33183 frames
Puppet Annotation
Joint-annotated HMDB (JHMDB) [ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ] [ http://jhmdb.is.tue.mpg.de ]
Study with Annotated Data (2013). Gain over the baseline when ground truth (GT) is given: low level, given puppet flow: +~11%; mid level, given puppet mask: +~9%; high level, given joint positions (pose features): +~20%. Large potential gain for pose features, but not with existing 2D human pose methods. [ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ] [ http://jhmdb.is.tue.mpg.de ]
CNNs for Pose Estimation. Stack CNNs: [ S.-E. Wei et al. Convolutional Pose Machines. CVPR 2016 ]
Coupled Action Recognition and Pose Estimation [ U. Iqbal et al. Pose for Action - Action for Pose. FG 2017 ]
Pose Estimation in Videos: Video datasets for human pose in unconstrained videos do not exist. Unconstrained means: publicly available content from the Internet (e.g. YouTube); multiple persons in a video (no assumption about position); arbitrary number of visible joints (truncation and occlusion); large scale variations (unknown scale). [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
Pose-Track Dataset [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
Challenge ICCV 2017 [ http://posetrack.net/workshops/iccv2017 ]
Pose Track: Simultaneous Pose Estimation and Tracking. Estimate pose + person association over time: predict body joints (CNN trained on MPII Pose); build a graph with temporal and spatial edges. [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
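The graph construction step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Detection` tuple, the `build_graph` helper, and the rule that temporal edges link only detections of the same joint type within `max_gap` frames are assumptions made for this example.

```python
from itertools import combinations
from typing import NamedTuple


class Detection(NamedTuple):
    """One joint detection (hypothetical container for this sketch)."""
    frame: int
    joint: str
    x: float
    y: float
    score: float


def build_graph(detections, max_gap=1):
    """Return (spatial_edges, temporal_edges) as lists of index pairs.

    Spatial edges connect detections in the same frame.
    Temporal edges connect detections of the same joint type whose
    frames differ by at most max_gap (an assumption of this sketch).
    """
    spatial, temporal = [], []
    for i, j in combinations(range(len(detections)), 2):
        a, b = detections[i], detections[j]
        if a.frame == b.frame:
            spatial.append((i, j))
        elif a.joint == b.joint and abs(a.frame - b.frame) <= max_gap:
            temporal.append((i, j))
    return spatial, temporal


dets = [
    Detection(0, "head", 10.0, 5.0, 0.90),
    Detection(0, "neck", 10.0, 9.0, 0.80),
    Detection(1, "head", 11.0, 5.0, 0.85),
]
spatial_edges, temporal_edges = build_graph(dets)
print(spatial_edges, temporal_edges)
```

In this toy input the two detections of frame 0 share a spatial edge, and the two head detections share a temporal edge.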
Pose Track: Simultaneous Pose Estimation and Tracking. Unaries: confidences of detected joints. [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
Pose Track: Simultaneous Pose Estimation and Tracking. Spatial binaries: extract a square bounding box around each detection. Two cases. Different joint type: logistic regression based on distance and orientation. [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
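A logistic regression on distance and orientation could look like the sketch below. The feature definition (Euclidean distance and `atan2` angle of the offset), the helper names, and the weights are hypothetical; the slide does not give the exact features or any normalization, and real weights would be learned from annotated poses.

```python
import math


def pair_features(p, q):
    """Distance and orientation of the offset from joint p to joint q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return [math.hypot(dx, dy), math.atan2(dy, dx)]


def edge_probability(p, q, weights, bias):
    """Logistic regression: probability that p and q belong to one person."""
    z = bias + sum(w * f for w, f in zip(weights, pair_features(p, q)))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters: a negative distance weight makes distant pairs less likely.
weights, bias = [-0.1, 0.0], 2.0
p_near = edge_probability((0.0, 0.0), (3.0, 4.0), weights, bias)
p_far = edge_probability((0.0, 0.0), (30.0, 40.0), weights, bias)
print(p_near > p_far)
```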
Pose Track: Simultaneous Pose Estimation and Tracking. Spatial binaries: extract a square bounding box around each detection. Two cases. Same joint type: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
Pose Track: Simultaneous Pose Estimation and Tracking. Temporal binaries: compute optical flow (DeepMatching); logistic regression: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
Pose Track: Simultaneous Pose Estimation and Tracking. Solve integer linear program: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
Pose Track: Simultaneous Pose Estimation and Tracking. To obtain plausible poses, constraints are added: spatial transitivity; temporal transitivity; spatio-temporal transitivity. [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
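The constraint formulas did not survive in this transcript. As a hedged illustration, the sketch below checks the generic transitivity condition used in graph partitioning: if detections a and b are joined and b and c are joined, then a and c must be joined as well, i.e. x_ab + x_bc - 1 <= x_ac for binary edge variables. The function name and the edge representation are assumptions for this example and not necessarily the exact formulation of the paper.

```python
from itertools import permutations


def transitivity_violations(nodes, joined):
    """List triples (a, b, c) where a-b and b-c are joined but a-c is not.

    nodes: iterable of comparable node ids.
    joined: set of frozenset({u, v}) pairs whose edge variable is 1.
    """
    def on(u, v):
        return frozenset((u, v)) in joined

    return [
        (a, b, c)
        for a, b, c in permutations(nodes, 3)
        if a < c and on(a, b) and on(b, c) and not on(a, c)
    ]


edges = {frozenset((0, 1)), frozenset((1, 2))}
print(transitivity_violations([0, 1, 2], edges))   # one violated triple
edges.add(frozenset((0, 2)))
print(transitivity_violations([0, 1, 2], edges))   # no violation left
```

The same check applies whether the three detections lie in one frame (spatial), in different frames (temporal), or in a mix of both (spatio-temporal).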
Pose Track: Simultaneous Pose Estimation and Tracking. To obtain plausible poses, constraints are added: spatio-temporal consistency. [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
Pose Track: Simultaneous Pose Estimation and Tracking. Estimate pose + person association over time: predict body joints (CNN trained on MPII Pose); build a graph with temporal and spatial edges; partition the spatio-temporal graph. [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
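Once the edge variables are decided, each cluster of the partition corresponds to one person over time. The sketch below only reads the clusters from an already solved edge labeling with a union-find structure; it does not solve the integer linear program itself, and the function name and input format are assumptions for illustration.

```python
def connected_components(num_nodes, joined_edges):
    """Group node indices into clusters given the edges labeled as joined."""
    parent = list(range(num_nodes))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in joined_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for i in range(num_nodes):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Five detections; edges (0,1), (1,2) and (3,4) were selected by the solver.
print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))
```

With the edges above, detections 0, 1, 2 form one person track and detections 3, 4 form another.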
Pose Track: Evaluation. Pose estimation accuracy (mAP); person association (MOTA). [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
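For reference, MOTA in its standard CLEAR MOT form is one minus the ratio of all errors (misses, false positives, identity switches) to the number of ground-truth objects, summed over all frames. The sketch below computes it from already accumulated counts; how detections are matched to ground truth per joint is not covered here, and the function name is chosen for this example.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multiple Object Tracking Accuracy from counts summed over all frames."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# 10 misses, 5 false positives and 5 identity switches on 100 ground-truth joints.
print(mota(10, 5, 5, 100))
```

Note that MOTA becomes negative when the number of errors exceeds the number of ground-truth objects.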
Video Analysis for Studying the Behavior of Mice
Recurrent Neural Networks: gated units (LSTM/GRU)
Weakly Supervised Learning: fully supervised vs. weakly supervised (transcripts) [ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]
Weakly Supervised Learning: Represent an activity $a$ like spoon_powder by latent sub-activities $s_1^{(a)}, s_2^{(a)}, s_3^{(a)}, \dots$ (the figure shows $s_1^{(a)}$ to $s_6^{(a)}$). The optimal number of sub-activities is unknown: many sub-activities for long activities, few sub-activities for short activities.
Model: RNN with Gated Recurrent Units (GRU)
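A single GRU update can be written out in a few lines. The sketch below uses plain Python lists and the common textbook gate equations with the convention h_new = (1 - z) * h + z * candidate; some libraries swap the roles of z and 1 - z. The parameter names (`Wz`, `Uz`, `bz`, and so on) are chosen for this example, and the network size and output layer of the actual model are not given on the slide.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gru_step(x, h, p):
    """One GRU update for input x and previous hidden state h."""
    # Update gate z and reset gate r, both in (0, 1).
    z = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    # Candidate state computed from the reset-gated previous state.
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c)
            for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], gated_h), p["bh"])]
    # Blend the previous state and the candidate with the update gate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


# Toy parameters (all zero) for input size 1 and hidden size 1.
zero_params = {name: [[0.0]] for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
zero_params.update({name: [0.0] for name in ("bz", "br", "bh")})
print(gru_step([1.0], [0.4], zero_params))
```

With all-zero parameters both gates equal 0.5 and the candidate is 0, so the hidden state is simply halved.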
Model: A Hidden Markov Model (HMM) enforces a fixed order of sub-activities $s_1^{(a)}, s_2^{(a)}, s_3^{(a)}, \dots$; the HMMs use the probabilities of the RNN as input.
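Enforcing a fixed order amounts to a left-to-right alignment of frames to states. The sketch below is a simplified Viterbi pass under stated assumptions: it takes frame-wise probabilities (standing in for the RNN outputs), lets each frame either stay in the current state or advance to the next one, requires every state to be visited, and ignores transition probabilities and length models that a full HMM would include. The function name is hypothetical.

```python
import math


def align_fixed_order(probs):
    """Best monotone assignment of frames to states 0..K-1 in fixed order.

    probs[t][k] is the probability of state k at frame t.
    Returns one state index per frame.
    """
    num_frames, num_states = len(probs), len(probs[0])
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = math.log(probs[0][0])  # the sequence must start in state 0
    for t in range(1, num_frames):
        for k in range(num_states):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else neg_inf
            if move > stay:
                best, back[t][k] = move, k - 1
            else:
                best, back[t][k] = stay, k
            score[t][k] = best + math.log(probs[t][k])
    # Backtrack from the last state at the last frame.
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


frame_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9], [0.2, 0.8]]
print(align_fixed_order(frame_probs))
```

For the toy input, the first two frames are assigned to the first sub-activity and the remaining three to the second.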
Model: Hidden Markov Model (HMM) for each activity
Model: The transcripts define the order of activities:
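One way to read this: the transcript is expanded into a single ordered state sequence by concatenating the sub-activity states of each activity it lists. The sketch below illustrates that idea; the activity name take_cup and the sub-activity counts are hypothetical values chosen for the example, and only spoon_powder appears in the slides.

```python
def expand_transcript(transcript, num_sub_activities):
    """Concatenate the ordered sub-activity states of each activity.

    transcript: ordered list of activity names for one video.
    num_sub_activities: mapping from activity name to its number of states.
    Returns a list of (activity, sub_activity_index) pairs.
    """
    return [
        (activity, k)
        for activity in transcript
        for k in range(num_sub_activities[activity])
    ]


# Hypothetical transcript and state counts.
counts = {"take_cup": 2, "spoon_powder": 3}
print(expand_transcript(["take_cup", "spoon_powder"], counts))
```

The resulting sequence fixes the order in which states may occur in the video, so that only the frame boundaries between them remain to be estimated.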
Weakly Supervised Learning [ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]
Results
Results: Accuracy on unseen sequences (video without transcript) [ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]
Results: Accuracy on unseen sequences (video with transcript) [ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]
Research Unit - Anticipating Human Behavior [ https://pages.iai.uni-bonn.de/for2535 ]
20.09.2016 Research Unit 2535 - Anticipating Human Behavior
Thank you for your attention.