Juergen Gall. Analyzing Human Behavior in Video Sequences

Juergen Gall Analyzing Human Behavior in Video Sequences

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 2 Analyzing Human Behavior Analyzing Human Behavior Human Pose Low level features, e.g., gradients, optical flow Videos

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 3 21 Actions from HMDB HMDB51 (Kuehne et al, ICCV 2011) 928 clips, 33183 frames

Puppet Annotation 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 4

Joint-annotated HMDB (JHMDB) [ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ] [ http://jhmdb.is.tue.mpg.de ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 5

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 6 Study with Annotated Data (2013) Low Mid High baseline given puppet flow given puppet mask given joint positions baseline given flow given mask pose features GT + ~11% + ~9% + ~20% Large potential gain for pose feature Not with existing 2d human pose methods [ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ] [ http://jhmdb.is.tue.mpg.de ]

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 7 CNNs for Pose Estimation Stack CNNs: [ S.-E. Wei et al. Convolutional Pose Machines. CVPR 2016 ]

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 16 Coupled Action Recognition and Pose Estimation [ U. Iqbal et al. Pose for Action Action for Pose. FG 2017 ]

Pose Estimation in Videos Video datasets for human pose in unconstrained videos does not exist. [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 18

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 19 Pose Estimation in Videos Video datasets for human pose in unconstrained videos does not exist. Unconstrained means Public available content from the Internet (e.g. Youtube) Multiple persons in a video (no assumption about position) Arbitrary number of visible joints (truncation and occlusion) Large scale variations (unknown scale) [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose-Track Dataset [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 20

Pose-Track Dataset 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 24

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 25 Challenge ICCV 2017 [ http://posetrack.net/workshops/iccv2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking Estimate pose + person association over time: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 26

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 27 Pose Track: Simultaneous Pose Estimation and Tracking Estimate pose + person association over time: Predict body joints (CNN trained on MPII Pose)

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 28 Pose Track: Simultaneous Pose Estimation and Tracking Estimate pose + person association over time: Predict body joints (CNN trained on MPII Pose) Build a graph with temporal and spatial edges [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking f f f [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 29

Pose Track: Simultaneous Pose Estimation and Tracking Unaries: Confidences of detected joints [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 31

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 32 Pose Track: Simultaneous Pose Estimation and Tracking Spatial binaries: Extract quadratic bounding box around detection Two cases: Different joint type: Logistic regression based on distance and orientation [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 33 Pose Track: Simultaneous Pose Estimation and Tracking Spatial binaries: Extract quadratic bounding box around detection Two cases: Same joint type: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 34

Pose Track: Simultaneous Pose Estimation and Tracking Temporal binaries: Compute optical flow (DeepMatching) [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 35

Pose Track: Simultaneous Pose Estimation and Tracking Temporal binaries: Compute optical flow (DeepMatching) Logistic regression: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 36 f f

Pose Track: Simultaneous Pose Estimation and Tracking Solve integer linear program: f f f [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 37

Pose Track: Simultaneous Pose Estimation and Tracking Solve integer linear program: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 38

Pose Track: Simultaneous Pose Estimation and Tracking To obtain plausible pauses, constraints are added: Spatial transitivity: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 40

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 42 Pose Track: Simultaneous Pose Estimation and Tracking To obtain plausible pauses, constraints are added: Spatial transitivity: Temporal transitivity: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 43 Pose Track: Simultaneous Pose Estimation and Tracking To obtain plausible pauses, constraints are added: Spatial transitivity: Temporal transitivity: Spatio-temporal trans.: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 44

Pose Track: Simultaneous Pose Estimation and Tracking To obtain plausible pauses, constraints are added: Spatio-temporal consistency: [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 45

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 46 Pose Track: Simultaneous Pose Estimation and Tracking Estimate pose + person association over time: Predict body joints (CNN trained on MPII Pose) Build a graph with temporal and spatial edges Partition spatio-temporal graph [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 47

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 48 Pose Track: Evaluation Pose estimation accuracy (map) Person association (MOTA) [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 49 Pose Track: Evaluation Pose estimation accuracy (map) Person association (MOTA) [ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Video Analysis for Studying the Behavior of Mice 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 72

10/ 9 / 20 17 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 74 Recurrent Neural Networks Gated units (LSTM/GRU)

Weakly Supervised Learning Fully supervised: Weakly supervised (transcripts) [ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 75

Weakly Supervised Learning 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 77

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 78 Weakly Supervised Learning Represent an activity a like spoon_powder by latent sub-activities s 1 (a),s 2 (a),s 3 (a), s (a) 1 s (a) 2 s (a) 3 s (a) 4 s (a) 5 s (a) 6 Optimal number of sub-activities is unknown: Many sub-activities for long activities Few sub-activities for short activities

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 79 Model RNN with Gated Recurrent Units (GRU)

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 80 Model Hidden Markov Model (HMM) enforce fixed order of sub-activities: s 1 (a),s 2 (a),s 3 (a), HMMs use probabilities of RNN as input

Model Hidden Markov Model (HMM) for each activity 09.10.2017 Juergen Gall Institute of Computer Science III Computer Vision Group 81

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 82 Model The transcripts define the order of activities:

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 83 Model The transcripts define the order of activities:

09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 84 Model The transcripts define the order of activities:

Weakly Supervised Learning [ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 85

Results 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 90

Results Accuracy on unseen sequences (video without transcript) [ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 91

Results Accuracy on unseen sequences (video with transcript) [ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ] 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 93

Research Unit - Anticipating Human Behavior [ https://pages.iai.uni-bonn.de/for2535 ] 2 0. 0 9. 2 0 1 6 R e s e a r c h U n i t 2 5 3 5 - Antici p a t i n g H u m a n B e h a vi o r 94

Research Unit - Anticipating Human Behavior 2 0. 0 9. 2 0 1 6 R e s e a r c h U n i t 2 5 3 5 - Antici p a t i n g H u m a n B e h a vi o r 95

Thank you for your attention. 09. 10. 2 01 7 Juer gen Gall Instit u t e of Com puter Science III Com puter Vision Gr oup 96