Juergen Gall. Analyzing Human Behavior in Video Sequences


09.10.2017, Juergen Gall, Institute of Computer Science III, Computer Vision Group

Analyzing Human Behavior
Human pose
Low-level features, e.g., gradients, optical flow
Videos

21 Actions from HMDB
HMDB51 (Kuehne et al., ICCV 2011)
928 clips, 33183 frames

Puppet Annotation

Joint-annotated HMDB (JHMDB)
[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ] [ http://jhmdb.is.tue.mpg.de ]

Study with Annotated Data (2013)
Low level: given puppet flow, about +11% over the baseline
Mid level: given puppet mask, about +9% over the baseline
High level: given joint positions (ground-truth pose features), about +20% over the baseline
Large potential gain for pose features, but not with existing 2D human pose methods
[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ] [ http://jhmdb.is.tue.mpg.de ]

CNNs for Pose Estimation
Stack CNNs:
[ S.-E. Wei et al. Convolutional Pose Machines. CVPR 2016 ]

Coupled Action Recognition and Pose Estimation
[ U. Iqbal et al. Pose for Action Action for Pose. FG 2017 ]

Pose Estimation in Videos
Video datasets for human pose in unconstrained videos do not exist.
Unconstrained means:
Publicly available content from the Internet (e.g., YouTube)
Multiple persons in a video (no assumption about position)
Arbitrary number of visible joints (truncation and occlusion)
Large scale variations (unknown scale)
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose-Track Dataset
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Joint-annotated HMDB (JHMDB)
[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ] [ http://jhmdb.is.tue.mpg.de ]

Pose-Track Dataset
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Challenge ICCV 2017
[ http://posetrack.net/workshops/iccv2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking
Estimate pose + person association over time:
Predict body joints (CNN trained on MPII Pose)
Build a graph with temporal and spatial edges
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking
Unaries: confidences of detected joints
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking
Spatial binaries: extract a square bounding box around each detection
Two cases:
Different joint type: logistic regression based on distance and orientation
Same joint type
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
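The "different joint type" case can be illustrated with a minimal sketch, assuming the logistic regression takes a normalised distance and an orientation between two detections as input. The feature layout, the function names, and the weights in the usage note are hypothetical and are not taken from the slides or the paper.

```python
import math

def pairwise_features(p, q, scale):
    """Distance and orientation features for two joint detections.

    p, q: (x, y) image positions; scale: side length of the detection box,
    used here as an assumed normaliser.
    """
    dx = (q[0] - p[0]) / scale
    dy = (q[1] - p[1]) / scale
    dist = math.hypot(dx, dy)
    angle = math.atan2(dy, dx)
    # Encode the orientation as cos/sin so the feature is continuous.
    return [dist, math.cos(angle), math.sin(angle)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def same_person_probability(p, q, scale, weights, bias):
    """Logistic regression score that two detections belong to one person."""
    feats = pairwise_features(p, q, scale)
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return sigmoid(z)
```

With illustrative weights such as `[-2.0, 0.0, 0.0]` and bias `1.0`, a nearby pair of detections scores higher than a distant pair. In practice the weights would be learned from annotated poses.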

Pose Track: Simultaneous Pose Estimation and Tracking
Temporal binaries:
Compute optical flow (DeepMatching)
Logistic regression
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking
Solve an integer linear program
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking
To obtain plausible poses, constraints are added:
Spatial transitivity
Temporal transitivity
Spatio-temporal transitivity
Spatio-temporal consistency
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
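A transitivity constraint on binary edge variables is commonly written as x_ab + x_bc - 1 <= x_ac: if a and b are joined and b and c are joined, then a and c must be joined as well. The sketch below checks that condition and, as a stand-in for an integer linear program solver, brute-forces the cheapest valid labelling on a tiny graph. It is a simplification: it uses only edge variables, and the cost convention (negative cost favours joining) and all names are assumptions for illustration.

```python
from itertools import combinations, product

def is_transitive(nodes, same):
    """same[(a, b)] with a < b is 1 if a and b are joined, else 0."""
    def s(a, b):
        return same[(min(a, b), max(a, b))]
    for a, b, c in combinations(nodes, 3):
        # x_ab + x_bc - 1 <= x_ac, checked for every rotation of the triple.
        if s(a, b) + s(b, c) - 1 > s(a, c):
            return False
        if s(a, b) + s(a, c) - 1 > s(b, c):
            return False
        if s(a, c) + s(b, c) - 1 > s(a, b):
            return False
    return True

def best_partition(nodes, cost):
    """Brute-force minimum of sum(cost[e] * x[e]) over transitive labellings.

    nodes must be sorted so that every edge key is (smaller, larger).
    Only feasible for a handful of nodes; a real system would use a solver.
    """
    edges = list(combinations(nodes, 2))
    best, best_value = None, float("inf")
    for bits in product((0, 1), repeat=len(edges)):
        same = dict(zip(edges, bits))
        if not is_transitive(nodes, same):
            continue
        value = sum(cost[e] * same[e] for e in edges)
        if value < best_value:
            best, best_value = same, value
    return best, best_value
```

For three detections with costs `{(0, 1): -1.0, (0, 2): 0.5, (1, 2): -1.0}`, joining only the two cheap edges would violate transitivity, so the search joins all three and pays the small penalty on edge (0, 2).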

Pose Track: Simultaneous Pose Estimation and Tracking
Estimate pose + person association over time:
Predict body joints (CNN trained on MPII Pose)
Build a graph with temporal and spatial edges
Partition the spatio-temporal graph
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Evaluation
Pose estimation accuracy (mAP)
Person association (MOTA)
[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
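MOTA (multiple object tracking accuracy) is commonly defined as one minus the ratio of summed false negatives, false positives and identity switches to the total number of ground-truth objects. A minimal sketch with per-frame counts follows; how the counts are obtained for body joints is not shown on the slide, so the inputs here are assumed to be given.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all terms summed over frames.

    Each argument is a sequence holding one count per frame.
    """
    total_gt = sum(num_ground_truth)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / total_gt
```

For example, two frames with five ground-truth objects each and three errors in total give a MOTA of 0.7. The score can become negative when the errors outnumber the ground-truth objects.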

Joint-annotated HMDB (JHMDB)
[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ] [ http://jhmdb.is.tue.mpg.de ]

Video Analysis for Studying the Behavior of Mice

Recurrent Neural Networks
Gated units (LSTM/GRU)

Weakly Supervised Learning
Fully supervised versus weakly supervised (transcripts)
[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

Weakly Supervised Learning
Represent an activity $a$ like spoon_powder by latent sub-activities $s_1^{(a)}, s_2^{(a)}, s_3^{(a)}, \dots$, for example $s_1^{(a)}, \dots, s_6^{(a)}$
Optimal number of sub-activities is unknown:
Many sub-activities for long activities
Few sub-activities for short activities

Model
RNN with Gated Recurrent Units (GRU)
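For reference, one update step of a standard gated recurrent unit can be sketched in plain Python. The parameter names (`Wz`, `Uz`, `bz`, and so on) are placeholders chosen for this sketch, and the gate convention used is one of two equivalent ones found in the literature. The network in the talk additionally maps the hidden state to sub-activity probabilities, which is not shown here.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def matvec(m, v):
    return [sum(w * a for w, a in zip(row, v)) for row in m]

def add(*vectors):
    return [sum(vals) for vals in zip(*vectors)]

def gru_step(x, h, p):
    """One GRU update for input x and previous hidden state h.

    p is a dict with weight matrices Wz, Uz, Wr, Ur, Wh, Uh (lists of rows)
    and bias vectors bz, br, bh.
    """
    z = sigmoid(add(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"]))  # update gate
    r = sigmoid(add(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"]))  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    pre = add(matvec(p["Wh"], x), matvec(p["Uh"], gated_h), p["bh"])
    cand = [math.tanh(a) for a in pre]  # candidate state
    # Convention used here: z near 1 moves the state towards the candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
```

With all weights and biases set to zero, both gates equal 0.5 and the candidate is zero, so the step simply halves the previous hidden state. That makes a convenient sanity check.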

Model
Hidden Markov Models (HMMs) enforce a fixed order of sub-activities: $s_1^{(a)}, s_2^{(a)}, s_3^{(a)}, \dots$
HMMs use the probabilities of the RNN as input

Model
Hidden Markov Model (HMM) for each activity
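How a fixed state order turns frame-wise probabilities into a segmentation can be sketched with a left-to-right Viterbi alignment. This is a simplified illustration, assuming one HMM state per sub-activity, uniform transition scores, and frame-wise log-probabilities that are already arranged in the required state order. It is not the exact decoding used in the cited paper.

```python
import math

def align_fixed_order(log_probs):
    """Best monotone alignment of frames to an ordered list of states.

    log_probs[t][s] is the log-probability of ordered state s at frame t.
    The path starts in state 0, ends in the last state, and at every frame
    either stays in the current state or advances to the next one.
    Returns (state index per frame, total log-score).
    """
    T, S = len(log_probs), len(log_probs[0])
    if T < S:
        raise ValueError("need at least one frame per state")
    neg = -math.inf
    score = [[neg] * S for _ in range(T)]
    advanced = [[False] * S for _ in range(T)]
    score[0][0] = log_probs[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg
            if move > stay:
                score[t][s] = move + log_probs[t][s]
                advanced[t][s] = True
            else:
                score[t][s] = stay + log_probs[t][s]
    # Backtrack from the last state at the last frame.
    path, s = [], S - 1
    for t in range(T - 1, -1, -1):
        path.append(s)
        if advanced[t][s]:
            s -= 1
    path.reverse()
    return path, score[T - 1][S - 1]
```

Given four frames in which the first two favour state 0 and the last two favour state 1, the alignment places the boundary between frames 2 and 3.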

Model
The transcripts define the order of activities
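One way to read this is that a transcript is expanded into a single ordered state sequence by concatenating the sub-activity states of each activity it lists. The sketch below assumes that reading. The helper name and the activity `take_cup` are made up for illustration; only `spoon_powder` appears in the slides.

```python
def expand_transcript(transcript, num_sub_activities):
    """Concatenate the ordered sub-activity states of each listed activity.

    transcript: activity names in the order they occur in the video.
    num_sub_activities: mapping from activity name to its number of states.
    Returns a list of (activity, sub-activity index) pairs, indices from 1.
    """
    states = []
    for activity in transcript:
        for k in range(1, num_sub_activities[activity] + 1):
            states.append((activity, k))
    return states
```

For the transcript `["take_cup", "spoon_powder"]` with two and three sub-activities respectively, the result is a sequence of five states that a left-to-right alignment can then match against the video frames.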

Weakly Supervised Learning
[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

Results
Accuracy on unseen sequences (video without transcript)
[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

Results
Accuracy on unseen sequences (video with transcript)
[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

Research Unit - Anticipating Human Behavior
[ https://pages.iai.uni-bonn.de/for2535 ]
20.09.2016, Research Unit 2535 - Anticipating Human Behavior

Thank you for your attention.