Pictorial Structures Revisited: People Detection and Articulated Pose Estimation. Department of Computer Science TU Darmstadt

Similar documents
Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities

Discriminative part-based models. Many slides based on P. Felzenszwalb

Graphical Object Models for Detection and Tracking

The state of the art and beyond

End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

Face detection and recognition. Detection Recognition Sally

DISCRIMINATIVE DECORELATION FOR CLUSTERING AND CLASSIFICATION

Detecting Humans via Their Pose

Boosting: Algorithms and Applications

CS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines

A Discriminatively Trained, Multiscale, Deformable Part Model

Social Saliency Prediction. Hyun Soo Park and Jianbo Shi

Fisher Vector image representation

Compressed Fisher vectors for LSVR

Object Detection Grammars

Probabilistic Graphical Models & Applications

Fantope Regularization in Metric Learning

Detector Ensemble. Abstract. 1. Introduction

Machine Learning Lecture 5

Beyond Spatial Pyramids

Visual Object Detection

BugID: Rapid-Throughput Insect Population Counting Using Computer Vision

Boosting. CAP5610: Machine Learning Instructor: Guo-Jun Qi

Human Pose Tracking I: Basics. David Fleet University of Toronto

Asaf Bar Zvi Adi Hayat. Semantic Segmentation

Introduction to Discriminative Machine Learning

Lecture 13 Visual recognition

Properties of detectors Edge detectors Harris DoG Properties of descriptors SIFT HOG Shape context

CS5670: Computer Vision

Stochastic Gradient Descent

FPGA Implementation of a HOG-based Pedestrian Recognition System

Recitation 9. Gradient Boosting. Brett Bernstein. March 30, CDS at NYU. Brett Bernstein (CDS at NYU) Recitation 9 March 30, / 14

Achieving scale covariance

Structured Determinantal Point Processes

Part 4: Conditional Random Fields

CS 3710: Visual Recognition Describing Images with Features. Adriana Kovashka Department of Computer Science January 8, 2015

Composite Quantization for Approximate Nearest Neighbor Search

Hidden Markov Models

Fast Human Detection from Videos Using Covariance Features

10-701/ Machine Learning - Midterm Exam, Fall 2010

Accumulated Stability Voting: A Robust Descriptor from Descriptors of Multiple Scales

Robotics 2 AdaBoost for People and Place Detection

Learning with Structured Inputs and Outputs

A Generative Perspective on MRFs in Low-Level Vision Supplemental Material

Introduction to Computer Vision

Naïve Bayes classification

INTEREST POINTS AT DIFFERENT SCALES

Grouplet: A Structured Image Representation for Recognizing Human and Object Interactions

Randomized Ensemble Tracking

INTRODUCTION HIERARCHY OF CLASSIFIERS

Kai Yu NEC Laboratories America, Cupertino, California, USA

CS 231A Section 1: Linear Algebra & Probability Review

LoG Blob Finding and Scale. Scale Selection. Blobs (and scale selection) Achieving scale covariance. Blob detection in 2D. Blob detection in 2D

CS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang

Human Action Recognition under Log-Euclidean Riemannian Metric

Modeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop

38 1 Vol. 38, No ACTA AUTOMATICA SINICA January, Bag-of-phrases.. Image Representation Using Bag-of-phrases

Clustering with k-means and Gaussian mixture distributions

Machine Learning Lecture 2

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Wavelet-based Salient Points with Scale Information for Classification

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Introduction To Graphical Models

Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lab 12: Structured Prediction

Introduction to Machine Learning Midterm, Tues April 8

Machine Learning Lecture 2

CSE 546 Midterm Exam, Fall 2014

Max-Margin Additive Classifiers for Detection

Probabilistic Graphical Models

Bayesian Support Vector Machines for Feature Ranking and Selection

Variational Bayesian Logistic Regression

An overview of Boosting. Yoav Freund UCSD

Representing Images Detecting faces in images

Lecture 12. Neural Networks Bastian Leibe RWTH Aachen

Discriminative Learning and Big Data

Machine Learning 4. week

Shape Outlier Detection Using Pose Preserving Dynamic Shape Models

Kernel Density Topic Models: Visual Topics Without Visual Words

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

Ensemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

COMPUTER vision and machine intelligence have been

2D Image Processing Face Detection and Recognition

Introduction. Chapter 1

Final Exam, Machine Learning, Spring 2009

Hidden Markov Models

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley

Multiclass Classification-1

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

VBM683 Machine Learning

Loss Functions and Optimization. Lecture 3-1

ECE 5424: Introduction to Machine Learning

CS340 Winter 2010: HW3 Out Wed. 2nd February, due Friday 11th February

Lecture 9: PGM Learning

CMU-Q Lecture 24:

Transcription:

Pictorial Structures Revisited: People Detection and Articulated Pose Estimation Mykhaylo Andriluka Stefan Roth Bernt Schiele Department of Computer Science TU Darmstadt

Generic model for human detection and pose estimation Human pose estimation [Felzenszwalb&Huttenlocher, ICCV 5], [Ren et al., ICCV 5], [Sigal&Black, CVPR 6], [Zhang et al., CVPR 6], [Jiang&Marin, CVPR 8], [Ramanan, NIPS 6], [Ferrari et al., CVPR 8], [Ferrari et al., CVPR 9] often rather simple appearance model focus on finding optimal assembly of parts People Detection [Viola et al., ICCV 3], [Dalal&Triggs, CVPR 5], [Leibe et al., CVPR 5], [Andriluka et al., CVPR 8] complex appearance model no pose model or limited to walking motion 2

Generic model for human detection and pose estimation Human pose estimation [Felzenszwalb&Huttenlocher, ICCV 5], [Ren et al., ICCV 5], [Sigal&Black, CVPR 6], [Zhang et al., CVPR 6], [Jiang&Marin, CVPR 8], [Ramanan, NIPS 6], [Ferrari et al., CVPR 8], [Ferrari et al., CVPR 9] often rather simple appearance model focus on finding optimal assembly of parts People Detection [Viola et al., ICCV 3], [Dalal&Triggs, CVPR 5], [Leibe et al., CVPR 5], [Andriluka et al., CVPR 8] complex appearance model no pose model or limited to walking motion 3

Can we make pictorial structures model effective for these tasks? [Fischler&Elschlager, 1973] 4

Can we make pictorial structures model effective for these tasks? Yes... if the model components are chosen right. 5

Pictorial Structures Model Body is represented as flexible configuration of body parts L L = {l, l 1,..., l N } - configuration of parts D = {d, d 1,..., d N } - part evidence posterior over body poses d i p(l D) p(d L)p(L) likelihood of observations prior on body poses 6

Pictorial Structures Model Pictorial structures allow exact and efficient inference. - tree-structured prior - independent part appearance model - discretized part locations - Gaussian pairwise part relationships posterior marginals sumproduct BP p(l i D) L\l i p(l D) l 9 l 5 l 6 l 8 l 1 l 7 l 2 l 3 l 1 l 4 7

Can we make pictorial structures model effective for these tasks? So... what are the right components? 8

Model Components Appearance Model: orientation 1 likelihood of part 1 Prior and Inference: estimated pose. Local Features... AdaBoost 6 4 2 2 4 6 8 1 5 5... orientation K... likelihood of part N part posteriors.... 9

Model Components Appearance Model: orientation 1 likelihood of part 1 Prior and Inference: estimated pose. Local Features... AdaBoost 6 4 2 2 4 6 8 1 5 5... orientation K... likelihood of part N part posteriors.... 1

Likelihood Model Build on recent advances in object detection: state-of-the-art image descriptor: Shape Context [Belongie et al., PAMI 2; Mikolajczyk&Schmid, PAMI 5] dense representation discriminative model: AdaBoost classifier for each body part - Shape Context: 96 dimensions (4 angular, 3 radial, 8 gradient orientations) - Feature Vector: concatenate the descriptors inside part bounding box - head: 432 dimensions - torso: 8448 dimensions 11

Likelihood Model Part likelihood derived from the boosting score: decision stump weight p(d i l i ) = max part location ( decision stump output t α i,th t (x(l i )) t α i,t, ε ) small constant to deal with part occlusions 12

Likelihood Model Input image Head Torso Upper leg Our part likelihoods.... [Ramanan, NIPS 6] 13

Likelihood Model Input image Head Torso Upper leg Our part likelihoods.... [Ramanan, NIPS 6] 14

Likelihood Model Input image Head Torso Upper leg Our part likelihoods.... [Ramanan, NIPS 6] 15

Model Components Appearance Model: orientation 1 likelihood of part 1 Prior and Inference: estimated pose. Local Features... AdaBoost 6 4 2 2 4 6 8 1 5 5... orientation K... likelihood of part N part posteriors.... 16

Kinematic Tree Prior Represent pairwise part relations [Felzenszwalb & Huttenlocher, IJCV 5] 5 5 p(l) =p(l ) p(l i l j ), 4 3 2 1 (i,j) E p(l 2 l 1 )=N(T 12 (l 2 ) T 21 (l 1 ), Σ 12 1 ) 2 3 4 part locations relative to the joint 5 5 5 4 3 2 1 1 2 3 4 transformed part locations 5 5 5 l 2 l 1 1 8 5 4 5 4 l 2 4 2 2 1 2 1 6 3 3 l 1 2 4 1 2 + 1 2 6 3 3 8 4 4 1 5 1 5 5 5 1 5 5 5 5 5 17

Kinematic Tree Prior Prior parameters: {T ij, Σ ij } Parameters of the prior are estimated with maximum likelihood mean pose several independent samples 6 6 6 6 4 4 4 4 2 2 2 2 2 2 2 4 4 4 6 8 6 8 6 8 2 1 12 8 6 4 2 2 4 6 8 1 12 8 6 4 2 2 4 6 8 1 12 8 6 4 2 2 4 6 8 4 6 4 6 4 6 4 2 2 2 6 2 2 2 8 4 6 4 6 4 6 1 8 1 8 1 8 1 5 5 12 12 12 8 6 4 2 2 4 6 8 8 6 4 2 2 4 6 8 8 6 4 2 2 4 6 8 igure 2. (left) Kinematic prior learned on the multi-view and 18

Evaluation Scenarios 1. Human Pose Estimation People dataset [Ramanan, NIPS 6] 2. Upper-body Pose Estimation Buffy dataset [Ferrari et al., CVPR 8] 3. Pedestrian Detection TUD Pedestrians dataset [Andriluka et al., CVPR 8] 19

Evaluation Scenarios 1. Human Pose Estimation People dataset [Ramanan, NIPS 6] 2. Upper-body Pose Estimation Buffy dataset [Ferrari et al., CVPR 8] 3. Pedestrian Detection TUD Pedestrians dataset [Andriluka et al., CVPR 8] 2

Scenario 1: Qualitative Results Our model (g) 8/1 (a) 8/1 (d) 7/1 [Ramanan, NIPS 6] 7/1 /1 3/1 Our model (i) 8/1 (l) 6/1 (k) 7/1 [Ramanan, NIPS 6] 4/1 3/1 The numbers on the left of 3/1 21

Scenario 1: Quantitative Results Method [Ramanan, NIPS 6] 2nd parse Our inference, edge features from [Ramanan, NIPS 6] Our part detectors (SC) Our prior, our part detectors (SC) Our prior, our part detectors (SIFT) Torso Upper legs Lower legs Upper arm Forearm Head Total 52 3 29 17 13 37 27 63 48 37 26 2 45 37 29 12 18 3 4 4 14 81 63 55 47 31 75 55 78 58 54 44 31 66 52 22

Scenario 1: Quantitative Results Method [Ramanan, NIPS 6] 2nd parse Our prior, edge features from [Ramanan, NIPS 6] Our part detectors (SC) Our prior, our part detectors (SC) Our prior, our part detectors (SIFT) Torso Upper legs Lower legs Upper arm Forearm Head Total 52 3 29 17 13 37 27 63 48 37 26 2 45 37 29 12 18 3 4 4 14 81 63 55 47 31 75 55 78 58 54 44 31 66 52 23

Scenario 1: Quantitative Results Method [Ramanan, NIPS 6] 2nd parse Our inference, edge features from [Ramanan, NIPS 6] Our part detectors (SC) Our prior, our part detectors (SC) Our prior, our part detectors (SIFT) Torso Upper legs Lower legs Upper arm Forearm Head Total 52 3 29 17 13 37 27 63 48 37 26 2 45 37 29 12 18 3 4 4 14 81 63 55 47 31 75 55 78 58 54 44 31 66 52 SC = Shape Context 24

Scenario 1: Quantitative Results Method [Ramanan, NIPS 6] 2nd parse Our inference, edge features from [Ramanan, NIPS 6] Our part detectors (SC) Our prior, our part detectors (SC) Our prior, our part detectors (SIFT) Torso Upper legs Lower legs Upper arm Forearm Head Total 52 3 29 17 13 37 27 63 48 37 26 2 45 37 29 12 18 3 4 4 14 81 63 55 47 31 75 55 78 58 54 44 31 66 52 SC = Shape Context 25

Scenario 1: Quantitative Results Method [Ramanan, NIPS 6] 2nd parse Our inference, edge features from [Ramanan, NIPS 6] Our part detectors (SC) Our prior, our part detectors (SC) Our prior, our part detectors (SIFT) Torso Upper legs Lower legs Upper arm Forearm Head Total 52 3 29 17 13 37 27 63 48 37 26 2 45 37 29 12 18 3 4 4 14 81 63 55 47 31 75 55 78 58 54 44 31 66 52 SC = Shape Context 26

Evaluation Scenarios 1. Human Pose Estimation People dataset [Ramanan, NIPS 6] 2. Upper-body Pose Estimation Buffy dataset [Ferrari et al., CVPR 8] 3. Pedestrian Detection TUD Pedestrians dataset [Andriluka et al., CVPR 8] 27

Estimated upper-body poses 28

Quantitative Results Method Torso Upper arm Lower arm Head Total [Ferrari et al. CVPR 8] - - - - 57.9 detectors only 18.9 6.8 3.1 47.2 14.3 full model 9.7 79.3 41.2 95.9 71.3 6 4 2 2 - generic model - prior and appearance learned on the People dataset 4 6 8 1 5 5 29

Quantitative Results Method Torso Upper arm Lower arm Head Total [Ferrari et al. CVPR 8] - - - - 57.9 detectors only 18.9 6.8 3.1 47.2 14.3 full model 9.7 79.3 41.2 95.9 71.3 [Ferrari et al. CVPR 9] - - - - 72.2 6 4 2 2 - generic model - prior and appearance learned on the People dataset 4 6 8 1 5 5 3

Quantitative Results Method Torso Upper arm Lower arm Head Total [Ferrari et al. CVPR 8] - - - - 57.9 detectors only 18.9 6.8 3.1 47.2 14.3 full model 9.7 79.3 41.2 95.9 71.3 [Ferrari et al. CVPR 9] - - - - 72.2 full model, Buffy pose prior 9.7 81.35 46.5 95.5 73.5 1 5 - specialized upper body prior - appearance learned on the People dataset 5 1 5 5 31

Typical Failure Cases Foreshortening Part occlusion Detections on other body parts 32

Evaluation Scenarios 1. Human Pose Estimation People dataset [Ramanan, NIPS 6] 2. Upper-body Pose Estimation Buffy dataset [Ferrari et al., CVPR 8] 3. Pedestrian Detection TUD Pedestrians dataset [Andriluka et al., CVPR 8] 33

People Detection: Results Comparison with state-of-the art in people detection 1 6 8 [Andriluka et al., CVPR 8] 4 6 2 4 This work 2 2 4 2 6 4 8 6 1 8 12 1 14 5 5 5 5 1..9.8 recall.7.6.5.4.3 our model, 8 parts, tree prior our model, 8 parts, tree prior our model, 8 parts, star prior our model, 8 parts, star prior partism [Andriluka etdetector al., CVPR 8] HOG (INRIA Training HOG (INRIA Training Set) Set).2.1...1.2.3.4.5.6 1-precision.7.8.9 1. 34

Conclusion & Future Work Success of pose estimation by body part detection use well understood pose estimation framework (Pictorial Structures) use appropriate representation for kinematic dependencies use state of the art appearance representation (SIFT, SC) and classification (AdaBoost) Next steps: estimate poses in 3D part occlusions appearance constraints between parts 35

Thanks! Acknowledgements: Thanks to Krystian Mikolajczyk for image descriptors code Thanks to Christian Wojek for AdaBoost code and helpful suggestion Thanks to Deva Ramanan and Vittorio Ferrari for making their code and datasets publicly available This work is partially funded by German Research Foundation (DFG) through GRK 1362. Code and pre-trained models will be available at: http://www.mis.informatik.tu-darmstadt.de/code 36