Social Saliency Prediction. Hyun Soo Park and Jianbo Shi

Size: px

Start display at page:

Download "Social Saliency Prediction. Hyun Soo Park and Jianbo Shi"

Emery Price
5 years ago
Views:

1 Social Saliency Prediction Hyun Soo Park and Jianbo Shi

2 Nonverbal social signals 67% Hogan and Stubbs, Can t Get Through 8 Barriers to Communication, Vinciarelli et al., Social Signal Processing: Survey of an Emerging Domain, Image and Vision Computing, 2009.

3 Nonverbal social signals 67% Location of joint attention Hogan and Stubbs, Can t Get Through 8 Barriers to Communication, Vinciarelli et al., Social Signal Processing: Survey of an Emerging Domain, Image and Vision Computing, 2009.

4 Geometric Localization of Joint Attention Gaze ray Head location Head orientation Location of joint attention Patron-Perez et al., Structured Learning of Human Interactions in TV Shows, PAMI, 2012 Fathi et al., Social Interactions: A First Person Perspective, CVPR, 2012 Park et al., 3D Social Saliency from Head-mounted Cameras, NIPS, 2012

5 Challenges in Social Scenes

6 Challenges in Social Scenes Gaze detection Marin-Jimenez et al., Detecting People Looking at Each Other in Videos, IJCV 2014

7 Social Scene Can we localize joint attention without measuring gaze directions? Challenges in Social Scenes True positive head detection Marin-Jimenez et al., Detecting People Looking at Each Other in Videos, IJCV 2014

8 Input Video

9 Structure from Motion

10 Children sitting area Camera trajectory

11 Children Social saliency Social saliency: likelihood of joint attention Output

12 Children Social saliency Social saliency: likelihood of joint attention Output

Critani et Cristani al., BMVC et al., 2011 BMVC 2011 Fathi et Fathi al., CVPR et al., 2012 CVPR 2012 Park et al., Park NIPS et 2012 al., NIPS 2012 Rodriguez et al., ICCV 2011 Lan et al.

13 Critani et Cristani al., BMVC et al., 2011 BMVC 2011 Fathi et Fathi al., CVPR et al., 2012 CVPR 2012 Park et al., Park NIPS et 2012 al., NIPS 2012 Rodriguez et al., ICCV 2011 Lan et al., PAMI 2012 Chakraborty et al., CVPR 2013 Yang et al., CVPR 2011 Alahi et al., CVPR 2014 Choi et al., ECCV 2014 Li et al., ICCV 2013 Ryoo et al., CVPR 2013 Pusiol et al., CogSci 2014 Arev et al., SIGGRAPH 2014 Third person view Distance between face and camera First person view

14 Joint Attention from First Person Cameras Croquet : Joint attention : Head direction Park et al., 3D Social Saliency from Head-mounted Cameras, NIPS, 2012.

15 Critani et Cristani al., BMVC et al., 2011 BMVC 2011 Fathi et Fathi al., CVPR et al., 2012 CVPR 2012 Park et al., Park NIPS et 2012 al., NIPS D estimation error<10cm Noninvasiveness Measurement accuracy Third person view Distance between face and camera First person view

16 Critani et Cristani al., BMVC et al., 2011 BMVC 2011 Fathi et Fathi al., CVPR et al., 2012 CVPR 2012 Park et al., Park NIPS et 2012 al., NIPS D estimation error<10cm Noninvasiveness Prediction Learning Measurement accuracy Third person view Distance between face and camera First person view

17 : Ground truth joint attention : Head location Location of performer: Ground truth joint attention Children g ({ } N ) i i = 1 Θ = Social formation where N is the number of social members.

18 : Ground truth joint attention : Head location g ({ } N ) i i = 1 Θ = Social formation where N is the number of social members. cf. g ({ } N ) i i 1 = = Geometric localization: triangulation

19 : Ground truth joint attention : Head location g ({ } N ) i i = 1 Θ = Social formation where N is the number of social members.

20 : Ground truth joint attention : Head location : Center of mass g ({ } N ) i i = 1 Θ = Social formation where N is the number of social members. ex. g ({ } N ) i i = 1 1 N Θ = Θ= N i = 1 Geometric localization: center of mass

21 : Ground truth joint attention : Head location : Center of mass : Center of circumcircle g ({ } N ) i i = 1 Θ = Social formation where N is the number of social members. cf. g ({ } N ) i i = 1 1 N Θ = Θ= N i = 1 Geometric localization: center of circumcircle

22 : Ground truth joint attention : Head location : Center of mass : Center of circumcircle g ({ } N ) i i = 1 Θ = Social formation where N is the number of social members.

First Person Social Interaction Data Scene N T(sec) F B-boy I 18 105 317 B-boy II 18 450 1351 B-boy III 18 160 528 B-boy IV 18 50 180 Surprise party 11 120 2227 Class 11 360 3590 Croquet 6 300 6000

23 First Person Social Interaction Data Scene N T(sec) F B-boy I B-boy II B-boy III B-boy IV Surprise party Class Croquet Busker I Busker II Card game Hide and seek Block building Social game Meeting I Meeting II Picnic Musical Dance way party Snowman Total 49,490 social formations

24 : Ground truth joint attention : Head location g ({ } N ) i i = 1 Θ = Social formation where N is the number of social members.

25 g ({ } N ) i i = 1 Θ = Social formation where N is the number of social members. Scale variation : Ground truth joint attention : Head location

26 g ({ } N ) i i = 1 Θ = Social formation where N is the number of social members. Scale variation Orientation variation : Joint attention : Head location

27 Representation: Social Dipole Moment

28 Water molecule, H2O

29 104.5 Water molecule, H2O N q = ( e p ) e i i Electric dipole moment

30 H H O Water molecule, H2O N q = ( e p ) e i i Electric dipole moment

31 H H O Water molecule, H2O N q = ( e p ) e i i Electric dipole moment

32 O O O H H O H H H H O H H H H Water molecule, H2O N q = ( e p ) e i i Electric dipole moment

33 q s : Ground truth joint attention p i c = 1 N N i p i : Head location : Center of mass : Social dipole moment N 1 q= s- pi = s-c N i Social dipole moment

34 Orientation Normalization q p i s c : Ground truth joint attention : Head location : Center of mass : Social dipole moment N 1 q= s- p = s-c i N i Social dipole moment

35 Scale Normalization q p i s c : Ground truth joint attention : Head location : Center of mass : Social dipole moment 1 N N i p i c = 1

36 Social Formation Feature q s p i c

37 Social Formation Feature q s p i c

38 Learning Likelihood of Joint Attention q s x p i c ( f c, f s ; x s) Φ = =1

39 Learning Likelihood of Joint Attention x q x p i x x x s x c x x x ( f c, f s ; x s) Φ = =1 ( f c, f s ; x s) Φ =0 AdaBoost binary classifier

40 Joint Attention Prediction x f s p i c f c ( f c, f s ; x s) Φ = =1 ( f c, f s ; x s) Φ =0 AdaBoost binary classifier

41 Joint Attention Prediction x f s p i c f c ( f c, f s ; x s) Φ = =1 ( f c, f s ; x s) Φ =0 AdaBoost binary classifier Social saliency: Likelihood of joint attention

42 Joint Attention Prediction f s p i s c f c ( f c, f s ; x s) Φ = =1 ( f c, f s ; x s) Φ =0 AdaBoost binary classifier Social saliency: Likelihood of joint attention

43 Input video Human detection

44 Detected social group

45 Input video Human detection Group detection Learning via FPCs Social saliency prediction

46 Result

47 Social member location Joint attention location Center of mass (COM) Center of circumcircle (CC)

48 Group meeting CF: Context feature (Lan et al., PAMI 2012) Social member location Joint attention location Center of mass (COM) Center of circumcircle (CC) Street performance Class interactions

49 Group meeting Street performance Class interactions CF: Context feature (Lan et al., PAMI 2012) Social member location Joint attention location Center of mass (COM) Center of circumcircle (CC)

50 Group meeting Street performance Class interactions CF: Context feature (Lan et al., PAMI 2012) SSF: Social formation feature RF: Random forests Scenes SFF+Boosting SFF+RF CC COM CF Social member location Joint attention location Center of mass (COM) Center of circumcircle (CC) Dance Meeting I B-boy I Class Busker Picnic Social game Mean average precision

55 Basketball Scene Result

57 University team of Northwestern Polytechnical University (China)

59 Dynamic Joint Attention Prediction q s p i c ( f c, f s ; x s) Φ = =1 ( f c, f s ; x s) Φ =0

60 Dynamic Joint Attention Prediction q s p i c v com ( c s f, f, v ; x s) ( c s f, f, v ; x s) Φ = = com 1 Φ = = com 0

61 Top view Social saliency Person detector: Yang and Ramanan, Articulated Human Detection with Flexible Mixtures of Parts, PAMI 2003.

62 Can we predict social saliency without measuring gaze directions? Social formation Social saliency

63 Social Saliency Prediction Hyun Soo Park and Jianbo Shi Project website: Poster #36

Mathematical Definition of Group Behaviors Inspired by Granger Causality

Social Dynamics Mathematical Definition of Group Behaviors Inspired by Granger Causality Behaviors of X and Y are a group behavior if spatial/temporal predictions of Y based on X and Y jointly are more