PERCEPTION-LINK BEHAVIOR MODEL: REVISITING THE ENCODER & DECODER
IMI PhD Presentation
Presenter: William Gu Yuanlong (PhD student)
Supervisor: Assoc. Prof. Gerald Seet Gim Lee
Co-Supervisor: Prof. Nadia Magnenat-Thalmann
2 of 15 CONTENT
- Introduction
- Summary of reviewed interfaces
- Overview of the proposed framework
- Encoder and decoder
- Conclusion
- Future work

Telepresence (sense of being there) vs. tele-social presence (sense of being together) [1]

Reference
[1] F. Biocca et al., "The Networked Minds Measure of Social Presence: Pilot Test of the Factor Structure and Concurrent Validity," in International Workshop on Presence, 2001.
3 of 15 COMMUNICATION MEDIUMS

Distance telecommunication
- Essential tools
- Advantages: improves productivity; eases constraints on resources

Face-to-face communication
- The gold standard: how you say it is more important than what you say
- Advantage: more social richness

References
[1] E. Paulos, "Personal Tele-Embodiment," University of California at Berkeley, 2002.
[2] K. M. Tsui et al., "Towards Measuring the Quality of Interaction: Communication through Telepresence Robots," in Performance Metrics for Intelligent Systems Workshop, 2012.
4 of 15 MOTIVATION
- Improve the existing telepresence robot in terms of social presence.
- Two aspects of the work were explored:
  1) Physical appearance (EDGAR)
  2) Operator's interface (PLB)

Degree of social presence (increasing):
- Commercial TPRs: limited nonverbal cues; semi-autonomous behavior; anthropomorphism in terms of appearance and functionality
- Existing academic TPRs (PRoP [1], MeBot [2], Hasegawa's bot [3]): wider range of nonverbal cues; smaller systems (MeBot and Hasegawa's); control systems contradict each other (passive model controller vs. natural interface)
- EDGAR: wider range of nonverbal cues, though less certain postures; life-sized system; rear-projection robotic head for realistic face display
- Face-to-face communication

References
[1] E. Paulos, "Personal Tele-Embodiment," University of California at Berkeley, 2002.
[2] C. Breazeal, "MeBot: A Robotic Platform for Socially Embodied Telepresence," in The 5th ACM/IEEE International Conference on Human-Robot Interaction, 2010.
[3] K. Hasegawa and Y. Nakauchi, "Preliminary Evaluation of a Telepresence Robot Conveying Pre-motions for Avoiding Speech Collisions," in HAI Conference, 2013.
5 of 15 SUMMARY: REVIEW OF THE OPERATOR'S INTERFACE

References
[1] C. Breazeal, "MeBot: A Robotic Platform for Socially Embodied Telepresence," in The 5th ACM/IEEE International Conference on Human-Robot Interaction, 2010.
[2] K. Hasegawa and Y. Nakauchi, "Preliminary Evaluation of a Telepresence Robot Conveying Pre-motions for Avoiding Speech Collisions," in HAI Conference, 2013.
[3] H. Park, E. Kim, S. Jang, and S. Park, "HMM-Based Gesture Recognition for Robot Control," in Pattern Recognition and Image Analysis, 2005, pp. 607-614.
[4] J. M. Susskind et al., "Generating Facial Expressions with Deep Belief Nets," in Affective Computing, Emotion Modeling, Synthesis and Recognition, 2008.
6 of 15 GENERAL FRAMEWORK: NATURAL INTERFACE
A novel, flexible model that exhibits expressive nonverbal cues without compromising safety or operator cognitive load: the perception-link behavior (PLB) system integration.

- Encoder: encodes the various features into their styles. Convolutional Neural Network with Restricted Boltzmann Machine and max pooling [1].
- Associator: associates the styles of the various features, of both operator and interactants. Fusion Adaptive Resonance Theory [2].
- Decoder: decodes the current state based on the style and the previous state. Factored Gated Restricted Boltzmann Machine [3].

References
[1] H. Lee et al., "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations," in Proceedings of the 26th Annual International Conference on Machine Learning, 2009.
[2] A. Tan et al., "Intelligence through Interaction: Towards a Unified Theory for Learning," in Advances in Neural Networks, 2007.
[3] R. Memisevic and G. E. Hinton, "Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines," Neural Computation, 2010.
7 of 15 REVISITING THE ENCODER
Revisited gesture encoder:
- Additional database
- Compared various unsupervised methods: BOW K-means, BOW GMM, CNN-RBM-Max
- Evaluated via intra- and inter-cluster distances between known labels

Convolutional Neural Network via Restricted Boltzmann Machine and max pooling:
- Convolution of weight W_n (window of size c) over an input window of size T:
  h_n^(k) = f(i_{t-k : t-k-c+1}; W_n, b_n)
- Max pooling over the T-c+1 convolved activations to obtain the labeled encoded signal:
  h_n = max(h_n^(0 : T-c+1))
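The CNN-RBM-Max encoding step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis code: the function name `encode_window`, the sigmoid choice for f, and all dimensions are assumptions for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_window(i_window, W, b):
    """Encode one input window of length T by convolution + max pooling.

    i_window : (T,) input signal window
    W        : (n_filters, c) convolutional filter bank (RBM weights W_n)
    b        : (n_filters,) hidden biases b_n
    returns  : (n_filters,) pooled activations h_n
    """
    T = i_window.shape[0]
    n_filters, c = W.shape
    n_shifts = T - c + 1                      # number of convolved positions
    h = np.empty((n_filters, n_shifts))
    for k in range(n_shifts):
        # h_n^(k) = f(i over a window of size c; W_n, b_n)
        h[:, k] = sigmoid(W @ i_window[k:k + c] + b)
    return h.max(axis=1)                      # max pooling over all shifts

# Toy usage: T = 12, c = 3, four filters
rng = np.random.default_rng(0)
h = encode_window(rng.normal(size=12), rng.normal(size=(4, 3)), np.zeros(4))
```

Max pooling makes the pooled code h_n insensitive to where in the window the pattern occurs, which is what lets nearby gestures map to nearby encoded signals.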
8 of 15 DECODER FOR GESTURES
Two main considerations:
- Capability to generate different gestures given any encoded signal.
- Capability to generate similar variations of gestures if the encoded signals are close to each other.

Basic concept behind encoding and decoding signals. One possible application: collision prevention.
9 of 15 FGRBM MODEL
Factored Gated Restricted Boltzmann Machine

Bottom-up, to estimate h_t given the past frames i_{t-1 : t-T+1} and the gating signal z_t:
  h_t = f(W_1 i_{t-1 : t-T+1} ∘ [W_3 R z_t]; W_2, b)

Top-down, to infer i_t:
  i_{t : t-T+1} = g(W_2 h_t ∘ [W_3 R z_t]; W_1, a)

where ∘ denotes the element-wise product taken at the factors.
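The two FGRBM passes can be sketched as plain matrix operations: the encoded signal z_t gates the input-hidden interaction multiplicatively at the factors. A minimal NumPy sketch follows; the function names, the sigmoid/linear activations for f and g, and all dimensions are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fgrbm_bottom_up(i_hist, z, W1, W2, W3, R, b):
    """Estimate the hidden state h_t from past frames and the gating label z_t.

    i_hist : (D,) concatenated past frames i_{t-1 : t-T+1}
    z      : (L,) encoded signal z_t
    W1     : (F, D) input-to-factor weights
    W2     : (F, H) hidden-to-factor weights
    W3     : (F, L) label-to-factor weights
    R      : (L, L) label mixing matrix; b : (H,) hidden biases
    """
    factors = (W1 @ i_hist) * (W3 @ (R @ z))   # element-wise gating at the factors
    return sigmoid(W2.T @ factors + b)

def fgrbm_top_down(h, z, W1, W2, W3, R, a):
    """Infer the frame window from h_t under the same gating label."""
    factors = (W2 @ h) * (W3 @ (R @ z))
    return W1.T @ factors + a                  # linear (Gaussian) visible units

# Toy usage: history of 6 values, 3 hidden units, 2 label dims, 5 factors
rng = np.random.default_rng(1)
D, H, L, F = 6, 3, 2, 5
W1, W2, W3 = rng.normal(size=(F, D)), rng.normal(size=(F, H)), rng.normal(size=(F, L))
R, z_t = np.eye(L), np.array([1.0, 0.0])
h_t = fgrbm_bottom_up(rng.normal(size=D), z_t, W1, W2, W3, R, np.zeros(H))
i_win = fgrbm_top_down(h_t, z_t, W1, W2, W3, R, np.zeros(D))
```

Because z_t enters multiplicatively, changing the encoded signal reweights the whole input-hidden mapping, which is what lets one model produce a different gesture for each label.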
10 of 15 GESTURE GENERATION AT DIFFERENT LABELS
Input: Z (encoded signals). Output: gestures.
[Figure: intensity of each feature in Z for labels G1-G5 (top, normalized); reconstructed gestures shown from side, front, and top views; frame index at 15 Hz]
Given a specific encoded signal (top), a unique gesture (right) can be reconstructed. (Animation is looped.)
11 of 15 GESTURE GENERATION AT A LABEL'S PROXIMITY
Input: Z (encoded signals). Output: gestures.
[Figure: intensity of each feature in Z for the original label and nearby variants N1-N3 (top); reconstructed gestures shown from side, front, and top views]
Given a set of encoded signals with similar intensities (top), a set of gestures (right) with similar traits can be reconstructed. (Animation is looped.)
12 of 15 CONCLUSION
[Diagram: Reality → Encoding → Decoding → Ideal]
- Capability to generate different gestures given a specific set of encoded signals.
- Capability to generate similar variations of gestures given three similar encoded signals.

Future challenges for the decoder:
- An evaluation method to prove the correctness of the decoded signals.
- A set of new features to encode and decode the frequency characteristics.
- A cheap, real-time method to explore non-collision encoded signals.
13 of 15 FUTURE WORK
Associator: Adaptive Resonance Theory (Euclidean), associating gesture/posture styles with facial identity and expression.
Encoder for the face: the current model works on the CK+ database (frontal views only).
[Figure: PCA projections (PCA1-PCA3) of gestures/postures and of facial identities/expressions]
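The Euclidean ART associator named above can be sketched as a simple match-resonate-or-commit loop: each sample either resonates with the nearest prototype (if it falls within the vigilance radius) or commits a new category. This is a generic ART-style sketch under stated assumptions, not the thesis associator; the function name, learning rate, and vigilance parameter are all hypothetical.

```python
import numpy as np

def euclidean_art(samples, vigilance=1.0, lr=0.5):
    """Cluster samples with a minimal Euclidean ART-style procedure.

    For each sample, find the nearest prototype; if it lies within the
    vigilance radius, move it toward the sample (resonance), otherwise
    commit a new category (mismatch reset).
    """
    prototypes, labels = [], []
    for x in samples:
        if prototypes:
            d = [np.linalg.norm(x - p) for p in prototypes]
            j = int(np.argmin(d))
            if d[j] <= vigilance:                      # vigilance test passed
                prototypes[j] = prototypes[j] + lr * (x - prototypes[j])
                labels.append(j)
                continue
        prototypes.append(x.astype(float))             # commit a new category
        labels.append(len(prototypes) - 1)
    return prototypes, labels

# Toy usage: two well-separated groups yield two categories
data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
protos, labels = euclidean_art(data, vigilance=1.0)
```

A fusion ART associator would run one such field per channel (gesture style, facial identity, expression) and require resonance across all channels before learning an association.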
QUESTION AND ANSWER