REVISIT ENCODER & DECODER


Patrick Garrison
6 years ago

PERCEPTION-LINK BEHAVIOR MODEL: REVISIT ENCODER & DECODER
IMI PhD Presentation
Presenter: William Gu Yuanlong (PhD student)
Supervisor: Assoc. Prof. Gerald Seet Gim Lee
Co-Supervisor: Prof. Nadia Magnenat-Thalmann


2 of 15 CONTENTS
- Introduction
- Summary of reviewed interfaces
- Overview of the proposed framework
- Encoder and decoder
- Conclusion
- Future work
Telepresence (sense of being there) vs. tele social presence (sense of being together) [1]
Reference
[1] F. Biocca et al., "The networked minds measure of social presence: Pilot test of the factor structure and concurrent validity," in International Workshop on Presence, 2001.


3 of 15 COMMUNICATION MEDIUMS
Distance telecommunication: essential tools
Advantages:
- Improves productivity
- Eases constraints on resources
Face-to-face communication: the gold standard
- How you say it is more important than what you say
Advantage:
- More social richness
References
[1] E. Paulos, Personal Tele-Embodiment, University of California at Berkeley, 2002.
[2] K. M. Tsui et al., "Towards measuring the quality of interaction: Communication through telepresence robots," in Performance Metrics for Intelligent Systems Workshop, 2012.


4 of 15 MOTIVATION
- Improve existing telepresence robots (TPRs) in terms of social presence.
- Two aspects of the work were explored: 1) physical appearance (EDGAR); 2) operator's interface (PLB).
Figure: degree of social presence, comparing face-to-face communication, Hasegawa's bot [3], EDGAR, PRoP [1] and MeBot [2].
EDGAR: wider range of nonverbal cues; less certain postures; life-sized system; rear-projection robotic head for realistic face display.
Existing academic TPRs: wider range of nonverbal cues; smaller systems (MeBot and Hasegawa's bot); control systems contradict each other; passive model controller.
Commercial TPRs: limited nonverbal cues; semi-autonomous behavior.
Natural interface: anthropomorphism in terms of appearance and functionality.
References
[1] E. Paulos, Personal Tele-Embodiment, University of California at Berkeley, 2002.
[2] C. Breazeal, "MeBot: A robotic platform for socially embodied telepresence," in The 5th ACM/IEEE International Conference on Human-Robot Interaction, 2010.
[3] K. Hasegawa and Y. Nakauchi, "Preliminary evaluation of a telepresence robot conveying pre-motions for avoiding speech collisions," in hai-conference.net, 2013.


5 of 15 SUMMARY: REVIEW OF THE OPERATOR'S INTERFACE
References
[1] C. Breazeal, "MeBot: A robotic platform for socially embodied telepresence," in The 5th ACM/IEEE International Conference on Human-Robot Interaction, 2010.
[2] K. Hasegawa and Y. Nakauchi, "Preliminary evaluation of a telepresence robot conveying pre-motions for avoiding speech collisions," in hai-conference.net, 2013.
[3] H. Park, E. Kim, S. Jang, and S. Park, "HMM-based gesture recognition for robot control," in Pattern Recognition and Image Analysis, 2005, pp
[4] J. M. Susskind et al., "Generating facial expressions with deep belief nets," in Affective Computing, Emotion Modeling, Synthesis and Recognition, 2008.


6 of 15 GENERAL FRAMEWORK (natural interface)
A novel, flexible model that exhibits expressive nonverbal cues without compromising safety or the operator's cognitive load.
Perception-link behavior system integration:
- Encoder: encodes various features into their styles. Convolutional neural network with restricted Boltzmann machine and sample pooling [1].
- Associator: associates the styles of various features, of both the operator and the interactants. FUSION adaptive resonance theory [2].
- Decoder: decodes the current state based on the style and the previous state. Factored gated restricted Boltzmann machine [3].
References
[1] H. Lee et al., "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of the 26th Annual International Conference on Machine Learning, 2009.
[2] A. Tan et al., "Intelligence through interaction: Towards a unified theory for learning," in Advances in Neural Networks.
[3] R. Memisevic and G. E. Hinton, "Learning to represent spatial transformations with factored higher-order Boltzmann machines," Neural Computation, 2010.
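The associator is based on fusion ART [2]. As a rough illustration of how such a module could associate operator and interactant styles, the sketch below applies the standard fuzzy-ART choice, vigilance-match and learning rules with one weight vector per input channel. The class name, the complement coding helper and the parameter values (alpha, beta, gamma, rho) are assumptions chosen for illustration, not the settings used in this work.

```python
import numpy as np


def complement_code(x):
    """Complement-code a vector in [0, 1]: [x, 1 - x] (keeps |x| constant and nonzero)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, 1.0 - x])


class FusionART:
    """Minimal fusion ART sketch: one fuzzy-ART field per input channel.

    Illustrative only; alpha, beta, gamma and rho are assumed values.
    """

    def __init__(self, n_channels, alpha=0.1, beta=1.0, gamma=None, rho=None):
        self.alpha = alpha  # choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        # Contribution of each channel to the choice function.
        self.gamma = gamma if gamma is not None else [1.0 / n_channels] * n_channels
        # Vigilance per channel: how closely a category must match that channel.
        self.rho = rho if rho is not None else [0.7] * n_channels
        self.weights = []   # per category: list of per-channel weight vectors

    def _choice(self, xs, ws):
        # T_j = sum_k gamma_k * |x_k AND w_jk| / (alpha + |w_jk|), fuzzy AND = min
        return sum(g * np.minimum(x, w).sum() / (self.alpha + w.sum())
                   for g, x, w in zip(self.gamma, xs, ws))

    def _matches(self, xs, ws):
        # Resonance requires |x_k AND w_jk| / |x_k| >= rho_k on every channel.
        # Inputs must have nonzero sum (complement coding guarantees this).
        return all(np.minimum(x, w).sum() / x.sum() >= r
                   for r, x, w in zip(self.rho, xs, ws))

    def learn(self, xs):
        """Present one multi-channel pattern; return the index of the resonating category."""
        xs = [np.asarray(x, dtype=float) for x in xs]
        # Search existing categories in order of decreasing choice value.
        order = sorted(range(len(self.weights)),
                       key=lambda j: self._choice(xs, self.weights[j]), reverse=True)
        for j in order:
            if self._matches(xs, self.weights[j]):
                # Move the winning category's weights toward x AND w.
                self.weights[j] = [(1 - self.beta) * w + self.beta * np.minimum(x, w)
                                   for x, w in zip(xs, self.weights[j])]
                return j
        # No category resonates: recruit a new one initialized to the input.
        self.weights.append([x.copy() for x in xs])
        return len(self.weights) - 1
```

For example, presenting a complement-coded operator style and interactant style as two channels creates a category; a slightly perturbed pair resonates with the same category, while a very different pair recruits a new one. Fast learning (beta = 1) is used here only to keep the behavior easy to follow.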


7 of 15 REVISITING ENCODER
Revisited gestures encoder:
- Additional database
- Compared various unsupervised methods: BOW with k-means, BOW with GMM, and CNN-RBM-Max
- Evaluated via intra- and inter-cluster distance between known labels
Convolutional neural network via restricted Boltzmann machine and max pooling (figure): a window of size $T$ holds the input frames $i_{t-T+1}, \dots, i_t$, each of dimension $N$. A convolutional window of size $c$ with convolutional weights $W$ maps each sub-window to $N_h$ hidden units, $h = f(i_{1:c}; W, b)$, so that $h^{(k)} = f(i_{t-k-c+1:t-k}; W, b)$. The labeled encoded signal is obtained by max pooling each hidden unit over all $T-c+1$ positions: $h_n = \max_{k = 0, \dots, T-c} h_n^{(k)}$, for $n = 1, \dots, N_h$.
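The encoder equations can be sketched in NumPy as below. The sketch assumes the window is stored oldest-first as a (T, N) array, uses a logistic sigmoid for f (the mean-field hidden activation of a binary RBM), and pools with a plain max over the T - c + 1 positions; RBM pre-training of W and b is omitted. The second function is one plausible reading of "intra and inter cluster distance between known labels"; the exact definitions behind the slide are not stated, so both the function names and the distance definitions are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode_window(frames, W, b):
    """Encode a window of T frames into one N_h-dimensional code (CNN-RBM-Max sketch).

    frames: (T, N) window i_{t-T+1..t}, oldest frame first.
    W: (N_h, c, N) convolutional weights; b: (N_h,) hidden biases.
    Returns h_n = max_k h_n^(k) over the T - c + 1 window positions.
    """
    T, N = frames.shape
    n_hidden, c, n_in = W.shape
    assert n_in == N and c <= T
    positions = []
    for k in range(T - c + 1):
        # k = 0 is the most recent sub-window i_{t-c+1..t}.
        patch = frames[T - c - k:T - k]                            # (c, N)
        pre = np.tensordot(W, patch, axes=([1, 2], [0, 1])) + b    # (N_h,)
        positions.append(sigmoid(pre))                             # h^(k)
    H = np.stack(positions)                                        # (T - c + 1, N_h)
    return H.max(axis=0)                                           # max pooling over time


def cluster_distances(codes, labels):
    """Mean intra-cluster distance (codes to their own label centroid) and mean
    inter-cluster distance (between label centroids). Needs at least two labels."""
    codes = np.asarray(codes, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([codes[labels == c].mean(axis=0) for c in classes])
    intra = np.mean([np.linalg.norm(codes[labels == c] - centroids[i], axis=1).mean()
                     for i, c in enumerate(classes)])
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    inter = dist[np.triu_indices(len(classes), k=1)].mean()
    return intra, inter
```

Under this reading, a better encoder gives a smaller intra-cluster distance and a larger inter-cluster distance, which is how the BOW baselines and CNN-RBM-Max could be ranked against each other.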

8 of 15 DECODER FOR GESTURES
Two main considerations:
- Capability to generate different gestures given any encoded signal.
- Capability to generate similar variations of gestures if the encoded signals are close to each other.
Figure: basic concept behind encoding and decoding signals.
One possible application: collision prevention.

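9 of 15 FRBM MODEL
Factored gated restricted Boltzmann machine: the encoded signal $z_t$, through $R$ and $W_3$, gates the interaction between the input window and the hidden units $h_t$.
Bottom-up: estimate $h_t$ given $i_{t-T+1:t-1}$ and $z_t$:
$h_t = f\left( (W_1\, i_{t-T+1:t}) \odot (W_3 R\, z_t);\ W_2, b \right)$
Top-down: infer $i_t$:
$i_{t-T+1:t} = g\left( (W_2\, h_t) \odot (W_3 R\, z_t);\ W_1, a \right)$
Here $f(\cdot; W, b)$ and $g(\cdot; W, a)$ denote a layer with weights $W$ and bias $b$ or $a$, and $\odot$ is the element-wise gating product.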
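A minimal NumPy sketch of the two passes, assuming the factored three-way form of Memisevic and Hinton [3]: the input window, the hidden units and the gating input each project onto F shared factors, and the factor responses are multiplied element-wise. The function names, the matrix shapes, sigmoid hidden units, a linear (Gaussian-visible) output for g, and treating R as a plain matrix applied to z_t are all assumptions; training (for example by contrastive divergence) is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def frbm_up(window, z, W1, W2, W3, R, b):
    """Bottom-up pass: hidden state h_t from a flattened window and encoded signal z_t.

    window: (D,) flattened frames i_{t-T+1..t}; z: (S,) encoded signal.
    W1: (D, F), W2: (H, F), W3: (M, F), R: (M, S); b: (H,).
    """
    fx = W1.T @ window         # window projected onto the F factors
    fz = W3.T @ (R @ z)        # encoded signal projected onto the same factors (gate)
    return sigmoid(W2 @ (fx * fz) + b)


def frbm_down(h, z, W1, W2, W3, R, a):
    """Top-down pass: reconstruct the window from h_t and z_t (linear output g)."""
    fh = W2.T @ h              # hidden state projected onto the factors
    fz = W3.T @ (R @ z)        # same gate as in the bottom-up pass
    return W1 @ (fh * fz) + a  # (D,) reconstructed window
```

Because the gate multiplies every factor, the same hidden state decodes to different windows for different encoded signals; with z_t = 0 the gate closes, h_t falls back to sigmoid(b) and the reconstruction to the bias a.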


10 of 15 GESTURES: DIFFERENT LABELS
Input: Z (encoded signals). Output: gestures G1 to G5 (side, front and top views; frame index at 15 Hz).
Figure: intensity of each feature in Z against the number of features in Z; inset: intensity of feature #18 (normalized).
Given a specific encoded signal (top), a unique gesture (right) can be reconstructed (animation is looped).


11 of 15 GESTURE GENERATION @ A LABEL'S PROXIMITY
Input: Z (encoded signals). Output: gestures for the original signal and its neighbors N1, N2 and N3 (side, front and top views).
Figure: intensity of each feature in Z against the number of features in Z.
Given a set of encoded signals with similar intensities (top), a set of gestures (right) with similar traits can be reconstructed (animation is looped).


12 of 15 CONCLUSION
Figure: reality, encoding, decoding.
- Capability to generate different gestures given a specific set of encoded signals.
- Capability to generate similar variations of gestures given three similar encoded signals.
Future challenges for the decoder:
- An evaluation method to prove the correctness of the decoded signals.
- A set of new features to encode and decode frequency characteristics.
- A cheap, real-time method to explore non-collision encoded signals.

13 of 15 FUTURE WORK
- Associator: adaptive resonance theory (Euclidean).
- Encoder for the face: the current model works only on the CK+ database (frontal faces only).
Figure: facial identity and expression, plotted along PCA1, PCA2 and PCA3 (identities, expressions, gestures/postures).
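The future-work figure plots facial identity and expression codes along PCA1 to PCA3. The slide does not say how that projection was produced; a generic way to obtain one, assuming the codes are stacked as rows of a matrix, is a principal component analysis via the SVD of the centered data. The function name and the returned explained-variance ratios are illustrative additions.

```python
import numpy as np


def pca_project(codes, n_components=3):
    """Project row vectors onto their first principal components via SVD.

    Returns (projections, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(codes, dtype=float)
    Xc = X - X.mean(axis=0)                     # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]              # (n_components, D), unit-norm rows
    explained = S[:n_components] ** 2 / np.sum(S ** 2)
    return Xc @ components.T, explained
```

Plotting the three returned columns for codes labeled by identity and by expression would show whether the two factors separate along different principal axes.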

14 QUESTION AND ANSWER
