REVISIT ENCODER & DECODER


PERCEPTION-LINK BEHAVIOR MODEL: REVISIT ENCODER & DECODER
IMI PhD Presentation
Presenter: William Gu Yuanlong (PhD student)
Supervisor: Assoc. Prof. Gerald Seet Gim Lee
Co-Supervisor: Prof. Nadia Magnenat-Thalmann

2 of 15: CONTENT
- Introduction
- Summary of reviewed interfaces
- Overview of the proposed framework
- Encoder and decoder
- Conclusion
- Future work

Telepresence (sense of being there) vs. tele-social presence (sense of being together) [1].

Reference
[1] F. Biocca et al., "The networked minds measure of social presence: Pilot test of the factor structure and concurrent validity," in International Workshop on Presence, 2001.

3 of 15: COMMUNICATION MEDIUMS
Distance telecommunication
- Essential tools
- Advantages: improves productivity; eases constraints on resources

Face-to-face communication
- The gold standard: how you say it is more important than what you say
- Advantage: greater social richness

References
[1] E. Paulos, "Personal Tele-Embodiment," University of California at Berkeley, 2002.
[2] K. M. Tsui et al., "Towards Measuring the Quality of Interaction: Communication through Telepresence Robots," in Performance Metrics for Intelligent Systems Workshop, 2012.

4 of 15: MOTIVATION
Goal: improve the existing telepresence robot in terms of social presence. Two aspects of the work were explored:
1) Physical appearance (EDGAR)
2) Operator's interface (PLB)

Degree of social presence (increasing): commercial systems -> existing academic TPRs (PRoP [1], MeBot [2], Hasegawa's bot [3]) -> EDGAR -> face-to-face.
- Commercial systems: limited nonverbal cues; semi-autonomous behavior; anthropomorphism in terms of appearance and functionality.
- Existing academic TPRs: wider range of nonverbal cues; smaller systems (MeBot and Hasegawa's bot); control systems that contradict each other (passive model controller vs. natural interface).
- EDGAR: wider range of nonverbal cues, but less certain postures; life-sized system; rear-projection robotic head for realistic face display.

References
[1] E. Paulos, "Personal Tele-Embodiment," University of California at Berkeley, 2002.
[2] C. Breazeal, "MeBot: A robotic platform for socially embodied telepresence," in The 5th ACM/IEEE International Conference on Human-Robot Interaction, 2010.
[3] K. Hasegawa and Y. Nakauchi, "Preliminary Evaluation of a Telepresence Robot Conveying Pre-motions for Avoiding Speech Collisions," in hai-conference.net, 2013.

5 of 15: SUMMARY: REVIEW OF THE OPERATOR'S INTERFACE

References
[1] C. Breazeal, "MeBot: A robotic platform for socially embodied telepresence," in The 5th ACM/IEEE International Conference on Human-Robot Interaction, 2010.
[2] K. Hasegawa and Y. Nakauchi, "Preliminary Evaluation of a Telepresence Robot Conveying Pre-motions for Avoiding Speech Collisions," in hai-conference.net, 2013.
[3] H. Park, E. Kim, S. Jang, and S. Park, "HMM-based gesture recognition for robot control," in Pattern Recognition and Image Analysis, 2005, pp. 607-614.
[4] J. M. Susskind et al., "Generating Facial Expressions with Deep Belief Nets," in Affective Computing, Emotion Modeling, Synthesis and Recognition, 2008.

6 of 15: GENERAL FRAMEWORK (Natural interface)
A novel, flexible model that exhibits expressive nonverbal cues without compromising safety or operator cognitive load: the perception-link behavior (PLB) system integration.
- Encoder: encodes various features into their styles. Convolutional neural network with restricted Boltzmann machines and sample pooling [1].
- Associator: associates the styles of various features, from both operator and interactants. Fusion adaptive resonance theory [2].
- Decoder: decodes the current state based on the style and the previous state. Factored gated restricted Boltzmann machine [3].

References
[1] H. Lee et al., "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of the 26th Annual International Conference on Machine Learning, 2009.
[2] A. Tan et al., "Intelligence through interaction: Towards a unified theory for learning," in Advances in Neural Networks, 2007.
[3] R. Memisevic and G. E. Hinton, "Learning to represent spatial transformations with factored higher-order Boltzmann machines," Neural Computation, 2010.
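The three stages above form a simple data-flow loop: encode both parties' features into styles, fuse them, then decode the next state. The sketch below shows only that flow; the callables and their toy bodies are illustrative stand-ins, not the thesis implementation.

```python
def plb_step(op_window, int_window, state, encoder, associator, decoder):
    """One update of the perception-link behavior (PLB) loop.

    encoder    : raw feature window -> style code          (CNN-RBM on the slides)
    associator : (operator, interactant) styles -> fused   (fusion ART)
    decoder    : (fused style, previous state) -> state    (FGRBM)
    All three are passed in as plain callables standing in
    for the trained models.
    """
    fused = associator(encoder(op_window), encoder(int_window))
    return decoder(fused, state)

# Toy stand-ins, just to show the data flow:
enc = lambda w: sum(w) / len(w)            # "style" = mean of the window
assoc = lambda a, b: (a + b) / 2           # fuse operator and interactant
dec = lambda s, prev: 0.9 * prev + 0.1 * s # smooth toward the fused style
state = plb_step([1.0, 3.0], [2.0, 4.0], 0.0, enc, assoc, dec)
```

Passing the stages as callables keeps each component swappable, which matches the slide's comparison of several encoder variants.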

7 of 15: REVISITING THE ENCODER
Revisited gesture encoder:
- Additional database.
- Compared various unsupervised methods: BOW + k-means, BOW + GMM, and CNN-RBM-Max.
- Evaluated via intra- and inter-cluster distances between known labels.

Convolutional neural network via restricted Boltzmann machine and max pooling: a convolution window of size c (convolution weights W, bias b) slides over the input window of size T, producing hidden activations

    h^(k) = f(i_{t-k : t-k-c+1}; W, b),   k = 0, ..., T-c+1,

which are max-pooled over time into the labeled encoded signal

    h = max(h^(0), ..., h^(T-c+1)).
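The convolve-then-pool step above can be sketched as follows. This is a minimal inference-only sketch: `W` and `b` stand in for trained convolutional RBM parameters (here random, not learned), and `sigmoid` stands in for the slide's activation f.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cnn_rbm_max_encode(window, W, b):
    """Encode a (T, d) input window: convolve a filter bank over time,
    then max-pool each feature over all T-c+1 window positions.

    window : (T, d) array, d features per frame over T frames
    W      : (c, d, n) array, n convolution filters of length c
    b      : (n,) hidden biases
    Returns the (n,) pooled code h = max_k f(i_{k:k+c}; W, b).
    """
    c, d, n = W.shape
    T = window.shape[0]
    # hidden activations at every convolution position k
    h_k = np.array([
        sigmoid(np.einsum('cd,cdn->n', window[k:k + c], W) + b)
        for k in range(T - c + 1)
    ])                              # shape (T-c+1, n)
    return h_k.max(axis=0)          # max pooling over time

# example: a 10-frame window of 3 features, 5 random filters of length 4
rng = np.random.default_rng(0)
code = cnn_rbm_max_encode(rng.normal(size=(10, 3)),
                          rng.normal(scale=0.1, size=(4, 3, 5)),
                          np.zeros(5))
```

Max pooling over the time axis is what makes the code insensitive to where in the T-frame window the gesture fragment occurs, which is why the clusters can then be compared by intra- and inter-cluster distance.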

8 of 15: DECODER FOR GESTURES
Two main considerations:
- Capability to generate a different gesture given any encoded signal.
- Capability to generate similar variations of a gesture when the encoded signals are close to each other.

Basic concept behind encoding and decoding signals. One possible application: collision prevention.

9 of 15: FGRBM MODEL
Factored gated restricted Boltzmann machine: the encoded signal z_t gates the mapping between the past frames i_{(t-1):(t-T+1)} and the hidden state h_t.

Bottom-up, to estimate h_t given i_{(t-1):(t-T+1)} and z_t:

    h_t = f(W_1 i_{t:t-T+1} * [W_3 R z_t]; W_2, b)

Top-down, to infer i_t:

    i_{t:t-T+1} = g(W_2 h_t * [W_3 R z_t]; W_1, a)

(* denotes the element-wise product over the factors.)
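The two directions above can be sketched as a factored three-way product in the style of Memisevic and Hinton's factored gated RBM. This is an inference-only sketch with random (untrained) weights; the names `Wv`, `Wh`, `Wz` are illustrative stand-ins for the slide's W_1, W_2, W_3, and the R mapping is folded into `Wz` for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FactoredGatedRBM:
    """Minimal factored gated RBM inference sketch (untrained weights)."""

    def __init__(self, n_vis, n_hid, n_gate, n_factors, seed=0):
        rng = np.random.default_rng(seed)
        self.Wv = rng.normal(0, 0.1, (n_vis, n_factors))   # visible -> factors
        self.Wh = rng.normal(0, 0.1, (n_hid, n_factors))   # hidden  -> factors
        self.Wz = rng.normal(0, 0.1, (n_gate, n_factors))  # gate    -> factors
        self.b = np.zeros(n_hid)    # hidden bias
        self.a = np.zeros(n_vis)    # visible bias

    def bottom_up(self, v, z):
        # factors: element-wise product of visible and gate projections
        f = (v @ self.Wv) * (z @ self.Wz)
        return sigmoid(f @ self.Wh.T + self.b)

    def top_down(self, h, z):
        # reconstruct the visible frame history under the same gate z
        f = (h @ self.Wh) * (z @ self.Wz)
        return f @ self.Wv.T + self.a   # linear (Gaussian) visibles

# example: 6 visible units (frame history), 4 hidden, 3-dim encoded signal
m = FactoredGatedRBM(n_vis=6, n_hid=4, n_gate=3, n_factors=5)
v, z = np.ones(6), np.ones(3)
h = m.bottom_up(v, z)
v_rec = m.top_down(h, z)
```

Because z enters both directions multiplicatively, nearby encoded signals gate the weights similarly, which is exactly the "similar codes give similar gestures" property slide 8 asks for.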

10 of 15: GESTURE GENERATION AT DIFFERENT LABELS
Input: Z (encoded signals). Output: gestures.
[Figure: intensity of each feature in Z for labels G1-G5 (top), with the reconstructed gestures rendered in side, front, and top views at a 15 Hz frame index (right).]
Given a specific encoded signal, a unique gesture can be reconstructed. (Animation is looped.)

11 of 15: GESTURE GENERATION AT A LABEL'S PROXIMITY
Input: Z (encoded signals). Output: gestures.
[Figure: intensity of each feature in Z for variants N1-N3 vs. the original (top), with the reconstructed gestures rendered in side, front, and top views (right).]
Given a set of encoded signals with similar intensities, a set of gestures with similar traits can be reconstructed. (Animation is looped.)

12 of 15: CONCLUSION
Demonstrated (reality vs. ideal encoding/decoding):
- Capability to generate different gestures given a specific set of encoded signals.
- Capability to generate similar variations of a gesture given three similar encoded signals.

Future challenges for the decoder:
- An evaluation method to prove the correctness of the decoded signals.
- A set of new features to encode and decode frequency characteristics.
- A cheap, real-time method to explore non-collision encoded signals.

13 of 15: FUTURE WORK
- Associator: adaptive resonance theory (Euclidean), associating gestures/postures with facial identity and expression (visualized along PCA1-PCA3 axes).
- Encoder for the face: the current model works on the CK++ database (frontal faces only).

QUESTION AND ANSWER