Multimodal Machine Learning

1 Multimodal Machine Learning
Louis-Philippe (LP) Morency
CMU Multimodal Communication and Machine Learning Laboratory [MultiComp Lab]

2 CMU Course: Multimodal Machine Learning

3 Lecture Objectives
- What is Multimodal?
- Multimodal: core technical challenges
  - Representation learning, translation, alignment, fusion and co-learning
- Multimodal representation learning
  - Multimodal tensor representation
- Implicit alignment
  - Temporal attention
- Fusion and temporal modeling
  - Multi-view LSTM and memory-based fusion

4 What is Multimodal?

5 What is Multimodal?
Multimodal distribution: multiple modes, i.e., distinct peaks (local maxima) in the probability density function.
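The statistical sense of "multimodal" can be checked numerically. The sketch below is only an illustration under assumed values: the mixture components, the grid range and the helper names (`mixture_pdf`, `count_modes`) are hypothetical and do not come from the slides. It counts strict local maxima of a Gaussian mixture density on a grid.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, components):
    """Density of a Gaussian mixture; components = [(weight, mean, std), ...]."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)

def count_modes(components, lo=-10.0, hi=10.0, steps=2001):
    """Count strict local maxima of the mixture density on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    ys = [mixture_pdf(x, components) for x in xs]
    return sum(1 for i in range(1, steps - 1) if ys[i - 1] < ys[i] > ys[i + 1])

bimodal = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]  # two well-separated peaks
unimodal = [(1.0, 0.0, 1.0)]                   # a single peak
print(count_modes(bimodal), count_modes(unimodal))
```

A grid search like this only finds peaks that are wider than the grid spacing, which is enough for the toy mixtures used here.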

6 What is Multimodal? Sensory Modalities

7 Multimodal Communicative Behaviors
Verbal
- Lexicon: words
- Syntax: part-of-speech, dependencies
- Pragmatics: discourse acts
Vocal
- Prosody: intonation, voice quality
- Vocal expressions: laughter, moans
Visual
- Gestures: head gestures, eye gestures, arm gestures
- Body language: body posture, proxemics
- Eye contact: head gaze, eye gaze
- Facial expressions: FACS action units, smile, frowning

8 What is Multimodal?
Modality: the way in which something happens or is experienced.
- Modality refers to a certain type of information and/or the representation format in which information is stored.
- Sensory modality: one of the primary forms of sensation, such as vision or touch; a channel of communication.
Medium ("middle"): a means or instrumentality for storing or communicating information; a system of communication/transmission.
- Medium is the means whereby this information is delivered to the senses of the interpreter.

9 Multiple Communities and Modalities
Psychology, Medical, Speech, Vision, Language, Multimedia, Robotics, Learning

10 Examples of Modalities
- Natural language (both spoken and written)
- Visual (from images or videos)
- Auditory (including voice, sounds and music)
- Haptics / touch
- Smell, taste and self-motion
- Physiological signals: electrocardiogram (ECG), skin conductance
- Other modalities: infrared images, depth images, fMRI

11 Prior Research on Multimodal
Four eras of multimodal research:
- The behavioral era (1970s until late 1980s)
- The computational era (late 1980s until 2000)
- The interaction era ( )
- The deep learning era (2010s until ): main focus of this tutorial

12 The McGurk Effect (1976)
"Hearing lips and seeing voices", Nature

14 The Computational Era (late 1980s until 2000)
1) Audio-Visual Speech Recognition (AVSR)

15 Core Technical Challenges

16 Core Challenges in Deep Multimodal ML
Tadas Baltrusaitis, Chaitanya Ahuja, and Louis-Philippe Morency
These challenges are non-exclusive.

17 Core Challenge 1: Representation
Definition: Learning how to represent and summarize multimodal data in a way that exploits the complementarity and redundancy of the modalities.
A. Joint representations: Modality 1 and Modality 2 are mapped into a single shared representation.

18 Joint Multimodal Representation
[Figure: example inputs ("Wow!", "I like it!", joyful tone, tensed voice) mapped into a joint representation (multimodal space)]

19 Joint Multimodal Representations
- Audio-visual speech recognition: Bimodal Deep Belief Network [Ngiam et al., ICML 2011]
- Image captioning: Multimodal Deep Boltzmann Machine [Srivastava and Salakhutdinov, NIPS 2012]
- Audio-visual emotion recognition: Deep Boltzmann Machine [Kim et al., ICASSP 2013]

20 Multimodal Vector Space Arithmetic
[Kiros et al., Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, 2014]

21 Core Challenge 1: Representation
Definition: Learning how to represent and summarize multimodal data in a way that exploits the complementarity and redundancy of the modalities.
A. Joint representations: Modality 1 and Modality 2 are mapped into a single shared representation.
B. Coordinated representations: Modality 1 and Modality 2 are mapped to separate representations (Repres. 1, Repres. 2).

22 Coordinated Representation: Deep CCA
Learn linear projections that are maximally correlated:
$$(u^*, v^*) = \arg\max_{u,v} \operatorname{corr}(u^T X, v^T Y)$$
[Figure: text view X and image view Y pass through networks W_x and W_y to give H_x and H_y, which are projected by U and V]
Andrew et al., ICML
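To make the objective concrete, the sketch below evaluates corr(u^T X, v^T Y) for two candidate pairs of projections. It only scores given projections and does not solve the argmax. The toy views, the candidate vectors and the helper names are hypothetical, and samples are stored as rows here rather than as columns.

```python
import math

def project(rows, weights):
    """Apply a linear projection w^T x to every sample (row) of a view."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

# Toy paired views (hypothetical data): 4 samples, 2 features each.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
Y = [[0.0, 2.0], [1.0, 4.0], [0.0, 6.0], [1.0, 8.0]]

# Two candidate projection pairs (u, v); CCA would search for the best pair.
good = pearson(project(X, [1.0, 0.0]), project(Y, [0.0, 1.0]))
poor = pearson(project(X, [0.0, 1.0]), project(Y, [0.0, 1.0]))
print(good, poor)
```

The first pair picks the feature of each view that varies together, so its correlation is higher than that of the second pair.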

23 Core Challenge 2: Alignment
Definition: Identify the direct relations between (sub)elements from two or more different modalities.
A. Explicit alignment: the goal is to directly find correspondences between elements of different modalities.
B. Implicit alignment: uses an internal, latent alignment of modalities in order to better solve a different problem.
[Figure: elements t_1 ... t_n of Modality 1 linked to elements t_1 ... t_n of Modality 2 by a "fancy algorithm"]

24 Temporal sequence alignment
Applications:
- Re-aligning asynchronous data
- Finding similar data across modalities (we can estimate the aligned cost)
- Event reconstruction from multiple sources
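The slide does not name an algorithm. One standard way to compute an "aligned cost" between two sequences sampled at different rates is dynamic time warping (DTW), sketched below as an assumed stand-in. The function name and the toy sequences are hypothetical.

```python
def dtw_cost(a, b):
    """Minimal cumulative |a_i - b_j| cost over monotonic alignments of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Same shape at different speeds versus an unrelated flat signal.
same_shape = dtw_cost([0, 1, 2, 1, 0], [0, 0, 1, 2, 2, 1, 0])
different = dtw_cost([0, 1, 2, 1, 0], [2, 2, 2, 2, 2])
print(same_shape, different)
```

A low aligned cost suggests the two streams show the same event at different speeds, which is the kind of comparison the applications above rely on.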

25 Alignment examples (multimodal)

26 Implicit Alignment
[Karpathy et al., Deep Fragment Embeddings for Bidirectional Image Sentence Mapping]

27 Core Challenge 3: Fusion
Definition: To join information from two or more modalities to perform a prediction task.
A. Model-agnostic approaches
1) Early fusion: Modality 1 and Modality 2 feed a single classifier.
2) Late fusion: Modality 1 and Modality 2 each feed their own classifier, and the classifier outputs are then combined.
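The difference between the two model-agnostic schemes can be sketched with a toy linear classifier. Everything concrete here is an assumption made for illustration: the feature values, the weights, the helper names, and the choice of averaging as the late-fusion rule (the slide does not specify how the per-modality decisions are combined).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_classifier(features, weights, bias):
    """Probability of the positive class from a logistic-regression-style scorer."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

def early_fusion(x1, x2, weights, bias):
    """Concatenate the modality features, then apply one classifier."""
    return linear_classifier(x1 + x2, weights, bias)

def late_fusion(x1, x2, clf1, clf2):
    """Run one classifier per modality, then average their probabilities."""
    p1 = linear_classifier(x1, *clf1)
    p2 = linear_classifier(x2, *clf2)
    return 0.5 * (p1 + p2)

# Hypothetical features and weights, chosen only for illustration.
audio, video = [0.2, -1.0], [1.5]
p_early = early_fusion(audio, video, weights=[0.5, -0.3, 0.8], bias=0.0)
p_late = late_fusion(audio, video, clf1=([0.5, -0.3], 0.0), clf2=([0.8], 0.0))
print(p_early, p_late)
```

Early fusion lets the single classifier weigh features from both modalities jointly, while late fusion keeps the modalities separate until their scores are merged.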

28 Core Challenge 3: Fusion
Definition: To join information from two or more modalities to perform a prediction task.
B. Model-based (intermediate) approaches
1) Deep neural networks
2) Kernel-based methods (multiple kernel learning)
3) Graphical models (Multi-View Hidden CRF)
[Figure: Multi-View Hidden CRF with label y, hidden states h_1^A ... h_5^A and h_1^V ... h_5^V, and observations x_1^A ... x_5^A and x_1^V ... x_5^V]

29 Core Challenge 4: Translation
Definition: Process of changing data from one modality to another, where the translation relationship can often be open-ended or subjective.
A. Example-based
B. Model-driven

30 Core Challenge 4: Translation
Transcriptions + audio streams are translated into visual gestures (both speaker and listener gestures).
Marsella et al., Virtual character performance from speech, SIGGRAPH/Eurographics Symposium on Computer Animation, 2013

31 Core Challenge 5: Co-Learning
Definition: Transfer knowledge between modalities, including their representations and predictive models.
[Figure: prediction from Modality 1, with Modality 2 helping during training]

32 Core Challenge 5: Co-Learning
A. Parallel
B. Non-parallel
C. Hybrid

33 Taxonomy of Multimodal Research [ ]
Representation
- Joint: neural networks, graphical models, sequential
- Coordinated: similarity, structured
Translation
- Example-based: retrieval, combination
- Model-based: grammar-based, encoder-decoder, online prediction
Alignment
- Explicit: unsupervised, supervised
- Implicit: graphical models, neural networks
Fusion
- Model-agnostic: early fusion, late fusion, hybrid fusion
- Model-based: kernel-based, graphical models, neural networks
Co-learning
- Parallel data: co-training, transfer learning
- Non-parallel data: zero-shot learning, concept grounding, transfer learning
- Hybrid data: bridging
Tadas Baltrusaitis, Chaitanya Ahuja, and Louis-Philippe Morency, Multimodal Machine Learning: A Survey and Taxonomy

34 Multimodal Applications [ ] Tadas Baltrusaitis, Chaitanya Ahuja, and Louis-Philippe Morency, Multimodal Machine Learning: A Survey and Taxonomy

35 Multimodal Representations

36 Core Challenge: Representation
Definition: Learning how to represent and summarize multimodal data in a way that exploits the complementarity and redundancy of the modalities.
A. Joint representations: Modality 1 and Modality 2 are mapped into a single shared representation.
B. Coordinated representations: Modality 1 and Modality 2 are mapped to separate representations (Repres. 1, Repres. 2).

37 Deep Multimodal Autoencoders
- A deep representation learning approach
- A bimodal autoencoder
- Used for audio-visual speech recognition
[Ngiam et al., Multimodal Deep Learning, 2011]

38 Deep Multimodal Autoencoders: training
- Individual modalities can be pretrained: RBMs, denoising autoencoders
- To train the model to reconstruct the other modality: use both, remove audio
[Ngiam et al., Multimodal Deep Learning, 2011]

39 Deep Multimodal Autoencoders: training
- Individual modalities can be pretrained: RBMs, denoising autoencoders
- To train the model to reconstruct the other modality: use both, remove audio, remove video
[Ngiam et al., Multimodal Deep Learning, 2011]
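The three training conditions (use both, remove audio, remove video) can be sketched as a data-preparation step. This is an illustrative assumption rather than the authors' code: the function name and toy features are hypothetical, and a removed modality is represented here by a zero vector.

```python
def make_training_pairs(audio, video):
    """Build (input, target) pairs for a bimodal autoencoder.

    The target is always both modalities; the input has both, or one zeroed out.
    """
    target = (list(audio), list(video))
    no_audio = [0.0] * len(audio)
    no_video = [0.0] * len(video)
    return [
        ((list(audio), list(video)), target),  # use both
        ((no_audio, list(video)), target),     # remove audio
        ((list(audio), no_video), target),     # remove video
    ]

pairs = make_training_pairs(audio=[0.3, 0.7], video=[1.0, 0.0, 0.5])
for model_input, target in pairs:
    print(model_input, "->", target)
```

Because the target always contains both modalities, the network is pushed to reconstruct a missing modality from the one that is present.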

40 Multimodal Encoder-Decoder
- The visual modality is often encoded using a CNN.
- The language modality is decoded using an LSTM.
- A simple multilayer perceptron translates from the visual representation (CNN) to the language representation (LSTM).
[Figure: image Y is encoded and text X is decoded]

41 Multimodal Joint Representation
For supervised learning tasks, the unimodal representations are joined by:
- Simple concatenation
- Element-wise multiplication or summation
- Multilayer perceptron
How to explicitly model both unimodal and bimodal interactions?
[Figure: text X and image Y are encoded as h_x and h_y, joined into h_m, and passed to a softmax (e.g., sentiment)]

42 Multimodal Sentiment Analysis
Sentiment intensity in [-3, +3]
$$h_m = f(W \cdot [h_x, h_y, h_z])$$
[Figure: text X, image Y and audio Z are encoded as h_x, h_y and h_z, joined into h_m, and passed to a softmax]

43 Unimodal, Bimodal and Trimodal Interactions
Speaker's behaviors and the resulting sentiment intensity:
Unimodal cues
- "This movie is sick": ? Ambiguous!
- "This movie is fair"
- Smile
- Loud voice: ? Ambiguous!
Bimodal
- "This movie is sick" + smile, and "This movie is sick" + frown: resolves ambiguity (bimodal interaction)
- "This movie is sick" + loud voice: ? Still ambiguous!
Trimodal
- "This movie is sick" + smile + loud voice
- "This movie is fair" + smile + loud voice
- Different trimodal interactions!

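44 Multimodal Tensor Fusion Network (TFN)
Models both unimodal and bimodal interactions:
$$h_m = \begin{bmatrix} h_x \\ 1 \end{bmatrix} \otimes \begin{bmatrix} h_y \\ 1 \end{bmatrix} = \begin{bmatrix} h_x & h_x \otimes h_y \\ 1 & h_y \end{bmatrix}$$
- Bimodal term: $h_x \otimes h_y$
- Unimodal terms: $h_x$ and $h_y$
- The appended 1 is important: it is what keeps the unimodal terms in the product.
[Figure: text X and image Y are encoded as h_x and h_y, fused into h_m, and passed to a softmax (e.g., sentiment)]
[Zadeh, Jones and Morency, EMNLP 2017]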

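45 Multimodal Tensor Fusion Network (TFN)
Can be extended to three modalities:
$$h_m = \begin{bmatrix} h_x \\ 1 \end{bmatrix} \otimes \begin{bmatrix} h_y \\ 1 \end{bmatrix} \otimes \begin{bmatrix} h_z \\ 1 \end{bmatrix}$$
The resulting tensor contains the terms $h_x$, $h_y$, $h_z$, $h_x \otimes h_y$, $h_x \otimes h_z$, $h_y \otimes h_z$ and $h_x \otimes h_y \otimes h_z$.
Explicitly models unimodal, bimodal and trimodal interactions!
[Figure: text X, image Y and audio Z are encoded as h_x, h_y and h_z]
[Zadeh, Jones and Morency, EMNLP 2017]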
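The outer-product fusion for two and three modalities can be sketched with plain lists. This is a minimal illustration, not the TFN implementation: the helper names and the toy embeddings are hypothetical, and the fused tensor is returned flattened.

```python
def outer(a, b):
    """Outer product of two flat vectors, flattened row-major."""
    return [x * y for x in a for y in b]

def tensor_fusion(*modalities):
    """Fuse embeddings by taking the outer product of each one extended with a 1."""
    fused = [1.0]
    for h in modalities:
        fused = outer(fused, list(h) + [1.0])
    return fused

h_x, h_y, h_z = [2.0, 3.0], [5.0], [7.0]
bimodal = tensor_fusion(h_x, h_y)
trimodal = tensor_fusion(h_x, h_y, h_z)
print(bimodal)        # [10.0, 2.0, 15.0, 3.0, 5.0, 1.0]
print(len(trimodal))  # 12
```

In the bimodal output, 10 and 15 are the entries of h_x ⊗ h_y, while 2, 3 and 5 are the unimodal entries that survive only because of the appended 1. The fused size is the product of (dimension + 1) over the modalities, so it grows quickly with the embedding sizes.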

46 Experimental Results
MOSI dataset: improvement over the state of the art

47 Coordinated Multimodal Representations
- Learn (unsupervised) two or more coordinated representations from multiple modalities.
- A loss function is defined to bring these multiple representations closer.
- Similarity metric (e.g., cosine distance)
[Figure: text X and image Y encoded separately and compared with the similarity metric]
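As a minimal sketch of such a loss, the code below uses cosine distance between a text embedding and an image embedding. The function names and the toy embeddings are hypothetical, and a real system would typically add a term that pushes mismatched pairs apart, which the slide does not detail.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def coordination_loss(h_text, h_image):
    """Cosine distance: 0 when the embeddings point the same way, 2 when opposite."""
    return 1.0 - cosine_similarity(h_text, h_image)

matched = coordination_loss([1.0, 2.0], [2.0, 4.0])     # same direction
mismatched = coordination_loss([1.0, 0.0], [0.0, 1.0])  # orthogonal
print(matched, mismatched)
```

Minimizing this loss over paired samples pulls the two separately computed representations toward the same direction without forcing them into one joint vector.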

48 Deep Canonical Correlation Analysis
Same objective function as CCA:
$$\arg\max_{V, U, W_x, W_y} \operatorname{corr}(H_x, H_y)$$
1. Linear projections maximizing correlation
2. Orthogonal projections
3. Unit variance of the projection vectors
[Figure: text view X and image view Y pass through networks W_x and W_y to give H_x and H_y, which are projected by U and V]
Andrew et al., ICML 2013

49 Deep Canonically Correlated Autoencoders (DCCAE)
- Jointly optimize the DCCA and autoencoder loss functions.
- A trade-off between multi-view correlation and reconstruction error from the individual views.
[Figure: text view X and image view Y are encoded by W_x and W_y into H_x and H_y, projected by U and V, and also decoded back to reconstruct the text and the image]
Wang et al., ICML 2015

50 Implicit alignment

51 Machine Translation
Given a sentence in one language, translate it to another.
Example: "Dog on the beach" and "le chien sur la plage"

52 Attention Model for Machine Translation
[Figure: the encoder reads "le chien sur la plage" into states h_1 ... h_5; the attention module / gate uses hidden state s_0 to form context z_0, and the decoder outputs "Dog"]
[Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015]

53 Attention Model for Machine Translation
[Figure: same model, next step; hidden state s_1 and context z_1, output so far "Dog on"]
[Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015]

54 Attention Model for Machine Translation
[Figure: same model, next step; hidden state s_2 and context z_2, output so far "Dog on the"]
[Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015]
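One decoding step of the attention module can be sketched as follows: score each encoder state against the current decoder state, normalize the scores with a softmax, and take the weighted sum as the context vector. The dot-product scoring, the function names and the toy vectors are assumptions made to keep the sketch short; Bahdanau et al. score each pair with a small feed-forward network.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoding step."""
    # Dot-product scoring is a simplification assumed for this sketch.
    scores = [sum(s * h for s, h in zip(decoder_state, h_i)) for h_i in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h_i[d] for w, h_i in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

# Hypothetical 2-d encoder states h_1..h_5 for "le chien sur la plage".
encoder_states = [[1.0, 0.0], [0.0, 4.0], [1.0, 1.0], [0.5, 0.0], [0.0, 1.0]]
weights, context = attend([0.0, 2.0], encoder_states)
print(weights, context)
```

The weights form a soft alignment between the word being produced and the source words, which is why attention counts as implicit alignment: no alignment labels are given, yet the model learns where to look.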

55 Attention Model for Machine Translation

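56 Attention Model for Image Captioning
[Figure: hidden states s_0, s_1, s_2; attention distributions a_1, a_2, a_3 over L locations; context vectors z_1, z_2 computed as an expectation over the features (dimension D); inputs y_1 (first word) and y_2; output words d_1, d_2]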

57 Attention Model for Image Captioning

58 Attention Model for Video Sequences
[Pei, Baltrušaitis, Tax and Morency, Temporal Attention-Gated Model for Robust Sequence Classification, CVPR 2017]

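59 Temporal Attention-Gated Model (TAGM)
Recurrent Attention-Gated Unit
[Figure: the unit combines the previous state h_{t-1}, weighted by 1 - a_t, with a ReLU of the input x_t and h_{t-1}, weighted by a_t, to give h_t; the unit is unrolled over steps t-1, t, t+1 with attention values a_{t-1}, a_t, a_{t+1}]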
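Read from the diagram, the gated update appears to be h_t = (1 - a_t) * h_{t-1} + a_t * ReLU(W x_t + U h_{t-1} + b). The sketch below implements that reading under stated assumptions: the weight names W, U and b and all values are hypothetical, and the attention score a_t is passed in directly rather than computed by a separate attention module.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def gated_step(h_prev, x_t, a_t, W, U, b):
    """One step of h_t = (1 - a_t) * h_prev + a_t * ReLU(W x_t + U h_prev + b)."""
    pre = [p + q + r for p, q, r in zip(matvec(W, x_t), matvec(U, h_prev), b)]
    candidate = relu(pre)
    return [(1.0 - a_t) * h + a_t * c for h, c in zip(h_prev, candidate)]

# Hypothetical weights: identity input map, no recurrence, zero bias.
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
h0 = [0.5, 0.5]
ignored = gated_step(h0, [9.0, -9.0], a_t=0.0, W=W, U=U, b=b)   # frame skipped
attended = gated_step(h0, [9.0, -9.0], a_t=1.0, W=W, U=U, b=b)  # state overwritten
print(ignored, attended)
```

With a_t = 0 the step is skipped and the state is carried over unchanged, and with a_t = 1 the state is replaced by the new candidate, so a low attention score lets the model ignore a noisy frame.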

60 Temporal Attention-Gated Model (TAGM)
[Pei, Baltrušaitis, Tax and Morency, Temporal Attention-Gated Model for Robust Sequence Classification, CVPR 2017]

61 Multimodal Fusion

62 Multiple Kernel Learning
- Pick a family of kernels for each modality and learn which kernels are important for the classification task.
- Generalizes the idea of Support Vector Machines.
- Works for unimodal as well as multimodal data; very little adaptation is needed.
[Lanckriet 2004]
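The combined kernel at the heart of this idea can be sketched as a weighted sum of one kernel per modality. The sketch below is illustrative only: the RBF kernel choice, the sample values, the function names and the fixed weights are assumptions, and an actual MKL solver would learn the weights together with the classifier.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def combined_kernel(sample_a, sample_b, kernel_weights, gammas):
    """Weighted sum of one RBF kernel per modality."""
    return sum(
        w * rbf_kernel(sample_a[m], sample_b[m], gammas[m])
        for m, w in kernel_weights.items()
    )

# Hypothetical samples: one feature vector per modality.
s1 = {"audio": [0.0, 1.0], "video": [1.0]}
s2 = {"audio": [0.0, 1.0], "video": [3.0]}
weights = {"audio": 0.7, "video": 0.3}  # MKL would learn these; fixed here
gammas = {"audio": 1.0, "video": 0.5}
k_same = combined_kernel(s1, s1, weights, gammas)
k_diff = combined_kernel(s1, s2, weights, gammas)
print(k_same, k_diff)
```

A non-negative weighted sum of valid kernels is itself a valid kernel, so the result can be used in a kernel SVM, and a larger learned weight marks a modality whose kernel matters more for the task.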

63 Multimodal Fusion for Sequential Data
Multi-View Hidden Conditional Random Field
- Modality-private structure: internal grouping of observations
- Modality-shared structure: interaction and synchrony
$$p(y \mid x^A, x^V; \theta) = \sum_{h^A, h^V} p(y, h^A, h^V \mid x^A, x^V; \theta)$$
Approximate inference using loopy belief propagation
[Figure: sentiment label y over hidden chains h_1^A ... h_5^A and h_1^V ... h_5^V, with observations x_1^A ... x_5^A and x_1^V ... x_5^V for the utterance "We saw the yellow dog"]
[Song, Morency and Davis, CVPR 2012]

64 Sequence Modeling with LSTM
[Figure: LSTM unrolled over steps 1 ... τ, mapping inputs x_1, x_2, x_3, ..., x_τ to outputs y_1, y_2, y_3, ..., y_τ]

65 Multimodal Sequence Modeling: Early Fusion
[Figure: LSTM unrolled over steps 1 ... τ; at each step t the inputs x_t^{(1)}, x_t^{(2)}, x_t^{(3)} from the three modalities are fed together to produce y_t]

66 Multi-View Long Short-Term Memory (MV-LSTM)
[Figure: MV-LSTM unrolled over steps 1 ... τ; at each step t the views x_t^{(1)}, x_t^{(2)}, x_t^{(3)} enter the MV-LSTM unit to produce y_t]
[Shyam, Morency, et al., Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV 2016]

67 Multi-View Long Short-Term Memory
- Multi-view topologies
- Multiple memory cells
[Figure: MV-LSTM unit with previous hidden states h_{t-1}^{(1)}, h_{t-1}^{(2)}, h_{t-1}^{(3)}, inputs x_t^{(1)}, x_t^{(2)}, x_t^{(3)}, MVsigm and MVtanh blocks, candidates g_t^{(1)}, g_t^{(2)}, g_t^{(3)}, memory cells c_t^{(1)}, c_t^{(2)}, c_t^{(3)} and outputs h_t^{(1)}, h_t^{(2)}, h_t^{(3)}]
[Shyam, Morency, et al., Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV 2016]

68 Topologies for Multi-View LSTM
Design parameters:
- α: memory from current view
- β: memory from other views
Multi-view topologies:
- Fully connected: α=1, β=1
- View-specific: α=1, β=0
- Coupled: α=0, β=1
- Hybrid: α=2/3, β=1/3
[Shyam, Morency, et al., Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV 2016]

69 Multi-View Long Short-Term Memory (MV-LSTM)
Multimodal prediction of children's engagement
[Shyam, Morency, et al., Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV 2016]

70 Memory Based
- A memory accumulates multimodal information over time.
- The information comes from the representations throughout a source network.
- No need to modify the structure of the source network; only attach the memory.

71 Memory Based
[Zadeh et al., Memory Fusion Network for Multi-view Sequential Learning, AAAI 2018]

72 Multimodal Machine Learning
Tadas Baltrusaitis, Chaitanya Ahuja, and Louis-Philippe Morency


More information

Task-Oriented Dialogue System (Young, 2000)

Task-Oriented Dialogue System (Young, 2000) 2 Review Task-Oriented Dialogue System (Young, 2000) 3 http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Speech Signal Speech Recognition Hypothesis are there any action movies to see

More information

Deep Learning Recurrent Networks 2/28/2018

Deep Learning Recurrent Networks 2/28/2018 Deep Learning Recurrent Networks /8/8 Recap: Recurrent networks can be incredibly effective Story so far Y(t+) Stock vector X(t) X(t+) X(t+) X(t+) X(t+) X(t+5) X(t+) X(t+7) Iterated structures are good

More information

Course Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch

Course Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch Psychology 452 Week 12: Deep Learning What Is Deep Learning? Preliminary Ideas (that we already know!) The Restricted Boltzmann Machine (RBM) Many Layers of RBMs Pros and Cons of Deep Learning Course Structure

More information

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Speaker Representation and Verification Part II. by Vasileios Vasilakakis Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation

More information

Feature Design. Feature Design. Feature Design. & Deep Learning

Feature Design. Feature Design. Feature Design. & Deep Learning Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately

More information

Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji

Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient

More information

A Deep Learning Model for Structured Outputs with High-order Interaction

A Deep Learning Model for Structured Outputs with High-order Interaction A Deep Learning Model for Structured Outputs with High-order Interaction Hongyu Guo, Xiaodan Zhu, Martin Renqiang Min National Research Council of Canada, Ottawa, ON {hongyu.guo, xiaodan.zhu}@nrc-cnrc.gc.ca

More information

Machine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

Machine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group Machine Learning for Computer Vision 8. Neural Networks and Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group INTRODUCTION Nonlinear Coordinate Transformation http://cs.stanford.edu/people/karpathy/convnetjs/

More information

Applied Natural Language Processing

Applied Natural Language Processing Applied Natural Language Processing Info 256 Lecture 20: Sequence labeling (April 9, 2019) David Bamman, UC Berkeley POS tagging NNP Labeling the tag that s correct for the context. IN JJ FW SYM IN JJ

More information

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous

More information

Learning Deep Architectures

Learning Deep Architectures Learning Deep Architectures Yoshua Bengio, U. Montreal Microsoft Cambridge, U.K. July 7th, 2009, Montreal Thanks to: Aaron Courville, Pascal Vincent, Dumitru Erhan, Olivier Delalleau, Olivier Breuleux,

More information

Memory-Augmented Attention Model for Scene Text Recognition

Memory-Augmented Attention Model for Scene Text Recognition Memory-Augmented Attention Model for Scene Text Recognition Cong Wang 1,2, Fei Yin 1,2, Cheng-Lin Liu 1,2,3 1 National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences

More information

WaveNet: A Generative Model for Raw Audio

WaveNet: A Generative Model for Raw Audio WaveNet: A Generative Model for Raw Audio Ido Guy & Daniel Brodeski Deep Learning Seminar 2017 TAU Outline Introduction WaveNet Experiments Introduction WaveNet is a deep generative model of raw audio

More information

Dimensionality Reduction and Principle Components Analysis

Dimensionality Reduction and Principle Components Analysis Dimensionality Reduction and Principle Components Analysis 1 Outline What is dimensionality reduction? Principle Components Analysis (PCA) Example (Bishop, ch 12) PCA vs linear regression PCA as a mixture

More information

Deep Learning for Speech Recognition. Hung-yi Lee

Deep Learning for Speech Recognition. Hung-yi Lee Deep Learning for Speech Recognition Hung-yi Lee Outline Conventional Speech Recognition How to use Deep Learning in acoustic modeling? Why Deep Learning? Speaker Adaptation Multi-task Deep Learning New

More information

Slide credit from Hung-Yi Lee & Richard Socher

Slide credit from Hung-Yi Lee & Richard Socher Slide credit from Hung-Yi Lee & Richard Socher 1 Review Recurrent Neural Network 2 Recurrent Neural Network Idea: condition the neural network on all previous words and tie the weights at each time step

More information

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

Deep unsupervised learning

Deep unsupervised learning Deep unsupervised learning Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea Unsupervised learning In machine learning, there are 3 kinds of learning paradigm.

More information

Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35

Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35 Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How

More information

Multi-Signal Gesture Recognition Using Temporal Smoothing Hidden Conditional Random Fields

Multi-Signal Gesture Recognition Using Temporal Smoothing Hidden Conditional Random Fields Multi-Signal Gesture Recognition Using Temporal Smoothing Hidden Conditional Random Fields Yale Song, David Demirdjian, and Randall Davis MIT Computer Science and Artificial Intelligence Laboratory 32

More information

REVISIT ENCODER & DECODER

REVISIT ENCODER & DECODER PERCEPTION-LINK BEHAVIOR MODEL: REVISIT ENCODER & DECODER IMI PHD Presentation Presenter: William Gu Yuanlong (PhD student) Supervisor: Assoc. Prof. Gerald Seet Gim Lee Co-Supervisor: Prof. Nadia Magnenat-Thalmann

More information

arxiv: v1 [cs.cl] 21 May 2017

arxiv: v1 [cs.cl] 21 May 2017 Spelling Correction as a Foreign Language Yingbo Zhou yingbzhou@ebay.com Utkarsh Porwal uporwal@ebay.com Roberto Konow rkonow@ebay.com arxiv:1705.07371v1 [cs.cl] 21 May 2017 Abstract In this paper, we

More information

Deep Learning for NLP

Deep Learning for NLP Deep Learning for NLP CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) Machine Learning and NLP NER WordNet Usually machine learning

More information

Deep Generative Models. (Unsupervised Learning)

Deep Generative Models. (Unsupervised Learning) Deep Generative Models (Unsupervised Learning) CEng 783 Deep Learning Fall 2017 Emre Akbaş Reminders Next week: project progress demos in class Describe your problem/goal What you have done so far What

More information

Convolutional Neural Networks II. Slides from Dr. Vlad Morariu

Convolutional Neural Networks II. Slides from Dr. Vlad Morariu Convolutional Neural Networks II Slides from Dr. Vlad Morariu 1 Optimization Example of optimization progress while training a neural network. (Loss over mini-batches goes down over time.) 2 Learning rate

More information

How to do backpropagation in a brain

How to do backpropagation in a brain How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep

More information

Machine Learning for Physicists Lecture 1

Machine Learning for Physicists Lecture 1 Machine Learning for Physicists Lecture 1 Summer 2017 University of Erlangen-Nuremberg Florian Marquardt (Image generated by a net with 20 hidden layers) OUTPUT INPUT (Picture: Wikimedia Commons) OUTPUT

More information

arxiv: v1 [cs.lg] 3 Feb 2018

arxiv: v1 [cs.lg] 3 Feb 2018 Memory Fusion Network for Multi-view Sequential Learning Amir Zadeh Carnegie Mellon University, USA abagherz@cs.cmu.edu Paul Pu Liang Carnegie Mellon University, USA pliang@cs.cmu.edu Navonil Mazumder

More information

11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1)

11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1) 11/3/15 Machine Learning and NLP Deep Learning for NLP Usually machine learning works well because of human-designed representations and input features CS224N WordNet SRL Parser Machine learning becomes

More information

Natural Language Processing and Recurrent Neural Networks

Natural Language Processing and Recurrent Neural Networks Natural Language Processing and Recurrent Neural Networks Pranay Tarafdar October 19 th, 2018 Outline Introduction to NLP Word2vec RNN GRU LSTM Demo What is NLP? Natural Language? : Huge amount of information

More information

INF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018

INF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018 Machine learning for image classification Lecture 14: Reinforcement learning May 9, 2018 Page 3 Outline Motivation Introduction to reinforcement learning (RL) Value function based methods (Q-learning)

More information

Lecture 17: Neural Networks and Deep Learning

Lecture 17: Neural Networks and Deep Learning UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions

More information

High Order LSTM/GRU. Wenjie Luo. January 19, 2016

High Order LSTM/GRU. Wenjie Luo. January 19, 2016 High Order LSTM/GRU Wenjie Luo January 19, 2016 1 Introduction RNN is a powerful model for sequence data but suffers from gradient vanishing and explosion, thus difficult to be trained to capture long

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)

More information

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6 Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects

More information

Global SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks

Global SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks Interspeech 2018 2-6 September 2018, Hyderabad Global SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks Rohith Aralikatti, Dilip Kumar Margam, Tanay Sharma,

More information

Two-Stream Bidirectional Long Short-Term Memory for Mitosis Event Detection and Stage Localization in Phase-Contrast Microscopy Images

Two-Stream Bidirectional Long Short-Term Memory for Mitosis Event Detection and Stage Localization in Phase-Contrast Microscopy Images Two-Stream Bidirectional Long Short-Term Memory for Mitosis Event Detection and Stage Localization in Phase-Contrast Microscopy Images Yunxiang Mao and Zhaozheng Yin (B) Computer Science, Missouri University

More information

Independent Component Analysis and Unsupervised Learning

Independent Component Analysis and Unsupervised Learning Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent

More information

Unsupervised Learning of Hierarchical Models. in collaboration with Josh Susskind and Vlad Mnih

Unsupervised Learning of Hierarchical Models. in collaboration with Josh Susskind and Vlad Mnih Unsupervised Learning of Hierarchical Models Marc'Aurelio Ranzato Geoff Hinton in collaboration with Josh Susskind and Vlad Mnih Advanced Machine Learning, 9 March 2011 Example: facial expression recognition

More information

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity- Representativeness Reward

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity- Representativeness Reward Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity- Representativeness Reward Kaiyang Zhou, Yu Qiao, Tao Xiang AAAI 2018 What is video summarization? Goal: to automatically

More information

Index. Santanu Pattanayak 2017 S. Pattanayak, Pro Deep Learning with TensorFlow,

Index. Santanu Pattanayak 2017 S. Pattanayak, Pro Deep Learning with TensorFlow, Index A Activation functions, neuron/perceptron binary threshold activation function, 102 103 linear activation function, 102 rectified linear unit, 106 sigmoid activation function, 103 104 SoftMax activation

More information

Recurrent Neural Networks. Jian Tang

Recurrent Neural Networks. Jian Tang Recurrent Neural Networks Jian Tang tangjianpku@gmail.com 1 RNN: Recurrent neural networks Neural networks for sequence modeling Summarize a sequence with fix-sized vector through recursively updating

More information

Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions

Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions 2018 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 18) Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions Authors: S. Scardapane, S. Van Vaerenbergh,

More information