Multimodal Machine Learning
|
|
- Doreen Chase
- 6 years ago
- Views:
Transcription
1 Multimodal Machine Learning Louis-Philippe (LP) Morency CMU Multimodal Communication and Machine Learning Laboratory [MultiComp Lab] 1
2 CMU Course : Multimodal Machine Learning 2
3 Lecture Objectives What is Multimodal? Multimodal: Core technical challenges Representation learning, translation, alignment, fusion and co-learning Multimodal representation learning Multimodal tensor representation Implicit Alignment Temporal attention Fusion and temporal modeling Multi-view LSTM and memory-based fusion
4 What is Multimodal? 4
5 What is Multimodal? Multimodal distribution Multiple modes, i.e., distinct peaks (local maxima) in the probability density function
6 What is Multimodal? Sensory Modalities
7 Multimodal Communicative Behaviors Verbal Lexicon Words Syntax Part-of-speech Dependencies Pragmatics Discourse acts Vocal Prosody Intonation Voice quality Vocal expressions Laughter, moans Visual Gestures Head gestures Eye gestures Arm gestures Body language Body posture Proxemics Eye contact Head gaze Eye gaze Facial expressions FACS action units Smile, frowning 7
8 What is Multimodal? Modality The way in which something happens or is experienced. Modality refers to a certain type of information and/or the representation format in which information is stored. Sensory modality: one of the primary forms of sensation, as vision or touch; channel of communication. Medium ( middle ) A means or instrumentality for storing or communicating information; system of communication/transmission. Medium is the means whereby this information is delivered to the senses of the interpreter. 8
9 Multiple Communities and Modalities Psychology Medical Speech Vision Language Multimedia Robotics Learning
10 Examples of Modalities Natural language (both spoken or written) Visual (from images or videos) Auditory (including voice, sounds and music) Haptics / touch Smell, taste and self-motion Physiological signals Electrocardiogram (ECG), skin conductance Other modalities Infrared images, depth images, fmri
11 Prior Research on Multimodal Four eras of multimodal research The behavioral era (1970s until late 1980s) The computational era (late 1980s until 2000) The interaction era ( ) The deep learning era (2010s until ) Main focus of this tutorial
12 The McGurk Effect (1976) Hearing lips and seeing voices Nature
13 The McGurk Effect (1976) Hearing lips and seeing voices Nature
14 The Computational Era(Late 1980s until 2000) 1) Audio-Visual Speech Recognition (AVSR)
15 Core Technical Challenges 15
16 Core Challenges in Deep Multimodal ML Tadas Baltrusaitis, Chaitanya Ahuja, and Louis-Philippe Morency These challenges are non-exclusive. 16
17 Core Challenge 1: Representation Definition: Learning how to represent and summarize multimodal data in away that exploits the complementarity and redundancy. A Joint representations: Representation Modality 1 Modality 2 17
18 Joint Multimodal Representation Wow! I like it! Joyful tone Joint Representation (Multimodal Space) Tensed voice 18
19 Joint Multimodal Representations Audio-visual speech recognition Bimodal Deep Belief Network Image captioning [Ngiam et al., ICML 2011] [Srivastava and Salahutdinov, NIPS 2012] Multimodal Deep Boltzmann Machine Audio-visual emotion recognition Deep Boltzmann Machine [Kim et al., ICASSP 2013] Depth Video Depth Multimodal Multimodal Representation Visual Verbal Depth Verbal 19
20 Multimodal Vector Space Arithmetic [Kiros et al., Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, 2014] 20
21 Core Challenge 1: Representation Definition: Learning how to represent and summarize multimodal data in away that exploits the complementarity and redundancy. A Joint representations: B Coordinated representations: Representation Repres. 1 Repres 2 Modality 1 Modality 2 Modality 1 Modality 2 21
22 Coordinated Representation: Deep CCA Learn linear projections that are maximally correlated: u, v = argmax u,v corr u T X, v T Y View H x View H y X u v Y H x U V W x W y Text Image X Y H y Andrew et al., ICML
23 Fancy algorithm Core Challenge 2: Alignment Definition: Identify the direct relations between (sub)elements from two or more different modalities. Modality 1 t 1 t 2 Modality 2 t 4 A Explicit Alignment The goal is to directly find correspondences between elements of different modalities t 3 t 5 B Implicit Alignment Uses internally latent alignment of modalities in order to better solve a different problem t n t n 23
24 Temporal sequence alignment Applications: - Re-aligning asynchronous data - Finding similar data across modalities (we can estimate the aligned cost) - Event reconstruction from multiple sources
25 Alignment examples (multimodal)
26 Implicit Alignment Karpathy et al., Deep Fragment Embeddings for Bidirectional Image Sentence Mapping, 26
27 Core Challenge 3: Fusion Definition: To join information from two or more modalities to perform a prediction task. A Model-Agnostic Approaches 1) Early Fusion 2) Late Fusion Modality 1 Classifier Modality 1 Classifier Modality 2 Modality 2 Classifier 27
28 Core Challenge 3: Fusion Definition: To join information from two or more modalities to perform a prediction task. B Model-Based (Intermediate) Approaches 1) Deep neural networks 2) Kernel-based methods Multiple kernel learning y 3) Graphical models x 1 A h 1 A h 1 V x 1 V h 2 A x 2 A h 2 V x 2 V h 3 A x 3 A h 3 V x 3 V h 4 A x 4 A h 4 V x 4 V h 5 A x 5 A h 5 V x 5 V Multi-View Hidden CRF 28
29 Core Challenge 4: Translation Definition: Process of changing data from one modality to another, where the translation relationship can often be open-ended or subjective. A Example-based B Model-driven 29
30 Core Challenge 4 Translation Visual gestures (both speaker and listener gestures) Transcriptions + Audio streams Marsella et al., Virtual character performance from speech, SIGGRAPH/Eurographics Symposium on Computer Animation, 2013
31 Core Challenge 5: Co-Learning Definition: Transfer knowledge between modalities, including their representations and predictive models. Prediction Modality 1 Modality 2 Help during training 31
32 Core Challenge 5: Co-Learning A Parallel B Non-Parallel C Hybrid 32
33 Taxonomy of Multimodal Research [ ] Representation Joint o Neural networks o Graphical models o Sequential Coordinated o Similarity o Structured Translation Example-based o Retrieval o Combination Model-based o Grammar-based o Encoder-decoder o Online prediction Alignment Explicit o Unsupervised o Supervised Implicit o Graphical models o Neural networks Fusion Model agnostic o Early fusion o Late fusion o Hybrid fusion Model-based o Kernel-based o Graphical models o Neural networks Co-learning Parallel data o Co-training o Transfer learning Non-parallel data Zero-shot learning Concept grounding Transfer learning Hybrid data Bridging Tadas Baltrusaitis, Chaitanya Ahuja, and Louis-Philippe Morency, Multimodal Machine Learning: A Survey and Taxonomy
34 Multimodal Applications [ ] Tadas Baltrusaitis, Chaitanya Ahuja, and Louis-Philippe Morency, Multimodal Machine Learning: A Survey and Taxonomy
35 Multimodal Representations 35
36 Core Challenge: Representation Definition: Learning how to represent and summarize multimodal data in away that exploits the complementarity and redundancy. A Joint representations: B Coordinated representations: Representation Repres. 1 Repres 2 Modality 1 Modality 2 Modality 1 Modality 2 36
37 Deep Multimodal autoencoders A deep representation learning approach A bimodal auto-encoder Used for Audio-visual speech recognition [Ngiam et al., Multimodal Deep Learning, 2011]
38 Deep Multimodal autoencoders - training Individual modalities can be pretrained RBMs Denoising Autoencoders To train the model to reconstruct the other modality Use both Remove audio [Ngiam et al., Multimodal Deep Learning, 2011]
39 Deep Multimodal autoencoders - training Individual modalities can be pretrained RBMs Denoising Autoencoders To train the model to reconstruct the other modality Use both Remove audio Remove video [Ngiam et al., Multimodal Deep Learning, 2011]
40 Multimodal Encoder-Decoder Visual modality often encoded using CNN Language modality will be decoded using LSTM A simple multilayer perceptron will be used to translate from visual (CNN) to language (LSTM) Text X Image Y 40
41 Multimodal Joint Representation For supervised learning tasks Joining the unimodal representations: Simple concatenation Element-wise multiplication or summation Multilayer perceptron How to explicitly model both unimodal and bimodal interactions? h x e.g. Sentiment softmax Text X h m Image Y h y
42 Multimodal Sentiment Analysis Sentiment Intensity [-3,+3] softmax h m h x h y h z h m = f W h x, h y, h z Text X Image Y Audio Z 42
43 Trimodal Bimodal Unimodal Unimodal, Bimodal and Trimodal Interactions Speaker s behaviors Sentiment Intensity This movie is sick? Ambiguous! This movie is fair Smile Loud voice? Unimodal cues Ambiguous! This movie is sick Smile This movie is sick Frown This movie is sick Loud voice? Resolves ambiguity (bimodal interaction) Still Ambiguous! This movie is sick Smile Loud voice This movie is fair Smile Loud voice Different trimodal interactions! 43
44 Multimodal Tensor Fusion Network (TFN) Models both unimodal and bimodal interactions: h m = h x 1 h y 1 = h x h x h y 1 h y Bimodal Unimodal e.g. Sentiment softmax 1 h m Important! h x h y [Zadeh, Jones and Morency, EMNLP 2017] Text X Image Y 44
45 Multimodal Tensor Fusion Network (TFN) h x Can be extended to three modalities: h x h z h x h y h m = h x h y 1 1 h z 1 h y h z h y h z h x h y h z Explicitly models unimodal, bimodal and trimodal interactions! [Zadeh, Jones and Morency, EMNLP 2017] h x Text X h y Image Y Audio Z h z 45
46 Experimental Results MOSI Dataset Improvement over State-Of-The-Art 46
47 Coordinated Multimodal Representations Learn (unsupervised) two or more coordinated representations from multiple modalities. A loss function is defined to bring closer these multiple representations. Similarity metric (e.g., cosine distance) Text X Image Y 47
48 Deep Canonical Correlation Analysis Same objective function as CCA: argmax V,U,W x,w y corr H x, H y View H x 1 Linear projections maximizing correlation 2 Orthogonal projections 3 Unit variance of the projection vectors Andrew et al., ICML 2013 H x View H y U V W x W y Text Image X Y H y 48
49 Deep Canonically Correlated Autoencoders (DCCAE) Jointly optimize for DCCA and autoencoders loss functions A trade-off between multi-view correlation and reconstruction error from individual views X Text View H x Y Image Wang et al., ICML 2015 H x View H y U V W x W y Text Image X Y H y 49
50 Implicit alignment 50
51 Machine Translation Given a sentence in one language translate it to another Dog on the beach le chien sur la plage
52 Attention Model for Machine Translation Dog Attention module / gate Hidden state s 0 Context z 0 h 1 h 2 h 3 h 4 h 5 Encoder le chien sur la plage [Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015]
53 Attention Model for Machine Translation Dog on Attention module / gate Hidden state s 1 Context z 1 h 1 h 2 h 3 h 4 h 5 Encoder le chien sur la plage [Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015]
54 Attention Model for Machine Translation Dog on the Attention module / gate Hidden state s 2 h 1 h 2 h 3 h 4 h 5 Context z 2 Encoder le chien sur la plage [Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015]
55 Attention Model for Machine Translation 55
56 Attention Model for Image Captioning Distribution over L locations Output word a 1 a 2 d 1 a 3 d 2 s 0 s 1 s 2 z 1 y 1 z 2 y 2 Expectation over features: D First word 56
57 Attention Model for Image Captioning
58 Attention Model for Video Sequences [Pei, Baltrušaitis, Tax and Morency. Temporal Attention-Gated Model for Robust Sequence Classification, CVPR, 2017 ]
59 Temporal Attention-Gated Model (TAGM) 1 a t Recurrent Attention-Gated Unit h t 1 x + h t x t + ReLU x a t 1 a t a t+1 h t 1 h t h t+1 x t 1 x t x t+1 59
60 Temporal Attention Gated Model (TAGM) [Pei, Baltrušaitis, Tax and Morency. Temporal Attention-Gated Model for Robust Sequence Classification, CVPR, 2017 ]
61 Multimodal Fusion 61
62 Multiple Kernel Learning Pick a family of kernels for each modality and learn which kernels are important for the classification case Generalizes the idea of Support Vector Machines Works as well for unimodal and multimodal data, very little adaptation is needed [Lanckriet 2004]
63 Multimodal Fusion for Sequential Data Modality-private structure Internal grouping of observations Modality-shared structure Interaction and synchrony p y x A, x V ; θ) = h A,h V p y, h A, h V x A, x V ; θ Multi-View Hidden Conditional Sentiment Random Field y A A A A A h 1 h 2 h 3 h 4 h 5 x 1 A V V V V V h 1 h 2 h 3 h 4 h 5 A A A A x 2 x 3 x 4 x 5 V V x 1 x 2 x 3 V x 4 V x 5 V We saw the yellowdog Approximate inference using loopy-belief [Song, Morency and Davis, CVPR 2012] 63
64 Sequence Modeling with LSTM y 1 y 2 y 3 y τ LSTM (1) LSTM (2) LSTM (3) LSTM (τ) x 1 x 2 x 3 x τ 64
65 Multimodal Sequence Modeling Early Fusion y 1 y 2 y 3 y τ LSTM (1) LSTM (2) LSTM (3) LSTM (τ) (1) (1) (1) (1) x 1 x 2 x 3 x τ (2) (2) (2) (2) x 1 x 2 x 3 x τ (3) (3) (3) (3) x 1 x 2 x 3 x τ 65
66 Multi-View Long Short-Term Memory (MV-LSTM) y 1 y 2 y 3 y τ MV- LSTM (1) MV- LSTM (2) MV- LSTM (3) MV- LSTM (τ) (1) (1) (1) (1) x 1 x 2 x 3 x τ (2) (2) (2) (2) x 1 x 2 x 3 x τ (3) (3) (3) (3) x 1 x 2 x 3 x τ [Shyam, Morency, et al. Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV, 2016] 66
67 Multi-View Long Short-Term Memory Multi-view topologies Multiple memory cells MV- LSTM (1) (1) h t 1 (2) h t 1 (3) h t 1 MVtanh (1) g t (2) g t g (3) c t (1) c t (2) c t (3) h t (1) h t (2) h t (3) x t (1) MVsigm x t (2) x t (3) MVsigm MVsigm [Shyam, Morency, et al. Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV, 2016] 67
68 Topologies for Multi-View LSTM Fullyconnected α=1, β=1 MV- LSTM (1) (1) h t 1 (2) h t 1 (3) h t 1 Multi-view topologies (1) g t (2) g t g (3) x t (1) (1) h t 1 (2) h t 1 (3) h t 1 View-specific α=1, β=0 (1) g t x t (1) (1) h t 1 (2) h t 1 (3) h t 1 MVtanh (1) g t x t (1) x t (2) x t (3) α: Memory from current view β: Memory from other views Design parameters x t (1) (1) h t 1 (2) h t 1 (3) h t 1 Coupled α=0, β=1 (1) g t x t (1) (1) h t 1 (2) h t 1 (3) h t 1 Hybrid α=2/3, β=1/3 (1) g t [Shyam, Morency, et al. Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV, 2016] 68
69 Multi-View Long Short-Term Memory (MV-LSTM) Multimodal prediction of children engagement [Shyam, Morency, et al. Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV, 2016] 69
70 Memory Based A memory accumulates multimodal information over time. From the representations throughout a source network. No need to modify the structure of the source network, only attached the memory. 70
71 Memory Based [Zadeh et al., Memory Fusion Network for Multi-view Sequential Learning, AAAI 2018] 71
72 Multimodal Machine Learning Tadas Baltrusaitis, Chaitanya Ahuja, and Louis-Philippe Morency 72
Tutorial on Multimodal Machine Learning
Tutorial on Multimodal Machine Learning Louis-Philippe (LP) Morency Tadas Baltrusaitis CMU Multimodal Communication and Machine Learning Laboratory [MultiComp Lab] 1 Your Instructors Louis-Philippe Morency
More informationEfficientLow-rank Multimodal Fusion With Modality-specific Factors
EfficientLow-rank Multimodal Fusion With Modality-specific Factors Zhun Liu, Ying Shen, Varun Bharadwaj, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency Artificial Intelligence Multimodal Sentiment and
More informationMultimodal context analysis and prediction
Multimodal context analysis and prediction Valeria Tomaselli (valeria.tomaselli@st.com) Sebastiano Battiato Giovanni Maria Farinella Tiziana Rotondo (PhD student) Outline 2 Context analysis vs prediction
More informationCSC321 Lecture 16: ResNets and Attention
CSC321 Lecture 16: ResNets and Attention Roger Grosse Roger Grosse CSC321 Lecture 16: ResNets and Attention 1 / 24 Overview Two topics for today: Topic 1: Deep Residual Networks (ResNets) This is the state-of-the
More informationDeep Learning Autoencoder Models
Deep Learning Autoencoder Models Davide Bacciu Dipartimento di Informatica Università di Pisa Intelligent Systems for Pattern Recognition (ISPR) Generative Models Wrap-up Deep Learning Module Lecture Generative
More informationDeep Learning Sequence to Sequence models: Attention Models. 17 March 2018
Deep Learning Sequence to Sequence models: Attention Models 17 March 2018 1 Sequence-to-sequence modelling Problem: E.g. A sequence X 1 X N goes in A different sequence Y 1 Y M comes out Speech recognition:
More informationMachine Learning for Signal Processing Neural Networks Continue. Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016
Machine Learning for Signal Processing Neural Networks Continue Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016 1 So what are neural networks?? Voice signal N.Net Transcription Image N.Net Text
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationDeep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści
Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?
More informationUnsupervised Learning
CS 3750 Advanced Machine Learning hkc6@pitt.edu Unsupervised Learning Data: Just data, no labels Goal: Learn some underlying hidden structure of the data P(, ) P( ) Principle Component Analysis (Dimensionality
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Spring 2018 http://vllab.ee.ntu.edu.tw/dlcv.html (primary) https://ceiba.ntu.edu.tw/1062dlcv (grade, etc.) FB: DLCV Spring 2018 Yu-Chiang Frank Wang 王鈺強, Associate Professor
More informationRecurrent Neural Networks (Part - 2) Sumit Chopra Facebook
Recurrent Neural Networks (Part - 2) Sumit Chopra Facebook Recap Standard RNNs Training: Backpropagation Through Time (BPTT) Application to sequence modeling Language modeling Applications: Automatic speech
More informationTUTORIAL PART 1 Unsupervised Learning
TUTORIAL PART 1 Unsupervised Learning Marc'Aurelio Ranzato Department of Computer Science Univ. of Toronto ranzato@cs.toronto.edu Co-organizers: Honglak Lee, Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew
More informationSequence Modeling with Neural Networks
Sequence Modeling with Neural Networks Harini Suresh y 0 y 1 y 2 s 0 s 1 s 2... x 0 x 1 x 2 hat is a sequence? This morning I took the dog for a walk. sentence medical signals speech waveform Successes
More informationLecture 14: Deep Generative Learning
Generative Modeling CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 14: Deep Generative Learning Density estimation Reconstructing probability density function using samples Bohyung Han
More informationArtificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen
Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen 2.04.2018 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition
More informationa) b) (Natural Language Processing; NLP) (Deep Learning) Bag of words White House RGB [1] IBM
c 1. (Natural Language Processing; NLP) (Deep Learning) RGB IBM 135 8511 5 6 52 yutat@jp.ibm.com a) b) 2. 1 0 2 1 Bag of words White House 2 [1] 2015 4 Copyright c by ORSJ. Unauthorized reproduction of
More informationReview: Learning Bimodal Structures in Audio-Visual Data
Review: Learning Bimodal Structures in Audio-Visual Data CSE 704 : Readings in Joint Visual, Lingual and Physical Models and Inference Algorithms Suren Kumar Vision and Perceptual Machines Lab 106 Davis
More informationMultimodal human behavior analysis: Learning correlation and interaction across modalities
Multimodal human behavior analysis: Learning correlation and interaction across modalities The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story
More informationMaking Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation
Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation Dr. Yanjun Qi Department of Computer Science University of Virginia Tutorial @ ACM BCB-2018 8/29/18 Yanjun Qi / UVA
More informationAffect recognition from facial movements and body gestures by hierarchical deep spatio-temporal features and fusion strategy
Accepted Manuscript Affect recognition from facial movements and body gestures by hierarchical deep spatio-temporal features and fusion strategy Bo Sun, Siming Cao, Jun He, Lejun Yu PII: S0893-6080(17)30284-8
More informationLearning Deep Architectures for AI. Part II - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model
More informationarxiv: v2 [cs.cl] 1 Jan 2019
Variational Self-attention Model for Sentence Representation arxiv:1812.11559v2 [cs.cl] 1 Jan 2019 Qiang Zhang 1, Shangsong Liang 2, Emine Yilmaz 1 1 University College London, London, United Kingdom 2
More informationPresented By: Omer Shmueli and Sivan Niv
Deep Speaker: an End-to-End Neural Speaker Embedding System Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan, Zhenyao Zhu Presented By: Omer Shmueli and Sivan
More informationSegmental Recurrent Neural Networks for End-to-end Speech Recognition
Segmental Recurrent Neural Networks for End-to-end Speech Recognition Liang Lu, Lingpeng Kong, Chris Dyer, Noah Smith and Steve Renals TTI-Chicago, UoE, CMU and UW 9 September 2016 Background A new wave
More informationUNSUPERVISED LEARNING
UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training
More informationCSC321 Lecture 20: Autoencoders
CSC321 Lecture 20: Autoencoders Roger Grosse Roger Grosse CSC321 Lecture 20: Autoencoders 1 / 16 Overview Latent variable models so far: mixture models Boltzmann machines Both of these involve discrete
More informationDeep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang
Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i
More informationModelling Time Series with Neural Networks. Volker Tresp Summer 2017
Modelling Time Series with Neural Networks Volker Tresp Summer 2017 1 Modelling of Time Series The next figure shows a time series (DAX) Other interesting time-series: energy prize, energy consumption,
More informationMultimodal Deep Learning for Predicting Survival from Breast Cancer
Multimodal Deep Learning for Predicting Survival from Breast Cancer Heather Couture Deep Learning Journal Club Nov. 16, 2016 Outline Background on tumor histology & genetic data Background on survival
More informationLecture 5 Neural models for NLP
CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm
More informationCorrelation Autoencoder Hashing for Supervised Cross-Modal Search
Correlation Autoencoder Hashing for Supervised Cross-Modal Search Yue Cao, Mingsheng Long, Jianmin Wang, and Han Zhu School of Software Tsinghua University The Annual ACM International Conference on Multimedia
More informationA Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement
A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,
More informationIntroduction to Deep Neural Networks
Introduction to Deep Neural Networks Presenter: Chunyuan Li Pattern Classification and Recognition (ECE 681.01) Duke University April, 2016 Outline 1 Background and Preliminaries Why DNNs? Model: Logistic
More informationc Copyright 2016 Galen Andrew
c Copyright 2016 Galen Andrew New Techniques in Deep Representation Learning Galen Andrew A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University
More informationCSC321 Lecture 15: Recurrent Neural Networks
CSC321 Lecture 15: Recurrent Neural Networks Roger Grosse Roger Grosse CSC321 Lecture 15: Recurrent Neural Networks 1 / 26 Overview Sometimes we re interested in predicting sequences Speech-to-text and
More informationTutorial on Methods for Interpreting and Understanding Deep Neural Networks. Part 3: Applications & Discussion
Tutorial on Methods for Interpreting and Understanding Deep Neural Networks W. Samek, G. Montavon, K.-R. Müller Part 3: Applications & Discussion ICASSP 2017 Tutorial W. Samek, G. Montavon & K.-R. Müller
More informationRecurrent Neural Networks (RNN) and Long-Short-Term-Memory (LSTM) Yuan YAO HKUST
1 Recurrent Neural Networks (RNN) and Long-Short-Term-Memory (LSTM) Yuan YAO HKUST Summary We have shown: Now First order optimization methods: GD (BP), SGD, Nesterov, Adagrad, ADAM, RMSPROP, etc. Second
More informationRecurrent Neural Network
Recurrent Neural Network Xiaogang Wang xgwang@ee..edu.hk March 2, 2017 Xiaogang Wang (linux) Recurrent Neural Network March 2, 2017 1 / 48 Outline 1 Recurrent neural networks Recurrent neural networks
More informationattention mechanisms and generative models
attention mechanisms and generative models Master's Deep Learning Sergey Nikolenko Harbour Space University, Barcelona, Spain November 20, 2017 attention in neural networks attention You re paying attention
More informationNatural Language Processing
Natural Language Processing Pushpak Bhattacharyya CSE Dept, IIT Patna and Bombay LSTM 15 jun, 2017 lgsoft:nlp:lstm:pushpak 1 Recap 15 jun, 2017 lgsoft:nlp:lstm:pushpak 2 Feedforward Network and Backpropagation
More informationDeep Learning Architectures and Algorithms
Deep Learning Architectures and Algorithms In-Jung Kim 2016. 12. 2. Agenda Introduction to Deep Learning RBM and Auto-Encoders Convolutional Neural Networks Recurrent Neural Networks Reinforcement Learning
More informationLong-Short Term Memory and Other Gated RNNs
Long-Short Term Memory and Other Gated RNNs Sargur Srihari srihari@buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Sequence Modeling
More informationNormalization Techniques in Training of Deep Neural Networks
Normalization Techniques in Training of Deep Neural Networks Lei Huang ( 黄雷 ) State Key Laboratory of Software Development Environment, Beihang University Mail:huanglei@nlsde.buaa.edu.cn August 17 th,
More informationRandom Coattention Forest for Question Answering
Random Coattention Forest for Question Answering Jheng-Hao Chen Stanford University jhenghao@stanford.edu Ting-Po Lee Stanford University tingpo@stanford.edu Yi-Chun Chen Stanford University yichunc@stanford.edu
More informationLecture 15: Recurrent Neural Nets
Lecture 15: Recurrent Neural Nets Roger Grosse 1 Introduction Most of the prediction tasks we ve looked at have involved pretty simple kinds of outputs, such as real values or discrete categories. But
More informationStephen Scott.
1 / 35 (Adapted from Vinod Variyam and Ian Goodfellow) sscott@cse.unl.edu 2 / 35 All our architectures so far work on fixed-sized inputs neural networks work on sequences of inputs E.g., text, biological
More informationIntroduction to RNNs!
Introduction to RNNs Arun Mallya Best viewed with Computer Modern fonts installed Outline Why Recurrent Neural Networks (RNNs)? The Vanilla RNN unit The RNN forward pass Backpropagation refresher The RNN
More informationCSC321 Lecture 7 Neural language models
CSC321 Lecture 7 Neural language models Roger Grosse and Nitish Srivastava February 1, 2015 Roger Grosse and Nitish Srivastava CSC321 Lecture 7 Neural language models February 1, 2015 1 / 19 Overview We
More informationRecurrent Neural Networks. deeplearning.ai. Why sequence models?
Recurrent Neural Networks deeplearning.ai Why sequence models? Examples of sequence data The quick brown fox jumped over the lazy dog. Speech recognition Music generation Sentiment classification There
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationBased on the original slides of Hung-yi Lee
Based on the original slides of Hung-yi Lee New Activation Function Rectified Linear Unit (ReLU) σ z a a = z Reason: 1. Fast to compute 2. Biological reason a = 0 [Xavier Glorot, AISTATS 11] [Andrew L.
More informationLecture 11 Recurrent Neural Networks I
Lecture 11 Recurrent Neural Networks I CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor niversity of Chicago May 01, 2017 Introduction Sequence Learning with Neural Networks Some Sequence Tasks
More informationEE-559 Deep learning Recurrent Neural Networks
EE-559 Deep learning 11.1. Recurrent Neural Networks François Fleuret https://fleuret.org/ee559/ Sun Feb 24 20:33:31 UTC 2019 Inference from sequences François Fleuret EE-559 Deep learning / 11.1. Recurrent
More informationLecture 11 Recurrent Neural Networks I
Lecture 11 Recurrent Neural Networks I CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 01, 2017 Introduction Sequence Learning with Neural Networks Some Sequence Tasks
More informationNeural Architectures for Image, Language, and Speech Processing
Neural Architectures for Image, Language, and Speech Processing Karl Stratos June 26, 2018 1 / 31 Overview Feedforward Networks Need for Specialized Architectures Convolutional Neural Networks (CNNs) Recurrent
More informationTask-Oriented Dialogue System (Young, 2000)
2 Review Task-Oriented Dialogue System (Young, 2000) 3 http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Speech Signal Speech Recognition Hypothesis are there any action movies to see
More informationDeep Learning Recurrent Networks 2/28/2018
Deep Learning Recurrent Networks /8/8 Recap: Recurrent networks can be incredibly effective Story so far Y(t+) Stock vector X(t) X(t+) X(t+) X(t+) X(t+) X(t+5) X(t+) X(t+7) Iterated structures are good
More informationCourse Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch
Psychology 452 Week 12: Deep Learning What Is Deep Learning? Preliminary Ideas (that we already know!) The Restricted Boltzmann Machine (RBM) Many Layers of RBMs Pros and Cons of Deep Learning Course Structure
More informationSpeaker Representation and Verification Part II. by Vasileios Vasilakakis
Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation
More informationFeature Design. Feature Design. Feature Design. & Deep Learning
Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately
More informationDynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji
Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient
More informationA Deep Learning Model for Structured Outputs with High-order Interaction
A Deep Learning Model for Structured Outputs with High-order Interaction Hongyu Guo, Xiaodan Zhu, Martin Renqiang Min National Research Council of Canada, Ottawa, ON {hongyu.guo, xiaodan.zhu}@nrc-cnrc.gc.ca
More informationMachine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group
Machine Learning for Computer Vision 8. Neural Networks and Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group INTRODUCTION Nonlinear Coordinate Transformation http://cs.stanford.edu/people/karpathy/convnetjs/
More informationApplied Natural Language Processing
Applied Natural Language Processing Info 256 Lecture 20: Sequence labeling (April 9, 2019) David Bamman, UC Berkeley POS tagging NNP Labeling the tag that s correct for the context. IN JJ FW SYM IN JJ
More informationDeep Learning Srihari. Deep Belief Nets. Sargur N. Srihari
Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous
More informationLearning Deep Architectures
Learning Deep Architectures Yoshua Bengio, U. Montreal Microsoft Cambridge, U.K. July 7th, 2009, Montreal Thanks to: Aaron Courville, Pascal Vincent, Dumitru Erhan, Olivier Delalleau, Olivier Breuleux,
More informationMemory-Augmented Attention Model for Scene Text Recognition
Memory-Augmented Attention Model for Scene Text Recognition Cong Wang 1,2, Fei Yin 1,2, Cheng-Lin Liu 1,2,3 1 National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences
More informationWaveNet: A Generative Model for Raw Audio
WaveNet: A Generative Model for Raw Audio Ido Guy & Daniel Brodeski Deep Learning Seminar 2017 TAU Outline Introduction WaveNet Experiments Introduction WaveNet is a deep generative model of raw audio
More informationDimensionality Reduction and Principle Components Analysis
Dimensionality Reduction and Principle Components Analysis 1 Outline What is dimensionality reduction? Principle Components Analysis (PCA) Example (Bishop, ch 12) PCA vs linear regression PCA as a mixture
More informationDeep Learning for Speech Recognition. Hung-yi Lee
Deep Learning for Speech Recognition Hung-yi Lee Outline Conventional Speech Recognition How to use Deep Learning in acoustic modeling? Why Deep Learning? Speaker Adaptation Multi-task Deep Learning New
More informationSlide credit from Hung-Yi Lee & Richard Socher
Slide credit from Hung-Yi Lee & Richard Socher 1 Review Recurrent Neural Network 2 Recurrent Neural Network Idea: condition the neural network on all previous words and tie the weights at each time step
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationDeep unsupervised learning
Deep unsupervised learning Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea Unsupervised learning In machine learning, there are 3 kinds of learning paradigm.
More informationNeural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35
Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How
More informationMulti-Signal Gesture Recognition Using Temporal Smoothing Hidden Conditional Random Fields
Multi-Signal Gesture Recognition Using Temporal Smoothing Hidden Conditional Random Fields Yale Song, David Demirdjian, and Randall Davis MIT Computer Science and Artificial Intelligence Laboratory 32
More informationREVISIT ENCODER & DECODER
PERCEPTION-LINK BEHAVIOR MODEL: REVISIT ENCODER & DECODER IMI PHD Presentation Presenter: William Gu Yuanlong (PhD student) Supervisor: Assoc. Prof. Gerald Seet Gim Lee Co-Supervisor: Prof. Nadia Magnenat-Thalmann
More informationarxiv: v1 [cs.cl] 21 May 2017
Spelling Correction as a Foreign Language Yingbo Zhou yingbzhou@ebay.com Utkarsh Porwal uporwal@ebay.com Roberto Konow rkonow@ebay.com arxiv:1705.07371v1 [cs.cl] 21 May 2017 Abstract In this paper, we
More informationDeep Learning for NLP
Deep Learning for NLP CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) Machine Learning and NLP NER WordNet Usually machine learning
More informationDeep Generative Models. (Unsupervised Learning)
Deep Generative Models (Unsupervised Learning) CEng 783 Deep Learning Fall 2017 Emre Akbaş Reminders Next week: project progress demos in class Describe your problem/goal What you have done so far What
More informationConvolutional Neural Networks II. Slides from Dr. Vlad Morariu
Convolutional Neural Networks II Slides from Dr. Vlad Morariu 1 Optimization Example of optimization progress while training a neural network. (Loss over mini-batches goes down over time.) 2 Learning rate
More informationHow to do backpropagation in a brain
How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep
More informationMachine Learning for Physicists Lecture 1
Machine Learning for Physicists Lecture 1 Summer 2017 University of Erlangen-Nuremberg Florian Marquardt (Image generated by a net with 20 hidden layers) OUTPUT INPUT (Picture: Wikimedia Commons) OUTPUT
More informationarxiv: v1 [cs.lg] 3 Feb 2018
Memory Fusion Network for Multi-view Sequential Learning Amir Zadeh Carnegie Mellon University, USA abagherz@cs.cmu.edu Paul Pu Liang Carnegie Mellon University, USA pliang@cs.cmu.edu Navonil Mazumder
More information11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1)
11/3/15 Machine Learning and NLP Deep Learning for NLP Usually machine learning works well because of human-designed representations and input features CS224N WordNet SRL Parser Machine learning becomes
More informationNatural Language Processing and Recurrent Neural Networks
Natural Language Processing and Recurrent Neural Networks Pranay Tarafdar October 19 th, 2018 Outline Introduction to NLP Word2vec RNN GRU LSTM Demo What is NLP? Natural Language? : Huge amount of information
More informationINF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018
Machine learning for image classification Lecture 14: Reinforcement learning May 9, 2018 Page 3 Outline Motivation Introduction to reinforcement learning (RL) Value function based methods (Q-learning)
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationHigh Order LSTM/GRU. Wenjie Luo. January 19, 2016
High Order LSTM/GRU Wenjie Luo January 19, 2016 1 Introduction RNN is a powerful model for sequence data but suffers from gradient vanishing and explosion, thus difficult to be trained to capture long
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationAdministration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6
Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects
More informationGlobal SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks
Interspeech 2018 2-6 September 2018, Hyderabad Global SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks Rohith Aralikatti, Dilip Kumar Margam, Tanay Sharma,
More informationTwo-Stream Bidirectional Long Short-Term Memory for Mitosis Event Detection and Stage Localization in Phase-Contrast Microscopy Images
Two-Stream Bidirectional Long Short-Term Memory for Mitosis Event Detection and Stage Localization in Phase-Contrast Microscopy Images Yunxiang Mao and Zhaozheng Yin (B) Computer Science, Missouri University
More informationIndependent Component Analysis and Unsupervised Learning
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent
More informationUnsupervised Learning of Hierarchical Models. in collaboration with Josh Susskind and Vlad Mnih
Unsupervised Learning of Hierarchical Models Marc'Aurelio Ranzato Geoff Hinton in collaboration with Josh Susskind and Vlad Mnih Advanced Machine Learning, 9 March 2011 Example: facial expression recognition
More informationDeep Reinforcement Learning for Unsupervised Video Summarization with Diversity- Representativeness Reward
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity- Representativeness Reward Kaiyang Zhou, Yu Qiao, Tao Xiang AAAI 2018 What is video summarization? Goal: to automatically
More informationIndex. Santanu Pattanayak 2017 S. Pattanayak, Pro Deep Learning with TensorFlow,
Index A Activation functions, neuron/perceptron binary threshold activation function, 102 103 linear activation function, 102 rectified linear unit, 106 sigmoid activation function, 103 104 SoftMax activation
More informationRecurrent Neural Networks. Jian Tang
Recurrent Neural Networks Jian Tang tangjianpku@gmail.com 1 RNN: Recurrent neural networks Neural networks for sequence modeling Summarize a sequence with fix-sized vector through recursively updating
More informationRecurrent Neural Networks with Flexible Gates using Kernel Activation Functions
2018 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 18) Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions Authors: S. Scardapane, S. Van Vaerenbergh,
More information