Efficient Low-rank Multimodal Fusion With Modality-specific Factors



1 Efficient Low-rank Multimodal Fusion With Modality-specific Factors
Zhun Liu, Ying Shen, Varun Bharadwaj, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency

2 Artificial Intelligence

3 Multimodal Sentiment and Emotion Analysis
Speaker's behaviors are fused into a multimodal representation (multimodal fusion), which is used to predict sentiment intensity.
Example cues: "This movie is sick", smile, loud.
Interactions: unimodal, bimodal, trimodal.
Challenges: 1. Intra-modal interactions 2. Cross-modal interactions 3. Computational efficiency


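4 Multimodal Fusion using Tensor Representation
Inputs: visual and language ("This movie is sick"). Output: multimodal representation h.
The tensor contains unimodal and bimodal terms.
Criteria: intra-modal interactions, cross-modal interactions, computational efficiency.
$Z = \begin{bmatrix} z_v \\ 1 \end{bmatrix} \otimes \begin{bmatrix} z_l \\ 1 \end{bmatrix} = \begin{bmatrix} z_v \otimes z_l & z_v \\ z_l & 1 \end{bmatrix}$
Source: Tensor Fusion Network for Multimodal Sentiment Analysis, Zadeh, A., et al. (2017)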
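The tensor representation on this slide can be sketched in a few lines of NumPy. The names `tensor_fusion`, `z_v` and `z_l` and the toy values are assumptions made for illustration, not taken from the slides; the point is only to show how appending a constant 1 to each vector keeps the unimodal terms next to the bimodal block.

```python
import numpy as np

def tensor_fusion(z_v, z_l):
    """Outer product of two unimodal vectors, each extended with a constant 1."""
    z_v1 = np.append(z_v, 1.0)   # [z_v; 1]
    z_l1 = np.append(z_l, 1.0)   # [z_l; 1]
    return np.outer(z_v1, z_l1)  # shape (d_v + 1, d_l + 1)

# Toy embeddings with made-up values, only to show the block structure.
z_v = np.array([0.5, -1.0])        # hypothetical visual embedding
z_l = np.array([2.0, 0.0, 1.0])    # hypothetical language embedding
Z = tensor_fusion(z_v, z_l)

bimodal = Z[:-1, :-1]        # z_v outer z_l: cross-modal interactions
visual_only = Z[:-1, -1]     # z_v times the appended 1: unimodal visual terms
language_only = Z[-1, :-1]   # appended 1 times z_l: unimodal language terms

print(Z.shape)  # (3, 4)
```

The last row and column of `Z` carry the original vectors unchanged, which is why a single tensor can hold both intra-modal and cross-modal information.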

5 Computational Complexity
Tensor product: $O\left(\prod_{m=1}^{M} d_m\right)$
M=2: $O(d_1 d_2)$
M=3: $O(d_1 d_2 d_3)$
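A quick arithmetic check makes the growth concrete. The dimensions below are hypothetical values chosen for illustration, and `tensor_elements` is a helper name introduced here; the sketch simply multiplies the per-modality sizes as in the complexity expression above.

```python
from math import prod

def tensor_elements(dims):
    """Number of entries in the outer product of vectors with the given sizes."""
    return prod(dims)

dims = [32, 64, 128]  # hypothetical d_1, d_2, d_3

two_modalities = tensor_elements(dims[:2])   # d_1 * d_2
three_modalities = tensor_elements(dims)     # d_1 * d_2 * d_3
concatenation = sum(dims)                    # plain concatenation, for contrast

print(two_modalities, three_modalities, concatenation)  # 2048 262144 224
```

Adding the third modality multiplies the tensor size by 128 here, whereas concatenation would only add 128 entries.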

6 CORE CONTRIBUTIONS: Low-rank Multimodal Fusion (LMF)


7 From Tensor Representation to Low-rank Fusion
From Tensor Fusion Networks to Low-rank Multimodal Fusion (visual and language inputs) in three steps:
1. Decomposition of weight W.
2. Decomposition of input tensor Z.
3. Rearrange the computation of h.

8 Canonical Polyadic (CP) Decomposition of tensors
Rank of tensor W: the minimum number of vector tuples needed for exact reconstruction.
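As a rough illustration of what "reconstruction from vector tuples" means, the sketch below builds a 3D tensor as a sum of outer products. The function name `cp_reconstruct`, the sizes and the random values are assumptions for this example only; it shows the reconstruction direction of CP, not an algorithm for finding the factors of a given tensor.

```python
import numpy as np

def cp_reconstruct(factors):
    """Sum of r outer products; factors is a list of arrays of shape (r, d_m)."""
    r = factors[0].shape[0]
    tensor = np.zeros([f.shape[1] for f in factors])
    for i in range(r):
        term = factors[0][i]
        for f in factors[1:]:
            term = np.multiply.outer(term, f[i])  # extend the i-th tuple by one mode
        tensor += term
    return tensor

rng = np.random.default_rng(0)
r = 2  # two vector tuples, so the tensor has CP rank at most 2
a = rng.standard_normal((r, 3))
b = rng.standard_normal((r, 4))
c = rng.standard_normal((r, 5))

W = cp_reconstruct([a, b, c])
print(W.shape)  # (3, 4, 5)
```

The tensor has 60 entries but is fully described by 2 * (3 + 4 + 5) = 24 numbers, which is the saving that a low-rank weight tensor exploits.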

9 Canonical Polyadic (CP) Decomposition of 3D tensors
[Figure: a 3D tensor written as a sum of terms]

10 Modality-specific Decomposition
Retain the dimension for the multimodal representation h during decomposition.

11 Step 1: Decomposition of weight tensor W
$Z = \begin{bmatrix} z_v \\ 1 \end{bmatrix} \otimes \begin{bmatrix} z_l \\ 1 \end{bmatrix}, \quad Z \cdot W = h$

12 Step 1: Decomposition of weight tensor W
$W = \sum_{i=1}^{r} w_v^{(i)} \otimes w_l^{(i)}, \quad Z \cdot \left( \sum_{i=1}^{r} w_v^{(i)} \otimes w_l^{(i)} \right) = h$
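A minimal sketch of this weight decomposition, assuming two modalities and hypothetical sizes: each factor keeps a trailing axis of size `d_h`, so the dimension of h is retained while the two input axes are factorized. The names `build_weight`, `w_v`, `w_l` and the dimensions are illustrative choices, not values from the slides.

```python
import numpy as np

def build_weight(w_v, w_l):
    """W[a, b, k] = sum_i w_v[i, a, k] * w_l[i, b, k]; the output axis k is shared."""
    return np.einsum("iak,ibk->abk", w_v, w_l)

d_v, d_l, d_h, r = 32, 64, 16, 4  # hypothetical sizes and rank
rng = np.random.default_rng(0)
w_v = rng.standard_normal((r, d_v + 1, d_h))  # visual factors (input extended by 1)
w_l = rng.standard_normal((r, d_l + 1, d_h))  # language factors (input extended by 1)

W = build_weight(w_v, w_l)

full_params = W.size                   # (d_v + 1) * (d_l + 1) * d_h
factor_params = w_v.size + w_l.size    # r * (d_v + 1 + d_l + 1) * d_h
print(W.shape, full_params, factor_params)  # (33, 65, 16) 34320 6272
```

For every output index `k`, the slice `W[:, :, k]` is a matrix of rank at most `r`, which is the sense in which the weight tensor is low-rank.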

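13 Step 2: Decomposition of Z
$\left( \begin{bmatrix} z_v \\ 1 \end{bmatrix} \otimes \begin{bmatrix} z_l \\ 1 \end{bmatrix} \right) \cdot \left( \sum_{i=1}^{r} w_v^{(i)} \otimes w_l^{(i)} \right) = h$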

14 Step 3: Rearranging computation
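The rearrangement can be checked numerically. The sketch below, under assumed toy sizes and random values, computes h twice: once by materializing the weight tensor and the input tensor, and once by projecting each modality onto its own factors and multiplying the projections elementwise before summing over the rank index. All variable names are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_v, d_l, d_h, r = 4, 5, 3, 2  # hypothetical sizes and rank

z_v1 = np.append(rng.standard_normal(d_v), 1.0)  # [z_v; 1]
z_l1 = np.append(rng.standard_normal(d_l), 1.0)  # [z_l; 1]
w_v = rng.standard_normal((r, d_v + 1, d_h))     # visual factors
w_l = rng.standard_normal((r, d_l + 1, d_h))     # language factors

# Full path: build W and Z explicitly, then contract them.
W = np.einsum("iak,ibk->abk", w_v, w_l)
Z = np.outer(z_v1, z_l1)
h_full = np.einsum("abk,ab->k", W, Z)

# Rearranged path: never build W or Z.
proj_v = np.einsum("iak,a->ik", w_v, z_v1)   # shape (r, d_h)
proj_l = np.einsum("ibk,b->ik", w_l, z_l1)   # shape (r, d_h)
h_low = (proj_v * proj_l).sum(axis=0)        # elementwise product, then sum over rank

print(np.allclose(h_full, h_low))  # True
```

The second path only touches arrays whose size grows with the sum of the modality dimensions, not their product.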

15 Low-rank Multimodal Fusion

16 Easily scales to more modalities
Intra-modal interactions, cross-modal interactions, computational complexity
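To illustrate the scaling claim, here is a hedged sketch of the same computation written for any number of modalities. The function name `low_rank_fusion`, the three toy modalities and their sizes are assumptions; bias terms, per-rank weights and training are left out.

```python
import numpy as np

def low_rank_fusion(inputs, factors):
    """inputs: list of vectors z_m; factors: list of arrays of shape (r, d_m + 1, d_h)."""
    fused = None
    for z, w in zip(inputs, factors):
        z1 = np.append(z, 1.0)                    # extend the input with a constant 1
        proj = np.einsum("iak,a->ik", w, z1)      # per-modality projection, shape (r, d_h)
        fused = proj if fused is None else fused * proj
    return fused.sum(axis=0)                      # sum over the rank index

rng = np.random.default_rng(0)
dims, d_h, r = [3, 4, 5], 2, 3  # three hypothetical modalities
inputs = [rng.standard_normal(d) for d in dims]
factors = [rng.standard_normal((r, d + 1, d_h)) for d in dims]
h = low_rank_fusion(inputs, factors)

# Reference: explicit trimodal tensors, feasible only because the sizes are tiny.
ext = [np.append(z, 1.0) for z in inputs]
W = np.einsum("iak,ibk,ick->abck", *factors)
Z = np.einsum("a,b,c->abc", *ext)
h_ref = np.einsum("abck,abc->k", W, Z)

print(np.allclose(h, h_ref))  # True
```

Adding a modality adds one more loop iteration and one more factor array, so the cost grows linearly with the number of modalities instead of multiplying.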

17 EXPERIMENTS AND RESULTS

18 Datasets
CMU-MOSI (Sentiment Analysis): 2199 video segments from 93 movie reviews; single-speaker; segment-level annotations; real-valued sentiment.
POM (Speaker Trait Recognition): 1000 full video clips of movie reviews; single-speaker; video-level annotations; 16 types of speaker traits; categorical annotations.
IEMOCAP (Emotion Recognition): video segments from 302 videos; dyadic interaction; segment-level annotations; 10 classes of emotions; categorical annotations.

19 Compare to full-rank tensor fusion (CMU-MOSI)
Models: LMF, Low-rank Multimodal Fusion (our model); TFN, Tensor Fusion Networks (Zadeh et al., 2017)
Metrics: MAE, Correlation, Acc-2, F1, Acc-7
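For readers unfamiliar with the regression metrics named on this slide, the sketch below shows one common way to compute three of them. The labels and predictions are made-up toy values, not results from the talk, and the Acc-2 convention (a score of 0 or more counts as positive) is an assumption, since thresholding conventions vary between papers. MAE is taken here as mean absolute error.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between real-valued sentiment scores."""
    return float(np.mean(np.abs(y_true - y_pred)))

def pearson_corr(y_true, y_pred):
    """Pearson correlation between labels and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def acc2(y_true, y_pred):
    """Binary accuracy; assumed convention: score >= 0 is the positive class."""
    return float(np.mean((y_true >= 0) == (y_pred >= 0)))

# Toy values for illustration only.
y_true = np.array([-2.0, -0.5, 1.0, 2.5])
y_pred = np.array([-1.5, 0.5, 0.5, 2.0])

print(mae(y_true, y_pred))   # 0.625
print(acc2(y_true, y_pred))  # 0.75
print(pearson_corr(y_true, y_pred))
```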

20 Compare to full-rank tensor fusion
POM: MAE, Correlation
CMU-MOSI: MAE, Correlation
IEMOCAP: F1-Happy, F1-Sad

21 Compare with State-of-the-Art Approaches (CMU-MOSI)
Metric: Mean Absolute Error (MAE)
LMF: Low-rank Multimodal Fusion (our model)
MFN: Memory Fusion Networks (Zadeh et al., 2018)
MARN: Multi-attention Recurrent Networks (Zadeh et al., 2018)
TFN: Tensor Fusion Networks (Zadeh et al., 2017)
MV-LSTM: Multi-view LSTM (Rajagopalan et al., 2016)
Deep Fusion: Deep Fusion (Nojavanasghari et al., 2016)

22 Compare with Top 2 State-of-the-Art Approaches
Datasets: POM, CMU-MOSI, IEMOCAP
Models: LMF, MFN, MARN, TFN, MV-LSTM
Metrics: MAE, Correlation, F1-Angry, F1-Sad

23 Efficiency Improvement (CMU-MOSI)
Models: LMF (ours), TFN (Zadeh et al., 2017)
Efficiency metric: number of data samples processed per second
Training efficiency (samples/s) and testing efficiency (samples/s)
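The samples-per-second metric can be measured with a simple timing loop. The sketch below is a generic harness, not the authors' benchmarking code: `samples_per_second`, `dummy_step` and the batch sizes are hypothetical stand-ins for a real training or inference step.

```python
import time

def samples_per_second(step_fn, batches):
    """Run step_fn on every batch and return processed samples per second."""
    processed = 0
    start = time.perf_counter()
    for batch in batches:
        step_fn(batch)
        processed += len(batch)
    elapsed = time.perf_counter() - start
    return processed / max(elapsed, 1e-9)  # guard against a zero elapsed time

def dummy_step(batch):
    """Placeholder workload standing in for a model's forward or training step."""
    return sum(x * x for x in batch)

batches = [list(range(256)) for _ in range(20)]  # 20 hypothetical batches of 256 samples
throughput = samples_per_second(dummy_step, batches)
print(f"{throughput:.0f} samples/s")
```

In practice the same harness would be run separately over training and test batches to obtain the two numbers the slide reports.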

24 Conclusions
Intra-modal interactions, cross-modal interactions, computational complexity, state-of-the-art results

25 Thank you! Code:

26 Supplementary results Impact of rank settings
