Multi-theme Sentiment Analysis using Quantified Contextual Valence Shifters

Transcription:

Multi-theme Sentiment Analysis using Quantified Contextual Valence Shifters. Hongkun Yu, Jingbo Shang, Meichun Hsu, Malú Castellanos, Jiawei Han. Presented by Jingbo Shang, University of Illinois at Urbana-Champaign. Oct 26, 2016, CIKM 2016.

Outline
- Observations and Definitions
- Methodology: MTSA
- Performance Study and Experimental Results
- Conclusions and Future Work

Observation I - Multi-Theme
- Review examples
- Observation: a sentiment word may express different polarity in different themes.

What is a theme?
- Review examples
- Theme is a very general concept; it could be:
  - different aspects of products, e.g., service and environment for restaurants;
  - different categories of review targets, e.g., horror movies and romantic movies.

Theme - Formal Definition
- The themes in each review r are represented by a vector θ_r, where θ_{r,i} is the weight of theme i in review r.
- We assume such theme descriptors are given.
- (Slide figure: a documents-by-aspects weight matrix with aspect columns Battery, Queue, Screen, and Camera and entries such as 0.3, 0.1, 0.6, 0.7.)

Observation II - Shifter
- Review examples
- Observation: the presence of contextual valence shifters may change a word's polarity.

What is a shifter?
- Review examples
- Three types:
  - Negation: not
  - Intensifier: very
  - Diminisher: slightly

Shifter - Formal Definition
- Assumption: shifters are theme-invariant.
- The sentiment shifting effect of a shifter is quantified as a scalar f ∈ ℝ; S_w represents the sentiment polarity score of the word w.
- Assumption (product rule): s_{shifter,w} = f_shifter · S_w
- Examples:
  - not happy = f_not · S_happy
  - very happy = f_very · S_happy
  - possibly happy = f_possibly · S_happy
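The product rule above can be sketched in a few lines. The shifter effect values come from the Yelp examples reported later in the talk; the word polarity scores S_w are illustrative assumptions.

```python
# Minimal sketch of the product rule s_{shifter,w} = f_shifter * S_w.
# Shifter effects are the Yelp values from the slides; the polarity
# scores S_w below are assumed for illustration.

shifter_effect = {"not": -0.52, "very": 1.96, "slightly": 0.18}  # f_shifter
word_polarity = {"happy": 1.0, "disappointed": -0.8}             # S_w

def shifted_sentiment(shifter, word):
    """Apply a contextual valence shifter to a sentiment word."""
    return shifter_effect[shifter] * word_polarity[word]

print(shifted_sentiment("not", "happy"))       # negation flips polarity
print(shifted_sentiment("very", "happy"))      # intensifier amplifies it
print(shifted_sentiment("slightly", "happy"))  # diminisher dampens it
```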

Outline
- Observations and Definitions
- Methodology: MTSA
- Performance Study and Experimental Results
- Conclusions and Future Work

Methodology - What is MTSA?
- A data-driven approach
  - Given a review corpus D, the sentiment labels (polarity or score), and the theme descriptors θ.
- A unified word-level sentiment analysis model
  - Multi-theme: theme embeddings and word embeddings capture the different sentiment polarities of the same word in different themes.
  - Shifter: automatically discover the sentiment-changing patterns and quantify their effects.

Methodology - Multi-theme
- [Observation] A sentiment word may express different polarity in different themes.
- The sentiment polarity of word j in theme i: s_{ij} = p_i^T q_j
  - p_i -- theme i's embedding vector
  - q_j -- word j's embedding vector
- A document d is a bag of words
  - W_{dj} is the occurrence count of word j in document d
  - Normalizations such as TF-IDF may be applied
- Document sentiment score: s_d = Σ_i Σ_j θ_{di} W_{dj} p_i^T q_j
- Solved with feature-based matrix factorization [2]
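The scoring above can be sketched with toy, randomly initialized embeddings. The dimensions and all values are arbitrary assumptions; in MTSA, P and Q are learned via feature-based matrix factorization, which is not shown here.

```python
import numpy as np

# Toy dimensions: 3 themes, 5 vocabulary words, embedding size 4.
rng = np.random.default_rng(0)
P = rng.normal(size=(3, 4))   # p_i: theme embeddings (one row per theme)
Q = rng.normal(size=(5, 4))   # q_j: word embeddings (one row per word)

S = P @ Q.T                   # s_ij = p_i^T q_j: polarity of word j in theme i

theta_d = np.array([0.7, 0.3, 0.0])       # theme weights of document d
W_d = np.array([2, 0, 1, 0, 3], float)    # word counts of document d

# s_d = sum_i sum_j theta_di * W_dj * (p_i^T q_j)
s_d = theta_d @ S @ W_d
```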

Methodology - Shifter
- [Observation] The presence of contextual valence shifters may change a word's polarity.
- Theme-invariant sentiment words
  - Words whose polarities s_{ij} are consistent across almost all themes.
- Learn f based on theme-invariant sentiment words
  - A logistic regression problem
- Steps: find the contexts of shifters; mask the sentiments of common sentiment words; infer the effects of shifters.

Methodology - Shifter (Example)
- Review: "very disappointed in the customer service" → s([very, disappointed, service, ...])
- Review: "I do not love the flavor" → s([do, not, love, ...])
- Masked by shifters:
  - "very disappointed in the customer service" → s([very, service, ...]), contributing f_very · s_disappointed
  - "I do not love the flavor" → s([do, not, ...]), contributing f_not · s_love
- Learn the shifters' effect values: very → intensifier, not → negation
- Theme-invariant sentiment words: disappointed (-) and love (+);
- Find the contexts of shifters (sliding window);
- Infer the effects of shifters (a logistic regression problem).
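The masking step can be sketched as below. The shifter lexicon, sentiment word list, and window size are illustrative assumptions, not the ones used in the paper; the follow-up logistic regression on the masked features is omitted.

```python
# Sketch: find shifter contexts with a sliding window and mask the
# theme-invariant sentiment words they modify.

SHIFTERS = {"not", "very", "slightly"}          # assumed shifter lexicon
SENTIMENT_WORDS = {"love", "disappointed", "happy"}  # assumed lexicon
WINDOW = 2  # a shifter modifies sentiment words within this distance

def mask_shifted_words(tokens):
    """Return (masked tokens, list of (shifter, masked word) pairs)."""
    masked, pairs = list(tokens), []
    for i, tok in enumerate(tokens):
        if tok in SHIFTERS:
            for j in range(i + 1, min(i + 1 + WINDOW, len(tokens))):
                if tokens[j] in SENTIMENT_WORDS:
                    pairs.append((tok, tokens[j]))
                    masked[j] = "<MASK>"
                    break
    return masked, pairs

masked, pairs = mask_shifted_words("i do not love the flavor".split())
# The masked review keeps "not" as a feature while hiding "love", so a
# regression on the review labels can attribute the shift to "not".
```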

Methodology - MTSA
- Iterative learning process
  - Fix shifter effects → learn theme and word embeddings (feature-based matrix factorization)
  - Fix theme and word embeddings → learn shifter effects (logistic regression)
- Additional challenges:
  - "Not very" ≠ "Not" + "Very"
  - "Not good" ≠ "Bad"
- Our solution: phrase mining techniques [1]
  - not_very as a phrase shifter
  - not_good as a sentiment phrase

Outline
- Observations and Definitions
- Methodology: MTSA
- Performance Study and Experimental Results
- Conclusions and Future Work

Experimental Settings
- Dataset statistics
- Theme descriptors
  - Yelp & IMDB: the LDA implementation in MALLET [4], 20 topics.
  - RT: a biterm topic model (BTM) [3] for short texts, 5 topics.
  - Note: RT reviews are too short for LDA to estimate the posterior topic distributions reliably.

Multi-Theme Verification
- Polarities of the same sentiment word differ across themes.
- (Slide figure: bar charts of the learned polarity of cozy, prepared, cheap, cash, boring, and old across the themes Restaurant, Automotive, Shopping, Drink & Bar, and Gym.)

Shifter Learning Quality
- Human evaluation design
  - Given a review and the shifter-modified sentiment words selected from it, judges check whether the sentiment after modification is correct.
- Typical error by overfitting: "they were actually really good"
  - Bi-gram baseline: actually good = -0.1304
  - Ours: actually good = 1.729
- The intraclass correlation among the 4 human judges is high enough to show agreement.

Example Shifter Effects (Yelp)
- Good negations (f < 0): never: -1.33, not so: -1.00, not even: -0.75, not: -0.52, not very: -0.48, not really: -0.39, none: -0.27, no: -0.22, only: -0.18, not that: -0.13, nothing really: -0.11
- Good diminishers (0.0 < f < 1.0): could: 0.12, reasonably: 0.17, few: 0.18, slightly: 0.18, nothing that: 0.18, felt: 0.22, before: 0.22, not overly: 0.25, would only: 0.25, than: 0.27, somehow: 0.28
- Good intensifiers (f > 1.0): completely: 2.59, more than: 2.42, absolutely: 2.33, extremely: 2.33, really: 2.25, not only: 2.23, some really: 2.17, far: 2.15, particularly: 2.13, simply: 2.12, too: 2.06, excessively: 2.02, certainly: 2.00, most: 2.00, very: 1.96
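Under the product rule s = f · S, these ranges have a direct reading, sketched below; the threshold logic mirrors the slide, and the example values are the Yelp effects listed above.

```python
# Sketch: categorize a learned shifter effect f under the product rule
# s = f * S: negation f < 0, diminisher 0 < f < 1, intensifier f > 1.

def shifter_category(f):
    if f < 0:
        return "negation"      # flips polarity
    if f < 1.0:
        return "diminisher"    # shrinks magnitude
    return "intensifier"       # amplifies magnitude

learned = {"never": -1.33, "slightly": 0.18, "very": 1.96}  # Yelp examples
for word, f in learned.items():
    print(word, "->", shifter_category(f))
```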

Explainable Sentiment Analysis

Sentiment Classification
- Evaluate binary classification accuracy.
- All datasets are close to balanced.
- Accuracy is not substantially improved, especially on Yelp & IMDB. Why?

Sentiment Classification - Discussion
- The instances are ranked by the ratio (number of shifters / number of tokens), from high to low.
- As the ratio grows, shifters make up a larger portion of the review, and the gain from modeling shifter effects is bigger.
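The ranking used in this analysis can be sketched as below; the shifter list and example reviews are illustrative assumptions.

```python
# Sketch: rank reviews by shifter density
# (number of shifter tokens / number of tokens), from high to low.

SHIFTERS = {"not", "very", "never", "slightly"}  # assumed shifter lexicon

def shifter_ratio(tokens):
    """Fraction of tokens in the review that are shifters."""
    return sum(t in SHIFTERS for t in tokens) / len(tokens)

reviews = [
    "the food was not very good".split(),
    "great place and friendly staff".split(),
    "never going back not worth it".split(),
]
ranked = sorted(reviews, key=shifter_ratio, reverse=True)
# Reviews with a higher shifter density come first; these are the
# instances where modeling shifter effects should help most.
```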

Sentiment Classification - Discussion
- From a statistical perspective:
  - over 93% of reviews contain shifters;
  - the portion of words (serving as features) adjusted per review is 7.2 of 87 in the Yelp dataset and 10.5 of 122.8 in the IMDB dataset.
- From a semantic perspective:
  - long reviews contain many mentions of similar sentiment, e.g., people mention "not happy" and "unhappy" in the same review.
- Conclusion: shifters may not play an important role in long-document classification, but for shorter texts or sentence-level analysis they will be more effective.

Outline
- Observations and Definitions
- Methodology: MTSA
- Performance Study and Experimental Results
- Conclusions and Future Work

Conclusions and Future Work
- Conclusions
  - Discovered shifters with quantified effects help people better understand reviews.
  - Multi-theme classifiers and shifter discovery are beneficial to sentiment analysis.
  - Shifters offer only limited power to boost sentiment classification for long reviews, in accordance with the literature.
- Future Work
  - Beyond bag-of-words feature representations.
  - Linguistic grammar to distinguish shifters.

References
- [1] Liu, Jialu, et al. "Mining Quality Phrases from Massive Text Corpora." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.
- [2] Shang, Jingbo, et al. "A Parallel and Efficient Algorithm for Learning to Match." 2014 IEEE International Conference on Data Mining. IEEE, 2014.
- [3] Yan, Xiaohui, et al. "A Biterm Topic Model for Short Texts." Proceedings of the 22nd International Conference on World Wide Web. ACM, 2013.
- [4] McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." 2002.

Q&A. Thanks!

Sentiment Classification - Iterative Refinement (backup slide)