Bag of Words Meets Bags of Popcorn


Sentiment Analysis via and Natural Language Processing. Tarleton State University. July 16, 2015.

Data Description

AFINN List

word           score
invincible         2
mirthful           3
flops             -2
hypocritical      -2
upset             -2
overlooked        -1
hooligans         -2
welcome            2
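To make the AFINN approach concrete, here is a minimal Python sketch that turns a review into a single sentiment score. The tiny `afinn` dictionary is a stand-in for the full lexicon, and the plain sum is an assumption; the slides do not state the exact aggregation used.

import re

# Tiny stand-in for the AFINN lexicon (word -> integer valence score).
afinn = {"invincible": 2, "mirthful": 3, "flops": -2, "hypocritical": -2,
         "upset": -2, "overlooked": -1, "hooligans": -2, "welcome": 2}

def afinn_score(review):
    """Sum the AFINN scores of all words in a review (assumed aggregation)."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return sum(afinn.get(tok, 0) for tok in tokens)

print(afinn_score("A mirthful, welcome surprise."))       # 5
print(afinn_score("Hypocritical and upset, it flops."))   # -6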

AFINN Score: Probability Densities by Sentiment (figure). Bayes Classifier, AUC = .770

Count up the number of times each term occurs in a review, using the AFINN list to start:

              no  good  war  great  bad  ...
Review 1       8     0    1      0    3  ...
Review 2       2     0    1      1    0  ...
Review 3       4     1    0      0    1  ...
Review 4       4     1    0      1    0  ...
Review 5       4     0    0      0    0  ...
Review 6       3     3    0      0    1  ...
...          ...   ...  ...    ...  ...  ...
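A count matrix like the one above can be built with scikit-learn. This is only a sketch: `reviews` and `afinn_words` are placeholders standing in for the real review texts and AFINN vocabulary.

from sklearn.feature_extraction.text import CountVectorizer

# Placeholder data; the real competition data has 25,000 labelled reviews.
reviews = ["no good, no fun, just bad",
           "a great war film, no joke",
           "good but overlooked"]
afinn_words = ["no", "good", "war", "great", "bad"]   # stand-in vocabulary

# Restrict the count matrix to the AFINN vocabulary, as on the slide.
vectorizer = CountVectorizer(vocabulary=afinn_words)
X = vectorizer.fit_transform(reviews)                 # sparse (n_reviews x n_terms)
print(vectorizer.get_feature_names_out())
print(X.toarray())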

Random Forest Classifier, AUC = .886
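The slides report only the AUC, not the training setup. One plausible way to obtain such a number with scikit-learn is a hold-out split plus ROC AUC on predicted probabilities; the `X` and `y` below are random placeholders, not the competition data.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Placeholder data: X would be the review-by-term count matrix, y the 0/1 labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 5))
y = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# AUC is computed from predicted probabilities, not hard class labels.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.3f}")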

Term Frequency-Inverse Document Frequency: compare a term's relevance in a document to the inverse of its relevance in a collection of documents. The more frequently a term occurs in a document, the more relevant it is to that document. The more frequently a term occurs across a collection of documents, the less relevant it is to each document in the collection.

Term Frequency-Inverse Document Frequency

tf(t, d) = n(t, d)                          (count of term t in document d)

idf(t, D) = log( |D| / |{d ∈ D : t ∈ d}| )

tfidf(t, d, D) = tf(t, d) · idf(t, D)
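The three formulas can be read off directly as code. A minimal sketch over a toy corpus (placeholder documents, raw-count tf, natural-log idf):

import math
from collections import Counter

# Toy corpus; each document is a list of tokens (placeholder data).
docs = [["no", "good", "just", "bad", "bad"],
        ["great", "war", "film", "good"],
        ["good", "movie", "good", "fun"]]

def tf(t, d):
    """Raw count of term t in document d."""
    return Counter(d)[t]

def idf(t, D):
    """log(|D| / number of documents containing t)."""
    df = sum(1 for d in D if t in d)
    return math.log(len(D) / df) if df else 0.0

def tfidf(t, d, D):
    return tf(t, d) * idf(t, D)

print(tfidf("bad", docs[0], docs))    # frequent here, rare elsewhere -> high
print(tfidf("good", docs[0], docs))   # appears in every document -> 0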

Term Frequency-Inverse Document Frequency, AUC = .883

Feature Extraction: a priori feature extraction tends to perform poorly for simple analyses. It's typically better to learn features from the data themselves.

Term Frequency

Word      Frequency
movie       125,307
film        113,054
one          77,447
like         59,147
just         53,132
good         43,279
...             ...

Difference in Term Frequencies

Word    Freq (Pos)  Freq (Neg)  Difference
movie       18,139      23,668       5,529
bad          1,830       7,089       5,259
great        6,294       2,601       3,693
just         7,098      10,535       3,437
even         4,899       7,604       2,705
worst          246       2,436       2,190
...            ...         ...         ...

Normalized Difference Sentiment Index

NDSI(t) := |n(t|1) - n(t|0)| / (n(t|1) + n(t|0))

where n(t|1) and n(t|0) are the counts of term t in positive and negative reviews.

Adding a smoothing constant α to both counts,

NDSI(t) := |(n(t|1) + α) - (n(t|0) + α)| / ((n(t|1) + α) + (n(t|0) + α))
         = |n(t|1) - n(t|0)| / (n(t|1) + n(t|0) + 2α)
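A minimal sketch of the smoothed NDSI. The term counts are taken from the slides' tables; `alpha` is a placeholder, since the slides do not state which smoothing constant was used.

# pos_counts/neg_counts map each term to its frequency in positive/negative reviews.
pos_counts = {"worst": 246, "waste": 94, "poorly": 0, "great": 6294}
neg_counts = {"worst": 2436, "waste": 1351, "poorly": 620, "great": 2601}
alpha = 100.0   # placeholder smoothing constant

def ndsi(term, alpha=alpha):
    n_pos = pos_counts.get(term, 0)
    n_neg = neg_counts.get(term, 0)
    return abs(n_pos - n_neg) / (n_pos + n_neg + 2 * alpha)

# Rank terms by NDSI; the highest-scoring terms become the feature vocabulary.
for term in sorted(pos_counts, key=ndsi, reverse=True):
    print(f"{term:8s} {ndsi(term):.3f}")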

Normalized Difference Sentiment Index

Word     Freq (Pos)  Freq (Neg)  Difference   NDSI
worst           246       2,436       2,190   .745
waste            94       1,351       1,257   .739
poorly            0         620         620   .708
lame              0         618         618   .691
awful           159       1,441       1,282   .691
mess              0         498         498   .660
...             ...         ...         ...    ...

Normalized Difference Sentiment Index, AUC = .919; tf-idf, AUC = .904

Word Vectors: vector representations of words, v_i = [v_i1, v_i2, ..., v_iN] ∈ V ⊂ R^N. Relative word meanings are reflected in the vector representations.

Word Vectors

Distributional hypothesis: two words appear in similar contexts iff they share similar meaning.

Context similarity: if two words appear in similar contexts, then their vector representations are similar, i.e. P(v_i | c) ≈ P(v_j | c)  ⇒  v_i ≈ v_j.

Distributional hypothesis + context similarity ⇒ if two words share similar meaning, then their vector representations are similar.

Word Vectors (figure)

One-Word Contexts (Bigrams)

Multinomial logistic (softmax) regression:

P(v_j | v_i) = exp(β_j · v_i) / Σ_{k=1}^{|V|} exp(β_k · v_i)

Generalizable to multi-word contexts.
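A small numpy sketch of this one-word-context model: a toy vocabulary with random input vectors v and output weights β (all placeholders), just evaluating the softmax above.

import numpy as np

rng = np.random.default_rng(0)
vocab = ["movie", "film", "good", "bad", "great"]
dim = 8

# Placeholder parameters: one input vector v_i and one output vector beta_j per
# vocabulary word; in word2vec both matrices are learned from the corpus.
V = rng.normal(size=(len(vocab), dim))      # input (word) vectors v_i
B = rng.normal(size=(len(vocab), dim))      # output (context) vectors beta_j

def p_context_given_word(i):
    """Softmax over the vocabulary: P(v_j | v_i) for every j."""
    scores = B @ V[i]                        # beta_j . v_i for every j
    scores -= scores.max()                   # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

probs = p_context_given_word(vocab.index("good"))
for word, p in zip(vocab, probs):
    print(f"P({word} | good) = {p:.3f}")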

Word Similarity

What do we mean by similar? Cosine similarity:

sim(v_i, v_j) = (v_i · v_j) / (‖v_i‖ ‖v_j‖)

In [16]: model.most_similar('physics')
Out[16]:
[(u'quantum', 0.5752027034759521),
 (u'laws', 0.45106104016304016),
 (u'scientific', 0.43514519929885864),
 (u'engineering', 0.4271385669708252),
 (u'gravity', 0.42456042766571045),
 (u'theory', 0.41807645559310913),
 (u'mechanics', 0.3903239369392395)]
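The cosine similarity formula itself is a one-liner. A minimal numpy sketch with placeholder vectors; in practice the vectors would come from the trained model, e.g. `model.wv['physics']` with gensim.

import numpy as np

def cosine_sim(v_i, v_j):
    """Cosine similarity: dot product divided by the product of norms."""
    return float(v_i @ v_j / (np.linalg.norm(v_i) * np.linalg.norm(v_j)))

# Placeholder vectors, not values from the trained model.
v_physics = np.array([0.9, 0.1, 0.3])
v_quantum = np.array([0.8, 0.2, 0.4])
print(cosine_sim(v_physics, v_quantum))   # close to 1 for similar directions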

Analogies

Relative meanings and word relationships are preserved in the vector representations.

MAN : KING :: WOMAN : ???

v_king - v_man ≈ x - v_woman

MAN : KING :: WOMAN : ???

In [17]: model.most_similar(positive=['king', 'woman'], negative=['man'])
Out[17]:
[(u'queen', 0.3589944541454315),
 (u'princess', 0.33725661039352417),
 (u'arthur', 0.2945181727409363),
 (u'mistress', 0.29320359230041504),
 (u'france', 0.2916792035102844),
 (u'lion', 0.29003939032554626),
 (u'throne', 0.2894885540008545),
 (u'kong', 0.2762626111507416),
 (u'kingdom', 0.26161640882492065),
 (u'prince', 0.26111793518066406)]
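This is roughly what `most_similar(positive=..., negative=...)` computes: add and subtract the (unit-normalized) word vectors and rank the rest of the vocabulary by cosine similarity. A toy sketch with made-up vectors, not the model's actual embeddings:

import numpy as np

# Toy embedding table (placeholder vectors).
emb = {"king":  np.array([0.8, 0.6, 0.1]),
       "man":   np.array([0.7, 0.1, 0.0]),
       "woman": np.array([0.6, 0.2, 0.9]),
       "queen": np.array([0.7, 0.7, 0.9]),
       "movie": np.array([0.1, 0.9, 0.2])}

def unit(v):
    return v / np.linalg.norm(v)

# 3CosAdd-style analogy: x ≈ v_king - v_man + v_woman, then rank by cosine.
target = unit(emb["king"]) - unit(emb["man"]) + unit(emb["woman"])
ranking = sorted((w for w in emb if w not in {"king", "man", "woman"}),
                 key=lambda w: float(unit(emb[w]) @ unit(target)),
                 reverse=True)
print(ranking[0])   # 'queen' for these toy vectors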

Document Vectors

Combine word vectors into one document vector:

f({v_1, v_2, ..., v_k}) = d

Document vectors live in the same space as word vectors: "bad" ≈ "not good", so v_bad ≈ d_"not good".

Syntax trees
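The slides do not say which combining function f was used. One minimal choice is to average the word vectors, sketched below with placeholder embeddings; doc2vec instead learns the document vector jointly with the word vectors.

import numpy as np

# Placeholder word vectors; in practice these come from the trained model.
emb = {"not":  np.array([0.2, -0.5, 0.1]),
       "good": np.array([0.6,  0.7, 0.3]),
       "bad":  np.array([-0.1, 0.4, 0.2])}

def doc_vector(tokens, emb):
    """One simple choice of f: average the vectors of the known words."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0)

d = doc_vector(["not", "good"], emb)
print(d)   # lives in the same 3-d space as the word vectors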

Syntax Trees

[S [IP [IP [NP All models] [VP are wrong,]] but [IP [NP some models] [VP are useful.]]]]

References

Kaggle. Bag of Words Meets Bags of Popcorn. https://www.kaggle.com/c/word2vec-nlp-tutorial

AFINN list. F. Nielsen. The Technical University of Denmark. http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010

Tf-idf Weighting. C. Manning, P. Raghavan, H. Schütze. Stanford NLP Group. http://nlp.stanford.edu/ir-book/html/htmledition/tf-idf-weighting-1.html

Deep Learning for NLP (without Magic). R. Socher, Y. Bengio, C. Manning. NAACL 2013 tutorial. http://nlp.stanford.edu/courses/naacl2013/

word2vec Parameter Learning Explained. X. Rong. arXiv:1411.2738v1.

Thank you!