Classification: Analyzing Sentiment


STAT/CSE 416: Machine Learning
Emily Fox, University of Washington
April 17, 2018

Predicting sentiment by topic: An intelligent restaurant review system

4/16/18

It's a big day & I want to book a table at a nice Japanese restaurant. Seattle has many sushi restaurants. What are people saying about the food? The ambiance? ...

Positive reviews are not positive about everything. Sample review: "Watching the chefs create incredible edible art made the experience very unique. My wife tried their ramen and it was pretty forgettable. All the sushi was delicious! Easily best sushi in Seattle."

From reviews to topic sentiments:
- All reviews for the restaurant feed a novel intelligent restaurant review app
- Per-topic sentiment: Experience, Ramen, Sushi
- e.g. "Easily best sushi in Seattle."

Intelligent restaurant review system:
- Break all reviews for the restaurant into sentences:
  "The seaweed salad was just OK, vegetable salad was just ordinary."
  "I like the interior decoration and the blackboard menu on the wall."
  "All the sushi was delicious."
  "My wife tried their ramen and it was pretty forgettable."
  "The sushi was amazing, and the rice is just outstanding."
  "The service is somewhat hectic."
  "Easily best sushi in Seattle."

Core building block: the Sentence Sentiment Classifier
- Input: a sentence, e.g. "Easily best sushi in Seattle."
- Output: the predicted sentiment of that sentence

Intelligent restaurant review system:
- Break all reviews into sentences
- Select the sentences about sushi:
  "All the sushi was delicious."
  "The sushi was amazing, and the rice is just outstanding."
  "Easily best sushi in Seattle."
- Run the Sentence Sentiment Classifier on each selected sentence
- Average the predictions into an overall sushi sentiment, and show the most positive sentences, e.g. "Easily best sushi in Seattle."
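The steps above can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions: the real system uses a trained sentence sentiment classifier, whereas here `classify_sentence` is a hypothetical word-list stub, and sentences are selected by a simple substring match on the topic word.

```python
def classify_sentence(sentence):
    # Stand-in for a trained sentence sentiment classifier: +1 or -1.
    positive = {"delicious", "amazing", "outstanding", "best"}
    negative = {"forgettable", "hectic"}
    words = set(sentence.lower().replace(",", " ").replace(".", " ").split())
    return 1 if len(words & positive) >= len(words & negative) else -1

def topic_sentiment(review_sentences, topic):
    # Select sentences mentioning the topic, classify each, average.
    selected = [s for s in review_sentences if topic in s.lower()]
    preds = [classify_sentence(s) for s in selected]
    return sum(preds) / len(preds)

sentences = [
    "All the sushi was delicious.",
    "My wife tried their ramen and it was pretty forgettable.",
    "The sushi was amazing, and the rice is just outstanding.",
    "Easily best sushi in Seattle.",
]
print(topic_sentiment(sentences, "sushi"))  # averages over the 3 sushi sentences
print(topic_sentiment(sentences, "ramen"))  # the single ramen sentence is negative
```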

Classifier applications

A classifier MODEL maps an input x (e.g. a sentence from a review, such as "Sushi was awesome, the food was awesome, but the service was awful.") to an output y, the predicted class.

Spam filtering:
- Input x: text of email, sender, IP, ...
- Output y: spam / not spam

Multiclass classifier: the output y has more than 2 categories.
- Input x: webpage
- Output y: e.g. Education, Finance, Technology

Image classification:
- Input x: image pixels
- Output y: predicted object

Personalized medical diagnosis:
- Input x: patient data
- Output y: the diagnosis from the classifier MODEL, e.g. Healthy vs. Disease, and among diseases Cold, Flu, Pneumonia

Reading your mind:
- Inputs x: brain region intensities
- Output y: Hammer or House

Linear classifiers

The ML pipeline:
Training Data → x → Feature extraction → h(x) → ML model → ŷ
(The quality metric compares the predictions ŷ to the true labels y, and the ML algorithm uses it to learn the weights ŵ.)

Representing classifiers: how does it work? How does a classifier MODEL map an input x (a sentence from a review) to an output y (the predicted class)?

List of positive words: great, awesome, good, amazing, ...
List of negative words: bad, terrible, disgusting, sucks, ...

Simple threshold classifier (input x: sentence from review):
- Count the positive and negative words in the sentence
- If the number of positive words > the number of negative words: ŷ = +1
- Else: ŷ = -1

Example: "Sushi was great, the food was awesome, but the service was terrible." contains 2 positive words and 1 negative word, so ŷ = +1.
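A minimal sketch of this threshold classifier; the word lists and the crude tokenization are illustrative assumptions, not part of the lecture.

```python
# Hypothetical sentiment word lists (extending the slide's examples).
POSITIVE_WORDS = {"great", "awesome", "good", "amazing"}
NEGATIVE_WORDS = {"bad", "terrible", "disgusting", "sucks", "awful"}

def threshold_classify(sentence):
    """Return +1 if positive words outnumber negative words, else -1."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    n_pos = sum(w in POSITIVE_WORDS for w in words)
    n_neg = sum(w in NEGATIVE_WORDS for w in words)
    return 1 if n_pos > n_neg else -1

# 2 positive words (great, awesome) vs. 1 negative (terrible) -> +1
print(threshold_classify("Sushi was great, the food was awesome, "
                         "but the service was terrible."))
```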

Problems with the threshold classifier:
- How do we get the lists of positive/negative words?
- Words have different degrees of sentiment (great > good); how do we weigh different words? Addressed by learning a classifier.
- Single words are not enough: "good" → positive, but "not good" → negative. Addressed by more elaborate features.

A (linear) classifier will use training data to learn a weight for each word:

Word                              Weight
good                               1.0
great                              1.5
awesome                            2.7
bad                               -1.0
terrible                          -2.1
awful                             -3.3
restaurant, the, we, where, ...    0.0

Scoring a sentence

Word                              Weight
good                               1.0
great                              1.2
awesome                            1.7
bad                               -1.0
terrible                          -2.1
awful                             -3.3
restaurant, the, we, where, ...    0.0

Input x: "Sushi was great, the food was awesome, but the service was terrible."
Score(x) = 1.2 (great) + 1.7 (awesome) - 2.1 (terrible) = 0.8

This is called a linear classifier, because the output is a weighted sum of the inputs.

Simple linear classifier (input x: sentence from review):
- Score(x) = weighted count of the words in the sentence
- If Score(x) > 0: ŷ = +1
- Else: ŷ = -1
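Scoring can be sketched as a dictionary lookup using the weights from the table above; words outside the table implicitly get weight 0.0, and the tokenization is a simplifying assumption.

```python
# Word weights from the slide's table (all other words have weight 0.0).
WEIGHTS = {"good": 1.0, "great": 1.2, "awesome": 1.7,
           "bad": -1.0, "terrible": -2.1, "awful": -3.3}

def score(sentence):
    """Weighted count of the words in the sentence."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    return sum(WEIGHTS.get(w, 0.0) for w in words)

s = score("Sushi was great, the food was awesome, but the service was terrible.")
print(s)                    # 1.2 + 1.7 - 2.1, roughly 0.8
print(1 if s > 0 else -1)   # Score(x) > 0, so classified as +1
```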

More generically

Model: ŷ_i = sign(Score(x_i)), where

Score(x_i) = w_0 h_0(x_i) + w_1 h_1(x_i) + ... + w_D h_D(x_i)
           = Σ_{j=0}^{D} w_j h_j(x_i)
           = wᵀ h(x_i)

Features:
- feature 1 = h_0(x), e.g. the constant 1
- feature 2 = h_1(x), e.g. x[1] = #awesome
- feature 3 = h_2(x), e.g. x[2] = #awful, or log(x[7]) × x[2] = log(#bad) × #awful, or tf-idf("awful")
- ...
- feature D+1 = h_D(x): some other function of x[1], ..., x[d]

In the pipeline (Training Data → x → Feature extraction → h(x) → ML model → ŷ, with the quality metric comparing ŷ to y and the ML algorithm learning ŵ), the ML model outputs ŷ = sign(ŵᵀ h(x)) (either -1 or +1).
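The generic form Score(x) = wᵀ h(x) can be sketched with explicit feature functions. The feature vector here (a constant h_0 plus the #awesome and #awful counts) and the weights are illustrative assumptions.

```python
def h(x):
    """Feature extraction: x is a dict of raw word counts."""
    return [1.0,                          # h_0: constant feature
            float(x.get("awesome", 0)),   # h_1: #awesome
            float(x.get("awful", 0))]     # h_2: #awful

# Illustrative weights: w_0 = 0.0, w_1 = 1.0, w_2 = -1.5.
w = [0.0, 1.0, -1.5]

def predict(x):
    """y-hat = sign(Score(x)) with Score(x) = sum_j w_j * h_j(x)."""
    score = sum(wj * hj for wj, hj in zip(w, h(x)))
    return 1 if score > 0 else -1

print(predict({"awesome": 2, "awful": 1}))  # Score = 2.0 - 1.5 = 0.5 -> +1
```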

Decision boundaries

Suppose only two words have non-zero coefficients:

Input       Coefficient   Value
(constant)  w_0            0.0
#awesome    w_1            1.0
#awful      w_2           -1.5

Score(x) = 1.0 #awesome - 1.5 #awful

(Plot: sentences as points in the #awesome-#awful plane; "Sushi was awesome, the food was awesome, but the service was awful." has #awesome = 2 and #awful = 1.)

Decision boundary example

Input       Coefficient   Value
(constant)  w_0            0.0
#awesome    w_1            1.0
#awful      w_2           -1.5

Score(x) = 1.0 #awesome - 1.5 #awful

(Plot: the line 1.0 #awesome - 1.5 #awful = 0 in the #awesome-#awful plane, with Score(x) > 0 on one side and Score(x) < 0 on the other.) The decision boundary separates the + and - predictions.

Decision boundary: effect of changing coefficients (w_0 changed from 0.0 to 1.0)

Input       Coefficient   Value
(constant)  w_0            1.0
#awesome    w_1            1.0
#awful      w_2           -1.5

Score(x) = 1.0 + 1.0 #awesome - 1.5 #awful

(Plot: the new boundary 1.0 + 1.0 #awesome - 1.5 #awful = 0, shifted relative to the old boundary 1.0 #awesome - 1.5 #awful = 0.)

Decision boundary: effect of changing coefficients (w_2 changed from -1.5 to -3.0)

Input       Coefficient   Value
(constant)  w_0            1.0
#awesome    w_1            1.0
#awful      w_2           -3.0

Score(x) = 1.0 + 1.0 #awesome - 3.0 #awful

(Plot: changing w_2 changes the slope of the boundary line.)

For more inputs (linear features), e.g. x[1] = #awesome, x[2] = #awful, x[3] = #great:

Score(x) = w_0 + w_1 #awesome + w_2 #awful + w_3 #great

For general features

For more general classifiers (not just linear features), the decision boundaries can take more complicated shapes.

Training and evaluating a classifier

The ML pipeline again: Training Data → x → Feature extraction → h(x) → ML model → ŷ; the quality metric compares ŷ to y, and the ML algorithm learns ŵ.

Training a classifier = learning the weights

Split the data (x, y), e.g. (Sentence1, +), (Sentence2, -), ..., into a training set and a test set. Learn the classifier on the training set (the word weights, e.g. good 1.0, awesome 1.7, bad -1.0, awful -3.3), then evaluate it on the test set.

Classification error

Given a learned classifier and a test example (e.g. "Sushi was great, ..." or "Food was OK, ..."), hide the label, predict ŷ, then compare against the true label: count it as correct or as a mistake.

Classification error & accuracy
- Error measures the fraction of mistakes:
  error = (# mistakes) / (total # of test examples)
  Best possible value is 0.0.
- Often, we measure accuracy instead, the fraction of correct predictions:
  accuracy = (# correct predictions) / (total # of test examples)
  Best possible value is 1.0.
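These two measures can be computed directly; the labels below are made up for illustration.

```python
def error_rate(y_true, y_pred):
    """Fraction of test examples the classifier gets wrong."""
    mistakes = sum(t != p for t, p in zip(y_true, y_pred))
    return mistakes / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of correct predictions: 1 - error."""
    return 1.0 - error_rate(y_true, y_pred)

y_true = [1, 1, -1, 1, -1]   # hidden true labels
y_pred = [1, -1, -1, 1, 1]   # classifier predictions
print(error_rate(y_true, y_pred))  # 2 mistakes out of 5 -> 0.4
print(accuracy(y_true, y_pred))    # 3 correct out of 5 -> 0.6
```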

What's a good accuracy?

What if you ignore the sentence and just guess?
- For binary classification: half the time you'll get it right (on average), so accuracy = 0.5.
- For k classes, accuracy = 1/k: 1/3 for 3 classes, 1/4 for 4 classes, ...

At the very, very, very least, you should healthily beat random guessing; otherwise, the classifier is (usually) pointless.

Is a classifier with 90% accuracy good? It depends. 2010 data shows that 90% of emails sent are spam! So predicting that every email is spam gets you 90% accuracy. This majority class prediction looks amazing when there is class imbalance (one class is more common than the others), and it beats random guessing (if you know the majority class), but it is a silly approach.

So always dig in and ask the hard questions about reported accuracies:
- Is there class imbalance?
- How does it compare to a simple baseline approach (random guessing, majority class)?
- Most importantly: what accuracy does my application need? What is good enough for my user's experience? What is the impact of the mistakes we make?
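A sketch of the majority-class baseline on the 90%-spam example; the data here is synthetic, built to match the proportions quoted above.

```python
from collections import Counter

def majority_baseline(y_train):
    """Return a 'classifier' that always predicts the most common label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    return lambda x: majority_label  # ignores the input entirely

# 90% of emails are spam, as in the 2010 figure.
y = ["spam"] * 90 + ["not spam"] * 10
predict = majority_baseline(y)

correct = sum(predict(None) == label for label in y)
print(correct / len(y))  # 0.9 accuracy, yet useless as a spam filter
```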

False positives, false negatives, and confusion matrices

Types of mistakes:

                 Predicted +            Predicted -
True label +     True Positive          False Negative (FN)
True label -     False Positive (FP)    True Negative

The cost of different types of mistakes can be different (and high) in some applications:

                    False negative         False positive
Spam filtering      Annoying               Email lost
Medical diagnosis   Disease not treated    Wasteful treatment

Confusion matrix for binary classification:

                 Predicted +            Predicted -
True label +     True Positive          False Negative (FN)
True label -     False Positive (FP)    True Negative
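Tallying the four cells of the binary confusion matrix can be sketched as follows, assuming labels in {+1, -1} with +1 as the positive class; the example labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Count TP, FN, FP, TN for labels in {+1, -1}."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["TP"] += 1      # true +, predicted +
        elif t == 1 and p == -1:
            counts["FN"] += 1      # true +, predicted - (false negative)
        elif t == -1 and p == 1:
            counts["FP"] += 1      # true -, predicted + (false positive)
        else:
            counts["TN"] += 1      # true -, predicted -
    return counts

y_true = [1, 1, 1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1]
print(confusion_matrix(y_true, y_pred))  # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 2}
```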

Confusion matrix for multiclass classification, e.g. with classes Healthy, Cold, Flu: rows index the true label, columns the predicted label, and each cell counts the test examples with that (true, predicted) combination.

Learning curves (again): How much data do I need?

How much data does a model need to learn?
- The more the merrier, but data quality is the most important factor.
- Theoretical techniques can sometimes bound how much data is needed; the bounds are typically too loose for practical application, but they provide guidance.
- In practice, more complex models require more data, and empirical analysis can provide guidance.

(Plot: test error vs. amount of training data; test error decreases with more data and levels off at a floor determined by the bias of the model.)

More complex models tend to have less bias. A sentiment classifier using single words can do OK, but it never correctly classifies sentences like "The sushi was not good." A more complex model considers pairs of words (bigrams), e.g.:

Word        Weight
good        +1.5
not good    -2.1

Less bias means the model is potentially more accurate, but it needs more data to learn. (Plot: test error vs. amount of training data, comparing the classifier based on single words with the lower-bias model; models with less bias tend to need more data to learn well, but do better with sufficient data.)
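A sketch of how a bigram feature fixes the negation example, using the weights from the table above; all other words are assumed to have weight 0.0, and the tokenization is a simplifying assumption.

```python
# Unigram weight from the slide; the bigram ("not", "good") gets its own weight.
UNIGRAM_WEIGHTS = {"good": 1.5}
BIGRAM_WEIGHTS = {("not", "good"): -2.1}

def score(sentence):
    """Sum of unigram weights plus weights of adjacent word pairs."""
    words = sentence.lower().rstrip(".").split()
    s = sum(UNIGRAM_WEIGHTS.get(w, 0.0) for w in words)
    s += sum(BIGRAM_WEIGHTS.get(b, 0.0) for b in zip(words, words[1:]))
    return s

print(score("The sushi was good."))      # 1.5 -> positive
print(score("The sushi was not good."))  # 1.5 - 2.1, roughly -0.6 -> negative
```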

Summary of classification intro

The ML pipeline: Training Data → x → Feature extraction → h(x) → ML model → ŷ; the quality metric compares ŷ to y, and the ML algorithm learns ŵ.

What you can do now:
- Identify a classification problem and some common applications
- Describe decision boundaries and linear classifiers
- Train a classifier
- Measure its error, with some rules of thumb for good accuracy
- Interpret the types of error associated with classification
- Describe the tradeoffs between model bias and data set size