Multiple Aspect Ranking Using the Good Grief Algorithm. Benjamin Snyder and Regina Barzilay MIT


1 Multiple Aspect Ranking Using the Good Grief Algorithm Benjamin Snyder and Regina Barzilay MIT

2 From One Opinion To Many. Much previous work assumes one opinion per text (Turney 2002; Pang et al. 2002; Pang & Lee 2005). Real texts contain multiple, related opinions: "The food was a little greasy, but it was priced pretty well. Our only complaint was the service after our order was taken." (aspects: Food, Price, Service)


4 Multiple Aspect Opinion Analysis. The Task: predict the writer's opinion on a fixed set of aspects (e.g. food, service, price) using a fixed scale (e.g. from 1-5). Simple Approach: treat each aspect as an independent ranking (rating) problem.

5-7 Shortcomings of Independent Ranking. Multiple opinions in a single text are correlated, and real text relates opinions in coherent, meaningful ways: "The food was a little greasy, but it was priced pretty well. Our only complaint was the service after our order was taken." (Service < Food < Price) Independent ranking fails to model these correlations and to exploit discourse cues.

8 Key Goals. Build on the previous success of simple linear models trained with the Perceptron in NLP: simple, fast training; high performance; exact, fast decoding. Extend the framework to tasks with complex label dependencies: a task-specific dependency space; label dependencies sensitive to input features; factorization of the label-prediction and dependency models.

9-10 Our Idea: The Model. An individual ranking model for each aspect, plus a meta-model which predicts discourse relations between the aspects, e.g. Service < Food < Price (order). The special case with only two relations, Service = Food = Price (agreement) vs. ~[Service = Food = Price] (disagreement), is the Binary Agreement Model.

11 Our Idea: Learning and Inference. Resolve differences between the individual rankers and the meta-model using the overall confidence of all model components: the meta-model glues the individual ranking decisions together in a coherent way. Optimize the individual ranker parameters to operate with the meta-model through joint Perceptron updates.

12 Meta-Model. [figure: agreement predictions linking the Food, Service, Price, and Ambience aspects] Example: "The restaurant was a bit uneven. Although the food was tasty, the window curtains blocked out all sunlight."

13 Basic Ranking Framework (Crammer & Singer 2001). Goal: assign each input $x \in \mathbb{R}^n$ a rank in $\{1, \ldots, k\}$. Model: a weight vector $w \in \mathbb{R}^n$ and boundaries $b = (b_1, \ldots, b_{k-1})$ that divide the real line into $k$ segments.

14-16 Rank Decoding. Compute $\mathrm{score}(x) = w \cdot x$ and place it on the real line divided by the boundaries $b_0 = -\infty \le b_1 \le b_2 \le \cdots \le b_{k-1} \le b_k = \infty$; the output $\hat{y}$ is the index of the segment containing the score (in the pictured example with $k = 5$, $\hat{y} = 3$).
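As a concrete illustration of this decoding rule, here is a minimal Python sketch (function and variable names are ours, not from the paper): score the input and return the index of the segment that contains the score.

```python
import numpy as np

def decode_rank(w, b, x):
    """Rank decoding sketch: return the 1-based index of the segment of the
    real line, cut by sorted boundaries b_1 <= ... <= b_{k-1} (with implicit
    b_0 = -inf and b_k = +inf), that contains score(x) = w . x."""
    score = np.dot(w, x)
    for r, boundary in enumerate(b, start=1):
        if score < boundary:
            return r
    return len(b) + 1  # score lies beyond every finite boundary: rank k

# e.g. with k = 5 and illustrative boundaries:
#   decode_rank(w, [-2.0, -0.5, 1.0, 2.5], x)
```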

17 Multiple Aspect Ranking. Each input $x$ has $m$ aspects, e.g. reviews rate products for different qualities: food, service, ambience, etc. Associated with each input $x$ is a rank vector $r$; the $i$-th component $r_i \in \{1, \ldots, k\}$ is the rank for aspect $i$. e.g. $r = \langle 5, 3, 4, 4, 4 \rangle$ (food, service, ...).

18 Joint Ranking Model. A ranking model $(w[i], b[i])$ for each aspect $i$, plus a linear agreement model $a \in \mathbb{R}^n$ that predicts unanimity across aspects via $\mathrm{sign}(a \cdot x)$; together these are the component models. To combine the predictions of the individual rankers and the agreement model, introduce grief terms and choose the joint rank which minimizes their sum.

19 Grief of Component Models. $g_i(x, r_i)$: a measure of the dissatisfaction of the $i$-th ranking model with rank $r_i$ for input $x$. $g_a(x, r)$: a measure of the dissatisfaction of the agreement model with rank vector $r$ for input $x$.

20-21 Grief of the $i$-th Ranking Model. $g_i(x, r) = \min_c |c|$ s.t. $b[i]_{r-1} \le \mathrm{score}_i(x) + c < b[i]_r$: the smallest shift that moves the score into the segment for rank $r$. [figure: score line with boundaries $b_{r-2}, b_{r-1}, b_r$ and the shift $c$ marked]
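A small sketch of this grief term, in the same style as the decoding sketch above (the open upper end of the segment is treated as a limiting distance; names are illustrative):

```python
import numpy as np

def rank_grief(w, b, x, r):
    """Grief of the i-th ranking model (sketch): the smallest |c| shifting
    score_i(x) into rank r's segment [b_{r-1}, b_r), with implicit
    b_0 = -inf and b_k = +inf. b holds the k-1 finite boundaries."""
    score = np.dot(w, x)
    lower = b[r - 2] if r >= 2 else -np.inf      # b_{r-1}
    upper = b[r - 1] if r <= len(b) else np.inf  # b_r
    if score < lower:
        return lower - score   # must shift up to reach the segment
    if score >= upper:
        return score - upper   # must shift down (infimum over the open end)
    return 0.0                 # already in the segment: zero grief
```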

22-23 Grief of the Agreement Model. $g_a(x, r) = \min_c |c|$ s.t. $[(a \cdot x + c > 0) \wedge (r_1 = r_2 = \cdots = r_m)] \vee [(a \cdot x + c \le 0) \wedge \neg(r_1 = r_2 = \cdots = r_m)]$: the smallest shift that makes the agreement score consistent with whether $r$ is unanimous. [figure: disagree/agree axis through 0 with the shift $c$ marked]
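The same constraint in code, continuing the sketches above (names illustrative):

```python
import numpy as np

def agreement_grief(a, x, r):
    """Grief of the agreement model (sketch): the smallest |c| making the
    shifted score a . x + c consistent with r's agreement status
    (positive score <-> all ranks equal)."""
    score = np.dot(a, x)
    if all(ri == r[0] for ri in r):      # unanimous: need a.x + c > 0
        return 0.0 if score > 0 else -score
    return 0.0 if score <= 0 else score  # non-unanimous: need a.x + c <= 0
```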

24 Good Grief Decoding. Select the joint rank which minimizes the total grief: $H(x) = \arg\min_{r \in \{1, \ldots, k\}^m} \left[ g_a(x, r) + \sum_{i=1}^m g_i(x, r_i) \right]$. Exact search is linear in the number of aspects: $O(m)$.
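One way to organize this exact search, reusing the `rank_grief` and `agreement_grief` sketches above (the three-candidate decomposition is our reconstruction of how a linear-time exact decoder can work, not necessarily the paper's exact procedure):

```python
def good_grief_decode(weights, boundaries, a, x, k):
    """Good Grief decoding sketch. The exact minimizer is one of three
    candidates: (1) the independent per-aspect grief minimizer, (2) the best
    unanimous rank vector, or (3) when (1) happens to be unanimous, its
    cheapest single-aspect deviation (the cheapest way to satisfy a
    disagreement prediction). weights/boundaries hold (w[i], b[i]) per
    aspect; names are illustrative."""
    m = len(weights)
    # Per-aspect grief tables: griefs[i][r-1] = g_i(x, r).
    griefs = [[rank_grief(w, b, x, r) for r in range(1, k + 1)]
              for w, b in zip(weights, boundaries)]

    def total(r_vec):
        return agreement_grief(a, x, r_vec) + sum(
            griefs[i][r_vec[i] - 1] for i in range(m))

    indep = tuple(min(range(1, k + 1), key=lambda r: g[r - 1]) for g in griefs)
    best_v = min(range(1, k + 1), key=lambda v: sum(g[v - 1] for g in griefs))
    candidates = [indep, (best_v,) * m]
    if len(set(indep)) == 1 and k > 1 and m > 1:
        # Cheapest way to break unanimity: flip one aspect to its runner-up.
        delta, i, r = min((g[r - 1] - g[indep[i] - 1], i, r)
                          for i, g in enumerate(griefs)
                          for r in range(1, k + 1) if r != indep[i])
        flipped = list(indep)
        flipped[i] = r
        candidates.append(tuple(flipped))
    return min(candidates, key=total)
```

The candidate set suffices because any non-unanimous vector pays at least the independent minimum plus the cheapest single flip, and any unanimous vector pays at least the best unanimous sum, so the true minimizer is always among the three.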

25-28 Disagreement with high confidence / with low confidence. [figures: food and ambience scores against boundaries $b_1, b_2, b_3$, plus the agreement model's score on a disagree/agree axis with the shift $c$ marked; a confident agreement prediction imposes a large grief $c$ on disagreeing rank vectors, while a low-confidence prediction is cheap to override]

29 Joint Learning. Idea: optimize the individual ranker parameters for Good Grief decoding. 1. Train the agreement model on the corpus. 2. Incorporate grief minimization into the online learning procedure for the rankers: jointly decode each training instance and simultaneously update all rankers.

30-33 Joint Online Learning (update rule based on Crammer & Singer 2001):
Input: $(x^1, y^1), \ldots, (x^T, y^T)$; agreement model $a$; decoder definition $H(x)$.
Initialize: $w[i]^1 = 0$; $b[i]^1_1 = \cdots = b[i]^1_{k-1} = 0$, $b[i]^1_k = \infty$, for $i = 1 \ldots m$.
Loop: for $t = 1, 2, \ldots, T$:
  1. Get a new instance $x^t \in \mathbb{R}^n$.
  2. Predict $\hat{y}^t = H(x; w^t, b^t, a)$.  [grief minimization]
  3. Get the label $y^t$.
  4. For each aspect $i = 1, \ldots, m$: if $\hat{y}[i]^t \ne y[i]^t$, update that aspect's model:  [simultaneous updates]
     4.a For $r = 1, \ldots, k-1$: if $y[i]^t \le r$ then $y[i]^t_r = -1$ else $y[i]^t_r = 1$.
     4.b For $r = 1, \ldots, k-1$: if $(\hat{y}[i]^t - r)\, y[i]^t_r \le 0$ then $\tau[i]^t_r = y[i]^t_r$ else $\tau[i]^t_r = 0$.
     4.c Update $w[i]^{t+1} \leftarrow w[i]^t + (\sum_r \tau[i]^t_r)\, x^t$; for $r = 1, \ldots, k-1$: $b[i]^{t+1}_r \leftarrow b[i]^t_r - \tau[i]^t_r$.
Output: $H(x; w^{T+1}, b^{T+1}, a)$.
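Step 4 is the PRank update of Crammer & Singer (2001) applied per aspect. A runnable sketch of one such update (names are ours; sign conventions reconstructed from the slide, so treat them as an assumption):

```python
import numpy as np

def prank_update(w, b, x, y_true, y_hat):
    """One PRank-style update for a single aspect (step 4 of the slide).
    w: weight vector (numpy array, modified in place); b: list of the k-1
    finite boundaries (b_k = +inf stays implicit); y_true / y_hat: true and
    jointly decoded ranks."""
    if y_hat == y_true:
        return
    taus = []
    for r in range(1, len(b) + 1):
        y_r = -1 if y_true <= r else 1                     # step 4.a
        taus.append(y_r if (y_hat - r) * y_r <= 0 else 0)  # step 4.b
    w += sum(taus) * x                                     # step 4.c: weights
    for idx, tau in enumerate(taus):                       # step 4.c: boundaries
        b[idx] -= tau
```

In joint training, `y_hat` comes from Good Grief decoding rather than independent decoding, and the update runs for every aspect whose component of the jointly decoded vector is wrong.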

34 Feature Representation. Each review is represented as a binary feature vector; features track the presence or absence of words and word bigrams (about 30,000 features in total). Such lexical features have previously been found effective for sentiment analysis (Wiebe 2000; Pang et al. 2004) and discourse analysis (Marcu & Echihabi 2002).
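A sketch of this representation (the `vocab` map from words and bigrams to column indices is assumed to be built from the training corpus, about 30k entries in the paper's setup; its construction is omitted here):

```python
import numpy as np

def featurize(text, vocab):
    """Binary unigram + bigram feature vector sketch."""
    tokens = text.lower().split()
    ngrams = tokens + [f"{u} {v}" for u, v in zip(tokens, tokens[1:])]
    vec = np.zeros(len(vocab))
    for g in ngrams:
        if g in vocab:
            vec[vocab[g]] = 1.0  # presence/absence, not counts
    return vec
```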

35 Evaluation. 4,500 restaurant reviews, randomly split 3,500 / 500 / 500 into training, development, and test data; average review length 115 words. Each review ranks the restaurant with respect to food, service, ambience, value, and overall experience on a scale of 1-5. Metric: average rank loss, $\frac{1}{T} \sum_{t=1}^{T} |\hat{r}_t - r_t|$.
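The metric as a one-liner, for concreteness:

```python
def average_rank_loss(pred, gold):
    """Average rank loss from the slide: (1/T) * sum_t |r_hat_t - r_t|,
    computed per aspect over T test reviews."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)
```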

36 Performance of the Agreement Model. Majority baseline (disagreement): 58%. Agreement model accuracy: 67%. By the Good Grief criterion, raw accuracy is not what matters; what matters is accuracy as a function of confidence: as confidence goes up, so should accuracy.

37-39 Accuracy of the Agreement Model. [figure: accuracy as a function of confidence] The 33% of the data with highest confidence is classified at 80% accuracy; the 10% with highest confidence at 90% accuracy.

40 Baselines PRANK: Independent rankers for each aspect trained using PRank algorithm (Crammer & Singer 2001) MAJORITY: < 5, 5, 5, 5, 5 > SIM: Joint model using cosine similarity between aspects (Basilico & Hofmann 2004) score i (x) = w[i] x + j sim(i, j)(w[j] x) GG DECODE: Good Grief decoding but independent training
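A sketch of the SIM baseline's scoring rule (whether the sum ranges over all aspects or only $j \ne i$ is ambiguous in the transcription; we assume $j \ne i$ here, and `sim` is assumed given):

```python
import numpy as np

def sim_score(i, x, weights, sim):
    """SIM baseline scoring sketch: aspect i's own score plus the
    similarity-weighted scores of the other aspects."""
    own = np.dot(weights[i], x)
    return own + sum(sim(i, j) * np.dot(weights[j], x)
                     for j in range(len(weights)) if j != i)
```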

41 Average Rank Loss on Test Set. Models compared: MAJORITY, PRANK, SIM, GG DECODE, and GG TRAIN+DECODE on Food, Service, Value, Atmosphere, Experience, and Total. The GG TRAIN+DECODE row survives in part: 0.534*, 0.622*, 0.644*, 0.774* (in column order, i.e. Food, Service, Value, Atmosphere); * marks a statistically significant improvement over the closest rival by the Fisher sign test. [remaining table values not recoverable from the transcription]

42 Average Rank Loss, PRANK vs. GG TRAIN+DECODE, split by agreement status. Cases of disagreement (58% of the corpus): 1% relative reduction in error. Cases of agreement (42% of the corpus): 22% relative reduction in error.

43 Technical Contributions. A novel framework for tasks with complex label dependencies: simple, fast, exact, and accurate. An explicit meta-model: task-specific dependency spaces; features tailored for dependency prediction; joint Perceptron updates for the label predictors.

44 Conclusions & Future Work. Applied the Good Grief framework to multiple-aspect sentiment analysis: the agreement model guides the aspect rank predictions, and the joint model outperforms all baselines. Future work: apply the GG framework to other tasks (classification, regression, etc.) and to more complex label dependency spaces.

45 Data and Code available: Thank You!


47-51 Model Expressivity. [figure] Fully implicit opinions: "Our only complaint was the service after our order was taken." One opinion expressed in terms of another: "The food was good, but not the ambience."

52-56 Decoding (worked example). "The restaurant was a bit uneven. Although the food was tasty, the window curtains blocked out all sunlight." [figures: candidate Food and Ambience ranks scored step by step against the Agreement model]

57-60 Online Training (worked example). "The restaurant was a bit uneven. Although the food was tasty, the window curtains blocked out all sunlight." [figures: a joint decoding error on Food and Ambience triggers simultaneous Perceptron updates alongside the Agreement model]
