Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization

1 Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization
Andrew B. Goldberg, Xiaojin Jerry Zhu
Computer Sciences Department, University of Wisconsin-Madison

2 Goodbye Academia, Hello Hollywood! Not quite!

3 Sentiment Categorization
...captivating... special effects were amazing... ?
...excellent acting... quite good and believable... ?
...weak, lame attempts at humor...bland as can be... ?

4 Sentiment Categorization
...captivating... special effects were amazing...
...excellent acting... quite good and believable...
...weak, lame attempts at humor...bland as can be...

5 Graph-Based Learning
This is rating inference [Pang+Lee05]
We use graph-based semi-supervised learning
Main assumption encoded in graph: Similar documents should have similar ratings
[Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL.]

6 Graph-Based Learning
Labeled:
...captivating... special effects were amazing...
...excellent acting... quite good and believable...
...weak, lame attempts at humor... bland as can be...
Unlabeled:
...captivating piece of work from an excellent team...
...excellent acting makes up for bland, lame scenery...
...preview quite good, but acting was weak, way off...
...weak, bad... pathetically lame... acting way off...

11 Graph-Based Learning
Columns: Labeled, Unlabeled, Truth, Supervised (same labeled and unlabeled reviews as on slide 6; the rating graphics are not preserved)
Supervised: 50% Accuracy

14 Graph-Based Learning
Columns: Labeled, Unlabeled, Truth, Semi-Supervised (same labeled and unlabeled reviews as on slide 6; the rating graphics are not preserved)
Semi-Supervised: 100% Accuracy

15 How it really works

16 Goal
Assign a discrete numeric rating f(x) to each document x
f : x

17 Our Approach
Get initial predictions using SVM
Improve predictions using graph-based SSL
Nodes = reviews
Edges = assumed relations between reviews
Find the optimal f over the graph

18 Measuring Loss over the Graph
Similar reviews should get similar ratings
Reviews A and B, with ratings $f_A$ and $f_B$, are joined by an edge with similarity $w$
Loss over this edge = $w (f_A - f_B)^2$
Task: Minimize loss $L(f)$ over whole graph
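As a minimal sketch of the per-edge loss on this slide, the snippet below sums $w (f_A - f_B)^2$ over a toy graph. The function names, the three-review graph, and its weights and ratings are all hypothetical, chosen only for illustration.

```python
def edge_loss(w, f_a, f_b):
    """Loss on one edge: similarity weight times squared rating difference."""
    return w * (f_a - f_b) ** 2


def graph_loss(edges, f):
    """Sum of edge losses; `edges` holds (node_a, node_b, weight) triples."""
    return sum(edge_loss(w, f[a], f[b]) for a, b, w in edges)


# Hypothetical toy graph: three reviews joined by two similarity edges.
edges = [("A", "B", 0.9), ("B", "C", 0.2)]
ratings = {"A": 4.0, "B": 3.0, "C": 1.0}

# The strongly similar pair (A, B) contributes 0.9 * 1^2,
# the weakly similar pair (B, C) contributes 0.2 * 2^2.
total = graph_loss(edges, ratings)
```

A large rating gap is cheap across a weak edge and expensive across a strong one, which is how the "similar reviews, similar ratings" assumption enters the objective.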

19 Semi-Supervised Learning Graph
[Figure: graph of unlabeled and labeled review nodes]

20-22 Semi-Supervised Learning Graph
Each unlabeled node is tied to its predicted label $\hat{y}_i$ with weight 1
Each labeled node is tied to its given label $y_i$ with weight $M$
$$L(f) = \sum_{i \in U} (f_i - \hat{y}_i)^2 + \sum_{i \in L} M (f_i - y_i)^2$$

23 Semi-Supervised Learning Graph
Minimization is fairly trivial: $f_i = y_i$ or $\hat{y}_i$

24 Semi-Supervised Learning Graph
Each unlabeled node is connected to its $k$ nearest labeled neighbors with edge weight $a w_{ij}$
$$L(f) = \sum_{i \in U} (f_i - \hat{y}_i)^2 + \sum_{i \in L} M (f_i - y_i)^2 + \sum_{i \in U} \sum_{j \in kNN_L(i)} a w_{ij} (f_i - f_j)^2$$

25 Semi-Supervised Learning Graph
Each unlabeled node is also connected to its $k'$ nearest unlabeled neighbors with edge weight $b w_{ij}$
$$L(f) = \sum_{i \in U} (f_i - \hat{y}_i)^2 + \sum_{i \in L} M (f_i - y_i)^2 + \sum_{i \in U} \sum_{j \in kNN_L(i)} a w_{ij} (f_i - f_j)^2 + \sum_{i \in U} \sum_{j \in k'NN_U(i)} b w_{ij} (f_i - f_j)^2$$

26 Semi-Supervised Learning Graph
Minimization is now non-trivial
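A rough sketch of the full loss, assuming it has four terms: squared deviation from the predicted label for unlabeled nodes, $M$ times squared deviation from the given label for labeled nodes, and neighbor terms weighted $a w_{ij}$ (k nearest labeled neighbors) and $b w_{ij}$ (k' nearest unlabeled neighbors). All names, the neighbor lists, and the toy values are hypothetical.

```python
def full_loss(f, y_given, y_pred, knn_labeled, knn_unlabeled, w, M, a, b):
    """Four-term graph loss (illustrative argument names).

    f:             dict node -> current rating
    y_given:       dict labeled node -> given label
    y_pred:        dict unlabeled node -> initial predicted label
    knn_labeled:   dict unlabeled node -> its labeled neighbors
    knn_unlabeled: dict unlabeled node -> its unlabeled neighbors
    w:             dict (i, j) -> similarity weight
    """
    loss = sum((f[i] - y) ** 2 for i, y in y_pred.items())
    loss += sum(M * (f[i] - y) ** 2 for i, y in y_given.items())
    for i in y_pred:
        loss += sum(a * w[i, j] * (f[i] - f[j]) ** 2 for j in knn_labeled[i])
        loss += sum(b * w[i, j] * (f[i] - f[j]) ** 2 for j in knn_unlabeled[i])
    return loss


# Hypothetical toy problem: two labeled and two unlabeled reviews, k = k' = 1.
y_given = {"l1": 4.0, "l2": 1.0}
y_pred = {"u1": 3.0, "u2": 2.0}
knn_labeled = {"u1": ["l1"], "u2": ["l2"]}
knn_unlabeled = {"u1": ["u2"], "u2": ["u1"]}
w = {("u1", "l1"): 0.5, ("u2", "l2"): 0.5, ("u1", "u2"): 0.25, ("u2", "u1"): 0.25}

# The "trivial" choice f_i = given or predicted label zeroes the first two
# terms but still pays on the neighbor edges.
f_trivial = {**y_given, **y_pred}
loss_trivial = full_loss(f_trivial, y_given, y_pred, knn_labeled,
                         knn_unlabeled, w, M=10.0, a=2.0, b=1.0)
```

In this toy case the trivial assignment leaves a loss of 2.5 on the neighbor edges, which is why the minimization stops being trivial once those edges are added.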

27 Finding a Closed-Form Solution
$$L(f) = \sum_{i \in U} (f_i - \hat{y}_i)^2 + \sum_{i \in L} M (f_i - y_i)^2 + \sum_{i \in U} \sum_{j \in kNN_L(i)} a w_{ij} (f_i - f_j)^2 + \sum_{i \in U} \sum_{j \in k'NN_U(i)} b w_{ij} (f_i - f_j)^2$$

28 Finding a Closed-Form Solution
Yikes!
Can rewrite $L(f)$ in matrix notation as:
$$(f - y)^T C (f - y) + \alpha f^T \Delta f$$

29 Finding a Closed-Form Solution
Can rewrite $L(f)$ in matrix notation as:
$$(f - y)^T C (f - y) + \alpha f^T \Delta f$$
[Figure: the graph, with weight-1 edges to predicted labels $\hat{y}_i$, weight-$M$ edges to given labels $y_i$, and neighbor edges weighted $a w_{ij}$ and $b w_{ij}$]

30 Finding a Closed-Form Solution
$f$ = vector of f values for all reviews
$y$ = vector of given labels $y_i$ for labeled reviews and predicted labels $\hat{y}_i$ for unlabeled reviews
$C$ = diagonal matrix with entry $M$ for each labeled review and 1 for each unlabeled review; all other matrix entries are 0
$$C = \operatorname{diag}(\underbrace{M, \dots, M}_{\text{labeled}}, \underbrace{1, \dots, 1}_{\text{unlabeled}})$$

31 Finding a Closed-Form Solution
$\Delta$ = graph Laplacian matrix
$\alpha$ = constant parameter
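The sketch below illustrates the first matrix term, assuming $C$ is diagonal with $M$ for labeled reviews and 1 for unlabeled ones, and $y$ stacks given labels and initial predictions. The helper names and toy numbers are hypothetical; $C$ is stored as a list holding its diagonal.

```python
def build_c_diag(is_labeled, M):
    """Diagonal of C: M for labeled reviews, 1 for unlabeled ones."""
    return [M if labeled else 1.0 for labeled in is_labeled]


def anchoring_quadratic(f, y, c_diag):
    """(f - y)^T C (f - y) for a diagonal C stored as a list."""
    return sum(c * (fi - yi) ** 2 for fi, yi, c in zip(f, y, c_diag))


# Hypothetical toy data: two labeled reviews followed by two unlabeled ones.
is_labeled = [True, True, False, False]
M = 10.0
# y stacks given labels (labeled) and initial predictions (unlabeled).
y = [4.0, 1.0, 3.0, 2.0]
f = [3.5, 1.5, 3.0, 2.5]

c_diag = build_c_diag(is_labeled, M)
quadratic = anchoring_quadratic(f, y, c_diag)

# The same quantity written as the two separate sums over L and U.
labeled_term = sum(M * (f[i] - y[i]) ** 2 for i in range(4) if is_labeled[i])
unlabeled_term = sum((f[i] - y[i]) ** 2 for i in range(4) if not is_labeled[i])
```

Under these assumptions the quadratic form and the two explicit sums agree, so the matrix notation is only a compact restatement of the label-anchoring terms.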

32 Graph Laplacian Matrix
Assume n labeled and unlabeled documents
W = n x n weight matrix
D = n x n diagonal degree matrix, where $D_{ii} = \sum_{j=1}^{n} W_{ij}$
Graph Laplacian matrix is $\Delta = D - W$
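A small dependency-free sketch of these definitions, assuming a symmetric weight matrix. The 3 x 3 matrix and ratings are hypothetical. It also checks the standard identity $f^T \Delta f = \frac{1}{2} \sum_{i,j} W_{ij} (f_i - f_j)^2$, which is what lets the edge losses be written with the Laplacian.

```python
def degree_matrix(W):
    """Diagonal matrix D with D[i][i] = sum of row i of W."""
    n = len(W)
    return [[sum(W[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]


def graph_laplacian(W):
    """Graph Laplacian: degree matrix minus weight matrix."""
    D = degree_matrix(W)
    n = len(W)
    return [[D[i][j] - W[i][j] for j in range(n)] for i in range(n)]


def quadratic_form(A, f):
    """Compute f^T A f with plain lists."""
    n = len(f)
    return sum(f[i] * A[i][j] * f[j] for i in range(n) for j in range(n))


# Hypothetical symmetric weights: review 0 is similar to reviews 1 and 2.
W = [[0.0, 0.5, 0.25],
     [0.5, 0.0, 0.0],
     [0.25, 0.0, 0.0]]
f = [4.0, 3.0, 1.0]

laplacian = graph_laplacian(W)
smoothness = quadratic_form(laplacian, f)

# Each undirected edge appears twice in the double sum, hence the factor 0.5.
pairwise = 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2
                     for i in range(3) for j in range(3))
```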

33 Finding a Closed-Form Solution
$L(f)$ in matrix notation: $(f - y)^T C (f - y) + \alpha f^T \Delta f$
Set gradient to zero and solve for f: $f = (C + \alpha \Delta)^{-1} C y$
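A hedged sketch of the closed-form step, assuming the objective is $(f - y)^T C (f - y) + \alpha f^T \Delta f$, so that a zero gradient gives $(C + \alpha \Delta) f = C y$. The solver, helper names, and toy numbers are illustrative, and the example assumes any $a$ and $b$ scaling is already folded into the Laplacian. It solves the linear system by Gaussian elimination instead of forming an explicit inverse.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def closed_form(c_diag, laplacian, y, alpha):
    """Solve (C + alpha * Laplacian) f = C y for a diagonal C."""
    n = len(y)
    A = [[alpha * laplacian[i][j] + (c_diag[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [c_diag[i] * y[i] for i in range(n)]
    return solve(A, rhs)


# Hypothetical toy problem: review 0 is labeled (M = 10), reviews 1 and 2
# are unlabeled and linked to review 0 with weights 0.5 and 0.25.
c_diag = [10.0, 1.0, 1.0]
laplacian = [[0.75, -0.5, -0.25],
             [-0.5, 0.5, 0.0],
             [-0.25, 0.0, 0.25]]
y = [4.0, 3.0, 1.0]   # given label, then two initial predictions
alpha = 1.0

f = closed_form(c_diag, laplacian, y, alpha)

# Gradient of the objective, 2 C (f - y) + 2 alpha Laplacian f, at the solution.
gradient = [2 * c_diag[i] * (f[i] - y[i])
            + 2 * alpha * sum(laplacian[i][j] * f[j] for j in range(3))
            for i in range(3)]
```

In this toy case both unlabeled ratings move up toward the 4-star labeled neighbor, while the labeled rating moves only slightly because of its larger weight $M$.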

34 Experiments
Task: Predict 1 to 4 star ratings for reviews
4-author data used by [Pang+Lee05]
Predicted values $\hat{y}_i$ with SVMlight regression using {0,1} word vectors
Positive-sentence percentage (PSP) similarity [Pang+Lee05]
Tuned parameters with cross validation
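To illustrate what a {0,1} word vector looks like, here is a minimal sketch. The slides do not describe tokenization or vocabulary, so the regular-expression tokenizer, the three-word vocabulary, and the function name are assumptions made for this example only.

```python
import re


def binary_word_vector(text, vocabulary):
    """Return a {0,1} vector: 1 if the vocabulary word occurs in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return [1 if term in words else 0 for term in vocabulary]


# Hypothetical vocabulary; the review text is one of the slide's examples.
vocabulary = ["excellent", "bland", "weak"]
review = "excellent acting makes up for bland, lame scenery"
vector = binary_word_vector(review, vocabulary)
```

Each entry records only presence or absence of a word, not how many times it appears.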

39 Results
Graph-based SSL outperforms other methods for small labeled set sizes

40 Conclusions
Adapted graph-based semi-supervised learning to sentiment analysis domain
Designed a graph for rating inference
Showed benefit of SSL using movie review data
Thank you! Any questions?
