A Regularized Recommendation Algorithm with Probabilistic Sentiment-Ratings

Size: px

Start display at page:

Download "A Regularized Recommendation Algorithm with Probabilistic Sentiment-Ratings"

Doreen Merryl Knight
5 years ago
Views:

1 A Regularized Recommendation Algorithm with Probabilistic Sentiment-Ratings Filipa Peleja, Pedro Dias and João Magalhães Department of Computer Science Faculdade de Ciências e Tecnologia Universidade Nova de Lisboa

2 Users OBJECTIVE: HOW TO IMPROVE RECOMMENDATIONS WITH USER COMMENTS AND REVIEWS? Products i... M i... m j Recommendation algorithm... j N 3 n Rating: Review: Love it or hate it!... Rating: Review: This is a miserable film. 1

3 PROPOSED SOLUTION Sentiment analysis 1. Corpus representation 2. Orientation and intensity of words 3. Review classification SentiWordNet Semantic Orientation Unrated Reviews Ratings Rated Reviews Sentiment-based recommendation 1. User/movie biases 2. Matrix factorization 3. Recommendations inference User-movie recommendations 2

4 SENTIMENT ANALYSIS CHALLENGES Opinions are written in natural language which implies : - subjectivity; - sarcasm; - irony; - idiomatic expressions; misspelling; etc. The same opinion word may be used in a positive or negative context Negative, Conditional and Comparative expressions 3

5 OPINION WORD ORIENTATION AND INTENSITY Semantic Orientation 1 : hits( word," excellent ") hits(" poor ") SO( word) log 2( ) hits( word," poor ") hits(" excellent ") How positive or negative is an opinion word? SentiWordNet 2 (1) TURNEY, P. 2002, THUMBS UP OR THUMBS DOWN? SEMANTIC ORIENTATION APPLIED TO UNSUPERVISED CLASSIFICATION OF REVIEWS (2) ESULI, A. AND SEBASTIANI, F., 2006, SENTIWORDNET: A PUBLICY AVAILABLE LEXICAL RESOURCE FOR OPINION MINING 4

6 Example Love it or hate it. However, can someone tell me what on earth the last page... word family SO (Google) + SentiWordNet - SentiWordNet love n it nointerest na na na or nointerest na na na hate v it nointerest na na na however r someone N tell V me nointerest na na na what nointerest na na na on nointerest na na na earth n

7 MULTIPLE BERNOULLI CLASSIFICATION p( ra r re ) i i f r 10 L 1 ( ra, re ) f i L i ( ra, re ) i i A CLASSIFIER IS LEARNED FOR EVERY RATING VALUE THUS, FOR EACH RATING VALUE THERE IS A PREDICTION FOR EACH REVIEW THE PREDICTION IS NORMALIZED ACCORDINGLY TO THE PREDICTIONS OF ALL RATINGS RATING RANGE IMDB DATASET IS 1 TO 10 6

8 DATASET REVIEWS FROM IMDB A TOTAL OF 1,729,293 REVIEWS WERE COLLECTED Split #Reviews Description A 335,975 Only to train SA B 335,975 Test SA/Train RS C 417,147 Train RS (no explicit ratings) D 335,976 Train RS E 201,586 Test RS F 102,634 Validate RS 7

9 PERFORMANCE - SENTIMENT ANALYSIS Rating F-score on the IMDb corpus Precision Recall F-Score 8

10 PERFORMANCE - SENTIMENT ANALYSIS 9

11 INFERRED RATINGS IN RECOMMENDATION ALGORITHM Users Original ratings Media Computed by the sentiment analysis algorithm Review-enhanced ratings Review-based recommendation New recommendations

12 RATING MATRIX R ra r 11 1m r n1 r r nm HIGHLY INCOMPLETE SINCE MOST ELEMENTS ARE EMPTY PREDICT AN UNKNOWN RATING: rˆ p. q ui u i USERS AND PRODUCTS REPRESENTED IN THE SAME LATENT FACTOR SPACE WITH A SVD DECOMPOSITION THE RATING MATRIX R ra u u p p 11 1m 11 1m. P. Q u u p p n1 nm n1 nm T T MATRIX FACTORIZATION ENABLES THE ASSESSMENT OF USERS PREFERENCES REGARDING THE PRODUCTS BY CALCULATING THEIR FACTOR REPRESENTATIONS 11

13 RATINGS MATRIX Rra FACTORIZATION WITH BIASES GOAL: MINIMIZE THE PREDICTION ERROR pu, qi r R ui ˆui u i u i [ P, Q] argmin ( r r ) ( p q b b ) ui ra 12

14 RATINGS MATRIX Rra FACTORIZATION WITH BIASES ui ˆui ˆui ˆui u i u i pu, qi r R r Rˆ [ P, Q] argmin ( r r ) ( c r ) ( p q b b ) ui ra ui rev RATINGS INFERRED FROM THE SENTIMENT ANALYSIS FRAMEWORK ARE GIVEN TO THE RS THE REVIEWS ACTUAL RATING ARE KNOWN 13

15 RATINGS MATRIX Rra FACTORIZATION WITH SENTIMENT-BASED REGULARIZATION R R ra, R rev ENRICH THE MATRIX R WITH RATINGS INFERRED FROM REVIEWS WITH KNOWN AND UNKNOWN EXPLICIT RATINGS RREV Rˆ r rˆ cˆ rˆ 2 2 ra arg min ( ui ui) ui ( ui ui) rˆ ui r ˆ ui Rra cui Rrev ( pu qi bu bi ), The confidence level is given by de Sentiment Analysis framework 14

16 RMSE RECOMMENDATIONS: IMDB DATASET Lower RMSE error DB Baseline (671,951 reviews) DB+Ci Sentiment-ratings DB+Ci Probabilistic sentiment-ratings Only Explicit Ratings Inferred Ratings Probabilistic Inferred Ratings 15

17 RMSE RECOMMENDATIONS: IMDB DATASET Sentiment-ratings Probabilistic sentiment-ratings D Baseline (335,976 reviews) D+Ci D+Bi D+BCi Reviews with Unknown ratings 16

18 SUMMARY Achievements: Extraction and sentiment analysis of users reviews Introduced sentiment-based ratings in a recommendation algorithm Next step: alternatives to SentiWordNet semantic orientation metric improve algorithm with opinion targets information 17

19 Thank you for your attention Questions? 18

Matrix Factorization Techniques for Recommender Systems

Matrix Factorization Techniques for Recommender Systems By Yehuda Koren Robert Bell Chris Volinsky Presented by Peng Xu Supervised by Prof. Michel Desmarais 1 Contents 1. Introduction 4. A Basic Matrix