6.034 Introduction to Artificial Intelligence

1 6.034 Introduction to Artificial Intelligence. Tommi Jaakkola, MIT CSAIL

2 The world is drowning in data...

3 The world is drowning in data... Access to information is based on recommendations

4 Recommending news feeds. Lots of venues (and articles)... It is challenging to find the few articles that you are actually interested in reading

5 Recommending news feeds. Training examples and corresponding ratings: news articles $x_1, x_2, x_3, x_4, \ldots$ with ratings $y_1, y_2, y_3, y_4, \ldots$

6 Recommending news feeds. Training examples and corresponding ratings: news articles $x_1, x_2, x_3, x_4, \ldots$, feature vectors $\phi(x_1), \phi(x_2), \phi(x_3), \phi(x_4), \ldots$, and ratings $y_1, y_2, y_3, y_4, \ldots$

8 Articles as feature vectors. Does the word order matter? Document $x$: "White House officials consulted with the Justice Department in preparing a list of U.S. attorneys who would be removed." (NYT 3/13/07)

9 Articles as feature vectors. Does the word order matter? Document $x$: "White House officials consulted with the Justice Department in preparing a list of U.S. attorneys who would be removed." (NYT 3/13/07). Bag of words: the, with, officials, House, be, removed, who, would, list, U.S., in, Department, a, Justice, of, attorneys, White, consulted, preparing

10 Does the word order matter? Not for every task... (Wolf et al. 2006)

11 Articles as feature vectors. Document $x$: "White House officials consulted with the Justice Department in preparing a list of U.S. attorneys who would be removed." (NYT 3/13/07). Bag of words: the, with, officials, House, be, removed, who, would, list, U.S., in, Department, a, Justice, of, attorneys, White, consulted, preparing

13 Articles as feature vectors. Document $x$: "White House officials consulted with the Justice Department in preparing a list of U.S. attorneys who would be removed." (NYT 3/13/07). Bag of words: the, with, House, officials, be, removed, who, would, list, U.S., in, Department, a, Justice, of, attorneys, White, preparing, consulted. Counts over the words (politics, Justice, government, president, House): $\phi(x) = [0, 1, 0, 0, 1]^T$
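The bag-of-words mapping on this slide can be sketched in a few lines. This is only an illustration: the function name `bag_of_words`, the letters-only tokenizer, and the case-sensitive matching are assumptions, not something the slides specify. The vocabulary is the five words shown on the slide.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in text, ignoring word order."""
    # Assumption: a token is any run of letters; matching is case-sensitive.
    tokens = re.findall(r"[A-Za-z]+", text)
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

sentence = ("White House officials consulted with the Justice Department in "
            "preparing a list of U.S. attorneys who would be removed.")
vocabulary = ["politics", "Justice", "government", "president", "House"]

phi = bag_of_words(sentence, vocabulary)

# Reversing the word order leaves the feature vector unchanged.
shuffled = " ".join(reversed(sentence.split()))
phi_shuffled = bag_of_words(shuffled, vocabulary)
```

Because only counts are kept, any permutation of the words maps to the same vector, which is the sense in which word order "does not matter" for this representation.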

14 Recommending news feeds. A few examples of articles that we'd like to read (+1). Potentially a large number of unwanted articles (-1). Feature vector over the words (politics, Justice, government, president, House): $\phi(x) = [0, 1, 0, 0, 1]^T$

15 Recommending news feeds. A few examples of articles that we'd like to read (+1). Potentially a large number of unwanted articles (-1). Linear preferences: $y(x) = \theta \cdot \phi(x) + b$, with $\phi(x) = [0, 1, 0, 0, 1]^T$ over the words (politics, Justice, government, president, House)

16 Recommending news feeds. Why is the problem challenging?
 - lots of possible words
 - only a small subset appears in any particular article
 - most frequent words are not content words
 - meaningful classes of articles are typically tied to words that occur relatively infrequently
 - any two articles in the same meaningful class may have only a few content words in common

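17 Some tricks. We can transform the counts in the feature vectors so as to emphasize more relevant words. TFIDF weighting: $\phi_w(x) = \underbrace{(\text{freq. of word } w \text{ in doc. } x)}_{\text{TF}} \cdot \underbrace{\log \frac{\#\text{ of docs}}{\#\text{ of docs with word } w}}_{\text{IDF}}$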
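A minimal sketch of the TFIDF weighting, under a few assumptions that the slide does not state: TF is taken to be the raw count of the word in the document, documents are lowercased and split on whitespace, and the natural logarithm is used. The function name `tfidf` and the three toy documents are made up for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {word: weight} dict per document.

    weight = (count of word in doc) * log(#docs / #docs containing the word)
    """
    n_docs = len(docs)
    counts = [Counter(doc.lower().split()) for doc in docs]
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    return [
        {word: tf * math.log(n_docs / doc_freq[word]) for word, tf in c.items()}
        for c in counts
    ]

# Hypothetical toy corpus.
docs = [
    "the justice department and the white house",
    "the senate and the house vote",
    "the weather and the game",
]
weights = tfidf(docs)
```

Words that occur in every document (here "the" and "and") get weight zero, while a word that occurs in a single document (such as "justice") gets the largest IDF factor, which matches the slide's goal of emphasizing infrequent content words.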

18 Recommending news feeds. Linear preferences: $y(x) = \theta \cdot \phi(x) + b$, with $\phi(x) = [0, 1, 0, 0, 1]^T$ over the words (politics, Justice, government, president, House)

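19 Recommending news feeds. Linear preferences: $y(x) = \theta \cdot \phi(x) + b$, with $\phi(x) = [0, 1, 0, 0, 1]^T$ over the words (politics, Justice, government, president, House). Training objective: $J(\theta, b) = \sum_{t=1}^{n} (y_t - \theta \cdot \phi(x_t) - b)^2$, i.e., a sum over the training examples of the squared prediction error on each example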

20 Linear regression, complexity. We can easily obtain (too) complex regression functions by considering different feature mappings. [Figure: four panels of $y$ versus $x$ showing fits with linear, 3rd order, 5th order, and 7th order polynomial features]

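22 Recommending news feeds. Linear preferences: $y(x) = \theta \cdot \phi(x) + b$, with $\phi(x) = [0, 1, 0, 0, 1]^T$ over the words (politics, Justice, government, president, House). Training objective: $J(\theta, b) = \sum_{t=1}^{n} (y_t - \theta \cdot \phi(x_t) - b)^2 + \lambda \|\theta\|^2$, i.e., a sum over the training examples of the squared prediction error on each example, plus a regularization term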
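One way to minimize a regularized squared-error objective of this form is in closed form. The sketch below assumes the bias $b$ is not penalized, which lets us center the features and ratings, solve a ridge system for $\theta$, and then recover $b$. The names `fit_ridge` and `objective`, the synthetic count data, and the labeling rule are all hypothetical.

```python
import numpy as np

def fit_ridge(Phi, y, lam):
    """Minimize sum_t (y_t - theta . phi_t - b)^2 + lam * ||theta||^2."""
    phi_mean = Phi.mean(axis=0)
    y_mean = y.mean()
    # Centering removes the unpenalized bias from the problem.
    Phi_c = Phi - phi_mean
    y_c = y - y_mean
    d = Phi.shape[1]
    theta = np.linalg.solve(Phi_c.T @ Phi_c + lam * np.eye(d), Phi_c.T @ y_c)
    b = y_mean - theta @ phi_mean
    return theta, b

def objective(Phi, y, theta, b, lam):
    """Squared prediction error over all examples plus the regularization term."""
    residual = y - Phi @ theta - b
    return residual @ residual + lam * (theta @ theta)

# Hypothetical data: 20 articles, 5 word counts each, ratings in {+1, -1}.
rng = np.random.default_rng(0)
Phi = rng.poisson(1.0, size=(20, 5)).astype(float)
y = np.where(Phi[:, 1] + Phi[:, 4] > 1, 1.0, -1.0)

lam = 1.0
theta, b = fit_ridge(Phi, y, lam)
J_opt = objective(Phi, y, theta, b, lam)
```

Larger values of `lam` shrink `theta` toward zero, trading training error for a simpler preference function.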

24 Today's topics
 - Preface: regression for recommendation problems
 - Collaborative filtering
   - setup, regression formulation
   - matrix factorization

25 Collaborative filtering. Consider the problem of predicting how $n$ users rate $m$ movies. Known ratings (training data) are arranged in a partially filled $n \times m$ data matrix (rows: users, columns: movies). The goal is to predict the remaining entries.

26 Collaborative filtering. Consider the problem of predicting how $n$ users rate $m$ movies. Known ratings (training data) are arranged in a partially filled $n \times m$ data matrix (rows: users, columns: movies). The goal is to predict the remaining entries. Basic intuition: similar users can complete each other's experience.

28 Collaborative filtering. Consider the problem of predicting how $n$ users rate $m$ movies. Known ratings (training data) are arranged in a partially filled $n \times m$ data matrix (rows: users, columns: movies). The goal is to predict the remaining entries. Basic intuition: similar users can complete each other's experience. A key part of the problem is to couple the estimation tasks across users / movies.

29 Collaborative filtering. Our goal is to fill the data matrix ($n$ users by $m$ movies), i.e., accurately predict values for unobserved entries.
Computational issues:
 - a typical matrix is very large, e.g., $n$ = 400k, $m$ = 17k
Statistical issues:
 - the matrix is very sparse, e.g., 1% known ratings
 - ratings may be diverse and under-sampled (?)
Formulation issues:
 - many interpretations for missing entries

30 Single user predictions. We could try to solve the problem separately for each user using simple linear regression models for ratings: $J_i(\theta_i) = \sum_{j \in M_i} (Y_{ij} - \theta_i \cdot \phi_j)^2 + \lambda \|\theta_i\|^2$, where $M_i$ is the set of known entries for user $i$ (row $i$ of the rating matrix $Y$), $\theta_i$ are the parameters of user $i$, and $\phi_j$ is the feature vector for movie $j$

31 Single user predictions. We could try to solve the problem separately for each user using simple linear regression models for ratings: $J_i(\theta_i) = \sum_{j \in M_i} (Y_{ij} - \theta_i \cdot \phi_j)^2 + \lambda \|\theta_i\|^2$, where $M_i$ is the set of known entries for user $i$ (row $i$ of the rating matrix $Y$), $\theta_i$ are the parameters of user $i$, and $\phi_j$ is the feature vector for movie $j$. But
 - reasonable feature vectors may be hard to obtain
 - each user may have only a few ratings
 - no help from similar users
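The per-user regression can be sketched as a small ridge problem restricted to the movies each user has rated. Everything concrete here is an assumption for illustration: the helper name `fit_user`, the two hand-made movie features, the toy ratings, and the boolean `observed` mask marking known entries.

```python
import numpy as np

def fit_user(Y_row, observed_row, movie_features, lam):
    """Ridge regression for one user, using only the movies that user has rated."""
    Phi = movie_features[observed_row]   # features of the rated movies
    y = Y_row[observed_row]              # the corresponding known ratings
    d = movie_features.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

# Hypothetical toy data: 3 users, 4 movies, 2 features per movie.
movie_features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
Y = np.array([[5.0, 0.0, 1.0, 0.0],
              [0.0, 4.0, 0.0, 2.0],
              [1.0, 0.0, 0.0, 5.0]])
# True where a rating is known; the zeros in Y at False positions are placeholders.
observed = np.array([[True, False, True, False],
                     [False, True, False, True],
                     [True, False, False, True]])

lam = 0.1
thetas = np.array([fit_user(Y[i], observed[i], movie_features, lam)
                   for i in range(Y.shape[0])])
predictions = thetas @ movie_features.T  # predicted rating for every (user, movie)
```

Each user is fit in isolation here, so nothing learned from one user helps another, which is the third drawback listed on the slide.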

32 Matrix factorization. We can approximate the rating matrix as a product of two lower rank matrices: $Y_{ij} \approx [UV^T]_{ij}$

33 Matrix factorization. We can approximate the rating matrix as a product of two lower rank matrices, $Y_{ij} \approx [UV^T]_{ij}$: $\min_{U,V} \sum_{(i,j) \in D} (Y_{ij} - [UV^T]_{ij})^2 + \lambda \|U\|_F^2 + \lambda \|V\|_F^2$, where $D$ is the set of observed entries

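34 Matrix factorization. We can approximate the rating matrix as a product of two lower rank matrices, $Y_{ij} \approx [UV^T]_{ij}$: $\min_{U,V} \sum_{(i,j) \in D} (Y_{ij} - [UV^T]_{ij})^2 + \lambda \|U\|_F^2 + \lambda \|V\|_F^2$, where $D$ is the set of observed entries. Note on the regularization terms: without them, the only complexity control would be the rank $d$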

36 Matrix factorization. The matrix factorization approach can be interpreted as iteratively solving regression problems for users/movies

38 Matrix factorization. The matrix factorization approach can be interpreted as iteratively solving regression problems for users/movies. Regression problem for each user with fixed movie features: $J_i(\theta_i) = \sum_{j : (i,j) \in D} (Y_{ij} - \theta_i \cdot \phi_j)^2 + \lambda \|\theta_i\|^2$

39 Matrix factorization. The matrix factorization approach can be interpreted as iteratively solving regression problems for users/movies. Regression problem for each movie with fixed user features: $J_j(\phi_j) = \sum_{i : (i,j) \in D} (Y_{ij} - \theta_i \cdot \phi_j)^2 + \lambda \|\phi_j\|^2$

40 Matrix factorization cont'd. We can approximate the rating matrix as a product of two lower rank matrices, $Y_{ij} \approx [UV^T]_{ij}$: $\min_{U,V} \sum_{(i,j) \in D} (Y_{ij} - [UV^T]_{ij})^2 + \lambda \|U\|_F^2 + \lambda \|V\|_F^2$, where $D$ is the set of observed entries

41 CF and the Netflix Prize. Progress using different matrix factorization methods (Koren et al., 2009). [Figure: RMSE versus millions of parameters for five models: plain, with biases, with implicit feedback, with temporal dynamics (v.1), with temporal dynamics (v.2)] (To win the prize, one had to combine hundreds of different methods.)

42 Matrix factorization. We try to find the best rank $d$ approximation to the rating matrix based on the observed entries:
minimize $\sum_{(i,j) \in D} (Y_{ij} - [UV^T]_{ij})^2 + \lambda \|U\|_F^2 + \lambda \|V\|_F^2$, where $U$ is $n \times d$ and $V$ is $m \times d$
 - rank $d$ can be used for complexity control along with the regularization parameter $\lambda$
 - the optimization problem is not jointly convex in $U$ and $V$; however, it is convex in $U$ if we fix $V$, and vice versa
 - an alternating minimization algorithm, i.e., iteratively solving user / movie regression problems, may get stuck in a locally optimal solution (initialization is important)
 - algorithms that sequentially add simple rank-1 components, one at a time, are typically better
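The alternating minimization mentioned above can be sketched as follows. This is a hedged illustration, not the lecture's implementation: the function names `als` and `objective`, the small random initialization, the fixed iteration count, and the synthetic rank-2 data are all assumptions. Rows of `U` and `V` play the roles of the per-user and per-movie parameter vectors.

```python
import numpy as np

def objective(Y, observed, U, V, lam):
    """Squared error on observed entries plus Frobenius-norm regularization."""
    err = (Y - U @ V.T)[observed]
    return err @ err + lam * (U ** 2).sum() + lam * (V ** 2).sum()

def als(Y, observed, d, lam, n_iters=20, seed=0):
    """Alternate between exact ridge solves for user rows and movie rows."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    # Assumption: small random initialization; the result can depend on it.
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    reg = lam * np.eye(d)
    history = []
    for _ in range(n_iters):
        for i in range(n):               # user regressions, movie features fixed
            cols = observed[i]
            Vi = V[cols]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ Y[i, cols])
        for j in range(m):               # movie regressions, user features fixed
            rows = observed[:, j]
            Uj = U[rows]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ Y[rows, j])
        history.append(objective(Y, observed, U, V, lam))
    return U, V, history

# Hypothetical data: a rank-2 rating matrix with roughly half the entries observed.
rng = np.random.default_rng(1)
Y = rng.standard_normal((30, 2)) @ rng.standard_normal((20, 2)).T
observed = rng.random(Y.shape) < 0.5

U, V, history = als(Y, observed, d=2, lam=0.1)
```

Each inner solve is an exact minimizer of the objective over one row with everything else held fixed, so the recorded objective values never increase. That guarantees convergence of the objective but not a global optimum, consistent with the slide's remark about local solutions and initialization.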
