Social/Collaborative Filtering


Corey Harrington
5 years ago

1 Social/Collaborative Filtering

2 Outline: Recap of SVD vs PCA; collaborative filtering, aka social recommendation; k-NN CF methods; classification; CF via MF; MF vs SGD vs. ...

3 Dimensionality Reduction and Principal Components Analysis: Recap

4 More cartoons

5 PCA as matrices: the optimization problem

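6 PCA as matrices: V ~ (mixing weights) × (prototypes). V is the images × pixels matrix (10,000 pixels per image), with V[i,j] = pixel j in image i. Each image i gets 2 mixing weights (x_i, y_i), one per principal component. The prototypes are PC1 = (a1, a2, ..., am) and PC2 = (b1, b2, ..., bm).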

7 Poll True or false: the weights for an example should add up to 1. True or false: the weights for a prototype should add up to 1

8 Implementing PCA. Also: they are orthogonal to each other!

9 PCA FOR MODELING TEXT (SVD = SINGULAR VALUE DECOMPOSITION)

10 A Scalability Problem with PCA. The covariance matrix is large in high dimensions: with d features, the covariance matrix is d*d. SVD is a closely related method that can be implemented more efficiently in high dimensions. Don't explicitly compute the covariance matrix; instead write the doc-term matrix X as X = U S V^T. S is k*k where k << d. S is diagonal and S[i,i] = sqrt(λ_i), where λ_i is the eigenvalue of the i-th eigenvector. Columns of V ~= principal components. Rows of US ~= embeddings for examples.
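A minimal sketch of the idea on this slide, assuming NumPy and a small random matrix standing in for the doc-term matrix (the function name `pca_via_svd` and the toy data are illustrative, not from the talk). It takes the SVD of the mean-centered data directly, never forms the d*d covariance matrix, and reads the principal components and embeddings off the factors.

```python
import numpy as np

def pca_via_svd(X, k):
    """Top-k PCA of X computed from an SVD, without forming X^T X."""
    Xc = X - X.mean(axis=0)                      # mean-center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # rows = principal components (columns of V)
    embedding = U[:, :k] * s[:k]                 # rows of US = embedding of each example
    eigenvalues = s[:k] ** 2                     # S[i,i] = sqrt(lambda_i), so lambda_i = s_i^2
    return embedding, components, eigenvalues

# Toy stand-in for a doc-term matrix: 50 "documents", 6 "terms".
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
Z, V, lam = pca_via_svd(X, k=2)
```

Here `lam` should agree with the two largest eigenvalues of the (unnormalized) covariance matrix X^T X of the centered data, which is the relationship S[i,i] = sqrt(λ_i) stated above.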

11 Recovering latent factors in a matrix: doc-term matrix (n documents × m terms) ~ U S V^T, where U has rows (x_i, y_i), S = diag(s1, s2), and V^T has rows (a1, a2, ..., am) and (b1, b2, ..., bm). Docterm[i,j] = TFIDF score of term j in doc i.

12 SVD example

13 The Neatest Little Guide to Stock Market Investing; Investing For Dummies, 4th Edition; The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns; The Little Book of Value Investing; Value Investing: From Graham to Buffett and Beyond; Rich Dad's Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!; Investing in Real Estate, 5th Edition; Stock Investing For Dummies; Rich Dad's Advisors: The ABC's of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

15 =

16 Investing for real estate; Rich Dad's Advisors: The ABCs of Real Estate Investment

17 The Little Book of Common Sense Investing; Neatest Little Guide to Stock Market Investing

18 My recap: SVD vs PCA. Very closely related methods. As described here: the SVD decomposition doesn't require a square matrix; the PCA decomposition is always applied to C_X, which is square and computed from mean-centered data. You can implement PCA using SVD as a substep. People sometimes use the terms interchangeably.

19 Outline: What is CF? Nearest-neighbor methods for CF. One old-school paper: BellCore's movie recommender. Some general discussion. CF reduced to classification. CF reduced to matrix factoring. Other uses of matrix factoring in ML.

20 WHAT IS COLLABORATIVE FILTERING? AKA social filtering, recommendation systems, ...

21 What is collaborative filtering?

27 Other examples of social filtering.

31 Everyday Examples of Collaborative Filtering... Bestseller lists; Top 40 music lists; the "recent returns" shelf at the library; unmarked but well-used paths thru the woods; the printer room at work; "Read any good books lately?"... Common insight: personal tastes are correlated. If Alice and Bob both like X and Alice likes Y, then Bob is more likely to like Y, especially (perhaps) if Bob knows Alice.

32 SOCIAL/COLLABORATIVE FILTERING: NEAREST-NEIGHBOR METHODS

33 BellCore's MovieRecommender. "Recommending And Evaluating Choices In A Virtual Community Of Use." Will Hill, Larry Stead, Mark Rosenstein and George Furnas, Bellcore; CHI 1995. By virtual community we mean "a group of people who share characteristics and interact in essence or effect only". In other words, people in a Virtual Community influence each other as though they interacted but they do not interact. Thus we ask: "Is it possible to arrange for people to share some of the personalized informational benefits of community involvement without the associated communications costs?"

34 MovieRecommender Goals. Recommendations should: (1) simultaneously ease and encourage rather than replace social processes... should make it easy to participate while leaving in hooks for people to pursue more personal relationships if they wish; (2) be for sets of people, not just individuals... multi-person recommending is often important, for example, when two or more people want to choose a video to watch together; (3) be from people, not a black box machine or so-called "agent"; (4) tell how much confidence to place in them, in other words they should include indications of how accurate they are.

35 BellCore's MovieRecommender. Participants sent email to videos@bellcore.com. The system replied with a list of 500 movies to rate on a 1-10 scale (250 random, 250 popular); only a subset needs to be rated. A new participant P sends in rated movies via email. The system compares the ratings for P to the ratings of (a random sample of) previous users. The most similar users are used to predict scores for unrated movies (more later). The system returns recommendations in an email message.

36 Suggested Videos for: John A. Jamus. Your must-see list with predicted ratings: 7.0 "Alien (1979)"; 6.5 "Blade Runner"; 6.2 "Close Encounters Of The Third Kind (1977)". Your video categories with average ratings: 6.7 "Action/Adventure"; 6.5 "Science Fiction/Fantasy"; 6.3 "Children/Family"; 6.0 "Mystery/Suspense"; 5.9 "Comedy"; 5.8 "Drama".

37 The viewing patterns of 243 viewers were consulted. Patterns of 7 viewers were found to be most similar. Correlation with target viewer: 0.59 viewer-130 (unlisted@merl.com); 0.55 bullert,jane r (bullert@cc.bellcore.com); 0.51 jan_arst (jan_arst@khdld.decnet.philips.nl); 0.46 Ken Cross (moose@denali.ee.cornell.edu); 0.42 rskt (rskt@cc.bellcore.com); 0.41 kkgg (kkgg@athena.mit.edu); 0.41 bnn (bnn@cc.bellcore.com). By category, their joint ratings recommend: Action/Adventure: "Excalibur" 8.0, 4 viewers; "Apocalypse Now" 7.2, 4 viewers; "Platoon" 8.3, 3 viewers. Science Fiction/Fantasy: "Total Recall" 7.2, 5 viewers. Children/Family: "Wizard Of Oz, The" 8.5, 4 viewers; "Mary Poppins" 7.7, 3 viewers. Mystery/Suspense: "Silence Of The Lambs, The" 9.3, 3 viewers. Comedy: "National Lampoon's Animal House" 7.5, 4 viewers; "Driving Miss Daisy" 7.5, 4 viewers; "Hannah and Her Sisters" 8.0, 3 viewers. Drama: "It's A Wonderful Life" 8.0, 5 viewers; "Dead Poets Society" 7.0, 5 viewers; "Rain Man" 7.5, 4 viewers. Correlation of predicted ratings with your actual ratings is: 0.64. This number measures ability to evaluate movies accurately for you. ... means low ability; ... means very good ability; ... means fair ability.

38 BellCore's MovieRecommender. Evaluation: withhold 10% of the ratings of each user to use as a test set. Measure the correlation between predicted ratings and actual ratings for test-set movie/user pairs.

40 BellCore's MovieRecommender. Participants sent email to videos@bellcore.com. The system replied with a list of 500 movies to rate. A new participant P sends in rated movies via email. The system compares the ratings for P to the ratings of (a random sample of) previous users. The most similar users are used to predict scores for unrated movies: see "Empirical Analysis of Predictive Algorithms for Collaborative Filtering", Breese, Heckerman, Kadie, UAI98. The system returns recommendations in an email message.

41 recap: k-nearest neighbor learning. Given a test example x: 1. Find the k training-set examples (x1,y1),...,(xk,yk) that are closest to x. 2. Predict the most frequent label in that set.

42 Breaking it down: To train: save the data. Very fast! To test, for each test example x: 1. Find the k training-set examples (x1,y1),...,(xk,yk) that are closest to x. 2. Predict the most frequent label in that set. Prediction is relatively slow... but it doesn't depend on the number of classes, only the number of neighbors.
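The two steps above can be sketched in a few lines of plain Python. This is only an illustration, assuming Euclidean distance and a tiny hand-made training set; the name `knn_predict` and the data are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train, x, k=3):
    """train: list of (vector, label) pairs; x: query vector.

    "Training" is just keeping `train` around; all the work happens here.
    """
    # Step 1: find the k training examples closest to x.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], x))
    labels = [label for _, label in by_distance[:k]]
    # Step 2: predict the most frequent label among those neighbors.
    return Counter(labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
pred = knn_predict(train, (0.15, 0.15), k=3)
```

The cost of a prediction grows with the size of the training set (every stored example is compared to x), but not with the number of distinct labels.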

43 recap: k-nearest neighbor learning. Given a test example x: 1. Find the k training-set examples (x1,y1),...,(xk,yk) that are closest to x. 2. Predict the most frequent label in that set.

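44 Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98). v_{i,j} = vote of user i on item j. I_i = items for which user i has voted. Mean vote for i is $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$. Predicted vote for active user a on item j is a weighted sum: $p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$, where κ is a normalizer, the sum runs over the n similar users, and the weight w(a,i) is based on the similarity between users a and i.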

45 Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98). Choices for the weight: K-nearest neighbor: w(a,i) = 1 if i ∈ neighbors(a), else 0. Pearson correlation coefficient (Resnick 94, GroupLens). Cosine distance, etc.
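A rough sketch of the memory-based scheme, assuming votes are stored as per-user dictionaries and using the Pearson correlation as the weight. The helper names and the toy votes are hypothetical, and the normalizer is taken to be one over the sum of absolute weights, which is one common choice rather than the only one.

```python
import math

def mean_vote(votes):
    """Mean of all votes a user has cast (votes: dict item -> rating)."""
    return sum(votes.values()) / len(votes)

def pearson(va, vi):
    """Pearson-style weight w(a,i), summed over items both users voted on."""
    common = set(va) & set(vi)
    if not common:
        return 0.0
    ma, mi = mean_vote(va), mean_vote(vi)
    num = sum((va[j] - ma) * (vi[j] - mi) for j in common)
    den = math.sqrt(sum((va[j] - ma) ** 2 for j in common) *
                    sum((vi[j] - mi) ** 2 for j in common))
    return num / den if den else 0.0

def predict(active, others, item):
    """Active user's mean vote plus a normalized weighted sum of deviations."""
    pairs = [(pearson(active, v), v) for v in others if item in v]
    norm = sum(abs(w) for w, _ in pairs)          # 1/kappa
    if norm == 0:
        return mean_vote(active)                  # no usable neighbors
    dev = sum(w * (v[item] - mean_vote(v)) for w, v in pairs)
    return mean_vote(active) + dev / norm

# Hypothetical toy votes: one user agrees with the active user, one disagrees.
active = {"A": 5, "B": 1}
others = [{"A": 5, "B": 1, "C": 5}, {"A": 1, "B": 5, "C": 1}]
pred = predict(active, others, "C")
```

In this toy case the like-minded user rated C above their own mean and the opposite-minded user rated it below theirs, so both push the prediction above the active user's mean vote of 3.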

46 SOCIAL/COLLABORATIVE FILTERING: TRADITIONAL CLASSIFICATION

47 What are other ways to formulate the collaborative filtering problem? Treat it like ordinary classification or regression.

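48 Collaborative + Content Filtering (Basu et al, AAAI98; Condliff et al, AI-STATS99). Table of users (rows) by movies (columns). Users: Joe (27, M, $70k); Carol (53, F, $20k); ...; Kumar (25, M, $22k); U_a (48, M, $81k). Movies: Airplane (comedy, $2M); Matrix (action, $70M); Room with a View (romance, $25M); ...; Hidalgo (action, $30M). Entries for the active user U_a are unknown (???).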

49 Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98). Classification task: map a (user, movie) pair into {likes, dislikes}. Training data: known likes/dislikes. Test data: active users. Features: any properties of the user/movie pair. [Same user-by-movie table as slide 48, with 0/1 entries for known dislikes/likes and ??? for U_a.]

50 Collaborative + Content Filtering As Classification (Basu et al, AAAI98). Features: any properties of the user/movie pair (U,M). Examples: genre(U,M), age(U,M), income(U,M), ...; genre(Carol, Matrix) = action; income(Kumar, Hidalgo) = 22k/year. [Same user-by-movie table as slide 48.]

51 Collaborative + Content Filtering As Classification (Basu et al, AAAI98). Features: any properties of the user/movie pair (U,M). Examples: usersWhoLikedMovie(U,M): usersWhoLikedMovie(Carol, Hidalgo) = {Joe, ..., Kumar}; usersWhoLikedMovie(U_a, Matrix) = {Joe, ...}. [Same user-by-movie table as slide 48.]

52 Collaborative + Content Filtering As Classification (Basu et al, AAAI98). Features: any properties of the user/movie pair (U,M). Examples: moviesLikedByUser(M,U): moviesLikedByUser(*, Joe) = {Airplane, Matrix, ..., Hidalgo}; actionMoviesLikedByUser(*, Joe) = {Matrix, Hidalgo}. [Same user-by-movie table as slide 48.]

53 Collaborative + Content Filtering As Classification (Basu et al, AAAI98). Features: any properties of the user/movie pair (U,M). Example feature vector for (U_a, Room with a View): genre={romance}, age=48, sex=male, income=81k, usersWhoLikedMovie={Carol}, moviesLikedByUser={Matrix, Airplane}, ... [Same user-by-movie table as slide 48.]

54 Collaborative + Content Filtering As Classification (Basu et al, AAAI98). Example feature vector for (U_a, Room with a View): genre={romance}, age=48, sex=male, income=81k, usersWhoLikedMovie={Carol}, moviesLikedByUser={Matrix, Airplane}, ... Example feature vector for (U_a, Hidalgo): genre={action}, age=48, sex=male, income=81k, usersWhoLikedMovie={Joe, Kumar}, moviesLikedByUser={Matrix, Airplane}, ... [Same user-by-movie table as slide 48.]
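To make the feature construction concrete, here is a small sketch that builds such a feature dictionary for a (user, movie) pair. The data structures, the function name `features`, and the exact set of known likes are assumptions chosen to be consistent with the examples on these slides; they are not the representation used in the paper.

```python
# Toy data consistent with the slide examples (incomes in $k/year).
users = {
    "joe":   {"age": 27, "sex": "M", "income": 70},
    "carol": {"age": 53, "sex": "F", "income": 20},
    "kumar": {"age": 25, "sex": "M", "income": 22},
    "ua":    {"age": 48, "sex": "M", "income": 81},
}
movies = {"airplane": "comedy", "matrix": "action",
          "room_with_a_view": "romance", "hidalgo": "action"}
# Known (user, movie) "likes"; an assumed subset for illustration.
likes = {("joe", "airplane"), ("joe", "matrix"), ("joe", "hidalgo"),
         ("kumar", "hidalgo"), ("carol", "room_with_a_view"),
         ("ua", "matrix"), ("ua", "airplane")}

def features(user, movie):
    """Content features plus set-valued collaborative features for one pair."""
    f = dict(users[user])                         # age, sex, income of the user
    f["genre"] = movies[movie]                    # content feature of the movie
    f["userswholikedmovie"] = {u for (u, m) in likes if m == movie and u != user}
    f["movieslikedbyuser"] = {m for (u, m) in likes if u == user and m != movie}
    return f

example = features("ua", "room_with_a_view")
```

Each (user, movie) pair with a known like/dislike becomes one training example with these features, and a set-valued rule learner can then test membership such as "joe in userswholikedmovie".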

55 Collaborative + Content Filtering As Classification (Basu et al, AAAI98). Classification learning algorithm: rule learning (RIPPER). Example rules: If NakedGun33/13 ∈ moviesLikedByUser and Joe ∈ usersWhoLikedMovie and genre=comedy then predict likes(U,M). If age>12 and age<17 and HolyGrail ∈ moviesLikedByUser and director=MelBrooks then predict likes(U,M). If Ishtar ∈ moviesLikedByUser then predict likes(U,M).

56 Basu et al 98 - results. Evaluation: predict liked(U,M) = "M is in the top quartile of U's ranking" from features; evaluate recall and precision. Features: Collaborative: UsersWhoLikedMovie, UsersWhoDislikedMovie, MoviesLikedByUser. Content: Actors, Directors, Genre, MPAA rating, ... Hybrid: ComediesLikedByUser, DramasLikedByUser, UsersWhoLikedFewDramas, ... Results, at the same level of recall (about 33%): Ripper with collaborative features only is worse than the original MovieRecommender (by about 5 points of precision, 73 vs 78); Ripper with hybrid features is better than MovieRecommender (by about 5 points of precision).

57 Matrix Factorization for Collaborative Filtering

58 Recovering latent factors in a matrix: V is n users × m movies, with entries v11, ..., vij, ..., vnm. V[i,j] = user i's rating of movie j.

59 Recovering latent factors in a matrix: V (n users × m movies) ~ (n × 2 matrix with rows (x_i, y_i)) × (2 × m matrix with rows (a1, a2, ..., am) and (b1, b2, ..., bm)). V[i,j] = user i's rating of movie j.

60 talk pilfered from ... KDD 2011

62 Recovering latent factors in a matrix: V (n users × m movies) ~ W (n × r, rows (x_i, y_i)) × H (r × m, rows (a1, a2, ..., am) and (b1, b2, ..., bm)). V[i,j] = user i's rating of movie j.

65 Recovering latent factors in a matrix: V (n users × m movies) ~ W (n × r, rows (x_i, y_i)) × H (r × m, rows (a1, a2, ..., am) and (b1, b2, ..., bm)). V[i,j] = user i's rating of movie j.

66 ... is like linear regression. Y ~ W H, where W is the data set: n instances (e.g., 150) × r features (e.g., 4), with rows (pl_i, pw_i, sl_i, sw_i). H holds the weights (w1, w2, w3, w4) of m = 1 regressor. Y holds the predictions (y1, ..., yn), with Y[i,1] = instance i's prediction.

67 ... for many outputs at once. Y ~ W H, where W is n instances (e.g., 150) × r features (e.g., 4), H is r × m with one column of weights (w_1j, ..., w_4j) per regressor, and Y is n × m with Y[i,j] = instance i's prediction for regression task j ... where we also have to find the dataset!

68 Matrix factorization as SGD step size

69 Matrix factorization as SGD - why does this work? step size

70 Matrix factorization as SGD - why does this work? Here's the key claim:

71 Checking the claim. Think of SGD for logistic regression: the LR loss compares y and ŷ = dot(w,x). Matrix factorization is similar, but now we update both w (user weights) and x (movie weights).
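The analogy can be made concrete with a short sketch of SGD for matrix factorization, assuming squared loss on the observed ratings only. The function names, step size, rank, and toy ratings are all illustrative choices, not values from the talk. Each update looks like a regression step, except that both the user row of W and the movie column of H move.

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, r=2, step=0.02, epochs=500, seed=0):
    """ratings: list of (user i, movie j, value v) for observed entries only."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((n_users, r))   # user factors
    H = 0.1 * rng.standard_normal((r, n_items))   # movie factors
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)): # visit ratings in random order
            i, j, v = ratings[idx]
            err = v - W[i] @ H[:, j]              # compare y and y-hat = dot(w, x)
            w_old = W[i].copy()
            W[i] += step * err * H[:, j]          # update the user weights ...
            H[:, j] += step * err * w_old         # ... and the movie weights
    return W, H

def rmse(W, H, ratings):
    """Root-mean-squared error over the given (i, j, v) triples."""
    return float(np.sqrt(np.mean([(v - W[i] @ H[:, j]) ** 2 for i, j, v in ratings])))

# Hypothetical toy ratings for 3 users and 3 movies (two entries unobserved).
ratings = [(0, 0, 5.0), (0, 1, 4.0), (0, 2, 1.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]
W, H = sgd_mf(ratings, n_users=3, n_items=3)
train_rmse = rmse(W, H, ratings)
```

Only the row W[i] and the column H[:, j] touched by a rating change in each step, which is what makes the updates cheap and easy to distribute.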

72 What loss functions are possible? generalized KL-divergence

75 ALS = alternating least squares
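A compact sketch of alternating least squares, assuming squared loss on observed entries with a small ridge penalty (the penalty, the function name `als`, and the toy matrix are assumptions added for numerical stability and illustration). With H fixed, each row of W is an ordinary least-squares problem; with W fixed, so is each column of H.

```python
import numpy as np

def als(V, mask, r=2, reg=0.1, iters=30, seed=0):
    """V: n x m ratings; mask: boolean n x m array, True where V is observed."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.standard_normal((n, r))
    H = rng.standard_normal((r, m))
    ridge = reg * np.eye(r)
    for _ in range(iters):
        for i in range(n):                        # fix H, solve for user row i
            Hc = H[:, mask[i]]                    # factors of movies user i rated
            W[i] = np.linalg.solve(Hc @ Hc.T + ridge, Hc @ V[i, mask[i]])
        for j in range(m):                        # fix W, solve for movie column j
            Wr = W[mask[:, j]]                    # factors of users who rated movie j
            H[:, j] = np.linalg.solve(Wr.T @ Wr + ridge, Wr.T @ V[mask[:, j], j])
    return W, H

# Toy example: a fully observed rank-1 matrix.
V = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0])
mask = np.ones_like(V, dtype=bool)
W, H = als(V, mask)
err = float(np.sqrt(np.mean((V - W @ H) ** 2)))
```

Unlike SGD there is no step size to tune, at the price of solving a small r × r linear system per user and per movie in every sweep.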

76 Matrix Multiplications in Machine Learning: MF vs PCA vs SGD vs.

77 Recovering latent factors in a matrix: V (n users × m movies) ~ W (n × r, rows (x_i, y_i)) × H (r × m, rows (a1, a2, ..., am) and (b1, b2, ..., bm)). V[i,j] = user i's rating of movie j.

78 ... vs k-means (1): X (original data set, n examples, entries v11, ..., vij, ..., vnm) ~ Z (n × r, indicators for r clusters) × M (r × m, cluster means, rows (a1, a2, ..., am) and (b1, b2, ..., bm)).
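One way to see k-means as a factorization is to run the usual assign/update loop and then write the result as an indicator matrix Z times a matrix of cluster means M. The sketch below assumes plain Lloyd-style iterations and a tiny hand-made data set; the name `kmeans_factors` is hypothetical.

```python
import numpy as np

def kmeans_factors(X, r, iters=10, seed=0):
    """Return (Z, M) with Z an n x r one-hot indicator matrix and M the r x m means."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), size=r, replace=False)].astype(float)  # initial means
    for _ in range(iters):
        # Squared distance from every example to every cluster mean.
        d = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        Z = np.eye(r)[assign]                     # one-hot row per example
        for c in range(r):
            if (assign == c).any():               # leave empty clusters unchanged
                M[c] = X[assign == c].mean(axis=0)
    return Z, M

# Hypothetical toy data: two tight groups far apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
Z, M = kmeans_factors(X, r=2)
X_approx = Z @ M                                  # each row replaced by its cluster mean
```

The product Z @ M has the same shape as X, so k-means fits the same "data ~ product of two smaller matrices" picture, with the constraint that each row of Z has a single 1.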

79 Matrix multiplication - 1 (Gram matrix): V = X X^T, where X is n instances (e.g., 150) × r features (e.g., 2) with rows (x_i, y_i), and X^T is the transpose of X. V[i,j] = <x_i, x_j>, the inner product of instances i and j.

80 Matrix multiplication - 2 (Covariance matrix): C_X = X^T X, an r × r matrix for r features (e.g., 2), with entries v11, v12, v21, v22. $C_X(i,j) = \sum_{t=1}^{n} x_i^t x_j^t = n \cdot \mathrm{cov}(i,j)$, where x_i^t is feature i of instance t, assuming mean(x) = 0.
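A quick numerical check of the two products, assuming NumPy and a tiny hand-made matrix whose columns already have zero mean (the numbers are arbitrary). X X^T gives instance-by-instance inner products, while X^T X gives n times the feature covariance.

```python
import numpy as np

# 3 instances x 2 features; each column sums to zero, so mean(x) = 0.
X = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [-4.0, -2.0]])
n = len(X)

gram = X @ X.T                                # n x n: inner products of instances
C = X.T @ X                                   # r x r: n * covariance of features
cov = np.cov(X, rowvar=False, bias=True)      # population covariance (divide by n)
```

The Gram matrix grows with the number of instances and the covariance matrix with the number of features, which is why the choice between them matters for scalability.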

81 Matrix multiplication - 2 (PCA): V = C_X = X^T X (variances/covariances). E = eigenvectors(V), the eigenvectors of C_X, with entries e11, e12, e21, e22. X E^T = PCA(X) = Z, where X has rows (x_i, y_i) and Z has rows z_i. Z(i,j) is the similarity of example i to eigenvector j. I think of these as the fixed point of a process where we predict each feature value from the others and C_X.

82 Matrix multiplication - 2 (PCA): V = C_X = X^T X (variances/covariances). E = eigenvectors(V). X E_K^T = PCA(X) = Z_K, using E_K = E(1:K, :), the top K eigenvectors of C_X, instead of E.

83 Matrix multiplication - 3 (SVD): V = C_X = X^T X (variances/covariances). E = eigenvectors(V), the eigenvectors of C_X. X E^T = PCA(X) = Z. Then X = Z E = Z Σ^{-1} Σ E = U Σ E, where U = Z Σ^{-1}, i.e., a factored version of X. Usually written as X = U Σ V^T.
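The chain of equalities on this slide can be checked numerically. The sketch below assumes NumPy, a small random mean-centered X with full column rank, and Σ taken as the diagonal matrix of square roots of the eigenvalues of C_X (consistent with S[i,i] = sqrt(λ_i) earlier in the deck); variable names mirror the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
X = X - X.mean(axis=0)                        # mean-center so C_X is a covariance

C = X.T @ X                                   # V = C_X = X^T X
lam, vecs = np.linalg.eigh(C)                 # eigenvalues in ascending order
order = np.argsort(lam)[::-1]                 # reorder: largest eigenvalue first
lam = lam[order]
E = vecs[:, order].T                          # rows of E are eigenvectors of C_X

Z = X @ E.T                                   # X E^T = PCA(X) = Z
Sigma = np.diag(np.sqrt(lam))
U = Z @ np.linalg.inv(Sigma)                  # U = Z Sigma^{-1}
X_rebuilt = U @ Sigma @ E                     # X = U Sigma E

# Truncating to the top K rows of E gives the rank-K approximation.
K = 2
X_K = (U[:, :K] @ Sigma[:K, :K]) @ E[:K]
```

The columns of U come out orthonormal and the diagonal of Sigma matches the singular values returned by a library SVD, which is the sense in which PCA and SVD are "very closely related".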

84 Matrix multiplication - 3 (SVD): V = C_X = X^T X (variances/covariances). E = eigenvectors(V), the eigenvectors of C_X. X E_K^T = PCA(X) = Z_K, using E_K = E(1:K, :) instead of E. Then X ~ Z_K E_K = Z_K Σ^{-1} Σ E_K = U_K Σ E_K, where U_K = Z_K Σ^{-1}, i.e., a factored version of X.

85 Matrix multiplication - 2 (SVD): X (original matrix, n instances) ~ U_K (n × K: K features with zero covariance and unit variances) × Σ = diag(Σ1, Σ2) × E (eigenvectors of C_X).

86 Recovering latent factors in a matrix: V (n users × m movies) ~ W (n × r, rows (x_i, y_i)) × H (r × m, rows (a1, a2, ..., am) and (b1, b2, ..., bm)). V[i,j] = user i's rating of movie j.
