Collaborative Filtering


1 Collaborative Filtering EECS 349 Machine Learning Bongjun Kim Fall, 2015

2 What is Collaborative Filtering? Recommendation system: Amazon recommends items based on your purchase history and ratings. [Figure labels: Recommendation; Purchase history]

3 What is Collaborative Filtering? Recommendation system: Amazon recommends items based on your purchase history and ratings. [Figure labels: View history; Recommendation]

4 What is Collaborative Filtering? Task: How do I predict what you'll like? Two approaches. User-based: You will like item A because users who are similar to you like item A. Item-based: You will like item A because you like items that are similar to item A.

5 User-Based Collaborative Filtering Find users who are similar to you; you might like the items they like. A: I like Star Wars, Star Trek, Mission Impossible. B: I like Star Wars, Star Trek, Mission Impossible, X-Men. B is a user whose preferences are similar to A's, so A would like X-Men too!

6 Item-Based Collaborative Filtering You might like items that are similar to items you already like. A: I like Star Wars! Star Trek is a movie similar to Star Wars because it has "star" in the name, so A would like Star Trek too! Do you think A would also like Dancing with the Stars?

7 Feature Selection Measuring similarity (of users or items) requires measuring their features. Which features should I measure? Are there features that are (relatively) insensitive to the particulars of the recommendation task? Users' ratings of items, or their purchase history, are explicit features for measuring user preference.

8 USER-BASED COLLABORATIVE FILTERING

9 How do we find a user who is similar? Use a distance (or similarity) measure in an N-dimensional space. Example: movie ratings of 3 users, with ratings from 1 (dislike) to 5 (like). [Table and scatter plot: users U1, U2, U3 rated on Harry Potter and Star Wars; the rating values are not preserved.]

10 Which similarity measure to use? p-norm (Manhattan, Euclidean), Pearson Correlation, Cosine Similarity, etc.
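As a rough illustration of the p-norm family listed above, the sketch below computes the Manhattan (p = 1) and Euclidean (p = 2) distances between two rating vectors. The function name `minkowski_distance` and the two rating vectors are made up for this example and do not come from the slides.

```python
def minkowski_distance(a, b, p):
    """p-norm distance between two equal-length rating vectors."""
    if len(a) != len(b):
        raise ValueError("rating vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical ratings (1-5) given by two users to the same three movies.
u1 = [5, 1, 2]
u2 = [4, 3, 2]

manhattan = minkowski_distance(u1, u2, p=1)  # sum of absolute differences
euclidean = minkowski_distance(u1, u2, p=2)  # straight-line distance
print(manhattan, euclidean)
```

Smaller values mean more similar users, which is the opposite direction from the correlation-style measures in the same list.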

11 Who is the most similar to John? Example #1 [Table: ratings of Brian, Bob, Cathy, and John for Inception, Begin Again, and Once; values not preserved.] Manhattan Distance: (John, Brian) = 1, (John, Bob) = 9, (John, Cathy) = 6. Q: Does Manhattan distance measure similarities properly in this data set?

12 Who is the most similar to Adam? Example #2 [Table: ratings of Bill, Brian, and Adam for Inception, Begin Again, Once, and Star Wars; values not preserved.] Manhattan Distance: (Adam, Bill) = 4, (Adam, Brian) = 6. Q: Does Manhattan distance measure similarities properly in this data set? Different users may use different rating scales.

13 Who is the most similar to Adam? [Plot: ratings of Bill, Brian, and Adam across Inception, Begin Again, Once, and Star Wars.] Manhattan Distance: (Adam, Bill) = 4, (Adam, Brian) = 6. Q: Does Manhattan distance measure similarities properly in this data set? Different users may use different rating scales.

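14 Pearson Correlation A measure of correlation between two variables. The Pearson correlation coefficient ranges from -1 to 1. A perfect positive correlation: 1. A perfect negative correlation: -1.

$$\mathrm{sim}(u, v) = \frac{\sum_{i \in C} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in C} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in C} (r_{v,i} - \bar{r}_v)^2}}$$

where r_{u,i} is user u's rating of item i, \bar{r}_u is user u's mean rating, and C is the set of items rated by both u and v.

In Python:
>>> import scipy.stats
>>> scipy.stats.pearsonr(array1, array2)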
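To make the formula concrete without depending on SciPy, here is a minimal pure-Python sketch. Several choices are assumptions of this example rather than anything stated in the slides: ratings are stored as dicts from item name to rating, each mean is taken over the co-rated items only, and an undefined correlation is reported as 0.0. The three users are invented.

```python
from math import sqrt


def pearson_sim(ratings_u, ratings_v):
    """Pearson correlation over the items that both users rated.

    Returns 0.0 when the correlation is undefined (fewer than two
    co-rated items, or a user with identical ratings on all of them).
    That fallback is a convention chosen for this sketch.
    """
    common = [i for i in ratings_u if i in ratings_v]
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    den = den_u * den_v
    return num / den if den else 0.0


# Hypothetical users: `harsh` rates every movie 2 points lower than
# `generous`; `contrary` has the reverse taste.
generous = {"Inception": 5, "Begin Again": 4, "Once": 3, "Star Wars": 5}
harsh = {"Inception": 3, "Begin Again": 2, "Once": 1, "Star Wars": 3}
contrary = {"Inception": 1, "Begin Again": 2, "Once": 3, "Star Wars": 1}

print(pearson_sim(generous, harsh))     # close to 1: same taste, shifted scale
print(pearson_sim(generous, contrary))  # close to -1: opposite taste
```

Because each user's mean is subtracted first, a constant offset between two users' rating scales does not lower the similarity, whereas it would inflate a Manhattan distance.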

15 Cosine Similarity A measure of similarity between two vectors. Ranges from -1 (opposite) to 1 (same). Cosine similarity between vectors a and b:

$$\mathrm{sim}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
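A small sketch of the cosine formula in plain Python follows. Returning 0.0 for a zero-length vector is a convention assumed here, since the formula itself is undefined in that case; the function name and sample vectors are illustrative.

```python
from math import sqrt


def cosine_sim(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; 0.0 is this sketch's choice
    return dot / (norm_a * norm_b)


print(cosine_sim([1, 2], [2, 4]))    # same direction: close to 1
print(cosine_sim([1, 0], [0, 1]))    # orthogonal: 0
print(cosine_sim([1, 2], [-1, -2]))  # opposite direction: close to -1
```

With raw ratings from 1 to 5 every component is positive, so the result stays between 0 and 1; values near -1 only appear once ratings are centered, for example by subtracting each user's mean.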

16 Who is the most similar to Adam? Example #2 [Table: ratings of Bill, Brian, and Adam for Inception, Begin Again, Once, and Star Wars; values not preserved.] Pearson Correlation: (Adam, Bill) = -1, (Adam, Brian) = 1. Q: Does Pearson correlation measure similarities properly in this data set?

17 How to predict ratings for unrated items User-based K-Nearest Neighbor Collaborative Filtering 1) Define a similarity measure. 2) Pick the k users whose preferences are most similar to those of the current user. 3) Compute a prediction from a weighted average of the k nearest neighbors' ratings (see the next slide). You need to run experiments to find the optimal value of k.

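18 How to predict ratings for unrated items Prediction for the rating of user a for item p:

$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in K} \mathrm{sim}(a, b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in K} \mathrm{sim}(a, b)}$$

where \bar{r}_a is user a's average rating, r_{b,p} is the rating of user b for item p, sim(a, b) is the similarity between user a and user b, and K is the set of the k nearest neighbors.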
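The weighted-average prediction can be sketched as below. The data layout (a dict of per-user rating dicts plus a dict of precomputed similarities for the chosen neighbors), the function name, and all numbers are assumptions made for illustration. The denominator is the plain sum of similarities, as in the slide, so the sketch assumes the selected neighbors have positive similarity.

```python
def predict_weighted(target, item, ratings, neighbor_sims):
    """Mean-centered, similarity-weighted prediction for `target` on `item`.

    ratings:       {user: {item: rating}}
    neighbor_sims: {neighbor: similarity to target} for the k chosen neighbors
    """
    def mean(row):
        return sum(row.values()) / len(row)

    target_mean = mean(ratings[target])
    num = 0.0
    den = 0.0
    for neighbor, sim in neighbor_sims.items():
        if item not in ratings[neighbor]:
            continue  # this neighbor cannot vote on the item
        num += sim * (ratings[neighbor][item] - mean(ratings[neighbor]))
        den += sim
    if den == 0:
        return target_mean  # no usable neighbor: fall back to the user's mean
    return target_mean + num / den


# Hypothetical data: user "a" has not rated item "P".
ratings = {
    "a": {"X": 4, "Y": 2},
    "b": {"X": 5, "Y": 2, "P": 5},  # mean 4, so b rates P one above average
    "c": {"X": 3, "Y": 1, "P": 2},  # mean 2, so c rates P exactly at average
}
prediction = predict_weighted("a", "P", ratings, {"b": 0.8, "c": 0.2})
print(prediction)  # a's mean of 3 plus a weighted offset of about 0.8
```

Working with offsets from each neighbor's own mean is what lets a generous rater and a harsh rater contribute comparable evidence.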

19 Let's practice user-based k-NN CF In this practice and in our homework, we will use a much simpler way to compute a predicted rating. 1) Define a similarity measure. 2) Pick the k users whose preferences are most similar to those of the current user. 3) Pick the mode of the top k nearest neighbors' ratings as the predicted rating. Ex.) If you pick 3 neighbors and their ratings for the target item are (2, 2, 3), then the prediction will be 2.
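The three steps above can be sketched as follows, using Manhattan distance as the measure. The ratings are made up for this example and are not the values from the slides' tables; the function names are also illustrative. On a tie between equally common ratings, this sketch keeps the rating of the nearer neighbor, which is one possible convention.

```python
from collections import Counter


def manhattan(u, v, items):
    """Manhattan distance between two users over the given items."""
    return sum(abs(u[i] - v[i]) for i in items)


def mode_of(values):
    """Most common value; ties go to the value seen first."""
    counts = Counter(values)
    return counts.most_common(1)[0][0] if counts else None


def predict_user_based(target, item, ratings, k):
    """Mode of the ratings that the k nearest users gave to `item`."""
    scored = []
    for user, row in ratings.items():
        if user == target or item not in row:
            continue
        shared = [i for i in ratings[target] if i in row and i != item]
        if shared:
            scored.append((manhattan(ratings[target], row, shared), user))
    scored.sort()  # nearest users first
    return mode_of([ratings[user][item] for _, user in scored[:k]])


# Hypothetical ratings; John has not rated Star Wars.
ratings = {
    "Brian": {"Inception": 5, "Begin Again": 1, "Once": 3, "Star Wars": 4},
    "Bob":   {"Inception": 1, "Begin Again": 5, "Once": 3, "Star Wars": 2},
    "Cathy": {"Inception": 2, "Begin Again": 3, "Once": 3, "Star Wars": 1},
    "John":  {"Inception": 5, "Begin Again": 1, "Once": 2},
}
print(predict_user_based("John", "Star Wars", ratings, k=1))
print(mode_of([2, 2, 3]))
```

With k = 1 the prediction is simply the nearest neighbor's rating, so the choice of distance measure decides the outcome entirely.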

20 Practice: User-based k-NN CF (k=1) Example #1: How would John rate Star Wars? [Table: ratings of Brian, Bob, Cathy, and John for Inception, Begin Again, Once, and Star Wars; only John's row is preserved: 5, 1, 2, ?] Manhattan Distance: (John, Brian) = 1, (John, Bob) = 9, (John, Cathy) = 6. The nearest neighbor: Brian. John's rating for Star Wars: 4.

21 Practice: User-based k-NN CF (k=1) Example #2: How would John rate Avatar? [Table: ratings of Brian, Bob, Cathy, and John for Inception, Begin Again, Once, Star Wars, and Avatar; values not preserved.] Manhattan Distance: (John, Brian) = 5, (John, Bob) = 6, (John, Cathy) = 4. The nearest neighbor: Cathy. John's rating for Avatar: 1. Pearson Correlation Coefficient: (John, Brian) = [value not shown], (John, Bob) = 1.0, (John, Cathy) = 0.95. The nearest neighbor: Bob. John's rating for Avatar: 2.

22 ITEM-BASED COLLABORATIVE FILTERING

23 How to predict ratings for unrated items Item-based K-Nearest Neighbor Collaborative Filtering 1) Define a similarity measure between items. 2) Among the items the current user has rated, pick the k that are most similar to the target item. 3) Compute a prediction from a weighted average of the user's ratings for those k similar items.

24 Let's practice item-based k-NN CF In this practice and in our homework, we will use a much simpler way to compute a predicted rating. 1) Define a similarity measure between items. 2) Among the items the current user has rated, pick the k that are most similar to the target item. 3) Pick the mode of the user's ratings for those k nearest items as the predicted rating. Ex.) If you picked 3 items and the current user's ratings for those 3 items are (2, 2, 3), then the prediction will be 2.
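An item-based version of the same procedure might look like the sketch below. Two items are compared through the ratings given by users who rated both of them, again with Manhattan distance. The rating table and function names are invented for illustration and are not taken from the slides.

```python
from collections import Counter


def item_distance(item_a, item_b, ratings):
    """Manhattan distance between two items over users who rated both."""
    raters = [u for u, row in ratings.items() if item_a in row and item_b in row]
    if not raters:
        return float("inf")  # no shared raters: treat the items as maximally far
    return sum(abs(ratings[u][item_a] - ratings[u][item_b]) for u in raters)


def predict_item_based(user, target_item, ratings, k):
    """Mode of the user's own ratings for the k items nearest to the target."""
    rated = [i for i in ratings[user] if i != target_item]
    nearest = sorted(rated, key=lambda i: item_distance(target_item, i, ratings))[:k]
    counts = Counter(ratings[user][i] for i in nearest)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical ratings; John has not rated Star Wars.
ratings = {
    "Brian": {"Inception": 5, "Begin Again": 1, "Once": 3, "Star Wars": 4},
    "Bob":   {"Inception": 1, "Begin Again": 5, "Once": 3, "Star Wars": 2},
    "Cathy": {"Inception": 2, "Begin Again": 3, "Once": 3, "Star Wars": 1},
    "John":  {"Inception": 5, "Begin Again": 1, "Once": 2},
}
print(item_distance("Star Wars", "Inception", ratings))
print(predict_item_based("John", "Star Wars", ratings, k=1))
```

Note the difference from the user-based variant: the neighbors are items, and the vote is taken over the current user's own ratings rather than over other users' ratings.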

25 Practice: Item-based k-NN CF (k=1) Example #1 [Table: ratings of Brian, Bob, Cathy, and John for Inception, Begin Again, Once, and Star Wars; only John's row is preserved: 5, 1, 2, ?] Manhattan Distance: (Star Wars, Inception) = 3, (Star Wars, Begin Again) = 5, (Star Wars, Once) = 6. The most similar item to Star Wars: Inception. John's rating for Star Wars: 5.

26 The Cold Start Problem What if this user has never rated anything before? What if nobody has rated this item before? Use additional information. For example: ask users to rate some initial items; use demographic information for users; use content analysis or metadata for items.

27 Missing values Missing values in the user-rating matrix. What if two users have rated different sets of things? How do we compare them? What if two items have been rated by disjoint sets of users? How do we compare them?

28 Dealing with missing values Example (? marks a missing rating) [Table, partially preserved, for Inception, Begin Again, Once, Star Wars, and Avatar: Brian 2 ? 3 ? 4; Bob; Cathy 5 ?; John 5 ? 2 3 ?]

29 Dealing with missing values Example [Table: ratings of Brian, Bob, Cathy, and John for Inception, Begin Again, Once, Star Wars, and Avatar; values not preserved, with one rating for John marked ?]

30 Dealing with missing values Discarding the person/item from the comparison? That does not solve the cold start problem. What if the data set is very sparse? Putting in a crazy number (-1000) for missing values? Putting in a random number? Putting in a mean (median) value? Mean value of what set? Other advanced imputation techniques?
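Two of the options above can be sketched in a few lines: filling gaps with a mean, and comparing users only on the items both have rated. Representing a missing rating as `None` and using the per-user mean (rather than a per-item or global mean) are choices made for this example; the slide leaves "mean value of what set?" open. The two rating rows are hypothetical.

```python
def impute_with_user_mean(row):
    """Replace None entries with the mean of the user's known ratings."""
    known = [r for r in row if r is not None]
    if not known:
        return list(row)  # cold start: nothing to average, leave gaps as-is
    mean = sum(known) / len(known)
    return [mean if r is None else r for r in row]


def manhattan_on_shared(u, v):
    """Manhattan distance over positions rated by both users, or None."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        return None  # no overlap, so the users cannot be compared this way
    return sum(abs(a - b) for a, b in pairs)


# Hypothetical ratings for five movies; None marks a missing rating.
brian = [2, None, 3, None, 4]
john = [5, None, 2, 3, None]

print(impute_with_user_mean(brian))      # gaps filled with Brian's mean of 3
print(manhattan_on_shared(brian, john))  # only the 1st and 3rd movies overlap
```

A distance computed over few shared items is less trustworthy than one computed over many, so in practice the amount of overlap may also need to be taken into account.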

31 Make a decision Which similarity (or distance) measure to use? How many neighbors to pick? How to weight the neighbors chosen? User-based or item-based? How to deal with missing values?
