Data Mining and Matrices


1 Data Mining and Matrices
04 Matrix Completion
Rainer Gemulla, Pauli Miettinen
May 02, 2013

2 Recommender systems
Problem
- Set of users
- Set of items (movies, books, jokes, products, stories, ...)
- Feedback (ratings, purchases, click-through, tags, ...)
- Sometimes: metadata (user profiles, item properties, ...)
Goal: Predict preferences of users for items
Ultimate goal: Create item recommendations for each user
Example
            Avatar  The Matrix  Up
  Alice       ?         4        2
  Bob         3         2        ?
  Charlie     5         ?        3

3 Outline
1. Collaborative Filtering
2. Matrix Completion
3. Algorithms
4. Summary

4 Collaborative filtering
Key idea: Make use of past user behavior
- No domain knowledge required
- No expensive data collection needed
- Allows discovery of complex and unexpected patterns
- Widely adopted: Amazon, TiVo, Netflix, Microsoft
- Key techniques: neighborhood models, latent factor models
            Avatar  The Matrix  Up
  Alice       ?         4        2
  Bob         3         2        ?
  Charlie     5         ?        3
Leverage past behavior of other users and/or on other items.

5 A simple baseline
m users, n items, m x n rating matrix D
Revealed entries Ω = { (i, j) : rating D_ij is revealed }, N = |Ω|
Baseline predictor: b_ij = µ + b_i + b_j
- µ = (1/N) Σ_{(i,j) ∈ Ω} D_ij is the overall average rating
- b_i is a user bias (user's tendency to rate low/high)
- b_j is an item bias (item's tendency to be rated low/high)
Least squares estimates: argmin_b Σ_{(i,j) ∈ Ω} (D_ij − µ − b_i − b_j)²
Example (user and item biases in parentheses next to the names, baseline predictions in parentheses inside the matrix):
                   Avatar (1.01)  The Matrix (0.34)  Up (−1.32)
  Alice (0.32)        ? (4.5)          4 (3.8)         2 (2.1)
  Bob (−1.34)         3 (2.8)          2 (2.2)         ? (0.5)
  Charlie (0.99)      5 (5.2)          ? (4.5)         3 (2.8)
m = 3, n = 3, Ω = { (1, 2), (1, 3), (2, 1), ... }, N = 6, µ = 3.17
b̂_32 = µ + b_3 + b_2 = 3.17 + 0.99 + 0.34 = 4.5
Baseline does not account for personal tastes.
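
The slide only states the least squares problem; as a minimal sketch (not the lecture's prescribed procedure), the biases can be fit by a few sweeps of coordinate-wise least squares over users and items. The data layout (a list of (i, j, rating) triples) is an assumption for illustration.

```python
# Minimal sketch: fit the baseline predictor b_ij = mu + b_i + b_j on the
# revealed entries by alternating coordinate-wise least squares sweeps.
import numpy as np

def fit_baseline(ratings, m, n, sweeps=20):
    """ratings: list of (i, j, d_ij) triples for the revealed entries."""
    mu = np.mean([d for _, _, d in ratings])   # overall average rating
    b_user, b_item = np.zeros(m), np.zeros(n)
    for _ in range(sweeps):
        for i in range(m):                     # user biases, item biases fixed
            resid = [d - mu - b_item[j] for (u, j, d) in ratings if u == i]
            if resid:
                b_user[i] = np.mean(resid)
        for j in range(n):                     # item biases, user biases fixed
            resid = [d - mu - b_user[u] for (u, v, d) in ratings if v == j]
            if resid:
                b_item[j] = np.mean(resid)
    return mu, b_user, b_item

# toy data from the slide: rows Alice, Bob, Charlie; cols Avatar, Matrix, Up
ratings = [(0, 1, 4), (0, 2, 2), (1, 0, 3), (1, 1, 2), (2, 0, 5), (2, 2, 3)]
mu, bu, bi = fit_baseline(ratings, 3, 3)
print(round(mu, 2), np.round(bu, 2), np.round(bi, 2))   # mu is 3.17, as on the slide
```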

6 When does a user like an item?
Neighborhood models (kNN): when he likes similar items
- Find the top-k most similar items the user has rated
- Combine the ratings of these items (e.g., average)
- Requires a similarity measure (e.g., Pearson correlation coefficient)
[Illustration: an item unrated by Bob is similar to an item Bob rated 4, so predict 4.]
Latent factor models (LFM): when similar users like similar items
- More holistic approach
- Users and items are placed in the same latent factor space
- Position of a user and an item is related to preference (via dot products)
[Illustration from Koren et al., 2009: users and items embedded in a common latent factor space.]

7 Intuition behind latent factor models (1)
[Figure 2 from Koren et al., 2009: a simplified illustration of the latent factor approach. Movies and users are placed in a plane whose dimensions range from "geared toward females" to "geared toward males" and from "serious" to "escapist"; example movies include The Color Purple, Sense and Sensibility, The Princess Diaries, Amadeus, Ocean's 11, The Lion King, Dave, Braveheart, Lethal Weapon, Independence Day, and Dumb and Dumber. A user u's predicted rating for item i is the inner product of the item factor vector q_i and the user factor vector p_u, i.e., r̂_ui = q_i^T p_u.]

8 Intuition behind latent factor models (2)
Does user u like item v?
Quality: measured via direction from origin (cos∠(u, v))
- Same direction → attraction: cos∠(u, v) ≈ 1
- Opposite direction → repulsion: cos∠(u, v) ≈ −1
- Orthogonal direction → oblivious: cos∠(u, v) ≈ 0
Strength: measured via distance from origin (‖u‖, ‖v‖)
- Far from origin → strong relationship: ‖u‖ ‖v‖ large
- Close to origin → weak relationship: ‖u‖ ‖v‖ small
Overall preference: measured via dot product u · v = ‖u‖ ‖v‖ cos∠(u, v)
- Same direction, far out → strong attraction: u · v large positive
- Opposite direction, far out → strong repulsion: u · v large negative
- Orthogonal direction, any distance → oblivious: u · v ≈ 0
But how to select dimensions and where to place items and users?
Key idea: Pick dimensions that explain the known data well.

9 SVD and missing values
[Figure: four panels showing the input data, its rank-10 truncated SVD, a 10% sample of the input data, and the rank-10 truncated SVD of that sample.]
SVD treats missing entries as 0.
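
A minimal sketch of what this slide warns about: computing a rank-r truncated SVD after simply filling the unrevealed entries with 0. The function name and the boolean-mask data layout are assumptions for illustration.

```python
# Minimal sketch: rank-r truncated SVD of a partially observed matrix,
# with the unrevealed entries set to 0 (the pitfall the slide points out).
import numpy as np

def truncated_svd_fill_zero(D, mask, r):
    """D: m x n ratings; mask: boolean array marking revealed entries."""
    D0 = np.where(mask, D, 0.0)                 # missing entries treated as 0
    U, s, Vt = np.linalg.svd(D0, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]       # rank-r reconstruction
```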

10 Latent factor models and missing values
[Figure: four panels showing the input data, a rank-10 LFM fit, a 10% sample of the input data, and a rank-10 LFM fit of that sample.]
LFMs ignore missing entries.

11 Latent factor models (simple form)
Given rank r, find an m x r matrix L and an r x n matrix R such that D_ij ≈ [LR]_ij for (i, j) ∈ Ω
Least squares formulation:
  min_{L,R} Σ_{(i,j) ∈ Ω} (D_ij − [LR]_ij)²
Example (r = 1), with the entries of L and R and the resulting predictions in parentheses:
                 R:  Avatar (2.24)  The Matrix (1.92)  Up (1.18)
  L: Alice (1.98)       ? (4.4)          4 (3.8)         2 (2.3)
     Bob (1.21)         3 (2.7)          2 (2.3)         ? (1.4)
     Charlie (2.30)     5 (5.2)          ? (4.4)         3 (2.7)
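
A minimal sketch of the least squares objective above, evaluated for given factor matrices L and R; the function name and mask-based data layout are assumptions for illustration.

```python
# Minimal sketch: unregularized least squares objective
# sum over revealed entries of (D_ij - [LR]_ij)^2.
import numpy as np

def lfm_loss(D, mask, L, R):
    """L: m x r, R: r x n; mask: boolean array marking revealed entries."""
    E = (D - L @ R) * mask      # residuals on revealed entries only
    return np.sum(E ** 2)
```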

12 Example: Netflix prize data (≈ 500k users, 17k movies, 100M ratings)
[Figure from Koren et al., 2009: movies from the Netflix prize data plotted against the first two factor vectors; titles such as Sophie's Choice, Moonstruck, Kill Bill: Vol. 1, Natural Born Killers, Lost in Translation, Citizen Kane, Annie Hall, Maid in Manhattan, Runaway Bride, The Sound of Music, and The Waltons appear at characteristic positions in this space.]

13 Latent factor models (summation form)
Least squares formulation prone to overfitting
More general summation form:
  L = Σ_{(i,j) ∈ Ω} l_ij(L_i, R_j) + R(L, R)
- L is the global loss
- L_i and R_j are user and item parameters, resp.
- l_ij is a local loss, e.g., l_ij = (D_ij − [LR]_ij)²
- R is a regularization term, e.g., R = λ(‖L‖²_F + ‖R‖²_F)
Loss function can be more sophisticated:
- Improved predictors (e.g., include user and item bias)
- Additional feedback data (e.g., time, implicit feedback)
- Regularization terms (e.g., weighted depending on amount of feedback)
- Available metadata (e.g., demographics, genre of a movie)
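
A minimal sketch of the summation form with the squared local loss and Frobenius-norm regularization named on the slide; the function name and the list-of-index-pairs layout for Ω are assumptions for illustration.

```python
# Minimal sketch: global loss in summation form with squared local loss
# and regularizer R = lambda * (||L||_F^2 + ||R||_F^2).
import numpy as np

def summation_loss(D, omega, L, R, lam):
    """omega: iterable of revealed (i, j) index pairs; L: m x r, R: r x n."""
    loss = sum((D[i, j] - L[i, :] @ R[:, j]) ** 2 for i, j in omega)
    return loss + lam * (np.sum(L ** 2) + np.sum(R ** 2))
```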

14 Example: Netflix prize data
[Figure 4 from Koren et al., 2009: root mean square error of predictions plotted against the number of model parameters (in millions) for matrix factorization models of increasing complexity: plain, with biases, with implicit feedback, and with temporal dynamics (v.1 and v.2).]

15 Outline
1. Collaborative Filtering
2. Matrix Completion
3. Algorithms
4. Summary

16 The matrix completion problem
Complete these matrices!
  ?  ?  ?  ?  ?
  1  ?  ?  ?  ?
  1  ?  ?  ?  ?
  1  ?  ?  ?  ?
Matrix completion is impossible without additional assumptions!
Let's assume that the underlying full matrix is simple (here: rank 1).
When/how can we recover a low-rank matrix from a sample of its entries?

17 Rank minimization
Definition (rank minimization problem). Given an n x n data matrix D and an index set Ω of revealed entries, the rank minimization problem is
  minimize    rank(X)
  subject to  X_ij = D_ij for (i, j) ∈ Ω,
              X ∈ R^{n x n}.
- Seeks the simplest explanation fitting the data
- If the solution is unique and there are sufficient samples, recovers D (i.e., X = D)
- NP-hard
- Time complexity of existing rank minimization algorithms is doubly exponential in n (and they are also slow in practice)

18 Nuclear norm minimization
Rank: rank(D) = |{ k : σ_k(D) > 0, 1 ≤ k ≤ n }| = Σ_{k=1}^n I[σ_k(D) > 0]
Nuclear norm: ‖D‖_* = Σ_{k=1}^n σ_k(D)
Definition (nuclear norm minimization). Given an n x n data matrix D and an index set Ω of revealed entries, the nuclear norm minimization problem is
  minimize    ‖X‖_*
  subject to  X_ij = D_ij for (i, j) ∈ Ω,
              X ∈ R^{n x n}.
- A heuristic for rank minimization
- The nuclear norm is a convex function (thus a local optimum is a global optimum)
- Can be optimized (more) efficiently via semidefinite programming
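
A minimal sketch of the two quantities defined above, computed from the singular values; the function name and the zero-tolerance parameter are assumptions for illustration.

```python
# Minimal sketch: rank and nuclear norm of a matrix via its singular values,
# matching the definitions on this slide.
import numpy as np

def rank_and_nuclear_norm(D, tol=1e-10):
    s = np.linalg.svd(D, compute_uv=False)   # singular values sigma_k(D)
    rank = int(np.sum(s > tol))              # number of nonzero singular values
    nuclear = float(np.sum(s))               # ||D||_* = sum of singular values
    return rank, nuclear
```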

19 Why nuclear norm minimization?
- Consider the SVD of D = UΣV^T
- Unit nuclear norm ball = convex combinations (with weights σ_k) of rank-1 matrices U_k V_k^T of unit Frobenius norm
- Extreme points of the unit ball are rank-1 matrices of unit Frobenius norm
- Nuclear norm minimization: inflate the unit ball as little as possible until it reaches a matrix with X_ij = D_ij
- The solution lies at an extreme point of the inflated ball → (hopefully) low rank
[Figure 1 from Candès and Recht: unit ball of the nuclear norm for symmetric 2 x 2 matrices; a random one-dimensional affine subspace generically intersects a sufficiently large nuclear norm ball at a rank-one matrix.]
Candès and Recht.

20 Relationship to LFMs
Recall the regularized LFM (L is m x r, R is r x n):
  min_{L,R} Σ_{(i,j) ∈ Ω} (D_ij − [LR]_ij)² + λ(‖L‖²_F + ‖R‖²_F)
View as a matrix completion problem by enforcing D_ij = [LR]_ij:
  minimize    (1/2)(‖L‖²_F + ‖R‖²_F)
  subject to  X_ij = D_ij for (i, j) ∈ Ω,
              LR = X.
One can show: for r chosen larger than the rank of the nuclear norm optimum, this is equivalent to nuclear norm minimization.
For some intuition, suppose X = UΣV^T at the optimal L and R:
  (1/2)(‖L‖²_F + ‖R‖²_F) ≤ (1/2)(‖UΣ^{1/2}‖²_F + ‖Σ^{1/2}V^T‖²_F)
                         = (1/2) Σ_{i=1}^n Σ_{k=1}^r (U²_ik σ_k + V²_ik σ_k)
                         = Σ_{k=1}^r σ_k = ‖X‖_*
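
A minimal numerical check of the identity used in the last step above: for the particular factorization L = UΣ^{1/2}, R = Σ^{1/2}V^T, half the sum of the squared Frobenius norms equals the nuclear norm of X. The random test matrix is an assumption for illustration.

```python
# Minimal sketch: verify (1/2)(||U S^{1/2}||_F^2 + ||S^{1/2} V^T||_F^2) = ||X||_*.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4)) @ rng.standard_normal((4, 6))   # rank <= 4
U, s, Vt = np.linalg.svd(X, full_matrices=False)
L = U * np.sqrt(s)              # U Sigma^{1/2}
R = np.sqrt(s)[:, None] * Vt    # Sigma^{1/2} V^T
lhs = 0.5 * (np.sum(L ** 2) + np.sum(R ** 2))
print(np.isclose(lhs, np.sum(s)))   # True: equals the nuclear norm ||X||_*
```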

21 When can we hope to recover D? (1)
Assume D is the 5 x 5 all-ones matrix (rank 1).
[Four sampling patterns of revealed entries: two for which the completion is unique ("Ok"), one that misses an entire column ("Not unique: column missed"), and one with too few revealed entries ("Not unique: insufficient samples").]
Sampling strategy and sample size matter.

22 When can we hope to recover D? (2)
Consider the following rank-1 matrices and assume few revealed entries:
[Four example matrices: two marked Ok ("incoherent"), two marked Bad ("coherent"); for the coherent examples, the first row (resp. the (1, 1)-entry) must be revealed to recover D.]
Properties of D matter.

23 When can we hope to recover D? (3)
Exact conditions under which matrix completion works are an active research area:
- Which sampling schemes? (e.g., random, WR/WOR, active)
- Which sample size?
- Which matrices? (e.g., incoherent matrices)
- Noise (e.g., independent, normally distributed noise)
Theorem (Candès and Recht, 2009). Let D = UΣV^T. If D is incoherent in that
  max_ij U²_ij ≤ µ_B / n  and  max_ij V²_ij ≤ µ_B / n
for some µ_B = O(1), and if rank(D) ≤ µ_B^{-1} n^{1/5}, then O(n^{6/5} r log n) random samples without replacement suffice to recover D exactly with high probability.
Candès and Recht, 2009.
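
A minimal sketch of how the incoherence condition in the theorem can be checked for a given square matrix: compute the smallest µ such that the squared singular-vector entries are bounded by µ/n. The function name is an assumption for illustration.

```python
# Minimal sketch: smallest mu with max_ij U_ij^2 <= mu/n and max_ij V_ij^2 <= mu/n,
# for a square n x n matrix D as in the theorem.
import numpy as np

def coherence_parameter(D, tol=1e-10):
    n = D.shape[0]                                 # D assumed n x n
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    r = int(np.sum(s > tol))
    U, V = U[:, :r], Vt[:r, :].T                   # singular vectors of D
    return n * max(np.max(U ** 2), np.max(V ** 2))
```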

24 Outline
1. Collaborative Filtering
2. Matrix Completion
3. Algorithms
4. Summary

25 Overview
Latent factor models in practice:
- Millions of users and items
- Billions of ratings
- Sometimes quite complex models
Many algorithms have been applied to large-scale problems:
- Gradient descent and quasi-Newton methods
- Coordinate-wise gradient descent
- Stochastic gradient descent
- Alternating least squares

26 Continuous gradient descent
Find minimum θ* of function L:
- Pick a starting point θ_0
- Compute gradient ∇L(θ_0)
- Walk downhill
Differential equation:
  ∂θ(t)/∂t = −∇L(θ(t))  with boundary condition θ(0) = θ_0
Under certain conditions, θ(t) → θ*
[Plot: distance θ(t) − θ* decreasing as a function of t.]

27 Discrete gradient descent
Find minimum θ* of function L:
- Pick a starting point θ_0
- Compute gradient ∇L(θ_0)
- Jump downhill
Difference equation:
  θ_{n+1} = θ_n − ɛ_n ∇L(θ_n)
Under certain conditions, approximates continuous gradient descent: the interpolation of the iterates taken with steps of size t satisfies the ODE as n → ∞
[Plot: distance θ_n − θ* decreasing as a step function over t.]
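
A minimal sketch of the difference equation above for a generic differentiable loss; the function names, step size, and the toy quadratic example are assumptions for illustration.

```python
# Minimal sketch of the update theta_{n+1} = theta_n - eps_n * grad L(theta_n),
# here with a constant step size.
import numpy as np

def gradient_descent(loss_grad, theta0, eps=0.1, steps=100):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - eps * loss_grad(theta)   # jump downhill
    return theta

# example: minimize L(theta) = ||theta||^2 / 2, whose gradient is theta
print(gradient_descent(lambda th: th, np.array([3.0, -2.0])))
```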

28 Gradient descent for LFMs
Set θ = (L, R) and write L(θ) = Σ_{(i,j) ∈ Ω} L_ij(L_i, R_j)
Gradient with respect to row L_i:
  ∇_{L_i} L(θ) = Σ_{j : (i,j) ∈ Ω} ∇_{L_i} L_ij(L_i, R_j)
GD epoch:
1. Compute gradient
   - Initialize zero matrices ∇L and ∇R
   - For each entry (i, j) ∈ Ω, update gradients
       ∇L_i ← ∇L_i + ∇_{L_i} L_ij(L_i, R_j)
       ∇R_j ← ∇R_j + ∇_{R_j} L_ij(L_i, R_j)
2. Update parameters
   L ← L − ɛ_n ∇L
   R ← R − ɛ_n ∇R
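
A minimal sketch of one such epoch for the unregularized squared local loss L_ij = (D_ij − L_i R_j)² used on the next slide; the function name and the list-of-index-pairs layout for Ω are assumptions for illustration.

```python
# Minimal sketch of one gradient descent epoch for squared local loss.
import numpy as np

def gd_epoch(D, omega, L, R, eps):
    """omega: list of revealed (i, j) pairs; L: m x r, R: r x n."""
    gL, gR = np.zeros_like(L), np.zeros_like(R)
    for i, j in omega:                        # accumulate local gradients
        err = D[i, j] - L[i, :] @ R[:, j]
        gL[i, :] += -2.0 * err * R[:, j]      # gradient of L_ij w.r.t. L_i
        gR[:, j] += -2.0 * err * L[i, :]      # gradient of L_ij w.r.t. R_j
    L -= eps * gL                             # update parameters
    R -= eps * gR
    return L, R
```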

29 Computing the gradient (example)
Simplest form (unregularized):
  L_ij(L_i, R_j) = (D_ij − L_i R_j)²
Gradient computation:
  ∂L_ij(L_i, R_j) / ∂L_i'k = 0                          if i' ≠ i
                           = −2 R_kj (D_ij − L_i R_j)    if i' = i
Local gradient of entry (i, j) ∈ Ω is nonzero only on row L_i and column R_j.

30 Stochastic gradient descent
Find minimum θ* of function L:
- Pick a starting point θ_0
- Approximate gradient ∇̂L(θ_0)
- Jump approximately downhill
Stochastic difference equation:
  θ_{n+1} = θ_n − ɛ_n ∇̂L(θ_n)
Under certain conditions, asymptotically approximates (continuous) gradient descent
[Plot: distance θ_n − θ* over t for a noisy step-function trajectory.]

31 Stochastic gradient descent for LFMs
Set θ = (L, R) and use
  L(θ) = Σ_{(i,j) ∈ Ω} L_ij(L_i, R_j)
  ∇L(θ) = Σ_{(i,j) ∈ Ω} ∇L_ij(L_i, R_j)
  ∇̂L(θ, z) = N ∇L_{i_z j_z}(L_{i_z}, R_{j_z}),  where N = |Ω| and z = (i_z, j_z) ∈ Ω
SGD epoch:
1. Pick a random entry z ∈ Ω
2. Compute approximate gradient ∇̂L(θ, z)
3. Update parameters: θ_{n+1} = θ_n − ɛ_n ∇̂L(θ_n, z)
4. Repeat N times
SGD step affects only the current row and column.
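
A minimal sketch of one SGD epoch for the squared local loss; the function name and data layout are assumptions for illustration, and the factor N from the slide's gradient estimate is folded into the step size here, as is common in practice.

```python
# Minimal sketch of one SGD epoch: N single-entry updates, each touching
# only row L_i and column R_j of the current entry.
import numpy as np

def sgd_epoch(D, omega, L, R, eps, rng=np.random.default_rng()):
    """omega: list of revealed (i, j) pairs; L: m x r, R: r x n."""
    N = len(omega)
    for _ in range(N):
        i, j = omega[rng.integers(N)]          # pick a random revealed entry
        err = D[i, j] - L[i, :] @ R[:, j]
        Li_old = L[i, :].copy()                # old L_i needed for the R_j update
        L[i, :] += eps * 2.0 * err * R[:, j]   # step affects only row L_i ...
        R[:, j] += eps * 2.0 * err * Li_old    # ... and column R_j
    return L, R
```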

32 SGD in practice
Step size sequence { ɛ_n } needs to be chosen carefully:
- Pick initial step size based on a sample (of some rows and columns)
- Reduce step size gradually
Bold driver heuristic: after every epoch,
- Increase step size slightly (by, say, 5%) when the loss decreased
- Decrease step size sharply (by, say, 50%) when the loss increased
[Plot: mean loss per epoch on Netflix data (unregularized) for L-BFGS, SGD, and ALS.]
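
A minimal sketch of the bold driver heuristic described above; the function name and the default adjustment factors are assumptions matching the example percentages on the slide.

```python
# Minimal sketch of the bold driver heuristic: adapt the step size after
# every epoch depending on whether the training loss went down or up.
def bold_driver(eps, prev_loss, curr_loss, inc=1.05, dec=0.5):
    """Return the step size to use for the next epoch."""
    if curr_loss < prev_loss:
        return eps * inc   # loss decreased: increase step size slightly
    return eps * dec       # loss increased: decrease step size sharply
```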

33 Outline
1. Collaborative Filtering
2. Matrix Completion
3. Algorithms
4. Summary

34 Lessons learned
- Collaborative filtering methods learn from past user behavior
- Latent factor models are the best-performing single approach for collaborative filtering
  - But often combined with other methods
- Users and items are represented in a common latent factor space
  - Holistic matrix-factorization approach
  - Similar users/items placed at similar positions
  - Low-rank assumption = few factors influence user preferences
- Close relationship to the matrix completion problem
  - Reconstruct a partially observed low-rank matrix
- SGD is a simple and practical algorithm to solve LFMs in summation form

35 Suggested reading
- Y. Koren, R. Bell, C. Volinsky: Matrix factorization techniques for recommender systems. IEEE Computer, 42(8), 2009.
- E. Candès, B. Recht: Exact matrix completion via convex optimization. Communications of the ACM, 55(6), 2012.
- And references in the above articles
