Recommender systems, matrix factorization, variable selection and social graph data

Size: px

Start display at page:

Download "Recommender systems, matrix factorization, variable selection and social graph data"

Helen Armstrong
6 years ago
Views:

1 Recommender systems, matrix factorization, variable selection and social graph data Julien Delporte & Stéphane Canu StatLearn, april 205, Grenoble

2 Road map Model selection for matrix factorization Recommender systems Factorize to recommend Model selection 2 Socially enabled preference learning Tuenti s problem Modeling social preferences Alternating least square minimization Experimental results on real data

3 Recommendations

4 Recommend items, movies, lines of code...

5 Recommendation as matrix completion V Y U d? d F = <U ; V > hui, Vj i estimates the desire of user i for the product j

6 Problem Model Y = F + R observation = information + noise F regular and R noise matrix Problem: build F to estimate F min F L(Y, }{{ F) } data loss + λ Ω( F) }{{} penalization for λ well chosen F = UV t low rank factorization

7 Problem Model Y = F + R observation = information + noise F regular and R noise matrix Problem: build F to estimate F min F L(Y, }{{ F) } data loss + λ Ω( F) }{{} penalization for λ well chosen F = UV t low rank factorization

8 How to estimate U et V? Low rank matrix Y approximation min F with L(Y, F) := Y F 2 F rank( F) = d Eckart & Young 936 Solution: principal component analysis (PCA) F = UV t Y n = normalise(y ) Large scale adaptation C = Y n Y n OUT OF MEMORY d = 3 we need more 200 V, Σ = eig(c, d) full sparse SVD F = UV

9 How to estimate U et V? Low rank matrix Y approximation min F with L(Y, F) := Y F 2 F rank( F) = d Eckart & Young 936 Solution: principal component analysis (PCA) F = UV t Y n = normalise(y ) Large scale adaptation C = Y n Y n OUT OF MEMORY d = 3 we need more 200 V, Σ = eig(c, d) full sparse SVD F = UV

10 How to implement PCA (to retrieve U and V )? Singular value decomposition (SVD) F = U t V (= ŨΣV t ) Exact solution on sparse matrices (??? = 0) svds too slow use adapted Lanczos process bidiagonalization Y = WBZ diagonalize B (using Givens rotations GVL 8.6.) adapted software: PROPACK a a rmunk/propack/ >> size(y) ans = >> d = 200; >> tic >> [U,D,V] = lansvd(y,d, L ); >> toc Elapsed time is seconds. 27 sec at

Factorize for million Netflix Challenge (2007-2009) Task: movies recommendation CineMatch: Netflix rec.

11 Factorize for million Netflix Challenge ( ) Task: movies recommendation CineMatch: Netflix rec. engine Improve performances by 0% Criterion: square error Data users movies 00 millons scores (from to 5) sparcity rate:,3 % test on 3 millions (most recent ones) download

12 baseline: CineMatch factorization: SVD with d factors U i, V j IR d min U,V (Y ij Ui V j ) 2 + λ( U i 2 + V i 2 ) i,j normalization global mean µ, for user i b i (Y ij Ui V j µ b i p j ) 2 + λ( U i 2 + V j 2 ) min U,V i,j weight implicit feedback c ij confidence level min U,V c ij (Y ij Ui V j µ b i p j ) 2 + λ( U i 2 + V j 2 ) i,j time (Y ij U i (t) V j µ b i (t) p j (t)) 2 + λ( U i 2 + V j 2 ) min U,V i,j mixture of models

13 Netflix results initiale error: factorization - 4 % number of factors -.5 % improvments % normalization implicit feedback time bagging - 3 % = mixing 00 methods Koren, Y., Bell, R.M., Volinsky, C.: Matrix factorization techniques for recommender systems. IEEE Computer 42(8), (2009)

14 Lessons from Netflix too specific efforts to be generalized factorization is the solution implicit feedback helps take into account the specificities (time effect) stochastic gradient (thanks Yann) flexible scalable

15 Road map Model selection for matrix factorization Recommender systems Factorize to recommend Model selection 2 Socially enabled preference learning Tuenti s problem Modeling social preferences Alternating least square minimization Experimental results on real data

16 Model selection for factorization Objectif How to choose d (the number of latent variables) V Y U d? d F = <U ; V >

17 Chosing d is chosing λ Modèle Estimator F Y = F + R i.i.d. R ij N (0, s 2 ) F = argmin Y M 2 + pen(λ, Y, M) M Certain pen SVD + shrinkage F = f λ (Y ) (Candès et al, 202) Y = UΣV then F = fλ (Y ) = p k= f (k) λ (σ k)u k V k Examples f (k) λ (σ k) = max(0, σ k λ) (soft shrinkage) = max(0, min(σ, γ(σ k λ)) (firm shrinkage)

18 A criteria to choose λ model: Y = F + R R ij N (0, s 2 ) i.i.d. estimator family: F = f λ (Y ) estimator cost: prediction error C(Y, λ) = F f λ (Y ) 2 = i j (F ij f λ (Y ) ij ) 2 Oracle: λ = argmin F f λ (Y ) 2 λ Stein Unbiased Risk Estimator (Stein, 977) δ 0 (Y, λ) = Y f λ (Y ) 2 + (2div Y ( fλ (Y ) ) np)ŝ 2 If R is i.i.d. Gaussian, then δ 0 (Y, λ) is an unbiased estimator of F f λ (Y ) 2 (SURE) λ = argmin δ 0 (Y, λ) λ

19 SURE δ 0 (Y, λ) min λ δ 0 (Y, λ) = Y f λ (Y ) 2 + (2div Y ( fλ (Y ) ) np)ŝ 2 Y f λ (Y ) 2 empirical error f λ the shrinking function and f λ its derivative ŝ 2 unbiased estimator independent from s 2 div Y ( fλ (Y ) ) degrees of freedom n lines and p columns The divergence div Y ( f λ(y ) ) = p i= ( f λ (σ i) + (n p) f ) λ(σ i ) + 2 σ i p i= p σ i f λ (σ i ) σi 2 σj 2 j= j i σ i small f λ (σ i ) = 0

20 Starting point with min λ δ 0 (Y, λ) = Y f λ (Y ) 2 + (2div Y ( fλ (Y ) ) np)ŝ 2 div Y ( f (Y ) ) = p i= ( f i (σ i) + (n p) f ) i(σ i ) + 2 σ i p i= p j= j i σ i f i (σ i ) σ 2 i σ 2 j Scaling issues prior: propose other functions f λ. div doesn t scale: estimate div. variance ŝ 2 is supposed to be known

21 Thresholding F = f λ (Y ) = d k= f (k) λ (σ k)u k V k f (k) λ (σ k) = max(0, σ k λ) (σ k) = MCP f (k) λ f (k) λ (σ k) = SCAD Soft Mcp Scad Adalasso Soft Mcp Scad Adalasso penalty functions (left) and associated thresholds (right)

22 Other thresholding functions Soft Mcp Scad Ada Risque estimé λ Risk estimation as a function of λ

23 Divergence estimation Problem div Y ( f (Y ) ) = p i= ( f i (σ i) + (n p) f ) i(σ i ) + 2 σ i p i= p σ i f i (σ i ) j= j i σ 2 i σ 2 j Solution calculate k largest singular values. estimate the remaning part of the spectrum.

24 Divergence estimation (II) Theorem (Marchenko-Pastur) Let X L(p, n) a centered random matrix. The eigenvalues λ i of the the matrix n X X are distributed (when considered as random variables) according to a Marchenko-Pastur law of parameter c > 0 with probability density { P c (x) = 2πxc (x a)(b x) si a x b, 0 sinon. a = ( c) 2 and b = ( + c) 2.

25 Divergence estimation (III) divergence snr = 0, x réelle marchenko linéaire divergence λ x0 4 snr = 2 0 réelle marchenko linéaire λ Comparizon of divergence estimations

26 Correlated case (with D. Fourdrinier) Y = F + R R N (0, Σ) Elliptically Symmetric Distributions If R admit a density, x Σ n/2 g ( tr(xσ x t ) ), Invariant loss F F 2 Σ = tr(f F)Σ (F F) δ inv 0 = c Y f λ (Y ) 2 S + 2 div Y f λ (Y ) np is IE -unbiased (with IE = IE in the gaussian case) S estimating Σ c = n k p

27 Summary factorization works it is adaptable some more work remains to provide robust & efficient model selection

28 Recommendation Recommandation & liens 26 / 36 Road map Model selection for matrix factorization Recommender systems Factorize to recommend Factorization and singular values decomposition Netflix and the factorization Model selection Model selection for factorization Scaling 2 Socially enabled preference learning Tuenti s problem Modeling social preferences Alternating least square minimization Experimental results on real data

29 Recommendation Recommandation & liens 27 / 36 Tuenti s problem Size 3M people 00K places 200M social connexions user user 2 user 3 user Really sparse 4 places per user 20 friends per user

30 Recommendation Recommandation & liens Objective V Y U d? d F = <U ; V > 28 / 36

31 Recommendation Recommandation & liens Objective V Y U d? d A 28 / 36

32 Recommendation Recommandation & liens 29 / 36 Existing approaches Cost functions min L (Y, UV ) + µl 2 (A, UU ) + λω(u, V ) Yang et al. (20) U,V n min L(Y, UV ) + µ U i U k 2 F U,V F i + λω(u, V ) Ma et al. (20) i= k F i F i set of i s friends A ii = if i F i Decision function min U,V (i,j) Y ( c ij Ui V j + A ik U k V j ) 2 Y ij + ΩU,V,A Ma et al. (2009) F i k F }{{ i } ˆF ij

33 Recommendation Recommandation & liens 30 / 36 Decision function Decision function ˆF ij = U i V j + k Fi A ik U k V j F i Ma et al. (2009) () Loss function min U,V,A (i,j) Y c ij ( Ui V j + k F i A ik U k V j F i Y ij ) 2 + ΩU,V,A (2) c ij penalizing more than 0. Ω U,V,A = λ U 2 Fro + λ 2 V 2 Fro + λ 3 A 2 Fro Contribution Learn friendship influence

34 Recommendation Recommandation & liens 3 / 36 Algorithm Alternating least square minimization SE CoFi Input: Y and F intialize U,V and A Repeat For each user i U update U i For each item j V update V j For each user i U For each user k F i update A ik Until convergence

35 Recommendation Recommandation & liens 32 / 36 Tuenti and Epinions dataset Tuenti: 3 M users and 00 k places (20 friends) Epinion: 50 k users et 00 k items (5 friends)

36 Recommendation Recommandation & liens Tuenti and Epinions 0.95 imf SE CoFi LLA RSR Trust average MAP Nombre de facteurs d RANK imf SECoFi LLA RSR Trust Average Nombre de facteurs d Tuenti: 3 M users and 00 k places (20 friends) SECoFi Trust LLA RSR imf average Epinion: 50 k users et 00 k items (5 friends) MAP RANK using 24 cores with 00Go RAM Nombre de facteurs d SECoFi Trust LLA RSR imf average Nombre de facteurs d 33 / 36

37 Recommendation Recommandation & liens 34 / 36 Back on friends influence frequency of the histogram Distribution Tuenti values of ἀalpha Epinion Affinity principle (homophily effect), no negative influence

38 Recommendation Recommandation & liens 35 / 36 Conclusion Summary matrix AND graph large scale datasets recommendation improvement influence measure Perspectives take advantage of the framework take into account more contextual data GPS, places types,... towards friends recommendations (learn α)

39 Recommendation Recommandation & liens Questions? V Y U d? d F = <U ; V > Matrix factorization, recommender systems, personalized with preferences Julien DELPORTE PhD 36 / 36

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms François Caron Department of Statistics, Oxford STATLEARN 2014, Paris April 7, 2014 Joint work with Adrien Todeschini,