Interaction-Rich Transfer Learning for Collaborative Filtering with Heterogeneous User Feedback

Weike Pan and Zhong Ming, Shenzhen University

A novel and efficient transfer learning algorithm called interaction-rich transfer by collective factorization extends the efficient collective matrix factorization algorithm by providing more interactions between the user-specific latent features.

Facing the information flood in our daily lives, search engines mainly respond to our submitted queries passively, while recommender systems aim to discover and meet our needs in a more active way. Collaborative filtering techniques1-4 have been applied in various recommendation-embedded applications. However, the lack of accurate user preference data (for example, five-star numerical ratings) might limit this approach's applicability in real deployments. On the other hand, a real recommender system can usually make use of additional types of user feedback (for example, binary ratings of likes and dislikes).5 Hence, collaborative filtering with different types of user feedback offers a potential way to address the sparsity of accurate graded ratings. Here, we focus on this new research problem of collaborative filtering with heterogeneous user feedback, which few prior works have addressed.

A recent work proposed a transfer learning algorithm called transfer by collective factorization (TCF) that exploits such heterogeneous user feedback.5 TCF addresses the data sparsity problem by simultaneously sharing data-independent knowledge and modeling the data-dependent effect of the two types of feedback. However, TCF is a batch algorithm that updates model parameters only once per scan of the whole dataset, which might not be applicable to large datasets.
By contrast, stochastic methods such as regularized singular value decomposition (RSVD)3 and collective matrix factorization (CMF)6 are empirically much more efficient than batch-style algorithms such as probabilistic matrix factorization (PMF)7 and TCF. However, the prediction accuracy of RSVD and CMF might not be adequate compared with that of TCF, especially when the users' feedback is heterogeneous. There are also efficient distributed and online collaborative filtering algorithms, such as distributed stochastic gradient descent8 and online multitask collaborative filtering,9 but they're designed for homogeneous user feedback rather than the heterogeneous feedback studied in this article.

1541-1672/14/$31.00 © 2014 IEEE. Published by the IEEE Computer Society.

Related Work in Transfer Learning in Collaborative Filtering

Transfer learning in collaborative filtering (TLCF)1,2 is an emerging interdisciplinary topic that aims to design transfer learning3 solutions to the challenges in collaborative filtering,4 such as rating sparsity. Parallel to transfer learning in text mining, TLCF has developed a family of new algorithms: model-, instance-, and feature-based transfer, which answer the question of what to transfer from the perspective of shared knowledge; and adaptive, collective, and integrative algorithms, which answer the question of how to transfer from the perspective of algorithm styles. We can categorize the proposed interaction-rich transfer by collective factorization (iTCF) algorithm as a feature-based (what to transfer), collective (how to transfer) transfer learning method. The works most closely related to our iTCF are transfer by collective factorization5 and collective matrix factorization,6 because they're also feature-based collective algorithms.

References

1. B. Li, Q. Yang, and X. Xue, "Transfer Learning for Collaborative Filtering via a Rating-Matrix Generative Model," Proc. 26th Ann. Int'l Conf. Machine Learning, 2009, pp. 617-624.
2. W. Pan, E.W. Xiang, and Q. Yang, "Transfer Learning in Collaborative Filtering via Uncertain Ratings," Proc. 26th AAAI Conf. Artificial Intelligence, 2012.
3. S.J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 10, 2010, pp. 1345-1359.
4. D. Goldberg et al., "Using Collaborative Filtering to Weave an Information Tapestry," Comm. ACM, vol. 35, no. 12, 1992, pp. 61-70.
5. W. Pan and Q. Yang, "Transfer Learning in Heterogeneous Collaborative Filtering Domains," Artificial Intelligence, vol. 197, Apr. 2013, pp. 39-55.
6. A.P. Singh and G.J. Gordon, "Relational Learning via Collective Matrix Factorization," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2008, pp. 650-658.
In this work, we aim to achieve a good balance between the accuracy of TCF and the efficiency of CMF (see the related sidebar for information on others' efforts). We extend the CMF algorithm by introducing richer interactions between the user-specific latent features, and design a corresponding algorithm called interaction-rich transfer by collective factorization (iTCF). In particular, we assume that the predictability of the same user's rating behaviors in the related numerical and binary ratings is likely to be similar. With this assumption, we design update rules that share not only the item-specific latent features, as in CMF, but also the user-specific latent features in a smooth manner. The iTCF algorithm thus introduces more interactions between user-specific latent features. Experimental results on three real-world datasets show the effectiveness of our iTCF over RSVD and CMF.

Background

The studied problem setting is exactly the same as that of TCF. We have n users and m items in a target numerical rating matrix R = {r_ui} ∈ {1, 2, 3, 4, 5, ?}^(n×m) and an auxiliary binary rating matrix R̃ = {r̃_ui} ∈ {0, 1, ?}^(n×m), where ? denotes a missing value. The users and items in the two data types are the same, and a one-to-one mapping is given. Our goal is to transfer knowledge from R̃ to help predict the missing values in R. We illustrate the problem setting in Figure 1a. Note that in this article, we aim to design an efficient transfer learning algorithm, because the lack of efficiency is a major limitation of TCF.

PMF

PMF models the preference data via two latent feature matrices,

R ≈ UVᵀ, (1)

[Figure 1. Problem setting and solutions. (a) Illustration of the studied problem setting: sparse target data R (5-star graded ratings) and sparse auxiliary data R̃ (binary ratings of likes/dislikes).]
[Figure 1, continued. Two transfer learning solutions: (b) collective matrix factorization (CMF) and (c) interaction-rich transfer by collective factorization (iTCF).]

where the target numerical rating matrix R is factorized into a user-specific latent feature matrix U ∈ R^(n×d) and an item-specific latent feature matrix V ∈ R^(m×d). Once we have obtained the latent feature matrices, we can predict the rating located at (u, i) via r̂_ui = U_u V_iᵀ, where U_u ∈ R^(1×d) and V_i ∈ R^(1×d) are user u's and item i's latent feature vectors, respectively.

CMF

CMF uses item-side auxiliary data by sharing the same item-specific latent features. We use matrix notation

to illustrate its idea of knowledge sharing:

R ≈ UVᵀ, R̃ ≈ WVᵀ, (2)

where the auxiliary binary rating matrix R̃ is decomposed into a user-specific latent feature matrix W ∈ R^(n×d) and an item-specific latent feature matrix V ∈ R^(m×d). The knowledge encoded in the item-specific latent feature matrix V is shared between the two factorization systems, while the two user-specific latent feature matrices W and U aren't shared. For our problem, with the same users and same items in the target and auxiliary data, we reach the following factorization systems:

R ≈ UVᵀ, R̃ ≈ WVᵀ, s.t. W = U, (3)

which means that CMF reduces to PMF on a pool of both target and auxiliary data, R ∪ R̃. However, such a reduced approach causes performance degradation, because it ignores the heterogeneity of the users' feedback in R and R̃. Obviously, the semantic meaning of likes and dislikes in the auxiliary data differs from that of graded ratings in the target data.

Our Solution

Now let's look at the solution we propose.

iTCF

We can see that PMF in Equation 1 doesn't use auxiliary data, CMF in Equation 2 only uses item-side auxiliary data, and CMF in Equation 3 reduces to PMF without distinguishing the heterogeneity of user feedback. The question we ask in this article is whether we can transfer more knowledge beyond sharing the item-specific latent features in CMF, as shown in Equation 2 and Figure 1b. There's some potential we can exploit because the users in both data types are the same. For a typical user, a model's prediction accuracy trained on the target data of numerical ratings or the auxiliary data of binary ratings is likely to be similar, because a user's preference variation, and a model's ability to capture that preference, usually doesn't change much between two related datasets. With this assumption, we reach the following factorization systems:

R ≈ UVᵀ, R̃ ≈ WVᵀ, s.t.
E = Ẽ, (4)

where E and Ẽ denote the corresponding errors of the prediction model on the two data types, representing the predictability of user preferences. The main difference between Equations 2 and 4 is the shared predictability in Equation 4, denoted as E = Ẽ. We expand the matrix formulation in Equation 4 as follows:

min_{W_u, U_u, V_i, b_u, b_i, μ} Σ_{u=1..n} Σ_{i=1..m} f_ui, s.t. e_ui = ẽ_ui, (5)

where f_ui = y_ui[(1/2)e_ui² + R_ui] + λỹ_ui[(1/2)ẽ_ui² + R̃_ui] is a balanced loss function on the two data types with λ > 0. Note that e_ui = r_ui − r̂_ui and ẽ_ui = r̃_ui − r̂̃_ui are the errors of the prediction model on the target data and auxiliary data, respectively, where r̂_ui = μ + b_u + b_i + U_u V_iᵀ and r̂̃_ui = W_u V_iᵀ are the estimated preferences, μ is the global average, b_u is the bias of user u, and b_i is the bias of item i. The variables y_ui and ỹ_ui indicate whether the entry located at (u, i) is observed in the target data and auxiliary data, respectively. R_ui = (α_u/2)‖U_u‖² + (α_v/2)‖V_i‖² + (β_u/2)b_u² + (β_v/2)b_i² and R̃_ui = (α_w/2)‖W_u‖² + (α_v/2)‖V_i‖² + (β_u/2)b_u² + (β_v/2)b_i² are regularization terms used to avoid overfitting when learning the latent variables.

Learning iTCF

To solve the optimization problem in Equation 5, we start from the gradients, which will be used in the stochastic gradient descent framework.

Learning parameters using the target data. Given a rating r_ui from the target data with y_ui = 1 and ỹ_ui = 0, we have the gradients ∇U_u = −e_ui V_i + α_u U_u, ∇V_i = −e_ui U_u + α_v V_i, ∇b_u = −e_ui + β_u b_u, ∇b_i = −e_ui + β_v b_i, and ∇μ = −e_ui for U_u, V_i, b_u, b_i, and μ, respectively. Besides using these gradients to update the target parameters, we can also make use of the auxiliary variables W_u to update the target item-specific latent feature vector V_i, because the predictability is assumed to be similar and can be shared; that is, e_ui = ẽ_ui.
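To make the objective concrete, the per-entry loss f_ui and its two error terms can be sketched in NumPy as follows (our own illustrative code, not the authors' implementation; the hyperparameter values follow the article's experimental settings, and all rating values are hypothetical):

```python
import numpy as np

# Tradeoff parameters as in the experiments (all alphas and betas 0.01).
lam = 1.0
alpha_u = alpha_v = alpha_w = 0.01
beta_u = beta_v = 0.01

def f_ui(r_ui, r_aux_ui, y_ui, y_aux_ui, mu, b_u, b_i, U_u, V_i, W_u):
    """Balanced per-entry loss from Equation 5."""
    e_ui = r_ui - (mu + b_u + b_i + U_u @ V_i)   # target error e_ui
    e_aux_ui = r_aux_ui - W_u @ V_i              # auxiliary error ~e_ui
    reg = 0.5 * (alpha_u * U_u @ U_u + alpha_v * V_i @ V_i
                 + beta_u * b_u ** 2 + beta_v * b_i ** 2)
    reg_aux = 0.5 * (alpha_w * W_u @ W_u + alpha_v * V_i @ V_i
                     + beta_u * b_u ** 2 + beta_v * b_i ** 2)
    # y_ui and y_aux_ui act as observation indicators.
    return (y_ui * (0.5 * e_ui ** 2 + reg)
            + lam * y_aux_ui * (0.5 * e_aux_ui ** 2 + reg_aux))
```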
Given ẽ_ui, we have the gradient of V_i in the auxiliary data, ∇V_i = −ẽ_ui W_u + α_v V_i. We combine the two gradients for the item-specific latent feature vector V_i:

∇V_i = ρ(−e_ui U_u + α_v V_i) + (1 − ρ)(−ẽ_ui W_u + α_v V_i)
     = ρ(−e_ui U_u + α_v V_i) + (1 − ρ)(−e_ui W_u + α_v V_i)
     = −e_ui[ρU_u + (1 − ρ)W_u] + α_v V_i,

where 0 ≤ ρ ≤ 1 is a parameter used to linearly integrate the two gradients. Comparing ρU_u + (1 − ρ)W_u with the U_u in the gradient ∇V_i, we can see that more interactions between the user-specific latent features U_u and W_u are introduced, which is also illustrated via the graphical models in Figure 1c. For this reason, we call ρ an interaction parameter between the user-specific latent features. We can see that the shared predictability will introduce more interactions between the user-specific

[Figure 2. The iTCF algorithm.
Input: The target user-item numerical rating matrix R; the auxiliary user-item binary rating matrix R̃.
Output: The user-specific latent feature vectors U_u and biases b_u, the user-specific latent feature vectors W_u, the item-specific latent feature vectors V_i and biases b_i, and the global average μ, where u = 1, ..., n and i = 1, ..., m.
For t = 1, ..., T
  For iter = 1, ..., q + q̃
    Step 1. Randomly pick a rating from R or R̃;
    Step 2. Calculate the gradients as shown in Eqs. 6-10 if y_ui = 1, or Eqs. 10-11 if ỹ_ui = 1;
    Step 3. Update the parameters as shown in Eq. 12.
  End
End]

latent feature matrices U and W via ρU_u + (1 − ρ)W_u in ∇V_i. For this reason, we call the proposed approach iTCF, representing interaction-rich transfer by collective factorization.

Learning parameters using the auxiliary data. Similarly to the target data, given a rating r̃_ui from the auxiliary data with ỹ_ui = 1 and y_ui = 0, we have the following gradients: ∇W_u = −λẽ_ui V_i + λα_w W_u and ∇V_i = −λẽ_ui[ρW_u + (1 − ρ)U_u] + λα_v V_i, where 0 ≤ ρ ≤ 1 is again an interaction parameter used to combine two gradients. Similarly, more interactions are introduced between the user-specific latent features W_u and U_u in ρW_u + (1 − ρ)U_u.

The algorithm. We thus have the gradients, given a target numerical rating (y_ui = 1) or an auxiliary binary rating (ỹ_ui = 1), as follows:

∇b_u = −e_ui + β_u b_u, if y_ui = 1; (6)
∇b_i = −e_ui + β_v b_i, if y_ui = 1; (7)
∇μ = −e_ui, if y_ui = 1; (8)
∇U_u = −e_ui V_i + α_u U_u, if y_ui = 1; (9)
∇V_i = −e_ui Z_u + α_v V_i, if y_ui = 1; or ∇V_i = −λẽ_ui Z̃_u + λα_v V_i, if ỹ_ui = 1; (10)
∇W_u = −λẽ_ui V_i + λα_w W_u, if ỹ_ui = 1; (11)

where Z_u = ρU_u + (1 − ρ)W_u and Z̃_u = ρW_u + (1 − ρ)U_u. We can see that when ρ = 1, we have Z_u = U_u and Z̃_u = W_u, which are exactly the same as in CMF. CMF is thus a special case of iTCF that only shares the item-specific latent feature matrix V, with ρ = 1.
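One pass of the update rules above can be sketched in plain NumPy (an illustrative sketch under our own data-layout assumptions, not the authors' implementation; `alpha` stands in for α_u = α_v = α_w and `beta` for β_u = β_v, as in the article's parameter settings):

```python
import random
import numpy as np

def itcf_sgd_epoch(R, R_aux, U, V, W, b_u, b_i, mu,
                   rho=0.5, lam=1.0, gamma=0.01, alpha=0.01, beta=0.01):
    """One pass over the pooled target and auxiliary ratings, applying the
    update rules of Equations 6-12. R and R_aux are lists of (u, i, r)
    triples; U, V, W are latent feature matrices, b_u and b_i bias vectors,
    and mu the global average."""
    pool = [(t, True) for t in R] + [(t, False) for t in R_aux]
    random.shuffle(pool)                         # Step 1: random order
    for (u, i, r), is_target in pool:
        if is_target:                            # y_ui = 1: Eqs. 6-10
            e = r - (mu + b_u[u] + b_i[i] + U[u] @ V[i])
            Z = rho * U[u] + (1 - rho) * W[u]    # interpolated user factor
            grad_U = -e * V[i] + alpha * U[u]
            grad_V = -e * Z + alpha * V[i]
            b_u[u] -= gamma * (-e + beta * b_u[u])
            b_i[i] -= gamma * (-e + beta * b_i[i])
            mu -= gamma * (-e)
            U[u] -= gamma * grad_U
            V[i] -= gamma * grad_V
        else:                                    # ~y_ui = 1: Eqs. 10-11
            e = r - W[u] @ V[i]
            Z = rho * W[u] + (1 - rho) * U[u]
            W[u] -= gamma * lam * (-e * V[i] + alpha * W[u])
            V[i] -= gamma * lam * (-e * Z + alpha * V[i])
    return mu
```

An outer loop of T such passes, with the learning rate gamma decayed after each pass, completes the procedure of Figure 2.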
When 0 < ρ < 1, the term Z_u = ρU_u + (1 − ρ)W_u in iTCF can be considered a smooth version of the U_u used alone in CMF, which is likely to be more stable in the stochastic gradient descent (SGD) framework. Finally, we have the update rule

θ = θ − γ∇θ, (12)

where θ can be b_u, b_i, μ, U_u, or V_i when y_ui = 1, and V_i or W_u when ỹ_ui = 1. Note that γ > 0 is the learning rate. Figure 2 describes the complete algorithm with the previously discussed update rules; it goes over both the target and auxiliary data T times. The time complexity of iTCF and CMF is O(T(q + q̃)d), and that of RSVD is O(Tqd), where q and q̃ are the numbers of ratings in the target and auxiliary data, respectively. The learning algorithm of iTCF is much more efficient than that of TCF, because iTCF is a stochastic algorithm while TCF is a batch one. Note that TCF can't use similar stochastic update rules because of the orthonormal constraints on the user-specific and item-specific latent feature matrices in its matrix trifactorization model; its time complexity is O(K max(q, q̃)d³ + Kd⁶), with K the iteration number.5

The difference between TCF and our iTCF can be seen in the two fundamental questions of transfer learning.10 To answer the question of what to transfer, TCF shares latent features, while our iTCF shares both latent features and the predictability; for how to transfer, TCF adopts matrix trifactorization and a batch-style implementation, while our iTCF uses the more efficient matrix bifactorization and a stochastic-style implementation.

Experimental Results

Next, we tested the algorithm to determine its performance.

Table 1. Description of the Netflix subset (n = m = 5,000), MovieLens10M (n = 71,567, m = 10,681), and Flixter (n = 147,612, m = 48,794) data used in the experiments.

Dataset        | Data              | Form               | Sparsity (%)
Netflix subset | Target (training) | {1, ..., 5, ?}     | 0.8
               | Target (test)     | {1, ..., 5, ?}     | 11.3
               | Auxiliary         | {dislike, like, ?} | 2
MovieLens10M   | Target (training) | {0.5, ..., 5, ?}   | 0.52
               | Target (test)     | {0.5, ..., 5, ?}   | 0.26
               | Auxiliary         | {dislike, like, ?} | 0.52
Flixter        | Target (training) | {0.5, ..., 5, ?}   | 0.046
               | Target (test)     | {0.5, ..., 5, ?}   | 0.023
               | Auxiliary         | {dislike, like, ?} | 0.046

Table 2. Prediction performance of iTCF and other methods on the Netflix subset data.*

Algorithm                                       | Style      | Mean absolute error (MAE) | Root mean square error (RMSE)
Probabilistic matrix factorization (PMF)        | Batch      | 0.7642 ± 0.0003 | 0.9691 ± 0.0007
CMF-link                                        | Batch      | 0.7295 ± 0.0003 | 0.9277 ± 0.0004
TCF (CMTF)**                                    | Batch      | 0.6962 ± 0.0009 | 0.8884 ± 0.0007
TCF (CSVD)**                                    | Batch      | 0.6877 ± 0.0007 | 0.8809 ± 0.0005
Regularized singular value decomposition (RSVD) | Stochastic | 0.7236 ± 0.0003 | 0.9201 ± 0.0004
CMF                                             | Stochastic | 0.7054 ± 0.0002 | 0.9020 ± 0.0003
iTCF                                            | Stochastic | 0.7014 ± 0.0005 | 0.8966 ± 0.0004

* For stochastic algorithms, the interaction parameter ρ is fixed at 0.5 and the number of iterations at 50. Batch algorithm results are from other work.5
** CMTF = collective matrix trifactorization; CSVD = collective singular value decomposition.

Datasets and Evaluation Metrics

We extracted the first dataset from Netflix (see www.netflix.com) in the same way as in other work.5 The data contains three copies of numerical ratings and binary ratings assigned by 5,000 users to 5,000 items. Note that we used this small dataset for empirical studies among iTCF, TCF, and other methods, because TCF might not scale well to large datasets. We extracted the second dataset from MovieLens10M (see www.grouplens.org/node/73/) in the same way as in other work.11 The data contains five copies of target, auxiliary, and test data.
For each copy of auxiliary data, we convert ratings smaller than 4 to dislike and ratings larger than or equal to 4 to like, to simulate binary feedback. We extracted the third dataset from Flixter (see www.cs.ubc.ca/~jamalim/datasets).12 This data contains 8.2 × 10⁶ ratings given by 1.5 × 10⁵ users on 4.9 × 10⁴ products. We preprocess the Flixter rating data in the same way as the MovieLens10M data to generate five copies of target, auxiliary, and test data. Table 1 shows the detailed statistics of the datasets used in the experiments. For iTCF, RSVD, and CMF, dislike and like are replaced with the numerical values 1 and 5, respectively, so that the target and auxiliary data lie in the same rating range.

We adopt two commonly used evaluation metrics in recommender systems, mean absolute error (MAE) and root mean square error (RMSE):

MAE = Σ_{(u,i,r_ui) ∈ E} |r_ui − r̂_ui| / |E|,
RMSE = sqrt( Σ_{(u,i,r_ui) ∈ E} (r_ui − r̂_ui)² / |E| ),

where r_ui and r̂_ui are the true and predicted ratings, respectively, and |E| is the number of test ratings.

Baselines and Parameter Settings

We compare our iTCF algorithm with some batch algorithms5 on the small dataset, because batch algorithms aren't very efficient. We also compare iTCF with two stochastic algorithms, RSVD and CMF, on the two aforementioned large datasets. For iTCF, RSVD, and CMF, the model parameters μ, b_u, b_i, U_uk, V_ik, and W_uk, k = 1, ..., d, are initialized exactly as in previous work.11 The tradeoff parameters are set similarly to those used by Yehuda Koren:3 α_u = α_v = α_w = 0.01 and β_u = β_v = 0.01. The learning rate is initialized as γ = 0.01 and decreased via γ ← γ × 0.9 after every scan of both the target and auxiliary data.3 For the Netflix subset data, we set the number of latent features to d = 10;5 for the MovieLens10M data, we use d = 20;13 and for the Flixter data, we use d = 10.
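The two metrics above can be computed directly; a minimal sketch over lists of true and predicted test ratings (illustrative helper functions, not part of the article):

```python
import numpy as np

def mae(r_true, r_pred):
    """Mean absolute error over the test ratings."""
    r_true, r_pred = np.asarray(r_true), np.asarray(r_pred)
    return np.abs(r_true - r_pred).mean()

def rmse(r_true, r_pred):
    """Root mean square error over the test ratings."""
    r_true, r_pred = np.asarray(r_true), np.asarray(r_pred)
    return np.sqrt(((r_true - r_pred) ** 2).mean())

# e.g. mae([5, 3], [4, 3]) -> 0.5, rmse([5, 3], [4, 3]) -> ~0.7071
```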
To study the effectiveness of interactions between user-specific latent features, we report results for different values of ρ ∈ {0, 0.2, 0.4, 0.6, 0.8, 1}. Note that when ρ = 1, iTCF reduces to CMF. The value of λ is fixed at 1, giving the same weight to auxiliary and target data, for the MovieLens10M and Flixter data, and is fixed at 10 for the Netflix subset data.

Results

Now, let's study the results of the algorithm's performance.

Table 3. Prediction performance of iTCF and other methods on MovieLens10M and Flixter data.*

Data         | Metric | RSVD            | CMF             | iTCF
MovieLens10M | MAE    | 0.6438 ± 0.0011 | 0.6334 ± 0.0012 | 0.6197 ± 0.0006
MovieLens10M | RMSE   | 0.8364 ± 0.0012 | 0.8273 ± 0.0013 | 0.8091 ± 0.0008
Flixter      | MAE    | 0.6561 ± 0.0007 | 0.6423 ± 0.0009 | 0.6373 ± 0.0005
Flixter      | RMSE   | 0.8814 ± 0.0010 | 0.8710 ± 0.0012 | 0.8636 ± 0.0010

* The interaction parameter ρ is fixed at 0.5, and the number of iterations at 50.

Comparison with batch algorithms. From Table 2, we can see that the batch algorithm TCF performs better than the proposed stochastic algorithm iTCF, because TCF is able to capture the data-dependent effect and transfer the data-independent knowledge simultaneously in a principled way. The iTCF algorithm aims for efficiency and large data; it beats the other batch algorithms, PMF and CMF-link, and the stochastic algorithms RSVD and CMF. The results of the batch algorithms PMF, CMF-link, and TCF shown in Table 2 are from other research.5

Comparison with stochastic algorithms. From Table 3, we can see that iTCF is again better than RSVD and CMF, which shows the effect of the richer interactions between auxiliary and target data introduced in the proposed transfer learning solution. We can also see that the transfer learning methods CMF and iTCF are both better than RSVD, which shows the usefulness of the auxiliary data and the effectiveness of the knowledge transfer mechanisms in CMF and iTCF.

Impact of the interaction parameter ρ. From Figure 3, we can see that iTCF performs best when 0.2 ≤ ρ ≤ 0.4, which shows that a relatively strong interaction is useful. Note that when ρ = 1, iTCF reduces to CMF, with no interactions between user-specific latent features.

[Figure 3. Prediction performance (RMSE) of iTCF on (a) MovieLens10M data and (b) Flixter data with different ρ values. The number of iterations is fixed at 50.]

In this article, we propose a novel and efficient transfer learning algorithm, iTCF, for collaborative filtering with heterogeneous user feedback. Our iTCF aims to transfer knowledge from auxiliary binary ratings of likes and dislikes to improve target numerical rating prediction in an efficient way. It achieves this by introducing richer interactions, sharing both the item-specific latent features and the predictability across two heterogeneous data types in a smooth manner. Our iTCF is more efficient than a recent batch algorithm (TCF) and performs better than two state-of-the-art stochastic algorithms (RSVD and CMF). For future work, we're interested in generalizing the idea of introducing rich interactions in heterogeneous user feedback to the problem of collaborative filtering with auxiliary information of social context and implicit feedback.14

Acknowledgments

We thank the National Natural Science Foundation of China (no. 61170077 and no. 61272303), NSF GD (no. 10351806001000000), GDS&T (no. 2012B091100198), the S&T project of SZ (no. JCYJ20130326110956468), and the National Basic Research Program of China (973 Plan, no. 2010CB327903) for their support. Zhong Ming is the corresponding author for this work.

References

1. G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 6, 2005, pp. 734-749.
2. D. Goldberg et al., "Using Collaborative Filtering to Weave an Information Tapestry," Comm. ACM, vol. 35, no. 12, 1992, pp. 61-70.
3. Y. Koren, "Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2008, pp. 426-434.
4. S. Rendle, "Factorization Machines with libFM," ACM Trans. Intelligent Systems and Technology, vol. 3, no. 3, 2012, pp. 57:1-57:22.
5. W. Pan and Q.
Yang, "Transfer Learning in Heterogeneous Collaborative Filtering Domains," Artificial Intelligence, vol. 197, Apr. 2013, pp. 39-55.
6. A.P. Singh and G.J. Gordon, "Relational Learning via Collective Matrix Factorization," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2008, pp. 650-658.
7. R. Salakhutdinov and A. Mnih, "Probabilistic Matrix Factorization," Proc.

Ann. Conf. Neural Information Processing Systems, 2008, pp. 1257-1264.
8. R. Gemulla et al., "Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent," Proc. 17th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2011, pp. 69-77.
9. J. Wang et al., "Online Multi-Task Collaborative Filtering for On-the-Fly Recommender Systems," Proc. 7th ACM Conf. Recommender Systems, 2013, pp. 237-244.
10. S.J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 10, 2010, pp. 1345-1359.
11. W. Pan, E.W. Xiang, and Q. Yang, "Transfer Learning in Collaborative Filtering via Uncertain Ratings," Proc. 26th AAAI Conf. Artificial Intelligence, 2012.
12. M. Jamali and M. Ester, "A Matrix Factorization Technique with Trust Propagation for Recommendation in Social Networks," Proc. 4th ACM Conf. Recommender Systems, 2010, pp. 135-142.
13. T.C. Zhou et al., "TagRec: Leveraging Tagging Wisdom for Recommendation," Proc. 2009 Int'l Conf. Computational Science and Eng., 2009, pp. 194-199.
14. N.N. Liu, L. He, and M. Zhao, "Social Temporal Collaborative Ranking for Context-Aware Movie Recommendation," ACM Trans. Intelligent Systems and Technology, vol. 4, no. 1, 2013, pp. 15:1-15:26.

The Authors

Weike Pan is a lecturer in the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include transfer learning, recommender systems, and statistical machine learning. Pan has a PhD in computer science and engineering from the Hong Kong University of Science and Technology, Kowloon, Hong Kong. Contact him at panweike@szu.edu.cn.

Zhong Ming is a professor in the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include software engineering and Web intelligence. Ming has a PhD in computer science and technology from Sun Yat-sen University, Guangzhou, Guangdong, China. He is the corresponding author. Contact him at mingz@szu.edu.cn.