APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS

Similar documents
Collaborative Filtering with Entity Similarity Regularization in Heterogeneous Information Networks

Collaborative topic models: motivations cont

Recommendation Systems

A Modified PMF Model Incorporating Implicit Item Associations

Scaling Neighbourhood Methods

Recommendation Systems

Data Mining Techniques

Point-of-Interest Recommendations: Learning Potential Check-ins from Friends

CS249: ADVANCED DATA MINING

CS145: INTRODUCTION TO DATA MINING

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Recommendation. Tobias Scheffer

Collaborative Filtering. Radek Pelánek

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Circle-based Recommendation in Online Social Networks

Decoupled Collaborative Ranking

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

* Matrix Factorization and Recommendation Systems

Andriy Mnih and Ruslan Salakhutdinov

Clustering based tensor decomposition

Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang

arxiv: v2 [cs.ir] 14 May 2018

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let s put it to use!

Collaborative Filtering on Ordinal User Feedback

Distributed ML for DOSNs: giving power back to users

Modeling User Rating Profiles For Collaborative Filtering

Click Prediction and Preference Ranking of RSS Feeds

CSCI-567: Machine Learning (Spring 2019)

Algorithms for Collaborative Filtering

A Matrix Factorization Technique with Trust Propagation for Recommendation in Social Networks

Latent Dirichlet Allocation

Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach

Generative Models for Discrete Data

CS6220: DATA MINING TECHNIQUES

Prediction of Citations for Academic Papers

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering

RaRE: Social Rank Regulated Large-scale Network Embedding

Recommender Systems. Dipanjan Das Language Technologies Institute Carnegie Mellon University. 20 November, 2007

CS425: Algorithms for Web Scale Data

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

Large-Scale Social Network Data Mining with Multi-View Information. Hao Wang

Aggregated Temporal Tensor Factorization Model for Point-of-interest Recommendation

The Ties that Bind Characterizing Classes by Attributes and Social Ties

Collaborative Topic Modeling for Recommending Scientific Articles

Recommendation Systems

Content-based Recommendation

Collaborative Filtering

Learning to Recommend Point-of-Interest with the Weighted Bayesian Personalized Ranking Method in LBSNs

PROBABILISTIC LATENT SEMANTIC ANALYSIS

Matrix Factorization and Factorization Machines for Recommender Systems

Personalized Ranking for Non-Uniformly Sampled Items

Rating Prediction with Topic Gradient Descent Method for Matrix Factorization in Recommendation

CS6220: DATA MINING TECHNIQUES

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

PathSelClus: Integrating Meta-Path Selection with User-Guided Object Clustering in Heterogeneous Information Networks

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

Collaborative Recommendation with Multiclass Preference Context

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Matrix Factorization and Collaborative Filtering

When Will It Happen? Relationship Prediction in Heterogeneous Information Networks

Sequential Recommender Systems

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Incorporating Social Context and Domain Knowledge for Entity Recognition

Generative Clustering, Topic Modeling, & Bayesian Inference

Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation

Probabilistic Matrix Factorization

Overlapping Communities

Department of Computer Science, Guiyang University, Guiyang , GuiZhou, China

Collaborative Filtering with Aspect-based Opinion Mining: A Tensor Factorization Approach

Data Science Mastery Program

Nonnegative Matrix Factorization

TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation

Time-aware Collaborative Topic Regression: Towards Higher Relevance in Textual Item Recommendation

SQL-Rank: A Listwise Approach to Collaborative Ranking

Collaborative Filtering via Ensembles of Matrix Factorizations

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES

Location Regularization-Based POI Recommendation in Location-Based Social Networks

HBGG: a Hierarchical Bayesian Geographical Model for Group Recommendation

CS145: INTRODUCTION TO DATA MINING

Collaborative Filtering via Different Preference Structures

Matrix Factorization Techniques for Recommender Systems

Techniques for Dimensionality Reduction. PCA and Other Matrix Factorization Methods

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties

CS425: Algorithms for Web Scale Data

Multi-Relational Matrix Factorization using Bayesian Personalized Ranking for Social Network Data

Document and Topic Models: plsa and LDA

Community Detection. fundamental limits & efficient algorithms. Laurent Massoulié, Inria

What is Happening Right Now... That Interests Me?

MEI: Mutual Enhanced Infinite Community-Topic Model for Analyzing Text-augmented Social Networks

Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Recommender Systems with Social Regularization

Side Information Aware Bayesian Affinity Estimation

RECSM Summer School: Facebook + Topic Models. github.com/pablobarbera/big-data-upf

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

NetBox: A Probabilistic Method for Analyzing Market Basket Data

6.034 Introduction to Artificial Intelligence

Non-Negative Matrix Factorization

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015

Transcription:

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS Yizhou Sun College of Computer and Information Science Northeastern University yzsun@ccs.neu.edu July 25, 2015

Heterogeneous Information Networks Multiple object types and/or multiple link types Movie Studio Venue Paper Author DBLP Bibliographic Network Actor Movie Director The IMDB Movie Network The Facebook Network 1. Homogeneous networks are Information loss projection of heterogeneous networks! 2. New problems are emerging in heterogeneous networks! Directly Mining information richer heterogeneous networks 1

Outline Why Heterogeneous Information Networks? Entity Recommendation Information Diffusion Ideology Detection Summary 2

Recommendation Paradigm community useritem feedback feedback user product features recommender system recommendation Collaborative Content-Based Hybrid Methods Filtering Methods E.g., K-Nearest Content-Based (Balabanovic Neighbor Comm. CF (Antonopoulus, ACM (Sarwar 97, Zhang WWW 01), IS 06), SIGIR 02) Matrix Factorization External Knowledge (Hu ICDM 08, CF (Ma Koren WSDM 11) IEEE-CS 09), Probabilistic Model (Hofmann SIGIR 03) external knowledge 3

Problem Definition feedback user implicit user feedback recommender system recommendation hybrid collaborative filtering with information networks information network 4

Hybrid Collaborative Filtering with Networks Utilizing network relationship information can enhance the recommendation quality However, most of the previous studies only use single type of relationship between users or items (e.g., social network Ma,WSDM 11, trust relationship Ester, KDD 10, service membership Yuan, RecSys 11) 5

The Heterogeneous Information Network View of Recommender System Avatar Aliens Titanic Revolution -ary Road Zoe Saldana Adventure James Cameron Leonardo Dicaprio Kate Winslet Romance 6

# of ratings Relationship Heterogeneity Alleviates Data Sparsity Collaborative filtering methods suffer from data sparsity issue A small number of users and items have a large number of ratings Most users and items have a small number of ratings # of users or items Heterogeneous relationships complement each other Users and items with limited feedback can be connected to the network by different types of paths Connect new users or items (cold start) in the information network 7

Relationship Heterogeneity Based Personalized Recommendation Models Different users may have different behaviors or preferences Aliens Sigourney Weaver fan James Cameron fan 80s Sci-fi fan Different users may be interested in the same movie for different reasons Two levels of personalization Data level Most recommendation methods use one model for all users and rely on personal feedback to achieve personalization Model level With different entity relationships, we can learn personalized models for different users to further distinguish their differences 8

Preference Propagation-Based Latent Features genre: drama Bob Naomi Watts King Kong Charlie Alice Titanic tag: Oscar Nomination Ralph Fiennes Kate Winslet revolutionary road Sam Mendes skyfall Generate L different meta-path (path types) connecting users and items Propagate user implicit feedback along each metapath Calculate latentfeatures for users and items for each meta-path with NMF related method 9

Recommendation Models Observation 1: Different meta-paths may have different importance Global Recommendation Model ranking score features for user i and item j (1) the q-th meta-path Observation 2: Different users may require different models Personalized Recommendation Model user-cluster similarity L (2) c total soft user clusters 10

Parameter Estimation Bayesian personalized ranking (Rendle UAI 09) Objective function sigmoid function min Θ (3) for each correctly ranked item pair i.e., u i gave feedback to e a but not e b Soft cluster users with NMF + k-means For each user cluster, learn one model with Eq. (3) Generate personalized model for each user on the fly with Eq. (2) Learning Personalized Recommendation Model 11

Experiment Setup Datasets Comparison methods: Popularity: recommend the most popular items to users Co-click: conditional probabilities between items NMF: non-negative matrix factorization on user feedback Hybrid-SVM: use Rank-SVM with plain features (utilize both user feedback and information network) 12

Performance Comparison HeteRec personalized recommendation (HeteRec-p) provides the best recommendation results 13

Performance under Different Scenarios p p user HeteRec p consistently outperform other methods in different scenarios better recommendation results if users provide more feedback better recommendation for users who like less popular items 14

Propose latent representations for users and items by propagating user preferences along different meta-paths Employ Bayesian ranking optimization technique to correctly evaluate recommendation models Further improve recommendation quality by considering user differences at model level and define personalized recommendation models Two levels of personalization Entity Contributions Recommendation in Information Networks with Implicit User Feedback (RecSys 13, WSDM 14a) 15

Outline Why Heterogeneous Information Networks? Entity Recommendation Information Diffusion Ideology Detection Summary 16

Information Diffusion in Networks Action of a node is triggered by the actions of their neighbors 17

[Granovetter, 1978] Linear Threshold Model If the weighted activation number of its neighbors is bigger than a pre-specified threshold θ u, the node u is going to be activated In other words p u (t + 1) = E[1 v Γ u w v,u δ u, t > θ u ] 18

Heterogeneous Bibliographic Network Multiple types of objects Multiple types of links 19

Derived Multi-Relational Bibliographic Network Collaboration: Author-Paper-Author Citation: Author-Paper->Paper-Author Sharing Co-authors: Author-Paper-Author-Paper-Author Co-attending venues: Author-Paper-Venue-Paper-Author How to generate these meta-paths? PathSim: Sun et.al, VLDB 11 20

How Topics Are Propagated among Authors? To Apply Existing approaches Select one relation between authors (say, A-P-A) Use all the relations, but ignore the relation types Do different relation types play different roles? Need new models! 21

Two Assumptions for Topic Diffusion in Multi- Relational Networks Assumption 1: Relation independent diffusion Model-level aggregation 22

Assumption 2: Relation interdependent diffusion Relation-level aggregation 23

Two Models under the Two Assumptions Two multi-relational linear threshold models Model 1: MLTM-M Model-level aggregation Model 2: MLTM-R Relation-level aggregation 24

For each relation type k MLTM-M The activation probability for object i at time t+1: The collective model The final activation probability for object i is an aggregation over all relation types 25

Properties of MLTM-M 26

MLTM-R Aggregate multi-relational network with different weights Treat the activation as in a single-relational network To make sure the activation probability non-negative, weights β s are required non-negative 27

Properties of MLTM-R 28

How to Evaluate the Two Models? Test on the real action log on multiple topics! Action log: {< u i, t i >} Diffusion model learning from action log MLE estimation over β s 29

DBLP Computer Science Relation types APS APA, AP->PA, APAPA, APVPA Physics Relation types APA, AP->PA, APAPA, APOPA Two Real Datasets 30

Topics Selected Select topics with increasing trends 31

Global Prediction Evaluation Methods How many authors are activated at t+1 Error rate = ½(predicted#/true# + true#/predicted#)-1 Local Prediction Which author is likely to be activated at t+1 AUPR (Area under Precision-Recall Curve) 32

Global Prediction 33

Local Prediction - AUPR 1: Different Relation Play Different Roles in Diffusion Process 2: Relation-Level Aggregation is better than Model- Level Aggregation

Case Study 35

Prediction Results on social network Diffusion 36

37

WIN! 38

Outline Why Heterogeneous Information Networks? Entity Recommendation Information Diffusion Ideology Detection Summary 39

Topic-Factorized Ideal Point Estimation Model for Legislative Voting Network (KDD 14, Gu, Sun et al.) 40

Background United Stated Congress Federal Legislation (bill) Ronald Paul Barack Obama Bill 1 Bill 2 The House Senate Law The House Senate Politician Republican Democrat Ronald Paul Barack Obama liberal conservative 41

Legislative Voting Network 42

Problem Definition Input: Legislative Network Output: x u : Ideal Points for Politician u a d : Ideal Points for Bill d x u s on different topics 43

Existing Work 1-dimensional ideal point model (Poole and Rosenthal, 1985; Gerrish and Blei, 2011) High-dimensional ideal point model (Poole and Rosenthal, 1997) Issue-adjusted ideal point model (Gerrish and Blei, 2012) 44

Motivations Voters have different positions on different topics. Topic 1 Topic 2 Topic 3 Topic 4 Traditional matrix factorization method cannot give the meanings for each dimension. M U V T k th latent factor Topics of bills can influence politician s voting, and the voting behavior can better interpret the topics of bills as well. Topic Model: Health Public Transport Voting-guided Topic Model: Health Service Health Expenses Public Transport 45

Topic-Factorized IPM u v ud d n(d, w) w Entities: Politicians Bills Terms Links: (P, B) (B, T) Politicians Bills Heterogeneous Voting Network Terms Parameters to maximize the likelihood of generating two types of links: Ideal points for politicians Ideal points for bills Topic models 46

Text Part Politicians Bills Terms 47

Text Part We model the probability of each word in each document as a mixture of categorical distributions, as in PLSA (Hofmann, 1999) and LDA (Blei et al., 2003) θ dk = p(k d) β kw = p(w k) d k w Bill Topic Word w d = n d, 1, n d, 2,, n d, N w p w d θ, β p W θ, β d w w ( k ( k n(d,w) θ dk β kw ) n(d,w) θ dk β kw ) 48

Voting Part Intuitions: The more similar of the ideal points of u and d, the higher probability of YEA link Politicians Bills Terms The higher portion a bill belongs to topic k, the higher weight of ideal points on topic k 49

Voting Part Voter u x u Topic 1 Topic 2 Topic k Topic K x u1 x u2 x uk x uk x uk R u 1 u 2 u NU d 1 d 2 d ND 0 1-1 1 1 1 1 1 0 0-1 1 1 1-1 1 1 1 1 1-1 1 0 0 User-Bill voting matrix V Bill d a d θ d1 x u1 a d1 θ dk θ dk x uk x uk a dk a dk a d1 a d2 a dk a dk a dk R r ud = x uk a dk YEA p v ud = 1 = σ( θ dk x uk a dk + b d ) k K k=1 K r ud = θ dk x uk a dk k=1 p v ud = 1 = 1 σ( θ dk x uk a dk + b d ) I k {vud =1} I {vud = 1} NAY p V θ, X, A, b = u,d :v ud 0 (p v ud = 1 1+v ud 2 p v ud = 1 1 v ud 2 ) 50

Combining Two Parts Together The final objective function is a linear combination of the two average log-likelihood functions over the word links and voting links. We also add an l 2 regularization term to A and X to reduce over-fitting. 51

Learning Algorithm An iterative algorithm where ideal points related parameters (X, A, b) and topic model related parameters (θ, β) enhance each other. Step 1: Update X, A, b given θ, β Gradient descent Step 2: Update θ, β given X, A, b Follow the idea of expectation-maximization (EM) algorithm: maximize a lower bound of the objective function in each iteration 52

Learning Algorithm Update θ: A nonlinear constrained optimization problem. Remove the constraints by a logistic function based transformation: e μ dk 1 + K 1 k =1 e μ dk if 1 k K 1 θ dk = 1 1 + K 1 k =1 e μ dk if k = K and update μ dk using gradient descent. Update β: Since β only appears in the topic model part, we use the same updating rule as in PLSA: where 53

Dataset: Data Description U.S. House and Senate roll call data in the years between 1990 and 2013. 1,540 legislators 7,162 bills 2,780,453 votes (80% are YEA ) Keep the latest version of a bill if there are multiple versions. Randomly select 90% of the votes as training and 10% as testing. Downloaded from http://thomas.loc.gov/home/rollcallvotes.html 54

Evaluation Measures Root mean square error (RMSE) between the predicted vote score and the ground truth RMSE = u,d :v ud 0 1+v ud p v 2 ud =1 N V Accuracy of correctly predicted votes (using 0.5 as a threshold for the predicted accuracy) Accuracy = u,d (I p v ud =1 >0.5 && v ud =1 +I p v ud =1 <0.5 && v ud = 1 ) N V Average log-likelihood of the voting link 2 AvelogL = u,d :v ud 0 1+v ud 2 log p v ud =1 + 1 v ud 2 N V log p(v ud = 1) 55

Experimental Results Training Data set Testing Data set 56

Parameter Study J θ, β, X, A, b = 1 λ avelogl text + λ avelogl(voting) 1 2σ 2 ( u x u 2 2 + d a d 2 2 ) Parameter study on λ Parameter study on σ (regularization coefficient) 57

Case Studies Ideal points for three famous politicians: (Republican, Democrat) Ronald Paul (R), Barack Obama (D), Joe Lieberman (D) Public Transportation Funds Health Expenses Health Service Law Militar y Financial Institution Individual Property Educatio n Foreign Ronald Paul Barack Obama Joe Lieberman 58

Case Studies Scatter plots over selected dimensions: (Republican, Democrat) 59

Case Studies Bill: H_RES_578 109th Congress (2005-2006) It is about supporting the government of Romania to improve the standard health care and well-being of children in Romania. x uk YEA R. Paul H_RES_578 For Unseen Bill d: Topic Model θ d TF-IPM x u p v ud = 1 p v ud = 1 = σ( k θ dk x uk a dk + b d ) Experts/Algo rithm a d 60

Outline Why Heterogeneous Information Networks? Entity Recommendation Information Diffusion Ideology Detection Summary 61

Summary Heterogeneous Information Networks are networks with multiple types of objects and links Principles in mining heterogeneous information networks Meta-path-based mining Systematically form new types of relations Relation strength-aware mining Different types of relations have different strengths Relation semantic-aware mining Different types of relations need different modeling 62

Q & A 63