Social Computing and Its Application in Query Suggestion

Size: px
Start display at page:

Download "Social Computing and Its Application in Query Suggestion"

Transcription

1 Social Computing and Its Application in Query Suggestion Irwin King Department of Computer Science & Engineering The Chinese University of Hong Kong

2 Billionaires Shuffle 2007 William Gates Warren Buffett 2008 Warren Buffett Mark Zuckerberg Facebook in Carlos Slim Helu & family Carlos Slim Helu & family 2008 at 23 and $1.5 billion later... William Gates

3 Alexa as of Nov USA CHINA Global Google Baidu Yahoo Yahoo QQ Google Myspace Sina YouTube YouTube Google.cn Windows Live Facebook Taobao Facebook Windows Live 163 MSN MSN Yahoo Myspace Wikipedia Google Wikipedia EBay Sohu Blogger AOL Youku Yahoo.jp

4 What s On the Menu? Web 2.0 and Social X Social Computing Some Interesting Problems Collaborative Filtering Query Suggestion

5 What s On the Menu? Web 2.0 and Social X Social Computing Some Interesting Problems Collaborative Filtering Query Suggestion

6 Web 2.0 Web as a medium vs. Web as a platform Read-Only Web vs. Read-and-Write Web Static vs. Dynamic Restrictive vs. Freedom & Empowerment Technology-centric vs. User-centric Limited vs. Rich User Experience Individualistic vs. Group/Collective Behavior Consumer vs. Producer Transactional vs. Relational Top-down vs. Bottom-up People-to-Machine vs. People-to-People Search & browse vs. Publish & Subscribe Closed application vs. Service-oriented Services Functionality vs. Utility Data vs. Value

7 Social Networking

8 Social Search Social Search Engine Leveraging your social networks for searching

9 Social Bookmarking

10 Social Media

11 Social News/Mash Up

12 Social Knowledge Sharing

13 Social Marketing Viral marketing Who are the brokers? Who can exert the most influence on buying/ selling? How much should one advertise?

14 Social/Human Computation

15 Human Computation

16 Web 2.0 Revolution The Three C s Connectivity Collaboration Communities

17 What s On the Menu? Web 2.0 and Social X Social Computing Some Interesting Problems Collaborative Filtering Query Suggestion

18 Social Relations presence crew identity teams populations binary social role squad cardinal reputation organizations cohorts markets communities partners groups integer real expertise trust ownership accountability knowledge

19 social network services blogs s wikis instant messaging mobile devices social bookmarking Social Computing Social Behavior ranking Regression tagging Intelligent Computation Clustering Algorithms Classification security & privacy NLP Collective Intelligence Model Selection Theory collaborative filtering social marketing human computation opinion mining/ sentiment analysis query logs large graph analysis algorithms

20 Social Computing (SC) Social computing is a general term for an area of computer science that is concerned with the intersection of social behavior and computational systems. Wikipedia A social structure in which technology puts power in communities, not institutions. Forrester Forms of web services where the value is created by the collective contributions of a user population.

21 Issues Theory and models Seach, mining, and ranking of existing information, e.g., spatial (relations) and temporal (time) domains Dealing with partial and incomplete information, e.g., collaborative filtering, ranking, tagging, etc. Scalability and algorithmic issues Security and privacy issues Monetization of social interactions

22 Machine Learning in SC Classification, clustering, regression, etc. New insights on the data Social relations are often hidden (latent) Change data from (x, y) to (x, c 1 (x),c 2 (x),,y) c(x) = context in tags, relations, ratings, etc. data type = binary, integer, real, cardinal, etc.

23 Organizational Chart

24 Social Network Chart Authority vs. Importance

25 What s On the Menu? Web 2.0 and Social X Social Computing Some Interesting Problems Collaborative Filtering Query Suggestion

26 What s On the Menu? Web 2.0 and Social X Social Computing Some Interesting Problems Collaborative Filtering Query Suggestion

27 A Better Mousetrap?

28 Challenges Queries contain ambiguous and new terms apple: apple computer or apple pie? NDCG:? Users tend to submit short queries consisting of only one or two words almost 20% one-word queries almost 30% two-word queries Users may have little or even no knowledge about the topic they are searching for!

29 What is Clickthrough Data Query logs recorded by search engines u, q, l, r, t Users relevance feedback to indicate desired/preferred/target results

30 Joint Bipartite Graph B uq =(V uq,e uq ) V uq = U Q U = {u 1,u 2,..., u m } Q = {q 1,q 2,..., q n } E uq = {(u i,q j ) there is an edge from u i to q j } is the set of all edges. The edge (u i,q j ) exists in this bipartite graph if and only if a user u i issued a query q j. B ql =(V ql,e ql ) V ql = Q L Q = {q 1,q 2,..., q n } L = {l 1,l 2,..., l p } E ql = {(q i,l j ) there is an edge from q i to l j } is the set of all edges. The edge (q j,l k ) exists if and only if a user u i clicked a URL l k after issuing an query q j.

31 Key Points Two-level latent semantic analysis Level 1 Level 2 { { Consider the use of a joint user-query and query-url bipartite graphs for query suggestion Use matrix factorization for learning query features in constructing the Query Similarity Graph Use heat diffusion for similarity propagation for query suggestions

32 Users Queries URLs 0.8 J I G H C 0.8 D F A Bipartite Graphs 0.8 E B Query Similarity Graph Queries are issued by the users, and which URLs to click are also decided by the users Two distinct users are similar if they issued similar queries Two queries are similar if they are issued by similar users

33 rij s jk U i Q j L k Normalized weight, how many times u i issued q j Normalized weight, how many times q j is linked to l k L-dimensional vector of user u i L-dimensional vector of query q j L-dimensional vector of URL l k H(R, U, Q) = min U,Q H(S, Q, L) = min Q,L 1 2 m i=1 n j=1 + α u 2 U 2 F + α q 2 Q 2 F 1 2 n j=1 p k=1 + α q 2 Q 2 F + α l 2 L 2 F I R ij(r ij g(u T i Q j )) 2 I S jk(s jk g(q T j L k )) 2

34 H(S, R, U, Q, L) = 1 n p I S 2 jk(s jk g(q T j L k )) 2 + α r 2 j=1 k=1 + α u 2 U 2 F + α q 2 Q 2 F + α l 2 L 2 F, A local minimum can be found by performing gradient descent in Ui, Qj and Lk m i=1 n Iij(r R ij g(u i T Q j )) 2 j=1

35 Gradient Descent Equations H U i H Q j = H L k = = α r p k=1 n j=1 + α r m n j=1 I R ijg (U T i Q j )(g(u T i Q j ) r ij)q j + α u U i, I S jkg (Q T j L k )(g(q T j L k ) s jk)l k i=1 I R ijg (U T i Q j )(g(u T i Q j ) r ij)u i + α q Q j, I S jkg (Q T j L k )(g(q T j L k ) s jk)q j + α l L k, Only the Q matrix, the queries latent features, is being used to generate the query similarity graph!

36 Query Similarity Graph 0.8 J I G H C 0.8 D 0.7 F E 0.8 Similarities are calculated using queries latent features Only the top-k similar neighbors (terms) are kept 0.3 B 0.1 A 0.2 k = 4

37 Similarity Propagation Based on the Heat Diffusion Model In the query graph, given the heat sources and the initial heat values, start the heat diffusion process and perform P steps Return the Top-N queries in terms of highest heat values for query suggestions

38 Heat Diffusion Model Heat diffusion is a physical phenomena ρc P T t = Q + (k T ) Heat flows from high temperature to low temperature in a medium Heat kernel is used to describe the amount of heat that one point receives from another point The way that heat diffuse varies when the underlying geometry ρ C P T t Q k T v Density Heat capacity and constant pressure Change in temperature over time Heat added Thermal conductivity Temperature gradient Divergence

39 Heat Diffusion Process 0.8 J I G H C 0.8 D F A E 0.8 B

40 Similarity Propagation Model f i (t + t) f i (t) t α H ij = τ i d i f i (t) = f(1) = e αr f(0), k:(q i,q k ) E w ik + j:(q j,q i ) E f(1) = e αh f(0) w ji f j (t) d j w ji /d j, (q j,q i ) E, (τ i /d i ) k:(i,k) E w ik, i = j, 0, otherwise. R = γh + (1 γ)g1 T (1) (2) (3) (4) α d i f i (t) w ik f(0) f(1) τ i γ g Thermal conductivity Heat value of node i at time t Heat value of node i at time t Weight between node i and node k Vector of the initial heat distribution Vector of the heat distribution at time 1 Equal to 1 if node i has outlinks, else equal to 0 Random jump parameter, and set to 0.85 Uniform stochastic distribution vector

41 Discrete Approximation Compute e αr is time consuming We use the discrete approximation to substitute f(1) = ( I + α ) P P R f(0) For every heat source, only diffuse heat to its neighbors within P steps In our experiments, P = 3 already generates fairly good results

42 Query Suggestion Procedure For a given query q 1. Select a set of n queries, each of which contains at least one word in common with q, as heat sources 2. Calculate the initial heat values by fˆqi (0) = W(q) W(ˆq i) W(q) W(ˆq i ) q = Sony Sony = 1 Sony Electronics = 1/2 Sony Vaio Laptop = 1/3 3. Use f(1) = e αr f(0) to diffuse the heat in graph 4. Obtain the Top-N queries from f(1)

43 Physical Meaning of α If set α to a large value The results depend more on the query graph, and more semantically related to original queries, e.g., travel => lowest air fare If set α to a small value The results depend more on the initial heat distributions, and more literally similar to original queries, e.g., travel => travel insurance

44 Experimental Dataset Data Source Collection Period Lines of Logs Unique user IDS Unique queries Unique URLs Unique words Clickthrough data from AOL search engine March 2006 to May 2006 (3 months) 19,442,629 After Pre- Processing 657, ,371 4,802, ,165 1,606, ,302 69,937

45 Query Suggestions

46 Comparisons ODP, Open Directory Project, see

47 Impact of Parameter k To test the extend of similarity needed

48 Impact of Parameter P To test the propagation influence

49 Efficiency Analysis

50 Summary Propose an offline novel joint matrix factorization method using user-query and query-url bipartite graphs for learning query features Propose an online diffusion-based similarity propagation and ranking method for query suggestion

51 Conclusion Social Computing is a paradigm shift! Novel views on the spatial and temporal relationship among social entities! Great opportunities in a new research direction!

52 On-Going Research Machine Learning Direct Zero-norm Optimization for Feature Selection (ICDM 08) Semi-supervised Learning from General Unlabeled Data (ICDM 08) Learning with Consistency between Inductive Functions and Kernels (NIPS 08) An Extended Level Method for Efficient Multiple Kernel Learning (NIPS 08) Semi-supervised Text Categorization by Active Search (CIKM 08) Transductive Support Vector Machine (NIPS 07) Global and local learning (ICML 04, JMLR 04) Web Intelligence Effective Latent Space Graph-based Re-ranking Model with Global Consistency (WSDM 09) Formal Models for Expert Finding on DBLP Bibliography Data (ICDM 08) Learning Latent Semantic Relations from Query Logs for Query Suggestion (CIKM 08) RATE: a Review of Reviewers in a Manuscript Review Process (WI 08) MatchSim: link-based web page similarity measurements (WI 07) Diffusion rank: Ranking web pages based on heat diffusion equations (SIGIR 07) Web text classification (WWW 07) Collaborative Filtering Recommender system: accurate recommendation based on sparse matrix (SIGIR 07) SoRec: Social Recommendation Using Probabilistic Matrix Factorization (CIKM 08) Human Computation An Analytical Study of Puzzle Selection Strategies for the ESP Game (WI 08) An Analytical Approach to Optimizing The Utility of ESP Games (WI 08)

53 Acknowledgments Prof. Michael R. Lyu Prof. Jimmy Lee Dr. Kaizhu Huang Dr. Haixuan Yang Dr. Zenglin Xu (Ph.D.) Hongbo Deng (Ph.D.) Zhenjiang Lin (Ph.D.) Hao Ma (Ph.D.) Haiqin Yang (Ph.D.) Xin Xin (Ph.D.) Thomas Chan (M.Phil) Chao Zhou (Ph.D.)

54 VeriGuide Similarity text detection system Developed at CUHK Promote and uphold academic honesty, integrity, and quality Support English, Traditional and Simplified Chinese Handle.doc,.txt,.pdf,.html, etc. file formats Generate detailed originality report including readability

55 Theory and modeling of social computing in education Technology and software of social computing for education Social educational system design and architectures Case studies, best practices, and demos of social media in education Benchmark and experiments on social computing in education Mobile learning applications for social computing Semantic web standards for e-learning Software for social learning and collaborative learning Life long social learning network Quality and reliability of information and resources Privacy, risk and security issues in education using social media Virtual space for leaning communities Ubiquitous, distributed, and collaborative learning Integration of social learning spaces Social computing in education trend analysis Web 2.0 and social computing for learning (media sharing, media manipulation, conversational arenas, online games, virtual worlds, social networking, blogging, social bookmarking, recommender systems, collaborative editing, wikis, syndication, etc.)

56 Q & A

Scaling Neighbourhood Methods

Scaling Neighbourhood Methods Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)

More information

Beating Social Pulse: Understanding Information Propagation via Online Social Tagging Systems 1

Beating Social Pulse: Understanding Information Propagation via Online Social Tagging Systems 1 Journal of Universal Computer Science, vol. 18, no. 8 (2012, 1022-1031 submitted: 16/9/11, accepted: 14/12/11, appeared: 28/4/12 J.UCS Beating Social Pulse: Understanding Information Propagation via Online

More information

Point-of-Interest Recommendations: Learning Potential Check-ins from Friends

Point-of-Interest Recommendations: Learning Potential Check-ins from Friends Point-of-Interest Recommendations: Learning Potential Check-ins from Friends Huayu Li, Yong Ge +, Richang Hong, Hengshu Zhu University of North Carolina at Charlotte + University of Arizona Hefei University

More information

Collaborative topic models: motivations cont

Collaborative topic models: motivations cont Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.

More information

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties Prof. James She james.she@ust.hk 1 Last lecture 2 Selected works from Tutorial

More information

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Reputation Systems I HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Yury Lifshits Wiki Definition Reputation is the opinion (more technically, a social evaluation) of the public toward a person, a

More information

A Framework of Detecting Burst Events from Micro-blogging Streams

A Framework of Detecting Burst Events from Micro-blogging Streams , pp.379-384 http://dx.doi.org/10.14257/astl.2013.29.78 A Framework of Detecting Burst Events from Micro-blogging Streams Kaifang Yang, Yongbo Yu, Lizhou Zheng, Peiquan Jin School of Computer Science and

More information

Circle-based Recommendation in Online Social Networks

Circle-based Recommendation in Online Social Networks Circle-based Recommendation in Online Social Networks Xiwang Yang, Harald Steck*, and Yong Liu Polytechnic Institute of NYU * Bell Labs/Netflix 1 Outline q Background & Motivation q Circle-based RS Trust

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang Relational Stacked Denoising Autoencoder for Tag Recommendation Hao Wang Dept. of Computer Science and Engineering Hong Kong University of Science and Technology Joint work with Xingjian Shi and Dit-Yan

More information

Yahoo! Labs Nov. 1 st, Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University

Yahoo! Labs Nov. 1 st, Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Yahoo! Labs Nov. 1 st, 2012 Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Motivation Modeling Social Streams Future work Motivation Modeling Social Streams

More information

Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA

Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA Rutgers, The State University of New Jersey Nov. 12, 2012 Liangjie Hong, Ph.D. Candidate Dept. of Computer Science and Engineering Lehigh University Bethlehem, PA Motivation Modeling Social Streams Future

More information

DS504/CS586: Big Data Analytics Graph Mining II

DS504/CS586: Big Data Analytics Graph Mining II Welcome to DS504/CS586: Big Data Analytics Graph Mining II Prof. Yanhua Li Time: 6:00pm 8:50pm Mon. and Wed. Location: SL105 Spring 2016 Reading assignments We will increase the bar a little bit Please

More information

Topic Models and Applications to Short Documents

Topic Models and Applications to Short Documents Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text

More information

Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation

Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation Qi Xie, Shenglin Zhao, Zibin Zheng, Jieming Zhu and Michael R. Lyu School of Computer Science and Technology, Southwest

More information

Web Structure Mining Nodes, Links and Influence

Web Structure Mining Nodes, Links and Influence Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.

More information

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS Yizhou Sun College of Computer and Information Science Northeastern University yzsun@ccs.neu.edu July 25, 2015 Heterogeneous Information Networks

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Pawan Goyal CSE, IITKGP October 21, 2014 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 21, 2014 1 / 52 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation

More information

* Matrix Factorization and Recommendation Systems

* Matrix Factorization and Recommendation Systems Matrix Factorization and Recommendation Systems Originally presented at HLF Workshop on Matrix Factorization with Loren Anderson (University of Minnesota Twin Cities) on 25 th September, 2017 15 th March,

More information

DEKDIV: A Linked-Data-Driven Web Portal for Learning Analytics Data Enrichment, Interactive Visualization, and Knowledge Discovery

DEKDIV: A Linked-Data-Driven Web Portal for Learning Analytics Data Enrichment, Interactive Visualization, and Knowledge Discovery DEKDIV: A Linked-Data-Driven Web Portal for Learning Analytics Data Enrichment, Interactive Visualization, and Knowledge Discovery Yingjie Hu, Grant McKenzie, Jiue-An Yang, Song Gao, Amin Abdalla, and

More information

Introduction to ArcGIS Maps for Office. Greg Ponto Scott Ball

Introduction to ArcGIS Maps for Office. Greg Ponto Scott Ball Introduction to ArcGIS Maps for Office Greg Ponto Scott Ball Agenda What is Maps for Office? Platform overview What are Apps for the Office? ArcGIS Maps for Office features - Visualization - Geoenrichment

More information

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan Link Prediction Eman Badr Mohammed Saquib Akmal Khan 11-06-2013 Link Prediction Which pair of nodes should be connected? Applications Facebook friend suggestion Recommendation systems Monitoring and controlling

More information

From Social User Activities to People Affiliation

From Social User Activities to People Affiliation 2013 IEEE 13th International Conference on Data Mining From Social User Activities to People Affiliation Guangxiang Zeng 1, Ping uo 2, Enhong Chen 1 and Min Wang 3 1 University of Science and Technology

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize/navigate it? First try: Human curated Web directories Yahoo, DMOZ, LookSmart

More information

MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL

MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL Jing (Selena) He Department of Computer Science, Kennesaw State University Shouling Ji,

More information

Click Prediction and Preference Ranking of RSS Feeds

Click Prediction and Preference Ranking of RSS Feeds Click Prediction and Preference Ranking of RSS Feeds 1 Introduction December 11, 2009 Steven Wu RSS (Really Simple Syndication) is a family of data formats used to publish frequently updated works. RSS

More information

DS504/CS586: Big Data Analytics Graph Mining II

DS504/CS586: Big Data Analytics Graph Mining II Welcome to DS504/CS586: Big Data Analytics Graph Mining II Prof. Yanhua Li Time: 6-8:50PM Thursday Location: AK233 Spring 2018 v Course Project I has been graded. Grading was based on v 1. Project report

More information

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Information Retrieval and Organisation Chapter 13 Text Classification and Naïve Bayes Dell Zhang Birkbeck, University of London Motivation Relevance Feedback revisited The user marks a number of documents

More information

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Graph and Network Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Methods Learnt Classification Clustering Vector Data Text Data Recommender System Decision Tree; Naïve

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

Collaborative Filtering

Collaborative Filtering Collaborative Filtering Nicholas Ruozzi University of Texas at Dallas based on the slides of Alex Smola & Narges Razavian Collaborative Filtering Combining information among collaborating entities to make

More information

Aggregated Temporal Tensor Factorization Model for Point-of-interest Recommendation

Aggregated Temporal Tensor Factorization Model for Point-of-interest Recommendation Aggregated Temporal Tensor Factorization Model for Point-of-interest Recommendation Shenglin Zhao 1,2B, Michael R. Lyu 1,2, and Irwin King 1,2 1 Shenzhen Key Laboratory of Rich Media Big Data Analytics

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 5: Multi-class and preference learning Juho Rousu 11. October, 2017 Juho Rousu 11. October, 2017 1 / 37 Agenda from now on: This week s theme: going

More information

RaRE: Social Rank Regulated Large-scale Network Embedding

RaRE: Social Rank Regulated Large-scale Network Embedding RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat

More information

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima. http://goo.gl/xilnmn Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT

More information

PU Learning for Matrix Completion

PU Learning for Matrix Completion Cho-Jui Hsieh Dept of Computer Science UT Austin ICML 2015 Joint work with N. Natarajan and I. S. Dhillon Matrix Completion Example: movie recommendation Given a set Ω and the values M Ω, how to predict

More information

Leverage Sparse Information in Predictive Modeling

Leverage Sparse Information in Predictive Modeling Leverage Sparse Information in Predictive Modeling Liang Xie Countrywide Home Loans, Countrywide Bank, FSB August 29, 2008 Abstract This paper examines an innovative method to leverage information from

More information

Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata

Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata ABSTRACT Wei Wu Microsoft Research Asia No 5, Danling Street, Haidian District Beiing, China, 100080 wuwei@microsoft.com

More information

Joint user knowledge and matrix factorization for recommender systems

Joint user knowledge and matrix factorization for recommender systems World Wide Web (2018) 21:1141 1163 DOI 10.1007/s11280-017-0476-7 Joint user knowledge and matrix factorization for recommender systems Yonghong Yu 1,2 Yang Gao 2 Hao Wang 2 Ruili Wang 3 Received: 13 February

More information

Reimaging GIS: Geographic Information Society. Clint Brown Linda Beale Mark Harrower Esri

Reimaging GIS: Geographic Information Society. Clint Brown Linda Beale Mark Harrower Esri Reimaging GIS: Geographic Information Society Clint Brown Linda Beale Mark Harrower Esri 8 billion = Number of basemap requests per month on AGOL 14,000 = Unique requests per second 12,000 = New Items

More information

Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata

Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata Wei Wu a, Hang Li b, Jun Xu b a Department of Probability and Statistics, Peking University b Microsoft Research

More information

Lecture 12: Link Analysis for Web Retrieval

Lecture 12: Link Analysis for Web Retrieval Lecture 12: Link Analysis for Web Retrieval Trevor Cohn COMP90042, 2015, Semester 1 What we ll learn in this lecture The web as a graph Page-rank method for deriving the importance of pages Hubs and authorities

More information

Online Learning for Multi-Task Feature Selection

Online Learning for Multi-Task Feature Selection Online Learning for Multi-Task Feature Selection Haiqin Yang Department of Computer Science and Engineering The Chinese University of Hong Kong Joint work with Irwin King and Michael R. Lyu November 28,

More information

Hong Kong Observatory Summer Placement Programme 2018

Hong Kong Observatory Summer Placement Programme 2018 Hong Kong Observatory Summer Placement Programme 2018 Training Programme : supervise the student. A mentor from the Hong Kong Observatory with relevant expertise will Training Period : 8 weeks, starting

More information

Outliers Treatment in Support Vector Regression for Financial Time Series Prediction

Outliers Treatment in Support Vector Regression for Financial Time Series Prediction Outliers Treatment in Support Vector Regression for Financial Time Series Prediction Haiqin Yang, Kaizhu Huang, Laiwan Chan, Irwin King, and Michael R. Lyu Department of Computer Science and Engineering

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Popularity Recommendation Systems Predicting user responses to options Offering news articles based on users interests Offering suggestions on what the user might like to buy/consume

More information

Collaborative Filtering. Radek Pelánek

Collaborative Filtering. Radek Pelánek Collaborative Filtering Radek Pelánek 2017 Notes on Lecture the most technical lecture of the course includes some scary looking math, but typically with intuitive interpretation use of standard machine

More information

Deep Learning for NLP

Deep Learning for NLP Deep Learning for NLP CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) Machine Learning and NLP NER WordNet Usually machine learning

More information

Conquering the Complexity of Time: Machine Learning for Big Time Series Data

Conquering the Complexity of Time: Machine Learning for Big Time Series Data Conquering the Complexity of Time: Machine Learning for Big Time Series Data Yan Liu Computer Science Department University of Southern California Mini-Workshop on Theoretical Foundations of Cyber-Physical

More information

11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1)

11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1) 11/3/15 Machine Learning and NLP Deep Learning for NLP Usually machine learning works well because of human-designed representations and input features CS224N WordNet SRL Parser Machine learning becomes

More information

Semi-Supervised Learning

Semi-Supervised Learning Semi-Supervised Learning getting more for less in natural language processing and beyond Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University 1 Semi-supervised Learning many human

More information

A Modified PMF Model Incorporating Implicit Item Associations

A Modified PMF Model Incorporating Implicit Item Associations A Modified PMF Model Incorporating Implicit Item Associations Qiang Liu Institute of Artificial Intelligence College of Computer Science Zhejiang University Hangzhou 31007, China Email: 01dtd@gmail.com

More information

Mining Triadic Closure Patterns in Social Networks

Mining Triadic Closure Patterns in Social Networks Mining Triadic Closure Patterns in Social Networks Hong Huang, University of Goettingen Jie Tang, Tsinghua University Sen Wu, Stanford University Lu Liu, Northwestern University Xiaoming Fu, University

More information

Prediction of Citations for Academic Papers

Prediction of Citations for Academic Papers 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Department of Computer Science, Guiyang University, Guiyang , GuiZhou, China

Department of Computer Science, Guiyang University, Guiyang , GuiZhou, China doi:10.21311/002.31.12.01 A Hybrid Recommendation Algorithm with LDA and SVD++ Considering the News Timeliness Junsong Luo 1*, Can Jiang 2, Peng Tian 2 and Wei Huang 2, 3 1 College of Information Science

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu November 16, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

Geo Business Gis In The Digital Organization

Geo Business Gis In The Digital Organization We have made it easy for you to find a PDF Ebooks without any digging. And by having access to our ebooks online or by storing it on your computer, you have convenient answers with geo business gis in

More information

Interact with Strangers

Interact with Strangers Interact with Strangers RATE: Recommendation-aware Trust Evaluation in Online Social Networks Wenjun Jiang 1, 2, Jie Wu 2, and Guojun Wang 1 1. School of Information Science and Engineering, Central South

More information

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run

More information

Climate Risk Visualization for Adaptation Planning and Emergency Response

Climate Risk Visualization for Adaptation Planning and Emergency Response Climate Risk Visualization for Adaptation Planning and Emergency Response NCR Flood Fact Finding Workshop Ricardo Saavedra ricardo@vizonomy.com Social Media, Mobile, and Big Data St. Peter s Basilica,

More information

Large-scale Collaborative Ranking in Near-Linear Time

Large-scale Collaborative Ranking in Near-Linear Time Large-scale Collaborative Ranking in Near-Linear Time Liwei Wu Depts of Statistics and Computer Science UC Davis KDD 17, Halifax, Canada August 13-17, 2017 Joint work with Cho-Jui Hsieh and James Sharpnack

More information

A Novel Click Model and Its Applications to Online Advertising

A Novel Click Model and Its Applications to Online Advertising A Novel Click Model and Its Applications to Online Advertising Zeyuan Zhu Weizhu Chen Tom Minka Chenguang Zhu Zheng Chen February 5, 2010 1 Introduction Click Model - To model the user behavior Application

More information

Matrix Factorization Techniques for Recommender Systems

Matrix Factorization Techniques for Recommender Systems Matrix Factorization Techniques for Recommender Systems Patrick Seemann, December 16 th, 2014 16.12.2014 Fachbereich Informatik Recommender Systems Seminar Patrick Seemann Topics Intro New-User / New-Item

More information

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering Jianping Shi Naiyan Wang Yang Xia Dit-Yan Yeung Irwin King Jiaya Jia Department of Computer Science and Engineering, The Chinese

More information

Cost and Preference in Recommender Systems Junhua Chen LESS IS MORE

Cost and Preference in Recommender Systems Junhua Chen LESS IS MORE Cost and Preference in Recommender Systems Junhua Chen, Big Data Research Center, UESTC Email:junmshao@uestc.edu.cn http://staff.uestc.edu.cn/shaojunming Abstract In many recommender systems (RS), user

More information

Structured deep models: Deep learning on graphs and beyond

Structured deep models: Deep learning on graphs and beyond Structured deep models: Deep learning on graphs and beyond Hidden layer Hidden layer Input Output ReLU ReLU, 25 May 2018 CompBio Seminar, University of Cambridge In collaboration with Ethan Fetaya, Rianne

More information

Mixed Membership Matrix Factorization

Mixed Membership Matrix Factorization Mixed Membership Matrix Factorization Lester Mackey 1 David Weiss 2 Michael I. Jordan 1 1 University of California, Berkeley 2 University of Pennsylvania International Conference on Machine Learning, 2010

More information

Large-Scale Social Network Data Mining with Multi-View Information. Hao Wang

Large-Scale Social Network Data Mining with Multi-View Information. Hao Wang Large-Scale Social Network Data Mining with Multi-View Information Hao Wang Dept. of Computer Science and Engineering Shanghai Jiao Tong University Supervisor: Wu-Jun Li 2013.6.19 Hao Wang Multi-View Social

More information

Personal Field Data Collection by UM-FieldGIS (Integration of Google Map API to Mobile GIS)

Personal Field Data Collection by UM-FieldGIS (Integration of Google Map API to Mobile GIS) Personal Field Data Collection by UM-FieldGIS (Integration of Google Map API to Mobile GIS) Ko Ko Lwin*. Yuji MURAYAMA* *Division of Spatial Information Science Graduate School of Life and Environmental

More information

Little Is Much: Bridging Cross-Platform Behaviors through Overlapped Crowds

Little Is Much: Bridging Cross-Platform Behaviors through Overlapped Crowds Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) Little Is Much: Bridging Cross-Platform Behaviors through Overlapped Crowds Meng Jiang, Peng Cui Tsinghua University Nicholas

More information

Recommender Systems with Social Regularization

Recommender Systems with Social Regularization Recommender Systems with Social Regularization Hao Ma The Chinese University of Hong Kong Shatin, N.T., Hong Kong hma@cse.cuhk.edu.hk Michael R. Lyu The Chinese University of Hong Kong Shatin, N.T., Hong

More information

Hybrid Models for Text and Graphs. 10/23/2012 Analysis of Social Media

Hybrid Models for Text and Graphs. 10/23/2012 Analysis of Social Media Hybrid Models for Text and Graphs 10/23/2012 Analysis of Social Media Newswire Text Formal Primary purpose: Inform typical reader about recent events Broad audience: Explicitly establish shared context

More information

A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank

A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank Shoaib Jameel Shoaib Jameel 1, Wai Lam 2, Steven Schockaert 1, and Lidong Bing 3 1 School of Computer Science and Informatics,

More information

Social Recommendation in Dynamic Networks

Social Recommendation in Dynamic Networks ocial Recommendation in Dynamic Networks 193 References Barbier G (01) Finding provenance data in social media. Doctoral dissertation, Arizona tate University Facebook (01) https://www.facebook.com/photo.php?

More information

Predicting New Search-Query Cluster Volume

Predicting New Search-Query Cluster Volume Predicting New Search-Query Cluster Volume Jacob Sisk, Cory Barr December 14, 2007 1 Problem Statement Search engines allow people to find information important to them, and search engine companies derive

More information

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Overlapping Communities

Overlapping Communities Overlapping Communities Davide Mottin HassoPlattner Institute Graph Mining course Winter Semester 2017 Acknowledgements Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides GRAPH

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 11, 2016 Paper presentations and final project proposal Send me the names of your group member (2 or 3 students) before October 15 (this Friday)

More information

Slides based on those in:

Slides based on those in: Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

Mobility Analytics through Social and Personal Data. Pierre Senellart

Mobility Analytics through Social and Personal Data. Pierre Senellart Mobility Analytics through Social and Personal Data Pierre Senellart Session: Big Data & Transport Business Convention on Big Data Université Paris-Saclay, 25 novembre 2015 Analyzing Transportation and

More information

A Tunable Mechanism for Identifying Trusted Nodes in Large Scale Distributed Networks

A Tunable Mechanism for Identifying Trusted Nodes in Large Scale Distributed Networks A Tunable Mechanism for Identifying Trusted Nodes in Large Scale Distributed Networks Joydeep Chandra 1, Ingo Scholtes 2, Niloy Ganguly 1, Frank Schweitzer 2 1 - Dept. of Computer Science and Engineering,

More information

Mixed Membership Matrix Factorization

Mixed Membership Matrix Factorization Mixed Membership Matrix Factorization Lester Mackey University of California, Berkeley Collaborators: David Weiss, University of Pennsylvania Michael I. Jordan, University of California, Berkeley 2011

More information

Large-scale Image Annotation by Efficient and Robust Kernel Metric Learning

Large-scale Image Annotation by Efficient and Robust Kernel Metric Learning Large-scale Image Annotation by Efficient and Robust Kernel Metric Learning Supplementary Material Zheyun Feng Rong Jin Anil Jain Department of Computer Science and Engineering, Michigan State University,

More information

Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach

Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach Author: Jaewon Yang, Jure Leskovec 1 1 Venue: WSDM 2013 Presenter: Yupeng Gu 1 Stanford University 1 Background Community

More information

Predictive Discrete Latent Factor Models for large incomplete dyadic data

Predictive Discrete Latent Factor Models for large incomplete dyadic data Predictive Discrete Latent Factor Models for large incomplete dyadic data Deepak Agarwal, Srujana Merugu, Abhishek Agarwal Y! Research MMDS Workshop, Stanford University 6/25/2008 Agenda Motivating applications

More information

Large Scale Semi-supervised Linear SVM with Stochastic Gradient Descent

Large Scale Semi-supervised Linear SVM with Stochastic Gradient Descent Journal of Computational Information Systems 9: 15 (2013) 6251 6258 Available at http://www.jofcis.com Large Scale Semi-supervised Linear SVM with Stochastic Gradient Descent Xin ZHOU, Conghui ZHU, Sheng

More information

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute

More information

Regularity and Conformity: Location Prediction Using Heterogeneous Mobility Data

Regularity and Conformity: Location Prediction Using Heterogeneous Mobility Data Regularity and Conformity: Location Prediction Using Heterogeneous Mobility Data Yingzi Wang 12, Nicholas Jing Yuan 2, Defu Lian 3, Linli Xu 1 Xing Xie 2, Enhong Chen 1, Yong Rui 2 1 University of Science

More information

Non-negative Matrix Factorization: Algorithms, Extensions and Applications

Non-negative Matrix Factorization: Algorithms, Extensions and Applications Non-negative Matrix Factorization: Algorithms, Extensions and Applications Emmanouil Benetos www.soi.city.ac.uk/ sbbj660/ March 2013 Emmanouil Benetos Non-negative Matrix Factorization March 2013 1 / 25

More information

Topic Models. Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW. Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1

Topic Models. Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW. Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1 Topic Models Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1 Low-Dimensional Space for Documents Last time: embedding space

More information

Generative Models for Discrete Data

Generative Models for Discrete Data Generative Models for Discrete Data ddebarr@uw.edu 2016-04-21 Agenda Bayesian Concept Learning Beta-Binomial Model Dirichlet-Multinomial Model Naïve Bayes Classifiers Bayesian Concept Learning Numbers

More information

Methods and tools for semantic social network analysis

Methods and tools for semantic social network analysis Methods and tools for semantic social network analysis Big Open Data Analysis, Roma 2 nd February, 2018 Antonio De Nicola (DTE-SEN-APIC, ENEA) INFLUENCERS D Agostino G., D Antonio F., De Nicola A., Tucci

More information

Predictive analysis on Multivariate, Time Series datasets using Shapelets

Predictive analysis on Multivariate, Time Series datasets using Shapelets 1 Predictive analysis on Multivariate, Time Series datasets using Shapelets Hemal Thakkar Department of Computer Science, Stanford University hemal@stanford.edu hemal.tt@gmail.com Abstract Multivariate,

More information

Personalized Social Recommendations Accurate or Private

Personalized Social Recommendations Accurate or Private Personalized Social Recommendations Accurate or Private Presented by: Lurye Jenny Paper by: Ashwin Machanavajjhala, Aleksandra Korolova, Atish Das Sarma Outline Introduction Motivation The model General

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

Discovering Geographical Topics in Twitter

Discovering Geographical Topics in Twitter Discovering Geographical Topics in Twitter Liangjie Hong, Lehigh University Amr Ahmed, Yahoo! Research Alexander J. Smola, Yahoo! Research Siva Gurumurthy, Twitter Kostas Tsioutsiouliklis, Twitter Overview

More information