Analysis of Multiview Legislative Networks with Structured Matrix Factorization: Does Twitter Influence Translate to the Real World?
1 Analysis of Multiview Legislative Networks with Structured Matrix Factorization: Does Twitter Influence Translate to the Real World? Shawn Mankad, The University of Maryland. Joint work with George Michailidis. 1 / 30
2 Motivation. There is a growing literature that attempts to understand and exploit social networking platforms for resource optimization and marketing. We develop new methodology for identifying important accounts by studying networks generated from Twitter, which has over 20 million active accounts each month as of September.
3 Motivation: Twitter platform. Twitter allows accounts to broadcast short messages, referred to as tweets. A tweet that is a copy of another account's tweet is called a retweet. Within a tweet, an account can mention another account by referring to its account name with the @ symbol as a prefix. Accounts also declare the other accounts they are interested in following, which means the follower receives a notification whenever the followed account posts a new tweet. Each of the three actions defines a network. Collectively, they define a multiview network.
4 Motivation: Example of Multiview Networks. Twitter networks from 418 Members of Parliament (MPs) in the United Kingdom: Retweet Network, Mentions Network, Follows Network. Party composition: 12 Conservative MPs; 18 Labour; 43 Liberal Democrats; 5 MPs representing the Scottish National Party (SNP); 11 MPs belonging to other parties.
5 Motivation: Motivating Question. Can we use the network structures in Twitter to create an influence measure that is a surrogate for real-life MP influence? There are many ways to combine network structure (communities) with network statistics (PageRank, HITS, etc.) to identify influential nodes (e.g., MPs), but it remains unclear which is the preferred method. We address this issue by integrating both steps through matrix factorization.
6 Outline: Motivation; Non-negative Matrix Factorization for Network Analysis; Structured NMF for Network Analysis; Extension to Multiview Networks; Application to the Data.
7 Non-negative Matrix Factorization for Network Analysis: Non-negative Matrix Factorization. Let $Y$ be an observed $n \times p$ matrix that is non-negative. NMF expresses $Y \approx UV^T$, where $U \in \mathbb{R}^{n \times K}_+$ and $V \in \mathbb{R}^{p \times K}_+$.
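A minimal sketch of the factorization Y ≈ UV^T, assuming the classical multiplicative-update rules for the Frobenius loss. The slides do not say which solver was used, so the function name `nmf`, the iteration count, and the synthetic test matrix are all illustrative assumptions.

```python
import numpy as np

def nmf(Y, K, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative n x p matrix Y by U @ V.T with U, V >= 0.

    Uses multiplicative updates for the squared Frobenius loss
    (an assumed solver; the slides do not specify one).
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    U = rng.random((n, K)) + eps
    V = rng.random((p, K)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio,
        # so U and V stay non-negative throughout.
        U *= (Y @ V) / (U @ (V.T @ V) + eps)
        V *= (Y.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Synthetic non-negative matrix of exact rank 3 (hypothetical data).
rng = np.random.default_rng(1)
Y = rng.random((20, 3)) @ rng.random((3, 15))
U, V = nmf(Y, K=3)
rel_err = np.linalg.norm(Y - U @ V.T) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Multiplicative updates are chosen here only because they keep both factors non-negative without an explicit projection step.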
8 Non-negative Matrix Factorization for Network Analysis: Why NMF? Better interpretability [1] (figure panels: NMF, SVD). Networks and other data from the social sciences are typically non-negative.
[1] Images modified from Xu, W., Liu, X., & Gong, Y. (2003, July). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.
9 Non-negative Matrix Factorization for Network Analysis: Interpretations of NMF.
$Y = UV^T = \sum_{k=1}^{K} U_k V_k^T$, where $U_k$ is the mean of cluster $k$ in $\mathbb{R}^p_+$ and $V_k^T = [P(\text{Obs. } 1 \in \text{group } k), \ldots, P(\text{Obs. } n \in \text{group } k)]$, subject to $\sum_k V_{jk} = 1$. Ding et al. (2009) show NMF equivalence with relaxed K-means.
$Y_{ij} = (UDV^T)_{ij}$ subject to $\sum_{i,j} Y_{ij} = 1$, $\sum_i U_{ik} = 1$, $\sum_j V_{jk} = 1$, so that $P(w_i, d_j) = \sum_k P(w_i \mid z_k)\, P(z_k)\, P(d_j \mid z_k)$. Ding et al. (2008) show NMF equivalence with PLSI.
10 Non-negative Matrix Factorization for Network Analysis: Edge Assignment and Overlapping Communities. $Y_{ij} = U_{i1}V_{j1} + \cdots + U_{iK}V_{jK}$, where $U_{ik}V_{jk}$ measures the contribution of community $k$ to edge $Y_{ij}$. [Figure panels: Rank 3 NMF; SVD (spectral clustering).]
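The edge decomposition above can be sketched in a few lines. The helper names `edge_contributions` and `assign_edge` and the small factor matrices are hypothetical. Assigning an edge to its largest-contributing community is one simple reading of "edge assignment", not necessarily the rule used in the talk.

```python
import numpy as np

def edge_contributions(U, V, i, j):
    """Per-community terms U_ik * V_jk whose sum reconstructs Y_ij."""
    return U[i, :] * V[j, :]

def assign_edge(U, V, i, j):
    """Index of the community contributing most to edge (i, j)."""
    return int(np.argmax(edge_contributions(U, V, i, j)))

# Hypothetical rank-2 factors for a 3-node network.
U = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.0, 1.0]])
V = np.array([[1.0, 0.0],
              [0.9, 0.2],
              [0.1, 0.8]])

contrib = edge_contributions(U, V, 1, 1)  # terms for edge (1, 1)
print(contrib, contrib.sum(), (U @ V.T)[1, 1])
print(assign_edge(U, V, 1, 1), assign_edge(U, V, 2, 2))
```

Because every term is non-negative, a node can contribute to several communities at once, which is what allows overlapping communities.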
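12 Structured NMF for Network Analysis: Structured Semi-NMF. We propose
$$\min_{\Lambda;\, V \ge 0} \; \| Y - S \Lambda V^T \|_F^2,$$
where $S \in \mathbb{R}^{n \times d}$, $\Lambda \in \mathbb{R}^{d \times K}$, and $V \in \mathbb{R}^{n \times K}_+$. Each column of $S$ is a node-level network statistic that is calculated a priori, e.g.,
$$S = \begin{bmatrix} c_1 & b_1 \\ c_2 & b_2 \\ \vdots & \vdots \\ c_n & b_n \end{bmatrix}.$$
The columns of $S$ act as covariates that guide the matrix factorization to more interpretable solutions. $V$ can then be used to rank nodes within each community.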
13 Structured NMF for Network Analysis: Centrality Measures. If $S$ is specified, then nodes with different types of local topologies will be emphasized in the factorizations. For instance, in each of the following networks, X has higher centrality than Y according to a particular measure. [Figure: example networks.]
14 Structured NMF for Network Analysis: Analysis Procedure.
1. Specify $S$ (node-level statistics) and $K$ (number of communities).
2. Perform the matrix factorization.
3. Node $i$ has importance $I_i = \sum_k V_{ik}$.
4. Rank nodes according to $I$.
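The four steps above can be sketched as follows, assuming an alternating least-squares solver that projects V onto small positive values after each update. The solver details, the function name `structured_semi_nmf`, the six-node toy graph, and the choice of degree and clustering coefficient as columns of S are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def structured_semi_nmf(Y, S, K, n_iter=100, eps=1e-8, seed=0):
    """Approximately minimise ||Y - S Lam V^T||_F^2 subject to V >= 0."""
    rng = np.random.default_rng(seed)
    V = rng.random((Y.shape[0], K)) + eps
    S_pinv = np.linalg.pinv(S)
    for _ in range(n_iter):
        # Least-squares update of Lam for fixed V.
        Lam = S_pinv @ Y @ V @ np.linalg.pinv(V.T @ V)
        # Least-squares update of V for fixed Lam, then projection.
        B = S @ Lam
        V = np.maximum(Y.T @ B @ np.linalg.pinv(B.T @ B), eps)
    Lam = S_pinv @ Y @ V @ np.linalg.pinv(V.T @ V)
    return Lam, V

# Step 1: toy network (two triangles joined by one edge) and S.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
Y = np.zeros((6, 6))
for a, b in edges:
    Y[a, b] = Y[b, a] = 1.0
degree = Y.sum(axis=1)
triangles = np.diag(Y @ Y @ Y) / 2.0
clustering = triangles / (degree * (degree - 1) / 2.0)  # all degrees >= 2 here
S = np.column_stack([degree, clustering])

# Step 2: factorise. Steps 3 and 4: importance scores and ranking.
Lam, V = structured_semi_nmf(Y, S, K=2)
importance = V.sum(axis=1)
ranking = np.argsort(-importance)
print(ranking)
```

The projection to `eps` rather than exactly zero mirrors the "project to a small constant" remark made later in the deck.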
15 Structured NMF for Network Analysis: Semi-NMF. If $S = I$, then the objective becomes $\min_{\Lambda;\, V \ge 0} \| Y - \Lambda V^T \|_F^2$, which is similar to the standard NMF model. Thus, if $S$ is not specified, the usual semi-NMF results are obtained.
16 Structured NMF for Network Analysis. [Figure panels: PageRank; Structured Semi-NMF with S = I; Structured Semi-NMF with S = [Clustering Coefficient]; Structured Semi-NMF with S = [Clustering Coefficient, Betweenness, Closeness, Degree].]
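18 Extension to Multiview Networks: New Objective Function. Each column of $S_m$ is a node-level network statistic, e.g.,
$$S_m = \begin{bmatrix} c_1 & b_1 \\ c_2 & b_2 \\ \vdots & \vdots \\ c_n & b_n \end{bmatrix}.$$
Then we propose
$$\min_{\Lambda_m,\; \Theta \ge 0,\; V_m \ge 0} \; \sum_m \| Y_m - S_m \Lambda_m (\Theta + V_m)^T \|_F^2,$$
where $S_m \in \mathbb{R}^{n \times d}$, $\Lambda_m \in \mathbb{R}^{d \times K}$, and $\Theta, V_m \in \mathbb{R}^{n \times K}_+$. Rows of $\Theta$ reveal the overall importance of a node to each community.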
19 Extension to Multiview Networks: Analysis Procedure.
1. Specify $S_m$ (node-level statistics) and $K$ (number of communities).
2. Perform the matrix factorization.
3. Node $i$ has importance $I_i = \sum_k \Theta_{ik}$.
4. Rank nodes according to $I$.
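20 Extension to Multiview Networks: Approximate Alternating Least Squares. With $A_m$ denoting the observed network matrix of view $m$ (written $Y_m$ in the objective), the updates are
$$\Lambda_m = (S_m^T S_m)^{-1} S_m^T A_m (\Theta + V_m) \left( (\Theta + V_m)^T (\Theta + V_m) \right)^{-1},$$
$$V_m = A_m^T S_m \Lambda_m \left( \Lambda_m^T S_m^T S_m \Lambda_m \right)^{-1},$$
$$\Theta = \sum_m A_m^T S_m \Lambda_m \left( \Lambda_m^T S_m^T S_m \Lambda_m \right)^{-1}.$$
To overcome numerical instabilities that occur when too many elements are exactly zero, and to maintain non-negativity of $\Theta$ and $V_m$, we project to a small constant.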
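A rough sketch of an alternating scheme for the multiview objective follows. It is not a transcription of the updates on the slide. As an assumption made here, each view's least-squares target W_m for (Θ + V_m) is split by setting Θ to the projected average of W_m − V_m and then V_m to the projected remainder W_m − Θ. The function names, the random toy networks, and the choice of degree and clustering coefficient for S_m are all hypothetical.

```python
import numpy as np

def node_stats(A):
    """Columns: degree and clustering coefficient (0 where degree < 2)."""
    deg = A.sum(axis=1)
    tri = np.diag(A @ A @ A) / 2.0
    possible = deg * (deg - 1) / 2.0
    cc = np.zeros_like(deg)
    mask = possible > 0
    cc[mask] = tri[mask] / possible[mask]
    return np.column_stack([deg, cc])

def update_lambda(Y, S_pinv, W):
    """Least-squares Lam_m for fixed W = Theta + V_m."""
    return S_pinv @ Y @ W @ np.linalg.pinv(W.T @ W)

def multiview_loss(Ys, Ss, Lams, Theta, Vs):
    """Sum over views of ||Y_m - S_m Lam_m (Theta + V_m)^T||_F^2."""
    return sum(np.linalg.norm(Y - S @ L @ (Theta + V).T) ** 2
               for Y, S, L, V in zip(Ys, Ss, Lams, Vs))

def multiview_semi_nmf(Ys, Ss, K, n_iter=50, eps=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    n = Ys[0].shape[0]
    Theta = rng.random((n, K)) + eps
    Vs = [rng.random((n, K)) + eps for _ in Ys]
    S_pinvs = [np.linalg.pinv(S) for S in Ss]
    for _ in range(n_iter):
        Ws = []
        for Y, S, Sp, V in zip(Ys, Ss, S_pinvs, Vs):
            B = S @ update_lambda(Y, Sp, Theta + V)
            # Unconstrained least-squares target for Theta + V_m.
            Ws.append(Y.T @ B @ np.linalg.pinv(B.T @ B))
        # Shared part first, then view-specific remainders (assumed split).
        Theta = np.maximum(np.mean([W - V for W, V in zip(Ws, Vs)], axis=0), eps)
        Vs = [np.maximum(W - Theta, eps) for W in Ws]
    Lams = [update_lambda(Y, Sp, Theta + V) for Y, Sp, V in zip(Ys, S_pinvs, Vs)]
    return Lams, Theta, Vs

# Three hypothetical symmetric views on the same 12 nodes.
rng = np.random.default_rng(1)
Ys = []
for _ in range(3):
    A = np.triu((rng.random((12, 12)) < 0.3).astype(float), 1)
    Ys.append(A + A.T)
Ss = [node_stats(Y) for Y in Ys]

Lams, Theta, Vs = multiview_semi_nmf(Ys, Ss, K=2)
loss = multiview_loss(Ys, Ss, Lams, Theta, Vs)
importance = Theta.sum(axis=1)        # I_i = sum_k Theta_ik
ranking = np.argsort(-importance)
print(loss, ranking)
```

Pseudo-inverses are used in place of explicit inverses so the sketch does not fail when a Gram matrix is close to singular.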
22 Application to the Data: Specifying $S_m$. $S_m$ = (Betweenness, Clustering Coefficient, Closeness, Degree).
Clustering coefficient for a given node quantifies how close its neighbors are to being a complete graph. A higher clustering coefficient could result from an MP creating buzz.
Betweenness quantifies the control of a node on the communication between other nodes in a social network, and is computed as the number of shortest paths going through a given node.
Closeness is a related centrality measure that quantifies the length of time it would take for information to spread from a given node to all other nodes.
Degree, the number of connections a node has obtained, ensures that active MPs are emphasized in the factorization.
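As a sketch of how three of these columns might be computed, the block below uses breadth-first search for closeness and Brandes' algorithm for betweenness on an unweighted, undirected, connected graph stored as adjacency lists. The function names and the three-node path graph are illustrative. The clustering coefficient is left out for brevity, and the slides do not say which software was used to compute these statistics.

```python
from collections import deque

def bfs_distances(adj, s):
    """Hop counts from s to every reachable node."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(adj):
    """(n - 1) divided by the sum of distances; assumes a connected graph."""
    n = len(adj)
    out = []
    for s in range(n):
        total = sum(bfs_distances(adj, s).values())
        out.append((n - 1) / total if total > 0 else 0.0)
    return out

def betweenness(adj):
    """Brandes' algorithm: shortest paths through each node (undirected)."""
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        stack, preds = [], [[] for _ in range(n)]
        sigma = [0] * n
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return [b / 2 for b in bc]  # each unordered pair was counted twice

# Path graph 0 - 1 - 2 (hypothetical example).
adj = [[1], [0, 2], [1]]
degree = [len(neighbors) for neighbors in adj]
close = closeness(adj)
between = betweenness(adj)
S_rows = list(zip(between, close, degree))  # one row of statistics per node
print(S_rows)
```

The middle node of the path lies on the only shortest path between the two end nodes, so it is the only node with non-zero betweenness.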
23 Application to the Data. [Figure: % variance explained versus estimated rank of $\Theta$, $V_m$; panels: Rank 2 $S_m$, Rank 3 $S_m$, Rank 4 $S_m$.] We set $K = 6$ and rank of $S_m$ = [value missing].
24 Application to the Data: Results: Ranking by Twitter influence.
Rank | Structured Semi-NMF | Semi-NMF | PageRank | HITS
1 | Ed Miliband (L, 248) | Ed Miliband (L, 248) | Ian Austin (L, 3) | Michael Dugher (L, 120)
2 | Ed Balls (L, 580) | Ed Balls (L, 580) | William Hague (C, 1) | Ed Miliband (L, 248)
3 | Tom Watson (L, 253) | Michael Dugher (L, 120) | Hugo Swire (C, 5) | Ed Balls (L, 580)
4 | Michael Dugher (L, 120) | Tom Watson (L, 253) | Tom Watson (L, 253) | Chuka Umunna (L, 203)
5 | Chuka Umunna (L, 203) | Chuka Umunna (L, 203) | Ed Balls (L, 580) | Andy Burnham (L, 125)
6 | Rachel Reeves (L, 54) | Rachel Reeves (L, 54) | Michael Dugher (L, 120) | Tom Watson (L, 253)
7 | Stella Creasy (L, 18) | Chris Bryant (L, 164) | Pat McFadden (L, 1) | Rachel Reeves (L, 54)
8 | Chris Bryant (L, 164) | Stella Creasy (L, 18) | Ed Miliband (L, 248) | Chris Bryant (L, 164)
9 | Tom Harris (L, 113) | Luciana Berger (L, 133) | Stella Creasy (L, 18) | Diana Johnson (L, 105)
10 | David Miliband (L, 489) | Andy Burnham (L, 125) | Matthew Hancock (C, 32) | Tom Harris (L, 113)
25 Application to the Data: Results: Twitter influence does translate to the real world. Predicting future newspaper coverage with Poisson regression and various influence measures $I$:
$$\text{HeadlineCount} = F(\alpha + \beta I + \gamma\, \text{Controls}),$$
where Controls includes Age, Gender, Constituency Size, Political Party, and an indicator variable denoting whether each MP represents a constituency within the city of London.
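A minimal sketch of fitting such a model, assuming F corresponds to the usual log link so that the expected count is exp(α + βI + γ·Controls). The Newton (IRLS) solver, the function name `fit_poisson`, and the simulated influence scores, control variable, and coefficients are all hypothetical; none of the talk's data or estimates are reproduced.

```python
import numpy as np

def fit_poisson(X, y, n_iter=25):
    """Newton/IRLS for a Poisson GLM with log link.

    X must contain an intercept column in position 0.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())           # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)             # fitted mean counts
        grad = X.T @ (y - mu)             # score vector
        hess = X.T @ (X * mu[:, None])    # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated data (hypothetical): one influence score and one binary control.
rng = np.random.default_rng(0)
n = 2000
influence = rng.normal(size=n)
control = rng.integers(0, 2, size=n)
X = np.column_stack([np.ones(n), influence, control])
true_beta = np.array([0.5, 0.8, -0.3])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson(X, y)
rmse = np.sqrt(np.mean((y - np.exp(X @ beta_hat)) ** 2))
print(beta_hat, rmse)
```

Comparing out-of-sample RMSE across models that differ only in the influence measure I would be one way to mirror the comparison reported on the next slide.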
26 Application to the Data. [Figure: RMSE by method; panels: UK, UK without D. Cameron, Irish; methods: Structured Semi-NMF, Semi-NMF, HITS, PageRank, None.]
27 Application to the Data: Using $\Theta$ and $V_m$ to identify interesting substructure. [Figure panels: (a) Retweet Network; (b) Mentions Network; (c) Follows Network.]
28 Application to the Data: Wrap up. Key idea: use network statistics to guide the factorization to better solutions.
1. If we can identify the right local topology, then we can overcome not having dynamic data for certain tasks.
2. The data is exclusively link meta-data. Content analysis can potentially be avoided with network analysis tools for identifying influential users. This is important for applications in marketing and intelligence gathering.
Thank you!
29 Application to the Data: Betweenness Centrality. In marketing theory, these are the node types: 1. Bridge Node 2. Gateway Node 3. Creation Node 4. Consumption Node. Viral marketing depends heavily on high-betweenness bridge nodes!
30 Application to the Data: Clustering Coefficient. The clustering coefficient for node B asks: if A and B are connected and B and C are connected, are A and C connected? The clustering coefficient for a given node is defined as the ratio of closed triads to the total number of possible closed triads.
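The definition above can be checked on a small example. This sketch assumes an undirected, unweighted graph given as a symmetric 0/1 adjacency matrix, and it assigns a coefficient of 0 to nodes with fewer than two neighbors, which is a common convention rather than something stated in the slides. The function name and the four-node graph are illustrative.

```python
import numpy as np

def clustering_coefficient(A):
    """Closed triads through each node divided by possible triads."""
    deg = A.sum(axis=1)
    # diag(A^3) counts closed 3-step walks; each triangle is counted twice.
    closed = np.diag(A @ A @ A) / 2.0
    possible = deg * (deg - 1) / 2.0
    cc = np.zeros(len(deg))
    mask = possible > 0
    cc[mask] = closed[mask] / possible[mask]
    return cc

# Triangle on nodes 0, 1, 2 plus a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
cc = clustering_coefficient(A)
print(cc)
```

Node 2 has three neighbors, so three triads are possible, but only the pair (0, 1) is connected, which gives a coefficient of 1/3.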
More informationSPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB
SPACE Workshop NSF NCGIA CSISS UCGIS SDSU Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB August 2-8, 2004 San Diego State University Some Examples of Spatial
More informationP -spline ANOVA-type interaction models for spatio-temporal smoothing
P -spline ANOVA-type interaction models for spatio-temporal smoothing Dae-Jin Lee 1 and María Durbán 1 1 Department of Statistics, Universidad Carlos III de Madrid, SPAIN. e-mail: dae-jin.lee@uc3m.es and
More informationMany phenomena in nature have approximately Normal distributions.
NORMAL DISTRIBUTION The Normal r.v. plays an important role in probability and statistics. Many phenomena in nature have approximately Normal distributions. has a Normal distribution with parameters and,
More informationThe Static Absorbing Model for the Web a
Journal of Web Engineering, Vol. 0, No. 0 (2003) 000 000 c Rinton Press The Static Absorbing Model for the Web a Vassilis Plachouras University of Glasgow Glasgow G12 8QQ UK vassilis@dcs.gla.ac.uk Iadh
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions
More informationCollaborative topic models: motivations cont
Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.
More informationHeat Kernel Based Community Detection
Heat Kernel Based Community Detection Joint with David F. Gleich, (Purdue), supported by" NSF CAREER 1149756-CCF Kyle Kloster! Purdue University! Local Community Detection Given seed(s) S in G, find a
More informationStatistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1
Week 2 Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Part I Other datatypes, preprocessing 2 / 1 Other datatypes Document data You might start with a collection of
More informationPart I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes
Week 2 Based in part on slides from textbook, slides of Susan Holmes Part I Other datatypes, preprocessing October 3, 2012 1 / 1 2 / 1 Other datatypes Other datatypes Document data You might start with
More informationManifold Coarse Graining for Online Semi-supervised Learning
for Online Semi-supervised Learning Mehrdad Farajtabar, Amirreza Shaban, Hamid R. Rabiee, Mohammad H. Rohban Digital Media Lab, Department of Computer Engineering, Sharif University of Technology, Tehran,
More informationNon-Negative Matrix Factorization
Chapter 3 Non-Negative Matrix Factorization Part 2: Variations & applications Geometry of NMF 2 Geometry of NMF 1 NMF factors.9.8 Data points.7.6 Convex cone.5.4 Projections.3.2.1 1.5 1.5 1.5.5 1 3 Sparsity
More informationAssessment of Multi-Hop Interpersonal Trust in Social Networks by 3VSL
Assessment of Multi-Hop Interpersonal Trust in Social Networks by 3VSL Guangchi Liu, Qing Yang, Honggang Wang, Xiaodong Lin and Mike P. Wittie Presented by Guangchi Liu Department of Computer Science Montana
More informationLab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018
Lab 8: Measuring Graph Centrality - PageRank Monday, November 5 CompSci 531, Fall 2018 Outline Measuring Graph Centrality: Motivation Random Walks, Markov Chains, and Stationarity Distributions Google
More informationDATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD
DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary
More informationRecommendation Systems
Recommendation Systems Popularity Recommendation Systems Predicting user responses to options Offering news articles based on users interests Offering suggestions on what the user might like to buy/consume
More information