Long Term Evolution of Networks CS224W Final Report
Jave Kane (SUNET ID javekane), Conrad Roche (SUNET ID conradr)
Nov. 13, 2014

Introduction and Motivation

Many studies in social networking and computer science have investigated the evolution of large networks on time scales of one to ten years. Less well investigated is the long-term evolution of large networks and the stability of communities when the external forces generating these structures are themselves evolving. We investigate collaboration networks generated from American Physical Society (APS) article metadata. We study the correlation between author interests as represented by the APS PACS fields. We then simulate the APS network using the Kronecker and Forest Fire models, which turn out to be difficult to use with APS. Therefore, we create a third, simple model of a collaboration network based on a few intuitive rules about how coauthors are chosen. This model produces a network with a power law degree structure and a clustering coefficient similar to APS. The effective diameter is systematically lower than for APS, and we explore why. Overall the results suggest APS is a collection of tightly bound communities, with few edges joining them. Finally, we briefly investigate longer-term evolution in two what-if scenarios.

Summary & Critique of selected papers

Microscopic Evolution of Social Networks, Leskovec et al.
Analyzes the evolution of four social networks at the microscopic level. Shows that edge creation for a node seems unaffected by its age, but is proportional to its degree.

Evolution of the social network of scientific collaborations, Barabási et al.
Investigates structural properties of two collaboration networks. For both networks, the degree distribution has a power law tail, with different exponents for the two networks.

Tracking the Evolution of Communities in Dynamic Social Networks, Greene & Doyle.
A model for tracking user communities in dynamic networks, where the evolution of each community is determined by a set of significant events.

Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, Leskovec et al.
A graph generator that obeys the static properties and temporal evolution patterns of real-life networks and is mathematically tractable. Generated graphs exhibit a multinomial degree distribution and a multinomial eigenvalue distribution.

Processing the APS Data and Creating Network Snapshots

The corpus of Physical Review Letters, Physical Review, and Reviews of Modern Physics contains metadata for 541,448 articles dating back to 1893 from the 12 APS journals described in Table 1. Pre-processing the data to generate network snapshots is a significant effort;
after considerable investigation we settled on the following. We select only articles that have the authors and names fields, thus eliminating preambles and commentaries, and that list first names, thus eliminating collaborations (which average 331 authors per article). We build 121 yearly snapshots (1893 to 2013) by scanning the articles in date order. We add an author as a node at the time of their first article. For each article, we add a coauthor edge for each pair of authors. We accumulate the time-dependent interest vector for each author as the sum of the PACS vectors of all their papers.

Identifying authors is very difficult. An author may be listed with various first names or initials. Some names are very common, e.g. Lee and Brown. The affiliations field is not helpful; its format varies even for the same author. Furthermore, authors often change institutions. Misidentification means authors may map one-to-many or many-to-one to nodes, spuriously creating or omitting edges. The final (2013) network has 2.7e5 nodes and 5.7e6 edges.

Full Title                      Year  Description
Physical Review                       Original journal
Physical Review A                     Atomic, molecular, optical, quantum information
Physical Review B                     Condensed matter and materials
Physical Review C                     Experimental and theoretical nuclear
Physical Review D                     Elementary particle, field theory, gravitation, cosmology
Physical Review E                     Collective phenomena of many-body systems
PR Series I
Physical Review Letters               All fields, short letters, important research
PR Special Topics                     Accelerators and Beams
PR Special Topics                     Education
PR X                                  All areas: pure, applied, interdisciplinary
Reviews of Modern Physics       1929  Broad fundamental current trends and applications

Table 1. The APS journals

We use the APS PACS classification field to assign interest vectors to authors. The 10 PACS areas were introduced in 1975, e.g.
(10) The Physics of Elementary Particles and Fields; (70) Condensed Matter: Electronic Structure, Electrical, Magnetic, and Optical Properties. For each article we parse the PACS code down to a broad category (e.g. 7y.xx becomes [...]). [...]% of the articles have no PACS code; these are mostly older articles, but they persist well into the 1980s. We tested using journal-averaged PACS in these cases but felt this procedure was too complicated and could obscure the network structure of interest, so we assign a zero PACS vector instead and, when analyzing the interests of communities, average over the normalized non-zero vectors.

Community Detection and PACS Interest Vectors

To detect communities in the APS network snapshots and investigate their correlation with the accumulated PACS vectors of authors, we tried the Girvan-Newman (GN) and Clauset-Newman-Moore (CNM) algorithms (fastcommunity_mh in C). GN appears intractable, while CNM takes 18 hours on the 2013 network. We considered BigCLAM and settled on Louvain, which takes less than a minute on both the APS and simulated networks and generates intuitively reasonable communities in both cases. We also tested Louvain with edge weighting on our simulated network, but found it too slow. Figure 1 shows the distribution of PACS for each of the top 5 communities by size in the 2010, 2011, 2012
and 2013 snapshots. The largest community (blue bars in all plots) is always specialized in Condensed Matter Physics (PACS categories 7 and 8), but the continuity of the other communities is less clear. All communities grow in size with time, but in addition Louvain unpredictably splits communities with similar interests that might intuitively be combined into one. Communities also change order in the ranking by size.

To match communities from one year to the next, we computed two Jaccard-type similarities on the author (node) ids, and also cosine similarities between the mean normalized total PACS vectors (summed over authors) within each community. Because a very small community U (of, say, 200 nodes) could split off from a larger community V (of, say, [...] nodes) that contained it a year earlier, we computed both $J_1 = |U \cap V| / |U \cup V|$ and $J_2 = |U \cap V| / \min(|U|, |V|)$. We found that both $J_1$ and $J_2$ were well correlated with the cosine similarities of the groups. Since our goal here was only to confirm a supposed link between communities and interests, we did not attempt to compute exhaustive statistics for the database. As an example, $J_1$ = 72% and $J_2$ = 88% both match community 3 (green) in 2012 with community 2 (light blue) in 2012; the interest cosine similarity is 99.76%. These results suggest APS communities are tight-knit and identifiable by similar PACS interests.

Figure 1. Fraction of the total community PACS vector in each category for the top 5 Louvain APS communities by size for the years 2010, 2011, 2012 and 2013. The legend shows the total number of nodes in each group.
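The community matching described above can be illustrated with a minimal Python sketch. It assumes communities are available as sets of author ids and interest vectors as plain lists; the function names and the toy data are hypothetical and are not taken from the report's code.

```python
import math


def jaccard(u, v):
    """J1: size of the intersection divided by size of the union."""
    u, v = set(u), set(v)
    union = u | v
    return len(u & v) / len(union) if union else 0.0


def overlap(u, v):
    """J2: size of the intersection divided by the smaller set's size."""
    u, v = set(u), set(v)
    smaller = min(len(u), len(v))
    return len(u & v) / smaller if smaller else 0.0


def cosine(a, b):
    """Cosine similarity of two equal-length interest vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy case: a 200-node community that was fully contained in a
# 1000-node community one year earlier.
small = set(range(200))
large = set(range(1000))
j1 = jaccard(small, large)   # low, because the union is dominated by `large`
j2 = overlap(small, large)   # high, because all of `small` lies inside `large`
```

The toy case shows why both measures are useful: the plain Jaccard score penalizes a small community that split off from a large one, while the min-normalized score does not.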
Models of the APS Network

Forest Fire Model

The Forest Fire model [3] generates evolving graphs by "burning" through an existing graph. It has two parameters, the forward and backward burning probabilities p and r. Each arriving node v attaches to an ambassador node w chosen uniformly at random, then selects a subset of the out- and in-neighbors of w using geometric distributions based on p and r. The node v then creates edges to the selected nodes. This is repeated recursively on the subset until no nodes are left, with any node visited only once. The resulting graphs exhibit densification and a power law degree distribution.

For the APS network, the Forest Fire model can be viewed as a new author selecting a primary coauthor, then selecting more coauthors recursively. The forward and backward burning probabilities were determined for each of the APS snapshots. A simple least-squares goodness-of-fit measure was used to determine the probabilities for which the model most closely matches the properties of the actual network. The representative properties used were node count, edge count, approximate diameter, maximum WCC size, clustering coefficient and maximum node degree. The probabilities for the best fits were in the sweet spot between 0.2 and [...]. The Forest Fire model with the best fit for all the snapshots had a forward burning probability, p, of 0.33 and a backward burning probability, r, of [...].

The basic Forest Fire model generates a graph containing a single component in which (ignoring edge direction) one can navigate from any node to any other node. Thus, the maximal WCC is the entire network. This is not true for APS, however (Figure 4), so the Forest Fire model did not correctly model the maximum WCC size.

Figure 2. Graph properties over time for the Forest Fire models. The orphan model fits the initial years of the graph, while the basic Forest Fire model fits the later years of the graph.
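The burning process of the basic model can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used for the fits: it assumes the number of out- and in-neighbors burned at each step is geometric with success probabilities p and r respectively, and the function names and the parameter values in the usage line are arbitrary choices for the example.

```python
import random


def geometric(prob, rng):
    """Count successes before the first failure (mean prob / (1 - prob))."""
    count = 0
    while rng.random() < prob:
        count += 1
    return count


def forest_fire(num_nodes, p, r, seed=0):
    """Return {node: set of out-neighbors} for a simplified Forest Fire graph."""
    if not (0 <= p < 1 and 0 <= r < 1):
        raise ValueError("burning probabilities must lie in [0, 1)")
    rng = random.Random(seed)
    out_nbrs = {0: set()}
    in_nbrs = {0: set()}
    for v in range(1, num_nodes):
        ambassador = rng.randrange(v)  # uniform choice among existing nodes
        visited = {ambassador}
        frontier = [ambassador]
        while frontier:
            w = frontier.pop()
            # Unvisited neighbors reachable by forward and backward burning.
            fwd = sorted(u for u in out_nbrs[w] if u not in visited)
            bwd = sorted(u for u in in_nbrs[w] if u not in visited)
            rng.shuffle(fwd)
            rng.shuffle(bwd)
            burned = fwd[:geometric(p, rng)] + bwd[:geometric(r, rng)]
            for u in burned:
                if u not in visited:
                    visited.add(u)
                    frontier.append(u)
        # The new node links to every node the fire reached.
        out_nbrs[v] = set(visited)
        in_nbrs[v] = set()
        for u in visited:
            in_nbrs[u].add(v)
    return out_nbrs


graph = forest_fire(300, p=0.3, r=0.2, seed=1)  # arbitrary demo parameters
```

Because every arriving node links at least to its ambassador, the sketch always produces a single weakly connected component, which is the property of the basic model noted above.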
As suggested in [3], one way to generate a graph that contains multiple weakly connected components is the orphan model. Here, with probability op, an arriving node v does not establish an edge with the existing graph. This leads to many orphan nodes, which eventually form edges with new nodes. The Forest Fire orphan model with the best fit for all the snapshots had a forward burning probability, p, of 0.27, a backward burning probability, r, of 0.28, and an orphan probability of 0.3. The orphan model fit the initial evolution of APS well, but the basic model fit the latter part of the evolution better. The clustering coefficient in both models remains flat for most of the evolution. The APS network, on the other hand, has a steep initial increase in the clustering coefficient, which then levels off asymptotically as the number of nodes increases. This makes it difficult to fit the Forest Fire model to the data.

Stochastic Kronecker Graph model

The self-similar Stochastic Kronecker graph [4] is generated recursively, starting with an $N_1 \times N_1$ probability matrix ($N_1 = 2$ in our case) and computing its $k$-th Kronecker power. Edges are created according to the probability in the corresponding entry of the resulting matrix. For APS, self-similarity would imply that the collaboration structure amongst authors is similar (but not exactly the same) across the network. This makes sense, as authors in different fields would exhibit similar collaborative behavior. Unlike Forest Fire, stochastic Kronecker is undirected. It exhibits a power law degree distribution, small diameter and densification. Another interpretation of Stochastic Kronecker graphs, as described in [4], is to associate each node with a set of features, where the probability of an edge depends on the feature similarities. For APS the features are the authors' (research) interests.

We used the KronFit approach [4] to fit the APS network snapshots and determine the initiator matrix. We chose $N_1 = 2$, since larger $N_1$ did not radically improve the fit in [4]. The matrix varied for each of the snapshots.
The diagonal values in the matrices were nearly the same, with an average difference over the years of [...]. The median value of the matrix elements in the yearly snapshots was chosen as the initiator for the entire time series, Θ = [...].

Figure 3.

The Stochastic Kronecker graph modeled the edge growth and the effective diameter well. It could not model the behavior of the maximum SCC or the clustering coefficient, which fell steeply with network growth in the model, while in the APS network the clustering coefficient rises for the latter half of the time.
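As an illustration of how a Stochastic Kronecker graph is sampled from an initiator matrix, the following Python sketch computes each edge probability as a product of initiator entries, one per digit of the two node indices. The initiator values below are placeholders chosen for the example (the fitted Θ is not reproduced here), the function names are hypothetical, and the quadratic loop over node pairs is only practical for small k.

```python
import random


def edge_probability(theta, k, u, v):
    """Entry (u, v) of the k-th Kronecker power of the initiator `theta`."""
    base = len(theta)
    prob = 1.0
    for _ in range(k):
        # Each base-N1 digit of u and v selects one initiator entry.
        prob *= theta[u % base][v % base]
        u //= base
        v //= base
    return prob


def stochastic_kronecker(theta, k, seed=0):
    """Sample an undirected graph on len(theta)**k nodes; assumes symmetric theta."""
    rng = random.Random(seed)
    n = len(theta) ** k
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < edge_probability(theta, k, u, v):
                edges.append((u, v))
    return n, edges


# Placeholder symmetric 2x2 initiator, not the fitted APS matrix.
theta_demo = [[0.9, 0.5], [0.5, 0.2]]
num_nodes, edge_list = stochastic_kronecker(theta_demo, k=6, seed=0)
```

With a 2x2 initiator and k = 6 the sketch yields a 64-node graph; an APS-size graph would need k near 18 and a sampler that avoids enumerating all node pairs.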
Simple Model

We have also built a simple model for generating an APS-like network. The key element is a small set of intuitive rules for choosing coauthors that account for the interests of the authors, where interest means any external influence, especially funding or popular support for particular subfields of physics. A rule is chosen randomly using the weighted distribution shown in Table 2. Through extensive experimentation, these rules were found to be both effective in controlling network properties and feasible for APS-size graphs.

Rule  Coauthor selection rule                                                 Typical relative weighting
1     Complete a triangle where nodes have the same interest                  23
2     Seek a high-degree node with the same interest                          70
3     Seek any node with the same interest                                    6
4     Seek a high-degree node (neighbor of a random node with any interest)   1e-3
5     Random node (any degree, any interest)                                  0.4

Table 2. Intuitive rules for choosing coauthors in the simple model.

Notably, Rule 4, essentially an edge rewiring rule, reduces the effective diameter in the model shown here from around 8 to an APS-like value of around 5-6. In the model shown here, for simplicity of exposition each author has one of three possible scalar interest values (1, 2, or 3). We also tried as many as 20 scalar interests.

For the model run shown here, the network G = (V, E) is initialized with 4 nodes in each of three equal-size complete subgraph communities C1, C2 and C3, where every node in Ci has the single scalar interest i. The subgraphs are connected in a ring (three additional edges joining pairs of subgraphs). The model is run for 1440 time steps (representing 120 years times 12 months). Each month n = max(1, rN) new authors (nodes) are added to the network, where the rate r = 0.058/12 and N = |V|. Each new author is assigned an interest value 1, 2, or 3 with equal probability. The top right panel of Figure 4 shows an excellent quadratic fit to the number of edges vs.
the number of nodes in the APS Wcc, so we stipulate that the number of new papers ΔP is related to the number of new nodes ΔN at each step as (1 + 2α N 2 ) ΔN; the constant α = 3e-3 is tuned along with the probabilities for the number of coauthors given below. Each new paper has m coauthors in addition to the first author, where m is chosen randomly for each paper separately using the weighting [0.35, 0.15, 0.10, 0.10, 0.05] for m = [1, 2, 3, 4, 5]. This weighting influences the rate at which edges appear.

All n new authors are placed in an initially empty queue q of first authors waiting for coauthors, and then s = q n existing first authors are also added to the list. To choose an existing first author for q, an existing node v is chosen randomly from the network. With 10% probability, v is added to the list; with 90% probability, a random neighbor of v is added to the list instead. The latter implements a preferential "well-connected get more publications" mode. Once q is assembled, each v in q is assigned its own m(v) - 1 coauthors by the rules in Table 2. Because the initial network is connected and every paper has at least one coauthor, the network remains connected.

Using a profiling utility in Python, we found that careful approximations to the rules reduced the running time to generate the full APS-size network from several hours to two minutes. For example, at first we used sampling routines in the Python stats package to generate a list of
candidate nodes for Rule 1, from which we randomly downselected. However, the profiler showed this and similar approaches are very expensive. Therefore, in Rule 1, if the chosen neighbor of v or its neighbor does not have the same interest as v, we simply iterate and choose a rule again (possibly Rule 1). This means the rules are not applied with exactly the distribution shown in Table 2. However, because the edges are sparse, $|E| \ll N(N-1)/2$, and the rules tend to surround nodes with neighbors of similar interest, the rules usually find a candidate, so we expect the approximations to work well. The improved speed made it possible to test many concepts for the model and to vary the parameters.

Results of Simple Model

As the top left panel of Figure 4 shows, the APS network is significantly disconnected until the 1940s, when the network has only a few thousand nodes. Therefore we compare simulated results to the maximum weakly connected component (Wcc) of APS. The left panel on the second line of Figure 4 shows the number of edges versus the number of nodes. The simple model produces an APS-like network with the following features.
1) Clustering coefficient C close to the APS value of 6.1: second line, right panel of Figure 4.
2) Power law degree distribution similar to APS: third line of Figure 4.
3) Effective diameter $D_e$ and number of nodes at a given number of hops similar to APS: bottom line.

Surprisingly, $D_e$ of the APS network is the measure most sensitive to the input parameters. As the left panel on the bottom line of Figure 4 shows, for the model run shown the final $D_e$ is closer to 4 than to the final $D_e$ of about 5-6 in the APS Wcc, although the two may be converging. We note that the full diameter of the model is very similar to the $D_e$ of the APS Wcc.

One way a graph can have high C and high $D_e$ is if it has dense communities with tendrils. For example, a complete graph $G_C$ on n nodes has C = 1.
If a chain (line) subgraph $G_L$ of m nodes is attached by one edge to one node v of $G_C$, then every node i in $G_L$ has $C_i = 0$. The full diameter of the graph is $D = m + 1$. Every node j in $G_C$ except v has $C_j = 1$. The node v has

$$C_v = \frac{2 e_v}{k_v (k_v - 1)},$$

where $k_v = (n - 1) + 1 = n$ is the degree of v and $e_v = (n-1)(n-2)/2$ is the number of edges between pairs of neighbors of v, i.e. between the $n - 1$ neighbors of v in $G_C$. Thus

$$C_v = \frac{2 e_v}{k_v (k_v - 1)} = \frac{(n-1)(n-2)}{n (n-1)} = \frac{n-2}{n}.$$

Therefore for the entire graph, the clustering coefficient is

$$C = \frac{1}{m+n} \left[ m \cdot 0 + \frac{n-2}{n} + (n-1) \cdot 1 \right] = \frac{1}{m+n} \cdot \frac{n^2 - 2}{n} = \frac{n^2 - 2}{n (D + n - 1)}.$$

For a given D, $\lim_{n \to \infty} C = 1$, i.e. C can be arbitrarily close to 1. We suspect this type of structure is present in APS, but it would not result from the simple rules in Table 2. (This would be interesting further work.)

Notably, APS has a cloud of high-frequency, high-distribution counts above the power law (third line, left panel of Figure 4) that the simple model does not reproduce. These could be explained as small, highly connected communities nearly detached from the main network, and could raise $D_e$. (These may be specialty institutions or fields; this would be worth investigating further.)

In general, by varying parameters in the simple model it is easy to improve the match on any of criteria 1)-3) individually (not shown), such as the number of nodes at degree 1 or the tail of single-count, high-degree nodes. We have made no systematic attempt to find a simultaneous good match on all 3 criteria. However, given the simplicity of our rules, these results suggest that simple intuitive rules can robustly explain the APS network.
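The clique-plus-chain argument can be checked numerically. The sketch below builds a complete graph on n nodes with an m-node chain attached to one of them, computes the average local clustering coefficient directly, and compares it with the closed form $C = (n^2 - 2) / (n (m + n))$. It assumes n >= 3 and m >= 1 and treats nodes of degree below 2 as having clustering 0; the helper names are hypothetical.

```python
from itertools import combinations


def clique_with_chain(n, m):
    """Complete graph on nodes 0..n-1 with an m-node chain attached to node 0."""
    adj = {i: set() for i in range(n + m)}
    for a, b in combinations(range(n), 2):
        adj[a].add(b)
        adj[b].add(a)
    prev = 0  # node 0 plays the role of v
    for node in range(n, n + m):
        adj[prev].add(node)
        adj[node].add(prev)
        prev = node
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; degree < 2 counts as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def closed_form(n, m):
    """C = (n^2 - 2) / (n (m + n)), valid for n >= 3 and m >= 1."""
    return (n * n - 2) / (n * (m + n))


measured = average_clustering(clique_with_chain(5, 3))
predicted = closed_form(5, 3)
```

For n = 5 and m = 3 both routes give 23/40, and holding m fixed while increasing n pushes the value toward 1, as the limit states.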
Figure 4. Results of the simple model. Top left: APS # Nodes vs. Year. Top right: Fit to APS # Edges vs. # Nodes. 2nd line left: # Edges vs. # Nodes. 2nd line right: Global Clustering Coefficient. 3rd line left: degree distribution for APS. 3rd line right: same for simple model. Bottom left: Diameter. Bottom right: distribution of nodes versus number of hops.
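The random choices that drive the simple model (rule selection with the Table 2 weights, the number of coauthors per paper, the monthly number of new authors, and the choice of an existing first author) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names are hypothetical, the adjacency structure is a plain dict of sets, and rounding rN down to an integer is an assumption.

```python
import random
from collections import Counter

# Relative weights from Table 2 and the coauthor-count weighting in the text.
RULE_WEIGHTS = {1: 23, 2: 70, 3: 6, 4: 1e-3, 5: 0.4}
COAUTHOR_WEIGHTS = {1: 0.35, 2: 0.15, 3: 0.10, 4: 0.10, 5: 0.05}


def weighted_pick(weights, rng):
    """Draw one key with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


def new_authors_per_month(num_nodes, yearly_rate=0.058):
    """n = max(1, r N) with r = yearly_rate / 12 (truncation is assumed)."""
    return max(1, int(yearly_rate / 12 * num_nodes))


def pick_existing_first_author(adj, rng, direct_prob=0.10):
    """Pick a random node, or (90% of the time) one of its neighbors."""
    v = rng.choice(sorted(adj))
    if rng.random() < direct_prob or not adj[v]:
        return v
    return rng.choice(sorted(adj[v]))


rng_demo = random.Random(0)
rule_counts = Counter(weighted_pick(RULE_WEIGHTS, rng_demo) for _ in range(10000))
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
hub_hits = sum(pick_existing_first_author(star, rng_demo) == 0 for _ in range(2000))
```

In the star-graph example the hub is selected far more often than any leaf, which is the "well-connected get more publications" effect that picking a random neighbor is meant to produce.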
Long term evolution of simulated network

We have also used the simple model to investigate longer-term evolution of a collaboration network in two what-if scenarios. The type of scenario we envision is a significant change in the funding levels for two subfields of physics research. An example might be an increase in funding for Condensed Matter as it becomes the dominant profitable field in physics, and a decrease in funding for nuclear physics as it loses political support. To mock up these scenarios, at 1200 months (out of 1440 months) we impose a change in the interest attributes of new nodes and of existing nodes publishing new papers. At 1200 months the network contains about 1/3 the number of nodes it does at 1440 months. The change in interest mocks up an increase in the fractional funding to Interest 1 from 33% to 60% of total physics funding, and a decrease in funding for Interest 3 from 33% to 5%. New authors entering the network are assigned interests according to the changed interest distribution, while existing first authors who currently have Interest 3 reassign their interest according to the new interest distribution when they publish a new paper. This mocks up the effect of existing authors possibly bailing out of a declining field. We run Louvain community detection on yearly network snapshots.

The result of this model is that the clustering coefficient increases, while the effective diameter stays nearly constant. This is a somewhat surprising result. A change in author interest might be expected to generate more long-distance connections, since an existing author who changes interest would tend to have different interests from its neighbors; this would not tend to decrease the clustering coefficient, but would decrease the diameter. As a fraction of the total number of authors in the network, C1 increases in size in response to increased funding, while C2 stays constant in size.
C3 changes in several ways: it shrinks in fractional size by a factor of 3; its average interest changes from 3 to 1.6; and the fractions of authors in Community 3 with interests 1, 2 and 3 each become about 1/3. This result suggests that a fairly simple change in the funding input to a collaboration network can lead to the disruption of a previously stable community in a short amount of time. Since existing edges (papers published on earlier interests) are not removed in this simple model, it is interesting that the new edges come to dominate the structure. However, this result was obtained for the case where the network continued to grow; in this case new authors entering the network may dominate the structure.

To address the last point, we simulate a case where we stop adding nodes to the network at 1200 months, and where existing authors with Interest 3 change their interest according to the prescription in the previous section when they publish a paper. By 1440 months the effective network diameter drops to 5 and the clustering coefficient actually decreases to 0.46, because many existing nodes that had Interest 3 have started new triangles that are not yet completed. The number of edges in the network has increased from 1.8e6 to 3.9e6 (compared to ~5.9e6 in the case where new nodes are added). At 1440 months the Interest 3 community from 1200 months has significantly changed: the community is only 5% Interest 3, and 51% Interest 1. This result is not surprising, but suggests that the rapid transformation of communities takes place among both existing nodes and new nodes.
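The interest switch used in both what-if scenarios can be sketched as below. The 60% and 5% shares for Interests 1 and 3 come from the text; the 35% share for Interest 2 is inferred as the remainder and is an assumption, as are the function names and the treatment of month 1200 itself as already using the new distribution.

```python
import random

SWITCH_MONTH = 1200
BASELINE = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
SHIFTED = {1: 0.60, 2: 0.35, 3: 0.05}  # the 0.35 for Interest 2 is inferred


def draw_interest(distribution, rng):
    """Sample an interest value from a {interest: probability} mapping."""
    interests = list(distribution)
    weights = [distribution[i] for i in interests]
    return rng.choices(interests, weights=weights, k=1)[0]


def interest_for_new_author(month, rng):
    """New authors follow the baseline before the switch, the shifted mix after."""
    dist = SHIFTED if month >= SWITCH_MONTH else BASELINE
    return draw_interest(dist, rng)


def interest_when_publishing(current, month, rng):
    """After the switch, Interest 3 authors redraw their interest on publishing."""
    if month >= SWITCH_MONTH and current == 3:
        return draw_interest(SHIFTED, rng)
    return current


rng_demo = random.Random(0)
redrawn = [interest_when_publishing(3, 1300, rng_demo) for _ in range(10000)]
still_three = redrawn.count(3) / len(redrawn)
```

Under these assumptions only about one in twenty publishing Interest 3 authors keeps that interest after the switch, which is in line with the rapid dilution of the Interest 3 community reported above.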
Conclusions and Further work

The forward and backward probabilities of the Forest Fire model that fit each of the yearly snapshots varied from year to year. We could consider a model where the probabilities are a function of the number of nodes in the graph, decreasing as the number of nodes increases. An alternative approach could be a model where the orphan probability decreases as the graph grows, so that it reduces to zero once the graph reaches a certain size.

We would like to add more features to our simple model of network generation. Nodes and edges could age and be refreshed by new papers; edges could have strengths based on the cosine similarity of the endpoint interests. We have created and run models with these features (shown in the milestone report) but have not presented results here. External changes on collaboration networks could also include influence/outbreak, where author interests spread by contact edges. However, from the viewpoint of creating a simple, realistic model, it is not obvious that such effects are as important as more practical concerns like funding levels and the political/social palatability of particular areas of research.

The results for extended "longer term" evolution with the simple model are interesting, but not too surprising, because the APS network consists of tight communities with few links between them, i.e. as a whole it is barely a network. The average number of edges from a node to a node in a different community is much less than one. This is not astonishing; coauthoring a paper is a much more involved undertaking and commitment than friending someone or liking their post. The main conclusion is that the APS network is formed by authors choosing like-minded (or like-funded) coauthors that are well-connected.

References

[1]
[2] A. Clauset, M.E.J. Newman and C. Moore, "Finding community structure in very large networks." Phys. Rev. E 70, (2004).
[3] J. Leskovec, J. Kleinberg and C. Faloutsos.
Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD),
[4] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research (JMLR),

Tasks/Roles

C. Roche: compiled Louvain for Mac; ran and analyzed Forest Fire and Kronecker models.
J. Kane: formulated project goals; supervised project; preprocessed APS data; created network snapshots; performed Louvain community detection and analysis; built and ran simple model and analyzed output; did long-term evolution runs; assembled final report and wrote 80% of it.
More informationRTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Submitted for Blind Review Abstract How do real, weighted graphs change over time? What patterns, if any, do they obey? Earlier studies
More informationChaos, Complexity, and Inference (36-462)
Chaos, Complexity, and Inference (36-462) Lecture 21: More Networks: Models and Origin Myths Cosma Shalizi 31 March 2009 New Assignment: Implement Butterfly Mode in R Real Agenda: Models of Networks, with
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationChaos, Complexity, and Inference (36-462)
Chaos, Complexity, and Inference (36-462) Lecture 21 Cosma Shalizi 3 April 2008 Models of Networks, with Origin Myths Erdős-Rényi Encore Erdős-Rényi with Node Types Watts-Strogatz Small World Graphs Exponential-Family
More informationCS 224w: Problem Set 1
CS 224w: Problem Set 1 Tony Hyun Kim October 8, 213 1 Fighting Reticulovirus avarum 1.1 Set of nodes that will be infected We are assuming that once R. avarum infects a host, it always infects all of the
More informationCS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine
CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,
More informationSpatial and Temporal Behaviors in a Modified Evolution Model Based on Small World Network
Commun. Theor. Phys. (Beijing, China) 42 (2004) pp. 242 246 c International Academic Publishers Vol. 42, No. 2, August 15, 2004 Spatial and Temporal Behaviors in a Modified Evolution Model Based on Small
More informationNetwork Observational Methods and. Quantitative Metrics: II
Network Observational Methods and Whitney topics Quantitative Metrics: II Community structure (some done already in Constraints - I) The Zachary Karate club story Degree correlation Calculating degree
More information1 Searching the World Wide Web
Hubs and Authorities in a Hyperlinked Environment 1 Searching the World Wide Web Because diverse users each modify the link structure of the WWW within a relatively small scope by creating web-pages on
More informationSupplementary Information Activity driven modeling of time varying networks
Supplementary Information Activity driven modeling of time varying networks. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani May 11, 2012 Contents 1 The Model 1 1.1 Integrated network......................................
More informationClustering. CSL465/603 - Fall 2016 Narayanan C Krishnan
Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification
More informationT , Lecture 6 Properties and stochastic models of real-world networks
T-79.7003, Lecture 6 Properties and stochastic models of real-world networks Charalampos E. Tsourakakis 1 1 Aalto University November 1st, 2013 Properties of real-world networks Properties of real-world
More informationCover Page. The handle holds various files of this Leiden University dissertation
Cover Page The handle http://hdl.handle.net/1887/39637 holds various files of this Leiden University dissertation Author: Smit, Laurens Title: Steady-state analysis of large scale systems : the successive
More informationClass President: A Network Approach to Popularity. Due July 18, 2014
Class President: A Network Approach to Popularity Due July 8, 24 Instructions. Due Fri, July 8 at :59 PM 2. Work in groups of up to 3 3. Type up the report, and submit as a pdf on D2L 4. Attach the code
More informationLecture 21: Spectral Learning for Graphical Models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation
More informationarxiv:cond-mat/ v1 [cond-mat.dis-nn] 4 May 2000
Topology of evolving networks: local events and universality arxiv:cond-mat/0005085v1 [cond-mat.dis-nn] 4 May 2000 Réka Albert and Albert-László Barabási Department of Physics, University of Notre-Dame,
More informationLecture 20 : Markov Chains
CSCI 3560 Probability and Computing Instructor: Bogdan Chlebus Lecture 0 : Markov Chains We consider stochastic processes. A process represents a system that evolves through incremental changes called
More informationLINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
LINK ANALYSIS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Retrieval evaluation Link analysis Models
More informationThanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides
Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank
More informationLecture: Local Spectral Methods (1 of 4)
Stat260/CS294: Spectral Graph Methods Lecture 18-03/31/2015 Lecture: Local Spectral Methods (1 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide
More informationLecture 14: Random Walks, Local Graph Clustering, Linear Programming
CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 14: Random Walks, Local Graph Clustering, Linear Programming Lecturer: Shayan Oveis Gharan 3/01/17 Scribe: Laura Vonessen Disclaimer: These
More informationModels of Communication Dynamics for Simulation of Information Diffusion
Models of Communication Dynamics for Simulation of Information Diffusion Konstantin Mertsalov, Malik Magdon-Ismail, Mark Goldberg Rensselaer Polytechnic Institute Department of Computer Science 11 8th
More informationModularity in several random graph models
Modularity in several random graph models Liudmila Ostroumova Prokhorenkova 1,3 Advanced Combinatorics and Network Applications Lab Moscow Institute of Physics and Technology Moscow, Russia Pawe l Pra
More informationWeb Structure Mining Nodes, Links and Influence
Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.
More informationNonlinear Dynamical Behavior in BS Evolution Model Based on Small-World Network Added with Nonlinear Preference
Commun. Theor. Phys. (Beijing, China) 48 (2007) pp. 137 142 c International Academic Publishers Vol. 48, No. 1, July 15, 2007 Nonlinear Dynamical Behavior in BS Evolution Model Based on Small-World Network
More informationSupporting Statistical Hypothesis Testing Over Graphs
Supporting Statistical Hypothesis Testing Over Graphs Jennifer Neville Departments of Computer Science and Statistics Purdue University (joint work with Tina Eliassi-Rad, Brian Gallagher, Sergey Kirshner,
More informationMini course on Complex Networks
Mini course on Complex Networks Massimo Ostilli 1 1 UFSC, Florianopolis, Brazil September 2017 Dep. de Fisica Organization of The Mini Course Day 1: Basic Topology of Equilibrium Networks Day 2: Percolation
More informationProtein Complex Identification by Supervised Graph Clustering
Protein Complex Identification by Supervised Graph Clustering Yanjun Qi 1, Fernanda Balem 2, Christos Faloutsos 1, Judith Klein- Seetharaman 1,2, Ziv Bar-Joseph 1 1 School of Computer Science, Carnegie
More informationCongratulations! You ve completed Practice Test 1! You re now ready to check your
Practice Test 1: Answers and Explanations Congratulations! You ve completed Practice Test 1! You re now ready to check your answers to see how you fared. In this chapter, I provide the answers, including
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 Web pages are not equally important www.joe-schmoe.com
More information4 : Exact Inference: Variable Elimination
10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference
More informationIntuitionistic Fuzzy Estimation of the Ant Methodology
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 9, No 2 Sofia 2009 Intuitionistic Fuzzy Estimation of the Ant Methodology S Fidanova, P Marinov Institute of Parallel Processing,
More informationRealistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication
Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication Jurij Leskovec 1, Deepayan Chakrabarti 1, Jon Kleinberg 2, and Christos Faloutsos 1 1 School of Computer
More informationLecture 2: Divide and conquer and Dynamic programming
Chapter 2 Lecture 2: Divide and conquer and Dynamic programming 2.1 Divide and Conquer Idea: - divide the problem into subproblems in linear time - solve subproblems recursively - combine the results in
More informationQuantum Percolation: Electrons in a Maze. Brianna Dillon-Thomas, PhD 2016
Quantum Percolation: Electrons in a Maze Brianna Dillon-Thomas, PhD 2016 Physicists, especially theoretical physicists, love to make models of the world to help us understand it. We weigh various effects
More informationPreliminaries and Complexity Theory
Preliminaries and Complexity Theory Oleksandr Romanko CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 16, 2006 Introduction Book structure: 2 Part I Linear Algebra
More informationClass Note #14. In this class, we studied an algorithm for integer multiplication, which. 2 ) to θ(n
Class Note #14 Date: 03/01/2006 [Overall Information] In this class, we studied an algorithm for integer multiplication, which improved the running time from θ(n 2 ) to θ(n 1.59 ). We then used some of
More informationarxiv: v2 [stat.ml] 21 Aug 2009
KRONECKER GRAPHS: AN APPROACH TO MODELING NETWORKS Kronecker graphs: An Approach to Modeling Networks arxiv:0812.4905v2 [stat.ml] 21 Aug 2009 Jure Leskovec Computer Science Department, Stanford University
More informationOnline Social Networks and Media. Link Analysis and Web Search
Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information
More informationSpectral Analysis of Directed Complex Networks. Tetsuro Murai
MASTER THESIS Spectral Analysis of Directed Complex Networks Tetsuro Murai Department of Physics, Graduate School of Science and Engineering, Aoyama Gakuin University Supervisors: Naomichi Hatano and Kenn
More informationReal Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report
Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Hujia Yu, Jiafu Wu [hujiay, jiafuwu]@stanford.edu 1. Introduction Housing prices are an important
More informationApproximate Inference
Approximate Inference Simulation has a name: sampling Sampling is a hot topic in machine learning, and it s really simple Basic idea: Draw N samples from a sampling distribution S Compute an approximate
More informationPattern Recognition Approaches to Solving Combinatorial Problems in Free Groups
Contemporary Mathematics Pattern Recognition Approaches to Solving Combinatorial Problems in Free Groups Robert M. Haralick, Alex D. Miasnikov, and Alexei G. Myasnikov Abstract. We review some basic methodologies
More informationComplex Networks, Course 303A, Spring, Prof. Peter Dodds
Complex Networks, Course 303A, Spring, 2009 Prof. Peter Dodds Department of Mathematics & Statistics University of Vermont Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
More informationPurnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)
Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.) Which pair of nodes {i,j} should be connected? Variant: node i is given Alice Bob Charlie Friend
More informationarxiv: v1 [cs.it] 26 Sep 2018
SAPLING THEORY FOR GRAPH SIGNALS ON PRODUCT GRAPHS Rohan A. Varma, Carnegie ellon University rohanv@andrew.cmu.edu Jelena Kovačević, NYU Tandon School of Engineering jelenak@nyu.edu arxiv:809.009v [cs.it]
More informationQR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS
QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS DIANNE P. O LEARY AND STEPHEN S. BULLOCK Dedicated to Alan George on the occasion of his 60th birthday Abstract. Any matrix A of dimension m n (m n)
More informationIntegrated CME Project Mathematics I-III 2013
A Correlation of -III To the North Carolina High School Mathematics Math I A Correlation of, -III, Introduction This document demonstrates how, -III meets the standards of the Math I. Correlation references
More informationMaking Our Cities Safer: A Study In Neighbhorhood Crime Patterns
Making Our Cities Safer: A Study In Neighbhorhood Crime Patterns Aly Kane alykane@stanford.edu Ariel Sagalovsky asagalov@stanford.edu Abstract Equipped with an understanding of the factors that influence
More informationComputing PageRank using Power Extrapolation
Computing PageRank using Power Extrapolation Taher Haveliwala, Sepandar Kamvar, Dan Klein, Chris Manning, and Gene Golub Stanford University Abstract. We present a novel technique for speeding up the computation
More informationDynamics of Real-world Networks
Dynamics of Real-world Networks Thesis proposal Jurij Leskovec Machine Learning Department Carnegie Mellon University May 2, 2007 Thesis committee: Christos Faloutsos, CMU Avrim Blum, CMU John Lafferty,
More information0.1 O. R. Katta G. Murty, IOE 510 Lecture slides Introductory Lecture. is any organization, large or small.
0.1 O. R. Katta G. Murty, IOE 510 Lecture slides Introductory Lecture Operations Research is the branch of science dealing with techniques for optimizing the performance of systems. System is any organization,
More informationRaRE: Social Rank Regulated Large-scale Network Embedding
RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat
More informationCS345a: Data Mining Jure Leskovec and Anand Rajaraman Stanford University
CS345a: Data Mining Jure Leskovec and Anand Rajaraman Stanford University TheFind.com Large set of products (~6GB compressed) For each product A=ributes Related products Craigslist About 3 weeks of data
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize/navigate it? First try: Human curated Web directories Yahoo, DMOZ, LookSmart
More informationCollaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2017 Notes on Lecture the most technical lecture of the course includes some scary looking math, but typically with intuitive interpretation use of standard machine
More informationLecture 1: March 7, 2018
Reinforcement Learning Spring Semester, 2017/8 Lecture 1: March 7, 2018 Lecturer: Yishay Mansour Scribe: ym DISCLAIMER: Based on Learning and Planning in Dynamical Systems by Shie Mannor c, all rights
More informationOriented majority-vote model in social dynamics
Author: Facultat de Física, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain. Advisor: M. Ángeles Serrano Mass events ruled by collective behaviour are present in our society every day. Some
More informationCorrelation Lengths of Red and Blue Galaxies: A New Cosmic Ruler
10/22/08 Correlation Lengths of Red and Blue Galaxies: A New Cosmic Ruler Michael J. Longo University of Michigan, Ann Arbor, MI 48109 A comparison of the correlation lengths of red galaxies with blue
More informationCommunities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices
Communities Via Laplacian Matrices Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices The Laplacian Approach As with betweenness approach, we want to divide a social graph into
More informationIntroduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa
Introduction to Search Engine Technology Introduction to Link Structure Analysis Ronny Lempel Yahoo Labs, Haifa Outline Anchor-text indexing Mathematical Background Motivation for link structure analysis
More informationCollaborative Filtering Applied to Educational Data Mining
Journal of Machine Learning Research (200) Submitted ; Published Collaborative Filtering Applied to Educational Data Mining Andreas Töscher commendo research 8580 Köflach, Austria andreas.toescher@commendo.at
More informationGreedy Search in Social Networks
Greedy Search in Social Networks David Liben-Nowell Carleton College dlibenno@carleton.edu Joint work with Ravi Kumar, Jasmine Novak, Prabhakar Raghavan, and Andrew Tomkins. IPAM, Los Angeles 8 May 2007
More informationTeaching a Prestatistics Course: Propelling Non-STEM Students Forward
Teaching a Prestatistics Course: Propelling Non-STEM Students Forward Jay Lehmann College of San Mateo MathNerdJay@aol.com www.pearsonhighered.com/lehmannseries Learning Is in the Details Detailing concepts
More informationLAMMPS Simulation of a Microgravity Shear Cell 299r Progress Report Taiyo Wilson. Units/Parameters:
Units/Parameters: In our simulations, we chose to express quantities in terms of three fundamental values: m (particle mass), d (particle diameter), and τ (timestep, which is equivalent to (g/d)^0.5, where
More informationINFO 2950 Intro to Data Science. Lecture 18: Power Laws and Big Data
INFO 2950 Intro to Data Science Lecture 18: Power Laws and Big Data Paul Ginsparg Cornell University, Ithaca, NY 7 Apr 2016 1/25 Power Laws in log-log space y = cx k (k=1/2,1,2) log 10 y = k log 10 x +log
More informationLearning Energy-Based Models of High-Dimensional Data
Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal
More informationLower Bounds for Testing Bipartiteness in Dense Graphs
Lower Bounds for Testing Bipartiteness in Dense Graphs Andrej Bogdanov Luca Trevisan Abstract We consider the problem of testing bipartiteness in the adjacency matrix model. The best known algorithm, due
More informationAlgebra 1 Yearlong Curriculum Plan. Last modified: June 2014
Algebra 1 Yearlong Curriculum Plan Last modified: June 2014 SUMMARY This curriculum plan is divided into four academic quarters. In Quarter 1, students first dive deeper into the real number system before
More informationCourse Number 432/433 Title Algebra II (A & B) H Grade # of Days 120
Whitman-Hanson Regional High School provides all students with a high- quality education in order to develop reflective, concerned citizens and contributing members of the global community. Course Number
More information