Analyzing Evolution of Web Communities using a Series of Japanese Web Archives

Size: px
Start display at page:

Download "Analyzing Evolution of Web Communities using a Series of Japanese Web Archives"

Transcription

1 {toyoda,kitsure}@tkl.iis.u-tokyo.ac.jp Analyzing Evolution of Web Communities using a Series of Japanese Web Archives Masashi TOYODA and Masaru KITSUREGAWA Institute of Industrial Science, University of Tokyo 4 6 Komaba Meguro-ku, Tokyo, Japan {toyoda,kitsure}@tkl.iis.u-tokyo.ac.jp Abstract In this paper, we analyze evolution of web communities using a series of Japanese web archives. A web community is a set of web pages created by individuals or associations with a common interest on a topic. So far various techniques have been developed to extract web communities by link analysis. We examined evolution of web communities by comparing Japanese web archives crawled four times from 999 to Several metrics are introduced for measuring the degree of web community evolution, such as growth rate, novelty, and stability. Statistics of these archives and the evolution metrics are shown, and the global behavior of evolution is described. We found that most of changes in communities follow the power-law, and the changes are symmetric between split and merge, and between growth and shrinkage. Key words Web, link analysis, web community, evolution. URL [4] WWW

2 [2], [4], [5], [7] [2] () (2) (3) [3] [3] Fig A part of our web community chart Yahoo! /../Maker/Electric PC vendor links Computer vendors IBM TOSHIBA SONY 2 Authority hub Fig. 2 Typical graph structure of hubs and authorities [3] [6], [3] () (2) authority hub (3) authority authority hub hub authority hub authority [6], [3] 2 authority hub IBM TOSHIBA SONY authority hub 2 authority hub IBM TOSHIBA SONY IN IN

3 N N a b [3] 2 () 3. t,t 2,..., t n: W (t k ): t k C(t k ): W (t k ) c(t k ),d(t k ),e(t k ),...: C(t k ) 3. (W (t ),W(t 2),..., W (t n)) () (C(t ),C(t 2),..., C(t n)) (2) t k C(t k ) 2 C(t k ) C(t k ) t k t k : c(t k ) C(t k ) URL c(t k ) C(t k ) c(t k ) URL W (t k ) : c(t k ) C(t k ) c(t k ) c(t k ) URL W (t k ) : C(t k ) c(t k ) C(t k ) c(t k ) URL 2 URL URL URL URL : c(t k ) C(t k ) URL : c(t k ) C(t k ) URL 3. 2 c(t k ) c(t k ) c(t k ) c(t k ) C(t k ) t k c(t k ) C(t k ) URL c(t k ) c(t k ) URL URL c(t k ) t k c (t k ) c (t k ) c(t k ) (c(t k ), c (t k )) c(t k ) c(t k ) N(c(t k )): N sh (c(t k ),c(t k )): N dis (c(t k )): N sp(c(t k ),c(t k )): N ap(c(t k ): N mg(c(t t ),c(t k )): c(t k ) c(t k ) c(t k ) c(t k ) c(t k ) c(t k ) c(t k ) c(t k ) c(t k ) URL : URL c(t k ) N(c(t k )) 0

4 R grow(c(t k ),c(t k )) = N(c(t k)) N(c(t k )) t k t k : URL URL URL URL : R stability (c(t k ),c(t k )) = N(c(t k )) + N(c(t k )) 2N sh (c(t k ),c(t k )) t k t k URL R novelty (c(t k ),c(t k )) = Nap(c(t k)) t k t k : URL R disappear (c(t k ),c(t k )) = N dis(c(t k )) t k t k : URL R split (c(t k ),c(t k )) = Nsp(c(t k ),c(t k )) t k t k : URL R merge(c(t k ),c(t k )) = Nmg(c(t k ),c(t k )) t k t k : : : Year Period Crawled pages Total URLs Links 999 Jul. to Aug. 7M 34M 20M 2000 Jun. to Aug. 7M 32M 2M 200 Early Oct. 40M 76M 33M 2002 Early Feb. 45M 84M 375M Table Details of our web archives (c(t i), c(t i+),..., c(t j)) j k=i R novelty (c(t i),c(t j)) = Nap(c(t k)) t j t i jp URL URL URL com edu URL URL URL URL connectivity server [] Sun Enterprise Server URL URL ( URL ) 3 URL URL URL URL URL 3 URL URL

5 8.E+07 7.E+07 6.E+07 5.E+07 4.E+07 3.E+07 2.E+07 Transition of URLs in web graphs.e+07 0.E+00 Aug-99 Oct-99 Dec-99 Feb-00 Apr-00 Jun-00 Aug-00 Oct-00 Dec-00 Feb-0 Apr-0 Jun-0 Aug-0 Oct-0 Dec-0 Feb-02 Number of URLs Year Period Seeds Communities 999 Jul. to Aug. 657K 79K 2000 Jun. to Aug. 737K 88K 200 Early Oct. 404K 56K 2002 Early Feb. 5K 70K 2 URL Table 2 The number of seeds and communities Time 3 Fig. 3 URL Transition of URLs in web graphs URL % URL % URL URL 5 URL URL () [3], [0], [2] i URL i /i k [3], [0], [2] k 2. k URL IN 3 3 URL IN N N 0 0 ( [3] ) 2 URL 5. 4 Fig. 4 An emerged community around an Islam information community URL [5] URL URL URL (

6 e+06 Size distribution of communities Aug 999 Aug 2000 Oct 200 Feb 2002 Number of communities Size of community 6 Fig. 6 Size distribution of communities Transition of communities 5 Fig. 5 Communities of biotechnology companies Aug-99 Emerged Dissolved Split or merged Single Oct-99 Dec-99 Feb-00 Apr-00 Jun-00 Aug-00 Oct-00 Dec-00 Feb-0 Apr-0 Jun-0 Aug-0 Oct-0 Dec-0 Feb-02 ) URL hub hub hub hub hub 6. Time 7 Fig. 7 Transition of communities t k :Aug.99 Aug.00 Oct.0 t k :Aug.00 Oct.0 Feb.02 t k 28,467 32,490 4,50 8,060 22,299 44,425 t k 29,722 4,752 44,305 3 Table 3 Number of main lines in split or merged communities 6. ( URL ) t k t k

7 Distribution of split rate Aug Aug 2000 Aug Oct 200 Oct Feb 2002 Size distribution of emerged communities Aug Aug 2000 Aug Oct 200 Oct Feb Nsp Size of emerged community Fig. 8 8 Distribution of split rate 0 Fig. 0 Size distribution of emerged communities Distribution of merge rate Aug Aug 2000 Aug Oct 200 Oct Feb 2002 Size distribution of dissolved communities Aug 999--Aug 2000 Aug Oct 200 Oct 200--Feb Nmg Size of dissolved community Fig. 9 9 Distribution of merge rate Fig. Size distribution of dissolved communities 7 25% % ( 3. 2 ) t k 40% % 8 9 URL ( 3. 2 N sp N mg)

8 Number of mainlines Growth rate distribution Fig. 2 2 Growth rate Aug Aug 2000 Aug Oct 200 Oct Feb 2002 Distribution of growth rate t k t k 0 [] K. Bharat, A. Broder, M. Henzinger, P. Kumar, and S. Venkatasubramanian. The Connectivity Server: fast access to linkage information on the Web. In Proceedings of the 7th International World Wide Web Conference, 998. [2] K. Bharat and G. A. Mihaila. When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics. In Proceedings of the 0th World-Wide Web Conference, 200. [3] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph Structure in the Web. In Proceedings of the 9th World- Wide Web Conference, [4] S. Chakrabarti. Integrating the Document Object Model with Hyperlinks for Enhanced Topic Distillation. In Proceedings of the 0th World-Wide Web Conference, 200. [5] S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson, and J. Kleinberg. Automatic resource compilation by analyzing hyperlink structure and associated text. In Proceedings of the 7th International World Wide Web Conference, 998. [6] J. Dean and M. R. Henzinger. Finding related pages in the World Wide Web. In Proceedings of the 8th World-Wide Web Conference, 999. [7] G. W. Flake, S. Lawrence, and C. L. Giles. Efficient Identification of Web Communities. In Proceedings of KDD 2000, [8] D. Gibson, J. Kleinberg, and P. Raghavan. Inferring Web Communities from Link Topology. In Proceedings of Hyper- Text98, 998. [9] J. M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 998. [0] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large-scale knowledge bases from the web. In Proceedings of the 25th VLDB Conference, 999. [] R. Lempel and S. Moran. The Stochastic Approach for Link-Structure Analysis (SALSA) and the TKC Effect. In Proceedings of the 9th World-Wide Web Conference, [2] S. R. Ravi Kumar, Prabhakar Raghavan and A. Tomkins. Trawling the Web for emerging cyber-communities. In Proceedings of the 8th World-Wide Web Conference, 999. [3] M. Toyoda and M. Kitsuregawa. Creating a Web Community Chart for Navigating Related Communities. In Conference Proceedings of Hypertext 200, pp. 03 2, 200. [4] Wayback Machine, The Internet Archive. archive. org/. [5],..,

Analyzing Evolution of Web Communities using a Series of Japanese Web Archives

Analyzing Evolution of Web Communities using a Series of Japanese Web Archives DEWS2003 2-P-05 53 8505 4 6 E-mail: {toyoda,kitsure}@tkl.iis.u-tokyo.ac.jp 999 2002 4 Analyzing Evolution of Web Communities using a Series of Japanese Web Archives Masashi TOYODA and Masaru KITSUREGAWA

More information

An Analysis on Link Structure Evolution Pattern of Web Communities

An Analysis on Link Structure Evolution Pattern of Web Communities DEWS2005 5-C-01 153-8505 4-6-1 E-mail: {imafuji,kitsure}@tkl.iis.u-tokyo.ac.jp 1999 2002 HITS An Analysis on Link Structure Evolution Pattern of Web Communities Noriko IMAFUJI and Masaru KITSUREGAWA Institute

More information

GAMINGRE 8/1/ of 7

GAMINGRE 8/1/ of 7 FYE 09/30/92 JULY 92 0.00 254,550.00 0.00 0 0 0 0 0 0 0 0 0 254,550.00 0.00 0.00 0.00 0.00 254,550.00 AUG 10,616,710.31 5,299.95 845,656.83 84,565.68 61,084.86 23,480.82 339,734.73 135,893.89 67,946.95

More information

Hyperlinked-Induced Topic Search (HITS) identifies. authorities as good content sources (~high indegree) HITS [Kleinberg 99] considers a web page

Hyperlinked-Induced Topic Search (HITS) identifies. authorities as good content sources (~high indegree) HITS [Kleinberg 99] considers a web page IV.3 HITS Hyperlinked-Induced Topic Search (HITS) identifies authorities as good content sources (~high indegree) hubs as good link sources (~high outdegree) HITS [Kleinberg 99] considers a web page a

More information

Identification of Bursts in a Document Stream

Identification of Bursts in a Document Stream Identification of Bursts in a Document Stream Toshiaki FUJIKI 1, Tomoyuki NANNO 1, Yasuhiro SUZUKI 1 and Manabu OKUMURA 2 1 Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute

More information

Link Analysis: Hubs and Authorities on the World Wide Web

Link Analysis: Hubs and Authorities on the World Wide Web Link Analysis: Hubs and Authorities on the World Wide Web Chris H.Q. Ding, Hongyuan Zha, Xiaofeng He, Parry Husbands, Horst D. Simon LBNL Tech Report 47847. May 7, 2001 (updated July 2003). Abstract Ranking

More information

A stochastic model for the link analysis of the Web

A stochastic model for the link analysis of the Web A stochastic model for the link analysis of the Web Paola Favati, IIT-CNR, Pisa. Grazia Lotti, University of Parma. Ornella Menchi and Francesco Romani, University of Pisa Three examples of link distribution

More information

arxiv: v1 [physics.soc-ph] 15 Dec 2009

arxiv: v1 [physics.soc-ph] 15 Dec 2009 Power laws of the in-degree and out-degree distributions of complex networks arxiv:0912.2793v1 [physics.soc-ph] 15 Dec 2009 Shinji Tanimoto Department of Mathematics, Kochi Joshi University, Kochi 780-8515,

More information

Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search

Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Xutao Li 1 Michael Ng 2 Yunming Ye 1 1 Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology,

More information

Modeling of Growing Networks with Directional Attachment and Communities

Modeling of Growing Networks with Directional Attachment and Communities Modeling of Growing Networks with Directional Attachment and Communities Masahiro KIMURA, Kazumi SAITO, Naonori UEDA NTT Communication Science Laboratories 2-4 Hikaridai, Seika-cho, Kyoto 619-0237, Japan

More information

Models and Algorithms for Complex Networks. Link Analysis Ranking

Models and Algorithms for Complex Networks. Link Analysis Ranking Models and Algorithms for Complex Networks Link Analysis Ranking Why Link Analysis? First generation search engines view documents as flat text files could not cope with size, spamming, user needs Second

More information

The Dynamic Absorbing Model for the Web

The Dynamic Absorbing Model for the Web The Dynamic Absorbing Model for the Web Gianni Amati, Iadh Ounis, Vassilis Plachouras Department of Computing Science University of Glasgow Glasgow G12 8QQ, U.K. Abstract In this paper we propose a new

More information

Link Analysis. Leonid E. Zhukov

Link Analysis. Leonid E. Zhukov Link Analysis Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural Analysis and Visualization

More information

The Diameter of Random Massive Graphs

The Diameter of Random Massive Graphs The Diameter of Random Massive Graphs Linyuan Lu October 30, 2000 Abstract Many massive graphs (such as the WWW graph and Call graphs) share certain universal characteristics which can be described by

More information

Data Mining and Matrices

Data Mining and Matrices Data Mining and Matrices 10 Graphs II Rainer Gemulla, Pauli Miettinen Jul 4, 2013 Link analysis The web as a directed graph Set of web pages with associated textual content Hyperlinks between webpages

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 19: Size Estimation & Duplicate Detection Hinrich Schütze Institute for Natural Language Processing, Universität Stuttgart 2008.07.08

More information

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Reputation Systems I HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Yury Lifshits Wiki Definition Reputation is the opinion (more technically, a social evaluation) of the public toward a person, a

More information

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors Michael K. Ng Centre for Mathematical Imaging and Vision and Department of Mathematics

More information

preferential attachment

preferential attachment preferential attachment introduction to network analysis (ina) Lovro Šubelj University of Ljubljana spring 2018/19 preferential attachment graph models are ensembles of random graphs generative models

More information

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa Introduction to Search Engine Technology Introduction to Link Structure Analysis Ronny Lempel Yahoo Labs, Haifa Outline Anchor-text indexing Mathematical Background Motivation for link structure analysis

More information

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze Link Analysis Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 The Web as a Directed Graph Page A Anchor hyperlink Page B Assumption 1: A hyperlink between pages

More information

Why does attention to web articles fall with time?

Why does attention to web articles fall with time? Why does attention to web articles fall with time? M.V. Simkin and V.P. Roychowdhury Department of Electrical Engineering, University of California, Los Angeles, CA 90095-594 We analyze access statistics

More information

How does Google rank webpages?

How does Google rank webpages? Linear Algebra Spring 016 How does Google rank webpages? Dept. of Internet and Multimedia Eng. Konkuk University leehw@konkuk.ac.kr 1 Background on search engines Outline HITS algorithm (Jon Kleinberg)

More information

Capturing the Semantics of Web Log Data by Navigation Matrices

Capturing the Semantics of Web Log Data by Navigation Matrices Capturing the emantics of Web Log Data by Navigation Matrices Wilfred Ng Email:wilfred@cs.ust.hk Department of Computer cience, The Hong Kong University of cience and Technology Abstract: Key words: The

More information

Deterministic Decentralized Search in Random Graphs

Deterministic Decentralized Search in Random Graphs Deterministic Decentralized Search in Random Graphs Esteban Arcaute 1,, Ning Chen 2,, Ravi Kumar 3, David Liben-Nowell 4,, Mohammad Mahdian 3, Hamid Nazerzadeh 1,, and Ying Xu 1, 1 Stanford University.

More information

Perturbation of the hyper-linked environment

Perturbation of the hyper-linked environment Perturbation of the hyper-linked environment Hyun Chul Lee and Allan Borodin Department of Computer Science University of Toronto Toronto, Ontario, M5S3G4 {leehyun,bor}@cs.toronto.edu Abstract. After the

More information

Extrapolation Methods for Accelerating PageRank Computations

Extrapolation Methods for Accelerating PageRank Computations Extrapolation Methods for Accelerating PageRank Computations Sepandar D. Kamvar Stanford University sdkamvar@stanford.edu Taher H. Haveliwala Stanford University taherh@cs.stanford.edu Christopher D. Manning

More information

T , Lecture 6 Properties and stochastic models of real-world networks

T , Lecture 6 Properties and stochastic models of real-world networks T-79.7003, Lecture 6 Properties and stochastic models of real-world networks Charalampos E. Tsourakakis 1 1 Aalto University November 1st, 2013 Properties of real-world networks Properties of real-world

More information

Annual Average NYMEX Strip Comparison 7/03/2017

Annual Average NYMEX Strip Comparison 7/03/2017 Annual Average NYMEX Strip Comparison 7/03/2017 To Year to Year Oil Price Deck ($/bbl) change Year change 7/3/2017 6/1/2017 5/1/2017 4/3/2017 3/1/2017 2/1/2017-2.7% 2017 Average -10.4% 47.52 48.84 49.58

More information

1 Searching the World Wide Web

1 Searching the World Wide Web Hubs and Authorities in a Hyperlinked Environment 1 Searching the World Wide Web Because diverse users each modify the link structure of the WWW within a relatively small scope by creating web-pages on

More information

The Second Eigenvalue of the Google Matrix

The Second Eigenvalue of the Google Matrix The Second Eigenvalue of the Google Matrix Taher H. Haveliwala and Sepandar D. Kamvar Stanford University {taherh,sdkamvar}@cs.stanford.edu Abstract. We determine analytically the modulus of the second

More information

Multiple Relational Ranking in Tensor: Theory, Algorithms and Applications

Multiple Relational Ranking in Tensor: Theory, Algorithms and Applications Multiple Relational Ranking in Tensor: Theory, Algorithms and Applications Michael K. Ng Centre for Mathematical Imaging and Vision and Department of Mathematics Hong Kong Baptist University Email: mng@math.hkbu.edu.hk

More information

Link Analysis: Hubs and Authorities on the World Wide Web

Link Analysis: Hubs and Authorities on the World Wide Web SIAM REVIEW Vol. 46, No. 2, pp. 000 000 c 2004 Society for Industrial and Applied Mathematics Link Analysis: Hubs and Authorities on the World Wide Web Chris H. Q. Ding Hongyuan Zha Xiaofeng He Parry Husbands

More information

Spectral Analysis of Directed Complex Networks. Tetsuro Murai

Spectral Analysis of Directed Complex Networks. Tetsuro Murai MASTER THESIS Spectral Analysis of Directed Complex Networks Tetsuro Murai Department of Physics, Graduate School of Science and Engineering, Aoyama Gakuin University Supervisors: Naomichi Hatano and Kenn

More information

Node and Link Analysis

Node and Link Analysis Node and Link Analysis Leonid E. Zhukov School of Applied Mathematics and Information Science National Research University Higher School of Economics 10.02.2014 Leonid E. Zhukov (HSE) Lecture 5 10.02.2014

More information

Atmospheric circulation analysis for seasonal forecasting

Atmospheric circulation analysis for seasonal forecasting Training Seminar on Application of Seasonal Forecast GPV Data to Seasonal Forecast Products 18 21 January 2011 Tokyo, Japan Atmospheric circulation analysis for seasonal forecasting Shotaro Tanaka Climate

More information

A Singular Perturbation Approach for Choosing the PageRank Damping Factor

A Singular Perturbation Approach for Choosing the PageRank Damping Factor imvol5 2009/7/13 17:22 page 45 #1 Internet Mathematics Vol. 5, No. 1 2: 45 67 A Singular Perturbation Approach for Choosing the PageRank Damping Factor Konstantin Avrachenkov, Nelly Litvak, and Kim Son

More information

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci Link Analysis Information Retrieval and Data Mining Prof. Matteo Matteucci Hyperlinks for Indexing and Ranking 2 Page A Hyperlink Page B Intuitions The anchor text might describe the target page B Anchor

More information

Average 175, , , , , , ,046 YTD Total 1,098,649 1,509,593 1,868,795 1,418, ,169 1,977,225 2,065,321

Average 175, , , , , , ,046 YTD Total 1,098,649 1,509,593 1,868,795 1,418, ,169 1,977,225 2,065,321 AGRICULTURE 01-Agriculture JUL 2,944-4,465 1,783-146 102 AUG 2,753 6,497 5,321 1,233 1,678 744 1,469 SEP - 4,274 4,183 1,596 - - 238 OCT 2,694 - - 1,032 340-276 NOV 1,979-5,822 637 3,221 1,923 1,532 DEC

More information

Average 175, , , , , , ,940 YTD Total 944,460 1,284,944 1,635,177 1,183, ,954 1,744,134 1,565,640

Average 175, , , , , , ,940 YTD Total 944,460 1,284,944 1,635,177 1,183, ,954 1,744,134 1,565,640 AGRICULTURE 01-Agriculture JUL 2,944-4,465 1,783-146 102 AUG 2,753 6,497 5,321 1,233 1,678 744 1,469 SEP - 4,274 4,183 1,596 - - 238 OCT 2,694 - - 1,032 340-276 NOV 1,979-5,822 637 3,221 1,923 1,532 DEC

More information

Web Search Via Hub Synthesis

Web Search Via Hub Synthesis Web Search Via Hub Synthesis Dimitris Achlioptas Λ Amos Fiat y Anna R. Karlin z Frank McSherry x Small have continual plodders ever won save base authority from others books. William Shakespeare, Loves

More information

Inf 2B: Ranking Queries on the WWW

Inf 2B: Ranking Queries on the WWW Inf B: Ranking Queries on the WWW Kyriakos Kalorkoti School of Informatics University of Edinburgh Queries Suppose we have an Inverted Index for a set of webpages. Disclaimer Not really the scenario of

More information

Analysis (SALSA) and the TKC Eect. R. Lempel S. Moran. Department of Computer Science. The Technion, Haifa 32000, Israel

Analysis (SALSA) and the TKC Eect. R. Lempel S. Moran. Department of Computer Science. The Technion, Haifa 32000, Israel The Stochastic Approach for Link-Structure Analysis (SALSA) and the TKC Eect R. Lempel S. Moran Department of Computer Science The Technion, Haifa 32000, Israel email: frlempel,morang@cs.technion.ac.il

More information

What is this Page Known for? Computing Web Page Reputations. Outline

What is this Page Known for? Computing Web Page Reputations. Outline What is this Page Known for? Computing Web Page Reputations Davood Rafiei University of Alberta http://www.cs.ualberta.ca/~drafiei Joint work with Alberto Mendelzon (U. of Toronto) 1 Outline Scenarios

More information

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS LINK ANALYSIS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Retrieval evaluation Link analysis Models

More information

Must-read material (1 of 2) : Multimedia Databases and Data Mining. Must-read material (2 of 2) Main outline

Must-read material (1 of 2) : Multimedia Databases and Data Mining. Must-read material (2 of 2) Main outline Must-read material (1 of 2) 15-826: Multimedia Databases and Data Mining Lecture #29: Graph mining - Generators & tools Fully Automatic Cross-Associations, by D. Chakrabarti, S. Papadimitriou, D. Modha

More information

Distribution of PageRank Mass Among Principle Components of the Web

Distribution of PageRank Mass Among Principle Components of the Web Distribution of PageRank Mass Among Principle Components of the Web Konstantin Avrachenkov Nelly Litvak Kim Son Pham arxiv:0709.2016v1 [cs.ni] 13 Sep 2007 September 2007 Abstract We study the PageRank

More information

Mr. XYZ. Stock Market Trading and Investment Astrology Report. Report Duration: 12 months. Type: Both Stocks and Option. Date: Apr 12, 2011

Mr. XYZ. Stock Market Trading and Investment Astrology Report. Report Duration: 12 months. Type: Both Stocks and Option. Date: Apr 12, 2011 Mr. XYZ Stock Market Trading and Investment Astrology Report Report Duration: 12 months Type: Both Stocks and Option Date: Apr 12, 2011 KT Astrologer Website: http://www.softwareandfinance.com/magazine/astrology/kt_astrologer.php

More information

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds MAE 298, Lecture 8 Feb 4, 2008 Web search and decentralized search on small-worlds Search for information Assume some resource of interest is stored at the vertices of a network: Web pages Files in a file-sharing

More information

Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication

Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication Jurij Leskovec 1, Deepayan Chakrabarti 1, Jon Kleinberg 2, and Christos Faloutsos 1 1 School of Computer

More information

CS224W: Social and Information Network Analysis

CS224W: Social and Information Network Analysis CS224W: Social and Information Network Analysis Reaction Paper Adithya Rao, Gautam Kumar Parai, Sandeep Sripada Keywords: Self-similar networks, fractality, scale invariance, modularity, Kronecker graphs.

More information

BRADSHAW'S RAILWAY GUIDE : accessible copies

BRADSHAW'S RAILWAY GUIDE : accessible copies BRADSHAW'S RAILWAY GUIDE : accessible copies Y = copy held; YS = copy held with supplement; R = reprint held; I = incomplete copy held; F = fragile copy (not available for general public - access limited);

More information

Information Retrieval and Search. Web Linkage Mining. Miłosz Kadziński

Information Retrieval and Search. Web Linkage Mining. Miłosz Kadziński Web Linkage Analysis D24 D4 : Web Linkage Mining Miłosz Kadziński Institute of Computing Science Poznan University of Technology, Poland www.cs.put.poznan.pl/mkadzinski/wpi Web mining: Web Mining Discovery

More information

Applications to network analysis: Eigenvector centrality indices Lecture notes

Applications to network analysis: Eigenvector centrality indices Lecture notes Applications to network analysis: Eigenvector centrality indices Lecture notes Dario Fasino, University of Udine (Italy) Lecture notes for the second part of the course Nonnegative and spectral matrix

More information

Computing & Telecommunications Services

Computing & Telecommunications Services Computing & Telecommunications Services Monthly Report September 214 CaTS Help Desk (937) 775-4827 1-888-775-4827 25 Library Annex helpdesk@wright.edu www.wright.edu/cats/ Table of Contents HEAT Ticket

More information

On the NP-Completeness of Some Graph Cluster Measures

On the NP-Completeness of Some Graph Cluster Measures arxiv:cs/050600v [cs.cc] 9 Jun 005 On the NP-Completeness of Some Graph Cluster Measures Jiří Šíma Institute of Computer Science, Academy of Sciences of the Czech Republic, P. O. Box 5, 807 Prague 8, Czech

More information

Computing & Telecommunications Services Monthly Report January CaTS Help Desk. Wright State University (937)

Computing & Telecommunications Services Monthly Report January CaTS Help Desk. Wright State University (937) January 215 Monthly Report Computing & Telecommunications Services Monthly Report January 215 CaTS Help Desk (937) 775-4827 1-888-775-4827 25 Library Annex helpdesk@wright.edu www.wright.edu/cats/ Last

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize/navigate it? First try: Human curated Web directories Yahoo, DMOZ, LookSmart

More information

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,

More information

Link Analysis Ranking

Link Analysis Ranking Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query

More information

In this activity, students will compare weather data from to determine if there is a warming trend in their community.

In this activity, students will compare weather data from to determine if there is a warming trend in their community. Overview: In this activity, students will compare weather data from 1910-2000 to determine if there is a warming trend in their community. Objectives: The student will: use the Internet to locate scientific

More information

CS54701 Information Retrieval. Link Analysis. Luo Si. Department of Computer Science Purdue University. Borrowed Slides from Prof.

CS54701 Information Retrieval. Link Analysis. Luo Si. Department of Computer Science Purdue University. Borrowed Slides from Prof. CS54701 Information Retrieval Link Analysis Luo Si Department of Computer Science Purdue University Borrowed Slides from Prof. Rong Jin (MSU) Citation Analysis Web Structure Web is a graph Each web site

More information

The Bursty Structure of On-Line Information Streams Jon Kleinberg Cornell University

The Bursty Structure of On-Line Information Streams Jon Kleinberg Cornell University The Bursty Structure of On-Line Information Streams Jon Kleinberg Cornell University Includes joint work w/ Jon Aizen, Dan Huttenlocher, Tony Novak Internet Archive and Cornell University Temporal Dynamics

More information

Technical note on seasonal adjustment for M0

Technical note on seasonal adjustment for M0 Technical note on seasonal adjustment for M0 July 1, 2013 Contents 1 M0 2 2 Steps in the seasonal adjustment procedure 3 2.1 Pre-adjustment analysis............................... 3 2.2 Seasonal adjustment.................................

More information

arxiv:cond-mat/ v2 [cond-mat.stat-mech] 13 Jan 1999

arxiv:cond-mat/ v2 [cond-mat.stat-mech] 13 Jan 1999 arxiv:cond-mat/9901071v2 [cond-mat.stat-mech] 13 Jan 1999 Evolutionary Dynamics of the World Wide Web Bernardo A. Huberman and Lada A. Adamic Xerox Palo Alto Research Center Palo Alto, CA 94304 February

More information

Improve Forecasts: Use Defect Signals

Improve Forecasts: Use Defect Signals Improve Forecasts: Use Defect Signals Paul Below paul.below@qsm.com Quantitative Software Management, Inc. Introduction Large development and integration project testing phases can extend over many months

More information

arxiv:physics/ v1 [physics.soc-ph] 14 Dec 2006

arxiv:physics/ v1 [physics.soc-ph] 14 Dec 2006 Europhysics Letters PREPRINT Growing network with j-redirection arxiv:physics/0612148v1 [physics.soc-ph] 14 Dec 2006 R. Lambiotte 1 and M. Ausloos 1 1 SUPRATECS, Université de Liège, B5 Sart-Tilman, B-4000

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 6: Numerical Linear Algebra: Applications in Machine Learning Cho-Jui Hsieh UC Davis April 27, 2017 Principal Component Analysis Principal

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

arxiv:physics/ v2 [physics.soc-ph] 9 Feb 2007

arxiv:physics/ v2 [physics.soc-ph] 9 Feb 2007 Europhysics Letters PREPRINT Growing network with j-redirection arxiv:physics/0612148v2 [physics.soc-ph] 9 Feb 2007 R. Lambiotte 1 and M. Ausloos 1 1 GRAPES, Université de Liège, B5 Sart-Tilman, B-4000

More information

IR: Information Retrieval

IR: Information Retrieval / 44 IR: Information Retrieval FIB, Master in Innovation and Research in Informatics Slides by Marta Arias, José Luis Balcázar, Ramon Ferrer-i-Cancho, Ricard Gavaldá Department of Computer Science, UPC

More information

Deterministic scale-free networks

Deterministic scale-free networks Physica A 299 (2001) 559 564 www.elsevier.com/locate/physa Deterministic scale-free networks Albert-Laszlo Barabasi a;, Erzsebet Ravasz a, Tamas Vicsek b a Department of Physics, College of Science, University

More information

Higher-Order Web Link Analysis Using Multilinear Algebra

Higher-Order Web Link Analysis Using Multilinear Algebra Higher-Order Web Link Analysis Using Multilinear Algebra Tamara G. Kolda, Brett W. Bader, and Joseph P. Kenny Sandia National Laboratories Livermore, CA and Albuquerque, NM {tgkolda,bwbader,jpkenny}@sandia.gov

More information

Directed Scale-Free Graphs

Directed Scale-Free Graphs Directed Scale-Free Graphs Béla Bollobás Christian Borgs Jennifer Chayes Oliver Riordan Abstract We introduce a model for directed scale-free graphs that grow with preferential attachment depending in

More information

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson) Link Analysis Web Ranking Documents on the web are first ranked according to their relevance vrs the query Additional ranking methods are needed to cope with huge amount of information Additional ranking

More information

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 4 May 2000

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 4 May 2000 Topology of evolving networks: local events and universality arxiv:cond-mat/0005085v1 [cond-mat.dis-nn] 4 May 2000 Réka Albert and Albert-László Barabási Department of Physics, University of Notre-Dame,

More information

CS47300: Web Information Search and Management

CS47300: Web Information Search and Management CS473: Web Information Search and Management Using Graph Structure for Retrieval Prof. Chris Clifton 24 September 218 Material adapted from slides created by Dr. Rong Jin (formerly Michigan State, now

More information

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank

More information

Lecture 12: Link Analysis for Web Retrieval

Lecture 12: Link Analysis for Web Retrieval Lecture 12: Link Analysis for Web Retrieval Trevor Cohn COMP90042, 2015, Semester 1 What we ll learn in this lecture The web as a graph Page-rank method for deriving the importance of pages Hubs and authorities

More information

[11] [3] [1], [12] HITS [4] [2] k-means[9] 2 [11] [3] [6], [13] 1 ( ) k n O(n k ) 1 2

[11] [3] [1], [12] HITS [4] [2] k-means[9] 2 [11] [3] [6], [13] 1 ( ) k n O(n k ) 1 2 情報処理学会研究報告 1,a) 1 1 2 IT Clustering by Clique Enumeration and Data Cleaning like Method Abstract: Recent development on information technology has made bigdata analysis more familiar in research and industrial

More information

Lecture 7 Mathematics behind Internet Search

Lecture 7 Mathematics behind Internet Search CCST907 Hidden Order in Daily Life: A Mathematical Perspective Lecture 7 Mathematics behind Internet Search Dr. S. P. Yung (907A) Dr. Z. Hua (907B) Department of Mathematics, HKU Outline Google is the

More information

SCIENCE CURRICULUM MAPPING

SCIENCE CURRICULUM MAPPING SCIENCE MAPPING UNIT: E Dates: From Sept. 5,2008 Grade: Third To Oct. 2, 2008 From: 9/5/08 To: 9/22/08 Properties of Matter What are Physical Properties of Matter? What are solids, liquids, and gases?

More information

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank

More information

In Centre, Online Classroom Live and Online Classroom Programme Prices

In Centre, Online Classroom Live and Online Classroom Programme Prices In Centre, and Online Classroom Programme Prices In Centre Online Classroom Foundation Certificate Bookkeeping Transactions 430 325 300 Bookkeeping Controls 320 245 225 Elements of Costing 320 245 225

More information

Models of Communication Dynamics for Simulation of Information Diffusion

Models of Communication Dynamics for Simulation of Information Diffusion Models of Communication Dynamics for Simulation of Information Diffusion Konstantin Mertsalov, Malik Magdon-Ismail, Mark Goldberg Rensselaer Polytechnic Institute Department of Computer Science 11 8th

More information

FEB DASHBOARD FEB JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC

FEB DASHBOARD FEB JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC Positive Response Compliance 215 Compliant 215 Non-Compliant 216 Compliant 216 Non-Compliant 1% 87% 96% 86% 96% 88% 89% 89% 88% 86% 92% 93% 94% 96% 94% 8% 6% 4% 2% 13% 4% 14% 4% 12% 11% 11% 12% JAN MAR

More information

Quasi-random graphs with given degree sequences

Quasi-random graphs with given degree sequences Quasi-random graphs with given degree sequences Fan Chung and Ron Graham y University of California, San Diego La Jolla, CA 9093 Abstract It is now known that many properties of the objects in certain

More information

An Algorithm for Densest Subgraphs of Vertexweighted

An Algorithm for Densest Subgraphs of Vertexweighted An Algorithm for Densest Subgraphs of Vertexweighted Graphs School of Computer Science and Educational Software, Guangzhou University Guangzhou, 50006,China E-mail: 40947@qq.com Wenbin Chen Department

More information

Using a Layered Markov Model for Distributed Web Ranking Computation

Using a Layered Markov Model for Distributed Web Ranking Computation Using a Layered Markov Model for Distributed Web Ranking omputation Jie Wu School of omp. and omm. Sciences Ecole Polytechnique Fédérale de Lausanne 1015 Lausanne, Switzerland jie.wu@epfl.ch Karl Aberer

More information

WHEN IS IT EVER GOING TO RAIN? Table of Average Annual Rainfall and Rainfall For Selected Arizona Cities

WHEN IS IT EVER GOING TO RAIN? Table of Average Annual Rainfall and Rainfall For Selected Arizona Cities WHEN IS IT EVER GOING TO RAIN? Table of Average Annual Rainfall and 2001-2002 Rainfall For Selected Arizona Cities Phoenix Tucson Flagstaff Avg. 2001-2002 Avg. 2001-2002 Avg. 2001-2002 October 0.7 0.0

More information

Math 304 Handout: Linear algebra, graphs, and networks.

Math 304 Handout: Linear algebra, graphs, and networks. Math 30 Handout: Linear algebra, graphs, and networks. December, 006. GRAPHS AND ADJACENCY MATRICES. Definition. A graph is a collection of vertices connected by edges. A directed graph is a graph all

More information

Smoothed Prediction of the Onset of Tree Stem Radius Increase Based on Temperature Patterns

Smoothed Prediction of the Onset of Tree Stem Radius Increase Based on Temperature Patterns Smoothed Prediction of the Onset of Tree Stem Radius Increase Based on Temperature Patterns Mikko Korpela 1 Harri Mäkinen 2 Mika Sulkava 1 Pekka Nöjd 2 Jaakko Hollmén 1 1 Helsinki

More information

2016 Year-End Benchmark Oil and Gas Prices (Average of Previous 12 months First-Day-of-the Month [FDOM] Prices)

2016 Year-End Benchmark Oil and Gas Prices (Average of Previous 12 months First-Day-of-the Month [FDOM] Prices) Oil and Gas Benchmark Prices to Estimate Year-End Petroleum Reserves and Values Using U.S. Securities and Exchange Commission Guidelines from the Modernization of Oil and Gas Reporting Effective January

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

Computation of Large Sparse Aggregated Areas for Analytic Database Queries

Computation of Large Sparse Aggregated Areas for Analytic Database Queries Computation of Large Sparse Aggregated Areas for Analytic Database Queries Steffen Wittmer Tobias Lauer Jedox AG Collaborators: Zurab Khadikov Alexander Haberstroh Peter Strohm Business Intelligence and

More information

Finding patterns in large, real networks

Finding patterns in large, real networks Thanks to Finding patterns in large, real networks Christos Faloutsos CMU www.cs.cmu.edu/~christos/talks/ucla-05 Deepayan Chakrabarti (CMU) Michalis Faloutsos (UCR) George Siganos (UCR) UCLA-IPAM 05 (c)

More information

Multi Time Scale Wind Energy Forecasting Model based on Meteorological Simulation and Onsite Measurement

Multi Time Scale Wind Energy Forecasting Model based on Meteorological Simulation and Onsite Measurement Multi Time Scale Wind Energy Forecasting Model based on Meteorological Simulation and Onsite Measurement Kota ENOKI, Takeshi ISHIHARA, Atsushi YAMAGUCHI, Yukinari FUKUMOTO, The University of Tokyo Tokyo

More information

Sluggish Economy Puts Pinch on Manufacturing Technology Orders

Sluggish Economy Puts Pinch on Manufacturing Technology Orders Updated Release: June 13, 2016 Contact: Penny Brown, AMT, 703-827-5275 pbrown@amtonline.org Sluggish Economy Puts Pinch on Manufacturing Technology Orders Manufacturing technology orders for were down

More information

A local algorithm for finding dense bipartite-like subgraphs

A local algorithm for finding dense bipartite-like subgraphs A local algorithm for finding dense bipartite-like subgraphs Pan Peng 1,2 1 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences 2 School of Information Science

More information

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information

More information