Complex Systems Methods 11. Power laws - an indicator of complexity?

Complex Systems Methods 11. Power laws - an indicator of complexity?
Eckehard Olbrich
e.olbrich@gmx.de
http://personal-homepages.mis.mpg.de/olbrich/complex systems.html
Potsdam WS 2007/08

Overview

1 Introduction
2 The detection of power laws
3 Mechanisms for generating power laws: simple mechanisms, preferential attachment (Yule process), criticality revisited
4 Power laws and complexity

Introduction: The ubiquity of power laws

Observed in the light from quasars, the intensity of sunspots, the flow of rivers such as the Nile, stock exchange price indices
Distributions of wealth, city sizes, word frequencies
Fluctuations in physiological systems (heart rate, ...)

Why are power laws interesting?
Heavy tails: higher probability of untypical events
Power law distributions are self-similar
They occur in fractals
They might be related to some optimality principle

Power laws for discrete and continuous variables

A quantity x obeys a power law if it is drawn from a probability distribution

    p(x) \propto x^{-\alpha}

If x is a continuous variable with x \geq x_{\min}, we have the probability density function (PDF)

    p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}

If x \in \mathbb{N}, x \geq x_{\min}, the probability of a particular value x is given by

    p(x) = \frac{x^{-\alpha}}{\zeta(\alpha, x_{\min})}

with the generalized zeta function \zeta(\alpha, x_{\min}) = \sum_{n=0}^{\infty} (n + x_{\min})^{-\alpha}.

Cumulative distribution function and rank/frequency plots

The best way to plot power laws is to plot the cumulative distribution function (CDF) P(x), the probability to observe a value equal to or larger than x: no binning is necessary, and the statistical fluctuations are lower.

In the continuous case

    P(x) = \int_x^{\infty} p(x') \, dx' = \left( \frac{x}{x_{\min}} \right)^{-\alpha + 1}

In the discrete case

    P(x) = \frac{\zeta(\alpha, x)}{\zeta(\alpha, x_{\min})}

Plotting the number of x's which are equal to or larger than x, with x being the frequency of occurrence, gives the rank/frequency plot, which corresponds to the (unnormalized) cumulative distribution function.

Plotting power laws

[Figure: the same power-law sample shown four ways on log-log axes - a normal histogram of the PDF, a histogram with logarithmic bins, the cumulative distribution, and a rank/frequency plot.]
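These four representations are straightforward to reproduce. Below is a minimal Python sketch (NumPy/Matplotlib; the variable names are illustrative, and the sample generator anticipates the transformation method derived later in the lecture):

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample from p(x) ~ x^(-1.5), x >= 1, via the inverse-transform
# method that is derived later in the lecture.
rng = np.random.default_rng(0)
alpha, n = 1.5, 100_000
x = (1.0 - rng.random(n)) ** (-1.0 / (alpha - 1.0))

fig, ax = plt.subplots(2, 2, figsize=(8, 6))

# (a) PDF estimate with linear bins: very noisy tail.
counts, edges = np.histogram(x, bins=50, density=True)
ax[0, 0].loglog(0.5 * (edges[:-1] + edges[1:]), counts, '.')
ax[0, 0].set_title('normal histogram')

# (b) PDF estimate with logarithmic bins (density=True divides by bin width).
log_edges = np.logspace(0.0, np.log10(x.max()), 30)
counts, _ = np.histogram(x, bins=log_edges, density=True)
ax[0, 1].loglog(np.sqrt(log_edges[:-1] * log_edges[1:]), counts, '.')
ax[0, 1].set_title('logarithmic bins')

# (c) CCDF P(X >= x): just sort the data, no binning at all.
xs = np.sort(x)
ax[1, 0].loglog(xs, 1.0 - np.arange(n) / n)
ax[1, 0].set_title('cumulative distribution')

# (d) Rank/frequency plot: the CCDF with the axes swapped.
ax[1, 1].loglog(np.arange(1, n + 1), xs[::-1])
ax[1, 1].set_title('rank/frequency')

plt.tight_layout()
plt.show()
```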

Zipf's law and its relatives

Zipf's law: in natural language the frequency of a word is inversely proportional to its rank in the frequency table.
George Kingsley Zipf (1902-1950), American linguist and philologist who studied statistical occurrences in different languages.

Vilfredo Federico Damaso Pareto (1848-1923) was an Italian sociologist, economist and philosopher, born in Paris. Pareto noticed that 80% of Italy's wealth was owned by 20% of the population.
Pareto distribution: the number of people with an income larger than x. The Pareto index is the exponent of the cumulative distribution of incomes.

Fitting the exponent

Usual way: making a histogram and fitting a straight line in a log-log plot.
Problem: the errors are neither Gaussian nor of equal variance.

Better: the maximum likelihood estimate of the slope, i.e. the \hat{\alpha} for which p(\alpha | x_1, \ldots, x_n) becomes maximal.

Probability density:

    p(x) = C x^{-\alpha} = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}

Probability to observe a sample of data \{x_1, \ldots, x_n\} given \alpha:

    p(x_1, \ldots, x_n | \alpha) = \prod_{i=1}^{n} p(x_i) = \prod_{i=1}^{n} \frac{\alpha - 1}{x_{\min}} \left( \frac{x_i}{x_{\min}} \right)^{-\alpha}

Fitting the exponent

Bayes' rule:

    p(\alpha | x_1, \ldots, x_n) = p(x_1, \ldots, x_n | \alpha) \, \frac{p(\alpha)}{p(x_1, \ldots, x_n)}

Assuming a uniform prior, this amounts to maximizing the log-likelihood

    L = \ln p(x_1, \ldots, x_n | \alpha) = n \ln(\alpha - 1) - n \ln x_{\min} - \alpha \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}}

Setting \partial L / \partial \alpha = 0, one finds

    \hat{\alpha} = 1 + n \left( \sum_i \ln \frac{x_i}{x_{\min}} \right)^{-1}

Comparison with the least squares fit in the log-log plot

For a sample generated with \alpha = 1.5, the maximum likelihood estimate using all data points is \hat{\alpha}_{MLE} = 1.5003.

[Figure: least squares fits of the same sample in the four representations. The raw histogram gives \alpha_{LS} = 1.31, the logarithmically binned histograms \alpha_{LS} \approx 1.52-1.54; the cumulative distribution gives a slope of -0.4985 (true value -\alpha + 1 = -0.5) and the rank/frequency plot a slope of -1.99 (true value -1/(\alpha - 1) = -2).]

Maximum likelihood estimator of α

    \hat{\alpha} = 1 + n \left( \sum_i \ln \frac{x_i}{x_{\min}} \right)^{-1}

with standard error

    \sigma_{\hat{\alpha}} = \frac{\hat{\alpha} - 1}{\sqrt{n}} + O(1/n)

More precise results in Clauset et al. (2007):

    \langle \hat{\alpha} \rangle = \alpha \frac{n}{n-1} - \frac{1}{n-1}, \qquad \sigma_{\hat{\alpha}} = \frac{(\alpha - 1) \, n}{(n-1)\sqrt{n-2}}
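As a sketch, the estimator and its leading-order standard error in Python (the function name is illustrative):

```python
import numpy as np

def fit_alpha(x, x_min):
    """MLE of the continuous power-law exponent for the tail x >= x_min,
    with the leading-order standard error from the slide."""
    x = np.asarray(x, dtype=float)
    tail = x[x >= x_min]
    n = len(tail)
    alpha_hat = 1.0 + n / np.sum(np.log(tail / x_min))
    sigma = (alpha_hat - 1.0) / np.sqrt(n)
    return alpha_hat, sigma

# Self-check on synthetic data with alpha = 1.5:
rng = np.random.default_rng(1)
x = (1.0 - rng.random(10_000)) ** (-1.0 / 0.5)
print(fit_alpha(x, x_min=1.0))   # roughly (1.5, 0.005)
```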

Estimating the lower bound x_min

Often \hat{x}_{\min} has been chosen by visual inspection of a plot of the probability density function or the cumulative distribution function.
Increasing x_min means using less data for the fit, so the likelihoods for different x_min cannot be compared directly.
Bayesian information criterion (BIC), having n points with x_i \geq x_{\min}:

    \ln P(x | x_{\min}) \approx L - \frac{1}{2} (x_{\min} + 1) \ln n

The BIC underestimates x_min if the region below x_min also follows a simple (but not a power) law.
Clauset et al. propose instead to minimize the difference between the empirical distribution and the power law distribution using the Kolmogorov-Smirnov (KS) statistic

    D = \max_{x \geq x_{\min}} | S(x) - P(x; \hat{\alpha}, x_{\min}) |

with S(x) being the CDF of the data for the observations with value at least x_min.
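A sketch of the KS-based selection of x_min in Python (a brute-force scan over candidate values, not the optimized reference implementation; for serious work, e.g. the Python `powerlaw` package implements this procedure):

```python
import numpy as np

def ks_statistic(x, alpha, x_min):
    """Maximal distance between empirical and fitted CDF on the tail."""
    tail = np.sort(x[x >= x_min])
    n = len(tail)
    emp_cdf = np.arange(1, n + 1) / n
    model_cdf = 1.0 - (tail / x_min) ** (-(alpha - 1.0))
    return np.max(np.abs(emp_cdf - model_cdf))

def choose_x_min(x, n_candidates=100):
    """Scan candidate lower bounds, refit alpha on each tail, and keep
    the candidate with the smallest KS distance D."""
    x = np.sort(np.asarray(x, dtype=float))
    candidates = np.unique(x[:-10])                  # keep a few tail points
    candidates = candidates[::max(1, len(candidates) // n_candidates)]
    best = (np.inf, None, None)                      # (D, x_min, alpha)
    for x_min in candidates:
        tail = x[x >= x_min]
        alpha = 1.0 + len(tail) / np.sum(np.log(tail / x_min))
        d = ks_statistic(x, alpha, x_min)
        if d < best[0]:
            best = (d, x_min, alpha)
    return best
```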

Testing the power law hypothesis

Two aspects:
1. Is the power law hypothesis compatible with the data, i.e. not ruled out?
2. Are there other, better explanations of the data?

1. Testing the power law: Clauset et al. propose a Monte Carlo procedure:
(1) generate a large number of synthetic data sets drawn from the power-law distribution that best fits the observed data,
(2) fit each one individually to the power-law model,
(3) calculate the KS statistic for each one relative to its own best-fit model,
(4) count the fraction of synthetic data sets whose statistic is larger than the value D observed for the true data; this fraction is the p-value,
(5) if the p-value is sufficiently small, the power law hypothesis can be ruled out.
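A sketch of this Monte Carlo test, reusing fit_alpha and ks_statistic from the sketches above, and simplified to a fixed x_min (Clauset et al. additionally re-estimate x_min for each synthetic set and resample the empirical body below x_min):

```python
import numpy as np

def pvalue_power_law(x, x_min, n_synth=1000, rng=None):
    """Monte Carlo goodness-of-fit p-value, simplified to a fixed x_min.
    Uses fit_alpha and ks_statistic from the sketches above."""
    rng = rng or np.random.default_rng()
    tail = np.asarray(x)[np.asarray(x) >= x_min]
    n = len(tail)
    alpha, _ = fit_alpha(tail, x_min)
    d_obs = ks_statistic(tail, alpha, x_min)
    count = 0
    for _ in range(n_synth):
        # (1) synthetic sample from the best-fit power law ...
        synth = x_min * (1.0 - rng.random(n)) ** (-1.0 / (alpha - 1.0))
        # (2)+(3) ... fitted to its own best-fit model
        a_s, _ = fit_alpha(synth, x_min)
        # (4) compare with the observed KS distance
        count += ks_statistic(synth, a_s, x_min) > d_obs
    return count / n_synth   # (5) small p-value: rule the power law out
```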

Testing the power law hypothesis

2. Testing against alternative explanations: calculate the log-likelihood ratio

    R = \ln \frac{L_1}{L_2} = \ln \prod_{i=1}^{n} \frac{p_1(x_i)}{p_2(x_i)} = \sum_{i=1}^{n} \left( \ln p_1(x_i) - \ln p_2(x_i) \right) = \sum_{i=1}^{n} \left( l_i^{(1)} - l_i^{(2)} \right)

If R is significantly larger than 0, then p_1 is a better fit than p_2.
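A sketch comparing the power law against a log-normal alternative. Normalizing R by its standard deviation (to judge "significantly larger than 0") follows the Vuong-type test used by Clauset et al.; for simplicity the log-normal is fitted to the logs of the tail data, ignoring the truncation at x_min:

```python
import numpy as np
from scipy import stats

def loglikelihood_ratio(x, x_min, alpha):
    """R > 0 favors the power law (p1), R < 0 the log-normal (p2)."""
    tail = np.asarray(x)[np.asarray(x) >= x_min]
    n = len(tail)
    # pointwise log-likelihoods under the fitted power law ...
    l1 = np.log((alpha - 1.0) / x_min) - alpha * np.log(tail / x_min)
    # ... and under a log-normal fitted to ln(tail), truncation ignored
    mu, sigma = np.mean(np.log(tail)), np.std(np.log(tail))
    l2 = stats.lognorm.logpdf(tail, s=sigma, scale=np.exp(mu))
    diff = l1 - l2
    R = np.sum(diff)
    # Normalized statistic: |z| >~ 2 before R is taken as significant.
    z = R / (np.sqrt(n) * np.std(diff))
    return R, z
```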

Transformation method

If x is distributed according to p(x) and y is a function of x, y = f(x), then q(y) dy = p(x) dx and therefore

    q(y) = p(x) \left( \frac{df}{dx} \right)^{-1} \Bigg|_{x = f^{-1}(y)}

If the x are uniformly distributed random numbers on 0 \leq x < 1, we can ask for a function f such that y = f(x) is distributed according to a power law

    q(y) = \frac{\alpha - 1}{y_{\min}} \left( \frac{y}{y_{\min}} \right)^{-\alpha}

Thus

    \frac{dy}{dx} = \frac{y_{\min}^{1 - \alpha}}{\alpha - 1} \, y^{\alpha}

Transformation method

Integrating gives

    \int_{y_{\min}}^{y} y'^{-\alpha} \, dy' = \frac{y_{\min}^{1 - \alpha}}{\alpha - 1} \int_0^x dx'

and solving for y

    y = y_{\min} (1 - x)^{-1/(\alpha - 1)}
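In code this is a one-liner; a small sketch with a self-check (names are illustrative):

```python
import numpy as np

def power_law_sample(n, alpha, y_min, rng=None):
    """y = y_min * (1 - x)^(-1/(alpha-1)) with x uniform on [0, 1)."""
    rng = rng or np.random.default_rng()
    return y_min * (1.0 - rng.random(n)) ** (-1.0 / (alpha - 1.0))

# Self-check: the MLE from above recovers the exponent.
y = power_law_sample(100_000, alpha=2.5, y_min=1.0)
print(1.0 + len(y) / np.sum(np.log(y)))   # approximately 2.5
```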

Combinations of exponentials

A variant of the preceding is the following: suppose y has an exponential distribution

    p(y) \propto e^{ay}

E.g. the intervals between events occurring at a constant rate are exponentially distributed.
Consider a second quantity x = e^{by}; then

    p(x) = p(y) \frac{dy}{dx} \propto \frac{e^{ay}}{b \, e^{by}} = \frac{1}{b} \, x^{-1 + a/b}

Zipf's law for random texts

This can explain the Zipf-like behaviour of words in random texts: m letters are typed randomly, with probability q_s of pressing the space bar and probability q_l = (1 - q_s)/m for any particular letter.

The frequency x of a particular word with y letters followed by a space is

    x = q_s \left( \frac{1 - q_s}{m} \right)^y = q_s e^{by} \quad \text{with} \quad b = \ln(1 - q_s) - \ln m

The number of distinct possible words of length y grows exponentially as

    p(y) \propto m^y = e^{ay} \quad \text{with} \quad a = \ln m

Thus p(x) \propto x^{-\alpha} with

    \alpha = 1 - \frac{a}{b} = 1 - \frac{\ln m}{\ln(1 - q_s) - \ln m} = \frac{\ln(1 - q_s) - 2 \ln m}{\ln(1 - q_s) - \ln m} \to 2 \quad \text{for large } m
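A quick simulation of the random-typing model (the parameters and names here are mine):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
m, q_s, n_chars = 5, 0.2, 2_000_000

# Type characters independently: space with prob q_s, each of the
# m letters with prob (1 - q_s)/m.
symbols = list("abcde ")
probs = [(1.0 - q_s) / m] * m + [q_s]
text = "".join(rng.choice(symbols, size=n_chars, p=probs))

# Word frequencies in descending order: plotting freqs against rank
# on log-log axes should give a slope close to -1/(alpha - 1).
freqs = sorted(Counter(text.split()).values(), reverse=True)

alpha = (np.log(1 - q_s) - 2 * np.log(m)) / (np.log(1 - q_s) - np.log(m))
print("predicted alpha:", alpha)   # ~1.88 for these parameters
```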

Preferential attachment - Yule process

According to Newman, one of the most convincing and widely applicable mechanisms for generating power laws. First reported for the size distribution of biological taxa by Willis and Yule (1922). "The rich become richer."

The model: n time steps (generations); at each step a new entity (city, species, ...) of size 1 is formed, so the total number of entities after n steps is n. Let p_{k,n} be the fraction of entities of size k at step n; the total number of entities of size k is then n p_{k,n}. At each step, m entities also increase in size by 1, chosen with probability proportional to their size k_i / \sum_i k_i. Note that \sum_i k_i = n(m+1).

Result: a power law p_k \propto k^{-(2 + 1/m)}; a simulation sketch follows after the derivation.

Preferential attachment - Yule process

The expected increase of the number of entities of size k in one step is

    \frac{mk}{n(m+1)} \, n p_{k,n} = \frac{m}{m+1} \, k p_{k,n}

Master equation for the new number (n+1) p_{k,n+1} of entities of size k:

    (n+1) p_{k,n+1} = n p_{k,n} + \frac{m}{m+1} \left[ (k-1) p_{k-1,n} - k p_{k,n} \right]

with the exception of entities of size 1:

    (n+1) p_{1,n+1} = n p_{1,n} + 1 - \frac{m}{m+1} p_{1,n}

What is the distribution in the limit n \to \infty?

    p_1 = \frac{m+1}{2m+1}, \qquad p_k = \frac{m}{m+1} \left[ (k-1) p_{k-1} - k p_k \right]

Preferential attachment - Yule process

By iteration one gets

    p_k = \frac{k - 1}{k + 1 + 1/m} \, p_{k-1}

leading to

    p_k = \frac{(k-1)(k-2) \cdots 1}{(k+1+1/m)(k+1/m) \cdots (3+1/m)} \, p_1

which can be expressed using the gamma function \Gamma(a) = (a-1)\Gamma(a-1), \Gamma(1) = 1:

    p_k = (1 + 1/m) \frac{\Gamma(k)\,\Gamma(2 + 1/m)}{\Gamma(k + 2 + 1/m)} = (1 + 1/m) \, B(k, 2 + 1/m)

Preferential attachment - Yule process

The beta function B(a, b) has a power-law tail in its first argument, B(a, b) \sim a^{-b}, thus

    p_k \propto k^{-(2 + 1/m)}

[Figure: log-log plot of B(x, 1.5) together with x^{-1.5}, illustrating the power-law tail.]

Preferential attachment - Yule process

Generalizations:
starting size k_0 instead of 1
attachment probability proportional to k + c instead of k, in order to also handle the case k_0 = 0 (e.g. citations)

Exponent:

    \alpha = 2 + \frac{k_0 + c}{m}

This is the most widely accepted explanation for the distributions of citations, city populations and personal income.
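As mentioned above, the process is easy to simulate directly; a sketch in Python (the urn trick implements size-proportional selection):

```python
import numpy as np

def yule(n_steps, m, rng=None):
    """At each step add one entity of size 1, then let m entities grow
    by 1, chosen with probability proportional to their size."""
    rng = rng or np.random.default_rng()
    sizes = [1]
    urn = [0]   # one entry per unit of size: uniform draws from the urn
                # are exactly size-proportional draws of entities
    for _ in range(n_steps - 1):
        for _ in range(m):
            i = urn[rng.integers(len(urn))]
            sizes[i] += 1
            urn.append(i)
        sizes.append(1)
        urn.append(len(sizes) - 1)
    return np.array(sizes)

sizes = yule(200_000, m=2, rng=np.random.default_rng(0))
# The size distribution should have a tail ~ k^-(2 + 1/m) = k^-2.5.
```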

Random walks

For a random walk, the distribution of time intervals \tau between zero crossings ("gambler's ruin") follows a power law

    p(\tau) \propto \tau^{-3/2}

[Figure: left, a sample random walk trajectory; right, the empirical CDF of the zero-crossing intervals on log-log axes, with \hat{\alpha}_{MLE} = 1.5375.]
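A numerical check of the \tau^{-3/2} law (a sketch; applying the continuous MLE to the discrete return times only gives a rough estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
walk = np.cumsum(rng.choice([-1, 1], size=10_000_000))

# Returns to the origin; the gaps between them are the intervals tau.
zeros = np.where(walk == 0)[0]
tau = np.diff(zeros).astype(float)

# Continuous MLE applied to the (discrete, even) return times;
# expect a value near 1.5.
alpha_hat = 1.0 + len(tau) / np.sum(np.log(tau / tau.min()))
print(alpha_hat)
```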

Multiplicative processes

Multiplicative random processes: the logarithm of a product is the sum of the logarithms. If the logarithm is normally distributed, the product itself is log-normally distributed:

    p(\ln x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right)

    p(x) = p(\ln x) \frac{d \ln x}{dx} = \frac{1}{x \sqrt{2\pi\sigma^2}} \exp\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right)

The log-normal distribution is not a power law, but it can look like one in a log-log plot:

    \ln p(x) = -\frac{(\ln x)^2}{2\sigma^2} + \left( \frac{\mu}{\sigma^2} - 1 \right) \ln x - \frac{\mu^2}{2\sigma^2} + \text{const}

For large \sigma the quadratic term varies slowly, so over a few decades the plot is close to a straight line with slope \mu/\sigma^2 - 1.

Multiplicative processes

The multiplicative time series model x_{t+1} = a(t) x(t), with a(t) randomly drawn from some distribution, still gives a log-normal distribution.

The Kesten multiplicative process

    x_{t+1} = a(t) x(t) + b(t)

produces a CDF for x with a power-law tail P(x) \simeq c \, x^{-\mu} under the following conditions (see Sornette), where a and b are i.i.d. real-valued random variables and there exists a \mu such that
1. 0 < \langle |b|^\mu \rangle < +\infty
2. \langle |a|^\mu \rangle = 1
3. \langle |a|^\mu \ln |a| \rangle < +\infty

This can be considered a generalization of the random walk (a = 1).
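A minimal Kesten simulation (the distribution choices are mine; the log-normal a is tuned so that \langle a^\mu \rangle = 1 holds at \mu = 0.8):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1_000_000

# ln a ~ N(-0.1, 0.5^2), so <a^mu> = exp(-0.1 mu + 0.125 mu^2) = 1
# at mu = 0.8: the stationary CCDF should decay like x^-0.8.
a = np.exp(rng.normal(-0.1, 0.5, size=T))
b = rng.random(T)   # additive term, keeps the process away from 0

x = np.empty(T)
x[0] = 1.0
for t in range(T - 1):
    x[t + 1] = a[t] * x[t] + b[t]

tail = np.sort(x[x > 10.0])
ccdf = 1.0 - np.arange(len(tail)) / len(tail)
# A log-log plot of (tail, ccdf) should have slope close to -0.8.
```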

Criticality revisited

The main feature of the critical state is the divergence of the correlation length (and/or time), which means that there is no typical length (or time) scale in the system: the system has to be scale free.
In equilibrium phase transitions, power-law dependencies between control and order parameters are a consequence of the self-similarity of the critical state.
Using the same argument (see e.g. Newman) one can show that, for instance, the size distribution of clusters at the percolation transition has to obey a power law.
Models of SOC (including the Game of Life) can be related to directed percolation or Reggeon field theory, respectively, a certain kind of branching process.

Power laws and complexity

The observation of a power law alone is not sufficient to call a system critical, or even complex: there are plenty of trivial and less trivial mechanisms that generate power laws.
But power laws remain interesting: many systems that are complex in some sense show power laws. This, however, is not necessarily related to their complexity.