The relation between memory and power-law exponent
Fangjian Guo and Tao Zhou

December 2, 2012

Abstract

The inter-event times of many real systems are empirically found to be power-law distributed, and memory (the first-order autocorrelation), as a quantity ranging from $-1$ to $+1$, has been proposed to characterize the short-range correlation of time series. While series from real systems are usually found to be positively or negatively correlated, the memories of empirical power-law series are predominantly positive. In this paper, we study the mechanism behind such memory bias. We analyze the bounds of memory for a family of random series with a specified marginal, generated by rearranging the order of i.i.d. samples drawn from the marginal while preserving their values. While such rearrangement can produce uniformly or Gaussian distributed series with arbitrary memory from $-1$ to $+1$, we find that the power-law distribution imposes a special constraint on the possible memory of such series, which depends on the power-law exponent. Probabilistic and approximation methods are developed to analyze the bounds when the corresponding population moments converge and diverge, respectively, accompanied by validation from experimental results.

1 Introduction

Power-law distributions are used to model the inter-event time series of many real systems [1], including communication, web browsing and heart beats. Whereas the power law captures the marginal distribution of empirical data, memory has been introduced by Goh and Barabási as a quantity to characterize their interdependence, and has even been claimed to be a measure orthogonal to the marginal distribution [2]. Given a positive-valued series $\{t_i\}$ of length $n$, memory is defined as its first-order autocorrelation, i.e.

$$M = \frac{1}{n-1} \sum_{i=1}^{n-1} \frac{(t_i - m_1)(t_{i+1} - m_2)}{\sigma_1 \sigma_2},$$  (1)
where $m_1, \sigma_1$ and $m_2, \sigma_2$ refer to the means and standard deviations of the series $\{t_1, t_2, \ldots, t_{n-1}\}$ and $\{t_2, t_3, \ldots, t_n\}$ respectively. $M$ lies in the range $[-1, +1]$ due to the Cauchy–Schwarz inequality, by which we have

$$M \le \frac{1}{\sigma_1 \sigma_2} \left[ \frac{1}{n-1} \sum_{i=1}^{n-1} (t_i - m_1)^2 \right]^{1/2} \left[ \frac{1}{n-1} \sum_{i=1}^{n-1} (t_{i+1} - m_2)^2 \right]^{1/2} = 1.$$  (2)

It should be noted that, when $n \to \infty$, the two series above differ in only one element. Therefore we have

$$m_1 = m_2 = m, \qquad \sigma_1 = \sigma_2 = \sigma \qquad (n \to \infty),$$  (3)

where $m$ and $\sigma$ are simply the mean and standard deviation of the whole series. The memory can thus be rewritten as

$$M = \frac{1}{\sigma^2} \left( \frac{1}{n-1} \sum_{i=1}^{n-1} t_i t_{i+1} - m^2 \right) \qquad (n \to \infty),$$  (4)

where rearranging the order of the series affects only the sum of products $\sum_{i=1}^{n-1} t_i t_{i+1}$, leaving $m$ and $\sigma$ unchanged. When the series $\{t_i\}$ is independently sampled from a distribution, its memory approaches zero as $n \to \infty$, since

$$E[M] = \frac{1}{\sigma^2} \left( \frac{1}{n-1} \sum_{i=1}^{n-1} E[t_i t_{i+1}] - m^2 \right) = 0,$$  (5)

with $E[t_i t_{i+1}] = m^2$. However, by rearranging the order of the elements in the series while preserving their values, the memory can be adjusted without changing the marginal distribution. In fact, in the limit of large $n$, such permutations affect only the term $\frac{1}{n-1}\sum_i t_i t_{i+1}$.

2 Rearrangement with maximum or minimum memory

In the following sections, we assume $n \to \infty$ and therefore adopt the simpler form of memory in equation (4). A rearrangement $\theta$ is a one-to-one mapping from the set $\{1, 2, \ldots, n\}$ to itself, and the $i$-th element of the rearranged sequence is denoted $t_{\theta_i}$. Denoting the rearrangement with maximum memory and the one with minimum memory as $\theta_{\max}$ and $\theta_{\min}$ respectively, and denoting the sum of the products of adjacent elements as

$$S_\theta = \sum_{i=1}^{n-1} t_{\theta_i} t_{\theta_{i+1}},$$  (6)
we have

$$\theta_{\max} = \operatorname*{argmax}_\theta S_\theta, \qquad \theta_{\min} = \operatorname*{argmin}_\theta S_\theta.$$  (7)

We also define the shorthand $s_\theta = \frac{1}{n-1} S_\theta$. It has been shown that there exist fixed $\theta_{\max}$ and $\theta_{\min}$ that hold for all possible positive series $\{t_1, t_2, \ldots, t_n\}$ [3]. (Whereas [3] uses an objective function that sums the products in a circle, i.e. $S'_\theta = \sum_{i=1}^{n-1} t_{\theta_i} t_{\theta_{i+1}} + t_{\theta_1} t_{\theta_n}$, the results reduce to our case by appending an additional zero element to the series, which makes zero contribution to the sum.) To illustrate how they are arranged, we first introduce the order statistics. The order statistics $\{t_{(i)}\}$ are formed by arranging the series in increasing order, i.e.

$$t_{(1)} \le t_{(2)} \le \cdots \le t_{(n-1)} \le t_{(n)},$$  (8)

with $t_{(i)}$ being the $i$-th smallest element of the series. $\theta_{\max}$ attains the maximum memory by first arranging the odd-indexed order statistics in increasing order, followed by the even-indexed ones in decreasing order, i.e.

$$t_{(1)}, t_{(3)}, \ldots, t_{(2l-1)}, t_{(2l)}, t_{(2l-2)}, \ldots, t_{(4)}, t_{(2)} \quad (n = 2l),$$
$$t_{(1)}, t_{(3)}, \ldots, t_{(2l-1)}, t_{(2l+1)}, t_{(2l)}, \ldots, t_{(4)}, t_{(2)} \quad (n = 2l+1).$$  (9)

For simplicity, we only address the case $n = 2l$, for which the sum in terms of order statistics is

$$S_{\theta_{\max}} = \sum_{i=1}^{2l-2} t_{(i)} t_{(i+2)} + t_{(2l)} t_{(2l-1)} \quad (n = 2l),$$  (10)

while the case of odd $n$ can be handled similarly. On the contrary, quite interestingly, $\theta_{\min}$ arranges the order statistics by interlacing the even and odd terms when $n = 2l$, or small and big terms when $n = 2l+1$, i.e.

$$t_{(2l)}, t_{(1)}, t_{(2l-2)}, \ldots, t_{(2l-3)}, t_{(2)}, t_{(2l-1)} \quad (n = 2l),$$
$$t_{(2l)}, t_{(2)}, \ldots, t_{(l)}, \ldots, t_{(1)}, t_{(2l+1)} \quad (n = 2l+1).$$  (11)

Again, for even $n$, we have

$$S_{\theta_{\min}} = \sum_{i=1}^{l-1} \left( t_{(i)} t_{(2l+1-i)} + t_{(i)} t_{(2l-i)} \right) + t_{(l)} t_{(l+1)} \quad (n = 2l).$$  (12)
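The quantities above are easy to experiment with numerically. Below is a minimal Python sketch (function names are ours; `rearrange_min` follows the interlacing idea of (11), alternating the largest and smallest remaining order statistics so that adjacent products pair $t_{(i)}$ with $t_{(2l+1-i)}$ and $t_{(2l-i)}$ as in (12)):

```python
import random

def memory(t):
    """Large-n memory of Eq. (4): M = (<t_i t_{i+1}> - m^2) / sigma^2."""
    n = len(t)
    m = sum(t) / n
    var = sum((x - m) ** 2 for x in t) / n
    s = sum(t[i] * t[i + 1] for i in range(n - 1)) / (n - 1)
    return (s - m * m) / var

def rearrange_max(t):
    """Pattern (9): odd-indexed order statistics ascending,
    then even-indexed ones descending."""
    s = sorted(t)
    return s[0::2] + s[1::2][::-1]

def rearrange_min(t):
    """Interlacing in the spirit of (11): alternate the largest and
    smallest remaining order statistics, producing the products of (12)."""
    s = sorted(t)
    lo, hi, out, take_hi = 0, len(s) - 1, [], True
    while lo <= hi:
        if take_hi:
            out.append(s[hi])
            hi -= 1
        else:
            out.append(s[lo])
            lo += 1
        take_hi = not take_hi
    return out

# i.i.d. power-law samples: random.paretovariate(a) has density a * t^(-a-1)
# on t >= 1, i.e. the power law of the text with alpha = a + 1 (here 3.5).
random.seed(0)
t = [random.paretovariate(2.5) for _ in range(100_000)]
print(memory(t))                 # near 0 for the i.i.d. order
print(memory(rearrange_max(t)))  # driven towards +1
print(memory(rearrange_min(t)))  # negative, but bounded away from -1
```

For a deterministic ramp $t_i = i$ of length $n$, the formula gives exactly $M = (n-3)/(n-1)$, which approaches $+1$ as $n$ grows.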
3 Adjusting memory by iterative rearrangement

We first explore the bounds for uniform, Gaussian and power-law samples by iteratively rearranging them towards $\theta_{\max}$ or $\theta_{\min}$. To do this, we construct an approximation of the order statistics iteratively. The process contains $n$ steps, the same as the length of the series. In each step, the series is rearranged with one pass of bubble sort over the series produced by the previous step: stepping through the series from the first element to the last, comparing each pair of adjacent elements and swapping them if they are not in increasing order. The series obtained after $i$ such steps is denoted $\{\tilde t^{(i)}\}$, an approximation to the order statistics, with $\{\tilde t^{(n)}\}$ guaranteed to equal the order statistics, because bubble sort always finishes within $n$ passes. In each step, treating $\{\tilde t^{(i)}\}$ as approximate order statistics, series with positive and negative memory are obtained by rearranging $\{\tilde t^{(i)}\}$ according to (9) and (11) respectively.

In our experiment, series of length $n = 1{,}000{,}000$ are independently drawn from the uniform distribution on $[0, 1]$, the standard Gaussian distribution (with the same positive constant added to all samples afterwards to ensure positivity), and the power-law distribution whose probability density function is

$$f(t) = \frac{\alpha - 1}{t_{\min}} \left( \frac{t}{t_{\min}} \right)^{-\alpha},$$  (13)

with $t_{\min} = 1$ and $\alpha = 3.5$. Results are obtained by averaging 5 independent runs and are reported in Figure 1 for positive memory and in Figure 2 for negative memory. As can be seen, by rearranging the series towards $\theta_{\max}$, the memory goes to $+1$ for all three distributions. (The maximum memory of the power-law series after the last iteration is slightly below $+1$, due to the effect of finite $n$, as we will see later.) However, while the memory can be tuned down to $-1$ by rearrangement towards $\theta_{\min}$ for uniformly and Gaussian distributed series, it cannot go below about $-0.2$ for the power-law series.
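The iterative procedure above can be sketched as follows (self-contained, with a small $n$ and a uniform marginal for illustration; the `memory` function implements the large-$n$ form (4), and names are ours):

```python
import random

def memory(t):
    """Large-n memory of Eq. (4)."""
    n = len(t)
    m = sum(t) / n
    var = sum((x - m) ** 2 for x in t) / n
    s = sum(t[i] * t[i + 1] for i in range(n - 1)) / (n - 1)
    return (s - m * m) / var

def bubble_pass(t):
    """One pass of bubble sort: swap adjacent out-of-order pairs."""
    t = list(t)
    for i in range(len(t) - 1):
        if t[i] > t[i + 1]:
            t[i], t[i + 1] = t[i + 1], t[i]
    return t

def theta_max(t):
    """Pattern (9) applied to (approximate) order statistics."""
    return t[0::2] + t[1::2][::-1]

random.seed(1)
series = [random.random() for _ in range(1000)]   # uniform on [0, 1]
approx = list(series)
trace = []
for _ in range(len(series)):   # n passes: fully sorted at the end
    approx = bubble_pass(approx)
    trace.append(memory(theta_max(approx)))
print(trace[0], trace[-1])     # the positive memory grows towards +1
```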
In the following sections, we further analyze how this non-trivial lower bound depends on the exponent $\alpha$.

4 The bounds of memory for rearranged independent power-law series

In the following analysis, without loss of generality, we assume the original series is independently sampled from the power-law distribution with $t_{\min} = 1$, hence the probability density function

$$f(t) = (\alpha - 1) t^{-\alpha},$$  (14)
Figure 1: Positive memory by iteratively rearranging independently sampled series.

Figure 2: Negative memory by iteratively rearranging independently sampled series.

where $\alpha > 1$ is required to form a well-defined distribution. A series with a different lower bound $t_{\min}$ can be recovered by multiplying by $t_{\min}$, and its
memory

$$M' = \frac{1}{(t_{\min}\sigma)^2} \left( \frac{1}{n-1} \sum_{i=1}^{n-1} (t_{\min} t_i)(t_{\min} t_{i+1}) - (t_{\min} m)^2 \right) = M$$  (15)

is the same as that of the series with $t_{\min} = 1$. Therefore, the bounds obtained with $t_{\min} = 1$ hold for any positive $t_{\min}$.

$M$ is always well-defined for empirical samples, as samples are of course finite. However, the population variance $\mathrm{Var}[t]$ exists only when $\alpha > 3$, and the population mean $E[t]$ exists only when $\alpha > 2$. Therefore, we adopt different treatments for $\alpha > 3$ and $1 < \alpha \le 3$. When $\alpha > 3$, we equate the sample moments involved in the definition of memory with population moments and then derive the expected value $E[M]$ in the limit $n \to \infty$. When $1 < \alpha \le 3$, we instead use a method that approximates the random samples with deterministic values. Such approximation greatly simplifies the analysis when population moments diverge, while still reproducing the behavior of the bounds.

4.1 Probabilistic method for the α > 3 case

When $\alpha > 3$, we denote the population standard deviation and the population mean by $\sigma(\alpha)$ and $m(\alpha)$ respectively when there is no ambiguity. By equating sample moments with the corresponding population moments, we have

$$E[M] = \frac{1}{\sigma(\alpha)^2} \left( \frac{1}{n-1} \sum_{i=1}^{n-1} E[t_i t_{i+1}] - m(\alpha)^2 \right),$$  (16)

where

$$m(\alpha) = \frac{\alpha - 1}{\alpha - 2},$$  (17)

$$\sigma(\alpha)^2 = \frac{\alpha - 1}{\alpha - 3} - m(\alpha)^2.$$  (18)

4.1.1 The upper bound of memory

The upper bound of memory $M_{\max}$ is defined as the expected value of the memory under the rearrangement $\theta_{\max}$ in the limit $n \to \infty$, namely

$$M_{\max} = \lim_{n \to \infty} E[M_{\theta_{\max}}] = \lim_{n \to \infty} \frac{1}{\sigma(\alpha)^2} \left( \frac{1}{n-1} E[S_{\theta_{\max}}] - m(\alpha)^2 \right),$$  (19)

where the expected value of $S_{\theta_{\max}}$ is composed of the expected values of products of adjacent items according to equation (10), i.e.

$$\frac{1}{n-1} E[S_{\theta_{\max}}] = \frac{1}{2l-1} \sum_{i=1}^{2l-2} E[t_{(i)} t_{(i+2)}] + \frac{1}{2l-1} E[t_{(2l)} t_{(2l-1)}],$$  (20)
assuming $n = 2l$. The case $n = 2l+1$ can be worked out in a similar fashion and leads to the same results. The expected value of each term can be obtained from the joint distribution of order statistics. The joint probability density function of two order statistics $t_{(j)}, t_{(k)}$ ($j < k$) is given by [4]

$$f_{t_{(j)}, t_{(k)}}(x, y) = \frac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!}\, [F(x)]^{j-1}\, [F(y) - F(x)]^{k-j-1}\, [1 - F(y)]^{n-k}\, f(x) f(y) \quad (x \le y),$$  (21)

where $F(x)$ is the cumulative distribution function of the power law, i.e.

$$F(x) = 1 - x^{1-\alpha},$$  (22)

and $f$ is given by (14). Therefore, we have

$$E[t_{(i)} t_{(i+2)}] = \iint_{1 \le x \le y < \infty} x y\, f_{t_{(i)}, t_{(i+2)}}(x, y)\, dx\, dy = \frac{\Gamma(2l+1)}{\Gamma(2l+1-2c)}\, \frac{\Gamma(2l-i+1-2c)}{\Gamma(2l-i-1)}\, \frac{1}{[(2l-i)-c][(2l-i)-\alpha c]} \quad (1 \le i \le 2l-2),$$

and

$$E[t_{(2l-1)} t_{(2l)}] = \iint_{1 \le x \le y < \infty} x y\, f_{t_{(2l-1)}, t_{(2l)}}(x, y)\, dx\, dy = \frac{(\alpha-1)^2}{2(\alpha-2)^2}\, \Gamma(3-2c)\, \frac{\Gamma(2l+1)}{\Gamma(2l+1-2c)},$$  (23)

where we use the shorthand $c = \frac{1}{\alpha-1} \in (0, \tfrac12)$. In the limit $n \to \infty$, the first term of (20) is

$$\lim_{l \to \infty} \frac{1}{2l-1} \sum_{i=1}^{2l-2} E[t_{(i)} t_{(i+2)}] = \lim_{l \to \infty} \frac{1}{2l-1}\, \frac{\Gamma(2l+1)}{\Gamma(2l+1-2c)} \sum_{i=1}^{2l-2} \frac{\Gamma(2l-i+1-2c)}{[(2l-i)-c][(2l-i)-\alpha c]\, \Gamma(2l-i-1)}.$$

Reducing the prefactor by a property of the Gamma function, namely

$$\lim_{x \to \infty} \frac{\Gamma(x+\gamma)}{\Gamma(x)\, x^\gamma} = 1 \quad (\gamma \in \mathbb{R}),$$

and rewriting the summation with the index $k = 2l - i$, we have

$$\lim_{l \to \infty} \frac{1}{2l-1} \sum_{i=1}^{2l-2} E[t_{(i)} t_{(i+2)}] = \lim_{l \to \infty} (2l+1)^{-(1-2c)} \sum_{k=2}^{2l-1} \frac{\Gamma(k+1-2c)}{(k-c)(k-1-c)\, \Gamma(k-1)}.$$
As $c < \tfrac12$, the prefactor approaches zero as $n \to \infty$, making the result depend only on the limiting behavior of the summed terms, each of which behaves as $k^{-2c}$ for large $k$. We now have

$$\lim_{l \to \infty} \frac{1}{2l-1} \sum_{i=1}^{2l-2} E[t_{(i)} t_{(i+2)}] = \lim_{l \to \infty} \sum_{k=2}^{2l-1} \left( \frac{k}{2l+1} \right)^{-2c} \frac{1}{2l+1} = \int_0^1 u^{-2c}\, du = \frac{\alpha-1}{\alpha-3}.$$  (24)

Meanwhile, the second term on the RHS of (20) vanishes in the limit of large $n$, as

$$\lim_{l \to \infty} \frac{1}{2l-1} E[t_{(2l)} t_{(2l-1)}] = 0.$$  (25)

Substituting (24) and (25) into (19), and noting that $\frac{\alpha-1}{\alpha-3} = \sigma(\alpha)^2 + m(\alpha)^2$, we arrive at

$$M_{\max} = \lim_{n \to \infty} E[M_{\theta_{\max}}] = 1 \quad (\alpha > 3),$$  (26)

proving that the maximum memory for rearranged power-law series is $+1$, the same as for uniformly and Gaussian distributed series.

4.1.2 The lower bound of memory

The lower bound of memory is formulated similarly as

$$M_{\min} = \lim_{n \to \infty} E[M_{\theta_{\min}}] = \lim_{n \to \infty} \frac{1}{\sigma(\alpha)^2} \left( \frac{1}{n-1} E[S_{\theta_{\min}}] - m(\alpha)^2 \right),$$  (27)

where

$$\frac{1}{n-1} E[S_{\theta_{\min}}] = \frac{1}{2l-1} \sum_{i=1}^{l-1} \left( E[t_{(i)} t_{(2l+1-i)}] + E[t_{(i)} t_{(2l-i)}] \right) + \frac{1}{2l-1} E[t_{(l)} t_{(l+1)}],$$  (28)

again assuming $n = 2l$ for convenience. The expected values of these products can also be obtained from the distribution of order statistics:

$$E[t_{(i)} t_{(2l+1-i)}] = \frac{\Gamma(2l+1)}{\Gamma(2l+1-2c)}\, \frac{\Gamma(i-c)}{\Gamma(i)}\, \frac{\Gamma(2l-i+1-2c)}{\Gamma(2l-i+1-c)},$$  (29)

$$E[t_{(i)} t_{(2l-i)}] = \frac{\Gamma(2l+1)}{\Gamma(2l+1-2c)}\, \frac{\Gamma(i+1-c)}{\Gamma(i+1)}\, \frac{\Gamma(2l-i+1-2c)}{\Gamma(2l-i+1-c)}.$$  (30)
Taking the large-$n$ limit, we have

$$\lim_{l \to \infty} \frac{1}{2l-1} \sum_{i=1}^{l-1} E[t_{(i)} t_{(2l+1-i)}] = \lim_{l \to \infty} \frac{1}{2l-1}\, \frac{\Gamma(2l+1)}{\Gamma(2l+1-2c)} \sum_{i=1}^{l-1} \frac{\Gamma(i-c)}{\Gamma(i)}\, \frac{\Gamma(2l-i+1-2c)}{\Gamma(2l-i+1-c)}$$
$$= \lim_{l \to \infty} (2l+1)^{-(1-2c)} \sum_{i=1}^{l-1} i^{-c}\, (2l-i+1)^{-c} = \lim_{l \to \infty} \sum_{i=1}^{l-1} \left( \frac{i}{2l+1} \right)^{-c} \left( 1 - \frac{i}{2l+1} \right)^{-c} \frac{1}{2l+1}$$
$$= \int_0^{1/2} u^{-c} (1-u)^{-c}\, du = B\!\left( \tfrac12;\, \tfrac{\alpha-2}{\alpha-1},\, \tfrac{\alpha-2}{\alpha-1} \right),$$  (31)

where $B$ is the incomplete beta function, defined as

$$B(x; a, b) = \int_0^x t^{a-1} (1-t)^{b-1}\, dt.$$  (32)

The other term has the same limit, namely

$$\lim_{l \to \infty} \frac{1}{2l-1} \sum_{i=1}^{l-1} E[t_{(i)} t_{(2l-i)}] = B\!\left( \tfrac12;\, \tfrac{\alpha-2}{\alpha-1},\, \tfrac{\alpha-2}{\alpha-1} \right),$$  (33)

and again the remaining term vanishes,

$$\lim_{l \to \infty} \frac{1}{2l-1} E[t_{(l)} t_{(l+1)}] = 0.$$  (34)

Therefore, the minimum memory is

$$M_{\min} = \lim_{n \to \infty} E[M_{\theta_{\min}}] = \frac{1}{\sigma(\alpha)^2} \left[ 2\, B\!\left( \tfrac12;\, \tfrac{1}{m(\alpha)},\, \tfrac{1}{m(\alpha)} \right) - m(\alpha)^2 \right] \quad (\alpha > 3),$$  (35)

noting that $\frac{1}{m(\alpha)} = \frac{\alpha-2}{\alpha-1}$.

4.1.3 Experimental results

Figure 3 compares theoretical and experimental results for the upper and lower bounds of memory for rearranged independently sampled power-law series in the case $\alpha > 3$. The solid lines are drawn according to (26) and (35), while the circles are produced by rearranging the independently sampled series with $\theta_{\max}$ and $\theta_{\min}$. The effect of finite system size $n$ is observed when $\alpha$ is near 3, resulting in minor differences between theoretical and experimental results. For $\alpha > 3$, while $M_{\max}$ is a constant, $M_{\min}$ is a decreasing function of $\alpha$ with the limit $M_{\min} \to -0.64$ ($\alpha \to \infty$).
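Equation (35) can be evaluated with elementary numerics. The sketch below (function names are ours) computes the incomplete beta by a midpoint rule to stay free of external dependencies:

```python
def m_pop(alpha):
    """Population mean of the power law, Eq. (17)."""
    return (alpha - 1) / (alpha - 2)

def var_pop(alpha):
    """Population variance, Eq. (18)."""
    return (alpha - 1) / (alpha - 3) - m_pop(alpha) ** 2

def inc_beta(x, a, b, steps=200_000):
    """B(x; a, b) = integral_0^x t^(a-1) (1-t)^(b-1) dt, midpoint rule."""
    h = x / steps
    return h * sum(((k + 0.5) * h) ** (a - 1) * (1.0 - (k + 0.5) * h) ** (b - 1)
                   for k in range(steps))

def M_min_theory(alpha):
    """Minimum memory of rearranged power-law series, Eq. (35), alpha > 3."""
    a = (alpha - 2) / (alpha - 1)   # = 1 / m(alpha)
    return (2 * inc_beta(0.5, a, a) - m_pop(alpha) ** 2) / var_pop(alpha)

for alpha in (3.5, 5.0, 10.0):
    print(alpha, M_min_theory(alpha))  # decreasing towards -0.64 as alpha grows
```

For $\alpha = 3.5$ this gives roughly $-0.16$, consistent with the bound of about $-0.2$ observed in the iterative-rearrangement experiment of Section 3.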
Figure 3: Comparison between theoretical and experimental bounds for memory, in the case of α > 3. The series are of length n = 50,000 and experimental results are obtained by averaging over independent runs.

4.2 Approximation method for the α ≤ 3 case

When $\alpha \le 3$, the population variance diverges, and the population mean also diverges once $\alpha$ goes below 2, making the method of equating sample moments with population moments invalid. Moreover, a direct calculation of the expected value of the maximum or minimum memory would involve the distribution of $M$ itself, which is intractable. Therefore, we adopt an approximation method for the $\alpha \le 3$ case, which substitutes deterministic values for the random samples in the limit of large $n$.

4.2.1 Approximating random samples with equiprobable slices

As illustrated by Figure 4, the points $\hat t_1, \hat t_2, \ldots, \hat t_n$ cut the area under the probability density function $f(t)$ into slices of equal area $\frac{1}{n}$, with $\hat t_1 = t_{\min} = 1$ and the area from $\hat t_n$ extending to infinity also being $\frac{1}{n}$. We then approximate the random samples $\{t_1, t_2, \ldots, t_n\}$ by these deterministic points $\{\hat t_1, \hat t_2, \ldots, \hat t_n\}$. It should be noted that such approximation imposes a cutoff on the maximum value of $t_i$, and the probability of drawing a sample exceeding the cutoff is $\frac{1}{n}$, which diminishes to zero as $n \to \infty$. By setting
$$\int_1^{\hat t_i} f(t)\, dt = \frac{i-1}{n},$$

we have

$$\hat t_i = \left( 1 - \frac{i-1}{n} \right)^{-c} \quad (i = 1, 2, \ldots, n),$$  (36)

where $c = \frac{1}{\alpha-1} \ge \tfrac12$ in this case. And obviously the order statistics satisfy $\hat t_{(i)} = \hat t_i$.

Figure 4: Cutting the area under f(t) (in logarithmic scale) into slices with equal area 1/n, thus equal probability. The area from the last point to infinity is also 1/n.

Recall the expression for memory,

$$M = \frac{1}{\sigma^2} \left( \frac{1}{n-1} \sum_{i=1}^{n-1} t_i t_{i+1} - m^2 \right) = \frac{\frac{1}{n-1} \sum_{i=1}^{n-1} t_i t_{i+1} - m^2}{\frac{1}{n} \sum_{i=1}^{n} t_i^2 - m^2},$$  (37)

which involves three sample quantities, denoted as

$$s = \frac{1}{n-1} S = \frac{1}{n-1} \sum_{i=1}^{n-1} t_i t_{i+1},$$  (38)

$$\overline{t^2} = \frac{1}{n} \sum_{i=1}^{n} t_i^2,$$  (39)

and

$$m^2 = \left( \frac{1}{n} \sum_{i=1}^{n} t_i \right)^2.$$  (40)
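The slice points (36) are straightforward to construct and check; a minimal sketch (function names are ours), verifying that each slice carries probability mass exactly $1/n$ and that the tail beyond the last point is also $1/n$:

```python
def slices(alpha, n):
    """Deterministic points t_hat_i of Eq. (36) for the power law, t_min = 1."""
    c = 1.0 / (alpha - 1.0)
    return [(1.0 - (i - 1) / n) ** (-c) for i in range(1, n + 1)]

def F(t, alpha):
    """CDF of the power law, Eq. (22): F(t) = 1 - t^(1 - alpha)."""
    return 1.0 - t ** (1.0 - alpha)

t_hat = slices(2.5, 1000)
print(t_hat[0])                   # exactly 1.0 = t_min
print(1.0 - F(t_hat[-1], 2.5))    # tail mass beyond the last point: 1/n
```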
When $\alpha \le 3$, $s$ and $\overline{t^2}$ diverge as $n \to \infty$, and $m^2$ no longer converges if $\alpha \le 2$. We can then use the approximated samples $\{\hat t_1, \hat t_2, \ldots, \hat t_n\}$ to analyze the asymptotic behavior of these quantities as $n \to \infty$.

4.2.2 The upper bound of memory

Substituting $t_{(i)}$ with $\hat t_i$, we have

$$\hat s_{\theta_{\max}} = \frac{1}{n-1} \left( \sum_{i=1}^{2l-2} \hat t_i \hat t_{i+2} + \hat t_{2l} \hat t_{2l-1} \right) \quad (n = 2l) \;=\; \frac{1}{n-1}\, n^{2c} \left[ \sum_{k=2}^{n-1} (k^2 - 1)^{-c} + 2^{-c} \right] \quad (\alpha < 3),$$  (41)

$$\hat{\overline{t^2}}_{\theta_{\max}} = \frac{1}{n} \sum_{i=1}^{n} \hat t_i^2 = \frac{1}{n}\, n^{2c} \sum_{k=1}^{n} k^{-2c} \quad (\alpha < 3).$$  (42)

As $n \to \infty$, both are of the same order $n^{2c-1}$, while $m^2$ either converges (when $2 < \alpha \le 3$) or diverges with the lower order $n^{2c-2}$ (when $1 < \alpha \le 2$). Hence, in the limit of large $n$, $M_{\theta_{\max}}$ can be approximated by neglecting $m^2$, i.e.

$$\hat M_{\theta_{\max}} \approx \frac{\hat s_{\theta_{\max}}}{\hat{\overline{t^2}}_{\theta_{\max}}} = \frac{\sum_{k=2}^{n-1} (k^2-1)^{-c} + 2^{-c}}{\sum_{k=1}^{n} k^{-2c}},$$  (43)

where both the numerator and the denominator converge as $n \to \infty$. Thus we have

$$M_{\max} = \lim_{n \to \infty} E[M_{\theta_{\max}}] \approx \lim_{n \to \infty} \hat M_{\theta_{\max}} = \frac{\sum_{k=2}^{\infty} (k^2-1)^{-c} + 2^{-c}}{\sum_{k=1}^{\infty} k^{-2c}} \quad (1 < \alpha \le 3).$$  (44)

4.2.3 The lower bound of memory

Analyzing the memory produced by $\theta_{\min}$ with the approximated samples, one finds that $\hat{\overline{t^2}}_{\theta_{\min}}$ is the term diverging with the largest order of $n$, while the orders of $m^2$ and $\hat s_{\theta_{\min}}$ are smaller if they diverge at all. As the largest-order term appears only in the denominator, we have

$$M_{\min} = \lim_{n \to \infty} E[M_{\theta_{\min}}] \approx \lim_{n \to \infty} \hat M_{\theta_{\min}} = 0.$$  (45)
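The limiting ratio (44) is a pair of sums that converge for $1 < \alpha < 3$, so it can be evaluated directly from truncations (function name ours; the truncation length is an assumption for illustration):

```python
def approx_M_max(alpha, terms=200_000):
    """Slice approximation of the maximum memory, Eq. (44), 1 < alpha < 3."""
    c = 1.0 / (alpha - 1.0)
    num = sum((k * k - 1.0) ** (-c) for k in range(2, terms)) + 2.0 ** (-c)
    den = sum(k ** (-2.0 * c) for k in range(1, terms + 1))
    return num / den

for alpha in (1.5, 2.0, 2.5):
    print(alpha, approx_M_max(alpha))  # increases with alpha, towards +1 at alpha = 3
```

At $\alpha = 2$ the sums have closed forms ($\sum_{k\ge 2} (k^2-1)^{-1} = 3/4$ and $\sum_{k\ge 1} k^{-2} = \pi^2/6$), giving $M_{\max} \approx 0.76$; the truncated sums reproduce this.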
Figure 5: Comparison between experimental results and approximate theoretical results for 1 < α ≤ 3. Each experimental data point is obtained by averaging sampled series of length n = 50,000 over independent runs.

4.2.4 Experimental results

Figure 5 compares the upper and lower bounds of memory obtained by the equiprobable-slice approximation with experimental results, where data points are averaged over independent runs of drawing power-law samples of length $n = 50{,}000$. Although the approximation method cannot precisely recover the behavior of the maximum memory, it still captures the trend: the maximum memory goes up from 0 to $+1$ as $\alpha$ increases from 1 to 3. Meanwhile, the predicted minimum memory remains zero for $\alpha \le 3$, and larger gaps between experimental and theoretical values are noticed when $\alpha$ is near 3.

5 Conclusion and discussion

Combining the results obtained by the probabilistic method for $\alpha > 3$ and the approximation method for $\alpha \le 3$, in Figure 6 we plot theoretical values of $M_{\max}$ and $M_{\min}$ for $1 < \alpha \le 7$, where the slice approximation is used for $1 < \alpha \le 3$ and the probabilistic method for $\alpha > 3$. Clearly, in the sense of expected values and in the limit of large $n$, the memory of rearranged power-law series is confined to the positive region for $\alpha \le 3$, with the upper bound going up from 0 to $+1$ as $\alpha$ increases.
Figure 6: Theoretical values for the maximum and minimum memory of rearranged independent power-law series in the range 1 < α < 7. The slice approximation method is used for 1 < α ≤ 3, while the probabilistic method is used for 3 < α < 7.

When $\alpha > 3$, however, the upper bound remains $+1$, while the lower bound slides down into the negative region as a decreasing function of $\alpha$, with the limit $M_{\min} \to -0.64$ ($\alpha \to \infty$). Therefore, the minimum memory for rearranged power-law series is constrained from below by about $-0.64$.

Admittedly, rearranging independently sampled power-law series cannot cover all possible correlated series with a power-law marginal, and memory (first-order autocorrelation) is not the only measure of temporal correlation. Still, our results may point to a hidden mechanism of the power law: the power-law distribution may naturally constrain the possible interdependence among elements, in particular preventing the correlation from being too negative. In addition, our approach, both probabilistic and approximate, may be generalized to the analysis of a wider spectrum of distributions common in empirical data, such as log-normals and power laws with exponential tails.

References

[1] A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661-703, 2009.
[2] K.-I. Goh and A.-L. Barabási. Burstiness and memory in complex systems. EPL (Europhysics Letters), February 2008.

[3] Marc Hallin, Guy Mélard, and Xavier Milhaud. Permutational extreme values of autocorrelation coefficients and a Pitman test against serial dependence. ULB Institutional Repository 23/237, ULB Université Libre de Bruxelles, 1992.

[4] H. A. David and H. N. Nagaraja. Order Statistics. Wiley, New Jersey, 3rd edition, 2003.
More informationAn Asymptotically Optimal Algorithm for the Max k-armed Bandit Problem
An Asymptotically Optimal Algorithm for the Max k-armed Bandit Problem Matthew J. Streeter February 27, 2006 CMU-CS-06-110 Stephen F. Smith School of Computer Science Carnegie Mellon University Pittsburgh,
More informationELEG 833. Nonlinear Signal Processing
Nonlinear Signal Processing ELEG 833 Gonzalo R. Arce Department of Electrical and Computer Engineering University of Delaware arce@ee.udel.edu February 15, 25 2 NON-GAUSSIAN MODELS 2 Non-Gaussian Models
More information1 Potential due to a charged wire/sheet
Lecture XXX Renormalization, Regularization and Electrostatics Let us calculate the potential due to an infinitely large object, e.g. a uniformly charged wire or a uniformly charged sheet. Our main interest
More informationSTATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero
STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero 1999 32 Statistic used Meaning in plain english Reduction ratio T (X) [X 1,..., X n ] T, entire data sample RR 1 T (X) [X (1),..., X (n) ] T, rank
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationLecture 12 Reactions as Rare Events 3/02/2009 Notes by MIT Student (and MZB)
Lecture 1 Reactions as Rare Events 3/0/009 Notes by MIT Student (and MZB) 1. Introduction In this lecture, we ll tackle the general problem of reaction rates involving particles jumping over barriers in
More informationA Modified Baum Welch Algorithm for Hidden Markov Models with Multiple Observation Spaces
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 4, MAY 2001 411 A Modified Baum Welch Algorithm for Hidden Markov Models with Multiple Observation Spaces Paul M. Baggenstoss, Member, IEEE
More informationLearning MN Parameters with Alternative Objective Functions. Sargur Srihari
Learning MN Parameters with Alternative Objective Functions Sargur srihari@cedar.buffalo.edu 1 Topics Max Likelihood & Contrastive Objectives Contrastive Objective Learning Methods Pseudo-likelihood Gradient
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationReal Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras. Lecture - 13 Conditional Convergence
Real Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras Lecture - 13 Conditional Convergence Now, there are a few things that are remaining in the discussion
More informationClustering on Unobserved Data using Mixture of Gaussians
Clustering on Unobserved Data using Mixture of Gaussians Lu Ye Minas E. Spetsakis Technical Report CS-2003-08 Oct. 6 2003 Department of Computer Science 4700 eele Street orth York, Ontario M3J 1P3 Canada
More informationLecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales.
Lecture 2 1 Martingales We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. 1.1 Doob s inequality We have the following maximal
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationMultiple Random Variables
Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x
More informationContrastive Divergence
Contrastive Divergence Training Products of Experts by Minimizing CD Hinton, 2002 Helmut Puhr Institute for Theoretical Computer Science TU Graz June 9, 2010 Contents 1 Theory 2 Argument 3 Contrastive
More informationSampling Distributions and Asymptotics Part II
Sampling Distributions and Asymptotics Part II Insan TUNALI Econ 511 - Econometrics I Koç University 18 October 2018 I. Tunal (Koç University) Week 5 18 October 2018 1 / 29 Lecture Outline v Sampling Distributions
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationThe Sommerfeld Polynomial Method: Harmonic Oscillator Example
Chemistry 460 Fall 2017 Dr. Jean M. Standard October 2, 2017 The Sommerfeld Polynomial Method: Harmonic Oscillator Example Scaling the Harmonic Oscillator Equation Recall the basic definitions of the harmonic
More informationMath 350: An exploration of HMMs through doodles.
Math 350: An exploration of HMMs through doodles. Joshua Little (407673) 19 December 2012 1 Background 1.1 Hidden Markov models. Markov chains (MCs) work well for modelling discrete-time processes, or
More informationINTRODUCTION TO PATTERN RECOGNITION
INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take
More informationAlgorithms. Quicksort. Slide credit: David Luebke (Virginia)
1 Algorithms Quicksort Slide credit: David Luebke (Virginia) Sorting revisited We have seen algorithms for sorting: INSERTION-SORT, MERGESORT More generally: given a sequence of items Each item has a characteristic
More informationProbability and Distributions
Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated
More informationSupplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements
Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model
More informationECE534, Spring 2018: Solutions for Problem Set #3
ECE534, Spring 08: Solutions for Problem Set #3 Jointly Gaussian Random Variables and MMSE Estimation Suppose that X, Y are jointly Gaussian random variables with µ X = µ Y = 0 and σ X = σ Y = Let their
More informationTHE N-VALUE GAME OVER Z AND R
THE N-VALUE GAME OVER Z AND R YIDA GAO, MATT REDMOND, ZACH STEWARD Abstract. The n-value game is an easily described mathematical diversion with deep underpinnings in dynamical systems analysis. We examine
More informationSummary of free theory: one particle state: vacuum state is annihilated by all a s: then, one particle state has normalization:
The LSZ reduction formula based on S-5 In order to describe scattering experiments we need to construct appropriate initial and final states and calculate scattering amplitude. Summary of free theory:
More informationSYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions
SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu
More informationChapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics
Chapter 6 Order Statistics and Quantiles 61 Extreme Order Statistics Suppose we have a finite sample X 1,, X n Conditional on this sample, we define the values X 1),, X n) to be a permutation of X 1,,
More informationCS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality
CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality Tim Roughgarden December 3, 2014 1 Preamble This lecture closes the loop on the course s circle of
More informationStatistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation
Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider
More informationBasic Sampling Methods
Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationProbability Methods in Civil Engineering Prof. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur
Probability Methods in Civil Engineering Prof. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Lecture No. # 12 Probability Distribution of Continuous RVs (Contd.)
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More information(Refer Slide Time: )
Digital Signal Processing Prof. S. C. Dutta Roy Department of Electrical Engineering Indian Institute of Technology, Delhi FIR Lattice Synthesis Lecture - 32 This is the 32nd lecture and our topic for
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2017. Tom M. Mitchell. All rights reserved. *DRAFT OF September 16, 2017* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is
More informationGuess & Check Codes for Deletions, Insertions, and Synchronization
Guess & Check Codes for Deletions, Insertions, and Synchronization Serge Kas Hanna, Salim El Rouayheb ECE Department, Rutgers University sergekhanna@rutgersedu, salimelrouayheb@rutgersedu arxiv:759569v3
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More informationReview Notes for IB Standard Level Math
Review Notes for IB Standard Level Math 1 Contents 1 Algebra 8 1.1 Rules of Basic Operations............................... 8 1.2 Rules of Roots..................................... 8 1.3 Rules of Exponents...................................
More informationV. Properties of estimators {Parts C, D & E in this file}
A. Definitions & Desiderata. model. estimator V. Properties of estimators {Parts C, D & E in this file}. sampling errors and sampling distribution 4. unbiasedness 5. low sampling variance 6. low mean squared
More informationLecture 1 : Data Compression and Entropy
CPS290: Algorithmic Foundations of Data Science January 8, 207 Lecture : Data Compression and Entropy Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will study a simple model for
More informationSolutions to Dynamical Systems 2010 exam. Each question is worth 25 marks.
Solutions to Dynamical Systems exam Each question is worth marks [Unseen] Consider the following st order differential equation: dy dt Xy yy 4 a Find and classify all the fixed points of Hence draw the
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More informationAn Introduction to Expectation-Maximization
An Introduction to Expectation-Maximization Dahua Lin Abstract This notes reviews the basics about the Expectation-Maximization EM) algorithm, a popular approach to perform model estimation of the generative
More informationEstimating the parameters of hidden binomial trials by the EM algorithm
Hacettepe Journal of Mathematics and Statistics Volume 43 (5) (2014), 885 890 Estimating the parameters of hidden binomial trials by the EM algorithm Degang Zhu Received 02 : 09 : 2013 : Accepted 02 :
More informationBennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence
Bennett-type Generalization Bounds: Large-deviation Case and Faster Rate of Convergence Chao Zhang The Biodesign Institute Arizona State University Tempe, AZ 8587, USA Abstract In this paper, we present
More information