UNR Joint Economics Working Paper Series
Working Paper No. 08-005

Further Analysis of the Zipf Law: Does the Rank-Size Rule Really Exist?

Fungisai Nota and Shunfeng Song

Department of Economics /030
University of Nevada, Reno
Reno, NV 89557-0207
(775) 784-6850, Fax (775) 784-4728
email: song@unr.nevada.edu

September, 2008

Abstract

The widely used Zipf law has two striking regularities: an excellent fit and an exponent close to one. When the exponent equals one, the Zipf law collapses into the rank-size rule. This paper further analyzes the Zipf exponent. By changing the sample size, the truncation point, and the mix of cities in the sample, we found that the exponent is close to one only for some selected sub-samples. Using the values of the estimated exponent from the rolling sample method, we obtained an elasticity of the exponent with respect to sample size.

JEL Classification: C1, R1
Keywords: Zipf law; Rank-size rule; Rolling sample method
Author notes:
Nota is an Assistant Professor of Economics at Wartburg College, Iowa, USA. Email: fungisai.nota@wartburg.edu
Song is a Professor of Economics at the University of Nevada, Reno, NV 89557, USA and an adjunct research fellow at the Center for Research of Private Economy, Zhejiang University, China. Email: song@unr.edu
1. Introduction:

The Zipf law states that the rank associated with some size S is proportional to S raised to some negative power (Zipf, 1949). Two observations about it are striking. One is its excellent fit. Numerous empirical studies have shown that a linear regression of log-rank on log-size generates an excellent fit (a high R²-value). For example, Rosen and Resnick (1980) used data from 44 countries and found that R²-values were above 0.95 for 36 countries, with only Thailand having an R²-value lower than 0.9 (0.83). This astonishing regularity led Krugman (1995, p. 44) to say that the rank-size rule is "a major embarrassment for economic theory: one of the strongest statistical phenomena we know, lacking any clear basis in theory." Fujita et al. (1999, p. 219) stated that the regularity of the urban size distribution poses a real puzzle, one that neither their approach nor the most plausible alternative approach to city sizes seems to answer.

The other striking observation concerns the Zipf coefficient. For the 44 countries studied by Rosen and Resnick (1980), the estimated coefficient ranges from 0.809 for cities in Morocco to 1.963 for cities in Australia. Nitsche (2005) analyzed 515 estimates from 29 studies of the rank-size relationship and found that two-thirds of the estimated coefficients are between 0.80 and 1.20.

Several studies have attempted to explain why the Zipf law holds. Gabaix (1999a, 1999b) proved that the Zipf law derives from the Gibrat law, which states that the growth process is independent of size. Gan et al. (2006) concluded that the Zipf law is a statistical phenomenon rather than an economic regularity. However, the striking observation of a Zipf coefficient close to 1 remains a puzzle. Is it an economic regularity or a statistical phenomenon? This paper attempts to solve this puzzle.

The next section outlines the methodologies used in this analysis, the rolling sample method and the random sampling method with replacement. The third section provides the
results. The final section summarizes the empirical results and discusses their economic significance. Succinctly, this paper seeks to find the impact of sample size, truncation point and the mix of cities on the estimated exponent of the Zipf law.

2. The model and methodology

The Zipf law is commonly expressed in the following form:

$R_i = A S_i^{-\beta}$    [1]

where $R_i$ is the rank of the ith city, $S_i$ is the city's size and β is the exponent coefficient. With a log transformation, β is estimated as follows:

$\log(R_i) = \alpha - \beta \log(S_i) + \varepsilon_i$    [2]

Several studies have noted that estimating Equation [2] yields an OLS bias through the standard errors. To correct this bias, Gabaix and Ibragimov (2006) offered the following version, which gives unbiased standard errors of $(2/n)^{0.5}\hat{\beta}$, where n is the corresponding sub-sample size:

$\log(R_i - 0.5) = \alpha - \beta \log(S_i) + \varepsilon_i$    [3]

This corrected version is known as the rank-minus-half rule. Throughout this analysis, we provide results from both versions and comment on the differences between them.

The first method we use in this paper is the rolling sample method. We estimate the exponent coefficient β using OLS and repeat the estimation process using a moving truncation point. The starting point of each sub-sample is fixed at the largest city, and the truncation point moves down by one city each time, thereby increasing the sub-sample size by one each time. For example, the full sample of U.S. urbanized areas for 1990 contains 396 areas. These urbanized areas are ordered decreasingly from the largest urbanized area of New York to the smallest one of
Brunswick, GA. The first sub-sample has size $n_1$, the 10 largest cities for example; the second sub-sample has size $n_2 = n_1 + 1$, the 11 largest cities, and so on. We continue this process until the last sub-sample becomes the full sample of 396. The advantage of this methodology is that it captures the variation in the coefficient as both the sample size and the truncation point change.

The second method, random sampling with replacement, separates some of the simultaneous effects captured by the rolling sample. The rolling sample provides the gross variation in the estimated coefficient as three factors change simultaneously (sample size, truncation point and the variation in city sizes). To untangle these effects, we use our original data for each year as a pool to select from. We then randomly select the first sub-sample, 10 random cities for example, and rank them. The second sub-sample is drawn independently of the first but contains one more city, and so on. We continue this process until the last sub-sample becomes the full sample. Since this is a random process, we run the regression 100 times for each sample size and obtain 100 estimated coefficients. We average these estimates for each sample size and obtain the distribution of the coefficient with respect to sample size.

The third method further tests the effect of sample size on the distribution of the estimated coefficient. For this, we randomly generate 1000 numbers from a normal distribution. We then apply the random sampling technique and repeat the process described above. After 100 iterations, we average the series of $\hat{\beta}$'s and obtain the distribution of the coefficient with respect to sample size.
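The rolling sample procedure and the two regression versions, Equations [2] and [3], can be sketched as follows. This is a minimal illustration and not the authors' code: the function names (`ols_slope`, `zipf_exponent`, `rolling_exponents`) and the synthetic city sizes are assumptions made for the example. The synthetic sizes follow the rank-size rule exactly, so the unadjusted estimate should equal 1 for every sub-sample.

```python
import math

def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def zipf_exponent(sizes, shift=0.0):
    """Estimate beta from log(rank - shift) = alpha - beta * log(size).

    shift=0.0 corresponds to Equation [2]; shift=0.5 corresponds to the
    rank-minus-half version in Equation [3].
    """
    ordered = sorted(sizes, reverse=True)
    log_size = [math.log(s) for s in ordered]
    log_rank = [math.log(r - shift) for r in range(1, len(ordered) + 1)]
    return -ols_slope(log_size, log_rank)

def rolling_exponents(sizes, n_start=10, shift=0.0):
    """Keep the largest city fixed and move the truncation point down
    one city at a time, re-estimating beta for each sub-sample size."""
    ordered = sorted(sizes, reverse=True)
    return [(n, zipf_exponent(ordered[:n], shift))
            for n in range(n_start, len(ordered) + 1)]

# Hypothetical data (an assumption, not the Census data): 396 sizes
# that satisfy the rank-size rule exactly, S_r = 10^7 / r.
sizes = [1.0e7 / r for r in range(1, 397)]

unadjusted = rolling_exponents(sizes, n_start=10, shift=0.0)
adjusted = rolling_exponents(sizes, n_start=10, shift=0.5)

for (n, b_unadj), (_, b_adj) in zip(unadjusted[::100], adjusted[::100]):
    print(n, round(b_unadj, 4), round(b_adj, 4))
```

With real data, `sizes` would be the urbanized-area populations for one census year, and the two lists of estimates could be plotted against sub-sample size.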
3. Results

Table 1 shows the full-sample results for the Zipf law. Not surprisingly, we obtained very high R²-values. Comparing the estimated coefficients between the OLS-bias-corrected and uncorrected models, we conclude that the uncorrected Zipf law has a downward bias.

Table 1: Regression results on Zipf's law using data on US urbanized areas

Year    $\hat{\beta}$    OLS-bias-corrected $\hat{\beta}$    R² (from unadjusted)    Sample size
1980    0.91             0.925                               0.989                   366
1990    0.895            0.913                               0.989                   396
2000    0.875            0.895                               0.989                   452

Data Sources: U.S. Bureau of Census (2000).

The rolling sample results show a negative relationship between the estimated coefficient and sample size. This implies that small samples of big cities yield higher coefficients than large samples that also include smaller cities. Does the rank-size rule exist? We note that the rank-size rule holds only for certain sub-samples where the 95% confidence interval includes 1. Specifically, for the 1980 cities, the rank-size rule holds only for sub-samples of between 180 and 205 cities; for 1990, 140 to 195; and for 2000, 140 to 205. This finding suggests that the rank-size rule (i.e., β = 1) does not hold for either the small samples of large cities or the larger samples with more small cities. Figure 1 shows the distribution of estimated coefficients with respect to sample size for 2000.
Figure 1. Zipf's Law: U.S. Urban Areas 2000. [Plot of the Pareto exponent (vertical axis) against sample size (horizontal axis); series: Adj_Beta and Unadj.]

Interestingly, the graph suggests a lognormal distribution. To confirm this observation, we run the following regression between the estimated exponent ($\hat{\beta}$) and the sample size (SS):

$\log(\hat{\beta}) = \alpha + \delta \log(SS) + \varepsilon$    [4]

Table 2 presents the results, with observations being the number of estimated exponents ($\hat{\beta}$'s) obtained from the rolling sample method. Surprisingly, the lognormal regression yields a very high R²-value, indicating a strong statistical relationship between the estimated coefficient and sample size. For the OLS-bias-corrected model, Table 2 shows that a one percent increase in the sample size leads to a decrease of 0.15 percent or more in the value of the estimated exponent. The uncorrected model shows a smaller elasticity of the estimated exponent with respect to sample
size, and this explains why the uncorrected model converges with the corrected model in Figure 1. These results are important because they show that the validity of the rank-size rule largely depends on the sample size used in a study. In other words, the rank-size rule is not an economic regularity but a statistical phenomenon.

Table 2: The relationship between the estimated Zipf exponent and the sample size

Year    $\hat{\delta}$    OLS-bias-corrected $\hat{\delta}$    R² (from adjusted)    Number of observations
1980    -0.10***          -0.15***                             0.98                  355
1990    -0.11***          -0.16***                             0.96                  385
2000    -0.13***          -0.17***                             0.97                  441

***: significant at 1%

As we discussed in the methodology section, a dilemma exists with the rolling sample technique because it captures the joint effect of the truncation point, the sample size, and the assortment of cities in the sample. Using random sampling with replacement while increasing the sub-sample size, we capture an assortment of cities that can include all sizes from the beginning. This eliminates the bias due to large cities in the first sub-samples. Because we sample randomly each time, the truncation point also changes randomly. This eliminates the bias from the systematically changing truncation point inherent in the rolling sample technique. Figure 2 presents the distribution of estimated coefficients based on the random sampling method for 2000. It shows that sample
size alone has an upward bias mainly for sub-samples below 100. For sample sizes greater than 100, the effect of sample size disappears as we increase the sample size, i.e., the estimated coefficient stays almost constant.

Figure 2. U.S. UA 2000: Random Sampling. [Plot of the estimated betas (vertical axis) against sample size (horizontal axis).]

To further test the effect of sample size on the distribution of the estimated coefficient, we randomly generate 1000 numbers from a normal distribution. We then apply the random sampling technique and repeat the process described above. After 100 iterations, we average the series of $\hat{\beta}$'s and show the results in Figure 3. Surprisingly, we still capture the effect of very small sample sizes below 100. Figure 3 confirms the upward bias of samples smaller than 100. For sample sizes greater than 100, the sample size has little influence on the value of the estimated coefficient.
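The random sampling experiment on normally distributed numbers can be sketched as follows. This is an illustrative reconstruction under stated assumptions and not the authors' code: the paper does not report the mean or standard deviation of the normal distribution, so the values 100 and 10 are hypothetical and chosen only to keep the draws positive (the log transformation requires positive sizes). The level of the averaged estimates depends on these assumed parameters. The names `zipf_exponent` and `mean_exponent` and the seed are also assumptions.

```python
import math
import random

def zipf_exponent(sizes, shift=0.5):
    """OLS estimate of beta in log(rank - shift) = alpha - beta * log(size)."""
    ordered = sorted(sizes, reverse=True)
    x = [math.log(s) for s in ordered]
    y = [math.log(r - shift) for r in range(1, len(ordered) + 1)]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return -sxy / sxx

def mean_exponent(pool, n, draws, rng, shift=0.5):
    """Average beta over `draws` independent samples of n values,
    each drawn from `pool` with replacement."""
    total = 0.0
    for _ in range(draws):
        sample = rng.choices(pool, k=n)  # sampling with replacement
        total += zipf_exponent(sample, shift)
    return total / draws

rng = random.Random(2008)  # fixed seed so the sketch is reproducible

# Hypothetical pool: 1000 normal draws; mean 100 and sd 10 are assumptions.
pool = [rng.gauss(100.0, 10.0) for _ in range(1000)]
pool = [v for v in pool if v > 0]  # guard: log() needs positive values

sample_sizes = [5, 20, 50, 100, 200, 400]
averages = {n: mean_exponent(pool, n, draws=100, rng=rng)
            for n in sample_sizes}

for n in sample_sizes:
    print(n, round(averages[n], 3))
```

Replacing `pool` with the city sizes for a given census year gives the corresponding experiment on the actual data; plotting `averages` against sample size shows how the mean estimate varies with sample size.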
Figure 3. Simulation Results: Randomly Generated Numbers. [Plot of the estimated betas (vertical axis) against sample size (horizontal axis).]

4. Conclusions

This paper has examined the validity of the rank-size rule based on the estimated Zipf exponent. Using the rolling sample technique, we showed that small samples with large cities tend to generate high values of the estimated coefficient compared to samples dominated by small cities. The rank-size rule holds only for some selected sub-samples. We also observed the upward bias of the estimated coefficient when we used the random sampling with replacement technique and when we drew random samples from a normal distribution. The double-log regression model of estimated exponents and sample sizes yielded a very high R²-value. It also produced an elasticity of the estimated exponent with respect to sample size, with a one percent increase in the sample size leading to a decrease of about 0.15 percent or more in the value of the estimated exponent. Therefore, we conclude that the Zipf exponent depends on the sample size used in a study and the rank-size rule does not hold in general. In other words, the rank-size rule is not an economic regularity but a statistical phenomenon.
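As a worked reading of the reported elasticity, suppose the estimated exponent depends on sample size only through Equation [4], with δ read as the coefficient on log(SS), as the negative estimates in Table 2 suggest. Taking the 1980 bias-corrected estimate of -0.15 as the example, the implied ratios for a doubling and a quadrupling of the sample size are:

```latex
\frac{\hat{\beta}(SS_2)}{\hat{\beta}(SS_1)}
  = \left(\frac{SS_2}{SS_1}\right)^{\hat{\delta}},
\qquad 2^{-0.15} \approx 0.90,
\qquad 4^{-0.15} \approx 0.81 .
```

Under that assumption, doubling the sample lowers the fitted exponent by roughly 10 percent. This is an illustration of the elasticity only, not an additional result from the paper.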
References:

Fujita, M., Krugman, P., Venables, A.J., 1999. The Spatial Economy. The MIT Press, Cambridge, MA.
Gabaix, X., 1999a. Zipf's law for cities: an explanation. Quarterly Journal of Economics 114 (3), 739-767.
Gabaix, X., 1999b. Zipf's law and the growth of cities. American Economic Review 89 (2), 129-132.
Gabaix, X., Ibragimov, R., 2006. Rank-1/2: A simple way to improve the OLS estimation of tail exponents. Working Paper.
Gan, L., Li, D., Song, S., 2006. Is the Zipf law spurious in explaining city-size distributions? Economics Letters 92, 256-262.
Krugman, P., 1995. Development, Geography, and Economic Theory. The MIT Press, Cambridge, MA.
Nitsche, V., 2005. Zipf zipped. Journal of Urban Economics 57, 86-100.
Rosen, K., Resnick, M., 1980. The size distribution of cities: An examination of the Pareto law and primacy. Journal of Urban Economics 8, 165-186.
Zipf, G., 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Cambridge, MA.