UNR Joint Economics Working Paper Series, Working Paper No. 08-005. Further Analysis of the Zipf Law: Does the Rank-Size Rule Really Exist?


UNR Joint Economics Working Paper Series
Working Paper No. 08-005

Further Analysis of the Zipf Law: Does the Rank-Size Rule Really Exist?

Fungisai Nota and Shunfeng Song

Department of Economics /030
University of Nevada, Reno
Reno, NV 89557-0207
(775) 784-6850, Fax (775) 784-4728
email: song@unr.nevada.edu

September, 2008

Abstract

The widely used Zipf law has two striking regularities: excellent fit and close-to-one exponent. When the exponent equals one, the Zipf law collapses into the rank-size rule. This paper further analyzes the Zipf exponent. By changing the sample size, the truncation point, and the mix of cities in the sample, we found that the exponent is close to one only for some selected sub-samples. Using the values of the estimated exponent from the rolling sample method, we obtained an elasticity of the exponent with respect to sample size.

JEL Classification: C1, R1
Keywords: Zipf law; Rank-size rule; Rolling sample method

1 Nota is an Assistant Professor of Economics at Wartburg College, Iowa, USA. Email: fungisai.nota@wartburg.edu
2 Song is a Professor of Economics at the University of Nevada, Reno, NV 89557, USA, and an adjunct research fellow at the Center for Research of Private Economy, Zhejiang University, China. Email: song@unr.edu

1. Introduction

Zipf law states that the rank associated with some size S is proportional to S to some negative power (Zipf, 1949). It has two striking observations. One is its excellent fit. Numerous empirical studies have shown that a linear regression of log-rank on log-size generates an excellent fit (high $R^2$ value). For example, Rosen and Resnick (1980) used data from 44 countries and found that $R^2$ values were above 0.95 for 36 countries, with only Thailand having an $R^2$ value lower than 0.9 (0.83). This astonishing regularity led Krugman (1995, p. 44) to say that the rank-size rule is "a major embarrassment for economic theory: one of the strongest statistical phenomena we know, lacking any clear basis in theory." Fujita et al. (1999, p. 219) stated that the regularity of the urban size distribution poses a real puzzle, one that neither our approach nor the most plausible alternative approach to city sizes seems to answer.

The other striking observation is about the Zipf coefficient. For the 44 countries studied by Rosen and Resnick (1980), the estimated coefficient ranges from 0.809 for cities in Morocco to 1.963 for cities in Australia. Nitsche (2005) analyzed 515 estimates from 29 studies of the rank-size relationship and found that two-thirds of the estimated coefficients are between 0.80 and 1.20.

Several studies have attempted to explain why Zipf law holds. Gabaix (1999a, 1999b) proved that the Zipf law derives from the Gibrat law, where the Gibrat law states that the growth process is independent of size. Gan et al. (2006) concluded that the Zipf law is a statistical phenomenon rather than an economic regularity. However, the striking observation of a Zipf coefficient close to 1 remains a puzzle. Is it an economic regularity or a statistical phenomenon? This paper attempts to solve this puzzle.

The next section outlines the methodologies used in this analysis, the rolling sample method and the random sampling method with replacement. The third section provides the results. The final section summarizes the empirical results and discusses their economic significance. Succinctly, this paper seeks to find the impact of sample size, truncation point and the mix of cities on the estimated exponent of the Zipf law.

2. The model and methodology

Zipf law is commonly expressed in the following form:

$R_i = A S_i^{-\beta}$   [1]

where $R_i$ is the rank of the ith city, $S_i$ is the city's size and $\beta$ is the exponent coefficient. With a log transformation, $\beta$ is estimated as follows:

$\log(R_i) = \alpha - \beta \log(S_i) + \varepsilon_i$   [2]

Several studies have noted that estimating Equation [2] yields an OLS bias through the standard errors. To correct this bias, Gabaix and Ibragimov (2006) offered the following version, which gives unbiased standard errors of $(2/n)^{0.5}\hat{\beta}$, where n is the corresponding sub-sample size:

$\log(R_i - 0.5) = \alpha - \beta \log(S_i) + \varepsilon_i$   [3]

This corrected version is known as the rank-minus-half rule. Throughout this analysis, we will provide results from both versions and comment on the differences that exist between them.

The first method we use in this paper is the rolling sample method. We estimate the exponent coefficient $\beta$ using OLS and repeat the estimation process using a moving truncation point. The start point of each sub-sample is fixed at the largest city and the truncation point moves down by one city every time, thereby increasing the sub-sample size by one each time. For example, the full sample size of U.S. urbanized areas for 1990 is 396. These urbanized areas are ordered decreasingly from the largest urbanized area of New York to the smallest one of Brunswick, GA. The first sub-sample size is $n_1$, the 10 largest cities for example; then the second sub-sample is $n_2 = n_1 + 1$, the 11 largest cities, and so on. We continue this process until the last sub-sample becomes the full sample of 396. The advantage of this methodology is that it captures the coefficient variation as both the sample size and truncation point change.

The second method, random sampling with replacement, separates some of the simultaneous effects captured with the rolling sample. The rolling sample provides the gross variation in the estimated coefficient as three factors change simultaneously (sample size, truncation point and the variation in city sizes). To untangle these effects, we use our original data for each year as a pool to select from. We then randomly select the first sub-sample, 10 random cities for example, and rank them. The second sub-sample is independent of the first sub-sample. However, it contains one more city, and so on. We continue this process until the last sub-sample becomes the full sample. Since this is a random process, we run the regression 100 times for each sample size and get 100 estimated coefficients. We average these series and obtain the distribution of the coefficient with respect to sample size.

The third method is to further test the effect of sample size on the distribution of the estimated coefficient. For this, we randomly generate 1000 numbers from a normal distribution. We then apply the random sampling technique and repeat the process we did above. After 100 iterations, we average the series of $\hat{\beta}$'s and obtain the distribution of the coefficient with respect to sample size.
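The rolling sample method described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`ols_slope`, `zipf_exponent`, `rolling_sample`), the starting sub-sample of 10, and the synthetic Pareto-distributed "city sizes" are all hypothetical stand-ins for the Census data. Setting `rank_shift=0.0` corresponds to Equation [2] and `rank_shift=0.5` to the rank-minus-half version in Equation [3].

```python
import math
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def zipf_exponent(sizes, rank_shift=0.0):
    """Estimate beta in log(rank - rank_shift) = alpha - beta * log(size)."""
    ordered = sorted(sizes, reverse=True)  # rank 1 is the largest city
    log_size = [math.log(s) for s in ordered]
    log_rank = [math.log(r - rank_shift) for r in range(1, len(ordered) + 1)]
    return -ols_slope(log_size, log_rank)


def rolling_sample(sizes, start=10, rank_shift=0.0):
    """Re-estimate beta on the `start` largest cities, then start + 1, and so on."""
    ordered = sorted(sizes, reverse=True)
    return [(n, zipf_exponent(ordered[:n], rank_shift))
            for n in range(start, len(ordered) + 1)]


# Hypothetical stand-in data: 396 synthetic sizes drawn from a Pareto distribution.
rng = random.Random(0)
sizes = [50_000 * rng.paretovariate(1.0) for _ in range(396)]

unadjusted = rolling_sample(sizes)                # Equation [2]
adjusted = rolling_sample(sizes, rank_shift=0.5)  # Equation [3]

for (n, b_unadj), (_, b_adj) in list(zip(unadjusted, adjusted))[::50]:
    print(f"n = {n:3d}   unadjusted = {b_unadj:.3f}   rank-minus-half = {b_adj:.3f}")
```

Because the data here are simulated, the printed values only illustrate the mechanics of the moving truncation point; they say nothing about the U.S. urbanized-area estimates.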

3. Results

Table 1 shows the full-sample results of Zipf law. Not surprisingly, we obtained very high $R^2$ values. Comparing the estimated coefficients between the OLS-bias-corrected and uncorrected models, we conclude that the uncorrected Zipf law has a downward bias.

Table 1: Regression results on Zipf's law using data on US urbanized areas

Year   $\hat{\beta}$   OLS-bias-corrected $\hat{\beta}$   $R^2$ (from unadjusted)   Sample size
1980   0.91            0.925                              0.989                     366
1990   0.895           0.913                              0.989                     396
2000   0.875           0.895                              0.989                     452

Data Sources: U.S. Bureau of Census (2000).

The rolling sample results show a negative relationship between the estimated coefficient and sample size. This implies that small samples of big cities yield higher coefficients than large samples that also include smaller cities. Does the rank-size rule exist? We note that the rank-size rule only holds for certain sub-samples where the 95% confidence interval includes 1. Specifically, for the 1980 cities, the rank-size rule holds only for sub-samples between 180 and 205; for 1990, 140 to 195; and for 2000, 140 to 205. This finding suggests that the rank-size rule (i.e., $\beta = 1$) does not hold for either samples of large cities or the larger samples with more small cities. Figure 1 shows the distribution of estimated coefficients with respect to sample size for 2000.

Figure 1. Zipf's Law: U.S. Urban Areas 2000. [Plot of the Pareto exponent (vertical axis) against sample size (horizontal axis); series: Adj_Beta and Unadj.]

Interestingly, the graph suggests a lognormal distribution. To confirm this observation, we run the following regression between the estimated exponent ($\hat{\beta}$) and the sample size (SS):

$\log(\hat{\beta}) = \alpha - \delta \log(SS) + \varepsilon$   [4]

Table 2 presents the results, with observations being the number of estimated exponents ($\hat{\beta}$'s) obtained from the rolling sample method. Surprisingly, the lognormal regression yields a very high $R^2$ value, indicating a strong statistical relationship between the estimated coefficient and sample size. For the OLS-bias-corrected model, Table 2 shows that a one percent increase in the sample size would lead to a 0.15 percent or more decrease in the value of the estimated exponent. The uncorrected model shows a smaller elasticity of the estimated exponent with respect to sample size, and this explains why the uncorrected model converges with the corrected model in Figure 1. These results are important, because they show that the validity of the rank-size rule largely depends on the sample size used in a study. In other words, the rank-size rule is not an economic regularity but a statistical phenomenon.

Table 2: The relationship between the estimated Zipf exponent and the sample size

Year   $\hat{\delta}$   OLS-bias-corrected $\hat{\delta}$   $R^2$ (from adjusted)   Number of observations
1980   -0.10***         -0.15***                            0.98                    355
1990   -0.11***         -0.16***                            0.96                    385
2000   -0.13***         -0.17***                            0.97                    441

***: significant at 1%

As we discussed in the methodology section, a dilemma exists with the rolling sample technique because it captures the joint effect of truncation point, sample size, and the assortment of cities in the sample. Using the random sampling with replacement technique while increasing the sub-sample, we capture an assortment of cities that can include all sizes from the beginning. This eliminates the bias due to large cities in the first sub-samples. By randomly sampling each time, the truncation point also randomly changes. This eliminates the systematically changing truncation point bias inherent in the rolling sample technique. Figure 2 presents the distribution of estimated coefficients based on the random sampling method for 2000. It shows that sample size alone has an upward bias mainly for sub-samples below 100. For sample sizes greater than 100, the effect of sample size disappears as we increase the sample size, i.e., the estimated coefficient stays almost constant.

Figure 2. U.S. UA 2000: Random Sampling. [Plot of betas (vertical axis) against sample size (horizontal axis).]

To further test the effect of sample size on the distribution of the estimated coefficient, we randomly generate 1000 numbers from a normal distribution. We then apply the random sampling technique and repeat the process we did above. After 100 iterations, we average the series of $\hat{\beta}$'s and show the results in the graph below. Surprisingly, we still capture the effect of very small sample sizes below 100. Figure 3 confirms the upward bias of samples less than 100. For sample sizes greater than 100, the sample size has little influence on the value of the estimated coefficient.
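The random sampling with replacement procedure discussed above can be illustrated with the following sketch. It is a rough illustration under stated assumptions rather than the authors' implementation: the helper names (`zipf_exponent`, `mean_exponent`), the handful of sample sizes, the seed, and the synthetic Pareto pool standing in for the 452 urbanized areas of 2000 are all hypothetical. Positive synthetic sizes are used so that the logarithm is always defined.

```python
import math
import random


def zipf_exponent(sizes, rank_shift=0.5):
    """Estimate beta from log(rank - rank_shift) = alpha - beta * log(size) by OLS."""
    ordered = sorted(sizes, reverse=True)  # rank 1 is the largest draw
    x = [math.log(s) for s in ordered]
    y = [math.log(r - rank_shift) for r in range(1, len(ordered) + 1)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return -sxy / sxx


def mean_exponent(pool, n, reps=100, rng=random):
    """Average beta over `reps` independent samples of n sizes drawn with replacement."""
    draws = (zipf_exponent(rng.choices(pool, k=n)) for _ in range(reps))
    return sum(draws) / reps


# Hypothetical pool of 452 synthetic sizes (an assumption, not the Census data).
rng = random.Random(2000)
pool = [50_000 * rng.paretovariate(1.0) for _ in range(452)]

# A few illustrative sample sizes; the paper increases the size one city at a time.
profile = {n: mean_exponent(pool, n, rng=rng) for n in (10, 25, 50, 100, 200, 452)}
for n, beta in profile.items():
    print(f"n = {n:3d}   mean beta over 100 draws = {beta:.3f}")
```

Each sample is drawn independently of the previous one, so neither the truncation point nor the mix of large and small cities changes systematically with the sample size; only the sample size itself does.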

Figure 3. Simulation Results: Randomly Generated Numbers. [Plot of betas (vertical axis) against sample size (horizontal axis).]

4. Conclusions

This paper has examined the validity of the rank-size rule based on the estimated Zipf exponent. Using the rolling sample technique, we showed that small samples with large cities tend to generate high values of the estimated coefficient compared to samples dominated by small cities. The rank-size rule holds only for some selected sub-samples. We also observed the upward bias of the estimated coefficient when we used the random sampling with replacement technique and drew random samples from a normal distribution. The double log regression model of estimated exponents and sample sizes yielded a very high $R^2$ value. It also produced an elasticity of the estimated exponent with respect to sample size, with a one percent increase in the sample size leading to a decrease of about 0.15 percent or more in the value of the estimated exponent. Therefore, we conclude that the Zipf exponent depends on the sample size used in a study and the rank-size rule does not hold in general. In other words, the rank-size rule is not an economic regularity but a statistical phenomenon.
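The double log regression of estimated exponents on sample sizes can be sketched as follows. The function name `loglog_fit` and the input series are hypothetical: the exponents are generated from an exact power law with elasticity -0.15 purely to show that the fitted slope on log sample size recovers the elasticity. They are not the paper's rolling-sample estimates.

```python
import math


def loglog_fit(sample_sizes, betas):
    """OLS of log(beta) on log(sample size); returns (slope, intercept, r_squared)."""
    x = [math.log(n) for n in sample_sizes]
    y = [math.log(b) for b in betas]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


# Hypothetical inputs: exponents that fall with sample size at a constant elasticity.
sample_sizes = list(range(10, 453))
betas = [2.0 * n ** -0.15 for n in sample_sizes]

slope, intercept, r2 = loglog_fit(sample_sizes, betas)
print(f"elasticity (slope on log sample size) = {slope:.3f}, R^2 = {r2:.3f}")
```

Read as an elasticity, a slope of -0.15 means that doubling the sample size multiplies the estimated exponent by $2^{-0.15}$, roughly 0.90.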

References:

Fujita, M., Krugman, P., Venables, A.J., 1999. The Spatial Economy. The MIT Press, Cambridge, MA.
Gabaix, X., 1999a. Zipf's law for cities: an explanation. Quarterly Journal of Economics 114 (3), 739-767.
Gabaix, X., 1999b. Zipf's law and the growth of cities. American Economic Review 89 (2), 129-132.
Gabaix, X., Ibragimov, R., 2006. Rank-1/2: A simple way to improve the OLS estimation of tail exponents. Working Paper.
Gan, L., Li, D., Song, S., 2006. Is the Zipf law spurious in explaining city-size distributions? Economics Letters 92, 256-262.
Krugman, P., 1995. Development, Geography, and Economic Theory. The MIT Press, Cambridge, MA.
Nitsche, V., 2005. Zipf zipped. Journal of Urban Economics 57, 86-100.
Rosen, K., Resnick, M., 1980. The size distribution of cities: An examination of the Pareto law and primacy. Journal of Urban Economics 8, 165-186.
Zipf, G., 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Cambridge, MA.