Estimation and Testing for Rank Size Rule Regression under Pareto Distribution

Y. Nishiyama (a), S. Osada (a) and K. Morimune (b)

(a) Kyoto Institute of Economic Research, Kyoto University, Kyoto 606-8501, Japan
(b) Graduate School of Economics, Kyoto University, Kyoto 606-8501, Japan

Abstract: Letting $S_{(i)}$ be the $i$-th largest city in a country, it is often observed that $\log S_{(i)} \approx \alpha_0 + \alpha_1 \log i$ for some $\alpha_0 > 0$ and $\alpha_1 < 0$. This is called the rank size rule when $\alpha_1 = -1$. The relationship has been examined in the literature by means of ordinary least squares (OLS) estimation and the t test. However, since $S_{(i)}$ is heteroskedastic and autocorrelated, the t statistic does not have the standard distribution. Indeed, we show $|t| \to_p \infty$ as the sample size increases. The purpose of this paper is to obtain the statistical properties of the OLS estimator of the rank size rule regression and the distribution of the t statistic under the Pareto distribution, and further to propose more efficient estimation procedures in two ways. Firstly, we improve efficiency by adjusting for the heteroskedasticity and autocorrelation by the GLS method. Another source of efficiency gain is to exclude some large variance observations. It seems GLS attains the Cramér-Rao lower bound for $\alpha_1$.

Keywords: Rank size rule; Zipf law; Pareto distribution; City size

1 INTRODUCTION

After the pioneering work on city size distribution by Auerbach [1913] and Zipf [1949], many researchers have investigated a wide range of settlement systems. Zipf's main result, called Zipf's law, is the following. Let $S$ denote a random variable representing city size measured by its population; then for large $x$, $P(S > x) = A/x$ for some $A > 0$, that is, a Pareto distribution with unit exponent. This is closely related to the so-called rank size rule of city size data. Let $S_i$, $i = 1, \ldots, n$, be the populations of cities in a country and $S_{(i)}$ be their order statistics satisfying $S_{(1)} \ge S_{(2)} \ge \cdots \ge S_{(n)}$; then we often observe that

$$\log S_{(i)} \approx \alpha_0 + \alpha_1 \log i, \quad i = 1, \ldots, n,$$

where $\alpha_0 > 0$ and $\alpha_1 < 0$. This relationship is called the rank size rule when $\alpha_1 = -1$. When Zipf's law holds, the rank size rule follows approximately. Regarding this relationship as a regression model, many researchers have estimated $\alpha_0$ and $\alpha_1$ by the ordinary least squares (OLS) method and implemented the t test for $\alpha_1 = -1$. One of the most important papers in this field is Rosen and Resnick [1980]. They examined the city size distribution of the 50 largest administrative urban areas in 44 countries, and they concluded that the validity of the urban rank-size
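rule appears to be an open question. Soo [2002] also made an international comparison using updated data for 73 countries. Many researchers, including those mentioned above, have based their studies on OLS estimation and the t test. But since the dependent variable there does not satisfy the standard conditions of OLS regression, we cannot evaluate the results.

The purpose of this paper is to derive the exact and approximate properties of the OLS estimator and the t test statistic for the rank size rule null, $\alpha_1 = -1$. We obtain the bias and variance of the estimator assuming the $S_i$ are independently and identically distributed (iid). Further, we show that the t statistic does not have a t distribution, unlike in standard classical linear regression theory, because the $S_{(i)}$ are in fact autocorrelated and heteroskedastic under the iid assumption. Since Zipf suggested it, it is often assumed that the $S_i$ have a Pareto distribution. Under this assumption we can show that $E[\log S_{(i)}] = \alpha_0 + \alpha_1 \log i$ does not strictly hold for any $\alpha_0, \alpha_1$ in small samples, but holds approximately only for large $i$ and $n$.

The following section shows exact and approximate expressions for $E[\log S_{(i)}]$ and $V[\log S_{(i)}]$, and then derives the bias and variance of the OLS estimator for $\alpha_1$. Then we present Monte Carlo results on the distribution of the t value for the estimator, which is far away from the t distribution. We further show that $|t|$ explodes asymptotically, indicating that the t test is never applicable to test the null of $\alpha_1 = -1$. Section 3 proposes more efficient estimators, while Section 4 gives empirical results from Japanese city size data of Metropolitan Employment Areas (MEA). Section 5 is conclusions.

2 OLS ESTIMATION OF THE RANK SIZE RULE REGRESSION

2.1 Bias and Variance of the OLS Estimator

We state some results on the properties of the estimator without proofs in the sequel. Assume $S_i$, $i = 1, \ldots, n$, are iid from a Pareto distribution function $F_S(x) = 1 - x^{-\theta}$, $x > 1$, $\theta > 0$. Then the following lemma holds.

LEMMA 1. Letting $\{S_{(1)}, \ldots, S_{(n)}\}$ be the order statistics of $S_i$, $i = 1, \ldots, n$, satisfying $S_{(1)} \ge \cdots \ge S_{(n)}$,

(a) $E[\log S_{(i)}] = \frac{1}{\theta} \sum_{j=i}^{n} \frac{1}{j}$, $i = 1, \ldots, n$;
(b) $V[\log S_{(i)}] = \frac{1}{\theta^2} \sum_{j=i}^{n} \frac{1}{j^2}$, $i = 1, \ldots, n$;
(c) $\mathrm{Cov}[\log S_{(i)}, \log S_{(j)}] = \mathrm{Var}[\log S_{(j)}]$, $i < j$.

This lemma straightforwardly yields the following proposition.

PROPOSITION 1.

$$E[\log S_{(i)}] = \frac{\log n - \log i}{\theta} + O\!\left(\frac{1}{i} + \frac{1}{n}\right) \quad \text{as } i, n \to \infty.$$

Proposition 1 implies that the rank size approximation is justified when $i$ and $n$ are large, with $\alpha_0 = (\log n)/\theta$ and $\alpha_1 = -1/\theta$. Based on Proposition 1 and Lemma 1, we can obtain the exact expectation and variance of the OLS estimator $\hat\alpha_1$ for $\alpha_1$ as follows.

PROPOSITION 2. Writing $\overline{\log} = n^{-1} \sum_{i=1}^{n} \log i$,

$$E[\hat\alpha_1] = \frac{1}{\theta} \, \frac{\sum_{i=1}^{n} (\log i - \overline{\log}) \sum_{j=i}^{n} \frac{1}{j}}{\sum_{i=1}^{n} (\log i - \overline{\log})^2},$$

$$V[\hat\alpha_1] = \frac{1}{\theta^2} \, \frac{\sum_{i=1}^{n} \sum_{l=1}^{n} (\log i - \overline{\log})(\log l - \overline{\log}) \sum_{j=\max(i,l)}^{n} \frac{1}{j^2}}{\left\{\sum_{i=1}^{n} (\log i - \overline{\log})^2\right\}^2}.$$

We suppress the corresponding expressions for $\hat\alpha_0$, the OLS estimate of $\alpha_0$, because it is of less importance and interest. From this proposition we derive the asymptotic expressions of the bias and variance, which take the form $E[\hat\alpha_1] - \alpha_1 = C_n/\theta$ and $V[\hat\alpha_1] = D_n/\theta^2$, where $C_n$ and $D_n$ depend only on $n$. Letting $B_n = E[\hat\alpha_1] - \alpha_1$, Figure 1 draws $B_n = C_n$. The bias decays as the sample size increases. Figure 2 shows $V[\hat\alpha_1] = D_n$, which also decreases with $n$. Table 1 tabulates these values for some $n$. The above results directly suggest a simple bias correction of the following form:

$$\tilde\alpha_1 = \frac{\sum_{i=1}^{n} (\log i - \overline{\log})^2}{-\sum_{i=1}^{n} (\log i - \overline{\log}) \sum_{j=i}^{n} \frac{1}{j}} \; \hat\alpha_1.$$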
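The exact moments of the OLS slope can be evaluated numerically from the order-statistic moments of the lemma. The following is a minimal sketch, not the authors' code: it assumes the Pareto tail $P(S > x) = x^{-\theta}$, so that $E[\log S_{(i)}] = \theta^{-1}\sum_{j=i}^{n} 1/j$ and $\mathrm{Cov}[\log S_{(i)}, \log S_{(l)}] = \theta^{-2}\sum_{j=\max(i,l)}^{n} 1/j^2$. The function names `ols_slope_moments` and `bias_correction_factor` are hypothetical and introduced only for illustration.

```python
import numpy as np


def ols_slope_moments(n, theta=1.0):
    """Exact mean and variance of the OLS slope of log S_(i) on log i.

    Assumption (labelled, not taken verbatim from the paper): iid Pareto
    sizes with P(S > x) = x**(-theta), which gives
      E[log S_(i)]              = (1/theta)    * sum_{j=i}^{n} 1/j
      Cov[log S_(i), log S_(l)] = (1/theta**2) * sum_{j=max(i,l)}^{n} 1/j**2
    """
    ranks = np.arange(1, n + 1)
    d = np.log(ranks) - np.log(ranks).mean()
    w = d / np.sum(d ** 2)  # the OLS slope is the linear form w @ y
    # Tail sums over j = i, ..., n (reverse cumulative sums).
    tail1 = np.cumsum((1.0 / ranks)[::-1])[::-1]
    tail2 = np.cumsum((1.0 / ranks ** 2)[::-1])[::-1]
    mean_y = tail1 / theta
    # Covariance matrix: entry (i, l) is the variance at rank max(i, l).
    omega = tail2[np.maximum.outer(ranks, ranks) - 1] / theta ** 2
    return float(w @ mean_y), float(w @ omega @ w)


def bias_correction_factor(n):
    """Constant c_n with E[c_n * slope] = -1/theta; it does not depend on theta."""
    mean_slope, _ = ols_slope_moments(n, theta=1.0)
    return -1.0 / mean_slope


if __name__ == "__main__":
    for n in (50, 100, 200):
        mean_slope, var_slope = ols_slope_moments(n)
        # With theta = 1 the true slope is -1, so the bias is mean_slope + 1.
        print(n, mean_slope + 1.0, var_slope, bias_correction_factor(n))
```

Because the slope is a linear form in the $\log S_{(i)}$, its mean and variance follow directly from the mean vector and covariance matrix, so no simulation is needed. The correction factor is the ratio of the true slope to the expected OLS slope, which is why it is free of $\theta$.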

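We have two remarks regarding the bias-corrected estimator. Firstly, the multiplicative constant on the right depends only on $n$, free from any unknown quantities, and thus the correction is easy and feasible. Secondly, this method not only eliminates the bias but also reduces the variance, because the multiplicative constant is smaller than unity.

Table 1 Values of $C_n$ and $D_n$
n  C_n  D_n
5-8 4-534 -34 99 3-83 4

2.2 The distribution of t statistics

In testing the significance of coefficients of linear regression models, we implement the t test. In the present case, however, the dependent variables $\log S_{(i)}$ are not normally distributed and are moreover heteroskedastic and autocorrelated, so the standard distribution theory does not apply. We therefore obtained the distribution of the t statistic for $\hat\alpha_1$ in the regression under the null of $\alpha_1 = -1$ by Monte Carlo simulation. Figures 3 and 4 show the histograms from the replications for the two sample sizes considered, respectively. The simulated mean, variance, skewness and kurtosis (-5 7 43) are obviously far from those of a t distribution. Table 2 shows the empirical critical regions of the two-sided t test for different sizes, calculated from the simulation, which should be used in testing $\alpha_1 = -1$ instead of the quantiles of the t distribution. We immediately see that we face severe size distortion if we blindly apply the t test for $\alpha_1 = -1$, because its critical region is set to be around $(-\infty, -2] \cup [2, \infty)$ in the case of a test with 5% size.

Table 2 Empirical critical regions of two-sided t test by simulation
Size  Critical region
% - -73] [64
5% - -7] [68
% - -79] [38

Moreover, we found in a simulation not reported here that $t$ tends to become larger in magnitude as the sample size increases. This phenomenon is caused by the fact that the standard error of the regression tends to zero as $n \to \infty$, which is proved in the following proposition.

PROPOSITION 3. Letting $s^2 = \frac{1}{n-2} \sum_{i=1}^{n} \{\log S_{(i)} - \hat\alpha_0 - \hat\alpha_1 \log i\}^2$, we have

(a) $E[s^2] = O\!\left(\frac{\log n}{n}\right)$;
(b) $s^2 \to_p 0$ as $n \to \infty$.

Rewriting the estimator by the Rényi representation (Rényi [1953]), a straightforward application of the Lindeberg-Feller central limit theorem and the Cramér-Wold device yields the following limiting distribution, stated here for $\hat\alpha_1$.

PROPOSITION 4.

$$\sqrt{n}\,(\hat\alpha_1 - \alpha_1) \to_d N(0, 2\alpha_1^2).$$

Propositions 3 and 4 give the following result on the t statistic for $\hat\alpha_1$.

PROPOSITION 5. For $t = (\hat\alpha_1 - \alpha_1) / \left(s \sqrt{(X'X)^{-1}_{22}}\right)$, we have $|t| \to_p \infty$ as $n \to \infty$, where

$$(X'X)^{-1}_{22} = \frac{1}{\sum_{i=1}^{n} (\log i - \overline{\log})^2} = \frac{1}{n} + o\!\left(\frac{1}{n}\right)$$

is the (2,2)-element of $(X'X)^{-1}$ and $\overline{\log} = n^{-1} \sum_{i=1}^{n} \log i$.

This proposition indicates that the t value for this regression explodes asymptotically under the null of the true parameter value. Therefore, when we would like to test a null hypothesis such as $\alpha_1 = -1$, we should never use the standard t test, especially when the sample size is large, but should apply an asymptotic normality based test using

$$\frac{\sqrt{n}\,(\hat\alpha_1 - \alpha_1)}{\sqrt{2}\,|\hat\alpha_1|} \to_d N(0, 1).$$

Here, as recommended in e.g. Gabaix and Ioannides [2003], the quantity $1/\theta = |\alpha_1|$ involved in the asymptotic variance is replaced by a consistent estimator, $|\hat\alpha_1|$, under the null. In many applied works such as Rosen and Resnick [1980], Alperovich [1984] and Soo [2002], mechanical application of the t test provides very large t values, leading to wrong conclusions.

3 MORE EFFICIENT ESTIMATION

We propose two methods of efficiency improvement in the estimation of $\alpha_1$. One is the generalized least squares (GLS) method, which adjusts for nonspherical disturbances, while the other is a trimmed least squares regression. The idea of the latter is that, observing that $\mathrm{Var}[\log S_{(i)}]$ is larger for smaller $i$ and also that the rank size approximation is worse for smaller $i$ in view of Proposition 1, we can expect to improve the statistical properties of the estimator by dropping some observations with smaller $i$, that is, the larger observations.

3.1 GLS estimation

Putting $y' = [\log S_{(1)}, \log S_{(2)}, \ldots, \log S_{(n)}]$,

$$X' = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \log 1 & \log 2 & \cdots & \log n \end{bmatrix}$$

and $\Omega = V(y)$, the GLS estimator for $(\alpha_0, \alpha_1)$ is simply

$$\begin{pmatrix} \hat\alpha_0^{GLS} \\ \hat\alpha_1^{GLS} \end{pmatrix} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} y \qquad (3)$$

and its variance is

$$V\begin{pmatrix} \hat\alpha_0^{GLS} \\ \hat\alpha_1^{GLS} \end{pmatrix} = (X'\Omega^{-1}X)^{-1}.$$

Expressions for the elements of $\Omega$ are in Lemma 1 (b), (c). An interesting feature of (3) is that it is free from nuisance parameters, unlike usual GLS estimation. Normally $\Omega$ involves some nuisance parameters and thus GLS estimation is infeasible, so that we need to estimate $\Omega$ in a first step in practice.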

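In view of Lemma 1 (b), (c), $\Omega$ itself involves an unknown parameter, but it appears only as a multiplicative constant. Then, due to the form of (3), it cancels, so that (3) turns out to be feasible. Similarly to the OLS estimator, we can obtain the exact bias and variance of the GLS estimator, analogous to Proposition 2; they are shown in Figures 5 and 6. We do not present them explicitly because of their long and tedious expressions. Table 3 provides them for the same sample sizes as Table 1, to compare with those for OLS.

Table 3 Bias and variance of GLS estimator for $\alpha_1$
n  bias  variance
5-368 -8 5-7 5 3-58

We give the following two remarks regarding this result, comparing the two tables. Firstly, the GLS procedure reduces not only the variance but also the bias, which we did not expect because GLS is primarily developed in order to improve efficiency, not to reduce bias. Secondly, we see that the variance of GLS is about a half of that of OLS, and further that it approximately equals $1/(n\theta^2)$, where $\theta$ is the Pareto exponent, which in fact coincides with the Cramér-Rao lower bound for $1/\theta$. Therefore, we anticipate that GLS gives an efficient estimate comparable with the maximum likelihood estimator (MLE).

3.2 Trimmed OLS and GLS

Proposition 1 and Lemma 1 (a) imply that the source of the bias of the least squares estimators is the approximation error of $\frac{1}{i} + \frac{1}{i+1} + \cdots + \frac{1}{n}$ by $\log n - \log i$, and that it is larger for smaller $i$. We therefore conjecture that the bias can be reduced by excluding observations with smaller $i$. Also, Lemma 1 (b) implies that the variance of the least squares estimators could become smaller if we trim observations with smaller $i$, though there should no doubt be a tradeoff between the efficiency gain from excluding larger variance data points and the efficiency loss due to the reduced sample size. Letting $\hat\alpha_{1,k}^{OLS}$ and $\hat\alpha_{1,k}^{GLS}$ be respectively the OLS and GLS estimators from the subsample $[\log S_{(k+1)}, \ldots, \log S_{(n)}]$, where the $k$ largest observations are excluded, we have, similarly to Proposition 2,

$$E[\hat\alpha_{1,k}^{OLS}] = \frac{1}{\theta} \, \frac{\sum_{i=k+1}^{n} (\log i - \overline{\log}_k) \sum_{j=i}^{n} \frac{1}{j}}{\sum_{i=k+1}^{n} (\log i - \overline{\log}_k)^2}, \qquad E[\hat\alpha_{1,k}^{OLS}] - \alpha_1 = \frac{C_{n,k}}{\theta},$$

and

$$V\begin{pmatrix} \hat\alpha_{0,k}^{OLS} \\ \hat\alpha_{1,k}^{OLS} \end{pmatrix} = (X_k'X_k)^{-1} X_k' \Omega_k X_k (X_k'X_k)^{-1},$$

where $\overline{\log}_k = (n-k)^{-1} \sum_{i=k+1}^{n} \log i$, and $X_k$ and $\Omega_k$ are $X$ and $\Omega$ restricted to the observations $i = k+1, \ldots, n$. Let the (2,2)-element of this matrix be $D_{n,k}/\theta^2$. $C_{n,k}$ and $D_{n,k}$ are constants determined only by $n$ and $k$, independent of unknown quantities. We can similarly obtain the corresponding formulae for the GLS estimator, but suppress them. We tabulate the bias, variance and mean squared error (MSE) of both the trimmed OLS and GLS estimators in Table 4 for a fixed sample size $n$ and a range of trimming points $k$. We find that larger $k$ yields smaller bias in magnitude for both OLS and GLS, while the variance of the OLS estimator attains its minimum when $k = 8$ as a result of the trade-off mentioned above. The GLS variance, on the other hand, increases with $k$; thus there is no efficiency gain, only an efficiency loss from the decreased sample size. Based on the above findings, we can propose an optimal trimming rule by the minimum MSE principle.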

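The trimming trade-off described above can be explored numerically without simulation, since the trimmed OLS and GLS slopes are linear in the $\log S_{(i)}$. The sketch below is illustrative only and is not the authors' code: it assumes the Pareto tail $P(S > x) = x^{-\theta}$ with the order-statistic moments $E[\log S_{(i)}] = \theta^{-1}\sum_{j=i}^{n} 1/j$ and $\mathrm{Cov}[\log S_{(i)}, \log S_{(l)}] = \theta^{-2}\sum_{j=\max(i,l)}^{n} 1/j^2$, measures bias relative to $\alpha_1 = -1/\theta$, and uses the hypothetical names `trimmed_moments` and `best_trim`.

```python
import numpy as np


def trimmed_moments(n, k, theta=1.0):
    """Exact slope bias and variance of OLS and GLS on ranks k+1, ..., n.

    Assumption (labelled): iid Pareto sizes with P(S > x) = x**(-theta).
    """
    ranks = np.arange(1, n + 1)
    tail1 = np.cumsum((1.0 / ranks)[::-1])[::-1]       # sum_{j=i}^n 1/j
    tail2 = np.cumsum((1.0 / ranks ** 2)[::-1])[::-1]  # sum_{j=i}^n 1/j^2
    keep = ranks[k:]                                   # drop the k largest cities
    X = np.column_stack([np.ones(n - k), np.log(keep)])
    mean_y = tail1[k:] / theta
    omega = tail2[np.maximum.outer(keep, keep) - 1] / theta ** 2
    alpha1 = -1.0 / theta

    # OLS: coefficients are A @ y with A = (X'X)^{-1} X'.
    A = np.linalg.solve(X.T @ X, X.T)
    ols_bias = (A @ mean_y)[1] - alpha1
    ols_var = (A @ omega @ A.T)[1, 1]

    # GLS: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y; theta cancels in the weights.
    omega_inv_X = np.linalg.solve(omega, X)
    G = np.linalg.inv(X.T @ omega_inv_X)
    gls_bias = (G @ (omega_inv_X.T @ mean_y))[1] - alpha1
    gls_var = G[1, 1]

    return {"ols_bias": float(ols_bias), "ols_var": float(ols_var),
            "gls_bias": float(gls_bias), "gls_var": float(gls_var)}


def best_trim(n, k_max, estimator="ols", theta=1.0):
    """Trimming point in 0..k_max minimising MSE = bias**2 + variance."""
    mse = []
    for k in range(k_max + 1):
        m = trimmed_moments(n, k, theta)
        mse.append(m[estimator + "_bias"] ** 2 + m[estimator + "_var"])
    return int(np.argmin(mse))


if __name__ == "__main__":
    n = 100  # illustrative sample size, not taken from the paper
    for k in (0, 5, 10, 15):
        print(k, trimmed_moments(n, k))
    print(best_trim(n, 15, "ols"), best_trim(n, 15, "gls"))
```

Since bias and variance both scale with powers of $1/\theta$, the minimising $k$ returned by `best_trim` does not depend on the value of `theta`.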
For the sample size of Table 4, $k = 9$ gives the optimal trimming in OLS estimation, while in GLS estimation a much smaller $k$ is the best choice. In OLS estimation we attain about a 33% MSE improvement. In GLS estimation the variance of $y$ is stabilized by $\Omega^{-1}$, see (3), so that we need to exclude far fewer observations than in OLS. We note that the best trimming points depend only on $n$, because the MSE is proportional to $C_{n,k}^2 + D_{n,k}$, where $C_{n,k}$ and $D_{n,k}$ depend only on $n$ and $k$. Table 5 gives the best trimming points for some $n$. As easily expected, we should exclude more observations for a larger sample size.

Table 4 Bias and variance of trimmed OLS and GLS estimators
k  bias OLS  var OLS  MSE OLS  bias GLS  var GLS  MSE GLS
-534 6 9-8 55-355 73 847-93 6 97-838 68 78-763 68 99 3-449 578 638-65 77 4 4-9 549 597-56 86 5-53 57-486 96 8 6-855 5 554-43 6 6 7-738 54 544-369 7 36 8-64 5 538-3 8 46 9-559 5 535-79 4 56-488 54 536-4 5 67-46 58 539-6 64 79-37 54 543-74 77 9 3-33 53 549-45 9 3 4-8 54 557-9 3 6 5-4 55 566-94 7 9

Table 5 Optimal trimming points
n  OLS  GLS
5 6 9 7 5 39

4 CONCLUSIONS

We examined the statistical properties of least squares estimators for the rank size rule regression of city size under the Pareto distribution. The standard method in empirical studies in regional science has been OLS estimation and the t test based on it. We obtained the exact bias and variance of the OLS estimator for the coefficient. By Monte Carlo simulation, we obtained the distribution of the t statistic, and found that the t statistic does not have a t distribution and that we will face a severe size distortion if we implement the t test. Moreover, we proved that the t value in fact explodes asymptotically.

We propose to apply the GLS procedure because the explained variable is heteroskedastic and autocorrelated. Both the bias and the variance are significantly reduced, and we believe the variance attains the Cramér-Rao lower bound. As another tool of efficiency improvement, we propose a trimmed least squares method, which works well for OLS but is not so clearly effective for GLS. Obviously, when we are sure of the Pareto assumption, GLS or MLE is the best, but when we are not so sure, OLS may have an advantage from the robustness point of view, and we believe trimmed OLS may perform well because $\log S_{(i)}$ should still have larger variance for smaller $i$ even if the underlying distribution is not Pareto. Research in this direction is currently under way.

5 REFERENCES

Alperovich, G.A., The Size Distribution of Cities: On the Empirical Validity of the Rank-Size Rule, Journal of Urban Economics, 16, 232-239, 1984.

Auerbach, F., Das Gesetz der Bevölkerungskonzentration, Petermanns Geographische Mitteilungen, 59, 74-76, 1913.

Gabaix, X. and Y.M. Ioannides, The Evolution of City Size Distributions, forthcoming in Handbook of Regional and Urban Economics, vol. 4, 2003.

Rényi, A., On the theory of order statistics, Acta Math. Acad. Sci. Hung., 4, 191-231, 1953.

Rosen, K.T. and M. Resnick, The Size Distribution of Cities: An Examination of the Pareto Law and Primacy, Journal of Urban Economics, 8, 165-186, 1980.

Soo, K.T., Zipf's Law for Cities: A Cross Country Investigation, mimeo, London School of Economics, 2002.

Zipf, G.K., Human Behaviour and the Principle of Least Effort: An Introduction to Human Ecology, Cambridge, MA: Addison-Wesley, 1949.