Estimation and Testing for Rank Size Rule Regression under Pareto Distribution

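Y. Nishiyama (a), S. Osada (a) and K. Morimune (b)

(a) Kyoto Institute of Economic Research, Kyoto University, Kyoto 606-8501, Japan
(b) Graduate School of Economics, Kyoto University, Kyoto 606-8501, Japan

Abstract: Letting $S_{(i)}$ be the $i$-th largest city in a country, it is often observed that $\log S_{(i)} \approx \alpha_0 + \alpha_1 \log i$ for some $\alpha_0 > 0$ and $\alpha_1 < 0$. It is called the rank size rule when $\alpha_1 = -1$. This relationship has been examined by means of ordinary least squares estimation and the t test in the literature. However, since $S_{(i)}$ is heteroskedastic and autocorrelated, the t statistics do not have the standard distribution. Indeed, we show that $|t| \to \infty$ in probability as the sample size increases. The purpose of this paper is to obtain the statistical properties of the OLS estimator of the rank size rule regression and the distribution of the t statistics under the Pareto distribution, and further to propose more efficient estimation procedures in two ways. Firstly, we improve efficiency by adjusting for the heteroskedasticity and autocorrelation by the GLS method. Another source of efficiency gain is to exclude some large-variance observations. It seems that GLS attains the Cramér-Rao lower bound for $\alpha_1$.

Keywords: Rank size rule; Zipf law; Pareto distribution; City size

1. INTRODUCTION

After the pioneering work on city size distribution by Auerbach [1913] and Zipf [1949], many researchers have investigated a wide range of settlement systems. Zipf's main result, called the Zipf law, is the following. Let $S$ denote a random variable representing city size measured by its population; then, for large $x$, $P(S > x) = A/x$ for some $A > 0$, that is, a Pareto distribution with unit exponent. This is closely related to the so-called rank size rule of city size data. Let $S_i$, $i = 1, \dots, n$, be the populations of $n$ cities in a country and $S_{(i)}$ be their order statistics satisfying $S_{(1)} \ge \dots \ge S_{(n)}$; then we often observe that

$$\log S_{(i)} \approx \alpha_0 + \alpha_1 \log i, \quad i = 1, \dots, n, \qquad (1)$$

where $\alpha_0 > 0$ and $\alpha_1 < 0$. This relationship is called the rank size rule when $\alpha_1 = -1$. When the Zipf law holds, the rank size rule follows approximately.

Regarding (1) as a regression model, many researchers have estimated $\alpha_0$ and $\alpha_1$ by the ordinary least squares (OLS) method and implemented the t test for $\alpha_1 = -1$. One of the most important papers in this field is Rosen and Resnick [1980]. They examined the city size distribution of the 50 largest administrative urban areas in 44 countries, and they concluded that the validity of the urban rank-size rule appears to be an open question. Soo also made an international comparison using updated data of 73 countries. Many researchers, including the above mentioned ones, have based their studies on OLS estimation and the t test. But since the dependent variable there does not satisfy the standard conditions of OLS regression, we cannot evaluate the results.

The purpose of this paper is to derive the exact and approximate properties of the OLS estimator and the t test statistics for the rank size rule null, or $\alpha_1 = -1$. We obtain the bias and variance of the estimator assuming the $S_i$ are independently and identically distributed (i.i.d.). Further, we show that the t statistic does not have the t distribution, unlike in standard classical linear regression theory, because the $S_{(i)}$ are in fact autocorrelated and heteroskedastic under the i.i.d. assumption. Since Zipf suggested it, it is often assumed that the $S_i$ have a Pareto distribution. Under this assumption, we can show that $E[\log S_{(i)}] = \alpha_0 + \alpha_1 \log i$ does not strictly hold in small samples, but it does approximately only for large $i$ and $n$.

The following section shows exact and approximate expressions for $E[\log S_{(i)}]$ and $V[\log S_{(i)}]$, and then derives the bias and variance of the OLS estimator for $\alpha_1$. Then we present Monte Carlo results on the distribution of the t value for the estimator, which is far away from the t distribution. We further show that t explodes asymptotically, indicating that the t test is never applicable to test the null of $\alpha_1 = -1$. Section 3 proposes more efficient estimators, while Section 4 gives empirical results from Japanese city size data of Metropolitan Employment Areas (MEA). Section 5 is conclusions.

2. OLS ESTIMATION OF THE RANK SIZE RULE REGRESSION

2.1 Bias and Variance of the OLS Estimator

We state some results on the properties of the estimator without proofs in the sequel. Assume $S_i$, $i = 1, \dots, n$, are i.i.d. from a Pareto distribution function $F_S(x) = 1 - x^{-\theta}$, $\theta > 0$, $x \ge 1$. Then the following lemma holds.

LEMMA 1. Letting $\{S_{(1)}, \dots, S_{(n)}\}$ be the order statistics of $S_i$, $i = 1, \dots, n$, satisfying $S_{(1)} \ge \dots \ge S_{(n)}$,

(a) $E[\log S_{(i)}] = \frac{1}{\theta}\left(\frac{1}{i} + \dots + \frac{1}{n}\right) = \frac{1}{\theta}\sum_{j=i}^{n}\frac{1}{j}$,
(b) $V[\log S_{(i)}] = \frac{1}{\theta^2}\left(\frac{1}{i^2} + \dots + \frac{1}{n^2}\right) = \frac{1}{\theta^2}\sum_{j=i}^{n}\frac{1}{j^2}$,
(c) $\mathrm{Cov}[\log S_{(i)}, \log S_{(j)}] = \mathrm{Var}[\log S_{(j)}]$ for $i < j$.

This lemma straightforwardly yields the following proposition.

PROPOSITION 1.
$$E[\log S_{(i)}] = \frac{\log n - \log i}{\theta} + O\!\left(\frac{1}{i}\right) \quad \text{as } i, n \to \infty.$$

Proposition 1 implies that approximation (1), with $\alpha_0 = (\log n)/\theta$ and $\alpha_1 = -1/\theta$, is justified when $i$ and $n$ are large. Based on Proposition 1 and Lemma 1, we can obtain the exact expectation and variance of the OLS estimator $\hat\alpha_1$ for $\alpha_1$ as follows.

PROPOSITION 2. Writing $\bar{x} = n^{-1}\sum_{i=1}^{n}\log i$,
$$E[\hat\alpha_1] = \frac{1}{\theta}\,\frac{\sum_{i=1}^{n}(\log i - \bar{x})\left(\frac{1}{i} + \dots + \frac{1}{n}\right)}{\sum_{i=1}^{n}\{\log i - \bar{x}\}^2},$$
$$V[\hat\alpha_1] = \frac{1}{\theta^2}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}(\log i - \bar{x})(\log j - \bar{x})\sum_{l=\max(i,j)}^{n} l^{-2}}{\left[\sum_{i=1}^{n}\{\log i - \bar{x}\}^2\right]^2}.$$

We suppress the expressions for those of $\hat\alpha_0$, the OLS estimate for $\alpha_0$, because it is of less importance and interest. From this proposition we derive the asymptotic expressions of bias and variance, $E[\hat\alpha_1] = C_n/\theta$ and $V[\hat\alpha_1] = D_n/\theta^2$:

C log log E ˆ α = C = + o 4 3 D log log V ˆ α = D O = + + 3

Letting $B_n$ denote the bias of $\hat\alpha_1$, Figure 1 draws $B_n$. The bias decays as the sample size increases. Figure 2 shows the variance $V[\hat\alpha_1]$, which also decreases with $n$. Table 1 tabulates these values for some $n$.

The above results directly suggest a simple bias correction of the following form:

$$\tilde\alpha_1 = -\frac{\hat\alpha_1}{C_n} = -\frac{\sum_{i=1}^{n}\{\log i - \bar{x}\}^2}{\sum_{i=1}^{n}(\log i - \bar{x})\left(\frac{1}{i} + \dots + \frac{1}{n}\right)}\,\hat\alpha_1 .$$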
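The exact moments in Lemma 1 and Proposition 2 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, takes $\theta = 1$ by default, and uses hypothetical function names introduced only for illustration. It builds the mean vector and covariance matrix of $\log S_{(i)}$ from Lemma 1, computes the exact mean and variance of the OLS slope as linear and quadratic forms in the OLS weights, and draws Monte Carlo slopes using the fact that $\log S$ is exponential with rate $\theta$ when $S$ is Pareto with $F_S(x) = 1 - x^{-\theta}$.

```python
import numpy as np


def log_order_stat_moments(n, theta=1.0):
    """Mean vector and covariance matrix of y_i = log S_(i), i = 1..n,
    with S_(1) the largest city, following Lemma 1 (a)-(c)."""
    j = np.arange(1, n + 1)
    # Tail sums: sum_{j=i}^{n} 1/j and sum_{j=i}^{n} 1/j^2 for each i.
    mean = np.cumsum((1.0 / j)[::-1])[::-1] / theta
    var = np.cumsum((1.0 / j**2)[::-1])[::-1] / theta**2
    # Lemma 1 (c): Cov[y_i, y_j] = Var[y_max(i, j)].
    cov = var[np.maximum.outer(j, j) - 1]
    return mean, cov


def ols_slope_weights(n):
    """Weights w such that the OLS slope of y on (1, log i) equals w @ y."""
    x = np.log(np.arange(1, n + 1))
    xc = x - x.mean()
    return xc / (xc @ xc)


def exact_ols_slope_moments(n, theta=1.0):
    """Exact E and V of the OLS slope; for theta = 1 these are C_n and D_n."""
    mean, cov = log_order_stat_moments(n, theta)
    w = ols_slope_weights(n)
    return w @ mean, w @ cov @ w


def simulate_ols_slopes(n, reps, theta=1.0, seed=0):
    """Monte Carlo OLS slopes under the Pareto assumption (illustrative)."""
    rng = np.random.default_rng(seed)
    # log S ~ Exponential(rate theta) when F_S(x) = 1 - x^(-theta).
    y = rng.exponential(scale=1.0 / theta, size=(reps, n))
    y = np.sort(y, axis=1)[:, ::-1]  # column i-1 holds log S_(i), descending
    return y @ ols_slope_weights(n)


if __name__ == "__main__":
    n = 100
    c_n, d_n = exact_ols_slope_moments(n)
    slopes = simulate_ols_slopes(n, reps=4000)
    print("exact mean, variance of slope:", c_n, d_n)
    print("simulated mean, variance     :", slopes.mean(), slopes.var())
    # Multiplicative bias correction: -slope / C_n has expectation -1/theta.
    print("mean of corrected slope      :", (-slopes / c_n).mean())
```

Because the mean vector scales with $1/\theta$ and the covariance with $1/\theta^2$, the constants computed at $\theta = 1$ are enough to apply the multiplicative correction for any $\theta$.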

We have two remarks regarding the bias-corrected estimator. Firstly, the multiplicative constant on the right depends only on $n$, free from any unknown quantities, and thus the correction is easy and feasible. Secondly, this method not only eliminates the bias but also reduces the variance, because the multiplicative constant is smaller than unity.

Table 1. Values of $C_n$ and $D_n$
C D 5-8 4-534 -34 99 3-83 4

2.2 The distribution of t statistics

In testing the significance of coefficients of linear regression models, we implement the t test. In the present case, because $\log S_{(i)}$, the dependent variables, are not only non-normally distributed but also heteroskedastic and autocorrelated, we obtained the distribution of the t statistics for $\alpha_1$ in the regression under the null of $\alpha_1 = -1$ by Monte Carlo simulation. Figures 3 and 4 show the histograms from the replications for the two sample sizes considered. The mean, variance, skewness and kurtosis are respectively -5 7 43 when n = . Therefore they are obviously far from the t distribution. Table 2 shows empirical critical regions of the two-sided t test for different sizes calculated from the simulation, which should be used in testing $\alpha_1 = -1$ instead of quantiles of the t distribution. We immediately know that we face severe size distortion if we blindly apply the t test for $\alpha_1 = -1$, because its critical region is set to be around $(-\infty, -2] \cup [2, \infty)$ in the case of a test with 5% size.

Table 2. Empirical critical regions of two-sided t test by simulation
Size Critical region = % - -73] [64 5% - -7] [68 % - -79] [38

Moreover, we found in a simulation not reported here that t tends to become larger in magnitude as the sample size increases. This phenomenon is caused by the fact that the standard error of the regression tends to zero as $n \to \infty$, which is proved in the following proposition.

PROPOSITION 3. Letting $s^2 = \frac{1}{n-2}\sum_{i=1}^{n}\{\log S_{(i)} - \hat\alpha_0 - \hat\alpha_1 \log i\}^2$, we have
(a) $E[s^2] = O\!\left(\frac{\log n}{n}\right)$,
(b) $s^2 \to_p 0$ as $n \to \infty$.

Rewriting the estimator by the Rényi representation (Rényi [1953]), a straightforward application of the Lindeberg-Feller central limit theorem and the Cramér device yields the following limiting distribution of the OLS estimator.

PROPOSITION 4.
α log ˆ α d log N ˆ α

Propositions 3 and 4 give the following result on the t statistic for $\hat\alpha_1$.

PROPOSITION 5. For $t = (\hat\alpha_1 - \alpha_1)\big/\left(s\sqrt{[(X'X)^{-1}]_{22}}\right)$ we have $|t| \to_p \infty$ as $n \to \infty$, where
$$[(X'X)^{-1}]_{22} = \frac{1}{\sum_{i=1}^{n}\{\log i - \bar{x}\}^2} = \frac{1}{n} + o\!\left(\frac{1}{n}\right), \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n}\log i,$$
is the (2,2)-element of $(X'X)^{-1}$.

This proposition indicates that the t value for this regression explodes asymptotically under the null of the true parameter value. Therefore, when we would like to test a null hypothesis such as $\alpha_1 = -1$, we know that we should never use the standard t test, especially when the sample size is large, but we should apply an asymptotic normality based test using

ˆ α ˆ α / d N

In this test, as recommended in e.g. Gabaix and Ioannides [2003], the unknown parameter involved in the asymptotic variance is replaced by a consistent estimator under the null. In many applied works such as Rosen and Resnick [1980], Alperovich [1984] and Soo, mechanical application of the t test provides very large t values, leading to wrong conclusions.

3. MORE EFFICIENT ESTIMATION

We propose two methods of efficiency improvement in the estimation of $\alpha_1$. One is the generalized least squares (GLS) method adjusting for nonspherical disturbances, while the other is a trimmed least squares regression. The idea is that, observing that $\mathrm{Var}[\log S_{(i)}]$ is larger for smaller $i$ and also that approximation (1) is worse for smaller $i$ in view of Proposition 1, we can expect to improve the statistical properties of the estimator by dropping some observations with smaller $i$, that is, the larger observations.

3.1 GLS estimation

Putting
$$y' = [\log S_{(1)}, \log S_{(2)}, \dots, \log S_{(n)}], \qquad X' = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \log 1 & \log 2 & \cdots & \log n \end{bmatrix}, \qquad \Omega = V(y),$$
the GLS estimator for $(\alpha_0, \alpha_1)'$ is simply

$$\hat\alpha_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y \qquad (3)$$

and its variance is $V(\hat\alpha_{GLS}) = (X'\Omega^{-1}X)^{-1}$. Expressions for the elements of $\Omega$ are in Lemma 1 (b) and (c). An interesting feature in (3) is that it is free from nuisance parameters, unlike usual GLS estimation. Normally $\Omega$ involves some nuisance parameters and thus GLS estimation is infeasible, so that we need to estimate $\Omega$ in the first step in practice.

In view of Lemma 1 (b) and (c), $\Omega$ itself involves the unknown parameter $\theta$, but it appears only as a multiplicative constant. Then, due to the form of (3), it cancels, so that (3) turns out to be feasible. Similarly to the OLS estimator, we can obtain the exact bias and variance of the GLS estimator analogous to Proposition 2, which are shown in Figures 5 and 6. We do not present them explicitly because of their long and tedious expressions. Table 3 provides them for the same sample sizes as Table 1, to compare with those for the OLS.

Table 3. Bias and variance of GLS estimator for $\alpha_1$
bias variance 5-368 -8 5-7 5 3-58

We give the following two remarks regarding this result, comparing the two tables. Firstly, the GLS procedure reduces not only the variance but also the bias, which we did not expect because GLS is primarily developed in order to improve efficiency, not for bias reduction. Secondly, we see that the variance of GLS is about a half of that of OLS, and further that it approximately equals $1/(n\theta^2)$, which in fact coincides with the Cramér-Rao lower bound for $1/\theta$. Therefore we anticipate that GLS gives an efficient estimate comparable with the maximum likelihood estimator (MLE).

3.2 Trimmed OLS and GLS

Proposition 1 and Lemma 1 (a) imply that the source of the bias of least squares estimators is the approximation error of $\frac{1}{i} + \dots + \frac{1}{n}$ by $\log n - \log i$, and it is larger for smaller $i$. Then we conjecture that the bias can be reduced by excluding observations with smaller $i$. Also, Lemma 1 (b) implies that the variance of least squares estimators could become smaller if we trim observations with smaller $i$, though there should no doubt be a tradeoff between the efficiency gain by exclusion of larger-variance data points and the efficiency loss due to the reduced sample size.

Letting $\hat\alpha_{1,k}^{OLS}$ and $\hat\alpha_{1,k}^{GLS}$ be respectively the OLS and GLS estimators from the subsample $[\log S_{(k+1)}, \dots, \log S_{(n)}]$, where the $k$ largest observations are excluded, we have, similarly to Proposition 2, $E[\hat\alpha_{1,k}^{OLS}] = C_{n,k}/\theta$ with

$$C_{n,k} = \frac{\sum_{i=k+1}^{n}(\log i - \bar{x}_k)\left(\frac{1}{i} + \dots + \frac{1}{n}\right)}{\sum_{i=k+1}^{n}\{\log i - \bar{x}_k\}^2}, \qquad \bar{x}_k = \frac{1}{n-k}\sum_{i=k+1}^{n}\log i,$$

and

$$V\begin{bmatrix}\hat\alpha_{0,k}^{OLS} \\ \hat\alpha_{1,k}^{OLS}\end{bmatrix} = (X_k'X_k)^{-1}X_k'\,\Omega_k\,X_k\,(X_k'X_k)^{-1},$$

where $X_k$ consists of the rows of $X$ for $i = k+1, \dots, n$ and $\Omega_k = V([\log S_{(k+1)}, \dots, \log S_{(n)}]')$. Let its (2,2)-element be $D_{n,k}/\theta^2$. They are constants determined only by $n$ and $k$, and independent of unknown quantities. We can similarly obtain the corresponding formulae for the GLS estimator but suppress them.

We tabulate the bias, variance and mean squared error (MSE) for both trimmed OLS and GLS estimators in Table 4 for $n = 100$ and $k = 0, \dots, 15$. We find that larger $k$ yields smaller bias in magnitude for both OLS and GLS, while the variance of the OLS estimator attains the minimum when $k = 8$ as a result of the trade-off mentioned above. GLS variance, on the other hand, increases with $k$; thus there is no efficiency gain but only efficiency loss by the decreased sample size. Based on the above findings, we can propose an optimal trimming rule by the minimum MSE principle.

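The trimming rule can be illustrated numerically from Lemma 1 alone. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it assumes NumPy, fixes $\theta = 1$ (so the true slope is $-1$), and the function names are hypothetical. For each $k$ it drops the $k$ largest observations, forms the OLS or GLS slope as a linear function of the remaining $\log S_{(i)}$, and evaluates its exact bias, variance and MSE from the mean vector and covariance matrix given by Lemma 1.

```python
import numpy as np


def log_order_stat_moments(n):
    """Mean and covariance of y_i = log S_(i) (i = 1 is the largest city)
    for theta = 1, following Lemma 1 (a)-(c)."""
    j = np.arange(1, n + 1)
    mean = np.cumsum((1.0 / j)[::-1])[::-1]
    var = np.cumsum((1.0 / j**2)[::-1])[::-1]
    cov = var[np.maximum.outer(j, j) - 1]
    return mean, cov


def trimmed_slope_moments(n, k, method="ols"):
    """Exact bias and variance of the slope after dropping the k largest
    observations; method is 'ols' or 'gls' (theta = 1, true slope -1)."""
    mean, cov = log_order_stat_moments(n)
    i = np.arange(k + 1, n + 1)
    X = np.column_stack([np.ones(n - k), np.log(i)])
    mu, omega = mean[k:], cov[k:, k:]
    if method == "ols":
        A = np.linalg.solve(X.T @ X, X.T)            # (X'X)^{-1} X'
    else:
        oi_X = np.linalg.solve(omega, X)             # Omega^{-1} X
        A = np.linalg.solve(X.T @ oi_X, oi_X.T)      # (X'O^{-1}X)^{-1} X'O^{-1}
    a = A[1]                                          # weights of the slope
    bias = a @ mu + 1.0
    var = a @ omega @ a
    return bias, var


def best_trim(n, k_max, method="ols"):
    """Trimming point k in 0..k_max minimising MSE = bias^2 + variance."""
    mse = []
    for k in range(k_max + 1):
        b, v = trimmed_slope_moments(n, k, method)
        mse.append(b * b + v)
    return int(np.argmin(mse)), mse


if __name__ == "__main__":
    for method in ("ols", "gls"):
        k_star, mse = best_trim(100, 15, method)
        print(method, "best k:", k_star, "MSE at k=0:", mse[0], "MSE at best k:", mse[k_star])
```

Since the bias scales with $1/\theta$ and the variance with $1/\theta^2$, the MSE for general $\theta$ is the $\theta = 1$ value divided by $\theta^2$, so the minimising $k$ found here does not depend on $\theta$.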
When $n = 100$, $k = 9$ gives the optimal trimming in OLS estimation, while in GLS estimation $k = 1$ is the best choice. In OLS estimation, we attain about 33% MSE improvement. In GLS estimation, the variance of $y$ is stabilized by $\Omega^{-1}$, see (3), so that we need to exclude far fewer observations than in the OLS. We note that the best trimming points depend only on $n$, because $\mathrm{MSE} = \{(C_{n,k} + 1)^2 + D_{n,k}\}/\theta^2$, where $C_{n,k}$ and $D_{n,k}$ depend only on $n$ and $k$. Table 5 gives the best trimming points for some $n$. As easily expected, we should exclude more observations for larger sample sizes.

Table 4. Bias and variance of trimmed OLS and GLS estimators
k biasOLS varOLS MSEOLS biasGLS varGLS MSEGLS
-534 6 9 -8 55 -355 73 847 -93 6 97 -838 68 78 -763 68 99 3 -449 578 638 -65 77 4 4 -9 549 597 -56 86 5 -53 57 -486 96 8 6 -855 5 554 -43 6 6 7 -738 54 544 -369 7 36 8 -64 5 538 -3 8 46 9 -559 5 535 -79 4 56 -488 54 536 -4 5 67 -46 58 539 -6 64 79 -37 54 543 -74 77 9 3 -33 53 549 -45 9 3 4 -8 54 557 -9 3 6 5 -4 55 566 -94 7 9

Table 5. Optimal trimming points
n OLS GLS
5 6 9 7 5 39

4. CONCLUSIONS

We examined the statistical properties of least squares estimators for the rank size rule regression of city size under the Pareto distribution. The standard method in empirical studies of regional science has been OLS estimation and the t test based on it. We obtained the exact bias and variance of the OLS estimator for the coefficient. By Monte Carlo simulation, we obtained the distribution of the t statistics, where we found that the t statistic does not have the t distribution and that we will face a severe size distortion if we implement the t test. Moreover, we proved that the t value in fact explodes asymptotically.

We propose to apply the GLS procedure because the explained variable is heteroskedastic and autocorrelated. Both the bias and the variance are significantly reduced, and we believe the variance attains the Cramér-Rao lower bound. As another tool of efficiency improvement, we propose a trimmed least squares method, which works well for OLS but is not so clearly effective for GLS. Obviously, when we are sure of the Pareto assumption, GLS or MLE is the best, but when we are not so sure, OLS may have an advantage from the robustness point of view, and we believe trimmed OLS may have a good performance because $\log S_{(i)}$ should still have larger variance for smaller $i$ even if the underlying distribution is not Pareto. Research in this direction is currently under way.

5. REFERENCES

Alperovich, G.A., The Size Distribution of Cities: On the Empirical Validity of the Rank-Size Rule, Journal of Urban Economics, 16, 232-239, 1984.

Auerbach, F., Das Gesetz der Bevölkerungskonzentration, Petermanns Geographische Mitteilungen, 59, 74-76, 1913.

Gabaix, X. and Y.M. Ioannides, The Evolution of City Size Distributions, forthcoming in Handbook of Urban and Regional Economics, vol. 4, 2003.

Rényi, A., On the theory of order statistics, Acta Math. Acad. Sci. Hung., 4, 191-231, 1953.

Rosen, K.T. and M. Resnick, The Size Distribution of Cities: An Examination of the Pareto Law and Primacy, Journal of Urban Economics, 8, 165-186, 1980.

Soo, K.T., Zipf's Law for Cities: A Cross Country Investigation, mimeo, London School of Economics.

Zipf, G.K., Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Cambridge, MA: Addison-Wesley, 1949.