MPRA Munich Personal RePEc Archive Estimating International Migration on the Base of Small Area echniques Vergil Voineagu an Nicoleta Caragea an Silvia Pisica 2013 Online at http://mpra.ub.uni-muenchen.e/48775/ MPRA Paper No. 48775, poste 1. August 2013 19:56 UC
Professor Vergil VOINEAGU, PhD Acaemy of Economic Stuies E-mail: vergil.voineagu@yahoo.com Associate Professor Nicoleta CARAGEA, PhD Ecological University of Bucharest E-mail: nicoletacaragea@gmail.com Silvia PISICA, PhD National Institute of Statistics E-mail: silvia.pisica@insse.ro ESIMAING INERNAIONAL MIGRAION ON HE BASE OF SMALL AREA ECHNIQUES Abstract. Population migration flow is a component of population facing ifficulties in measuring in the inter-census perio of time. he rationale of this stuy is that Romanian statistics on international migration flows are of very poor quality, the availability of ata on past trens being strongly limite, provie only from aministrative sources. For this reason, in the inter-census perio, the variable of interest is provie by the labour force survey available at national an regional level every quarter of the year since 2004. he smaller isaggregation like localities level using irect estimators conucts to results of unreliable estimates an will surely lea to higher stanar error an consequently, high coefficients of variation. he main reason for this is the insufficient number of responents or no responent at all in a small omain. Small area estimation techniques are able to carry out the estimation at the localities level (NUS 1 5). he main purpose is to provie methos able to estimate the population in Romania, base on the Labour Force Survey an also the results of 2002, respectively 2011 population census. Key wors: international migration, population, emography, statistics, small area estimation JEL classification: C13, C15, C53 1. Introuction 1 NUS Nomenclature of erritorial Units for Statistics
Vergil Voineagu, Nicoleta Caragea, Silvia Pisica Base on small area estimation techniques, we inten to carry out a simulation of international migration using the results of the Population Census (2002 an 2011, respectively). International migration consists of two components: emigration an immigration. Statistically speaking, accoring to Regulation (EC) No 862/2007 on Community statistics on migration an international protection, we efine the two components of international migration as follows: - immigration means the action by which a person establishes his or her usual resience in the territory of a Member State for a perio that is, or is expecte to be, of at least 12 months, having previously been usually resient in another Member State or a thir country; - emigration means the action by which a person, having previously been usually resient in the territory of a Member State, ceases to have his or her usual resience in that Member State for a perio that is, or is expecte to be, of at least 12 months. Every EU citizen has the right to live an work in any country within the Community. his is one of the most tangible EU membership benefits its citizens enjoy. For some, this involve moving from poorer countries to richer countries, generally to northwestern Europe to benefit from higher wages an better living conitions. However, free movement of persons in Europe creates ifficulties in the measurement of international migration in general an international emigration in particular. he ata on immigration accurately capture the phenomenon, as they are recore through aministrative sources an provie annually by IGI (Inspectorate General for Immigration) for foreigners establishing their resience in Romania an by DEPABD (Directorate for People an Database Aministration) for Romanian citizens reestablishing their resience in Romania. Emigration is, however, very ifficult to quantify; it is more ifficult to count people leaving than those arriving in a country. National legislation oes not provie for citizens obligation to notify authorities when establishing usual resience in another country. Registration in the Passports Directorate recors is performe only if Romanian citizens request establishing their omicile (permanent resience) in another state, either an EU member or a non-member. hus in terms of emigration, the existing 2 ata from aministrative sources currently o not cover the whole phenomenon, as there is a severe unerestimation in the number of emigrants; this eficit gives an overvaluation of the Romanian population. Lack of availability of exact figures on emigration le to the nee for new statistical thinking base on estimation methos. On the recommenation of the European Commission uner Article 9 (1) of Regulation 862/2007, in the course of the statistical proceure, the National Statistical Institutes are allowe to use wellocumente statistical estimation methos base on scientific ata. Such estimations can be carrie out when ata observe irectly is not available or, for 2 At the en of 2012
Kernel-base Methos for Learning Non-linear SVM example, when ata from aministrative sources must be ajuste to meet the efinitions. Statistical estimations have been use in the past in a number of countries as part of the prouction of official statistics on migration, especially when ata sources from statistical sample surveys were use mainly. he intent of this Regulation provision was to ensure that where national authorities woul still use estimations, the estimating proceures use are transparent an clearly ocumente. 2. Description of the estimation metho (SAE) - conceptual framework he basic iea of the metho is to apply econometric moels to estimate international migration for the lowest aministrative units. Small area estimation metho involves proucing estimators for geographical areas for which the irect results obtaine from statistical sample surveys are not reliable (of trust). Conceptualization of small area estimation can create confusion, because this technique oes not require that the omains must be small, but the number of statistical units selecte from the respective geographical areas must be low. herefore, istrust on irect estimates for small areas (by localities, for example) results from the fact that the samples comprise a too small number of statistical units, or - in some cases this even o not exist. Small area estimation borrows relevance an accuracy by combining ata from sample surveys with covariates from other ata sources (census or aministrative sources). However, the estimates shoul be treate carefully ue to estimation errors. It is known the iea that any estimation generates suspicions concerning the way to fit the moel an the accuracy of the results an thus suggests the existence of potential errors resulting from the ifference between the estimates an the true values. In this sense, particular interest will be given to the iagnosis of the moels applie; respectively the sources of error 3 with impact on the results of the estimation base on the small area techniques will be examine. Even so, one shoul consier a clear message: the results of estimation methos cannot guarantee that these are the true values of each range eeme small. Preiction errors give an inication of moel s reliability in terms of estimate values closeness to the reality, but neither these errors cannot be certain, because in practice, the true values of the variable of interest are unknown. ypology of estimators obtaine by applying estimation techniques on small areas an escription of significance thereof. We istinguish between three types of estimators: 3 here are mainly three types of errors: sampling errors, auxiliary variables errors an moel generate errors.
Vergil Voineagu, Nicoleta Caragea, Silvia Pisica a) Direct Estimators - are erive from ata obtaine from the sample survey (specifically, irect estimators consist in grossing up the sample ata at locality level base on the weighting coefficients - Horwitz- hompson estimators) b) Synthetic Estimators - are etermine by applying a regression moel combining the covariates corresponing to the sample survey; c) Aggregate Estimators - are obtaine base on a linear combination between the Direct Estimator an the Synthetic Estimator. o ensure representativeness on small areas, the estimators must be unbiase (the estimate average of the variable of interest must represent all the statistical units in the sample). Unbiase Estimators are usually obtaine by selecting very large samples, the selection comprising statistical units istribute in all the small omains (esign-unbiase estimators). Direct Estimators can be use successfully in this case. It is not always possible to reach the representativeness of the ata only by using samples. herefore, it is necessary to use econometric methos to etermine some unbiase estimators (moel-unbiase estimators). GREG Estimator he generalize regression estimation (GREG) is a esign-base moelassiste approach with numerous applications to omain estimation an it is sometimes use also in small area estimation. GREG is obtaine by ajusting the irect estimator with the ifferences between covariates provie by two sources of ata. he moel that ajusts the irect estimator is base on the correlation between the y variable of interest an covariates x i. he GREG estimator is moel-assiste in the sense that regressing y on x is one only for removing the unexplaine variation from y to increase estimation accuracy. Regression equation can be written as: GREG 1 1 Ŷ w x ˆ w iyi X i i (*) Nˆ is Nˆ is GREG inclues irect estimator, calculate exclusively on the basis of ata obtaine from the survey (LFS). he irect estimator is a sum of the Horvitz- hompson estimator: DIREC 1 Ŷ w iyi (**) Nˆ is Where: Nˆ - represents the total population for each area wi - weight of selection / inverse value of the inclusion probability of unit i in area y - variable of interest for unit i in area i
Kernel-base Methos for Learning Non-linear SVM - number of small areas (such as localities, accoring to ientifier coe of the omain) ˆ - regression coefficients X - covariates containe in census file x - covariates containe in LFS file Base on formulas (*) an (**) is obtaine the mathematical expression of GREG estimator: GREG DIREC 1 Ŷ w x ˆ Ŷ X i i Nˆ is As a result, we get GREG estimator (estimate values of the y variable of interest, base on regression between the irectly estimate y values an covariates values obtaine from two ata sources). It will be observe that the GREG estimator ajusts the irect estimator in terms of a higher egree of homogeneity, i.e. a smaller variation of the ata obtaine at area level. Will compare variances for the two or more estimators. It is expecte that the ispersion for GREG estimator to be less than the one of irect estimator. Disclaimer: using JoSAE package from R, it is obtaine the GREG estimator without aitional calculations. For finer ajustments are use other estimators that are base on more complex econometric moels (moel-unbiase estimators), as follows in the next section. SYNH Estimator he synthetic estimator is base on assuming a (linear) moel for the ata so that the values of the areas that have not been sample are estimate from the moel using only information for available covariates. In other wors, the Synthetic estimator is set up so that involves linearity between the variable of interest / epenent variable an inepenent variables / influence factors for all areas, incluing those that were not inclue in the sample, the values on area level are estimate using the aitional information fiels known to the entire population statistics (census, for example, or other aministrative sources). he general equation of synthetic estimator can be written as: y x u e i Where: x - is the transpose matrix compose of covariates values obtaine in the areas / localities for the entire population (known values of census) - regression coefficients u, - the resiuals of the regression whose average is zero an variances e are u, e ( u, e have normal istribution, centere, with variances u, e )
Vergil Voineagu, Nicoleta Caragea, Silvia Pisica u - the ranom-effect resiuals e - resiuals ue to fixe effect An equivalent equation can be written as: y x zu e Where: y an e - are vectors of size (m x 1) x - is a matrix of size (m x p) - is a vector of size (px 1) u - is a vector of size (D x 1) z - is a matrix of size (m x D) m - number of localities / small areas containe in the sample p - number of covariates D - number of small areas ranging from the entire population (in this case equals the number of localities, accoring to area ientifier coe = m). Using JoSAE package from R, the Synth estimator is obtaine. EBLUP Estimator (Empirical Best Linear Unbiase Preictor) In statistics, BLUP is the resulting value of a preictor base on linear mixe moels to estimate parameters ue to regression's ranom effects. As we mentione in the previous section, linear mixe regression moels may be applie if the iniviual ata (unit level) can be organize into groups / areas / clusters (area level). In these cases, the theoretical values of the epenent variable / variables of interest are correlate with factor variable / inepenent variables through a regression function whose parameters can be estimate by ifferent methos [2]. hese regression parameters can be generate by fixe effect regression (number of coefficients is equal to the number of factors in the moel), or can be generate by ranom effect (number of coefficients are multiplie by the number of areas / groups). EBLUP is the sum of sample observations an preicte values of nonsample observations of variable y. EBLUP estimator, at statistical unit level (unit level), is calculate as: EBLUP _ A X (y x ) Or an equivalent formula: EBLUP _ A (X unit x ) unit y he estimates of are obtaine using stanar Generalize Least Square techniques, whilst the estimates of u are compute using their Empirical Best Linear Unbiase Preictor (EBLUP). EBLUP estimator in the omain level/ area level (area level) is calculate as: EBLUP _ B X (y x ) area unit area
Kernel-base Methos for Learning Non-linear SVM Or an equivalent formula: EBLUP _ B (X x ) area y Where: X - Is the transpose matrix compose of covariates values obtaine in the localities for the entire population (from census). t x - ranspose of covariates at localities level for units in the sample (from AMIGO/LFS). y - variable of interest at area level containe in the sample he funamental ifference between GREG, Synth an EBLUP estimators is that EBLUP estimators reuce the noise / enhance large ispersions of the mean values estimate by using regressors. Use of this regressor has as a result the moeling in u resiual values (unknown) of population (from census) base on e resiual values of iniviuals in the sample (from LFS). Using JoSAE package from R, we obtain the output for EBLUP, GREG an Synth estimators. he first results are actually "first step" in econometric moeling approach. Visualization of the results, their interpretation an statistical analysis creates the potential ways to improve the moels use an the resumption of techniques of estimation. he main iagnosis inicators are generate by efault by JoSAE packages use in the R software. For example, in case of linear regression moel is use nlme package incorporating the function lm (linear moel). A summary of the results obtaine presents the regression coefficients (calculate using the leastsquares metho), stanar eviations, t-stuent an F-statistic significance tests, an values of probability values of regressors estimation for various egrees of freeom. 3. Conclusions he main conclusion is there are a set of ifficulties encountere in the process of estimation. Because of them, the result of small area estimation of international migration is still in progress. Estimations will be use in near future in the calculation of population, migration stock being one of the emographic component. Consiering the methoological complexity of the small area estimation moels, it is expecte that the process of applying the methoology will take long an will set out a number ifficulties. 1. he main ifficulty arises from the fact that the sample of LFS (the only source with annual perioicity that provies information on the variable of interest: the number of emigrants left by 12 months an over) was not esigne to rigorously estimate the international migration. Moreover, the sample oes not fully cover all areas of Romania (about a quarter).
Vergil Voineagu, Nicoleta Caragea, Silvia Pisica 2. A secon set of problems is generate by applying the econometric moels from which migration is estimate. Moreover, small area estimation moels use ata from ifferent sources, involving a large number of estimation stages, which coul lea to isplacement of estimators besie real values (bias). Rao [10] consiers that we shoul take extra-precaution when using the sae metho for approximating the mean square errors, especially when the sample size is small an the variance components are large. 3. Other limitations in applying estimation moels are relate to ata availability. For example, an important factor with irect impact on international migration is the ifference in economic welfare between countries of estination an countries of origin. Acquaintance with variables that quantify the economic welfare of the iniviual, such as income, coul lea to increase significance estimates. Note that the variable shoul be available in both ata sources (sample, respectively census). 4. Detailing the variables on isaggregation levels is not possible to estimate, but by eepening small area estimation proceures (requires sample segregation accoringly its iminution, epening on the variable of interest). REFERENCES [1] Baltagi, H.B., (2011), Econometrics, Springer Publisher, ISBN 978-3-642-20058-8. [2] Battese, G. E., Harter, R. M. & Fuller, W. A. (1988), An error-components moel for preiction of county crop areas using survey an satellite ata, Journal of the American Statistical Association, 83, 28-36 [3] Breienbach, J. an Astrup, R. (submitte 2011), Small area estimation of forest attributes in the Norwegian National Forest Inventory. European Journal of Forest Research. [4] C.-E. S arnal, B. Swensson, an J. Wretman. Moel Assiste Survey Sampling.Springer-Verlag Inc., New York, 1992. [5] Caragea, N., Alexanru, A.C., Dobre, A.M. (2012), Bringing New Opportunities to Develop Statistical Software an Data Analysis ools in Romania, he Proceeings of the VIth International Conference on Globalization an Higher Eucation in Economics an Business Aministration, ISBN: 978-973-703-766-4, pp.450-456. [6] Gomez-Rubio (2008), utorial on small area estimation, UseR conference 2008, August 12-14, echnische Universitat Dortmun, Germany. [7] Jula, N. an Jula, D., (2009), Moelare economica. Moele econometrice si e optimizare., Mustang Publisher, ISBN 978-606-8058-14-6 [8] Michele D Al o, Loreana Di Consiglio, Stefano Falorsi, Fabrizio Solari, Course on Small Area Estimation, ESSnet Project on SAE, Small area estimation
Kernel-base Methos for Learning Non-linear SVM [9] R Development Core eam (2005). R: A language an environment for statistical computing. R Founation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL: http://www.r-project.org. [10] Rao, J.N.K. (2003), Small area estimation. [11] Sarnal, C. (1984), Design-consistent versus moel-epenent estimation for small omains, Journal of the American Statistical Association, JSOR, 624-631 [12] Schoch,. (2011), rsae: Robust Small Area Estimation. R package version 0.1-3. [13] Voineagu, V., Pisica, S., Caragea, N., (2012) Forecasting Monthly Unemployment by Econometric Smoothing echniques, http://www.ecocyb.ase.ro/32012/virgil%20voineagu.pf