Small Domain Estimation for a Brazilian Service Sector Survey

Similar documents
Contextual Effects in Modeling for Small Domains

Small Area Estimates of Poverty Incidence in the State of Uttar Pradesh in India

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

A Fay Herriot Model for Estimating the Proportion of Households in Poverty in Brazilian Municipalities

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness

Combining data from two independent surveys: model-assisted approach

ESTP course on Small Area Estimation

Paper N National Soil Erosion Research Lab

Log-linear multidimensional Rasch model for capture-recapture

Estimation Techniques in the German Labor Force Survey (LFS)

arxiv: v2 [math.st] 20 Jun 2014

Estimation of Parameters

An alternative use of the Verdoorn law at the Portuguese nuts II level

Weighting of Results. Sample weights

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL

Introduction to Survey Data Integration

Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil ( )

Economic Impacts of Heritage Tourism in St. Johns County, Florida, by Tom Stevens, Alan Hodges and David Mulkey.

Recitation 1: Regression Review. Christina Patterson

Finite Population Sampling and Inference

Trade and Inequality: From Theory to Estimation

Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data Simulation study and application to perceived safety

TERMS OF TRADE: THE AGRICULTURE-INDUSTRY INTERACTION IN THE CARIBBEAN

Resampling techniques for statistical modeling

Estimation of Mean Population in Small Area with Spatial Best Linear Unbiased Prediction Method

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR

y response variable x 1, x 2,, x k -- a set of explanatory variables

Business Statistics. Lecture 9: Simple Regression

Inference for Regression Simple Linear Regression

Advanced Methods for Agricultural and Agroenvironmental. Emily Berg, Zhengyuan Zhu, Sarah Nusser, and Wayne Fuller

Spatial Modeling and Prediction of County-Level Employment Growth Data

In July, US wholesalers inventories were flat month over month, and their sales declined. The inventory-to-sales ratio was 1.34.

Estimation of change in a rotation panel design

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2012

You are permitted to use your own calculator where it has been stamped as approved by the University.

Small domain estimation using probability and non-probability survey data

Eric V. Slud, Census Bureau & Univ. of Maryland Mathematics Department, University of Maryland, College Park MD 20742

Labour Market Areas in Italy. Sandro Cruciani Istat, Italian National Statistical Institute Directorate for territorial and environmental statistics

Log-linear Modelling with Complex Survey Data

Small Area Housing Deficit Estimation: A Spatial Microsimulation Approach

Multivariate beta regression with application to small area estimation

This is a repository copy of Estimating Quarterly GDP for the Interwar UK Economy: An Application to the Employment Function.

Euro-indicators Working Group

Use of Dummy (Indicator) Variables in Applied Econometrics

THE APPLICATION OF GREY SYSTEM THEORY TO EXCHANGE RATE PREDICTION IN THE POST-CRISIS ERA

STATS DOESN T SUCK! ~ CHAPTER 16

LFS quarterly small area estimation of youth unemployment at provincial level

A Practitioner s Guide to Generalized Linear Models

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2013

Business Statistics. Lecture 10: Correlation and Linear Regression

Taking into account sampling design in DAD. Population SAMPLING DESIGN AND DAD

Inference for Regression Inference about the Regression Model and Using the Regression Line

DISTRICT ESTIMATES OF HOME DELIVERIES IN GHANA: A SMALL AREA ANALYSIS USING DHS AND CENSUS DATA

Economic and Resident Baseline

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

Formelsammlung zu Wirtschaftstheorie IV

Review of Statistics

Export nancial support of Brazilian manufactured products: a microeconometric analysis

ECONOMETRIC THEORY. MODULE VI Lecture 19 Regression Analysis Under Linear Restrictions

Financial Econometrics and Quantitative Risk Managenent Return Properties

Modi ed Ratio Estimators in Strati ed Random Sampling

The Global Statistical Geospatial Framework and the Global Fundamental Geospatial Themes

Econometrics of Panel Data

Data Analysis and Statistical Methods Statistics 651

Lecture 9. Matthew Osborne

Conditional density estimation: an application to the Ecuadorian manufacturing sector. Abstract

PREDICTION OF GROSS DOMESTIC PRODUCT (GDP) BY USING GREY COBB-DOUGLAS PRODUCTION FUNCTION

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

Trip Generation Model Development for Albany

NOWCASTING REPORT. Updated: August 17, 2018

Understanding China Census Data with GIS By Shuming Bao and Susan Haynie China Data Center, University of Michigan

Some Theoretical Properties and Parameter Estimation for the Two-Sided Length Biased Inverse Gaussian Distribution

Small Area Estimation in R with Application to Mexican Income Data

The Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs

SUBJECT: Non paper on the size, nature and dynamics of the blue economy, 15 September 2015, prepared by DG MARE

sociology 362 regression

Brazil Experience in SDG data production, dissemination and capacity building

The effects of errors in measuring drainage basin area on regionalized estimates of mean annual flood: a simulation study

Steps in Regression Analysis

Research Article Some Improved Multivariate-Ratio-Type Estimators Using Geometric and Harmonic Means in Stratified Random Sampling

Influence of parameter estimation uncertainty in Kriging: Part 2 Test and case study applications

Warwick Business School Forecasting System. Summary. Ana Galvao, Anthony Garratt and James Mitchell November, 2014

Inference for the Regression Coefficient

BRITISH VIRGIN ISLANDS SECTORAL GROSS DOMESTIC PRODUCT MARKET PRICES (current prices) (US$M)

GRADE 8 LEAP SOCIAL STUDIES ASSESSMENT STRUCTURE. Grade 8 Social Studies Assessment Structure

Notes on Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market

BRAZILIAN APPROACH FOR GEOGRAPHICAL CLASSIFICATION AND DISSEMINATION OF STATISTICAL DATA

Economic and Social Council

Charting Employment Loss in North Carolina Textiles 1

Geocoding of Statistics Portugal Business Register and it s integration with the INSPIRE s Annex III Buildings theme

ECON Introductory Econometrics. Lecture 17: Experiments

Economic Survey of the Hotel and Catering sector. Calculating Sampling Errors

Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators. Calibration Estimators

THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook

Econ 3790: Statistics Business and Economics. Instructor: Yogesh Uppal

Development of methodology for the estimate of variance of annual net changes for LFS-based indicators

MAPPING THE EXPOSURE OF THE BRAZILIAN POPULATION TO NATURAL BACKGROUND RADIATION COSMIC RADIATION

Final Exam - Solutions

Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions

Propensity Score Matching and Genetic Matching : Monte Carlo Results

Transcription:

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS003) p.334 Small Domain Estimation for a Brazilian Service Sector Survey André Neves 1, Denise Silva and Solange Correa 3 1 Brazilian Institute of Geography and Statistics, Rio de Janeiro, Brazil National School of Statistical Sciences, Rio de Janeiro, Brazil 3 University of Southampton, Southampton, United Kingdom Corresponding author: André Neves, email: andre.neves@ibge.gov.br Abstract Small domain estimation covers a variety of methods used to produce survey based estimates for geographical areas or domains of study in which the sample sizes are too small to provide reliable direct estimates. This occurs when the survey is not designed for estimation at the required level. In order to obtain reliable estimates, additional datasets are generally brought to bear upon the process and a possible solution for this problem is the use of model based estimators. In the case of economic surveys, there are additional issues related to the asymmetry of the data and the non-linear relationship among economic variables. This paper presents the use of small area estimation methods for the Service Annual Survey conducted by the Brazilian Institute of Geography and Statistics. Due to the sampling design, sample estimates for some economic activities in the North, Northeast and Midwest regions of Brazil have low precision. To solve this problem, an area level Fay-Herriot model is used to produce model based estimates. The small domain estimation model relates operating revenues with auxiliary variables obtained from a Business Register and provides results showing improvement in precision for the maority of the domains considered. Key Words: official statistics, business surveys, sample surveys, small area estimation. 1 Introduction The Brazilian Institute of Geography and Statistics (IBGE) carries out regular surveys for different sectors of the economy, including the Service Sector Annual Survey (Pesquisa Anual de Serviços PAS) which focuses on segments of the tertiary sector (IBGE, 010). Due to the growing demand for information with greater spatial detail and thematic range (SILVA and CLARKE, 008), the traditional process of producing only design based estimates from this economic survey is not satisfactory. Depending on the geographic region, the survey provides information for industry service sectors at different levels of aggregation. For the South and Southeast regions, survey estimates are produced by economic activities according to disaggregation level defined by four-digit codes of the National Classification of Economic Activities. This classification was developed based on the International Standard Industrial Classification. For States in the North, Northeast, Midwest regions, the survey provides estimates by the so-called group (a level of disaggregation related to three-digit codes of the economic activity classification). The main obective of this paper is to apply model based methods to estimate the total operational gross revenue by areas (States) and economic activities. The Brazilian Service Sector Annual Survey The survey encompasses a set of activities from the services sector and investigates enterprises economic and financial characteristics such as revenue and expenses. Estimates are published for groups of economic activities according to the National Classification of Economic Activities and by States (IBGE, 010). The sampling unit is the enterprise and the sampling frame is a business register maintained by IBGE based on administrative records. Its sample design is stratified by economic activities and geographical areas (States) and also according to the number of employees on 31 st of December, as reported in the business register. The sample design involves a take-all stratum comprised of those enterprises with 0 or more

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS003) p.335 employees and enterprises that have establishments in more than one Brazilian State. The remainder of the sampling frame units (enterprises with less than 0 employees and operating in only one State) is then distributed into three other strata according to number of employees (0-4, 5-9 and 10 to 19).The final sample is comprised by those enterprises from the take-all stratum and by those that are randomly selected by simple random sampling within the strata defined by economic activities, geographical areas (States) and enterprise size (number of employees).in this work, a subset of economic activities was considered as the domains of study. The idea is to focus the analysis on activities in which the enterprises operate mainly in one State only. Table 1 displays the subset of domains considered, according to the sample design. For most of the States, direct survey estimates are solely produced by group (3-digit economic classification). Table 1 - Disaggregation level of economic classification for which direct estimates are published services considered in this study Services Economic Classification For States in South and Southeast Regions For Other States Food and beverage service activities 5611-561 Renting of video tapes and disks 77-5 Renting of clothing, ewellery and accessories 773-3 77 Teaching of art and culture 859-9 Foreign language instruction 8593-7 859 Activities of fitness center 9313-1 931 Washing and cleaning of textile and fur products 9601-7 960 Hairdressing and other beauty treatment 960-5 Source: IBGE, Service Annual Survey 008. The population of enterprises in the activities showed in Table.1 is comprised of 76,31 enterprises out of 1,,13 listed in the 008 sampling frame. The sample, in turn, consists of 11,751 enterprises and 13 domains (defined by states and class of economic activity). In order to set up a small area estimation framework it is necessary to define the target of the estimation (the variable of interest and the geographical level or domain requirements) and also to identify the potential auxiliary data. In this paper, the variable of interest is the gross operating revenue. Auxiliary variables were obtained from administrative data produced by the Brazilian Ministry of Labour and Employment based on information provided compulsorily by enterprises each year: employed persons, wages and number of establishments. The focus of this work is to produce model based estimates by class (4-digit economic classification) for a subset of services in all 7 States of Brazil in order to assess the potential use of standard small area estimation methods for improving the production of official statistics from Brazilian business surveys. 3 The Fay-Herriot Area Level Model The well-known Fay-Herriot model (Fay and Herriot, 1979) is defined by two equations representing: the sampling model (3.1), that relates the direct estimator to the true parameter value, and the binding model (3.) that relates the quantity of interest to the auxiliary variables for the small domains (Sample Proect, 011; Rao, 003; Pfeffermann and Correa, 01). Let Y be the population total of y for domain and Y ~ the corresponding direct survey estimator. The area level model is given by:

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS003) p.336 ~ Y = Y t + ε (3.1) Y = x β + u, (3.) where: ind = 1,..., J are the domains; ε ~ N (0, σ ) - is the sampling error related to with mean zero and known sampling varianceσ, u ~ N(0, σ ) is the random effect of domain with mean equal to zero and variance iid u σ u, ε and Y ~, u are mutually independent. The vector of auxiliary variables for domain is defined by the column t t vector x = ( x 1,..., x ) whereas β = ( β1,..., β is a column vector of unknown P parameters. Putting together (3.) in (3.1), the Fay-Herriot model is defined by: ~ t Y = x β + u + ε (3.3) The above formulation specifies a linear model with a random intercept that varies by small domains. As the model relates the small domain totals to a group of auxiliary variables at the domain level, this type of model is often referred to as a basic area level model. In the case of business surveys, there are additional issues related to the asymmetry of the data and the non-linear relationship among economic variables that has to be considered. It is common to apply a logarithmic transformation to the response variable in an attempt to represent economic data by linear models. To obtain the variance estimator of the transformed variable, the Taylor s approximation (Bishop et al, 1975) can be used, such that ~ ~ ~ Var( g( Y )) g'[ E( Y )] Var( Y ), where g is the first order derivative of the ln (natural logarithm) function and, hence, ~ Var( g( Y ~ )) CV ( Y ). The model based estimates as well as the corresponding mean squared errors (mse) in the log-transformed scale were obtained using the well-known estimators proposed by Fay and Herriot (1979). The estimates were converted back to the original scale based on the properties of the log-normal distribution (RAO, 003). The estimates and corresponding mean squared errors in the original scale are given by: Yˆ = exp{ Yˆ + 0.5 [ mse( Yˆ )]} and,, Final,, ˆ ˆ mse Y ) = exp( Y ) mse( Yˆ ). (,, Final,, 4 Results The Fay-Herriot estimator in equation (3.8) was used to produce model based estimates for small domains for the 008 Brazilian Service Sector Annual Survey. The model dependent variables are the survey direct estimates of total revenue for the small domains defined by geographical area and economic activity as described in the previous section. The auxiliary variables used in the model are the total number of employed persons, wages and number of establishments. For this study in particular, graphical analysis and statistical tests provided evidence that a double-log model would be more suitable to model the relationship between the direct estimates and the auxiliary variables. This functional form consists in applying logarithm transformation to the dependent variable and auxiliary variables as presented in Table. Table Regression coefficient estimates for the Fay-Herriot model Standard Auxiliary Variables Estimates Z-value P-value error Intercept.358 0.486 4.856 <0.000 Logarithm of employed persons 0.19 0.058.4 <0.030 Logarithm of wages 0.878 0.057 15.496 <0.000 P )

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS003) p.337 A scatter plot of direct survey estimates (y-axis) against model based estimates (x-axis) is one method of assessing whether the relationship between the target variable and the covariates has been properly specified. The small domain estimation approach was used to overcome the problem of small sample sizes within domains. However, it is well known that, although in general more precise, the resulting model based estimates are biased. The aim of the estimation procedure adopted here is to balance the trade-off between variability and bias, producing estimates with good precision and as little bias as possible. Although the direct estimates are very variable, they are nevertheless unbiased. Thus a scatterplot of direct estimates (on the y axis) against model based estimates (on the x axis) should display a regression line close to the y=x line if the model based estimates are also unbiased. Figure 1 (a) shows a scatter plot of the total revenue estimates to investigate if the model based estimates are approximately unbiased. The results show that the direct and model-based estimates do appear to track each other on the logarithm scale. The graph on the original scale (Figure1 (b)) has many overlapping data due to the asymmetric distribution of the variable gross operating revenue. To further evaluate the presence of bias in the model-based estimates, a regression line was fitted for the log direct and log model-based estimates, and the intercept was not significantly different from zero, indicating that there is no evidence to reect the hypothesis of lack of bias in the model-based estimates. Figure1 Scatter plots for direct estimates and model based Fay-Herriot estimates (), in the logarithmic scale (a) and in the original scale (b): (a) (b) To evaluate the accuracy of the model estimates, the coefficients of variation (CVs) were calculated for the direct and model based () estimates. The comparison considers the MSE estimates in the original scale. The highest CVs of the direct estimates are greatly reduced with the use of the model-based estimator (Figure ). Figure provides evidence that for the maority of domains, the precision of the model based figures is better than for the survey direct estimates since the small area estimation approach produces estimates with lower CVs for 83.5% of the domains considered when compared with those obtained for the direct estimates. Table 3 presents the results for the State of Piauí. A more detailed analysis can be found in Neves (01).

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS003) p.338 Figure. Direct and model based estimates of total gross revenue and corresponding precision estimates for specific domains 100 Direct Estimator -FH Estimator CV(%) 80 60 40 0 0 0 50 100 150 Domains in order of increasing sample size* *This numeration is the list of domains. About domains size distribution, see Neves, 01. Table 3 Estimates for the domains and their coefficients of variation State of Piauí Domains Direct Estimation CV -FH Estimation Food and beverage service activities 77.438.734 1,1 85.11.570,9 Renting of video tapes and disks 1.18.45 6,6.138.840 4,9 Renting of clothing, ewellery and accessories 1.96.51 41,7 1.83.639 3,6 Teaching of art and culture 3.189.31 37,0 3.644.969,8 Foreign language Instruction.555.536 14,1.968.606 3,6 Activities of fitness center 3.083.838 39,7 4.94.355, Washing and cleaning of textile and fur products 6.57.175 19,1 9.991.417 1,1 Source: IBGE, PAS 008. CV The Shapiro-Wilk test provides the W test statistic to assess whether the standardized residuals ~ ξ = ( ε + u ) / σ + σ follow a standard normal distribution. Small values u of W are evidence against the hypothesis of normality. The W statistic is given by: W J = 1 = J = 1 ~ a ξ ~ ~ ( ξ ξ ), (4.1) where ~ ξ are the ordered values of the standard residuals, ~ ξ is the corresponding mean and a are generated from a normal distribution with the means, variances and covariances of the order statistics for a sample of size J (Hidiroglou, 011). In this study we obtained W = 0.766 and p-value =.4 10-6, which yields to reection of the hypothesis of normality of the standardized residuals. 5 Conclusions In this study we have considered the Fay-Herriot model, a procedure for small area estimation commonly cited in the literature. The overall performance of the Fay- Herriot model was very good, showing lower CVs for the model based estimators for 83% of the domains considered when compared to the CVs obtained for the direct estimates. However, the statistical tests showed that the model residuals do not meet the assumption of normality, which indicates the need for additional study in this direction. This article applies a well-known method for small domain estimation with application to a real economic survey data conducted by IBGE. The main obective of this article is to take another step in the direction of development and application of

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS003) p.339 small area estimation approaches to IBGE s economic surveys. The promising results found in this study along with its practical importance and relatively new initiative of application to Brazilian economic survey data make this topic an area that certainly deserve further research. BIBLIOGRAPHY BISHOP, Y.M.M; FIENBERG, S.E.; HOLLAND, P. W. Discrete Multivariate Analysis: Theory and Practice. The MIT Press, Cambridge-Massachussets, London-England, 1975. FAY, R. E., HERRIOT, R. A. Estimates of Income for Small Places: An Application of James-Stein Procedures to Census Data. Journal of the American Statistical Association, Vol. 74, n 366. Jun/79, p.69-77. PFEFFERMANN, D., CORREA, S. Empirical Bootstrap Bias Correction and Estimation of Prediction Mean Square Error in Small Area Estimation. Biometrika, Vol. 99, n. April/01, p.457-47. IBGE. Pesquisa Anual de Serviços 008. Diretoria de Pesquisas, Coordenação de Serviços e Comércio, 010. HIDIROGLOU, M. A. Small area estimation Fay-Herriot Area Level Model with Estimation (methodology specifications). Methodology Software Library, 11/07/011. NEVES, A. F. A. Estimação em Pequenos Domínios Aplicada à Pesquisa Anual de Serviços 008. Dissertação do Mestrado em Estudos Populacionais e Pesquisas Sociais. Escola Nacional de Ciências Estatísticas. Rio de Janeiro, Jul/01. RAO, J.N.K. Small Area Estimation. New York, Wiley, 003. SAMPLE Proect. Software Beta on Small Area Estimation. Deliverable number 13. Link: www.sample-proect.eu/images/stories/docs/samplewpd13_softbeta.pdf SILVA, D. B. N; CLARKE, P. Some Initiatives on Combining Data to Support Small Area Statistics and Analytical Requirements at ONS-UK. Paper presented at the IAOS 008 Conference on Reshaping Official Statistics.