Ensemble Spatial Autoregressive Model on. the Poverty Data in Java

Similar documents
Paddy Availability Modeling in Indonesia Using Spatial Regression

Multilevel Structural Equation Model with. Gifi System in Understanding the. Satisfaction of Health Condition at Java

MODELING ECONOMIC GROWTH OF DISTRICTS IN THE PROVINCE OF BALI USING SPATIAL ECONOMETRIC PANEL DATA MODEL

Development of Space Time Model with. Exogenous Variable by Using Transfer Function. Model Approach on the Rice Price Data

SPATIAL AUTOREGRESSIVE MODEL FOR MODELING OF HUMAN DEVELOPMENT INDEX IN EAST JAVA PROVINCE

Financial Development and Economic Growth in Henan Province Based on Spatial Econometric Model

On Optimality Conditions for Pseudoconvex Programming in Terms of Dini Subdifferentials

Gaussian Copula Regression Application

Solving Homogeneous Systems with Sub-matrices

Multilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java)

The Credibility Estimators with Dependence Over Risks

Diophantine Equations. Elementary Methods

Modelling Multi Input Transfer Function for Rainfall Forecasting in Batu City

PRELIMINARY ANALYSIS OF SPATIAL REGIONAL GROWTH ELASTICITY OF POVERTY IN SUMATRA

Comparing the Univariate Modeling Techniques, Box-Jenkins and Artificial Neural Network (ANN) for Measuring of Climate Index

Measuring The Benefits of Air Quality Improvement: A Spatial Hedonic Approach. Chong Won Kim, Tim Phipps, and Luc Anselin

Spatial Variation in Infant Mortality with Geographically Weighted Poisson Regression (GWPR) Approach

Morphisms Between the Groups of Semi Magic Squares and Real Numbers

Modeling Gamma-Pareto Distributed Data Using GLM Gamma

F9 F10: Autocorrelation

Empirical Comparison of ML and UMVU Estimators of the Generalized Variance for some Normal Stable Tweedie Models: a Simulation Study

Double Gamma Principal Components Analysis

Performance of W-AMOEBA and W-Contiguity matrices in Spatial Lag Model

Robust geographically weighted regression with least absolute deviation method in case of poverty in Java Island

On Monitoring Shift in the Mean Processes with. Vector Autoregressive Residual Control Charts of. Individual Observation

Locating Chromatic Number of Banana Tree

Luc Anselin and Nancy Lozano-Gracia

A Class of Z4C-Groups

On the Detection of Heteroscedasticity by Using CUSUM Range Distribution

Improvements in Newton-Rapshon Method for Nonlinear Equations Using Modified Adomian Decomposition Method

SIMULATION AND APPLICATION OF THE SPATIAL AUTOREGRESSIVE GEOGRAPHICALLY WEIGHTED REGRESSION MODEL (SAR-GWR)

Econometrics. 9) Heteroscedasticity and autocorrelation

Sequences from Heptagonal Pyramid Corners of Integer

Approximations to the t Distribution

Research Article A PLS-Based Weighted Artificial Neural Network Approach for Alpha Radioactivity Prediction inside Contaminated Pipes

Estimation of Mean Population in Small Area with Spatial Best Linear Unbiased Prediction Method

Alternative Biased Estimator Based on Least. Trimmed Squares for Handling Collinear. Leverage Data Points

Optimum Spatial Weighted in Small Area Estimation

Formulary Applied Econometrics

Poincaré`s Map in a Van der Pol Equation

A Panel Data Analysis of The Role of Human Development Index in Poverty Reduction in Papua

Introductory Econometrics

Characteristic Value of r on The Equation 11 x r (mod 100)

ON THE NEGATION OF THE UNIFORMITY OF SPACE RESEARCH ANNOUNCEMENT

A Note on Linearly Independence over the Symmetrized Max-Plus Algebra

On Symmetric Bi-Multipliers of Lattice Implication Algebras

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

Non Isolated Periodic Orbits of a Fixed Period for Quadratic Dynamical Systems

Heteroskedasticity and Autocorrelation

Proceedings of the 8th WSEAS International Conference on APPLIED MATHEMATICS, Tenerife, Spain, December 16-18, 2005 (pp )

A Signed-Rank Test Based on the Score Function

Analysis of Blood Transfused in a City Hospital. with the Principle of Markov-Dependence

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

Z. Omar. Department of Mathematics School of Quantitative Sciences College of Art and Sciences Univeristi Utara Malaysia, Malaysia. Ra ft.

Spatial Statistics For Real Estate Data 1

Direct Product of BF-Algebras

Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms

Remark on the Sensitivity of Simulated Solutions of the Nonlinear Dynamical System to the Used Numerical Method

Integration over Radius-Decreasing Circles

The EM Algorithm for the Finite Mixture of Exponential Distribution Models

Fuzzy Sequences in Metric Spaces

A Practical Method for Decomposition of the Essential Matrix

An Improved Hybrid Algorithm to Bisection Method and Newton-Raphson Method

Third and Fourth Order Piece-wise Defined Recursive Sequences

COLUMN. Spatial Analysis in R: Part 2 Performing spatial regression modeling in R with ACS data

Lecture 6: Hypothesis Testing

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models

OUTWARD FDI, DOMESTIC INVESTMENT AND INFORMAL INSTITUTIONS: EVIDENCE FROM CHINA WAQAR AMEER & MOHAMMED SAUD M ALOTAISH

Why Bellman-Zadeh Approach to Fuzzy Optimization

Section 2 NABE ASTEF 65

SPATIAL MODELING FOR HUMAN DEVELOPMENT INDEX IN CENTRAL JAVA

Double Total Domination on Generalized Petersen Graphs 1

Diameter of the Zero Divisor Graph of Semiring of Matrices over Boolean Semiring

Empirical Power of Four Statistical Tests in One Way Layout

ECON 312 FINAL PROJECT

Forecasting Egyptian GDP Using ARIMA Models

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Strong Convergence of the Mann Iteration for Demicontractive Mappings

iafor The International Academic Forum

From Binary Logic Functions to Fuzzy Logic Functions

Remarks on the Maximum Principle for Parabolic-Type PDEs

SOME BASICS OF TIME-SERIES ANALYSIS

ACG M and ACG H Functions

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Research on Independence of. Random Variables

One-Step Hybrid Block Method with One Generalized Off-Step Points for Direct Solution of Second Order Ordinary Differential Equations

On Uniform Convergence of Double Sine Series. Variation Double Sequences

International Mathematical Forum, Vol. 9, 2014, no. 36, HIKARI Ltd,

Lecture 4: Heteroskedasticity

An Abundancy Result for the Two Prime Power Case and Results for an Equations of Goormaghtigh

No

Parameter Estimation for ARCH(1) Models Based on Kalman Filter

Heteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance.

Caristi-type Fixed Point Theorem of Set-Valued Maps in Metric Spaces

Fixed Effects Models for Panel Data. December 1, 2014

On a Boundary-Value Problem for Third Order Operator-Differential Equations on a Finite Interval

On Some Identities of k-fibonacci Sequences Modulo Ring Z 6 and Z 10

Dynamical Behavior for Optimal Cubic-Order Multiple Solver

Econometrics Multiple Regression Analysis: Heteroskedasticity

Econometrics of Panel Data

Transcription:

Applied Mathematical Sciences, Vol. 9, 2015, no. 43, 2103-2110 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2015.4121034 Ensemble Spatial Autoregressive Model on the Poverty Data in Java Nurul Rohmawati Department of Statistics, Faculty of Mathematics and Natural Science Bogor Agricultural University, Indonesia Hari Wijayanto and Aji Hamim Wigena Department of Statistics, Faculty of Mathematics and Natural Science Bogor Agricultural University, Indonesia Copyright 2014 Nurul Rohmawati, Hari Wijayanto and Aji Hamim Wigena. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Spatial data analysis using classical regression usually does not fulfill the assumption of error randomness because of the existence of autocorrelation among locations. The problem can be overcome using spatial autoregressive models. The other problem related to spatial data is error heterogeneity. Standard spatial autoregressive model is less appropriate to analyze there kind of spatial data. Therefore, in this paper we apply the ensemble technique with additive noise to spatial autoregressive model. This ensemble assessment used poverty data (percentage of poor society in Java). The result shows that the ensemble spatial autoregressive model is better than spatial autoregressive model without ensemble. Keyword: Ensemble technique, spatial autoregressive model 1. Introduction Spatial data analysis using classical regression does usually not fulfill the assumption of error randomness because of the existence of autocorrelation among locations or the existence of spatial effects. These, spatial effects are

2104 Nurul Rohmawati et al. spatial autocorrelation and spatial heterogeneity. Spatial autocorrelation occurs because of the spatial data dependency. Spatial heterogeneity occurs because the difference among the areas. Spatial autocorrelation on independent variables causes an inefficient estimator. These problems can be resolved by spatial autoregressive model considering the lag influence in independent variables. Spatial data are usually heterogen. This condition leads to the difficulty in fulfilling the assumption. Spatial autoregressive standard model usually is not able to solve. Therefore, it is required a robust method to overcome this condition such as ensemble technique. The ensemble technique combines a number of spatial autoregressive models applied to data with different additive noise. This technique is expected to improve the estimation stability and robust compared to the standard model. Some related research topics using spatial regression model are spatial regression models to determine the factors that influence the poverty on East Java [2]. Partial least square regression (PLSR) ensemble technique with additive noise was robust to any additive noise [5]. Ensemble Bayesian model averaging (EBMA) model was more compatible and showed a better result than Bayesian model averaging (BMA) and geostatistical output perturbation (GOP) [3]. The aim of this research is to apply the ensemble spatial autoregressive on poverty data in Java. The ensemble process is implemented by adding noise (additive noise) on percentage of poor society data and then builds the model using spatial autoregressive. Combining spatial autoregressive models is based on the average of model coefficients. The next sections of this paper are literature review, methodology, result, discussion, and conclusion. 2. Literature Reviews 2.1. Spatial Autoregressive Model In spatial autoregressive model, there is the function of dependent variable of the location j that used as independent variable to predict the value of independent variable in location i. The equation of spatial autoregressive model is [1]: y = ρwy + Xβ + ε (1) ε~n(0, σ 2 I) which y is a vector of dependent variable (n 1), ρ is a parameter of coefficient correlation of spatial autoregressive, W is spatial weighting matrix (n n), β is parameter of coefficient regression ((k + 1) 1), X is parameter of coefficient regression (n (k + 1)) and ε is error vector (n 1). 2.2. Ensemble Technique Ensemble technique is a technique to build a predictive model by combining a number of models. The ensemble technique becomes one of the important tech-

Ensemble spatial autoregressive model on the poverty data in Java 2105 niques in improving the prediction ability of standard models [8]. Two types of ensemble techniques are hybrid and non-hybrid techniques [5]. Hybrid ensemble technique involves various modeling algorithms and then combine the prediction of each algorithm into a final prediction. Whereas the non-hybrid ensemble technique uses one type of algorithm, and use it for many times to obtain many different models, and then the different prediction models are combined to be one prediction. This paper uses non hybrid ensemble technique by combining a number of spatial autoregressive models. A single spatial autoregressive model is built by adding the noise to the dependent variable. A noise is irregular interference on data [6]. An additive noise is generated that based on ε~n(0, σ). The noise is a small value of standard deviation based on the range of data. Equation of additive noise is as follows: m = y + ε z (2) where m is the vector of dependent variable after adding the noise, y is the vector of dependent variable before adding the noise and ε z ~N(0, σ). Combining process can be implemented using simple averaging, weighted averaging, voting, etc. Combining models for spatial regression can use the simple averaging by equation as follows [7]: T H(x) = 1 h T i=1 i(x) (3) h i (x) = f(x) + ε where H(x) is the combined model by averaging, suppose the underlying true function we try to models is f(x) and x is sampled according to a distribution, ε is error vector and T is the individual models (h 1,, h T ) and the output of h i, for the instance x is h i (x) R, our task is to combine h i s to attain the final prediction on the real valued variable. 3. Methodology 3.1. Data The data used in this paper is the poverty data on district/city of Susenas (Social Economic of National Survey) 2011 from BPS_Statistics Indonesia (Badan Pusat Statistik) [4]. This areas in this research ware 118 districts/cities in Java. The dependent variable is the percentage of poor community and the independent variables are the percentage from the literacy rate of poor community on age 15-55 years old (X 1 ), percentage of poor community over 15 years old working on informal sector (X 2 ), and percentage of household receiver of Jamkesmas (Health insurance card) (X 3 ). 3.2. Methodology The steps of this research are: 1. Calculate the weighting matrix of the regions by spatial Queen matrix.

2106 Nurul Rohmawati et al. 2. Add noise to the percentage of poor community (Y) resulting k new percentage of poor community data using the following steps: Generate the value of noise randomly based on normal distribution, ε~n(0, σ). Add noise to data so that the new data are relatively similar to the row data and random. If the new data negative then the new data is set to zero. 3. Spatial regression modeling is as follows: Lagrange multiplier test to identify the spatial dependence. Breusch Pagan test to identify the spatial heterogeneity. Develop and test a spatial regression model. Compute R-square of the spatial regression model. 4. Repeat step 3 k (100) times to have k sets of data. 5. Develop ensemble model based on the average of coefficients of k models using k sets of data. 6. Compare the ensemble spatial autoregressive model and spatial autoregressive without ensemble based on the value of RMSE. 4. Results and Discussion 4.1. Exploration of Poor People in Java (Y) Java consists of six provinces (Jakarta, West Java, Central Java, Yogyakarta, East Java, and Banten). The heterogeneity of poor community in Java is very high. It can be identified using the distribution of percentage on poor community in its district / city. The map in Figure 1 shows the percentage distribution of poor community in Java. Persentage of poor community Figure 1. Map distribution pattern of poor community in Java

Ensemble spatial autoregressive model on the poverty data in Java 2107 The percentage of poor people in each province are 10.68% in Jakarta, 15.23% in West Java, 23.78% in Central Java, 43.86% in Yogyakarta, 34.64% in East Java, and 7.04% in Banten. Therefore, the percentage of poor people (variable Y) is required to add noise. The noise corresponds to relatively small standard deviation. The data of variable Y with the noise will be relatively closed to the original data. The standard deviations tested are 1.55, 1.57, 1.59, 1.61, 1.63, 1.65, 1.67, 1.69, 1.71, and 1.73. The data with those values produce the similar patterns to the original data (Figure 2). 30 25 Standard deviation 20 15 10 5 0 0 20 40 60 80 100 120 Observation Figure 2. Plot data Y and data Y with noise 4.2. Spatial Effect Test The spatial effect test is used to test spatial dependencies and spatial heterogeneity. Lagrange multiplier (LM) test is used to detect the presence of spatial dependence lag. Breusch Pagan (BP) test is used to detect the presence of spatial heterogeneity. The LM test statistic will be closed to χ 2 with degree of freedom 1 under H 0 is true, so the H 0 is rejected at significant level of α, if the value of LM is greater than the value of χ 2 (1). The LM value in the research is 20.2051 so at α= 0.05 this LM value is more than value of χ 2 (0.05,1) (3.84) and then H 0 is rejected. This fact means that there is a spatial lag. The next spatial effect test is spatial homogeneity test. Under the condition H 0, the statistic BP test will be closed to χ 2 (0.05,3), so the H 0 is rejected at significant level of α, if the value BP is greater than χ 2 (0.05,3). In this research, the value of BP is 8.3864 is more than χ 2 (0.05,3) = 7.851 which means that the data at all locations are not homogeneity. The results of the spatial lag dependence test and spatial heterogeneity show the presence of spatial effects in the data. Considering these facts, the spatial autoregressive models are used. 4.3. Spatial Autoregressive Model Based on lagrange multiplier test, the spatial autoregressive model is developed for all districts/cities in Java. This model is as follows:

2108 Nurul Rohmawati et al. ŷ = 21.843 + 0.3789Wy 0.2558X 1 + 0.1431X 2 + 0.0602X 3 The estimate of parameter, ρ, is 0.3789. This means that all districts and cities are correlated or they affect one to another. The value on coefficient of determination (R 2 ) is 64.1%. This value indicates that the variance of the poor community percentage which can be explained by the model is 64.1%. Factors which affect the dependent variable are the poor literacy rate at the age of 15-55 years (X 1 ), the poor that working informal sector (X 2 ), and the number of households receiving Jamkesmas card (X 3 ). According to the coefficients of the model, especially the coefficients of each independent variable, these coefficients can be interpreted as follows. If the poor literacy rate at the age of 15-55 years (X 1 ) increases 1% then the percentage of poor people will decrease about 0.2558%; if the poor that working informal sector (X 2 ) increases 1% then the percentage of the poor increases about 0.1431%; and if the number of households receiving Jamkesmas card (X 3 ) increases 1% then the percentage of poor people also increases about 0.0602%. The value of RMSE of the spatial autoregressive model is relatively small (0.0005). 4.4. Ensemble Spatial Autoregressive Data of variable Y with additive noises show the similar patterns. For each noise, 100 spatial autoregressive models are developed with independent variables, X1, X2, and X3. Then, the models for every noise are combined to develop an ensemble spatial autoregressive model. The ensemble spatial autoregressive models with every noise are showed in Table 1. The coefficients of the models are relatively similar. This condition indicates that the addition of a small noise does not change the coefficients significantly. Table 1 Ensemble spatial autoregressive models for each noise Noise Ensemble spatial autoregressive model ε~n(0,1.55) ŷ = 22.9146 + 0.3399Wy 0.2641X 1 + 0.1479X 2 + 0.0605X 3 ε~n(0,1.57) ŷ = 22.9388 + 0.3389Wy 0.2643X 1 + 0.1480X 2 + 0.0605X 3 ε~n(0,1.59) ŷ = 22.9633 + 0.3381Wy 0.2645X 1 + 0.1481X 2 + 0.0605X 3 ε~n(0,1.61) ŷ = 22.9878 + 0.3371Wy 0.2647X 1 + 0.1482X 2 + 0.0605X 3 ε~n(0,1.63) ŷ = 23.0126 + 0.3362Wy 0.2649X 1 + 0.1483X 2 + 0.0605X 3 ε~n(0,1.65) ŷ = 23.0377 + 0.3352Wy 0.2651X 1 + 0.1484X 2 + 0.0606X 3 ε~n(0,1.67) ŷ = 23.0629 + 0.3343Wy 0.2653X 1 + 0.1485X 2 + 0.0606X 3 ε~n(0,1.69) ŷ = 23.0882 + 0.3334Wy 0.2655X 1 + 0.1486X 2 + 0.0606X 3 ε~n(0,1.71) ŷ = 23.1137 + 0.3324Wy 0.2657X 1 + 0.1488X 2 + 0.0606X 3 ε~n(0,1.73) ŷ = 23.1393 + 0.3315Wy 0.2659X 1 + 0.1489X 2 + 0.0606X 3 The ensemble spatial autoregressive model with the noise 1.65 is as follows: ŷ = 23.0377 + 0.3352Wy 0.2651X 1 + 0.1484X 2 + 0.0606X 3

Ensemble spatial autoregressive model on the poverty data in Java 2109 The estimate of parameter, ρ, is 0.3352. This means that all districts and cities are correlated or they affect one to another. According to the coefficients of the model, especially the coefficients of each independent variable, these coefficients can be interpreted as follows. If the poor literacy rate at the age of 15-55 years (X 1 ) increases 1% then the percentage of poor people will decrease about 0.2651%; if the poor that working informal sector (X 2 ) increases 1% then the percentage of the poor increases about 0.1484%; and if the number of households receiving Jamkesmas card (X 3 ) increases 1% then the percentage of poor people also increases about 0.0606%. The ensemble spatial autoregressive model and the spatial autoregressive model without ensemble are compared based on RMSE. The better model is with smaller RMSE. Table 2 The values of RMSE of each model Regression model Noise RMSE values Autoregressive spatial 0.0005 Ensemble Autoregressive spatial ε~n(0,1.55) 0.0004 ε~n(0,1.57) 0.0003 ε~n(0,1.59) 0.0002 ε~n(0,1.61) 0.0001 ε~n(0,1.63) 0.0000 ε~n(0,1.65) 0.0000 ε~n(0,1.67) 0.0001 ε~n(0,1.69) 0.0002 ε~n(0,1.71) 0.0003 ε~n(0,1.73) 0.0004 Table 2 shows that the RMSE of ensemble spatial autoregressive model is less than the RMSE of spatial autoregressive model without ensemble. The ensemble spatial autoregressive models (with the values of RMSE are less than 0.0005), is better than the spatial autoregressive model without ensemble. This fact indicates that the ensemble spatial autoregressive model can improve the estimates to be more stable and robust. 5. Conclusion The ensemble spatial autoregressive model with additive noise is better than the spatial autoregressive model without ensemble. The ensemble spatial autoregressive model can increase the estimator capability, stable and more robust.

2110 Nurul Rohmawati et al. References [1] Anselin L. 1998. Spatial Econometrics: Method and Model. Dordrecht: Kluwer. Academic Publishers. [2] Arisanti R. 2010. Performance Spatial Regression Models for detecting factors of poverty in East Java Province. [Thesis]. Bogor: Postgraduate Program. Bogor Agricultural University. [3] Berrocal VJ, Raftery AE, Genaiting T. 2007. Combining spatial statistical and ensemble information in probabilistic weather forecasts. Monthly Weather Review, American Meteorology society. (135), 1386-1402. http://dx.doi.org/10.1175/mwr3341.1 [4] [BPS] Badan Pusat Statistik. 2013. Number of Poor and The Percentage of Poor People in Indonesia. http://www.bps.go.id/getfile.php [5] Bock De, Koen W, Coussement K, Poel DVD. 2010. Ensemble Classification based on Generalized Additive Models. Computional Statistics & Data Analysis 54(6):1535-1546. http://dx.doi.org/10.1016/j.csda.2009.12.013 [6] Mevik HB, Segtnan VH, Naes T. 2005. Ensemble methods and partial least squares regression. Journal of Chemometrics; 18(11), 498-507. http://dx.doi.org/10.1002/cem.895 [7] Zhou HZ. 2012. Ensemble Methods Foundations and Algorithms. Florida (US): CRC Pr. [8] Zhu M. 2008. Kernels and ensemble: Perspectives on statistical learning. The American Statistican. 62 (2):97-109. http://dx.doi.org/10.1198/000313008x306367 Received: December 24, 2014; Published: March 20, 2015