Regression Analysis of 911 Call Frequency in Portland, OR Urban Areas in Relation to Call Center Vicinity
Elyse Maurer
March 13, 2015

Introduction: Using the Linear Regression and Geographically Weighted Regression tutorial, I determined areas of Portland, Oregon that have high frequencies of 911 calls in relation to emergency stations. This analysis focused on reducing the number of 911 calls in order to reduce the public resources used to accommodate them. Initial hot spot analysis projects that the Portland, Oregon metropolitan area will see its 911-call frequency double over the next 10 years (Figure 1). A regression analysis is important in this scenario because the goal is to identify the factors driving 911-call frequency. To analyze this, we can take current and past data and project trends into the future. By predicting future trends, we can counteract the issue before it becomes a problem.
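The projection idea described above can be sketched as a simple linear trend fit and extrapolation. The yearly call totals below are illustrative stand-ins, not actual Portland data:

```python
import numpy as np

# Hypothetical yearly 911-call totals (illustrative numbers only,
# not actual Portland, OR data).
years = np.array([2010, 2011, 2012, 2013, 2014])
calls = np.array([52000, 55000, 59000, 62000, 66000])

# Fit a straight-line trend: calls = slope * year + intercept.
slope, intercept = np.polyfit(years, calls, 1)

# Extend the trend line ten years ahead to anticipate future demand.
projected_2024 = slope * 2024 + intercept
print(round(slope), round(projected_2024))
```

With these made-up inputs the fitted trend roughly doubles the call volume over the decade, which is the kind of projection the hot spot analysis motivates.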
Figure 1. This map shows the initial hot spot analysis for the Portland, OR urban area. A hot spot analysis represents areas of high 911-call frequency in relation to the closest call center, indicated by the green crosses. We can see a definite concentration in the central portion of the northern region of this map. To identify the cause of the high call volume, individual factors can be assessed using the Geographically Weighted Regression tool, as seen in Figure 2. The hot spot analysis paired with the individual variable data provides a basis for future predictions and for how to mitigate 911-call volume.
Analysis: All of the data needed for analysis was provided through the tutorial. I did create derived data of my own, which I stored in my own workspace. I began by checking my data to ensure that it was in the same format as the tutorial. This included turning individual layers on and off and enabling the Spatial Statistics Toolbox. Within the Spatial Statistics Toolbox, I set my workspace parameters and my output coordinate system. For this tutorial, my output coordinate system was set to NAD 1983 State Plane Oregon North FIPS 3601 Feet. Additionally, I disabled background processing through the Geoprocessing menu under Geoprocessing Options. Next, I ran the OLS (Ordinary Least Squares) tool, which allowed me to explain patterns where call volume is a function of population. For this tool, I used the ObsData911Calls layer as my input, UniqID as the Unique ID field, OLS911Calls.shp as the Output Feature Class, Calls as the Dependent Variable, and Pop as the Explanatory Variable. This produced a map of polygons varying in shape and color. On my map, the darker red polygons indicated a strong correlation between population and call volume, while the dark blue polygons indicated areas of weak correlation. Through the tutorial, we determined that this analysis could be furthered by adding LowEduc, Jobs, and Dst2UrbCen to Pop as explanatory variables. Using these variables allowed me to visually determine the sources of the high call volume and provided a more accurate map of correlation for each of the variables (Figure 2).
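The fit at the heart of the OLS tool can be sketched outside ArcGIS. The snippet below performs the same kind of multivariate least-squares fit in NumPy on made-up stand-ins for the tutorial's Calls, Pop, Jobs, LowEduc, and Dst2UrbCen fields (the real tool additionally reports diagnostics such as R-squared, t-statistics, and spatial diagnostics):

```python
import numpy as np

# Synthetic stand-ins for the tutorial's fields; the coefficients
# used to generate `calls` are arbitrary illustration values.
rng = np.random.default_rng(0)
n = 200
pop = rng.uniform(500, 5000, n)
jobs = rng.uniform(100, 2000, n)
low_educ = rng.uniform(0, 300, n)
dst2urbcen = rng.uniform(0, 20, n)
noise = rng.normal(0, 5, n)
calls = 0.02 * pop + 0.01 * jobs + 0.05 * low_educ - 1.0 * dst2urbcen + noise

# Design matrix with an intercept column, one row per census tract.
X = np.column_stack([np.ones(n), pop, jobs, low_educ, dst2urbcen])
coef, *_ = np.linalg.lstsq(X, calls, rcond=None)

# Residuals: the part of call volume the model cannot explain.
# Mapping these residuals is what produces the red/blue polygons.
residuals = calls - X @ coef
r_squared = 1 - residuals.var() / calls.var()
print(coef.round(3), round(r_squared, 3))
```

A single global coefficient per variable is exactly the limitation that motivates moving to Geographically Weighted Regression later in the tutorial.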
Figure 2. This map depicts the four analyses (Low Education, Jobs, Distance to Urban Center, and Population) that show high and low levels of correlation in relation to call center proximity. Each of the maps shows the urban area of Portland, OR. This map was generated using the Geographically Weighted Regression (GWR) tool. By mapping each of these four variables, I am able to view areas of needed improvement for each variable. For instance, areas of concern for the Distance to Urban Center variable are not areas of concern for the Low Education variable. This provides a better understanding of how we can improve the future use of 911 calls.
I then used the Spatial Autocorrelation tool. This tool allowed me to check whether the model residuals were randomly distributed. The output from the Spatial Autocorrelation tool was an HTML report describing the distribution pattern (Figure 3). My results indicated that my patterns are randomly distributed.

Figure 3. This image is the HTML output for my above images. It shows the bell curve that indicates the significance of the spatial pattern. The beige area (center of the bell curve) indicates randomness, while the outlying areas (reds and blues) indicate significance. This model will be used later in the tutorial to determine the standard deviation of 911-call predictions.
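The statistic behind the Spatial Autocorrelation tool is global Moran's I. A minimal NumPy version on a toy row of eight tracts (hypothetical values, not the tutorial's data) shows how clustered versus dispersed patterns move the statistic away from zero:

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I, the statistic the Spatial Autocorrelation
    tool computes.  `weights` is an n-by-n spatial weights matrix
    (weights[i, j] > 0 when tracts i and j are neighbors)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    d = x - x.mean()                            # deviations from the mean
    cross = (d[:, None] * d[None, :] * w).sum() # weighted cross-products
    return (n / w.sum()) * cross / (d ** 2).sum()

# Toy study area: 8 tracts in a row; neighbors share an edge.
n = 8
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1

clustered = [1, 1, 1, 1, 0, 0, 0, 0]     # similar values bunch together
alternating = [1, 0, 1, 0, 1, 0, 1, 0]   # similar values repel

print(morans_i(clustered, w))    # positive: spatially clustered
print(morans_i(alternating, w))  # negative: spatially dispersed
```

Values near zero, as in my results, indicate a random pattern; the tool's HTML report turns this statistic into the bell-curve graphic in Figure 3.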
Next, I ran the Geographically Weighted Regression tool with the ObsData911Calls layer as the input, Calls as the dependent variable, the Explanatory Variables set to Pop, Jobs, LowEduc, and Dst2UrbCen, the Output Feature Class as ResultsGWR.shp, the Kernel Type as ADAPTIVE, and the Bandwidth Method set to AICc. AICc allows the tool to determine the optimal number of neighbors. Finally, I used the GWR model to create a predictive analysis of future 911 calls. I used the call data as the input feature class, Calls as the dependent variable, and Pop, Jobs, LowEduc, and Dst2UrbCen as the explanatory variables; named my output shapefile; set the kernel type to ADAPTIVE, the bandwidth method to Bandwidth Parameter, and the number of neighbors to 50; set the prediction locations to Prediction Locations and the prediction explanatory variables to PopFY, JobsFY, LowEducFY, Dst2UrbCen (matching this order exactly); and finally named my output prediction feature class. This model produced predictions for the overall urban area of Portland, OR using the four explanatory variables stated above and was my final output of this tutorial (Figure 4). Predictions can be symbolized as standard deviations from the mean, as seen in Figure 4.
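Conceptually, GWR fits a separate weighted least-squares model at every location, down-weighting distant observations with a kernel. The sketch below uses a fixed Gaussian kernel on synthetic data (the tool's ADAPTIVE kernel instead varies the bandwidth with neighbor density); none of the numbers come from the tutorial:

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Conceptual GWR: one weighted least-squares fit per location,
    with nearby observations weighted more heavily via a Gaussian
    kernel.  Returns an array of local coefficient vectors."""
    coefs = []
    for pt in coords:
        dist = np.linalg.norm(coords - pt, axis=1)
        w = np.exp(-(dist / bandwidth) ** 2)   # Gaussian kernel weights
        W = np.diag(w)
        # Weighted normal equations: (X'WX) b = X'Wy
        b = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        coefs.append(b)
    return np.array(coefs)

# Toy data: the population-to-calls relationship strengthens from
# west to east, which a single global OLS slope would average away.
rng = np.random.default_rng(1)
n = 100
coords = rng.uniform(0, 10, (n, 2))
pop = rng.uniform(500, 5000, n)
local_slope = 0.01 + 0.002 * coords[:, 0]      # slope varies with x
calls = local_slope * pop + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), pop])
coefs = gwr_coefficients(coords, X, calls, bandwidth=2.0)

# Local slopes should increase from west (small x) to east (large x).
west = coefs[coords[:, 0] < 3, 1].mean()
east = coefs[coords[:, 0] > 7, 1].mean()
print(west, east)
```

Mapping these location-specific coefficients is what produces the four varying surfaces in Figure 2, and applying them to future-year variables (PopFY, JobsFY, and so on) yields the predictions in Figure 4.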
Figure 4. This map indicates the confidence level of the predicted trends in 911-call frequency throughout the Portland, OR urban area. The standard deviation representation refers back to Figure 3, where areas of random distribution fall in the center of the bell curve and areas of significance fall in its tails. Correspondingly, this map shows areas in beige as having random distribution and areas of red and blue as significant. Beige areas fall within -0.5 to 0.5 standard deviations of the mean, while blue and red areas range from -2.5 to -0.5 and from 0.5 to 2.5 standard deviations from the mean, respectively.
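The symbology described in the caption, expressing each prediction as standard deviations from the mean and binning it into beige, blue, and red classes, can be sketched as follows (the prediction values are invented for illustration):

```python
import numpy as np

# Made-up predicted call volumes for six tracts.
predictions = np.array([40.0, 55.0, 61.0, 70.0, 90.0, 30.0])

# Standardize: how many standard deviations from the mean?
z = (predictions - predictions.mean()) / predictions.std()

# Bin into the caption's intervals.
bins = [-2.5, -0.5, 0.5, 2.5]
labels = ["below -2.5", "blue (-2.5 to -0.5)", "beige (-0.5 to 0.5)",
          "red (0.5 to 2.5)", "above 2.5"]
classes = [labels[i] for i in np.digitize(z, bins)]
print(classes)
```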
Applications: A regression analysis could be used for virtually any quantitative data. If you have numbers associated with two variables, you can easily perform a regression analysis to determine the strength of correlation between them. The tighter the points on the scatterplot, the stronger the correlation between variables. For my ENVS 422 research project, I can apply regression analysis to indicate areas of vulnerability. By applying multiple variables to a model such as the GWR model used in the tutorial, I would be able to view current and predicted vulnerability of areas across the United States. This is an important application for natural hazards planning because our world is rapidly changing in climate and infrastructure, both key factors that determine vulnerability. Some specific applications of regression analysis include climate, urban, monetary, and disaster analysis. Climate analysis is a good example of regression modeling because you could take two variables, such as precipitation and occurrence of flooding, and determine the strength of correlation between them. Areas of high precipitation may be expected to have high occurrences of flooding. Urban analysis would benefit from regression for future predictions: if population census data were paired with a variable such as county data, one could predict future growth rates from the regression. Since the nature of a regression analysis is to produce a trend line, one could extend that line to forecast future trends. Monetary applications can be used similarly to urban applications. For instance, I could predict future economic trends or future home values. Monetary predictions are arguably the most likely to change, but it would still be possible to predict general trends. This would also be useful for the value of currencies, especially
in relation to one another. As stated above, a monetary analysis would be beneficial to natural hazards planning. One of the biggest factors we face in natural hazards damage today is the cost accrued by these events. As the value of our assets increases, so do the costs of damages after a hazard event. Insurance rates, office space rental, and bridge tolls may all be affected by price increases. Regression modeling has the potential to predict these increases before they become a reality. All in all, regression modeling is a key tool for analyzing almost anything. Future modeling is possibly one of the most valuable applications in statistical analysis today. With a country becoming more heavily dependent on technology and computers, regression modeling and analysis is a key way of expressing and explaining information. With only a few variables, you can present countless valuable outcomes.
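The "tighter scatter means stronger correlation" idea from the climate example can be illustrated with the correlation coefficient; the precipitation and flooding figures below are invented for the example:

```python
import numpy as np

# Hypothetical annual precipitation (inches) for six regions.
precip = np.array([20, 35, 50, 65, 80, 95])

# Two made-up flooding series sharing the same underlying trend:
# one with small scatter around the trend, one with large scatter.
floods_tight = 0.1 * precip + np.array([0.2, -0.1, 0.1, -0.2, 0.0, 0.1])
floods_loose = 0.1 * precip + np.array([3.0, -2.5, 2.0, -3.0, 1.5, -2.0])

# Pearson correlation: tighter scatter yields r closer to 1.
r_tight = np.corrcoef(precip, floods_tight)[0, 1]
r_loose = np.corrcoef(precip, floods_loose)[0, 1]
print(r_tight, r_loose)
```

The same trend line fit to both series would look similar, but the correlation coefficient makes the difference in scatter, and therefore in predictive confidence, explicit.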