Statistical Analysis of Google Flu Trends


A PROJECT SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA

BY Melissa Sandahl

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

Yang Li and Kang James

July, 2016

© Melissa Sandahl 2016. ALL RIGHTS RESERVED

Acknowledgements

Firstly, I would like to thank my advisors Yang Li and Kang James for all of their time, guidance and expert support while completing this project. I would also like to thank Xuan Li for serving on my committee and for the positive support. Lastly, I am very thankful for everything that the University's Mathematics & Statistics department has provided me with and helped me accomplish over the last two years.

Abstract

Predicting the behavior of influenza is crucial to helping health officials prepare for and decrease possible outbreaks of the infectious disease. This project discusses methods for testing Google flu count data for spatial autocorrelation, seasonality and temporal effects. We will generate an appropriate seasonal ARIMA model to fit the data for the overall nation as well as use the statistical program R to develop multiple state models. Lastly, the Ljung-Box test will be applied to test for goodness of fit and model adequacy. The goal of this project is to be able to forecast future influenza outbreaks from Google flu trends across the United States in hopes of increasing preparation standards.

Contents

Acknowledgements
Abstract
List of Tables
List of Figures
1 Introduction
2 Spatial Data
  2.1 Spatial Weights Matrix
  2.2 Spatial Dependence
    2.2.1 Global Test for Spatial Autocorrelation
    2.2.2 Local Test for Spatial Autocorrelation
3 Time Series Analysis
4 Seasonal ARIMA Model
  4.1 The General Model
    4.1.1 Simple ARIMA Example
  4.2 ACF and PACF
  4.3 R ARIMA Model
  4.4 State Models
  4.5 US Model Accuracy
  4.6 State Model Accuracy
5 Model Forecasting
6 Conclusion and Discussion
References
Appendix A. Spatial Matrix
Appendix B. Moran I Proof

List of Tables

2.1 Global Moran Is: January - June
2.2 Global Moran Is: July - December
4.1 ARIMA model generated from ACF and PACF plots after differencing
4.2 Results from ARIMA model generated from R
4.3 Seasonal ARIMA models selected by R: Alabama - Montana
4.4 Seasonal ARIMA models selected by R: Nebraska - Wyoming
4.5 Seasonal ARIMA model statistics: Alabama - Montana
4.6 Seasonal ARIMA model statistics: Nebraska - Wyoming

List of Figures

2.1 Average Monthly Flu Counts
2.2 Average Monthly Flu Counts Per Person
2.3 Time Series of observed Global Morans
2.4 Local Moran I values for Alabama
2.5 Local Moran I values for Montana
2.6 Local Moran I values for North Dakota
2.7 Local Moran I values for South Dakota
2.8 Time series of Local Moran I values for Alabama, Montana and North Dakota
3.1 Times Series of original data collected
3.2 Times series of data after it was logged
4.1 ACF and PACF for US monthly data
4.2 Times Series of the data after 12 month differencing
4.3 ACF and PACF for US monthly data after differenced for 12 months
4.4 Time Series, ACF, and PACF of the model's residuals
5.1 Forecast for ARIMA(1, 0, 1) × (2, 0, 0)_12 model
A.1 Map of the continental states with corresponding numbers [13]
A.2 List of Neighbors for the 48 States [13]

Chapter 1 Introduction

Accurate models that predict influenza outbreaks across the United States can help public health officials track and prepare for future events and, hopefully, decrease the number of deaths. Nowadays, large data sets for diseases and epidemics can be collected quickly and easily through internet-based programs. However, generating useful models to help predict epidemics is largely dependent on the availability and accuracy of this data [14].

The data for this project came from Google Flu Trends [5]. Google keeps track of the number of times that the word "flu" or flu-like symptoms are searched on its website and maintains a database with this information on a weekly basis. The data is collected both nationwide and by each individual state. Studies of Google-search-based tracking models have been done, such as the AutoRegression with Google data (ARGO) model developed by Yang, Santillana and Kou [14] and the Google Flu Trend with a state-space SEIR model by Dukic, Lopes and Polson [3]. Both models were created with hopes of tracking disease behavior at different temporal and spatial inputs.

We will start the project by using the data collected for each state to investigate whether any spatial autocorrelation is present. When testing for spatial autocorrelation, we must build a spatial weights matrix which depicts the locational similarities between the areas in the study. We will test for both global and local spatial autocorrelation in the data in order to get more accurate results.

Next, we will analyze the data as a seasonal time series. Using the ACF and PACF plots, we will determine if any lags in the data are significant in developing an appropriate seasonal ARIMA model that fits the data. For this project we will test the model's accuracy using the Ljung-Box test, which tests whether the data is independently distributed.

Our goal for this project is to create a model that can forecast future behavior of Google flu trends on the nationwide and/or state level. In doing so, we would be able to predict if one state will have a rise in flu cases based on the frequency of Google flu counts in neighboring states and previous time periods.

Chapter 2 Spatial Data

Spatial data is the term used to describe data that has a spatial or geographical component associated with it. This data can consist of characteristics or, more commonly, numerical observations. There are two common spatial structures that are used when modeling regional spatial data. The first determines the proximity of areas based on the distances between the centroids of each areal unit. It is assumed that the observation is observed at the centroid of each region, and a spatial covariance structure is then developed based on the distances between the centroids. This strategy does not take into account the fact that the centroid does not accurately describe the behavior across the entire region [13].

The second method, and the one used in this project, uses a neighborhood structure to create the spatial covariance matrix. Here the proximity of the areal units is determined by the borders shared in the regional lattice structure. Thus, regions are considered neighbors when they share a common border. Unfortunately, this method uses irregular lattices and has not been studied as in-depth as the first method.

In this project, we used area data of Google flu counts for the 48 continental U.S. states, collected from Google [5]. Area data is defined as the type of spatial data where the observations are associated with a fixed set of areal units. As stated before, we used a neighborhood structure that contained areas/zones with irregular boundaries [4].

2.1 Spatial Weights Matrix

We utilized what's known as a spatial weights matrix to determine spatial relationships within our data. Since we wanted to investigate if a rise in Google Flu Counts in one state would affect the states surrounding it, we decided to use the neighbors of each state to generate our spatial weights matrix. Thus, when two states, areas $i$ and $j$, share a common border, the entries $W_{i,j}$ and $W_{j,i}$ will be given a value of 1. All diagonal entries $W_{i,i}$ will be assigned a value of 0 [4].

The Spatial Weights Matrix $W$ of the 48 continental U.S. states is:

\[
W = \begin{pmatrix}
0 & W_{1,2} & \cdots & W_{1,48} \\
W_{2,1} & 0 & \cdots & W_{2,48} \\
\vdots & \vdots & \ddots & \vdots \\
W_{48,1} & W_{48,2} & \cdots & 0
\end{pmatrix},
\qquad
W_{i,j} = \begin{cases} 1 & \text{if area } j \text{ is neighbors with area } i \\ 0 & \text{otherwise.} \end{cases}
\]

See Appendix A for a map of the 48 areal units labeled in their corresponding alphabetical order and a list of all the neighbors for each state. Before we moved on with the data, we decided to row standardize $W$ so that we could interpret each entry as the portion of spatial influence that area $j$ has on area $i$.
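The construction and row standardization of the weights matrix can be sketched in a few lines. This is a minimal illustration in Python rather than the R workflow used for the project; the function names and the four-area toy lattice are assumptions made for the example and are not the 48-state neighbor list from Appendix A.

```python
def build_weights(neighbors, n):
    """Return the n x n binary contiguity matrix W for areas labeled 1..n."""
    W = [[0.0] * n for _ in range(n)]
    for i, adjacent in neighbors.items():
        for j in adjacent:
            # A shared border is symmetric, so set both W[i][j] and W[j][i].
            W[i - 1][j - 1] = 1.0
            W[j - 1][i - 1] = 1.0
    return W


def row_standardize(W):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    standardized = []
    for row in W:
        total = sum(row)
        standardized.append([w / total if total > 0 else 0.0 for w in row])
    return standardized


# Hypothetical toy lattice: area 1 borders 2 and 3, area 2 borders 3,
# and area 3 borders 4. The diagonal stays 0 by construction.
toy_neighbors = {1: [2, 3], 2: [3], 3: [4]}
W = build_weights(toy_neighbors, 4)
W_std = row_standardize(W)
```

After row standardization, each entry in row $i$ can be read as the share of area $i$'s total neighbor influence attributed to area $j$; for example, area 3 has three neighbors, so each receives a weight of one third.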

2.2 Spatial Dependence

In order to create a useful model with this data, we must first check if there is spatial dependence among the observations. We will investigate if there is spatial autocorrelation in the number of Google Flu Counts based on the proximity of the 48 continental states. We will run two tests to check for spatial dependence: one on a global scale, and a second that looks at local spatial dependence for each location.

Before we ran any tests, we wanted to look at what was happening with the flu counts per person in each state at multiple time periods. Figure 2.2 below reveals what happened every other month across the United States. We observed that there seems to be a larger percentage of people searching flu trends in the upper mid-west all year round than in other areas across the country.

When looking for spatial dependence, we are comparing the similarity between the observations for each of the 48 locations. We will use all of the following variables to help us test for spatial autocorrelation:

$n$: number of areas in sample,
$i, j$: two different areas in sample,
$z_i$: observation collected from area $i$,
$\bar{z}$: average of all $n$ observations,
$W_{ij}$: similarity in location of areas $i$ and $j$,
$M_{ij}$: similarity in observations of areas $i$ and $j$.

To see visually what the counts we collected looked like, we took the average monthly data and generated density plots of the geographical region. Figure 2.1 displays the density counts for every other month.

We noticed that the states with the largest density of counts were the ones with the largest populations, e.g. Texas, Arizona, California. For that reason, Figure 2.1 is not useful for making any conjectures, since it would not be spatial effects that cause those states to have higher Google flu counts. To account for this, we divided our monthly counts by each state's population so we could use per capita data. We then generated the same density plots with the monthly average per capita data; see Figure 2.2. When we graphed the newly transformed data, we saw that clustering no longer happens primarily in the southern region of the US. We now see clustering mainly in the upper mid-west region instead. One possible explanation is that this region generally has harsher winters and colder temperatures year round, which would increase the chances of contracting influenza.

Figure 2.1: Average Monthly Flu Counts (map panels: Average January, March, May, July, September and November Counts; axes: x, y; color scale: Count)

Figure 2.2: Average Monthly Flu Counts Per Person (map panels: Average January, March, May, July, September and November Logged Counts; axes: x, y; color scale: Count)

2.2.1 Global Test for Spatial Autocorrelation

Global spatial autocorrelation measures and tests use the entire spatial weights matrix $W$ to determine if there is spatial autocorrelation over the total area in the study, whereas local measures calculate a statistic for each area in the study and use a smaller, restricted set of areal units [4]. When measuring global spatial autocorrelation, we compared the similarities in the observations $M_{ij}$ with the similarities in the locations $W_{ij}$ by using the cross-product:

\[ \sum_{i=1}^{n} \sum_{j=1}^{n} M_{ij} W_{ij} \tag{2.1} \]

The two most commonly used methods for finding spatial autocorrelation for areal units are the Moran's I and Geary's c statistics. Both statistics determine the overall degree of spatial correlation in the data set as a whole. For this project, we used Moran's I statistic to determine if there is global spatial autocorrelation present in our data. Moran's I statistic uses the cross-products to measure value similarity, $M_{ij} = (z_i - \bar{z})(z_j - \bar{z})$, whereas Geary's c uses squared differences such as $(z_i - z_j)^2$. The global Moran I statistic is:

\[ I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\left( \sum_{i=1}^{n} \sum_{j \neq i} W_{ij} \right) \sum_{i=1}^{n} (z_i - \bar{z})^2} \tag{2.2} \]

\[ E[I] = -\frac{1}{n - 1} \tag{2.3} \]

\[ \operatorname{var}(I) = \frac{n^2 (n - 1) W_1 - n (n - 1) W_2 - 2 W_0^2}{(n + 1)(n - 1)^2 W_0^2} \tag{2.4} \]

where

\[ W_0 = \sum_{i=1}^{n} \sum_{j \neq i} W_{ij} \tag{2.5} \]

\[ W_1 = \frac{1}{2} \sum_{i=1}^{n} \sum_{j \neq i} (W_{ij} + W_{ji})^2 \tag{2.6} \]

\[ W_2 = \sum_{k=1}^{n} \left( \sum_{j=1}^{n} W_{kj} + \sum_{i=1}^{n} W_{ik} \right)^2 \tag{2.7} \]

Please see Appendix B for the proof of the expected value for Moran I.

The null hypothesis associated with the Moran I statistic is that the spatial processes influencing any spatial relationships are randomly placed and there is no spatial autocorrelation. To test for the significance of spatial autocorrelation, R will randomly assign the observations to the areal units and calculate the Moran I for a large number of these random assignments. The observed Moran I is then compared to the random set of Is, and if the actual observed I falls below the 5th percentile or above the 95th percentile then there is spatial autocorrelation present at the α = 0.05 level [6]. Therefore, we looked for p-values that were significant (< 0.05), which would imply that we could reject the null hypothesis and conclude that there is spatial autocorrelation present [4].

When spatial autocorrelation is present, for large data sets, the observed Moran I statistic will be large relative to the expected value of the statistic under the null hypothesis of no spatial relation. To see this, consider when two neighboring areas $i$ and $j$ both have high observation values. They will both be larger than the average, $\bar{z}$, and the cross product $(z_i - \bar{z})(z_j - \bar{z})$ will be a large positive value.

Using the Moran.I() command in the package {ape} in R, we found the Global Moran I statistics for all 84 time periods in the study. These results are shown in Tables 2.1 and 2.2 below.
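To make the global statistic and the random-assignment test concrete, here is a small Python sketch. It is not the implementation behind Moran.I() in {ape}; the function names, the four-area chain of neighbors and the observation values are all hypothetical, and the p-value shown is a one-sided (upper tail) permutation estimate chosen for simplicity.

```python
import random


def morans_i(z, W):
    """Global Moran's I for observations z and a weights matrix W with zero diagonal."""
    n = len(z)
    z_bar = sum(z) / n
    dev = [zi - z_bar for zi in z]
    # Numerator: weighted cross-products of deviations from the mean.
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    # W_0: sum of all off-diagonal weights.
    w0 = sum(W[i][j] for i in range(n) for j in range(n) if j != i)
    return n * cross / (w0 * sum(d * d for d in dev))


def expected_i(n):
    """Expected value of Moran's I under the null hypothesis."""
    return -1.0 / (n - 1)


def permutation_p_value(z, W, n_perm=999, seed=0):
    """Share of random reassignments whose I is at least as large as the observed I."""
    rng = random.Random(seed)
    observed = morans_i(z, W)
    shuffled = list(z)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, W) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical chain of four areas: 1-2, 2-3, 3-4 are neighbors.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
z = [1.0, 2.0, 3.0, 4.0]  # values rise smoothly along the chain

I_obs = morans_i(z, W)
p_upper = permutation_p_value(z, W)
```

In this toy case neighboring areas have similar values, so the observed I (one third) sits above its null expectation of minus one third, which is the pattern the text describes for positive spatial autocorrelation.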

Table 2.1: Global Moran Is: January - June 2011 (columns: Date, Observed I, Expected I, St. Dev, P val; 42 monthly rows)

Table 2.2: Global Moran Is: July - December 2014 (columns: Date, Observed I, Expected I, St. Dev, P val; 42 monthly rows)

A graph of the observed Global Moran I values for all 84 time periods is shown below in Figure 2.3. The reference line for the expected Global Moran I is also included in the plot, and one standard deviation above and below the observed value is indicated. Notice that none of the observed Global Moran values are at least one standard deviation away from the expected value, but we did find five months that displayed spatial autocorrelation by testing the I values from Tables 2.1 and 2.2. This result is not very strong, and we could not conclude that there is global spatial autocorrelation in the data. Thus, we decided to test each state for any local spatial autocorrelation.

Figure 2.3: Time Series of observed Global Morans (plot title: Global Moran Is; x-axis: Time; y-axis: Observed Global Moran Is)

2.2.2 Local Test for Spatial Autocorrelation

Since we did not find any significant global spatial autocorrelation, our next step was to test for signs of local spatial autocorrelation. Local statistics are used to determine if each areal unit has a large amount of spatial clustering or if there are notable similarities in the observations of surrounding areas. Local indicators of spatial association (LISA) were created by Luc Anselin to determine the influence of each individual observation, as opposed to looking at the entire sample area [2].

The test for local spatial autocorrelation utilizes the cross product:

\[ \sum_{j=1}^{n} M_{ij} W_{ij} \tag{2.8} \]

which uses comparisons of spatial autocorrelations for a specific observation or areal unit. As in the global test, $M_{ij} = (z_i - \bar{z})(z_j - \bar{z})$. Also, we now have neighborhood sets $J_i$, which are the collection of neighbors for area $i$, and $\bar{z}$ now represents the average observation value for just area $i$'s neighboring states. The local Moran $I_i$ statistic for area $i$ is:

\[ I_i = (z_i - \bar{z}) \sum_{j \in J_i} W_{ij} (z_j - \bar{z}) \tag{2.9} \]

\[ E[I_i] = -\frac{1}{n - 1} \sum_{j=1}^{n} W_{ij} \tag{2.10} \]

\[ \operatorname{var}[I_i] = \frac{1}{n - 1} W_{i(2)} (n - b_2) + \frac{2}{(n - 1)(n - 2)} W_{i(kh)} (2 b_2 - n) - \frac{1}{(n - 1)^2} W_i^2 \tag{2.11} \]

where

\[ W_{i(2)} = \sum_{j \neq i}^{n} W_{ij}^2 \tag{2.12} \]

\[ 2 W_{i(kh)} = \sum_{k \neq i}^{n} \sum_{h \neq i}^{n} W_{ik} W_{ih} \tag{2.13} \]

\[ W_i = \sum_{j=1}^{n} W_{ij} \tag{2.14} \]

One thing to notice is that the sum of $I_i$ over all $i$ is proportional to the global Moran I statistic we used in Equation (2.2):

\[ \sum_{i=1}^{n} I_i = \sum_{i=1}^{n} (z_i - \bar{z}) \sum_{j \in J_i} W_{ij} (z_j - \bar{z}) \tag{2.15} \]

Using the localmoran() command in the package {spdep} in R, we again calculated Moran I statistics for each of the 84 time periods. However, when testing for local spatial autocorrelation we generated a test statistic for each of the 48 areas for each time period. Thus, we observed 4,032 local Moran I statistics. From these results, we generated a graph for each state that includes the calculated test statistic as well as one standard deviation above and below and the expected $I_i$ value for reference. From these, we found that only six states were consistently further than one standard deviation away from the expected value: Delaware, Montana, North Dakota, South Dakota, Texas and Wyoming. The corresponding graphs for Montana, North Dakota and South Dakota are displayed below, along with the plot for Alabama, which did not display significant spatial autocorrelation.
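The local statistic and its relationship to the global cross-product can be checked numerically. The sketch below is an illustrative Python version, not the routine in {spdep}; it assumes deviations are taken from the overall sample mean and uses a hypothetical four-area chain with made-up observations.

```python
def local_morans_i(z, W):
    """Local Moran I_i for each area, using deviations from the overall mean (an assumption)."""
    n = len(z)
    z_bar = sum(z) / n
    dev = [zi - z_bar for zi in z]
    # Only neighbors (W[i][j] > 0) contribute to the inner sum.
    return [dev[i] * sum(W[i][j] * dev[j] for j in range(n) if j != i)
            for i in range(n)]


def expected_local_i(W, i):
    """Expected value of I_i under the null: minus the row sum over (n - 1)."""
    n = len(W)
    return -sum(W[i]) / (n - 1)


# Hypothetical chain of four areas: 1-2, 2-3, 3-4 are neighbors.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
z = [1.0, 2.0, 3.0, 4.0]

local_i = local_morans_i(z, W)
total = sum(local_i)  # equals the double-sum numerator of the global statistic
```

Here the four local values add up to 2.5, which is exactly the weighted cross-product sum in the numerator of the global Moran I for the same toy data, so the two statistics differ only by a scaling factor.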

Figure 2.4: Local Moran I values for Alabama (plot title: Local Moran I: Alabama; x-axis: Time; y-axis: Observed Local Moran Is)

Figure 2.5: Local Moran I values for Montana (plot title: Local Moran I: Montana; x-axis: Time; y-axis: Observed Local Moran Is)

Figure 2.6: Local Moran I values for North Dakota (plot title: Local Moran I: North Dakota; x-axis: Time; y-axis: Observed Local Moran Is)

Figure 2.7: Local Moran I values for South Dakota (plot title: Local Moran I: South Dakota; x-axis: Time; y-axis: Observed Local Moran Is)

Figure 2.8: Time series of Local Moran I values for Alabama, Montana and North Dakota (plot title: Observed Local Moran I values; x-axis: Time; y-axis: Observed Local Moran I's)

Figure 2.8 displays the trends in the observed local Moran Is for three of the states. Alabama is used as a reference for a state that does not display any type of local spatial autocorrelation, whereas both Montana's and North Dakota's Moran Is indicated significant local spatial autocorrelation. However, despite having six states with significant local spatial autocorrelation, our results are similar to those for global spatial autocorrelation. We concluded that our data did not show enough significant local spatial autocorrelation to continue to include it in our model building.

Chapter 3 Time Series Analysis

For this project, we are using average monthly data observed for 7 years, for a total of 84 data points for each location. Since the data we collected was in terms of weeks, we manually converted the data to monthly averages. Weekly data was listed by the first day of the week in which the counts started. Thus, if a week started at the end of January but also contained days in February, it was only taken into account for January's average.

A time series plot of the raw data is shown in Figure 3.1 below. Notice that there is a large amount of variance between the peaks. Looking at Figure 3.2, we can see that after we take the log of the data, there is much less variance in the output. The logged data now ranges roughly from 6 to 10, a far narrower range than that of the raw data. Also, since our data is in the form of counts, taking the log will help normalize the data. We will continue our project by using the log-transformed data to generate useful models.

Another option we could have tried would have been to difference the data. Differencing is commonly used when you have a seasonal time series. The seasonality typically causes a time series to be nonstationary, since the seasonality is the main factor affecting the output at certain time periods. Differencing, which computes the difference between consecutive observations, can help make a time series stationary and remove patterns in the data. However, we decided not to use differencing and to keep the seasonal effects in the data for modeling purposes that we will discuss later in the project [7].
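The weekly-to-monthly conversion and log transform described above can be sketched as follows. This is a hedged Python illustration of the stated rule (a week counts only toward the month of its first day); the function name, the natural-log choice and the three sample weeks are assumptions for the example, not values from the Google data.

```python
import math
from collections import defaultdict
from datetime import date


def monthly_log_averages(weekly):
    """Map (year, month) to the log of the mean weekly count credited to that month.

    weekly is a list of (week_start_date, count) pairs.
    """
    buckets = defaultdict(list)
    for week_start, count in weekly:
        # A week is credited entirely to the month containing its first day,
        # even if some of its days fall in the following month.
        buckets[(week_start.year, week_start.month)].append(count)
    return {key: math.log(sum(values) / len(values))
            for key, values in sorted(buckets.items())}


# Hypothetical weekly counts: the week starting January 31 spills into
# February but is averaged into January only.
sample_weeks = [
    (date(2010, 1, 24), 1000),
    (date(2010, 1, 31), 3000),
    (date(2010, 2, 7), 500),
]
logged = monthly_log_averages(sample_weeks)
```

With these made-up numbers, January averages 2000 and February averages 500, so the logged values are about 7.6 and 6.2, in line with the 6 to 10 range reported for the logged series.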

Next, notice that there is an obvious spike in the data that occurs once every year. These spikes represent flu season, which typically peaks between December and February, with the majority of years peaking in February [12]. Because of this, we incorporated a 12 month seasonal effect into our models. Another observation of the data is the atypical spike for the flu season. According to the Centers for Disease Control and Prevention (CDC), the flu vaccine was 52% effective at preventing acute respiratory illness that required medical attention that year. However, the vaccine's effectiveness was lower for those aged 65 years and older, which could have been one factor in the flu season's severity [12].

Figure 3.1: Times Series of original data collected (plot title: Time Series of Monthly Average US Raw Data; x-axis: Dates; y-axis: Raw Monthly Average Flu Counts)

Figure 3.2: Times series of data after it was logged (plot title: Time Series of Monthly Average US Logged Data; x-axis: Dates; y-axis: Logged Monthly Average Flu Counts)

Chapter 4 Seasonal ARIMA Model

Since we are using monthly data to help us determine a model for predicting Google flu counts, we would expect our model to be seasonal. Having seasonality in a time series means there is a pattern present that repeats every $m$ time periods. Thus, since flu season generally peaks around the same time each year, we would expect our data to have a recurring pattern every 12 months.

4.1 The General Model

The seasonal ARIMA model includes both seasonal and non-seasonal autoregressive and moving average components. The model also incorporates the differencing mentioned earlier. The backshift operator, $B$, is defined by:

\[ B^m x_t = x_{t-m} \tag{4.1} \]

The notation for the general model is:

\[ \text{ARIMA}(p, d, q) \times (P, D, Q)_m \]

$p$: non-seasonal AR order,
$d$: non-seasonal differencing,
$q$: non-seasonal MA order,
$P$: seasonal AR order,
$D$: seasonal differencing,
$Q$: seasonal MA order,
$m$: number of months per season.

The model can be written as:

\[ \Phi(B^m) \phi(B) (1 - B)^d (1 - B^m)^D (x_t - \mu) = \Theta(B^m) \theta(B) w_t \tag{4.2} \]

The non-seasonal AR component is written as:

\[ \phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p \tag{4.3} \]

The non-seasonal MA component is written as:

\[ \theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q \tag{4.4} \]

The seasonal AR component is written as:

\[ \Phi(B^m) = 1 - \Phi_1 B^m - \dots - \Phi_P B^{mP} \tag{4.5} \]

The seasonal MA component is written as:

\[ \Theta(B^m) = 1 + \Theta_1 B^m + \dots + \Theta_Q B^{mQ} \tag{4.6} \]

4.1.1 Simple ARIMA Example

To illustrate a simple example of a seasonal ARIMA model, let's look at an ARIMA$(0, 1, 1) \times (1, 0, 1)_{12}$ model. Our model would start as:

\[ \Phi(B^{12}) (1 - B) (x_t - \mu) = \Theta(B^{12}) \theta(B) w_t \tag{4.7} \]

Now, replace $(x_t - \mu)$ with $z_t$ and substitute in the appropriate equations to get:

\[ (1 - \Phi B^{12})(1 - B) z_t = (1 + \Theta B^{12})(1 + \theta B) w_t \tag{4.8} \]

\[ (1 - B - \Phi B^{12} + \Phi B^{13}) z_t = (1 + \Theta B^{12} + \theta B + \Theta \theta B^{13}) w_t \tag{4.9} \]

Thus, we get the resulting equation:

\[ z_t = z_{t-1} + \Phi z_{t-12} - \Phi z_{t-13} + w_t + \Theta w_{t-12} + \theta w_{t-1} + \Theta \theta w_{t-13} \tag{4.10} \]
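Expanding products of backshift polynomials by hand is error-prone, so it can help to check the lags and signs mechanically. The sketch below represents each polynomial in $B$ as a dictionary from power to coefficient and multiplies them; the helper name and the numeric values chosen for $\Phi$, $\Theta$ and $\theta$ are hypothetical and serve only to illustrate the ARIMA$(0, 1, 1) \times (1, 0, 1)_{12}$ example, with the seasonal operators acting at lag 12.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B, each given as a {power: coefficient} dict."""
    product = {}
    for power_a, coef_a in a.items():
        for power_b, coef_b in b.items():
            power = power_a + power_b
            product[power] = product.get(power, 0.0) + coef_a * coef_b
    return product


# Hypothetical coefficient values, for illustration only.
Phi, Theta, theta = 0.5, 0.3, 0.2

# Left side: seasonal AR (1 - Phi B^12) times the first difference (1 - B).
ar_side = poly_mul({0: 1.0, 12: -Phi}, {0: 1.0, 1: -1.0})

# Right side: seasonal MA (1 + Theta B^12) times non-seasonal MA (1 + theta B).
ma_side = poly_mul({0: 1.0, 12: Theta}, {0: 1.0, 1: theta})
```

The resulting keys are the lags that appear in the model equation: 0, 1, 12 and 13 on both sides. Moving every non-zero-lag term of the left side across the equals sign flips its sign, which is how the coefficients on $z_{t-1}$, $z_{t-12}$ and $z_{t-13}$ arise.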

4.2 ACF and PACF

Next, we created ACF and PACF plots for the nationwide data, which are shown in Figure 4.1.

Figure 4.1: ACF and PACF for US monthly data (series: USData; panels: ACF, PACF; x-axis: LAG)

We observed that the ACF plot in Figure 4.1 has cyclic spikes that do not seem to dampen in magnitude at any lag value. Thus, it would be difficult to determine a model for this data based on these plots.

With seasonal data, it is sometimes helpful to analyze the ACF and PACF plots after differencing the data. Seasonality commonly brings about nonstationarity in data, since it's typical for seasonal data to fluctuate and peak during certain time periods. Thus, we applied a 12 month difference to our data; the time series plot after the differencing can be found in Figure 4.2. Notice that the new plot no longer has the consistent peak and valley pattern that was present back in Figure 3.2. We then generated ACF and PACF plots for the data after applying a 12 month difference, which can be found in Figure 4.3.
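The 12 month difference and the sample autocorrelations behind an ACF plot are simple to compute directly. The following Python sketch is an illustration only: the function names are assumptions, and the series is a made-up repeating seasonal pattern plus a linear trend rather than the logged flu counts.

```python
def seasonal_difference(x, m=12):
    """Return x_t - x_{t-m} for every t that has an observation m periods earlier."""
    return [x[t] - x[t - m] for t in range(m, len(x))]


def acf(x, max_lag):
    """Sample autocorrelations of x at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [value - mean for value in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


# Hypothetical three-year monthly series: a fixed seasonal shape plus a
# trend of one unit per month.
season = [5, 4, 3, 2, 1, 1, 1, 2, 3, 4, 5, 6]
series = [season[t % 12] + t for t in range(36)]

differenced = seasonal_difference(series, m=12)
autocorr = acf(series, max_lag=12)
```

Because the seasonal shape repeats exactly, the 12 month difference removes it and leaves only the constant yearly increase of 12, which mirrors how the peak and valley pattern disappears after differencing. Note that 12 observations are lost in the process.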

Figure 4.2: Times Series of the data after 12 month differencing (plot title: Time Series after Differenced for 12 Months; x-axis: Dates; y-axis: 12 Month Differenced Flu Counts)

Figure 4.3: ACF and PACF for US monthly data after differenced for 12 months (series: Differenced; panels: ACF, PACF; x-axis: LAG)

From analyzing Figure 4.3 we were able to make an educated guess on what parameters to include in a seasonal ARIMA model. We used the ACF plot to help us determine the values for the seasonal and non-seasonal MA orders and the PACF plot for the number of seasonal and non-seasonal AR parameters. Looking at just the first couple of lags in the ACF, we determined that a non-seasonal MA order of 1 would fit, since there is only a spike at lag 1. Next, we analyzed the lags at 12, 24 and 36 and decided to use a seasonal MA order of 1 as well, since the lag at time 12 seemed to be the only significant seasonal lag. We used the same methodology for picking the AR components of the model by analyzing the PACF plot, and decided to use a non-seasonal AR order of 2 and a seasonal AR order of 0. Thus, we created the following model:

\[ \text{ARIMA}(2, 0, 1) \times (0, 0, 1)_{12} \]

Applying these parameters to Equation (4.2) we get:

\[ \phi(B)(x_t - \mu) = \Theta(B^{12}) \theta(B) w_t \tag{4.11} \]

\[ (1 - \phi_1 B - \phi_2 B^2)(x_t - \mu) = (1 + \Theta B^{12})(1 + \theta B) w_t \tag{4.12} \]

For simplicity, let $z_t = (x_t - \mu)$ and multiply the 2 polynomials on the right side:

\[ (1 - \phi_1 B - \phi_2 B^2) z_t = (1 + \Theta B^{12} + \theta B + \Theta \theta B^{13}) w_t \tag{4.13} \]

\[ z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + w_t + \Theta w_{t-12} + \theta w_{t-1} + \Theta \theta w_{t-13} \tag{4.14} \]

Table 4.1: ARIMA model generated from ACF and PACF plots after differencing (columns: Type, Coef, S.E.; rows: AR 1 ($\phi_1$), AR 2 ($\phi_2$), MA 1 ($\theta$), SMA 1 ($\Theta$), Constant; AIC = 26.95)

Our final model (4.15) is Equation (4.14) with the coefficient estimates from Table 4.1 substituted.

4.3 R ARIMA Model

Now that we have analyzed the ACF and PACF plots for the data and generated a model using those, we decided to see what type of model R would choose to fit the data. We used the auto.arima() command in the package {forecast} in R to help us develop a potentially different model. This function selects the best model based on the Akaike information criterion (AIC). The AIC is used to estimate the quality of a model in relation to other models. Thus, it will determine which model is the most useful out of a set of multiple models, but it will not determine the significance of an individual model. Our results were as follows:

\[ \text{ARIMA}(1, 0, 1) \times (2, 0, 0)_{12} \]

Applying these parameters to Equation (4.2) we get:

\[ \Phi(B^{12}) \phi(B)(x_t - \mu) = \theta(B) w_t \tag{4.16} \]

\[ (1 - \Phi_1 B^{12} - \Phi_2 B^{24})(1 - \phi B)(x_t - \mu) = (1 + \theta B) w_t \tag{4.17} \]

For simplicity, let $z_t = (x_t - \mu)$ and multiply the first 2 polynomials on the left side to get:

\[ (1 - \Phi_1 B^{12} - \Phi_2 B^{24} - \phi B + \Phi_1 \phi B^{13} + \Phi_2 \phi B^{25}) z_t = (1 + \theta B) w_t \tag{4.18} \]

\[ z_t = \Phi_1 z_{t-12} + \Phi_2 z_{t-24} + \phi z_{t-1} - \Phi_1 \phi z_{t-13} - \Phi_2 \phi z_{t-25} + w_t + \theta w_{t-1} \tag{4.19} \]

Table 4.2: Results from ARIMA model generated from R (columns: Type, Coef, S.E.; rows: AR 1 ($\phi$), MA 1 ($\theta$), SAR 1 ($\Phi_1$), SAR 2 ($\Phi_2$), constant; AIC = 15.57)

Our final model (4.20) is Equation (4.19) with the coefficient estimates from Table 4.2 substituted.

Our first model, which we developed by analyzing the ACF and PACF plots, had an AIC value of 26.95, whereas the model chosen by R had an AIC value of only 15.57. Thus, we concluded that the model generated by R, ARIMA$(1, 0, 1) \times (2, 0, 0)_{12}$, is a better model for the data than the one we calculated ourselves.

4.4 State Models

Next, we ran auto.arima() on each of the 48 individual states in order to gather state-specific models. The models that R chose for each state are listed below in Tables 4.3 and 4.4.

Table 4.3: Seasonal ARIMA models selected by R: Alabama - Montana (columns: State, AR, MA, SAR, SMA, Period, NonSDiff, SDiff, AIC; rows: the 24 states from Alabama to Montana in alphabetical order)

Table 4.4: Seasonal ARIMA models selected by R: Nebraska - Wyoming (columns: State, AR, MA, SAR, SMA, Period, NonSDiff, SDiff, AIC; rows: the 24 states from Nebraska to Wyoming in alphabetical order)

Notice that the states display fairly different model specifications and that there is a wide variety of models associated with the 48 states. It does appear that ARIMA$(1, 0, 1) \times (2, 0, 0)_{12}$ is the most common model present in Tables 4.3 and 4.4. This makes sense, since this is the model that R also chose to describe the behavior of the overall country's flu counts. However, there are 13 states that include a seasonal difference in their model, which is something we did not include in our model.

4.5 US Model Accuracy

We then made a time series plot of the residuals for this model, as well as the ACF and PACF plots, which are shown in Figure 4.4 below.

Figure 4.4: Time Series, ACF, and PACF of the model's residuals (panels: res, ACF, PACF; x-axis: Lag)

There does not appear to be a trend in the time series of the residuals, which indicates a decent model. We also notice that there are no significant spikes in either the ACF or the PACF plot, which means that there are no trends in the lags of the model's residuals. Based on these plots, we concluded that this model does sufficiently describe the data.

Next, we ran a Ljung-Box test on the model to help us further determine the model's usefulness [8]. The Ljung-Box test is defined as:

$H_0$: The data are independently distributed
$H_a$: The data are not independently distributed (correlation is present)

with the test statistic:

\[ Q = T (T + 2) \sum_{g=1}^{h} \frac{\hat{\rho}_g^2}{T - g} \]

$T$: length of time series,
$\hat{\rho}_g$: sample autocorrelation at lag $g$,
$h$: number of lags being tested,
$K$: number of model parameters.

Under the null hypothesis, the test statistic $Q$ typically follows a Chi-squared distribution with $h$ degrees of freedom, $\chi^2(h)$. However, since we are using this test for an ARIMA model, we are testing whether the errors resemble independence and must adjust the degrees of freedom accordingly. The degrees of freedom for a seasonal ARIMA model end up being the number of lags being tested less the number of model parameters. Thus, with a significance level of $\alpha$, the rejection region for the hypothesis of randomness in the residuals is:

\[ Q > \chi^2_{1 - \alpha,\, h - K} \]

According to Rob Hyndman and George Athanasopoulos [9], there is not a standard value one should use for $h$, but as a rule of thumb for seasonal data you should let $h = \min(2m, T/5)$, where $T$ is the length of the time series.

Applying the model generated from R to this test at the $\alpha = 0.05$ significance level, with $K = 4$ model parameters, we get:

\[ h = \min\left(2 \cdot 12, \tfrac{84}{5}\right) \approx 17, \qquad \chi^2_{0.95,\,13} = 22.36 \]

We would reject the hypothesis of randomness in the residuals if $Q > 22.36$. Since the test statistic for our model falls below this critical value, we do not reject the hypothesis and conclude that the residuals appear to be independently distributed.

4.6 State Model Accuracy

Next, we ran the Ljung-Box test on the models that R generated for each individual state. The test statistics, degrees of freedom, p-values and AIC values are listed in Tables 4.5 and 4.6 below. We see that when using the same test we used for the overall US model, we only reject the hypothesis of independence for the state of Arkansas at the $\alpha = 0.05$ level. We can conclude that the residuals of the models for the remaining 47 states appear to be independently distributed and that these models are useful for our data.
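The Ljung-Box statistic and the rule of thumb for the number of lags can be computed directly from a residual series. This Python sketch is illustrative rather than the R routine used for the project: the function names are assumptions, the residual series is a deliberately non-random toy example, and the critical value 22.36 is taken from the text instead of being computed.

```python
def ljung_box_q(residuals, h):
    """Ljung-Box Q statistic using sample autocorrelations at lags 1..h."""
    T = len(residuals)
    mean = sum(residuals) / T
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for g in range(1, h + 1):
        rho_g = sum(dev[t] * dev[t - g] for t in range(g, T)) / c0
        total += rho_g * rho_g / (T - g)
    return T * (T + 2) * total


def rule_of_thumb_lags(T, m):
    """h = min(2m, T/5), rounded to a whole number of lags."""
    return round(min(2 * m, T / 5))


# 84 monthly observations with a 12 month season, as in the US model.
h = rule_of_thumb_lags(T=84, m=12)
K = 4                      # phi, theta, Phi_1, Phi_2
degrees_of_freedom = h - K

# Hypothetical residuals that alternate in sign, i.e. clearly correlated.
toy_residuals = [1.0, -1.0] * 42
q_toy = ljung_box_q(toy_residuals, h)
reject_independence = q_toy > 22.36   # critical value quoted in the text
```

For these alternating residuals Q is far above 22.36, so the hypothesis of independence would be rejected; a well-fitting model should instead produce a Q below the critical value, as reported for the US model.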

43 State Q-statistic DF P val AIC Alabama Arizona Arkansas California Colorado Connecticut Delaware Florida Georgia Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Table 4.5: Seasonal ARIMA model statistics: Alabama - Montana

44 Table 4.6: Seasonal ARIMA model statistics: Nebraska - Wyoming
Columns: State, Q-statistic, DF, P-value, AIC
States: Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, Wyoming

45 Chapter 5 Model Forecasting
Our final task is to forecast future values from the model that we generated. To do this, we used R's forecast() command and applied it to our ARIMA(1, 0, 1)(2, 0, 0)$_{12}$ model. The resulting forecast is shown below in Figure 5.1.
Figure 5.1: Forecast for ARIMA(1, 0, 1)(2, 0, 0)$_{12}$ model (axis labels: Month, Forecast)

46 The black line on the left represents the actual data that we collected, whereas the blue line shows the model's forecast values for 24 months into the future. The shaded areas are prediction intervals for the forecast: the light grey region is the 95% prediction interval and the purple area is the 80% prediction interval. The dotted line in Figure 5.1 is data that we collected from Google for January - July 2015. From this graph, the forecast looks to be a good fit: the observed values stay inside both the 80% and 95% prediction intervals.
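The visual check described here, whether held-out observations stay inside the prediction intervals, can also be done numerically. The following Python sketch is only an illustration: the function name is hypothetical and every number in it is made up for the example, not taken from the project's forecast or from Google's data.

```python
def inside_interval(observed, lower, upper):
    """Return one boolean per time point: True where lower <= observed <= upper."""
    return [lo <= y <= hi for y, lo, hi in zip(observed, lower, upper)]


# Hypothetical values for seven held-out months (illustration only).
observed = [7.9, 7.6, 7.2, 6.9, 6.7, 6.6, 6.5]

# Hypothetical 80% and 95% prediction bounds; the 95% band is wider.
lower_80 = [7.5, 7.2, 6.8, 6.5, 6.3, 6.2, 6.1]
upper_80 = [8.3, 8.0, 7.6, 7.3, 7.1, 7.0, 6.9]
lower_95 = [7.3, 7.0, 6.6, 6.3, 6.1, 6.0, 5.9]
upper_95 = [8.5, 8.2, 7.8, 7.5, 7.3, 7.2, 7.1]

covered_80 = inside_interval(observed, lower_80, upper_80)
covered_95 = inside_interval(observed, lower_95, upper_95)

print("all inside 80% interval:", all(covered_80))
print("all inside 95% interval:", all(covered_95))
```

Because the 80% interval lies inside the 95% interval, any observation covered by the narrower band is automatically covered by the wider one.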

47 Chapter 6 Conclusion and Discussion
Starting with weekly flu count data for […], we converted it to average monthly data and log-transformed it to normalize the data and reduce its variance. We were then left with a seasonal time series ready to be analyzed for spatial and temporal effects. Our first step was to test for both global and local spatial autocorrelation, using Moran's I statistic for both. We did not find enough evidence to say there was significant global spatial autocorrelation. Although a handful of states displayed local spatial autocorrelation, we decided this was not enough evidence to continue using spatial analysis in our model. Thus, we continued without spatial effects and looked only at temporal effects. We created two potential seasonal ARIMA models: one from analyzing the ACF and PACF plots of the seasonally differenced data and the other generated by R. We then used the AIC values to help us choose between the two models. The lower AIC value led us to choose the model that R developed for the data. Once we had our final model, we applied the Ljung-Box test to determine the overall quality of the seasonal ARIMA model. At the 5% significance level, we did not reject the null hypothesis that the residuals are independently distributed, so the test gave no evidence that correlation was present. Thus, our model was an adequate

48 fit for the data. Lastly, we created a 24-month forecast from the model. We generated the 80% and 95% prediction intervals for the 24-month forecast and then plotted 7 months of data from 2015 to see how well the model could forecast. The actual data from January - July 2015 stayed within both prediction intervals, so we accepted this forecast and would suggest using it for predicting future Google flu trends across the United States.

49 References
[1] Anselin, Luc. "Spatial Econometrics." A Companion to Theoretical Econometrics, Chapter 14 (2003).
[2] Anselin, Luc. "Local Indicators of Spatial Association - LISA." Geographical Analysis 27 (2).
[3] Dukic, Vanja, Hedibert F. Lopes and Nicholas G. Polson. "Tracking Epidemics With Google Flu Trends Data and a State-Space SEIR Model." Journal of the American Statistical Association 107:500. Web.
[4] Fischer, Manfred M., and Jinfeng Wang. Spatial Data Analysis: Models, Methods and Techniques. Heidelberg: Springer. Springer Link. Springer International Publishing. Web.
[5] "Flu Trends." Google. N.p., Web. 22 July 2015, available at google.org/flutrends/about/data/flu/us/data.txt
[6] Franklin, Meredith. "Spatial Statistics." USC Keck School of Medicine. Web. 6 July.
[7] Hyndman, Rob J., and George Athanasopoulos. "8.1 Stationarity and Differencing." OTexts. Web.
[8] Hyndman, Rob J. "Thoughts on the Ljung-Box test." WordPress. Web. 02 July.
[9] Hyndman, Rob J., and George Athanasopoulos. "8.9 Seasonal ARIMA Models." OTexts. Web.

50 [10] LeSage, James P. "Lecture 1: Maximum likelihood estimation of spatial regression models" (2004), available at www4.fe.uc.pt/spatial/doc/lecture1.pdf. Web.
[11] Pace, R. Kelley, Ronald Barry, Otis W. Gilley and C.F. Sirmans. "A method for spatial-temporal forecasting with an application to real estate prices." International Journal of Forecasting 16 (2000): n. pag. Web.
[12] "The Flu Season." Centers for Disease Control and Prevention. Centers for Disease Control and Prevention, 22 Oct. Web.
[13] Wall, Melanie M. "A close look at the spatial structure implied by the CAR and SAR models." Journal of Statistical Planning and Inference 121 (2004). Web.
[14] Yang, Shihao, Mauricio Santillana and S. C. Kou. "Accurate estimation of influenza epidemics using Google search data via ARGO." Proceedings of the National Academy of Sciences (2015). Web.

51 Appendix A Spatial Matrix
The spatial weights matrix W, which is used for finding spatial autocorrelation, is determined by the collection of neighboring states for each state. The geographical map of the states, numbered in alphabetical order, is displayed below in Figure A.1. The corresponding list of neighbors associated with the map is found in Figure A.2 [13].
Figure A.1: Map of the continental states with corresponding numbers [13]

52 Figure A.2: List of Neighbors for the 48 States [13]
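As a rough illustration of how a neighbor list turns into a spatial weights matrix, the Python sketch below builds a binary W for a five-state subset. Several things here are assumptions for the example: the function name is hypothetical, the binary 0/1 weighting is only one common choice, and the neighbor lists are restricted to the five states shown rather than the full 48-state list in Figure A.2.

```python
def build_weights(neighbors):
    """Build a binary spatial weights matrix from a dict of neighbor lists.

    Regions are numbered in alphabetical order, as on the map in Figure A.1.
    W[i][j] is 1 when region j is listed as a neighbor of region i, else 0.
    """
    names = sorted(neighbors)
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    W = [[0] * n for _ in range(n)]
    for name, adjacent in neighbors.items():
        for other in adjacent:
            W[index[name]][index[other]] = 1
    return names, W


# Illustrative five-state subset; borders outside the subset are omitted.
neighbors = {
    "Minnesota": ["Iowa", "North Dakota", "South Dakota", "Wisconsin"],
    "Iowa": ["Minnesota", "South Dakota", "Wisconsin"],
    "Wisconsin": ["Iowa", "Minnesota"],
    "North Dakota": ["Minnesota", "South Dakota"],
    "South Dakota": ["Iowa", "Minnesota", "North Dakota"],
}

names, W = build_weights(neighbors)
for name, row in zip(names, W):
    print(f"{name:<13}", row)
```

Since no state is listed as its own neighbor, the diagonal of W is zero, and because shared borders are mutual the matrix is symmetric.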

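53 Appendix B Moran's I Proof
Moran's I statistic and its expected value:
$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\sum_{i=1}^{n} \sum_{j \neq i} W_{ij} \, \sum_{i=1}^{n} (z_i - \bar{z})^2}, \qquad E[I] = -\frac{1}{n-1}$$
Proof. Start by noting:
1. $z_1, \dots, z_n$ are given and $\sum_{i=1}^{n} (z_i - \bar{z})^2$ is a constant.
2. For $k \neq l$, $E[(z_k - \bar{z})(z_l - \bar{z})]$ is the same for all $k, l$.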

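54 3. For $k \neq l$,
$$E[(z_k - \bar{z})(z_l - \bar{z})] = \frac{\sum_{k} \sum_{l \neq k} (z_k - \bar{z})(z_l - \bar{z})}{n(n-1)} = \frac{\sum_{k} (z_k - \bar{z}) \left( -(z_k - \bar{z}) \right)}{n(n-1)} = -\frac{\sum_{k} (z_k - \bar{z})^2}{n(n-1)},$$
where the second equality uses $\sum_{l \neq k} (z_l - \bar{z}) = -(z_k - \bar{z})$, which holds because the deviations from the mean sum to zero.
4. $\sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} = \sum_{i=1}^{n} \sum_{j \neq i} W_{ij}$, since $W_{kk} = 0$ for all $k$.
Now we can algebraically prove:
$$E[I] = \frac{n \sum_{i=1}^{n} \sum_{j \neq i} W_{ij} \, E[(z_i - \bar{z})(z_j - \bar{z})]}{\sum_{i=1}^{n} \sum_{j \neq i} W_{ij} \, \sum_{i=1}^{n} (z_i - \bar{z})^2} = \frac{n \sum_{i=1}^{n} \sum_{j \neq i} W_{ij} \left( -\frac{\sum_{k} (z_k - \bar{z})^2}{n(n-1)} \right)}{\sum_{i=1}^{n} \sum_{j \neq i} W_{ij} \, \sum_{i=1}^{n} (z_i - \bar{z})^2} = -\frac{1}{n-1}$$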
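The result can be checked numerically on a small example. The Python sketch below assumes the expectation is taken over all equally likely assignments of the given values to the locations, which is how notes 1 and 2 are read here. The weights matrix and data values are hypothetical, chosen only to keep the number of permutations small.

```python
from itertools import permutations


def morans_i(z, W):
    """Moran's I for values z and a weights matrix W with zero diagonal."""
    n = len(z)
    mean = sum(z) / n
    dev = [v - mean for v in z]
    w_sum = sum(W[i][j] for i in range(n) for j in range(n))
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)
    return (n / w_sum) * (cross / ss)


def expected_morans_i(z, W):
    """Average Moran's I over every assignment of the values to locations."""
    values = [morans_i(list(p), W) for p in permutations(z)]
    return sum(values) / len(values)


# Hypothetical example: four locations in a chain 0-1-2-3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
z = [2.0, 5.0, 1.0, 9.0]

average = expected_morans_i(z, W)
theory = -1 / (len(z) - 1)
print(average, theory)
```

The two printed numbers should agree up to floating-point rounding. Enumerating all n! permutations is only practical for very small n; it is used here because it makes the check exact rather than simulated.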


More information

MINERALS THROUGH GEOGRAPHY

MINERALS THROUGH GEOGRAPHY MINERALS THROUGH GEOGRAPHY INTRODUCTION Minerals are related to rock type, not political definition of place. So, the minerals are to be found in a variety of locations that doesn t depend on population

More information

Autoregressive Integrated Moving Average Model to Predict Graduate Unemployment in Indonesia

Autoregressive Integrated Moving Average Model to Predict Graduate Unemployment in Indonesia DOI 10.1515/ptse-2017-0005 PTSE 12 (1): 43-50 Autoregressive Integrated Moving Average Model to Predict Graduate Unemployment in Indonesia Umi MAHMUDAH u_mudah@yahoo.com (State Islamic University of Pekalongan,

More information

Non-iterative, regression-based estimation of haplotype associations

Non-iterative, regression-based estimation of haplotype associations Non-iterative, regression-based estimation of haplotype associations Benjamin French, PhD Department of Biostatistics and Epidemiology University of Pennsylvania bcfrench@upenn.edu National Cancer Center

More information

Dynamic Time Series Regression: A Panacea for Spurious Correlations

Dynamic Time Series Regression: A Panacea for Spurious Correlations International Journal of Scientific and Research Publications, Volume 6, Issue 10, October 2016 337 Dynamic Time Series Regression: A Panacea for Spurious Correlations Emmanuel Alphonsus Akpan *, Imoh

More information

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional s by Location of Permanent Home Address and Degree Level Louisiana Acadia 26 19 0 6 1 0 0 0 0 Allen 7 7 0 0 0 0 0 0 0 Ascension 275 241 3 23 1 6 0 1 0 Assumption 13 12 0 1 0 0 0 0 0 Avoyelles 15 11 0 3

More information

Circle a single answer for each multiple choice question. Your choice should be made clearly.

Circle a single answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 4, 215 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 31 questions. Circle

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 011 MODULE 3 : Stochastic processes and time series Time allowed: Three Hours Candidates should answer FIVE questions. All questions carry

More information

Office of Budget & Planning 311 Thomas Boyd Hall Baton Rouge, LA Telephone 225/ Fax 225/

Office of Budget & Planning 311 Thomas Boyd Hall Baton Rouge, LA Telephone 225/ Fax 225/ Louisiana Acadia 20 17 3 0 0 0 Allen 2 2 0 0 0 0 Ascension 226 185 37 2 1 1 Assumption 16 15 1 0 0 0 Avoyelles 20 19 1 0 0 0 Beauregard 16 11 4 0 0 1 Bienville 2 2 0 0 0 0 Bossier 22 18 4 0 0 0 Caddo 91

More information

Monthly Long Range Weather Commentary Issued: NOVEMBER 16, 2015 Steven A. Root, CCM, Chief Analytics Officer, Sr. VP, sales

Monthly Long Range Weather Commentary Issued: NOVEMBER 16, 2015 Steven A. Root, CCM, Chief Analytics Officer, Sr. VP, sales Monthly Long Range Weather Commentary Issued: NOVEMBER 16, 2015 Steven A. Root, CCM, Chief Analytics Officer, Sr. VP, sales sroot@weatherbank.com OCTOBER 2015 Climate Highlights The Month in Review The

More information

Forecasting using R. Rob J Hyndman. 2.4 Non-seasonal ARIMA models. Forecasting using R 1

Forecasting using R. Rob J Hyndman. 2.4 Non-seasonal ARIMA models. Forecasting using R 1 Forecasting using R Rob J Hyndman 2.4 Non-seasonal ARIMA models Forecasting using R 1 Outline 1 Autoregressive models 2 Moving average models 3 Non-seasonal ARIMA models 4 Partial autocorrelations 5 Estimation

More information

William Battye * EC/R Incorporated, Chapel Hill, NC. William Warren-Hicks, Ph.D. EcoStat, Inc., Mebane, NC

William Battye * EC/R Incorporated, Chapel Hill, NC. William Warren-Hicks, Ph.D. EcoStat, Inc., Mebane, NC MODULATING EMISSIONS FROM ELECTRIC GENERATING UNITS AS A FUNCTION OF METEOROLOGICAL VARIABLES William Battye * EC/R Incorporated, Chapel Hill, NC William Warren-Hicks, Ph.D. EcoStat, Inc., Mebane, NC Steve

More information

Chapter 8: Model Diagnostics

Chapter 8: Model Diagnostics Chapter 8: Model Diagnostics Model diagnostics involve checking how well the model fits. If the model fits poorly, we consider changing the specification of the model. A major tool of model diagnostics

More information

National Organization of Life and Health Insurance Guaranty Associations

National Organization of Life and Health Insurance Guaranty Associations National Organization of and Health Insurance Guaranty Associations November 21, 2005 Dear Chief Executive Officer: Consistent with prior years, NOLHGA is providing the enclosed data regarding insolvency

More information

Time Series Analysis

Time Series Analysis Time Series Analysis A time series is a sequence of observations made: 1) over a continuous time interval, 2) of successive measurements across that interval, 3) using equal spacing between consecutive

More information

FE570 Financial Markets and Trading. Stevens Institute of Technology

FE570 Financial Markets and Trading. Stevens Institute of Technology FE570 Financial Markets and Trading Lecture 5. Linear Time Series Analysis and Its Applications (Ref. Joel Hasbrouck - Empirical Market Microstructure ) Steve Yang Stevens Institute of Technology 9/25/2012

More information

Time Series Analysis of Currency in Circulation in Nigeria

Time Series Analysis of Currency in Circulation in Nigeria ISSN -3 (Paper) ISSN 5-091 (Online) Time Series Analysis of Currency in Circulation in Nigeria Omekara C.O Okereke O.E. Ire K.I. Irokwe O. Department of Statistics, Michael Okpara University of Agriculture

More information

Technical note on seasonal adjustment for Capital goods imports

Technical note on seasonal adjustment for Capital goods imports Technical note on seasonal adjustment for Capital goods imports July 1, 2013 Contents 1 Capital goods imports 2 1.1 Additive versus multiplicative seasonality..................... 2 2 Steps in the seasonal

More information

Swine Enteric Coronavirus Disease (SECD) Situation Report Sept 17, 2015

Swine Enteric Coronavirus Disease (SECD) Situation Report Sept 17, 2015 Animal and Plant Health Inspection Service Veterinary Services Swine Enteric Coronavirus Disease (SECD) Situation Report Sept 17, 2015 Information current as of 12:00 pm MDT, 09/16/2015 This report provides

More information

Swine Enteric Coronavirus Disease (SECD) Situation Report June 30, 2016

Swine Enteric Coronavirus Disease (SECD) Situation Report June 30, 2016 Animal and Plant Health Inspection Service Veterinary Services Swine Enteric Coronavirus Disease (SECD) Situation Report June 30, 2016 Information current as of 12:00 pm MDT, 06/29/2016 This report provides

More information

MODELING MAXIMUM MONTHLY TEMPERATURE IN KATUNAYAKE REGION, SRI LANKA: A SARIMA APPROACH

MODELING MAXIMUM MONTHLY TEMPERATURE IN KATUNAYAKE REGION, SRI LANKA: A SARIMA APPROACH MODELING MAXIMUM MONTHLY TEMPERATURE IN KATUNAYAKE REGION, SRI LANKA: A SARIMA APPROACH M.C.Alibuhtto 1 &P.A.H.R.Ariyarathna 2 1 Department of Mathematical Sciences, Faculty of Applied Sciences, South

More information

Package ZIM. R topics documented: August 29, Type Package. Title Statistical Models for Count Time Series with Excess Zeros. Version 1.

Package ZIM. R topics documented: August 29, Type Package. Title Statistical Models for Count Time Series with Excess Zeros. Version 1. Package ZIM August 29, 2013 Type Package Title Statistical Models for Count Time Series with Excess Zeros Version 1.0 Date 2013-06-15 Author Ming Yang, Gideon K. D. Zamba, and Joseph E. Cavanaugh Maintainer

More information

Monthly Long Range Weather Commentary Issued: APRIL 1, 2015 Steven A. Root, CCM, President/CEO

Monthly Long Range Weather Commentary Issued: APRIL 1, 2015 Steven A. Root, CCM, President/CEO Monthly Long Range Weather Commentary Issued: APRIL 1, 2015 Steven A. Root, CCM, President/CEO sroot@weatherbank.com FEBRUARY 2015 Climate Highlights The Month in Review The February contiguous U.S. temperature

More information