Loglinear Residual Tests of Moran s I Autocorrelation and their Applications to Kentucky Breast Cancer Data

Size: px
Start display at page:

Download "Loglinear Residual Tests of Moran s I Autocorrelation and their Applications to Kentucky Breast Cancer Data"

Transcription

1 Geographical Analysis ISSN Loglinear Residual Tests of Moran s I Autocorrelation and their Applications to Kentucky Breast Cancer Data Ge Lin, 1 Tonglin Zhang 1 Department of Geology and Geography, West Virginia University, Morgantown, WV, Department of Statistics, Purdue University, West Lafayette, IN This article bridges the permutation test of Moran s I to the residuals of a loglinear model under the asymptotic normality assumption. It provides the versions of Moran s I based on Pearson residuals (I PR ) and deviance residuals (I DR ) so that they can be used to test for spatial clustering while at the same time account for potential covariates and heterogeneous population sizes. Our simulations showed that both I PR and I DR are effective to account for heterogeneous population sizes. The tests based on I PR and I DR are applied to a set of log-rate models for early-stage and late-stage breast cancer with socioeconomic and access-to-care data in Kentucky. The results showed that socioeconomic and access-to-care variables can sufficiently explain spatial clustering of early-stage breast carcinomas, but these factors cannot explain that for the late stage. For this reason, we used local spatial association terms and located four late-stage breast cancer clusters that could not be explained. The results also confirmed our expectation that a high screening level would be associated with a high incidence rate of early-stage disease, which in turn would reduce late-stage incidence rates. Introduction Linear or loglinear spatial regression models are common in spatial epidemiology (Best et al. 000). A set of ecological variables are often associated with disease rates or counts, and after a final model is derived, residuals can be visually inspected on a map for spatial clusters. For a linear model, a residual test of Moran s I for spatial autocorrelation can also be performed to detect spatial clustering for the unexplained regression errors. However, there is no corresponding spatial residual test of clustering for loglinear or Poisson regressions on count data, which was the challenge for the current study. The study was motivated by our inquiry into the spatial patterns of breast cancer incidents in Kentucky counties as they related to Correspondence: Ge Lin, Department of Geology and Geography, West Virginia University, Morgantown, WV glin@wvu.edu Submitted: August, 005. Revised version accepted: April 04, 006. Geographical Analysis 39 (007) r 007 The Ohio State University 93

2 Geographical Analysis the development stage of disease at diagnosis. Breast cancer staging at diagnosis is known to be associated with socioeconomic conditions, mammography screening services, and other variables (Yabroff and Gordis 003; Barry and Breen 005). As socioeconomic variables are often spatially autocorrelated (e.g., poor areas tend to be clustered), we expect clustering of breast cancer to occur, at least for the earlystate incidence rates. If there is no significant environmental cause of breast cancer, the clustering tendency should disappear once we introduce area socioeconomic variables. One way to test for the existence of spatial clustering is to set up a spatial autocorrelation test, such as Moran s I, for Guassian or continuous data by using the permutation test of residuals for Moran s I in a linear regression (Cliff and Ord 1981). Converting incidence to rate, however, is often less appealing than retaining the original count of each in spatial data analysis (Griffith and Haining 006). In addition, Moran s I test assumes that attribute values (e.g., disease prevalence) are either in equal probability among all the geographic units or from a single parent distribution. These assumptions are often violated in the permutation test of Moran s I in disease data due to heterogeneous regional populations and large variation in sparsely populated areas (Besag and Newell 1991). Although there have been several extensions of Moran s I to account for population heterogeneity (Oden 1995; Waldhor 1996; Assuncao and Reis 1999), none of them can include potential ecological covariates. For example, Oden proposes a test statistic Ipop that applies regional population sizes to adjust Moran s I. However, because of a minor modification in the null hypothesis, Ipop is no longer comparable to the original Moran s I (Assuncao and Reis 1999). Consequently, Ipop cannot be extended to evaluate covariates and spatial autocorrelation simultaneously. A spatial logit association model can include potential explanatory variables and identify high-value and low-value clusters (Lin 003; Zhang and Lin 006). It does not, however, have a global measure of spatial clustering that would complement the modeling process for local spatial logit associations. Jacqmin-Gadda et al. (1997) propose a homogeneity score test of a generalized linear model that can also include potential explanatory variables in a correlation test. The test is based on residuals in generalized linear models, a design that its authors claimed to correspond to the permutation test of linear regression errors. However, as we will later show, the test does not adjust variance for heterogeneous population sizes. In addition, because its weight matrix is not necessarily spatially constructed, its null hypothesis is not necessarily spatial independence as one would assume when applying Moran s I test. Consequently, it is not straightforward to use the score test in a generalized linear model that includes spatial correlation and heterogeneity. The purpose of this article is to extend the permutation test of residuals of the Moran s I autocorrelation to generalized linear models so that spatial analysts can directly test for spatial clustering while controlling for potential ecological covariates. While no one has proposed either deviance or Pearson residuals in the spatial statistic literature, Waller and Gotway (004) point out a form close to Pearson 94

3 Ge Lin and Tonglin Zhang Loglinear Residual Tests of Moran s I Autocorrelation residuals as a way to account for inflated variance in Moran s I under heteroskedasticity. In this article, we demonstrate that permutation tests are applicable to Pearson or deviance residuals of loglinear models in the same way as in the traditional permutation test of residuals for Moran s I. In the remaining sections of the article, we first review the permutation test of Moran s I by using regression residuals and then reformulate it in the context of Poisson data by using the Pearson and deviance residuals of a loglinear model. We then evaluate their statistical properties under the null hypothesis of spatial independence in a series of simulated patterns and apply the Pearson residual Moran s I test and deviance residual Moran s I test to breast cancer incidence in Kentucky counties that include potential ecological covariates. From linear to loglinear residual tests of Moran s I Let us consider a study area that has m regions indexed by i. Let z i be the variable of the interest in region i. Moran s I (Moran 1950) is expressed as: P m P m i¼1 j¼1 I ¼ w ijðz i zþðz j zþ hp ih m i¼1 ðz i zþ Pm P i ð1þ =m i¼1 j¼1 w ij where z ¼ P m i¼1 z i=m, w ij is an element of a spatial weight matrix W, with 1 being adjacent for regions i and j and is 0 otherwise. Under the assumption of homogeneity, the moments of Moran s I can be computed under the randomization assumption, which assumes that the observations are generated from a set of random permutations of the observed values. The pffiffiffiffiffiffiffiffiffiffiffiffiffi significance of Moran s I can be determined by the quantity I std ¼½I EðIÞŠ= VarðIÞ under the asymptotic normality assumption, where the respective mean and variance are obtained from the random permutation scheme suggested by Cliff and Ord (1981, p. 1). A significant and positive value of Moran s I (i.e., I std 4z a/ ) usually indicates a positive autocorrelation, such as the existence of either high-value or low-value clustering. A significant and negative value of Moran s I (i.e., I std o z a/ ) usually indicates a negative autocorrelation, such as a tendency toward the juxtaposition of high values with low values. If there is no spatial dependence, I is often close to the mean value 1/ (m 1), which can be approximated by 0 if m is large. In order to account for ecological covariates, it is often suggested that z i be taken as the ith regional residual of a linear regression model when testing for spatial autocorrelation (Cliff and Ord 1981, p. 198). For data resulting purely from a random process, the calculation of Moran s I in equation (1) based on the residuals is the same as the one based on the observed values. It can be concluded that I std is approximately N(0, 1) as m!1under some regularity conditions (Sen 1976). Schmoyer (1994) also demonstrated the validity of a general form of permutation test based on the residuals of a linear model with independent and identically distributed errors. In Poisson models with heterogeneous population sizes, these justifications cannot be applied (Waldhor 1996; Assuncao and Reis 1999). 95

4 Geographical Analysis The residuals of loglinear models are well established in the statistical literature in a nonspatial context. We can extend them to a spatial context to test for spatial autocorrelation. In his synthesis of previous studies, Agresti (1990, p. 431) showed that the Pearson and log-likelihood (deviance) residuals of loglinear models are asymptotically multivariate normal with mean 0 and the variance covariance matrix a projection matrix. This particular asymptotic form of the residuals is analogous to that of linear regression residuals. Following this reasoning, we can devise a log-rate model that closely resembles a linear regression model to account for potentially heterogeneous population sizes and ecological covariates. We apply the permutation test based on the asymptotic normality assumption, so that Moran s I based on the residuals of a log-rate model is analogous to Moran s I based on regression residuals. In the following, we specify the residuals of a log-rate model for the permutation test of Moran s I. Let n i be the observed counts for a Poisson random variable N i at region i (i 5 1,..., m), and let x i and y i be the ith regional population and relative risk, respectively. Then, N i are assumed to be independently Poisson distributed, or N i Poisson (E i y i ), where E i is the expected count for region i. The null hypothesis that all the relative risks (y 1,...,y m ) are 1 can be stated (Elliott et al. 000, p. 13). Suppose that a set of explanatory variables (x i,1,...,x i,k 1 ) are observed together with n i. A log-rate model can then be expressed as logð^n i =x i Þ¼^b 0 þ ^b 1 x i;1 þ...þ ^b k 1 x i;k 1 ðþ In equation (), nˆ i and log(nˆ i/x i ) are, respectively, the estimated count and estimated log rate for region i, ^b 0 is the estimate of grand mean, and the other ^bs are parameter estimates for explanatory variables. Equation () is often estimated by moving log(x i ) to the right-hand side, so that it becomes the offset. logð^n i Þ¼logðx i Þþ^b 0 þ ^b 1 x i;1 þ...þ ^b k 1 x i;k 1 ð3þ Based on these notations, the conventional Pearson residual for region i is defined as r i;p ¼ n i ^n i ^n 1= i ð4þ and the conventional deviance residual (Agresti 1990, p. 45) for region i is defined as r i;d ¼ signðn i ^n i Þ½n i logðn i =^n i Þ n i þ ^n i Š 1= ð5þ where sign(a) is1ifa40, is 0 if a 5 0, and is 1ifao0. We can test for spatial autocorrelation based on the residuals in equation () by replacing z i in (1) with either the Pearson residual in (4) or the deviance residual in (5). When z i is replaced with r i,p, for example, Moran s I becomes Pearson residual Moran s I, and we denote it as I PR, which can be expressed as 96

5 Ge Lin and Tonglin Zhang I PR ¼ P m P m i¼1 j¼1 w ij 4 P m i¼1 n i ^n i ^n 1= i n i ^n i ^n 1= i 1 m 1 m Loglinear Residual Tests of Moran s I Autocorrelation X m l¼1 X m k¼1 n l ^n l ^n 1= l n k ^n k ^n 1= k! 0 nj ^n j 1 m ^n 1= m l¼1 j! 3 h =m5 P P i m i¼1 j¼1 w ij n l ^n l ^n 1= l 1 A ð6þ When z i is replaced with r i,d, Moran s I becomes deviance or log-likelihood residual Moran s I and we denote it as I DR. To implement these tests, the parameters of explanatory variables together with the residuals are first estimated in the modelfitting process. Then, I PR and I DR are calculated. Finally, the means and variances of I PR and I DR are computed according to the random permutation scheme (Cliff and Ord 1981). The P values of I PR and I DR can be computed based on the asymptotic normality assumption when m is large and the number of covariates is small. If the independence model is rejected while all the ecological covariates can be found and modeled, the residuals of I PR or I DR from the model should be completely lacking in spatial autocorrelation. Otherwise, it may indicate spatial clustering that cannot be explained. Although no one has specified both I DR and I PR as above, it is worth noting the difference between I PR and the score test of the residuals in a generalized linear model (Jacqmin-Gadda et al. 1997). The score test, which also accounts for explanatory variables, was derived from a random-effect approach by neglecting an additional term of overdispersion. Based on its original formula, the test statistic is expressed as T ¼ Xm X m w ij ðy i ^m i ÞðY j ^m j Þ ð7þ i¼1 j6¼i where Y i is the variable of interest in region i as assumed from the exponential family, ^m i is an estimate of E(Y i ), and w ij is an element from the weight matrix. For a Poisson model, one can take Y i 5 N i so that y i is the number of the observed counts n i, and ^m i is the estimated count nˆ i in our notations. Then, T ¼ Xm X m w ij ðn i ^n i Þðn j ^n j Þ i¼1 j¼1;j6¼i This particular form of T is different from the formulation of I PR given by (6) because we use z i ¼ðn i ^n i Þ=^n 1= i instead of z i 5 n i nˆ i. This difference also leads to a variance adjustment problem for T, because the variance of Y i ^m i still depends on the i th regional population size x i even when ^m i equals the true parameter m i.in addition, w ij in T may not be based on spatial relationships, such as spatial adjacency or distance, and it adds complications in designing a proper weight matrix and in deriving the P value. Pearson residuals, on the other hand, are well established, and it is much easier to calculate or implement Pearson residuals in a stan- 97

6 Geographical Analysis dard statistical package than it is for the score test. Because of these differences, we will not compare I PR and I DR with the score test in our simulations. Simulation study To evaluate Pearson residuals I PR and deviance residuals I DR for variance adjustments, we carried out simulations under the null hypothesis of no spatial clustering. We included two alternative test statistics, Oden s Ipop and Assuncao and Reis s EBI, both of which are designed to account for heteroskedasticity without controlling for ecological covariates. As a reference point, we also included the original Moran s I, denoted by I r, by taking z i 5 n i /x i in equation (1). For ease of discussion, we list some expressions for Oden s Ipop and Assuncao and Reis s EBI below. Oden s Ipop is defined as P Ipop ¼ n m P m i¼1 j¼1 M ij ðe i d i Þðe j d j Þ nð1 bþ P m i¼1 M ii e i nb P m i¼1 M ii d i bð1 bþ x P m i¼1 P m j¼1 d id j M ij x P m i¼1 d im ii ð8þ where b ¼ n=x, e i 5 n i /n, d i 5 x i /x, Mij ¼ w p ffiffiffiffiffiffiffiffi P ij= d i d j, n ¼ m i¼1 n i and x ¼ P m i¼1 x i. Oden also derived the mean and variance of Ipop. If there is no spatial dependence, Ipop is close to the mean value 1/(x 1). Oden s Ipop weights a pair of regional populations together with the spatial weight matrix by using Mij. While w ij in Mij is still a spatial weight in the W matrix, its diagonal element w ii 6¼0. This feature makes Ipop incomparable with Moran s I when spatial clustering and population heterogeneity both exist. Noticing this difference, Assuncao and Reis (1999) proposed their version of population-adjusted p Moran s I or EBI. In their definition, z i ¼ðp i bþ= ffiffiffiffi n i, where b 5 n/x, ni 5 a1b/x i, a 5 s b/(x/m) ands ¼ P m i¼1 x iðp i dþ=x. Hence, P m P i¼1 j6¼i w p i b ij pffiffiffiffi 1 X! m p l b pj b pffiffiffiffi pffiffiffiffi 1 X m p l b pffiffiffiffi n i m l¼1 n l n j m l¼1 n l EBI ¼ " P m p i b l¼1 pffiffiffiffi 1 X # m p l b ð9þ h pffiffiffiffi =m Pm P i m m l¼1 i¼1 j¼1;6¼i w ij n i n l Although they did not notice the connection, Assuncao and Reis point out that n i can be set equal to b/x i if n i o0. If all n i o0, then I PR and EBI are identical in the absence of covariates. The justification and evaluation of EBI, in this case, would be directly applicable to I PR. However, we never encountered this case in our simulations, and so the general form of EBI is always assumed. Simulations were based on a 0 0 lattice. We defined a spatial weight matrix according to spatial adjacency, that is, w ij 5 1 if two lattice points (i, j) are adjacent, 0 otherwise. For I pop, we followed Oden and took w ii 5. These weight assignments are identical to those used in Assuncao and Reis s (1999) simulations. We denote l i and x i, respectively, as the disease rate and population size at lattice 98

7 Ge Lin and Tonglin Zhang Loglinear Residual Tests of Moran s I Autocorrelation (a1) Half Sparse Half Dense (a3) One Cluster of Dense (a5) One Cluster of Sparse ξ= Y 10 Y 10 ξ=1 Y 10 ξ= ξ=1 5 ξ= ξ= X X X (a) Quad Sparse Quad Dense (a4) Two Clusters of Dense (a6) Two Clusters of Sparse ξ=1000 ξ=1 15 ξ= ξ=1 Y 10 Y 10 ξ=1 Y 10 ξ= ξ=1 ξ= ξ= ξ= X X X Figure 1. Population patterns for (a1) (a6) in the unit of point i, and N i as the Poisson random variable with the mean value m i 5 l i x i.in each run, 400 Poisson random variables with the constant rate l i were generated independently on the lattice points, so that there would be no spatial clustering of disease rates. In order to compare a wide range of heteroskedasticity, we included a spatial homogeneous pattern, together with six heterogeneous patterns similar to those used in Waldhor s (1996) simulations (Fig. 1). We used u i and v i to represent the vertical axis and the horizontal axis, respectively, so that a lattice point i can be easily identified by i 5 0(u i 1)1v i. The seven spatial population patterns were defined in the units of 10 5 (e.g., x i 5 1 means and x i means ), and they are: (a0) Homogeneous population with x i 5 1 for all i 5 1,..., 400. (a1) Half sparse and half dense: x i 5 1ifu i 10, x i if u i 11. (a) One quad sparse and one quad dense: x i 5 1ifu i 10 and v i 10 or u i 11 and v i 10; x i if u i 10 and v i 11 or u i 11 and v i 10. (a3) All sparse except one cluster with a dense population: x i 5 1, except when qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi lattice point i is within ðu i 5Þ þðv i 5Þ 3, in which x i

8 Geographical Analysis (a4) (a5) (a6) All sparse except two clusters with dense populations: x i 5 1, except qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi when lattice point i is within ðu i 5Þ þðv i 5Þ 3 or qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðu i 15Þ þðv i 15Þ 3, in which x i All dense except one cluster with sparse population: x i , except when qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi lattice point i is within ðu i 5Þ þðv i 5Þ 3, in which x i 5 1. All dense except two clusters with sparse populations: x i , except qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi when lattice point i is within ðu i 5Þ þðv i 5Þ 3 or qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðu i 15Þ þðv i 15Þ 3, in which x i 5 1. For a given pattern of population and rate distributions, we ran the simulations 10,000 times and calculated I r, I PR, I DR, Ipop, and EBI for each run. We fixed the disease rate at for all locations in the seven heterogeneous patterns (a0 a6), so that E(N i ) 5 V(N i ) 5 10 if x i 5 1 unit and E(N i ) 5 V(N i ) 5 10,000 if x i units. Under the null hypothesis, we assessed the validity of the permutation test by comparing (a) the observed variance for each test statistic versus corresponding permutation variance and (b) the rejection rates at the 0.05 nominal value (a ). In the preliminary analysis, we also compared the observed and permutation means for each of I r, I D, I DR, and EBI. As the difference between the two could be ignored, we will not compare them. Table 1 lists the simulation results of the observed and permutation variances of I r, I PR, I DR, Ipop, and EBI based on the seven patterns (a0 a6), where s S denotes the observed sample variance from the simulation and s R denotes estimated variance from the random permutation scheme. As s S can be treated as the true values, we compared the ratio of the values of s S and s R. If the ratio is close to 1, it suggests that estimated variance is close to the permutation variance. The results show that, when the populations were homogeneous, the ratios between s S and s R were very close to 1. When the populations were heteroge- Table 1 Comparison of Variance for Selected Test Statistics I r ( 0.001) I PR ( 0.001) I DR ( 0.001) EBI ( 0.001) Ipop ( 10 1 ) s S s R s S s R s S s R (a0) (a1) (a) (a3) (a4) (a5) (a6) s S s R s S s R 300

9 Ge Lin and Tonglin Zhang Loglinear Residual Tests of Moran s I Autocorrelation Table Rejection Rates (%) for Selected Test Statistics I r I PR I DR EBI Ipop (a0) (a1) (a) (a3) (a4) (a5) (a6) neous, the rate-based test I r produced predominantly biased variance estimates, a result consistent with Waldhor s simulations. When a set of sparsely populated points or grids were clustered, the permutation tended to underestimate the true variance. In the cases of one-cluster (a5) and two-cluster (a6) patterns, the permutation variances were underestimated by about 9.% and 18.%, respectively. When the population distribution was characterized by (a1) or (a), the permutation underestimated true variances by about 50%. In contrast, the ratios between s S and s R for I PR, I DR, Ipop, and EBI were all very close to 1 in patterns (a1) to (a6). The results for I PR and I DR suggest that the estimated values of the variance are trustworthy and that they can effectively reduce potentially inflated variance. As the ratios were very close to 1 for Ipop and EBI, I PR and I DR, it might not be critical for using I PR or I DR if one simply wants to account for heteroskedasticity. Nevertheless, I PR or I DR have an advantage when one wishes to incorporate covariates. Table lists the rejection rates of the permutation test based on I r, I PR, I DR, Ipop and EBI according to the seven spatial population patterns. When populations are homogeneous, all the test statistics had a type I error rate between 4.60% to 5.03% with an I DR above 5% by a fraction. When populations were heterogeneous, all the test statistics except I r had a consistent rejection rates around 5%, suggesting that I PR, I DR, Ipop, and EBI were all reasonable under the null hypothesis and they can effectively correct potentially inflated variance with a satisfactory level of type I errors. The results for I r, in contrast, failed to reject the null hypothesis with the accepted nominal value 0.05 due to an inflated variance. Although some patterns (a3 and a4) registered lower rejection rates, all the patterns had a rejection rate more than 5%. In the cases of one or two sparsely populated regions clustered in a densely populated study area (patterns a5 and a6), the rejection rate for a spatially random pattern was more than 40%. These results essentially confirmed Waldhor s simulation results for I r. Even though the performance of the permutation test for I PR and I DR is comparable with the performance from Ipop and EBI, I PR and I DR have the flexibility of incorporating potential covariates in to their tests for spatial clustering, which we will demonstrate in the following section. 301

10 Geographical Analysis Application Data and variables County-level breast cancer and at-risk population data were obtained from the Kentucky cancer registry for the years The data set reports breast cancer cases according to their developmental stage as follows: 0, a benign tumor; 1 an in situ tumor;, localized tumor; 3, a regional tumor; 4, a distant tumor; and 5, usually used to code patients who died with later stage disease without an autopsy report on file. For the purpose of our analysis, we deleted the stage 0 cases. Following the U.S. Surveillance, Epidemiology, and End Results (SEER) Program definitions, we regrouped the in situ and localized tumors as early stage and the regional, distant and unknown tumors as late stage. Generally, early-stage breast carcinomas are confined to the breast and can often be treated successfully, whereas late-stage carcinomas tend to spread beyond the breast and are often fatal. There were a total of 16,055 breast cancer cases during this period, and 80% of the cases were between age 35 and 75. On average, there were 96.7 per 100,000 women diagnosed at an early stage and 36.9 per 100,000 at a later stage. If the overall breast cancer incidence rate is constant across counties, the early-stage breast cancer rate should, theoretically, be negatively related to the late-stage breast cancer rate. Based on 10-year age-group incidence data during the same period, we calculated the indirect standard incidence rate (SIR) or relative risk for each county (Waller and Gotway 004, p. 15), and the results are shown in Figs. and 3 in six quantiles. We observed that a strip of counties along the boundary between Appalachian and non-appalachian regions had an elevated early-stage breast cancer risk, as did the westernmost counties. With regard to late-stage breast cancer, there was a cluster of counties with a high incidence rate around the Figure. Early-stage standard relative risk in Kentucky. 30

11 Ge Lin and Tonglin Zhang Loglinear Residual Tests of Moran s I Autocorrelation Figure 3. Late-stage standard relative risk in Kentucky. northeastern Appalachian area; counties along the southern border of the state also had a higher relative risk. As screening for early-stage breast cancer tends to be associated with age, socioeconomic conditions, and access to health care within a geographic area (Freeman 1989), we included additional county variables while testing spatial clustering for both early-stage and late-stage breast cancer incidents. We obtained county population and socioeconomic data from the 000 U.S. Census. Age is expected to be positively associated with breast cancer incidents, and we obtained median age and population age groups as potential control variables. The socioeconomic conditions in a county can be related to breast cancer in two ways (Bradley, Given, and Robert 001). On the one hand, breast cancer is more prevalent among White women or those with higher socioeconomic status, and so counties that have a higher median family income (MEDFINC) are expected to have greater breast cancer incidence rates. On the other hand, women who have a higher socioeconomic status tend to be more aware of and more able to afford breast cancer screening than are those who have a lower socioeconomic status. Consequently, although counties that have better socioeconomic conditions may have a higher incidence rate of early-stage disease, they may not necessarily have a higher incidence rate of late-stage disease than do counties with worse socioeconomic conditions (Roche, Skinner, and Weinstein 00). We relied on several other data sources for access-to-care measures. We obtained breast cancer screening rates from the 1997 and 1998 Behavioral Risk Factor Surveillance Systems (BRFSS). Owing to the confidentiality concern, the original release of cancer screening level had some counties grouped together, and so we divided rates into tertiles of high (H-screening: 470%), middle (M-screening: 65 70%), and low (L-screening: o65%). A higher breast cancer screening level 303

12 Geographical Analysis (primarily by mammography) is expected to be associated with higher early-stage incidence rates, and negatively associated with late-stage rates. In the preliminary analysis, we found that the differences between low and middle tertiles were minimal, and we grouped them together in the final analysis. We also obtained the 1998 population-to-primary care physician ratio (POP/PMD) in 1998 from the Kentucky Department of Public Health; a lower ratio indicates that a physician can pay more attention to each patient. As breast cancer screening is most frequently recommended in a primary care setting, having a greater number of primary care physicians should help to reduce the incidence of all stages of breast cancer. For this reason, we expected the population-to-physician ratio to be negatively associated with both early-stage and late-stage breast cancer rates. Finally, we used data from a geographic information system to derive geographic access measures for Kentucky counties. We used the TIGER file from the 000 U.S. census to derive a measure of access to major highways. A county is coded 1 if a major national highway (HWY) passes through it, and 0 otherwise. It was expected that highway access would increase access to health care facilities and increase early-stage breast cancer diagnoses. We also divided counties into within and outside the Appalachian region (Appalachian) as encircled by the bold line in Figs. and 3. Counties within the Appalachian region generally are economically distressed and medically underserved, and the all-cause mortality rate tends to be much higher within the region (Haaga 004; Mather 004). Analysis To test Moran s I for spatial clustering, we first used Pearson residuals I PR and deviance residuals I DR for the null model without any covariates. Here, I PR and I DR in the null loglinear model correspond to the traditional Moran s I without any covariates. We then introduced explanatory variables in the so-called ecological model. In the preliminary analysis, we found that college education, poverty rate, and MEDFINC were highly correlated; we used MEDFINC in the final analysis because it was the most significant variable in terms of the likelihood ratio test. We also experimented with different age variables and found that proportions of age together with age 65 and over were not significant. We selected median age, because it was consistently significant in the ecological models for both stages; the greater the age, the greater the likelihood of breast cancer. We expected that the null model would indicate some spatial clustering through significant spatial autocorrelation and that the correlation should be weakened or disappeared once the explanatory variables were introduced. As most of explanatory variables in the literature are tapped for the early stage, their effects on the late stage are expected to be weaker. If the autocorrelation was found to persist or could not be explained by the ecological model, our task was to locate spatially clustered counties and provide our findings to epidemiologists and cancer specialists for further identification of the etiologies associated with breast cancer clusters. We used a spatial mixed 304

13 Ge Lin and Tonglin Zhang Loglinear Residual Tests of Moran s I Autocorrelation model to search for high-value and low-value spatial clusters by including additional local spatial association terms, as demonstrated by Lin (003). The method makes use of the vector of the spatial weight matrix, with 1 being adjacent to i inclusive (i.e., including the ith region itself), and 0 being otherwise. If a cluster of counties associated with the ith vector could significantly reduce the log-likelihood (deviance), it indicates a local association or cluster centered around the ith county. If the association is positive, it suggests a high-value cluster. If the association is negative, it suggests a low-value cluster (Lin and Zhang 004). After controlling for pockets of high-value and low-value clustering and ecological covariates, we would then expect an insignificant residual Moran s I. Results Table 3 lists Moran s I coefficients and the t values of ecological variables from the early-stage log-rate models. In the null model, both Pearson residual Moran s I PR and deviance residual Moran s I DR were positively significant, suggesting a clustering tendency. Once the explanatory variables were introduced into the ecological model, however, I PR and I DR both became insignificant based on their corresponding residuals. Hence, a spatial clustering tendency in the null model reflected spatial patterning of age structure, socioeconomic status, and access to care. In particular, counties with an older median age, a high level of breast cancer screening, or with easy highway access were associated with higher detection rates of early-stage breast cancer, whereas the POP/PMD and location in the Appalachian region were negatively associated with detection rates. These results were all consistent with our expectations and the existing literature. Finally, county Table 3 Early-Stage Breast Cancer Log-Rate Models Models Null Ecological Coefficient t value Coefficient t value (Intercept) Median age MEDFINC HWY Appalachian High screening POP/PMD Clustering tests Moran s I PR Moran s I DR Summary statistics G G NOTE: G stands for likelihood ratio w and t value is the ratio of estimated value and its standard error. POP/PMD, population-to-primary care physician ratio; MEDFINC, median family income. 305

14 Geographical Analysis MEDFINC had a negative and weak (P value o0.10) association with the early breast cancer detection rate. While it has been reported widely that women in lessdeveloped counties, such as those in the Appalachian area of Kentucky, are less likely to have their breast cancer diagnosed early (Gregorio et al. 00), it seemed counterintuitive to us that a higher county MEDFINC would be associated with a lower county rate of early-stage breast cancer. When we added only MEDFINC to the null model, the coefficient for MEDFINC was positive and significant. The weak association, therefore, may reflect an effect when age, the level of breast cancer screening, and other access-to-care variables were taken into account. Turning to the results from the late-stage log-rate models (Table 4), we found that Moran s I PR and I DR were significant for both the null and ecological models and that the explanatory variables in the ecological model were insufficient to account for the clustering tendency of the late-stage incidence rates. Three variables that remained significant were median age, population-to-physician ratio, and high screening level. The coefficients for the median age and population-to-physician ratio remained consistent with the early-stage model. However, the late-stage rate became negatively associated with a high screening level. This result was consistent with our expectation that a high screening level was associated with high rates of early-stage diagnoses and low rates of late-stage disease. To further pinpoint the core of clustered counties unexplained by the ecological model, we deleted insignificant variables except median age from the Table 4 Late-Stage Breast Cancer Log-Rate Models Models Null Ecology Mixed Coefficient t value Coefficient t value Coefficient t value (Intercept) Median age MEDFINC HWY Appalachian H-screening POP/PMD Barran-cluster Greenup-cluster Marshall-cluster Union-cluster Clustering tests Moran s I PR Moran s I DR Summary statistics G 5 39 df G 5 94 df G 5 51 df 5 11 NOTE: G stands for likelihood ratio w and t value is the ratio of estimated value and its standard error. POP/PMD, population-to-primary care physician ratio; MEDFINC, median family income. 306

15 Ge Lin and Tonglin Zhang Loglinear Residual Tests of Moran s I Autocorrelation Figure 4. High-value and low-value clusters of late-stage breast cancer in Kentucky. ecological model and applied a stepwise regression by including local spatial association terms. We identified four local association terms, none of which overlapped geographically. Each core county and its adjacent counties constituted a cluster, and including the core county only would not significantly reduce the clustered effect. Except for a cool spot around Barran County, the other three core counties represented the centers of three elevated late-stage clusters (Fig. 4). For instance, the rate in Union County and its adjacent counties was times the rates of other counties not included in the clusters. By including these clusters in the final model, both I PR and I DR were found to be insignificant, suggesting the disappearance of the clustering tendency once these clusters were accounted for in the model. It is worth mentioning that, if we dropped Marshall County, the median age effect would be significant, but the clustering effect would still remain. It suggests that a greater proportion of aging population around Marshall and its adjacent counties contributed to this cluster. It is also worth mentioning that Jefferson County, where Louisville is located, had a very low late-stage rate, but its adjacent counties each had a relatively high rate. Although including Jefferson County in the final model would reduce the log-likelihood ratio and the t values for I PR and I PD, counties around Jefferson County would not constitute a cluster. 1 Even though the main purpose of using I PR and I DR was to account for heteroskedasticity, new insights have been gained from analyzing I PR and I DR in the model-fitting processes. When the five explanatory variables were sufficient to reduce the spatial clustering effect of early-stage breast cancer, they highlight the underlying socioeconomic and health access processes that operate in geographic space. When known explanatory variables were not sufficient to reduce the spatial clustering effect as in the case of late-stage breast cancer, local spatial association terms can be used to further identify local clusters. The identification of spatial clusters may help to further reveal unique geographic characteristics in the clus- 307

16 Geographical Analysis tered areas associated with cancer disparities. Without measuring I PR or I DR,we would not be able to evaluate potential clustering effect in the traditional log-rate or Poisson regression models. Conclusions Although the asymptotic validity of permutation tests has been demonstrated in the literature and Pearson residuals and deviance residuals of loglinear models are asymptotically normal, no one has evaluated and applied them in the spatial context. In the current study, we have bridged Pearson residuals and deviance residuals of loglinear models with the permutation test of Moran s I under the asymptotic normality assumption. Our simulation study showed that both test statistics I PR and I DR were effective in reducing inflated variance caused by heterogeneous populations, and they both had an acceptable type I error rate. In addition, we tested both based on a set of log-rate models or Poisson regressions for early-stage and late-stage breast cancer incidence data, together with socioeconomic and access-to-care data in Kentucky. The results showed that socioeconomic and access-to-care variables were sufficient to account for spatial clustering of early-stage breast carcinomas with access-to-care measures, such as breast cancer screening and number of primary care providers being more persistent than county MEDFINC. After controlling for age, access-to-care measures, and regional distress factors in the Appalachian counties, the purported positive association between higher socioeconomic and early-stage breast cancer could be substantially weakened. For late-stage carcinomas, two salient and persistent factors were level of breast cancer screening and POP/PMD. In contrast to the finding that a high screening level was associated with a high incidence rate of early-stage breast cancer, the late-stage incidence rate was negatively associated with breast cancer screening level. This result confirmed our expectation: a high screening level is associated with a high incidence rate of early-stage diagnoses, which in turn reduces late-stage incidence rates. When the two access variables failed to reduce the spatial clustering tendencies from late-stage breast cancer, we searched for a local spatial association based on the likelihood ratio test. We located four clusters: one low-value cluster around Barran County and three high-value clusters. Two of the high-value clusters were adjacent to each other near the western corner of Kentucky. These unexplained clusters provided the basis for further investigation of the etiological and ecological factors of late-stage breast cancer in Kentucky. It should be pointed out that, even though we include median age, the age effect may not be fully accounted for in our model: future work should explore ways to account for both age and ecological effects while testing for spatial clustering. Spatial regressions have been widely used, but their use with the permutation tests of residuals either in linear or loglinear models is rarely seen. An advantage of the loglinear residual permutation test over the linear residual permutation test is that the former can account for potential spatially heterogeneous populations, 308

17 Ge Lin and Tonglin Zhang Loglinear Residual Tests of Moran s I Autocorrelation which makes it a viable alternative in the log-rate modeling of disease rates, as demonstrated in our study. The method is expected to complement some spatial cluster tests, such as the spatial scan statistic (Kulldorff 1997) and G statistic (Getis and Ord 199). In addition, the ability to show spatial clustering in I PR and I DR is complementary to disease mapping, which intends to display true disease risks while controlling for heterogeneous populations and regional risk factors (Lawson and Clark 00). Finally, a local version of loglinear residuals can be developed as an exploratory tool to complement other local indicators of spatial association (Anselin 1995). Acknowlegements Data used in this publication were provided by the Kentucky Cancer Registry, Lexington, KY. We would like to acknowledge helpful comments from Linda Pickle and Robert Hanham. We would also like to thank three reviewers and the editor for their comments and suggestions. Note 1 We also analyzed the late-stage cases versus all cases at the patient level. The results were consistent. Population-to-physician ratio and breast cancer screening and close to highway were less likely to be associated with late-stage diagnoses, while residing in the Appalachian region was more likely to be associated with late-stage diagnoses. Counties around Jefferson County had an excessive late-stage breast cancer diagnoses among breast cancer patients. References Agresti, A. (1990). Categorical Data Analysis. New York: Wiley. Anselin, L. (1995). Local Indicators of Spatial Association-LISA. Geographic Analysis 7, Assuncao, R., and E. Reis. (1999). A New Proposal to Adjust Moran s I for Population Density. Statistics in Medicine 18, Barry, J., and N. Breen. (005). The Importance of Place of Residence in Predicting Late- Stage Diagnosis of Breast or Cervical Cancer. Health and Place 11, Besag, J., and J. Newell. (1991). The Detection of Clusters in Rare Diseases. Journal of Royal Statistic Society A 154, Best, N., K. Ickstadt, and R. Wolpert. (000). Spatial Poisson Regression for Health and Exposure Data Measured at Disparate Resolutions. Journal of the American Statistical Association 95, Bradley, C. J., C. W. Given, and C. Robert. (001). Disparities in Cancer Diagnosis and Survival. Cancer 91, Cliff, A. D., and J. K. Ord. (1981). Spatial Processes: Models and Applications. London: Pion. Elliott, P., J. Wakefield, N. Best, and D. Briggs. (000). Spatial Epidemiology. New York: Oxford University Press. 309

18 Geographical Analysis Freeman, H. P. (1989). Cancer and the Socioeconomically Disadvantaged. Ca-A Cancer Journal for Clinicians 39, Getis, A., and J. Ord. (199). The Analysis of Spatial Association by Use of Distance Statistics. Geographical Analysis 4, Gregorio, D. I., M. Kulldorff, L. Barry, and H. Samociuk. (00). Geographic Differences in Invasive and in situ Breast Cancer Incidence According to Precise Geographic Coordinates, Connecticut International Journal of Cancer 100, Griffith, D., and R. Haining. (006). Beyond Mule Kicks: The Poisson Distribution in Geographical Analysis. Geographical Analysis 38, Haaga, J. H. (004). Educational Attainment in Appalachia. Population Reference Bureau. ARC Series, Vol. 5. Washington, DC: Population Reference Bureau. Jacqmin-Gadda, H., D. Commenges, C. Nejjari, and J. Dartigues. (1997). Testing of Geographical Correlation with Adjustment for Explanatory Variables: An Application to Dyspnoea in the Elderly. Statistics in Medicine 16, Kulldorff, M. (1997). A Spatial Scan Statistic. Communications in Statistics, Theory and Methods 6, Lawson, A. B., and A. Clark. (00). Spatial Mixture Relative Risk Models Applied to Disease Mapping. Statistics in Medicine 1, Lin, G. (003). A Spatial Logit Association Model for Cluster Detection. Geographical Analysis 35, Lin, G., and T. Zhang. (004). A Method for Testing Low-Value Spatial Clustering for Rare Disease. Acta Tropica 91, Mather, M. (004). Households and Families in Appalachia. Population Reference Bureau ARC Series, Vol. 4. Washington, DC: Population Reference Bureau. Moran, P. A. P. (1950). A Test for the Serial Independence of Residuals. Biometrika 37, Oden, N. (1995). Adjusting Moran s I for Population Density. Statistics in Medicine 14, Roche, L. S., R. Skinner, and R. B. Weinstein. (00). Use of a Geographic Information System to Identify and Characterize Areas with High Proportions of Distant Stage Breast Cancer. Journal of Public Health Management Practice 8, 6 3. Schmoyer, R. L. (1994). Permutation Tests Form Correlation in Regression Errors. Journal of American Statistical Association 89, Sen, A. (1976). Large Sample-Size Distribution of Statistics Used in Testing for Spatial Correlation. Geographical analysis 9, Waldhor, T. (1996). The Spatial Autocorrelation Coefficient Moran s I Under Heteroscedasticity. Statistic in Medicine 15, Waller, L. A., and C. A. Gotway. (004). Applied Spatial Statistics for Public Health Data. New York: Wiley. Yabroff, Y. R., and L. Gordis. (003). Does Stage at Diagnosis Influence the Observed Relationship Between Socioeconomic Status and Breast Cancer Incidence, Case- Fatality, and Mortality? Social Sciences and Medicine 57, Zhang, T., and G. Lin. (006). A Supplemental Indicator of High-Value or Low-Value Spatial Clustering. Geographical Analysis 38,

Cluster Detection Based on Spatial Associations and Iterated Residuals in Generalized Linear Mixed Models

Cluster Detection Based on Spatial Associations and Iterated Residuals in Generalized Linear Mixed Models Biometrics 65, 353 360 June 2009 DOI: 10.1111/j.1541-0420.2008.01069.x Cluster Detection Based on Spatial Associations and Iterated Residuals in Generalized Linear Mixed Models Tonglin Zhang 1, and Ge

More information

Identification of Local Clusters for Count Data: A. Model-Based Moran s I Test

Identification of Local Clusters for Count Data: A. Model-Based Moran s I Test Identification of Local Clusters for Count Data: A Model-Based Moran s I Test Tonglin Zhang and Ge Lin Purdue University and West Virginia University February 14, 2007 Department of Statistics, Purdue

More information

In matrix algebra notation, a linear model is written as

In matrix algebra notation, a linear model is written as DM3 Calculation of health disparity Indices Using Data Mining and the SAS Bridge to ESRI Mussie Tesfamicael, University of Louisville, Louisville, KY Abstract Socioeconomic indices are strongly believed

More information

Computational Statistics and Data Analysis

Computational Statistics and Data Analysis Computational Statistics and Data Analysis 53 (2009) 2851 2858 Contents lists available at ScienceDirect Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda Spatial

More information

USING CLUSTERING SOFTWARE FOR EXPLORING SPATIAL AND TEMPORAL PATTERNS IN NON-COMMUNICABLE DISEASES

USING CLUSTERING SOFTWARE FOR EXPLORING SPATIAL AND TEMPORAL PATTERNS IN NON-COMMUNICABLE DISEASES USING CLUSTERING SOFTWARE FOR EXPLORING SPATIAL AND TEMPORAL PATTERNS IN NON-COMMUNICABLE DISEASES Mariana Nagy "Aurel Vlaicu" University of Arad Romania Department of Mathematics and Computer Science

More information

Spatial Clusters of Rates

Spatial Clusters of Rates Spatial Clusters of Rates Luc Anselin http://spatial.uchicago.edu concepts EBI local Moran scan statistics Concepts Rates as Risk from counts (spatially extensive) to rates (spatially intensive) rate =

More information

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III) Title: Spatial Statistics for Point Processes and Lattice Data (Part III) Lattice Data Tonglin Zhang Outline Description Research Problems Global Clustering and Local Clusters Permutation Test Spatial

More information

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX Well, it depends on where you're born: A practical application of geographically weighted regression to the study of infant mortality in the U.S. P. Johnelle Sparks and Corey S. Sparks 1 Introduction Infant

More information

ARIC Manuscript Proposal # PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2

ARIC Manuscript Proposal # PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2 ARIC Manuscript Proposal # 1186 PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2 1.a. Full Title: Comparing Methods of Incorporating Spatial Correlation in

More information

Chapter 15 Spatial Disease Surveillance: Methods and Applications

Chapter 15 Spatial Disease Surveillance: Methods and Applications Chapter 15 Spatial Disease Surveillance: Methods and Applications Tonglin Zhang 15.1 Introduction The availability of geographical indexed health and population data and statistical methodologies have

More information

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May 5-7 2008 Peter Schlattmann Institut für Biometrie und Klinische Epidemiologie

More information

Tracey Farrigan Research Geographer USDA-Economic Research Service

Tracey Farrigan Research Geographer USDA-Economic Research Service Rural Poverty Symposium Federal Reserve Bank of Atlanta December 2-3, 2013 Tracey Farrigan Research Geographer USDA-Economic Research Service Justification Increasing demand for sub-county analysis Policy

More information

Bayesian Hierarchical Models

Bayesian Hierarchical Models Bayesian Hierarchical Models Gavin Shaddick, Millie Green, Matthew Thomas University of Bath 6 th - 9 th December 2016 1/ 34 APPLICATIONS OF BAYESIAN HIERARCHICAL MODELS 2/ 34 OUTLINE Spatial epidemiology

More information

Finding Hot Spots in ArcGIS Online: Minimizing the Subjectivity of Visual Analysis. Nicholas M. Giner Esri Parrish S.

Finding Hot Spots in ArcGIS Online: Minimizing the Subjectivity of Visual Analysis. Nicholas M. Giner Esri Parrish S. Finding Hot Spots in ArcGIS Online: Minimizing the Subjectivity of Visual Analysis Nicholas M. Giner Esri Parrish S. Henderson FBI Agenda The subjectivity of maps What is Hot Spot Analysis? Why do Hot

More information

A spatial scan statistic for multinomial data

A spatial scan statistic for multinomial data A spatial scan statistic for multinomial data Inkyung Jung 1,, Martin Kulldorff 2 and Otukei John Richard 3 1 Department of Epidemiology and Biostatistics University of Texas Health Science Center at San

More information

Analyzing the Geospatial Rates of the Primary Care Physician Labor Supply in the Contiguous United States

Analyzing the Geospatial Rates of the Primary Care Physician Labor Supply in the Contiguous United States Analyzing the Geospatial Rates of the Primary Care Physician Labor Supply in the Contiguous United States By Russ Frith Advisor: Dr. Raid Amin University of W. Florida Capstone Project in Statistics April,

More information

Community Health Needs Assessment through Spatial Regression Modeling

Community Health Needs Assessment through Spatial Regression Modeling Community Health Needs Assessment through Spatial Regression Modeling Glen D. Johnson, PhD CUNY School of Public Health glen.johnson@lehman.cuny.edu Objectives: Assess community needs with respect to particular

More information

Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms

Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms Arthur Getis* and Jared Aldstadt** *San Diego State University **SDSU/UCSB

More information

Everything is related to everything else, but near things are more related than distant things.

Everything is related to everything else, but near things are more related than distant things. SPATIAL ANALYSIS DR. TRIS ERYANDO, MA Everything is related to everything else, but near things are more related than distant things. (attributed to Tobler) WHAT IS SPATIAL DATA? 4 main types event data,

More information

Measuring community health outcomes: New approaches for public health services research

Measuring community health outcomes: New approaches for public health services research Research Brief March 2015 Measuring community health outcomes: New approaches for public health services research P ublic Health agencies are increasingly asked to do more with less. Tough economic times

More information

Quasi-likelihood Scan Statistics for Detection of

Quasi-likelihood Scan Statistics for Detection of for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for

More information

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS Srinivasan R and Venkatesan P Dept. of Statistics, National Institute for Research Tuberculosis, (Indian Council of Medical Research),

More information

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad Key message Spatial dependence First Law of Geography (Waldo Tobler): Everything is related to everything else, but near things

More information

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data Person-Time Data CF Jeff Lin, MD., PhD. Incidence 1. Cumulative incidence (incidence proportion) 2. Incidence density (incidence rate) December 14, 2005 c Jeff Lin, MD., PhD. c Jeff Lin, MD., PhD. Person-Time

More information

Detection of temporal changes in the spatial distribution of cancer rates using local Moran s I and geostatistically simulated spatial neutral models

Detection of temporal changes in the spatial distribution of cancer rates using local Moran s I and geostatistically simulated spatial neutral models J Geograph Syst (2005) 7:137 159 DOI: 10.1007/s10109-005-0154-7 Detection of temporal changes in the spatial distribution of cancer rates using local Moran s I and geostatistically simulated spatial neutral

More information

APPENDIX V VALLEYWIDE REPORT

APPENDIX V VALLEYWIDE REPORT APPENDIX V VALLEYWIDE REPORT Page Intentionally Left Blank 1.2 San Joaquin Valley Profile Geography The San Joaquin Valley is the southern portion of the Great Central Valley of California (Exhibit 1-1).

More information

Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones

Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones Prepared for consideration for PAA 2013 Short Abstract Empirical research

More information

This report details analyses and methodologies used to examine and visualize the spatial and nonspatial

This report details analyses and methodologies used to examine and visualize the spatial and nonspatial Analysis Summary: Acute Myocardial Infarction and Social Determinants of Health Acute Myocardial Infarction Study Summary March 2014 Project Summary :: Purpose This report details analyses and methodologies

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Hierarchical Modeling and Analysis for Spatial Data

Hierarchical Modeling and Analysis for Spatial Data Hierarchical Modeling and Analysis for Spatial Data Bradley P. Carlin, Sudipto Banerjee, and Alan E. Gelfand brad@biostat.umn.edu, sudiptob@biostat.umn.edu, and alan@stat.duke.edu University of Minnesota

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad Key message Spatial dependence First Law of Geography (Waldo Tobler): Everything is related to everything else, but near things

More information

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Libraries 1997-9th Annual Conference Proceedings ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Eleanor F. Allan Follow this and additional works at: http://newprairiepress.org/agstatconference

More information

Outline. Practical Point Pattern Analysis. David Harvey s Critiques. Peter Gould s Critiques. Global vs. Local. Problems of PPA in Real World

Outline. Practical Point Pattern Analysis. David Harvey s Critiques. Peter Gould s Critiques. Global vs. Local. Problems of PPA in Real World Outline Practical Point Pattern Analysis Critiques of Spatial Statistical Methods Point pattern analysis versus cluster detection Cluster detection techniques Extensions to point pattern measures Multiple

More information

Models for Count and Binary Data. Poisson and Logistic GWR Models. 24/07/2008 GWR Workshop 1

Models for Count and Binary Data. Poisson and Logistic GWR Models. 24/07/2008 GWR Workshop 1 Models for Count and Binary Data Poisson and Logistic GWR Models 24/07/2008 GWR Workshop 1 Outline I: Modelling counts Poisson regression II: Modelling binary events Logistic Regression III: Poisson Regression

More information

Spatial Analysis I. Spatial data analysis Spatial analysis and inference

Spatial Analysis I. Spatial data analysis Spatial analysis and inference Spatial Analysis I Spatial data analysis Spatial analysis and inference Roadmap Outline: What is spatial analysis? Spatial Joins Step 1: Analysis of attributes Step 2: Preparing for analyses: working with

More information

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p ) Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

2011/04 LEUKAEMIA IN WALES Welsh Cancer Intelligence and Surveillance Unit

2011/04 LEUKAEMIA IN WALES Welsh Cancer Intelligence and Surveillance Unit 2011/04 LEUKAEMIA IN WALES 1994-2008 Welsh Cancer Intelligence and Surveillance Unit Table of Contents 1 Definitions and Statistical Methods... 2 2 Results 7 2.1 Leukaemia....... 7 2.2 Acute Lymphoblastic

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Poisson Regression. Ryan Godwin. ECON University of Manitoba

Poisson Regression. Ryan Godwin. ECON University of Manitoba Poisson Regression Ryan Godwin ECON 7010 - University of Manitoba Abstract. These lecture notes introduce Maximum Likelihood Estimation (MLE) of a Poisson regression model. 1 Motivating the Poisson Regression

More information

Impact of Cliff and Ord (1969, 1981) on Spatial Epidemiology

Impact of Cliff and Ord (1969, 1981) on Spatial Epidemiology Impact of Cliff and Ord (1969, 1981) on Spatial Epidemiology Sylvia Richardson 1 and Chantal Guihenneuc-Jouyaux 2 1 Centre for Biostatistics and MRC-HPA Centre for Environment and Health, Imperial College

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Local Spatial Autocorrelation Clusters

Local Spatial Autocorrelation Clusters Local Spatial Autocorrelation Clusters Luc Anselin http://spatial.uchicago.edu LISA principle local Moran local G statistics issues and interpretation LISA Principle Clustering vs Clusters global spatial

More information

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Communications of the Korean Statistical Society 2009, Vol 16, No 4, 697 705 Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Kwang Mo Jeong a, Hyun Yung Lee 1, a a Department

More information

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago Mohammed Research in Pharmacoepidemiology (RIPE) @ National School of Pharmacy, University of Otago What is zero inflation? Suppose you want to study hippos and the effect of habitat variables on their

More information

Spatial Variation in Infant Mortality with Geographically Weighted Poisson Regression (GWPR) Approach

Spatial Variation in Infant Mortality with Geographically Weighted Poisson Regression (GWPR) Approach Spatial Variation in Infant Mortality with Geographically Weighted Poisson Regression (GWPR) Approach Kristina Pestaria Sinaga, Manuntun Hutahaean 2, Petrus Gea 3 1, 2, 3 University of Sumatera Utara,

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

More information

Spatial Variation in Hospitalizations for Cardiometabolic Ambulatory Care Sensitive Conditions Across Canada

Spatial Variation in Hospitalizations for Cardiometabolic Ambulatory Care Sensitive Conditions Across Canada Spatial Variation in Hospitalizations for Cardiometabolic Ambulatory Care Sensitive Conditions Across Canada CRDCN Conference November 14, 2017 Martin Cooke Alana Maltby Sarah Singh Piotr Wilk Today s

More information

BOOTSTRAPPING WITH MODELS FOR COUNT DATA

BOOTSTRAPPING WITH MODELS FOR COUNT DATA Journal of Biopharmaceutical Statistics, 21: 1164 1176, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2011.607748 BOOTSTRAPPING WITH MODELS FOR

More information

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization. Statistical Tools in Evaluation HPS 41 Dr. Joe G. Schmalfeldt Types of Scores Continuous Scores scores with a potentially infinite number of values. Discrete Scores scores limited to a specific number

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

EXPLORATORY SPATIAL DATA ANALYSIS OF BUILDING ENERGY IN URBAN ENVIRONMENTS. Food Machinery and Equipment, Tianjin , China

EXPLORATORY SPATIAL DATA ANALYSIS OF BUILDING ENERGY IN URBAN ENVIRONMENTS. Food Machinery and Equipment, Tianjin , China EXPLORATORY SPATIAL DATA ANALYSIS OF BUILDING ENERGY IN URBAN ENVIRONMENTS Wei Tian 1,2, Lai Wei 1,2, Pieter de Wilde 3, Song Yang 1,2, QingXin Meng 1 1 College of Mechanical Engineering, Tianjin University

More information

Exploratory Spatial Data Analysis (ESDA)

Exploratory Spatial Data Analysis (ESDA) Exploratory Spatial Data Analysis (ESDA) VANGHR s method of ESDA follows a typical geospatial framework of selecting variables, exploring spatial patterns, and regression analysis. The primary software

More information

MODELING COUNT DATA Joseph M. Hilbe

MODELING COUNT DATA Joseph M. Hilbe MODELING COUNT DATA Joseph M. Hilbe Arizona State University Count models are a subset of discrete response regression models. Count data are distributed as non-negative integers, are intrinsically heteroskedastic,

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Cluster Analysis using SaTScan

Cluster Analysis using SaTScan Cluster Analysis using SaTScan Summary 1. Statistical methods for spatial epidemiology 2. Cluster Detection What is a cluster? Few issues 3. Spatial and spatio-temporal Scan Statistic Methods Probability

More information

TABLE OF CONTENTS INTRODUCTION TO MIXED-EFFECTS MODELS...3

TABLE OF CONTENTS INTRODUCTION TO MIXED-EFFECTS MODELS...3 Table of contents TABLE OF CONTENTS...1 1 INTRODUCTION TO MIXED-EFFECTS MODELS...3 Fixed-effects regression ignoring data clustering...5 Fixed-effects regression including data clustering...1 Fixed-effects

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Inclusion of Non-Street Addresses in Cancer Cluster Analysis

Inclusion of Non-Street Addresses in Cancer Cluster Analysis Inclusion of Non-Street Addresses in Cancer Cluster Analysis Sue-Min Lai, Zhimin Shen, Darin Banks Kansas Cancer Registry University of Kansas Medical Center KCR (Kansas Cancer Registry) KCR: population-based

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Estimating the long-term health impact of air pollution using spatial ecological studies. Duncan Lee

Estimating the long-term health impact of air pollution using spatial ecological studies. Duncan Lee Estimating the long-term health impact of air pollution using spatial ecological studies Duncan Lee EPSRC and RSS workshop 12th September 2014 Acknowledgements This is joint work with Alastair Rushworth

More information

ESRI 2008 Health GIS Conference

ESRI 2008 Health GIS Conference ESRI 2008 Health GIS Conference An Exploration of Geographically Weighted Regression on Spatial Non- Stationarity and Principal Component Extraction of Determinative Information from Robust Datasets A

More information

How to deal with non-linear count data? Macro-invertebrates in wetlands

How to deal with non-linear count data? Macro-invertebrates in wetlands How to deal with non-linear count data? Macro-invertebrates in wetlands In this session we l recognize the advantages of making an effort to better identify the proper error distribution of data and choose

More information

GIS in Locating and Explaining Conflict Hotspots in Nepal

GIS in Locating and Explaining Conflict Hotspots in Nepal GIS in Locating and Explaining Conflict Hotspots in Nepal Lila Kumar Khatiwada Notre Dame Initiative for Global Development 1 Outline Brief background Use of GIS in conflict study Data source Findings

More information

Maggie M. Kovach. Department of Geography University of North Carolina at Chapel Hill

Maggie M. Kovach. Department of Geography University of North Carolina at Chapel Hill Maggie M. Kovach Department of Geography University of North Carolina at Chapel Hill Rationale What is heat-related illness? Why is it important? Who is at risk for heat-related illness and death? Urban

More information

SaTScan TM. User Guide. for version 7.0. By Martin Kulldorff. August

SaTScan TM. User Guide. for version 7.0. By Martin Kulldorff. August SaTScan TM User Guide for version 7.0 By Martin Kulldorff August 2006 http://www.satscan.org/ Contents Introduction... 4 The SaTScan Software... 4 Download and Installation... 5 Test Run... 5 Sample Data

More information

STRUCTURAL EQUATION MODELING. Khaled Bedair Statistics Department Virginia Tech LISA, Summer 2013

STRUCTURAL EQUATION MODELING. Khaled Bedair Statistics Department Virginia Tech LISA, Summer 2013 STRUCTURAL EQUATION MODELING Khaled Bedair Statistics Department Virginia Tech LISA, Summer 2013 Introduction: Path analysis Path Analysis is used to estimate a system of equations in which all of the

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

Variables and Variable De nitions

Variables and Variable De nitions APPENDIX A Variables and Variable De nitions All demographic county-level variables have been drawn directly from the 1970, 1980, and 1990 U.S. Censuses of Population, published by the U.S. Department

More information

Negative Multinomial Model and Cancer. Incidence

Negative Multinomial Model and Cancer. Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence S. Lahiri & Sunil K. Dhar Department of Mathematical Sciences, CAMS New Jersey Institute of Technology, Newar,

More information

Treatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev

Treatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev Variable selection: Suppose for the i-th observational unit (case) you record ( failure Y i = 1 success and explanatory variabales Z 1i Z 2i Z ri Variable (or model) selection: subject matter theory and

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Important note: Transcripts are not substitutes for textbook assignments. 1

Important note: Transcripts are not substitutes for textbook assignments. 1 In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance

More information

Hypothesis Testing hypothesis testing approach

Hypothesis Testing hypothesis testing approach Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction ReCap. Parts I IV. The General Linear Model Part V. The Generalized Linear Model 16 Introduction 16.1 Analysis

More information

Visualization Based Approach for Exploration of Health Data and Risk Factors

Visualization Based Approach for Exploration of Health Data and Risk Factors Visualization Based Approach for Exploration of Health Data and Risk Factors Xiping Dai and Mark Gahegan Department of Geography & GeoVISTA Center Pennsylvania State University University Park, PA 16802,

More information

KAAF- GE_Notes GIS APPLICATIONS LECTURE 3

KAAF- GE_Notes GIS APPLICATIONS LECTURE 3 GIS APPLICATIONS LECTURE 3 SPATIAL AUTOCORRELATION. First law of geography: everything is related to everything else, but near things are more related than distant things Waldo Tobler Check who is sitting

More information

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES REVSTAT Statistical Journal Volume 13, Number 3, November 2015, 233 243 MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES Authors: Serpil Aktas Department of

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Defining Statistically Significant Spatial Clusters of a Target Population using a Patient-Centered Approach within a GIS

Defining Statistically Significant Spatial Clusters of a Target Population using a Patient-Centered Approach within a GIS Defining Statistically Significant Spatial Clusters of a Target Population using a Patient-Centered Approach within a GIS Efforts to Improve Quality of Care Stephen Jones, PhD Bio-statistical Research

More information

2/7/2018. Module 4. Spatial Statistics. Point Patterns: Nearest Neighbor. Spatial Statistics. Point Patterns: Nearest Neighbor

2/7/2018. Module 4. Spatial Statistics. Point Patterns: Nearest Neighbor. Spatial Statistics. Point Patterns: Nearest Neighbor Spatial Statistics Module 4 Geographers are very interested in studying, understanding, and quantifying the patterns we can see on maps Q: What kinds of map patterns can you think of? There are so many

More information

Are Forecast Updates Progressive?

Are Forecast Updates Progressive? CIRJE-F-736 Are Forecast Updates Progressive? Chia-Lin Chang National Chung Hsing University Philip Hans Franses Erasmus University Rotterdam Michael McAleer Erasmus University Rotterdam and Tinbergen

More information

Introduction to Spatial Statistics and Modeling for Regional Analysis

Introduction to Spatial Statistics and Modeling for Regional Analysis Introduction to Spatial Statistics and Modeling for Regional Analysis Dr. Xinyue Ye, Assistant Professor Center for Regional Development (Department of Commerce EDA University Center) & School of Earth,

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB SPACE Workshop NSF NCGIA CSISS UCGIS SDSU Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB August 2-8, 2004 San Diego State University Some Examples of Spatial

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007 EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007 Applied Statistics I Time Allowed: Three Hours Candidates should answer

More information