Testing for Spatial Association of Qualitative Data Using Symbolic Dynamics


Manuel Ruiz, Fernando López
Facultad de C.C. de la Empresa, Dpto. Métodos Cuantitativos e Informáticos, Universidad Politécnica de Cartagena

Antonio Páez
Centre for Spatial Analysis, School of Geography and Earth Sciences, McMaster University

Published in Journal of Geographical Systems, doi:10.1007/s10109-009-0100-1

Abstract. Qualitative spatial variables are important in many fields of research. However, unlike the decades-worth of research devoted to the spatial association of quantitative variables, the exploratory analysis of spatial qualitative variables is relatively less developed. The objective of the present paper is to propose a new test (Q) for spatial independence. This is a simple, consistent, and powerful statistic for qualitative spatial independence that we develop using concepts from symbolic dynamics and symbolic entropy. The Q test can be used to detect, given a spatial distribution of events, patterns of spatial association of qualitative variables in a wide variety of settings. In order to enable hypothesis testing, we give a standard asymptotic distribution of an affine transformation of the symbolic entropy under the null hypothesis of independence in the spatial qualitative process. We include numerical experiments to demonstrate the finite sample behaviour of the test, and show its application by means of an empirical example that explores the spatial association of fast food establishments in the Greater Toronto Area in Canada.

Keywords. Spatial independence, qualitative variables, symbolic dynamics, entropy, fast food

1 Introduction

The concept of spatial autocorrelation is central to any effort to understand the spatiality of phenomena, and to build spatial theory and models (Griffith 1999; Miller 2004). From its origins in mathematical statistics (Geary 1954; Krishna Iyer 1949; Moran 1948) the notion of autocorrelation has animated, and in turn been given lasting currency by, quantitative geography, spatial analysis, and spatial statistics (Getis 2008). It is from these disciplines that the analysis of map patterns has diffused throughout, starting with the work of quantitative geographers (e.g., Dacey 1968), to Cliff and Ord (1973, 1981) and Ripley (1981), through the texts of Anselin (1988), Griffith (1988), Haining (1990), and Cressie (1993). Now, spatial autocorrelation analysis is used to support research in an ever increasing sphere of cogent disciplines. A vast majority of work in spatial analysis has historically been concerned with the analysis of variables of a continuous and interval nature. It is thus interesting to note that in fact the first attempt to describe maps from a statistical point of view was made in reference to qualitative variables (Dacey 1968; Moran 1948), specifically black and white colored (or later k-colored) maps, and only in second place to continuous variables (Cliff and Ord 1973; Geary 1954; Moran 1950). The reasons for this historical development seem clear. Linear regression for the multivariate analysis of continuous variables was, until relatively recent times, the instrument of choice for statistical analysis of spatial data. In turn, the analysis of map patterns was, almost from the beginning, meant to serve as a diagnostic tool for the analysis of residuals in linear regression (see Geary 1954, pp. 115-116 and again p. 144). Despite the traditional focus on continuous variables in spatial analysis, there are numerous situations where qualitative variables are the focus of research, and it is in this context that the hypothesis of spatial independence of qualitative data is important.

Besides early work with join count statistics (e.g., Cliff and Ord 1981; Dacey 1968; Upton and Fingleton 1985), and some more recent work by Boots (2003), not much research has been devoted to this class of problems in an exploratory setting, even if spatial modeling techniques for qualitative data have seen significant progress in recent years (e.g., Bhat and Sener 2009; Chakir and Parent 2009; Dubin 1995; McMillen 1992; Paez 2006; Robertson et al. 2009; Wang and Kockelman 2009). The objective of this paper is to propose a new statistic for the exploratory analysis of spatial qualitative/nominal data. The statistic is meant to identify whether neighbouring values of a spatial qualitative variable tend to be more similar or dissimilar than would be expected by chance. The approach proposed to test this hypothesis of spatial independence for qualitative variables is based on principles drawn from symbolic dynamics. Symbolic dynamics has been used for the investigation of non-linear dynamic systems (Hao and Zheng 1998) and provides an ideal set of tools for representing discrete processes. We use these tools to derive a new statistic, termed Q, starting from a function of symbolic entropy. In addition, we discuss the theoretical properties of the proposed statistic and investigate its finite sample behaviour by means of an extensive set of numerical experiments. Finally, we illustrate the usefulness of the Q statistic empirically with a case study that explores the spatial association of various fast food establishment types, namely Pizza, Hamburger, and Sandwich establishments, in the Greater Toronto Area in Canada. In the concluding section, we discuss a number of valuable features of our statistic, and directions for future research.

2 Background

As noted above, the study of autocorrelation of qualitative variables was among the earliest forms of spatial analysis, but from the start it was meant to support the use of linear regression for continuous variables. Some early applications confirm this connection, as for example the analysis that Haining (1978) conducted for crop failure in Nebraska and Kansas. While the premise that crop failure formed one or more regional clusters had been previously advanced (e.g., Hewes 1965), application of a contiguity measure by Haining (1978) provided the statistical evidence necessary to confirm the visual appraisal of crop failure patterns. An intriguing feature of this study is the conversion of an interval variable (percentage failure) to a nominal variable by taking values below or above the mean, or in other words, the categorization of a continuous variable. This is not an isolated example of the practice of discretizing continuous variables. Other instances include Chuang and Huang's (1992) assessment of the level of noise in digital images, which converted grey scale radiological images to black and white patterns, and Goldsborough's (1994) study of algal enumeration, whereby overall mean density was used to classify units as dense or sparse. One can only speculate as to the reasons why continuous variables were converted to nominal variables in these studies, since the fact that reduction to a nominal variable involves some serious information loss was not lost on these authors (see Chuang and Huang 1992, p. 367). From a computational standpoint, there are indications that as late as 1992, the process of counting joins required to calculate autocorrelation statistics was still fraught with difficulties and plagued with errors (Ghent et al. 1992). Relative simplicity may have also been a factor. In any case, it is clear that a vast majority of research efforts were indeed devoted to the development of statistics for continuous variables to serve the needs posed by the extended use of regression analysis. As a result, it is conventional in contemporary spatial analytical practice to use statistics appropriate for continuous variables at the global (Moran's I, Geary's c, variographic analysis) or local level (Anselin 1995; Getis and Ord 1993).

There are multiple examples of research where the focus is in fact a qualitative variable. In integrated chip manufacturing, for instance, the spatial structure of nonfunctional chips in wafers is recognized as a way to provide useful information about the manufacturing process. In this case, chips in a wafer are classified as good or bad (e.g., Taam and Hamada 1993, p. 150), and the objective is to determine whether defects are randomly or non-randomly scattered. Nominal data are also found in plant pathology, as in De Jong and De Bree's (1995) study of spatial patterns of disease in commercial fields of leek, where the variable of interest is a binary classification of health status (healthy and infected). Likewise, Real and McElhany (1996) discuss the use of nominal variables when these are the disease status of plants. In veterinary science, Mannelli et al. (1998) have studied swine fever in Sardinia using municipal level data following a binary classification scheme defined as outbreak and unaffected. In evolutionary biology, spatial variation in fitness was examined by Stratton and Bennington (1996) in an experiment implemented to infer natural selection processes that operate in space through the assessment of spatial variation in genotype distributions. In this experiment, data collected after a random initial distribution of seeds were analyzed to elucidate whether plants that carry identical genetic markers are spatially associated, and the classification was defined by means of identity, that is, patterns of association of plants with the same genetic marker (e.g., if there are three markers, then AA, Aa, aa). In separate research, Epperson and Alvarez-Buylla (1997) also investigate the spatial structure of nominal variables based on joins for two genotypes. Bell et al. (2008) are interested in spatial patterns of injury. In this investigation, join count statistics are used to describe the spatial co-occurrence of injuries by assault or intentional self-harm, with the results suggesting that assault injuries sustained by males who resided in neighbouring areas were more frequent than expected purely by chance. Self-harm injuries did not display the same strength of spatial pattern. The intention of the statistic proposed in this paper is to support analysis in research that makes use of qualitative variables, such as the examples above.

3 Symbolization of a spatial process with discrete outcomes

Development of the Q statistic is based on the application of symbolic dynamics concepts. Symbolic dynamics is an approach, developed in the field of mathematics for the study of dynamical systems (Hao and Zheng 1998), that consists of modelling a dynamic system by means of a discrete set consisting of sequences of abstract symbols obtained for a suitable partition of the state space. The basic idea behind symbolic dynamics is to consider a space in which the possible states of a system are represented, and each possible state corresponds to one unique point in the state space. This space can then be partitioned into a finite number of regions and each region can be labelled by an alphabetical letter. In this regard, symbolic dynamics is a coarse-grained description of dynamics. Even though coarse-grained methods lose a certain amount of detailed information, some essential features of the dynamics may be kept, including periodicity and dependence, among others (for an overview of these concepts see Hao and Zheng 1998). If the process is inherently discrete to begin with, then symbolic dynamics provides an ideal tool for its study. In order to implement symbolic dynamics concepts the symbols for a process must be defined, or in other words, the process needs to be symbolized. In principle, there is no reason to anticipate that symbolization procedures will be unique given a spatial process, and in fact it is possible to conceive of several possible ways to symbolize a process. Therefore the general framework proposed here can be adapted to the necessities of specific problems, and just as is the case with connectivity matrices in spatial modelling, it is generally possible to incorporate substantive understanding of the process of interest in order to refine the symbolization procedure.
This is a feature that lends great flexibility to our approach. In order to ensure broad applicability of the statistic proposed, in this paper we propose a general, all-purpose symbolization procedure which allows us to capture the dependence of a discrete process in geographical space. Let us begin by defining a discrete spatial process $\{X_s\}_{s \in S}$, where $S$ is a set of geographical coordinates that denote the location of events. These locations are given and fixed. Further, denote by $A = \{a_1, a_2, \ldots, a_k\}$ the set of possible values that $X_s$ can take, for all $s \in S$. Clearly, there are $k$ different categories in this notation, which could be black / white or yes / no ($k=2$), AA / Aa / aa if there are three genetic markers ($k=3$), and so on. In other words, observations are made at spatially discrete locations, and the outcome of the process is discrete as well. A natural way to symbolize such a process is to embed it in an $m$-dimensional space as follows:

$$X_m(s_0) = (X_{s_0}, X_{s_1}, \ldots, X_{s_{m-1}}) \quad \text{for } s_0 \in S \tag{1}$$

where $s_1, s_2, \ldots, s_{m-1}$ are the $m-1$ nearest neighbours of $s_0$. We will call this $m$-dimensional space an $m$-surrounding. A key to symbolizing the process is to define the criteria that determine which spatial events are the neighbours of $s_0$. To this end, we propose a definition of neighbours based on proximity (i.e. nearest neighbours criterion). Whenever two neighbours are equidistant, then the polar coordinates $(\rho_i^0, \theta_i^0)$ of $s_i$ are considered, taking $s_0$ as the origin. This implies that the $m-1$ nearest neighbours will be those events satisfying the following two conditions that ensure the uniqueness of $X_m(s)$ for all $s \in S$:

(a) The distances of the $m-1$ neighbours from $s_0$ satisfy the condition that $\rho_1^0 \le \rho_2^0 \le \cdots \le \rho_{m-1}^0$; and
(b) In the case of a tie in terms of the distance from $s_0$ (i.e. if $\rho_i^0 = \rho_{i+1}^0$) then precedence goes to the smaller angle (i.e. $\theta_i^0 < \theta_{i+1}^0$).

The set of the $m-1$ nearest neighbours is denoted as $N_s = \{s_1, s_2, \ldots, s_{m-1}\}$. Since an $m$-surrounding $X_m(s)$ consists of $m$ observations, and there are $k$ possible values that each observation can take, there are $k^m$ distinct combinations of values for an $m$-surrounding. We will denote each of these unique combinations by an abstract symbol, say $\sigma_i$, and will define $\Gamma = \{\sigma_1, \sigma_2, \ldots, \sigma_{k^m}\}$ as the set of all possible symbols. Furthermore, we will say that a location $s$ is of $\sigma_i$-type if and only if $X_m(s) = \sigma_i$.

As an illustration of the symbolization procedure, consider a simple spatial system consisting of a regular hexagonal tessellation as shown in Fig. 1, and a process with two possible outcomes ($k=2$). The outcomes are shown in the figure in dark color when they are of class 1 and light color when they are of class 2. Taking $m=6$ as the size of the $m$-surrounding, this gives a total of $2^6=64$ different combinations of values, or symbols ($\sigma_1$ through $\sigma_{64}$), as listed in Table 1. Please note that a hexagonal tessellation is used only for illustrative purposes. The symbolization procedure is equally applicable to regular and irregular distributions of observations, and to points as well as areas.

Fig. 1. Simple spatial system and process with two types of outcomes.

Since in a hexagonal tessellation the distance from $s_0$ is the same for all 6 contiguous spatial units, and keeping in mind that polar coordinates begin at an angle of 0 in the positive direction of the x axis in Cartesian coordinates, it should be clear that neighbours are arranged in order of increasing angle from the origin of the polar coordinate system. Then, referring again to Fig. 1, we say that location $s_1$ is of symbol $\sigma_{13}$, since $X_m(s_1) = (1,1,2,2,1,1) = \sigma_{13}$, whereas location $s_2$ is of symbol $\sigma_{34}$, since $X_m(s_2) = (2,1,1,1,1,2) = \sigma_{34}$. It is important to note that while the number of classes $k$ is determined by the nature of the process, the size of the $m$-surrounding is not, which gives some flexibility to the analyst to explore various alternatives, however bounded by the necessity to satisfy some minimum conditions required to ensure desirable statistical properties, as discussed more fully below.

Table 1. List of symbols for $k=2$, $m=6$

$\sigma_{1}$=(1,1,1,1,1,1)    $\sigma_{17}$=(1,2,1,1,1,1)    $\sigma_{33}$=(2,1,1,1,1,1)    $\sigma_{49}$=(2,2,1,1,1,1)
$\sigma_{2}$=(1,1,1,1,1,2)    $\sigma_{18}$=(1,2,1,1,1,2)    $\sigma_{34}$=(2,1,1,1,1,2)    $\sigma_{50}$=(2,2,1,1,1,2)
$\sigma_{3}$=(1,1,1,1,2,1)    $\sigma_{19}$=(1,2,1,1,2,1)    $\sigma_{35}$=(2,1,1,1,2,1)    $\sigma_{51}$=(2,2,1,1,2,1)
$\sigma_{4}$=(1,1,1,1,2,2)    $\sigma_{20}$=(1,2,1,1,2,2)    $\sigma_{36}$=(2,1,1,1,2,2)    $\sigma_{52}$=(2,2,1,1,2,2)
$\sigma_{5}$=(1,1,1,2,1,1)    $\sigma_{21}$=(1,2,1,2,1,1)    $\sigma_{37}$=(2,1,1,2,1,1)    $\sigma_{53}$=(2,2,1,2,1,1)
$\sigma_{6}$=(1,1,1,2,1,2)    $\sigma_{22}$=(1,2,1,2,1,2)    $\sigma_{38}$=(2,1,1,2,1,2)    $\sigma_{54}$=(2,2,1,2,1,2)
$\sigma_{7}$=(1,1,1,2,2,1)    $\sigma_{23}$=(1,2,1,2,2,1)    $\sigma_{39}$=(2,1,1,2,2,1)    $\sigma_{55}$=(2,2,1,2,2,1)
$\sigma_{8}$=(1,1,1,2,2,2)    $\sigma_{24}$=(1,2,1,2,2,2)    $\sigma_{40}$=(2,1,1,2,2,2)    $\sigma_{56}$=(2,2,1,2,2,2)
$\sigma_{9}$=(1,1,2,1,1,1)    $\sigma_{25}$=(1,2,2,1,1,1)    $\sigma_{41}$=(2,1,2,1,1,1)    $\sigma_{57}$=(2,2,2,1,1,1)
$\sigma_{10}$=(1,1,2,1,1,2)    $\sigma_{26}$=(1,2,2,1,1,2)    $\sigma_{42}$=(2,1,2,1,1,2)    $\sigma_{58}$=(2,2,2,1,1,2)
$\sigma_{11}$=(1,1,2,1,2,1)    $\sigma_{27}$=(1,2,2,1,2,1)    $\sigma_{43}$=(2,1,2,1,2,1)    $\sigma_{59}$=(2,2,2,1,2,1)
$\sigma_{12}$=(1,1,2,1,2,2)    $\sigma_{28}$=(1,2,2,1,2,2)    $\sigma_{44}$=(2,1,2,1,2,2)    $\sigma_{60}$=(2,2,2,1,2,2)
$\sigma_{13}$=(1,1,2,2,1,1)    $\sigma_{29}$=(1,2,2,2,1,1)    $\sigma_{45}$=(2,1,2,2,1,1)    $\sigma_{61}$=(2,2,2,2,1,1)
$\sigma_{14}$=(1,1,2,2,1,2)    $\sigma_{30}$=(1,2,2,2,1,2)    $\sigma_{46}$=(2,1,2,2,1,2)    $\sigma_{62}$=(2,2,2,2,1,2)
$\sigma_{15}$=(1,1,2,2,2,1)    $\sigma_{31}$=(1,2,2,2,2,1)    $\sigma_{47}$=(2,1,2,2,2,1)    $\sigma_{63}$=(2,2,2,2,2,1)
$\sigma_{16}$=(1,1,2,2,2,2)    $\sigma_{32}$=(1,2,2,2,2,2)    $\sigma_{48}$=(2,1,2,2,2,2)    $\sigma_{64}$=(2,2,2,2,2,2)

Once the symbolization of the process has been defined, it is possible to calculate the frequency of each symbol, which is simply the number of locations that are of $\sigma_i$-type:

$$n_{\sigma_i} = \left|\{s \in S \mid X_m(s) = \sigma_i\}\right| \tag{2}$$

where $|\cdot|$ denotes the cardinality of a set. Since this frequency is defined for each of the $k^m$ symbols, under the conditions above, the relative frequency of a symbol can be easily computed as:

$$p(\sigma_i) \equiv p_{\sigma_i} = \frac{\left|\{s \in S \mid s \text{ is of } \sigma_i\text{-type}\}\right|}{|S|} \tag{3}$$

whereby $|S|$ denotes the cardinality of the set $S$ (the total number of symbolized observations). Now, under this setting, we can define the symbolic entropy of the spatial process $\{X_s\}_{s \in S}$ for an embedding dimension $m \ge 2$.
This entropy is defined as the Shannon entropy of the $k^m$ distinct symbols as follows:

$$h(m) = -\sum_{\sigma \in \Gamma} p_{\sigma} \ln(p_{\sigma}) \tag{4}$$
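The symbolization procedure of this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `m_surrounding` and `symbol_index` are hypothetical, distances are rounded to nine decimals so that ties can be detected (an assumption of this sketch), and symbols are numbered in the order used in Table 1.

```python
import math

def m_surrounding(points, s0, m):
    """Return the index s0 followed by the indices of its m-1 nearest neighbours.

    Neighbours are ordered by distance to s0; ties are broken by the smaller
    polar angle, measured from the positive x axis with s0 as the origin.
    """
    x0, y0 = points[s0]

    def key(i):
        dx, dy = points[i][0] - x0, points[i][1] - y0
        rho = math.hypot(dx, dy)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        # Rounding the distance is an assumption made so that
        # numerically equal distances compare as ties.
        return (round(rho, 9), theta)

    others = sorted((i for i in range(len(points)) if i != s0), key=key)
    return [s0] + others[: m - 1]

def symbol_index(values, k):
    """Map an m-tuple of classes in 1..k to its symbol number (Table 1 order)."""
    index = 0
    for v in values:
        index = index * k + (v - 1)
    return index + 1

# Four equidistant neighbours around the origin: ordering is by angle.
points = [(0.0, 0.0), (0.0, -1.0), (-1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
print(m_surrounding(points, 0, 4))          # [0, 4, 3, 2]
print(symbol_index((1, 1, 2, 2, 1, 1), 2))  # 13
print(symbol_index((2, 1, 1, 1, 1, 2), 2))  # 34
```

The last two calls reproduce the two symbols discussed for Fig. 1, $\sigma_{13}$ and $\sigma_{34}$.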

Symbolic entropy, or $h(m)$, is the information contained in comparing the $m$-surroundings defined for the spatial process. Notice that when one symbol, say $\sigma_i$, tends to dominate the process then $p_{\sigma_i} \to 1$ and $p_{\sigma_j} \to 0$ for all $j \ne i$, which implies that $p_{\sigma_i} \ln(p_{\sigma_i}) \to 0$ and $p_{\sigma_j} \ln(p_{\sigma_j}) \to 0$ and therefore $h(m) \to 0$. Furthermore, when the values of the qualitative variable are identically and independently distributed, all symbols should appear with equal frequency, in which case we have that $p_{\sigma_i} = 1/k^m$ for all $i$. The entropy function is then bounded between $0 \le h(m) \le \ln(k^m)$, where the lower bound indicates a tendency for only one symbol to occur (i.e. there is a tendency toward patterning in the distribution of the values of the qualitative variable), and the upper bound corresponds to a completely random system (i.i.d. spatial sequence). As an illustration, consider the situation illustrated in Fig. 2, with $k=2$ and $m=3$, which means that there are $2^3 = 8$ symbols. The left panel shows a random distribution of the values of the qualitative variable. The histogram of the frequency of each of the eight symbols verifies that all symbols appear with similar frequency. The right panel shows the case where the values are distributed non-randomly and two symbols tend to appear with more frequency than the rest. Rarely will the frequency of symbols be identical, and the question that emerges is whether departures from this are significant. In other words, do some symbols appear with more or less frequency than what would be expected by chance alone? The results needed to statistically test this hypothesis are derived next.

Fig. 2. Random and non-random distributions of values of qualitative variable ($k=2$) and frequency of symbols
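The two bounds of the entropy function can be checked numerically. The following sketch computes the symbolic entropy of Eq. (4) from a list of observed symbols; the function name `symbolic_entropy` is hypothetical, and symbols that never occur are simply omitted from the sum (the usual convention that $0 \ln 0 = 0$).

```python
import math
from collections import Counter

def symbolic_entropy(symbols):
    """Shannon entropy (natural logarithm) of the observed symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# k = 2, m = 3: eight symbols observed equally often reach the upper bound ln(8).
uniform = list(range(1, 9)) * 5
print(abs(symbolic_entropy(uniform) - math.log(8)) < 1e-12)  # True

# A single dominating symbol gives the lower bound 0.
print(symbolic_entropy([5] * 10) == 0)  # True
```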

4 Construction of the independence test

In this section, we construct a spatial independence test for a discrete qualitative spatial variable. We also prove that an affine transformation of the symbolic entropy defined in Eq. (4) is asymptotically $\chi^2$ distributed. Let $\{X_s\}_{s \in S}$ be a discrete spatial process and $m \ge 2$ be a fixed embedding dimension. In order to construct a test for spatial independence in $\{X_s\}_{s \in S}$, we consider the following null hypothesis:

$H_0$: $\{X_s\}_{s \in S}$ is spatially independent,

against any other alternative. Now, for a symbol $\sigma_i \in \Gamma$, we define the random variable $Z_{\sigma_i s}$ as follows:

$$Z_{\sigma_i s} = \begin{cases} 1 & \text{if } X_m(s) = \sigma_i \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

that is, we have that $Z_{\sigma_i s} = 1$ if and only if $s$ is of $\sigma_i$-type, and $Z_{\sigma_i s} = 0$ otherwise. Then $Z_{\sigma_i s}$ is a Bernoulli variable with probability of success $p_{\sigma_i}$, where success means that $s$ is of $\sigma_i$-type. It is straightforward to see that:

$$\sum_{i=1}^{k^m} p_{\sigma_i} = 1 \tag{6}$$

Let us assume that set $S$ is finite and of order $R$ (the number of symbolized locations). Then we are interested in knowing how many $s$ are of $\sigma_i$-type for all symbols $\sigma_i \in \Gamma$. We construct the following variable to this end:

$$Y_{\sigma_i} = \sum_{s \in S} Z_{\sigma_i s} \tag{7}$$

The variable $Y_{\sigma_i}$ can take the values $\{0, 1, 2, \ldots, R\}$. Notice that not all the variables $Z_{\sigma_i s}$ are independent (due to the overlapping of some $m$-surroundings), and therefore $Y_{\sigma_i}$ is not exactly a binomial random variable. Nevertheless, the sum of dependent Bernoulli variables can be approximated by a binomial random variable whenever (see Soon 1996):

(i) Dependence among the indicators is weak; and
(ii) The probability of the indicators to occur is small.

Condition (ii) is satisfied by the way the symbols have been constructed, since in this case, under the null hypothesis, the probability of success of the indicator $Z_{\sigma_i s}$ is small ($p_{\sigma_i} = 1/k^m$). Condition (i), on the other hand, can usually be satisfied only if the events are distributed in a regular array, and the size of the $m$-surrounding is relatively small, in which case the overlaps are minor. More generally, when the size of the $m$-surroundings is large, or when their spatial arrangement is irregular, this condition becomes more difficult to maintain, if we consider all the indicators $Z_{\sigma_i s}$ for all $s \in S$. Additional steps are therefore needed to ensure that the dependence among the indicators $Z_{\sigma_i s}$ is weak.

In order to attain a good binomial approximation, we consider a subset of locations $\tilde{S} \subseteq S$ with controlled overlap, so that the dependence among the indicators $Z_{\sigma_i s}$ is weak for $s \in \tilde{S}$. Use of a subset of locations will cause a loss of information, and this loss will be greater the smaller the set $\tilde{S}$ is. A reasonable balance therefore must be struck between strongly dependent indicators and too much loss of information. In order to control the amount of overlap among the Bernoulli variables, we can take as $\tilde{S}$ those coordinates in $S$ such that for any two coordinates $s, s' \in \tilde{S}$ the sets of nearest neighbours of $s$ and $s'$ share at most $r$ locations (a small enough positive integer) if they intersect:

$$|N_s \cap N_{s'}| \begin{cases} = 0 & \text{if non-overlapping} \\ \le r & \text{otherwise} \end{cases}$$

We call this integer $r$ the degree of overlap of the spatial process $\{X_s\}_{s \in S}$. We now turn to a method to select the set $\tilde{S}$ satisfying the above condition. Let us define the set $\tilde{S}$ recursively as follows. First choose a location $s_0 \in S$ at random and fix an integer $r$ with $0 \le r$. Let $N_{s_0} = \{s_1^0, s_2^0, \ldots, s_{m-1}^0\}$ be the set of nearest neighbours to $s_0$, where the $s_i^0$ are ordered by distance to $s_0$. Let us call $s_1 = s^0_{m-r-1}$ and define $A_0 = \{s_0, s_1^0, \ldots, s^0_{m-r-2}\}$. Take the set of nearest neighbours to $s_1$, namely $N_{s_1} = \{s_1^1, s_2^1, \ldots, s_{m-1}^1\}$, in the set of locations $S \setminus A_0$ and define $s_2 = s^1_{m-r-1}$. Now for $i > 1$ we define

$$s_i = s^{i-1}_{m-r-1} \tag{8}$$

where $N_{s_{i-1}} = \{s_1^{i-1}, s_2^{i-1}, \ldots, s_{m-1}^{i-1}\}$ is the set of nearest neighbours to $s_{i-1}$ in the set $S \setminus \{\cup_{j=0}^{i-2} A_j\}$. Continue this process while there are locations to symbolize. In the end, we have constructed a set of locations:

$$\tilde{S} = \{s_0, s_1, \ldots, s_R\} \tag{9}$$

such that the variable $Y_{\sigma_i} = \sum_{s \in \tilde{S}} Z_{\sigma_i s}$ can be approximated by a binomial distribution for a suitable choice of $r$. Notice that the maximum number of locations that can be symbolized with an overlapping degree $r$ is $R = \left[\frac{N-m}{m-r}\right] + 1$, where the operator $[x]$ denotes the integer part of a real number $x$. Given the above considerations, we can now state the following result (the proof can be found in the Appendix).

Theorem 1 Let $\{X_s\}_{s \in S}$ be a qualitative discrete spatial process with $|S| = N$. Let $A = \{a_1, a_2, \ldots, a_k\}$ be the set of possible values that $X_s$ can take, for all $s \in S$. Let $r$ be the overlapping degree of $\{X_s\}_{s \in S}$ and $R = \left[\frac{N-m}{m-r}\right] + 1$, where $[x]$ denotes the integer part of a real number $x$. Let $\Gamma = \{\sigma_1, \sigma_2, \ldots, \sigma_{k^m}\}$ be the set of symbols defined in Section 3.
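The trade-off between overlap and the number of symbolized locations can be made concrete with a small helper. This is only a sketch of the formula $R = [(N-m)/(m-r)] + 1$; the function name `max_symbolized` is hypothetical, and the check $r < m$ is an assumption added here so that the denominator is positive.

```python
def max_symbolized(N, m, r):
    """Maximum number R of symbolized locations for N observations,
    m-surroundings of size m and overlap degree r."""
    if not (0 <= r < m <= N):
        raise ValueError("expected 0 <= r < m <= N")
    # The first m-surrounding uses m locations; each further one adds m - r.
    return (N - m) // (m - r) + 1

print(max_symbolized(900, 3, 1))  # 449
print(max_symbolized(100, 5, 0))  # 20
print(max_symbolized(100, 5, 4))  # 96
```

With no overlap ($r=0$) the $m$-surroundings partition the observations, while a larger $r$ yields more symbolized locations at the cost of stronger dependence among the indicators.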
Let $\alpha_{ij}$ be the number of times that class $a_j$ appears in symbol $\sigma_i$ and $q_j = P(X_s = a_j)$. Denote by $h(m)$ the symbolic entropy defined in Eq. (4) for a fixed embedding dimension $m \ge 2$ with $m \in \mathbb{N}$. If the spatial process $\{X_s\}_{s \in S}$ is independent, then:

$$Q(m) = -2R \left( \sum_{i=1}^{k^m} \frac{n_{\sigma_i}}{R} \ln\left( \prod_{j=1}^{k} q_j^{\alpha_{ij}} \right) + h(m) \right) \tag{10}$$

is asymptotically $\chi^2_{k^m - 1}$ distributed.

Note that if $\{X_s\}_{s \in S}$ is also identically distributed, in other words, each value of the variable appears with equal frequency, then $q_j = 1/k$ and therefore Eq. (10) reduces to:

$$Q(m) = 2R \left( \ln(k^m) - h(m) \right). \tag{11}$$

Let $\alpha$ be a real number with $0 \le \alpha \le 1$. Let $\chi^2_\alpha$ be such that:

$$P(\chi^2_{k^m - 1} \le \chi^2_\alpha) = 1 - \alpha \tag{12}$$

Then, to test:

$H_0$: $\{X_s\}_{s \in S}$ is spatially independent,

the decision rule in the application of the $Q(m)$ test at a $100(1-\alpha)\%$ confidence level is:

$$\text{If } Q(m) > \chi^2_\alpha \text{ then reject } H_0; \text{ otherwise do not reject } H_0 \tag{13}$$

5 Properties of the Q(m) test

Next, we prove that the $Q(m)$ test is consistent for a wide variety of spatially dependent processes. This is a valuable property since the test will reject asymptotically the assumption of spatial independence whenever there is spatial dependence within the $m$-surrounding. By spatial dependence of order less than or equal to $m$ we mean that, whatever the structure of the spatial process, there exists dependence between the random variable located at point $s$ and its $m$-surrounding or a part of it. We will denote by $\hat{Q}(m)$ the estimator of $Q(m)$. The proof of the following theorem can be found in the Appendix.

Theorem 2 Let $\{X_s\}_{s \in S}$ be a discrete spatial process, and $m \ge 2$ with $m \in \mathbb{N}$. Then:

$$\lim_{R \to \infty} \Pr(\hat{Q}(m) > C) = 1 \tag{14}$$

under spatial dependence of order smaller than or equal to $m$, for all $0 < C < \infty$, $C \in \mathbb{R}$.

Thus, the test based on $Q(m)$ is consistent against all spatial dependence of order less than or equal to $m$. Conversely, since the dependence detected by the test is at most of order $m$, if the dependence structure of the process is of order larger than $m$, then it will not be present in every $m$-surrounding and therefore the symbols may not capture it. Since Theorem 2 implies $\hat{Q}(m) \to +\infty$ with probability approaching one under spatial dependence of order less than or equal to $m$, upper-tailed critical values are appropriate. As previously noted, from a practical point of view, the researcher has to decide upon the embedding dimension $m$ in order to compute symbolic entropy and therefore, to calculate the $Q(m)$ statistic. While this affords some flexibility, there are also some conditions that must be observed in order to guide a decision.
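As an illustration of how the statistic of Theorem 1 could be computed, the sketch below evaluates $Q(m) = -2R\left(\sum_i \frac{n_{\sigma_i}}{R} \ln \prod_j q_j^{\alpha_{ij}} + h(m)\right)$ from a list of symbolized $m$-surroundings. The function name `q_statistic` and its inputs are hypothetical, and the class probabilities `q` are assumed to be supplied by the user. The critical value is not computed here; it would come from a chi-square table or a statistical library with $k^m - 1$ degrees of freedom.

```python
import math
from collections import Counter

def q_statistic(surroundings, q):
    """Q(m) for R symbolized m-surroundings.

    surroundings: list of m-tuples of classes in 1..k (one per symbolized location)
    q:            list of the k class probabilities q_1..q_k (assumed known)
    """
    R = len(surroundings)
    counts = Counter(surroundings)
    # Symbolic entropy h(m) of the observed symbols.
    h = -sum((c / R) * math.log(c / R) for c in counts.values())
    # sum_i (n_i / R) * ln(prod_j q_j^alpha_ij); the log of the product is
    # the sum of ln(q_j) over the classes appearing in the symbol.
    expected = sum(
        (c / R) * sum(math.log(q[v - 1]) for v in symbol)
        for symbol, c in counts.items()
    )
    return -2.0 * R * (expected + h)

# One dominating symbol, k = 2, m = 3, equal class probabilities:
# h(m) = 0, so Q(m) = 2R ln(k^m) = 240 ln 2.
sample = [(1, 1, 1)] * 40
print(abs(q_statistic(sample, [0.5, 0.5]) - 240 * math.log(2)) < 1e-9)  # True
```

For equal class probabilities the function agrees with the reduced form $2R(\ln(k^m) - h(m))$. In the example, $Q(3) \approx 166.4$ far exceeds the 5% critical value of a chi-square distribution with $2^3 - 1 = 7$ degrees of freedom (about 14.07), so independence would be rejected.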
Note that the number of locations that are symbolized ($R$) should be larger than the number of symbols ($k^m$) in order to have at least the same number of $m$-surroundings as symbols have been defined. When the $\chi^2_{k^m-1}$ distribution is applied in practice, and all the expected frequencies are larger than or equal to five, the limiting tabulated $\chi^2$ distribution gives, as a rule, the value $\chi^2_\alpha$ with an approximation sufficient for ordinary purposes (see chapter 10 of Rohatgi 1976). For this reason, it is strongly advisable to work with at least $5k^m$ symbolized observations.

6 Finite sample behaviour of Q(m)

In this section, we examine the finite sample behaviour of the $Q(m)$ test. This is to establish the power and size of the statistic under various levels of spatial association, sample sizes, sizes of the $m$-surrounding, and degrees of overlap. In addition, we explore the potential impact of boundary effects.

6.1 Size and Power of the Test in Finite Samples

To investigate the power and size of the test, we conduct an extensive set of numerical experiments. Let us begin with some considerations regarding the data generating process used for the experiments. In order to obtain categorical random variables with controlled degrees of spatial dependence, we have designed a two-stage data generating process. Firstly, we simulate autocorrelated data using the following model:

$$Y = (I - \rho W)^{-1} \varepsilon \tag{15}$$

where $\varepsilon \sim N(0, 1)$, $I$ is the $N \times N$ identity matrix, $\rho$ is a parameter of spatial dependence, and $W$ is a connectivity matrix that determines the set of spatial relationships among points. The process, therefore, is based on the auto-normal model. Alternative models for the data generation process were considered (e.g., the autologistic) but the auto-normal provides the best alternative for controlling the frequency of each categorical value in the simulations. In the second step of the data generation process, the continuous spatially autocorrelated variable $Y$ is used to define a discrete spatial process as follows. Let $b_j$ be defined by:

$$p(Y_s \le b_j) = \frac{j}{k} \quad \text{with } j < k \tag{16}$$

Let $A = \{a_1, a_2, \ldots, a_k\}$ and define the discrete spatial process as:

$$X_s = \begin{cases} a_1 & \text{if } Y_s \le b_1 \\ a_j & \text{if } b_{j-1} < Y_s \le b_j \\ a_k & \text{if } Y_s > b_{k-1} \end{cases} \tag{17}$$

The last item that needs to be determined before data can be generated is a specific spatial arrangement of observations, so that matrix $W$ can be defined. In this regard, we note that Farber et al. (2009) report that square tessellations provide poor approximations to the topology of real geographical systems. For this reason, we prefer to use for our experiments hexagonal tessellations, which more closely resemble the topology of Voronoi tessellations and administrative zoning systems used in many empirical applications.
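The two-stage data generating process can be sketched with the standard library alone. Several choices below are assumptions made for brevity and are not taken from the paper: the functions `autonormal` and `discretize` are hypothetical names, the linear system $(I - \rho W) Y = \varepsilon$ is solved by the fixed-point iteration $Y \leftarrow \varepsilon + \rho W Y$ (which converges for $|\rho| < 1$ when $W$ is row-standardized) instead of a matrix inverse, a ring of locations stands in for the hexagonal tessellation, and the thresholds are taken as empirical quantiles of the simulated $Y$ so that the $k$ classes are equally frequent.

```python
import random

def autonormal(neighbours, rho, eps, iters=200):
    """Approximate Y = (I - rho W)^(-1) eps for a row-standardized W.

    neighbours[i] lists the first-order neighbours of location i, so that
    (W y)_i is the mean of y over neighbours[i].
    """
    y = list(eps)
    for _ in range(iters):
        y = [
            eps[i] + rho * sum(y[j] for j in neighbours[i]) / len(neighbours[i])
            for i in range(len(eps))
        ]
    return y

def discretize(y, k):
    """Assign classes 1..k by cutting y at its empirical j/k quantiles."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    x = [0] * n
    for rank, i in enumerate(order):
        x[i] = rank * k // n + 1
    return x

# Illustrative run on a ring of 120 locations (each has two neighbours).
n = 120
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
rng = random.Random(1)
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = autonormal(ring, 0.5, eps)
x = discretize(y, 3)
print(sorted(set(x)))  # [1, 2, 3]
```

Unequal class frequencies, which the experiments also consider, would require different cut points in `discretize`.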
The first set of experiments uses regular lattices of size N=100, 400, 900, 1,600, 2,500, and 3,600. In addition to these regular tessellations, we simulate irregular, but not random, spatial distributions of observations, with the same number of observations. The data are generated using Eq. (15), with the connectivity matrix defined in terms of first-order contiguity. Matrix $W$ is row-standardized for the calculations. Figure 3 shows examples of the different spatial distributions of N=900 observations generated for $\rho=0$ (no spatial structure), $\rho=0.5$ and $\rho=0.9$, and for k=2, 3, 4 possible outcomes. As can be seen there, when the value of parameter $\rho$ increases, more cells of the same colour cluster together. Examples of the irregular distribution of observations used in the second set of experiments are shown in Fig. 4.

The following parameter space is explored, from no- to high-autocorrelation, six sample sizes, three classes for number of outcomes, three m-surrounding sizes, and overlap:

- Autocorrelation parameter ρ = 0.0, 0.2, 0.5, and 0.9
- Sample size N = 100, 400, 900, 1,600, 2,500 and 3,600
- Number of outcomes k = 2, 3 and 4
- Size of m-surrounding: three (self + 2 neighbours), four (self + 3 neighbours), five (self + 4 neighbours)
- Degree of overlap r = 1, 2, ..., as appropriate (see below)

Data are simulated 1,000 times for each combination of parameters (number of replications), and the test is applied to each generated dataset at a level of significance of 0.05. The number of times that the probability value of the statistic fell below 0.05 was recorded, which, following the decision rule posed in Eq. (13), indicates rejection of the null hypothesis. We would expect the statistic to fail to reject the null hypothesis most of the time when the level of autocorrelation is zero (size of the statistic). At the same time, we would expect it to reject the null hypothesis more frequently as the level of autocorrelation goes up (power of the statistic).

Fig. 3. Examples of distribution of observations on a regular hexagonal lattice (N=900), for different numbers of outcomes (rows: k=2, 3, 4) and levels of ρ (columns: ρ = 0, ρ = 0.5, ρ = 0.9).
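The way size and power are estimated from replications can be sketched with a small harness. The names `rejection_rate`, `simulate` and `p_value` are hypothetical, and the stand-ins used in the demonstration are placeholders rather than the Q(m) test itself: they only show that a well-calibrated test under a true null should reject in roughly 5% of replications.

```python
import random

def rejection_rate(simulate, p_value, replications=1000, alpha=0.05, seed=0):
    """Share of replications in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(replications):
        sample = simulate(rng)          # one simulated dataset
        if p_value(sample) < alpha:     # decision rule: reject when p < alpha
            rejections += 1
    return rejections / replications

# Placeholder stand-ins (hypothetical): uniform p-values mimic a calibrated
# test under independence, so the estimate approximates the size of the test.
size = rejection_rate(simulate=lambda rng: rng.random(), p_value=lambda u: u)
```

With data simulated at ρ = 0 the returned share estimates the size of the test; with ρ > 0 the same share estimates its power.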

Fig. 4. Irregular distribution of observations N=900 and N=3,600

The results of the numerical experiments appear in Tables 2, 3, and 4, for the regular and irregular settings respectively. Please note that combinations of parameters are selected that satisfy the general rule that there must be at least $5k^m$ symbolized observations $R$. Since the number of symbolized locations depends on the degree of overlap, $r$ must be selected so that the number of symbolized locations $R$ is greater than $5k^m$.

The results of the experiments indicate that the size of the test (the rejection rate when the variables are independent) is typically higher for regular lattices, indicating a slightly greater risk of false positives, compared to irregular distributions of observations. Increasing the overlapping degree leads to more symbolized observations. This, as previously discussed, can result in non-independent Bernoulli variables, and therefore a higher size of the test, as seen in the tables. The increase in size is expected to carry over for higher levels of spatial dependence. The experiments are conducted for equal and unequal frequency of the values of the qualitative variable. The size of the test is not affected by changes in the proportionality of variable values in our experiments.

With regard to the power of the statistic, when the overlapping degree is high, the power will tend to be high as well. This is due to two effects: first, the starting size is higher, and secondly, the number of symbolized locations is greater. With regard to equal and unequal frequencies of the values of the variable, there is a slight loss in power when the values are not observed with identical frequencies. This loss is more marked for small sample sizes, which may make it difficult to identify even moderately strong spatially dependent processes in small sample situations. The loss in power becomes less relevant as the size of the sample increases, even for moderately large samples such as N=400. As usual, the power of the statistic tends to increase with increasing sample size. For a fixed sample size, the power is lower as the number of categories increases, but this is due to the fact that the number of symbols will increase as well, and so the ratio of symbolized locations to symbols will decrease.
It is interesting to remark that the results are noticeably different for the case where observations are irregularly distributed compared to regular tessellations, when other parameters are comparable. This would suggest that the topology of the system can, to some extent, influence the performance of the statistic. While an in-depth investigation of the effect of topology is beyond the scope of this paper, this is suggested as a topic for future research along the lines of the studies by Páez et al. (2008) and Farber et al. (2009).
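The selection rule for admissible combinations of N, k, m and r can be sketched as below. The expression for R is not stated in this section; the floor formula used here is an assumption that reproduces every (N, m, r, R) combination reported in Tables 2 to 4, and the function names are hypothetical.

```python
def symbolized_locations(n_obs, m, r):
    """Assumed count of symbolized locations: R = floor((N - m) / (m - r)) + 1."""
    if not 0 <= r < m:
        raise ValueError("overlap r must satisfy 0 <= r < m")
    return (n_obs - m) // (m - r) + 1

def is_admissible(n_obs, k, m, r):
    """Rule of thumb from the text: at least 5 * k**m symbolized locations."""
    return symbolized_locations(n_obs, m, r) >= 5 * k ** m

# Example: with N=100 and k=2, an m-surrounding of four is admissible with
# overlap r=3 (R=97 against a threshold of 80) but not with r=2 (R=49).
print(is_admissible(100, 2, 4, 3), is_admissible(100, 2, 4, 2))
```

A higher overlap raises R for a given N, which is why the larger m-surroundings in the tables appear only with r=2 or r=3.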

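Table 2. Size and power of the Q test for k=2. Equal frequencies (eq.): p1 = p2 = 1/2; unequal frequencies (uneq.): p1 = 1/4, p2 = 3/4.

Regular lattice

| N | R | m | r | ρ=0 (eq.) | ρ=0.2 (eq.) | ρ=0.5 (eq.) | ρ=0.9 (eq.) | ρ=0 (uneq.) | ρ=0.2 (uneq.) | ρ=0.5 (uneq.) | ρ=0.9 (uneq.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 49 | 3 | 1 | 0.051 | 0.032 | 0.088 | 0.736 | 0.034 | 0.041 | 0.069 | 0.640 |
| 100 | 97 | 4 | 3 | 0.081 | 0.118 | 0.242 | 0.936 | 0.077 | 0.088 | 0.205 | 0.876 |
| 400 | 199 | 3 | 1 | 0.027 | 0.036 | 0.329 | 1.000 | 0.032 | 0.040 | 0.204 | 1.000 |
| 400 | 199 | 4 | 2 | 0.045 | 0.052 | 0.370 | 1.000 | 0.052 | 0.065 | 0.268 | 1.000 |
| 400 | 397 | 4 | 3 | 0.083 | 0.131 | 0.609 | 1.000 | 0.096 | 0.112 | 0.512 | 1.000 |
| 900 | 449 | 3 | 1 | 0.031 | 0.062 | 0.676 | 1.000 | 0.024 | 0.048 | 0.517 | 1.000 |
| 900 | 449 | 4 | 2 | 0.038 | 0.070 | 0.792 | 1.000 | 0.028 | 0.051 | 0.621 | 1.000 |
| 900 | 897 | 4 | 3 | 0.076 | 0.144 | 0.926 | 1.000 | 0.082 | 0.138 | 0.823 | 1.000 |
| 900 | 299 | 5 | 2 | 0.062 | 0.074 | 0.610 | 1.000 | 0.062 | 0.105 | 0.498 | 1.000 |
| 1600 | 799 | 3 | 1 | 0.034 | 0.088 | 0.878 | 1.000 | 0.035 | 0.075 | 0.779 | 1.000 |
| 1600 | 799 | 4 | 2 | 0.041 | 0.118 | 0.974 | 1.000 | 0.046 | 0.102 | 0.902 | 1.000 |
| 1600 | 1597 | 4 | 3 | 0.083 | 0.259 | 0.997 | 1.000 | 0.082 | 0.203 | 0.977 | 1.000 |
| 1600 | 532 | 5 | 2 | 0.046 | 0.090 | 0.859 | 1.000 | 0.057 | 0.103 | 0.765 | 1.000 |

Irregular lattice

| N | R | m | r | ρ=0 (eq.) | ρ=0.2 (eq.) | ρ=0.5 (eq.) | ρ=0.9 (eq.) | ρ=0 (uneq.) | ρ=0.2 (uneq.) | ρ=0.5 (uneq.) | ρ=0.9 (uneq.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 49 | 3 | 1 | 0.029 | 0.037 | 0.086 | 0.811 | 0.027 | 0.042 | 0.090 | 0.704 |
| 100 | 97 | 4 | 3 | 0.052 | 0.068 | 0.242 | 0.952 | 0.053 | 0.054 | 0.170 | 0.897 |
| 400 | 199 | 3 | 1 | 0.024 | 0.055 | 0.465 | 1.000 | 0.025 | 0.059 | 0.354 | 1.000 |
| 400 | 199 | 4 | 2 | 0.036 | 0.061 | 0.539 | 1.000 | 0.036 | 0.062 | 0.423 | 1.000 |
| 400 | 397 | 4 | 3 | 0.046 | 0.127 | 0.766 | 1.000 | 0.053 | 0.119 | 0.675 | 1.000 |
| 900 | 449 | 3 | 1 | 0.025 | 0.096 | 0.876 | 1.000 | 0.038 | 0.078 | 0.764 | 1.000 |
| 900 | 449 | 4 | 2 | 0.037 | 0.125 | 0.924 | 1.000 | 0.048 | 0.113 | 0.817 | 1.000 |
| 900 | 897 | 4 | 3 | 0.053 | 0.215 | 0.984 | 1.000 | 0.052 | 0.207 | 0.952 | 1.000 |
| 900 | 299 | 5 | 2 | 0.053 | 0.102 | 0.800 | 1.000 | 0.056 | 0.113 | 0.685 | 1.000 |
| 1600 | 799 | 3 | 1 | 0.024 | 0.160 | 0.993 | 1.000 | 0.036 | 0.116 | 0.950 | 1.000 |
| 1600 | 799 | 4 | 2 | 0.038 | 0.199 | 0.998 | 1.000 | 0.046 | 0.144 | 0.977 | 1.000 |
| 1600 | 1597 | 4 | 3 | 0.039 | 0.383 | 1.000 | 1.000 | 0.052 | 0.308 | 1.000 | 1.000 |
| 1600 | 532 | 5 | 2 | 0.033 | 0.126 | 0.978 | 1.000 | 0.067 | 0.126 | 0.942 | 1.000 |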

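Table 3. Size and power of the Q test for k=3. Equal frequencies (eq.): p1 = p2 = p3 = 1/3; unequal frequencies (uneq.): p1 = 1/8, p2 = 3/8, p3 = 4/8.

Regular lattice

| N | R | m | r | ρ=0 (eq.) | ρ=0.2 (eq.) | ρ=0.5 (eq.) | ρ=0.9 (eq.) | ρ=0 (uneq.) | ρ=0.2 (uneq.) | ρ=0.5 (uneq.) | ρ=0.9 (uneq.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 400 | 199 | 3 | 1 | 0.030 | 0.048 | 0.290 | 1.000 | 0.037 | 0.039 | 0.296 | 1.000 |
| 900 | 449 | 3 | 1 | 0.033 | 0.037 | 0.611 | 1.000 | 0.034 | 0.055 | 0.555 | 1.000 |
| 900 | 449 | 4 | 2 | 0.064 | 0.077 | 0.690 | 1.000 | 0.058 | 0.098 | 0.726 | 1.000 |
| 900 | 897 | 4 | 3 | 0.067 | 0.148 | 0.891 | 1.000 | 0.101 | 0.162 | 0.890 | 1.000 |
| 1600 | 799 | 3 | 1 | 0.020 | 0.072 | 0.879 | 1.000 | 0.029 | 0.056 | 0.839 | 1.000 |
| 1600 | 799 | 4 | 2 | 0.025 | 0.104 | 0.958 | 1.000 | 0.066 | 0.130 | 0.946 | 1.000 |
| 1600 | 1597 | 4 | 3 | 0.078 | 0.209 | 0.995 | 1.000 | 0.127 | 0.225 | 0.991 | 1.000 |
| 2500 | 1249 | 3 | 1 | 0.027 | 0.090 | 0.992 | 1.000 | 0.031 | 0.091 | 0.977 | 1.000 |
| 2500 | 1249 | 4 | 2 | 0.041 | 0.134 | 0.996 | 1.000 | 0.047 | 0.153 | 0.992 | 1.000 |
| 2500 | 2497 | 4 | 3 | 0.094 | 0.265 | 0.999 | 1.000 | 0.107 | 0.302 | 1.000 | 1.000 |

Irregular lattice

| N | R | m | r | ρ=0 (eq.) | ρ=0.2 (eq.) | ρ=0.5 (eq.) | ρ=0.9 (eq.) | ρ=0 (uneq.) | ρ=0.2 (uneq.) | ρ=0.5 (uneq.) | ρ=0.9 (uneq.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 400 | 199 | 3 | 1 | 0.029 | 0.037 | 0.086 | 0.811 | 0.027 | 0.042 | 0.090 | 0.704 |
| 900 | 449 | 3 | 1 | 0.024 | 0.055 | 0.465 | 1.000 | 0.025 | 0.059 | 0.354 | 1.000 |
| 900 | 449 | 4 | 2 | 0.036 | 0.061 | 0.539 | 1.000 | 0.036 | 0.062 | 0.423 | 1.000 |
| 900 | 897 | 4 | 3 | 0.046 | 0.127 | 0.766 | 1.000 | 0.053 | 0.119 | 0.675 | 1.000 |
| 1600 | 799 | 3 | 1 | 0.025 | 0.096 | 0.876 | 1.000 | 0.038 | 0.078 | 0.764 | 1.000 |
| 1600 | 799 | 4 | 2 | 0.037 | 0.125 | 0.924 | 1.000 | 0.048 | 0.113 | 0.817 | 1.000 |
| 1600 | 1597 | 4 | 3 | 0.053 | 0.215 | 0.984 | 1.000 | 0.052 | 0.207 | 0.952 | 1.000 |
| 2500 | 1249 | 3 | 1 | 0.024 | 0.160 | 0.993 | 1.000 | 0.036 | 0.116 | 0.950 | 1.000 |
| 2500 | 1249 | 4 | 2 | 0.038 | 0.199 | 0.998 | 1.000 | 0.046 | 0.144 | 0.977 | 1.000 |
| 2500 | 2497 | 4 | 3 | 0.039 | 0.383 | 1.000 | 1.000 | 0.052 | 0.308 | 1.000 | 1.000 |

Table 4. Size and power of the Q test for k=4. Equal frequencies (eq.): p1 = p2 = p3 = p4 = 1/4; unequal frequencies (uneq.): p1 = 1/12, p2 = 2/12, p3 = 3/12, p4 = 6/12.

Regular lattice

| N | R | m | r | ρ=0 (eq.) | ρ=0.2 (eq.) | ρ=0.5 (eq.) | ρ=0.9 (eq.) | ρ=0 (uneq.) | ρ=0.2 (uneq.) | ρ=0.5 (uneq.) | ρ=0.9 (uneq.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 900 | 449 | 3 | 1 | 0.033 | 0.059 | 0.517 | 1.000 | 0.039 | 0.081 | 0.514 | 1.000 |
| 1600 | 799 | 3 | 1 | 0.026 | 0.043 | 0.788 | 1.000 | 0.040 | 0.069 | 0.725 | 1.000 |
| 2500 | 1249 | 3 | 1 | 0.026 | 0.076 | 0.971 | 1.000 | 0.026 | 0.086 | 0.927 | 1.000 |
| 3600 | 1799 | 3 | 1 | 0.031 | 0.099 | 0.997 | 1.000 | 0.037 | 0.099 | 0.995 | 1.000 |
| 3600 | 1799 | 4 | 2 | 0.070 | 0.185 | 1.000 | 1.000 | 0.097 | 0.271 | 0.998 | 1.000 |
| 3600 | 3597 | 4 | 3 | 0.077 | 0.280 | 1.000 | 1.000 | 0.147 | 0.400 | 0.999 | 1.000 |

Irregular lattice

| N | R | m | r | ρ=0 (eq.) | ρ=0.2 (eq.) | ρ=0.5 (eq.) | ρ=0.9 (eq.) | ρ=0 (uneq.) | ρ=0.2 (uneq.) | ρ=0.5 (uneq.) | ρ=0.9 (uneq.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 900 | 449 | 3 | 1 | 0.033 | 0.081 | 0.799 | 1.000 | 0.039 | 0.085 | 0.755 | 1.000 |
| 1600 | 799 | 3 | 1 | 0.031 | 0.111 | 0.991 | 1.000 | 0.052 | 0.103 | 0.965 | 1.000 |
| 1600 | 1597 | 4 | 3 | 0.098 | 0.308 | 1.000 | 1.000 | 0.098 | 0.366 | 1.000 | 1.000 |
| 2500 | 1249 | 3 | 1 | 0.036 | 0.167 | 1.000 | 1.000 | 0.036 | 0.149 | 0.999 | 1.000 |
| 2500 | 2497 | 4 | 3 | 0.086 | 0.395 | 1.000 | 1.000 | 0.112 | 0.490 | 1.000 | 1.000 |
| 3600 | 1799 | 3 | 1 | 0.017 | 0.251 | 1.000 | 1.000 | 0.038 | 0.240 | 1.000 | 1.000 |
| 3600 | 1799 | 4 | 2 | 0.062 | 0.302 | 1.000 | 1.000 | 0.098 | 0.399 | 1.000 | 1.000 |
| 3600 | 3597 | 4 | 3 | 0.059 | 0.521 | 1.000 | 1.000 | 0.122 | 0.576 | 1.000 | 1.000 |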

6.2 Boundary effects

In this section, we are interested in assessing the potential effect of system boundaries when data points are not observed beyond an arbitrarily defined boundary. The usual question when considering boundary effects is whether the behaviour of a statistical estimator changes if variable $X_i$ is influenced by $X_j$ and $j$ is a location outside of the study area (Haining 1990, p. 101; Upton and Fingleton 1985, p. 365). For us, the question is whether our ability to detect a spatially dependent process is influenced by unknown values of the variable at locations beyond the boundary of the study area. In other words, how frequently would the statistic agree to either reject or fail to reject the null hypothesis for two systems that are comparable (in size) if one is observed completely and the other is observed only partially?

Table 5. Size and power of Q(m) in the presence of boundary effects. N_O is the number of observed cases and N_C the number of cases in the complete system; the "complete" columns refer to a completely observed system with N_O = N_C.

k=2, p1 = p2 = 1/2

| N_O | N_C | R | m | r | ρ=0 (partial) | ρ=0.2 (partial) | ρ=0.5 (partial) | ρ=0.9 (partial) | N_O=N_C | ρ=0 (complete) | ρ=0.2 (complete) | ρ=0.5 (complete) | ρ=0.9 (complete) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 196 | 49 | 3 | 1 | 0.044 | 0.035 | 0.052 | 0.609 | 100 | 0.051 | 0.032 | 0.088 | 0.736 |
| 100 | 196 | 97 | 4 | 3 | 0.091 | 0.111 | 0.178 | 0.876 | 100 | 0.081 | 0.118 | 0.242 | 0.936 |
| 400 | 576 | 199 | 3 | 1 | 0.024 | 0.047 | 0.248 | 1.000 | 400 | 0.027 | 0.036 | 0.329 | 1.000 |
| 400 | 576 | 397 | 4 | 3 | 0.073 | 0.110 | 0.571 | 1.000 | 400 | 0.083 | 0.131 | 0.609 | 1.000 |
| 900 | 1156 | 449 | 3 | 1 | 0.029 | 0.043 | 0.570 | 1.000 | 900 | 0.031 | 0.062 | 0.676 | 1.000 |
| 900 | 1156 | 897 | 4 | 3 | 0.065 | 0.124 | 0.874 | 1.000 | 900 | 0.076 | 0.144 | 0.926 | 1.000 |
| 1600 | 1936 | 799 | 3 | 1 | 0.027 | 0.084 | 0.841 | 1.000 | 1600 | 0.034 | 0.088 | 0.878 | 1.000 |
| 1600 | 1936 | 1597 | 4 | 3 | 0.083 | 0.236 | 0.989 | 1.000 | 1600 | 0.083 | 0.259 | 0.997 | 1.000 |

k=3, p1 = p2 = p3 = 1/3

| N_O | N_C | R | m | r | ρ=0 (partial) | ρ=0.2 (partial) | ρ=0.5 (partial) | ρ=0.9 (partial) | N_O=N_C | ρ=0 (complete) | ρ=0.2 (complete) | ρ=0.5 (complete) | ρ=0.9 (complete) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 400 | 576 | 199 | 3 | 1 | 0.025 | 0.038 | 0.223 | 0.999 | 400 | 0.030 | 0.048 | 0.290 | 1.000 |
| 400 | 576 | 397 | 4 | 3 | 0.125 | 0.156 | 0.519 | 1.000 | 400 | 0.141 | 0.156 | 0.586 | 1.000 |
| 900 | 1156 | 449 | 3 | 1 | 0.023 | 0.041 | 0.502 | 1.000 | 900 | 0.033 | 0.037 | 0.611 | 1.000 |
| 900 | 1156 | 897 | 4 | 3 | 0.081 | 0.139 | 0.838 | 1.000 | 900 | 0.067 | 0.148 | 0.891 | 1.000 |
| 1600 | 1936 | 799 | 3 | 1 | 0.024 | 0.050 | 0.828 | 1.000 | 1600 | 0.020 | 0.072 | 0.879 | 1.000 |
| 1600 | 1936 | 1597 | 4 | 3 | 0.077 | 0.192 | 0.984 | 1.000 | 1600 | 0.078 | 0.209 | 0.995 | 1.000 |
| 2500 | 2916 | 1249 | 3 | 1 | 0.019 | 0.093 | 0.969 | 1.000 | 2500 | 0.027 | 0.090 | 0.992 | 1.000 |
| 2500 | 2916 | 2497 | 4 | 3 | 0.097 | 0.267 | 0.999 | 1.000 | 2500 | 0.094 | 0.265 | 0.999 | 1.000 |

k=4, p1 = p2 = p3 = p4 = 1/4

| N_O | N_C | R | m | r | ρ=0 (partial) | ρ=0.2 (partial) | ρ=0.5 (partial) | ρ=0.9 (partial) | N_O=N_C | ρ=0 (complete) | ρ=0.2 (complete) | ρ=0.5 (complete) | ρ=0.9 (complete) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 900 | 1156 | 449 | 3 | 1 | 0.036 | 0.049 | 0.398 | 1.000 | 900 | 0.033 | 0.059 | 0.517 | 1.000 |
| 1600 | 1936 | 799 | 3 | 1 | 0.031 | 0.047 | 0.703 | 1.000 | 1600 | 0.026 | 0.043 | 0.788 | 1.000 |
| 2500 | 2916 | 1249 | 3 | 1 | 0.026 | 0.050 | 0.947 | 1.000 | 2500 | 0.026 | 0.076 | 0.971 | 1.000 |
| 3600 | 4096 | 1799 | 3 | 1 | 0.035 | 0.076 | 0.995 | 1.000 | 3600 | 0.031 | 0.099 | 0.997 | 1.000 |

To evaluate the effect of boundaries on the size and power of the test, we conduct a second simulation experiment. The data are generated using the same procedure described in the preceding section. In irregular lattices, m-surroundings are constructed based on proximity, only resorting to the direction criterion in the rare case when two neighbours are equidistant. Boundary effects could be more critical in regular tessellations due to the way that the m-surroundings are constructed involving the direction of the neighbours. For this reason, we repeat the experiment with regular hexagonal tessellations only. The results can be compared to the experiments in the preceding section (Tables 2, 3 and 4), conducted under the assumption that the system was completely observed. In the case of the new simulations, a complete system is simulated with $N_C$ cases. In order to simulate the boundaries, we remove all hexagons in the periphery, to obtain an observed system with a total of $N_O$ cases after omitting all observations in the boundaries. The simulation is done for 1,000 replications, and the frequency of rejection of the statistic is recorded for each combination of parameters to calculate the size and power of the test when applied to a partial system.

The results of the experiment are presented in Table 5 for various values of $N_O$, $k$, and other parameters. The columns give the frequency of rejection of the null hypothesis. It is to be expected that the frequency of rejection in the case of a partial system will not be affected when the level of autocorrelation is zero. In this case, the data points are independent, and what happens beyond the boundary stays there. The results of the simulation confirm this, since it can be seen that the size is comparable for every case studied. The results indicate that boundary effects influence the power of the statistic when autocorrelation is present. The effect is in general to reduce the power of the statistic, although the reduction is relatively small in most cases, and the loss in power tends to be smaller as the number of observations and the level of autocorrelation increase, since this is naturally associated with higher power in any case.

Typical recommendations for the treatment of boundary effects include collecting more data points whenever possible, or creating an artificial boundary or buffer and considering the observations within the buffer as a known boundary. The first recommendation is sensible but frequently unfeasible. The second recommendation is of dubious merit in the case of our statistic, because any gains in power are bound to be minor, as suggested by the simulations, and are likely to be offset by the loss of power associated with a reduced number of observations.
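The construction of a partially observed system can be sketched as follows. Two assumptions are made here that the text does not state explicitly: the lattice is indexed as a side × side grid of hexagons, and the removed periphery is two cells wide on every edge. Both are inferred from the pairs of $N_C$ and $N_O$ in Table 5 (for instance 196 = 14² and 100 = 10²). The function name is hypothetical.

```python
def observed_cells(side, border=2):
    """Cells of a side x side lattice that remain after dropping the periphery."""
    keep = range(border, side - border)
    return [(q, r) for q in keep for r in keep]

# Complete systems of 14x14, 24x24, 34x34 and 44x44 cells leave
# 100, 400, 900 and 1,600 observed cells respectively.
sizes = [len(observed_cells(side)) for side in (14, 24, 34, 44)]
```

In a simulation, the categorical variable would be generated on the complete lattice and the test applied only to the retained cells, so that dependence on the discarded neighbours is left unobserved.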
7 Illustration: Fast food establishments in Toronto

We now proceed to illustrate the use of the Q(m) test by means of an empirical example concerning the spatial association of fast food establishments in the city of Toronto, Canada, specifically those offering primarily [P]izza, [S]andwich, and [H]amburger products. Spatial statistics have recently been applied to the study of food environments (Austin et al. 2005), and our example illustrates other ways in which the food landscape can be examined from a spatial perspective. In particular, we explore the question of whether the type of establishment is independent from its neighbours, or whether establishments tend to attract or repel establishments of the same type.

7.1 Data

The analysis is based on business points, which record the location of different establishments in the city of Toronto, as well as their industrial codes and other characteristics, such as various categories of size, revenue, etc. The business directory is based on infoCanada data, which is compiled from over 200,000 sources, including telephone directories, annual reports, press releases, city and industrial directories, news items, and new business listings. The database is telephonically verified annually by infoCanada to ensure the accuracy of the information. This information is processed and packaged by Environics Analytics to produce a business profile database. The final database for analysis includes a custom Standard Industrial Classification code which allows for the identification of business groups. Location coordinates are coded by Environics Analytics to enable mapping applications of the businesses recorded in the database.

For the purpose of this illustration, a subset of observations is extracted from the file corresponding to the region surrounding the city of Toronto, to obtain a set of 877 businesses with Standard Industrial Code 5812 classification ("Eating Places") that can be identified as offering primarily one of 3 types of fast food, including [P]izza ($n_P=303$), [S]andwich ($n_S=299$), and two major [H]amburger chains in the city ($n_H=275$). The spatial distribution of fast food places is shown in Fig. 5. It is reasonable in this case to think of the observations as a completely observed system, because development beyond the study area is sparse with the exception of the western boundary. As suggested by the simulations in Section 6.2, even if there are boundary effects, there is little risk that a false positive will be obtained, and the loss of power is bound to be relatively small.

Fig. 5. Fast food establishments in Toronto and the Greater Toronto Area

7.2 Analysis and results

To support our analysis of spatial association of a qualitative variable (establishment type) we first analyze the point pattern of establishments. This is done by means of nearest neighbour analysis, an approach developed with the objective of measuring the degree of proximity between events and their nearest neighbours (Bailey and Gatrell 1995). The specific technique we use is the G function, a cumulative plot that shows the proportion of events that have a nearest neighbour at a distance of $d$ or less:

$$G(d) = \frac{\#\left(d_{\min}(s_i) \le d\right)}{N} \tag{18}$$

The analysis can be performed for kth order neighbours, that is, the proportion of events whose kth nearest neighbour is at a distance $d$ or less. A steep increase of the function indicates a tendency towards spatial clustering. The results of this analysis are shown in Fig. 6, where it can be seen that about 70% of events have a first order nearest neighbour within 500 m distance, and about 90% have first order neighbours within 1.2 km. About 70% of events have a second order neighbour within 1.1 km, and a third order neighbour within 1.5 km. This gives a stronger basis to the preliminary impression that there is a good deal of spatial clustering in the location pattern of fast food establishments.

Fig. 6. Event-to-event kth nearest neighbour distance analysis

We now turn to the question of whether there are patterns of association for the different types of establishments. Application of our statistic is straightforward. The number of possible event outcomes is k=3, and the number of observations is N=877. Given these values, $m \le \ln(N/5)/\ln(k) = 4.7033$, meaning that we can explore m-surroundings of size two (self and one nearest neighbour), three (self and two nearest neighbours) and four (self and three nearest neighbours). On the other hand, we are prevented by the sample size from exploring m-surroundings of size five or larger. Based on our previous application of the G function, there appears to be only a relatively minor difference between m=3 and m=4, in terms of the spatial distribution of the locations of establishments. Since a property of the statistic is that it detects spatial dependence of order less than or equal to m, it seems sensible to select m=4 for our analysis. If dependence is detected, it would carry for the cases of m=2 and m=3.
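Both computations in this passage can be sketched in a few lines. The brute-force G function below is an illustrative implementation of Eq. (18) for planar coordinates, and the bound on m follows the expression quoted in the text; the function names and the toy coordinates are hypothetical.

```python
import math

def g_function(points, distances, order=1):
    """Share of events whose order-th nearest neighbour lies within each distance."""
    nearest = []
    for i, (xi, yi) in enumerate(points):
        d = sorted(math.hypot(xi - xj, yi - yj)
                   for j, (xj, yj) in enumerate(points) if j != i)
        nearest.append(d[order - 1])
    return [sum(1 for v in nearest if v <= d) / len(points) for d in distances]

def max_surrounding_size(n_obs, k):
    """Upper bound on m implied by requiring N >= 5 * k**m."""
    return math.log(n_obs / 5) / math.log(k)

# Toy pattern of three events on a line (coordinates are made up).
toy = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(g_function(toy, [0.5, 1.0, 2.0]))
print(round(max_surrounding_size(877, 3), 4))
```

The pairwise loop is quadratic in the number of events, which is adequate for a pattern of a few hundred points such as the one analysed here.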
One additional decision to make concerns the degree of overlap. An overlap of r=1 does not satisfy the criterion that the number of symbolized locations be greater than $5k^m=405$. Overlaps of r=2 and r=3 result in R=437 and R=874 symbolized locations respectively. Since the simulation results indicate that higher values of R generally increase the power of the statistic, we select r=3 for our application. The summary of these parameters and results of the test are shown in Table 6.

Table 6. Summary of parameters and results

| Parameter | Value |
|---|---|
| Sample size (N) | 877 |
| Symbolized observations (R) | 874 |
| Number of classes (k) | 3 |
| Size of m-surrounding (m) | 4 |
| Degree of overlap (r) | 3 |
| Number of symbols (n) | 81 |
| Ratio R/n | 10.79 |
| $5k^m$ | 405.00 |
| Frequency of classes | 0.3136, 0.3455, 0.3409 |

Q test for spatial dependence in qualitative data

| Test | Value | DF | p-value |
|---|---|---|---|
| Q (equiprobab.) | 177.27 | 80 | <10^-5 |
| Q (non-equiprob.) | 166.38 | 80 | <10^-5 |

The value of the statistic for the approximate case of equal frequency of classes is 177.27 (see Eq. (11)), and for non-equiprobability it is 166.38 (see Eq. (10)). These values are tested using the $\chi^2$ distribution with $k^m - 1 = 80$ degrees of freedom. The cut-off value for rejection at the 0.05 level of significance is 101.8795, which the values of the test exceed, and therefore, according to our decision rule, leads to rejection of the hypothesis of independence. Alternatively, the probability values in both cases are smaller than $10^{-5}$.

The test rejects the hypothesis of independence. However, dependence could take different forms. An attractive feature of the Q(m) statistic is that it is based on the frequency of different symbols being observed, which allows a more in-depth exploration of the patterns of association. Recall that the probability of each symbol appearing under the hypothesis of independence is $1/k^m$, so that in this case, since there are 874 symbolized locations, each symbol would appear approximately eleven times. It is possible to plot a histogram with the actual frequency of the 81 symbols (see Fig. 7). The expected frequency under the null is indicated by the dotted line in the figure, and it is possible to see which symbols deviate from this expectation, and in which direction (more frequent, less frequent). The symbols carry a fair amount of information, since each symbol represents a particular combination of events, and also their order of proximity and possibly directionality from $s_0$.
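The decision step reported in Table 6 can be checked with a short sketch. The chi-square CDF below is a plain series implementation written for this example, not a library routine. The function `q_statistic` assumes the statistic takes the likelihood-ratio form $-2\ln\lambda$ implied by Eq. (A.7) in the Appendix, which may differ in presentation from Eqs. (10) and (11); both function names are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:      # stop once new terms no longer matter
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def q_statistic(counts, null_probs):
    """Assumed form: 2 * sum n_i * ln(n_i / (R * p_i)); empty cells contribute 0."""
    total = sum(counts)
    return 2.0 * sum(n * math.log(n / (total * p))
                     for n, p in zip(counts, null_probs) if n > 0)

df = 3 ** 4 - 1                          # k**m - 1 = 80 degrees of freedom
level = chi2_cdf(101.8795, df)           # should be close to 0.95
p_equal = 1.0 - chi2_cdf(177.27, df)     # p-value for the equiprobable case
p_unequal = 1.0 - chi2_cdf(166.38, df)   # p-value for the non-equiprobable case
```

Under these assumptions the reported cut-off corresponds to the 0.95 quantile with 80 degrees of freedom, and both p-values fall well below $10^{-5}$, in line with Table 6.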

Fig. 7. Frequency of fast food type co-location symbols in Toronto (m=4 and r=3)

In Fig. 8, we condense the information contained in the histogram, in order to display only the type of events in m-surroundings, but not other features of the pattern, such as order of proximity. This allows us to discern that four establishments of a kind (four pizza, four sandwich, or four hamburger places) are seldom found together. Much more common is the case where neighbouring groups of 4 establishments consist of a variety of establishments, with at most two of one class, and one each of the other two classes. This would tend to indicate, in addition to the evidence of economies of agglomeration evinced by the spatial clustering, that within clusters there is a pattern of competition or repulsion between establishments of the same type.

Fig. 8. Co-location of events, condensed histogram. H, S, and P indicate the type of establishment, e.g., HHPP means two Hamburger and two Pizza in a group of four.
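The condensation used for Fig. 8 amounts to ignoring the order of the letters within each symbol. A minimal sketch is given below; the symbol counts are made-up values for illustration only, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product

def condense(symbol_counts):
    """Merge ordered symbols that contain the same multiset of event types."""
    condensed = Counter()
    for symbol, count in symbol_counts.items():
        condensed["".join(sorted(symbol))] += count
    return condensed

# Hypothetical frequencies for a handful of ordered symbols (not the Toronto data).
example = {"HPPH": 3, "PHHP": 2, "HHHH": 1, "SPHP": 4}
print(condense(example))

# With k=3 types and m=4, the 81 ordered symbols collapse into 15 combinations.
combinations = {"".join(sorted(s)) for s in product("HPS", repeat=4)}
print(len(combinations))
```

Because several ordered symbols map to the same combination, the expected frequency of a condensed class under independence is the sum of the expected frequencies of the symbols it absorbs, so the classes are not equally likely even in the equiprobable case.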

8 Conclusions

In this paper, we have proposed a new statistic Q(m) useful to test the hypothesis of independence among spatially distributed qualitative data. Qualitative data are receiving increased attention from a number of disciplinary perspectives. However, besides black and white or k-coloured join count statistics (e.g., Cliff and Ord 1981; Dacey 1968; Upton and Fingleton 1985), and the work of Boots (2003), there has been only limited development in terms of spatial analysis of qualitative data, compared to the development of methods and techniques useful to study continuous variables. Our statistic is therefore proposed as a complement to further enrich the diversity of the spatial analysis toolbox.

The Q(m) statistic is developed starting from concepts of symbolic dynamics. Symbolic dynamics provides an ideal set of tools to investigate discrete processes. Our statistic is therefore designed for the analysis of spatially discrete events, with qualitative/nominal outcomes. In this paper, we provide the inferential basis for conducting tests of hypothesis based on an affine transformation of the statistic, and a decision rule is proposed to reject or fail to reject the hypothesis of independence. We have also performed an extensive set of numerical experiments that demonstrate the size and power of the statistic to identify spatial association under a range of different conditions, and the potential effect of boundaries. In addition, an example illustrates the usefulness of the statistic to address substantive research questions.

In addition to its ability to identify patterns of spatial association, an attractive feature of our statistic is that it is based on the frequency of occurrence of various abstract symbols that can be linked to meaningful states of the system. The frequency of the symbols can be examined to obtain in-depth information about departures from the expected frequencies under the null hypothesis of independence. The ability to do this is akin, if not identical, to that provided by Moran's scatterplot, in that it gives specific patterns of association that can be contrasted with different ideas about the substantive process. In our example, we discussed a condensed histogram of the symbols, which provides a simplified perspective on the patterns of association.
However, it is not difficult to envision other questions of interest that could be explored using the full histogram, for example, concerning directional or proximity trends of other types of events (e.g., do sandwich establishments tend to be closer to, or further away from, pizza locations, relative to hamburger places?). In fact, the symbolization procedure can be modified in order to address specific research needs, for example to deal with questions of anisotropy or others. This is a matter for further research.

Two additional points are indicated as directions for additional research. First, the number of outcomes k typically depends on the nature of the process, and is therefore a parameter beyond the control of the analyst. The number of observations needed to conduct analysis can quickly explode depending on the size of the m-surrounding. For example, if k=4, and one desires to examine m-surroundings of size three, at least 625 points would be required. In contrast, an m-surrounding of five would require 3,125 observations. A topic for further investigation is whether different symbolization schemes can help to keep data needs under control. Secondly, using the histogram of frequency of symbols to test whether each symbol departs significantly from the expected frequency appears as a promising direction for further research. The histogram already provides a decomposition of the statistic, and the ability to test deviations for specific symbols would further enhance the capabilities of the statistic, in the manner of various other local statistics of spatial association (Anselin 1995; Getis and Ord 1993).

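9 Appendix: Proofs

Proof of Theorem 1

Under the null $H_0$, the joint probability density function of the $n$ variables $(Y_{\sigma_1}, Y_{\sigma_2}, \ldots, Y_{\sigma_n})$ is:

$$P(Y_{\sigma_1}=a_1, Y_{\sigma_2}=a_2, \ldots, Y_{\sigma_n}=a_n) = \frac{(a_1+a_2+\cdots+a_n)!}{a_1!\,a_2!\cdots a_n!}\; p_{\sigma_1}^{a_1} p_{\sigma_2}^{a_2} \cdots p_{\sigma_n}^{a_n} \tag{A.1}$$

where $a_1+a_2+\cdots+a_n=R$. Consequently, the joint distribution of the $n$ variables $(Y_{\sigma_1}, Y_{\sigma_2}, \ldots, Y_{\sigma_n})$ is a multinomial distribution.

The likelihood function of the distribution given by Eq. (A.1) is:

$$L(p_{\sigma_1}, p_{\sigma_2}, \ldots, p_{\sigma_n}) = \frac{R!}{n_{\sigma_1}!\,n_{\sigma_2}!\cdots n_{\sigma_n}!}\; p_{\sigma_1}^{n_{\sigma_1}} p_{\sigma_2}^{n_{\sigma_2}} \cdots p_{\sigma_n}^{n_{\sigma_n}} \tag{A.2}$$

and since $\sum_{i=1}^{n} p_{\sigma_i} = 1$, it follows that

$$L(p_{\sigma_1}, p_{\sigma_2}, \ldots, p_{\sigma_n}) = \frac{R!}{n_{\sigma_1}!\,n_{\sigma_2}!\cdots n_{\sigma_n}!}\; p_{\sigma_1}^{n_{\sigma_1}} p_{\sigma_2}^{n_{\sigma_2}} \cdots \left(1 - p_{\sigma_1} - p_{\sigma_2} - \cdots - p_{\sigma_{n-1}}\right)^{n_{\sigma_n}} \tag{A.3}$$

Then the logarithm of this likelihood function remains as

$$\ln L(p_{\sigma_1}, p_{\sigma_2}, \ldots, p_{\sigma_n}) = \ln\frac{R!}{n_{\sigma_1}!\,n_{\sigma_2}!\cdots n_{\sigma_n}!} + \sum_{i=1}^{n-1} n_{\sigma_i}\ln p_{\sigma_i} + n_{\sigma_n}\ln\left(1 - p_{\sigma_1} - p_{\sigma_2} - \cdots - p_{\sigma_{n-1}}\right) \tag{A.4}$$

In order to obtain the maximum likelihood estimators $\hat{p}_{\sigma_i}$ of $p_{\sigma_i}$ for all $i = 1, 2, \ldots, n$, we solve the following equation

$$\frac{\partial \ln L(p_{\sigma_1}, p_{\sigma_2}, \ldots, p_{\sigma_n})}{\partial p_{\sigma_i}} = 0 \tag{A.5}$$

to get that:

$$\hat{p}_{\sigma_i} = \frac{n_{\sigma_i}}{R} \tag{A.6}$$

Then the likelihood ratio statistic is (see for example Lehmann 1986):

$$\lambda(Y) = \frac{\dfrac{R!}{n_{\sigma_1}!\,n_{\sigma_2}!\cdots n_{\sigma_n}!}\,\prod_{i=1}^{n}\left(p_{\sigma_i}^{(0)}\right)^{n_{\sigma_i}}}{\dfrac{R!}{n_{\sigma_1}!\,n_{\sigma_2}!\cdots n_{\sigma_n}!}\,\prod_{i=1}^{n}\left(\dfrac{n_{\sigma_i}}{R}\right)^{n_{\sigma_i}}} = \prod_{i=1}^{n}\left(\frac{R\,p_{\sigma_i}^{(0)}}{n_{\sigma_i}}\right)^{n_{\sigma_i}} \tag{A.7}$$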