Relative density of the random r-factor proximity catch digraph for testing spatial patterns of segregation and association
Computational Statistics & Data Analysis

Elvan Ceyhan, Carey E. Priebe, John C. Wierman
Applied Mathematics, Statistics, Johns Hopkins University, Whitehead Hall, Baltimore 11868, USA
Received 18 January 4; accepted 1 March 5; available online 1 April 5
Corresponding author: C.E. Priebe (cep@jhu.edu). doi:1.116/j.csda.5..

Abstract

Statistical pattern classification methods based on data-random graphs were introduced recently. In this approach, a random directed graph is constructed from the data using the relative positions of the data points from various classes. Different random graphs result from different definitions of the proximity region associated with each data point, and different graph statistics can be employed for data reduction. The approach used in this article is based on a parameterized family of proximity maps determining an associated family of data-random digraphs. The relative arc density of the digraph is used as the summary statistic, providing an alternative to the domination number employed previously. An important advantage of the relative arc density is that, properly re-scaled, it is a U-statistic, facilitating analytic study of its asymptotic distribution using standard U-statistic central limit theory. The approach is illustrated with an application to the testing of spatial patterns of segregation and association. Knowledge of the asymptotic distribution allows evaluation of the Pitman and Hodges-Lehmann asymptotic efficacies, and selection of the proximity map parameter to optimize efficiency. Furthermore, the approach presented here also has the advantage of validity for data in any dimension.

5 Elsevier B.V. All rights reserved.

Keywords: Random proximity graphs; Delaunay triangulation; Relative density; Segregation; Association
1. Introduction

Classification and clustering have received considerable attention in the statistical literature. In recent years, a new classification approach has been developed which is based on the relative positions of the data points from various classes. Priebe et al. introduced the class cover catch digraphs (CCCDs) in $\mathbb{R}$ and gave the exact and the asymptotic distribution of the domination number of the CCCD (Priebe et al., 2001). DeVinney et al. (2002), Marchette and Priebe (2003), and Priebe et al. (2003a,b) applied the concept in higher dimensions and demonstrated relatively good performance of CCCDs in classification. The methods employed involve data reduction (condensing) by using approximate minimum dominating sets as prototype sets, since finding the exact minimum dominating set is an NP-hard problem, in particular for CCCDs. Furthermore, the exact and the asymptotic distribution of the domination number of the CCCD are not analytically tractable in multiple dimensions.

Ceyhan and Priebe introduced the central similarity proximity map and the r-factor proximity maps, together with the associated random digraphs, in Ceyhan and Priebe (2003, 2005), respectively. In both cases, the space is partitioned by the Delaunay tessellation, which is the Delaunay triangulation in $\mathbb{R}^2$. In each triangle, a family of data-random proximity catch digraphs is constructed based on the proximity of the points to each other. The advantages of the r-factor proximity catch digraphs are that an exact minimum dominating set can be found in polynomial time and that the asymptotic distribution of the domination number is analytically tractable. The latter is then used to test segregation and association of points of different classes in Ceyhan and Priebe (2005). Segregation and association are two patterns that describe the spatial relation between two or more classes. See Section 2.5 for more detail.
In this article, we employ a different statistic, namely the relative arc density, that is, the proportion of all possible arcs (directed edges) which are present in the data-random digraph. This test statistic has the advantage that, properly rescaled, it is a U-statistic. Two simple classes of alternative hypotheses, for segregation and association, are defined in Section 2.5. The asymptotic distributions under both the null and the alternative hypotheses are determined in Section 3 by using standard U-statistic central limit theory. Pitman and Hodges-Lehmann asymptotic efficacies are analyzed in Sections 4.3 and 4.4, respectively. This test is related to the available tests of segregation and association in the ecology literature, such as Pielou's test and Ripley's test. See the discussion in Section 6 for more detail. Our approach is valid for data in any dimension, but for simplicity of expression and visualization, it will be described for two-dimensional data.

2. Preliminaries

2.1. Proximity maps

Let $(\Omega, \mathcal{M})$ be a measurable space and consider a function $N : \Omega \times 2^{\Omega} \to 2^{\Omega}$, where $2^{\Omega}$ represents the power set of $\Omega$. Then given $Y \subseteq \Omega$, the proximity map $N_Y(\cdot) = N(\cdot, Y) : \Omega \to 2^{\Omega}$ associates with each point $x \in \Omega$ a proximity region $N_Y(x) \subseteq \Omega$. Typically,
$N$ is chosen to satisfy $x \in N_Y(x)$ for all $x \in \Omega$. The use of the adjective proximity comes from thinking of the region $N_Y(x)$ as representing a neighborhood of points close to $x$ (Toussaint, 1980; Jaromczyk and Toussaint, 1992).

Fig. 1. Construction of the r-factor proximity region $N_Y^r(x)$ (shaded region).

2.2. r-factor proximity maps

We now briefly define r-factor proximity maps (see Ceyhan and Priebe, 2005, for more details). Let $\Omega = \mathbb{R}^2$ and let $Y = \{y_1, y_2, y_3\} \subset \mathbb{R}^2$ be three non-collinear points. Denote by $T(Y)$ the triangle (including the interior) formed by the three points, i.e., $T(Y)$ is the convex hull of $Y$. For $r \in [1, \infty]$, define $N_Y^r$ to be the r-factor proximity map as follows; see also Fig. 1.

Using line segments from the center of mass (centroid) of $T(Y)$ to the midpoints of its edges, we partition $T(Y)$ into vertex regions $R(y_1)$, $R(y_2)$, and $R(y_3)$. For $x \in T(Y) \setminus Y$, let $v(x) \in Y$ be the vertex in whose region $x$ falls, so $x \in R(v(x))$. If $x$ falls on the boundary of two vertex regions, we assign $v(x)$ arbitrarily to one of the adjacent regions. Let $e(x)$ be the edge of $T(Y)$ opposite $v(x)$. Let $\ell(x)$ be the line parallel to $e(x)$ through $x$. Let $d(v(x), \ell(x))$ be the Euclidean (perpendicular) distance from $v(x)$ to $\ell(x)$. For $r \in [1, \infty)$, let $\ell_r(x)$ be the line parallel to $e(x)$ such that $d(v(x), \ell_r(x)) = r\, d(v(x), \ell(x))$ and $d(\ell(x), \ell_r(x)) < d(v(x), \ell_r(x))$. Let $T_r(x)$ be the triangle similar to and with the same orientation as $T(Y)$ having $v(x)$ as a vertex and $\ell_r(x)$ as the opposite edge. Then the r-factor proximity region $N_Y^r(x)$ is defined to be $T_r(x) \cap T(Y)$.

Notice that $r \ge 1$ implies $x \in N_Y^r(x)$. Note also that $\lim_{r \to \infty} N_Y^r(x) = T(Y)$ for all $x \in T(Y) \setminus Y$, so we define $N_Y^{\infty}(x) = T(Y)$ for all such $x$. For $x \in Y$, we define $N_Y^r(x) = \{x\}$ for all $r \in [1, \infty]$.
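The construction above can be sketched in a few lines using barycentric coordinates. This is a minimal illustration, not code from the paper: the function names `barycentric` and `in_r_factor_region` are hypothetical, and the sketch assumes finite $r$. It relies on two facts that follow from the definition: the vertex region containing $x$ belongs to the vertex with the largest barycentric coordinate, and the distance from a vertex $y_i$ to the line through a point parallel to the opposite edge is proportional to one minus the point's $i$th barycentric coordinate.

```python
def barycentric(p, tri):
    """Barycentric coordinates of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return (l1, l2, 1.0 - l1 - l2)


def in_r_factor_region(z, x, tri, r, tol=1e-12):
    """Return True if z lies in the r-factor proximity region of x (finite r).

    Hypothetical helper: x and z are (x, y) pairs, tri is a triple of vertices.
    """
    lam_x = barycentric(x, tri)
    lam_z = barycentric(z, tri)
    if min(lam_x) < -tol or min(lam_z) < -tol:
        return False  # the region is only defined inside T(Y)
    # v(x): the vertex whose region contains x (ties broken arbitrarily).
    i = max(range(3), key=lambda k: lam_x[k])
    # d(v, l(z)) <= r * d(v, l(x)), with both distances divided by the altitude.
    return 1.0 - lam_z[i] <= r * (1.0 - lam_x[i]) + tol
```

For example, in the standard equilateral triangle the point x = (0.3, 0.1) falls in the vertex region of (0, 0); the point z = (0.9, 0.05) is outside its region for r = 2 but inside for r = 3. When x is a vertex, the inequality reduces to z = x, matching the convention $N_Y^r(x) = \{x\}$.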
2.3. Data-random proximity catch digraphs

If $X_n := \{X_1, X_2, \ldots, X_n\}$ is a set of $\Omega$-valued random variables, then the $N_Y(X_i)$, $i = 1, \ldots, n$, are random sets. If the $X_i$ are independent and identically distributed, then so are the random sets $N_Y(X_i)$. In the case of an r-factor proximity map, notice that if $X_i \overset{iid}{\sim} F$ and $F$ has a non-degenerate two-dimensional probability density function $f$ with support$(f) \subseteq T(Y)$, then the special case in the construction of $N_Y^r$ ($X$ falls on the boundary of two vertex regions) occurs with probability zero.

The proximities of the data points to each other are used to construct a digraph. A digraph is a directed graph, i.e., a graph with directed edges from one vertex to another based on a binary relation. Define the data-random proximity catch digraph $D$ with vertex set $V = \{X_1, \ldots, X_n\}$ and arc set $A$ by $(X_i, X_j) \in A \iff X_j \in N_Y(X_i)$. Since this relationship is not symmetric, a digraph is needed rather than a graph. The random digraph $D$ depends on the joint distribution of the $X_i$ and on the map $N_Y$.

2.4. Relative density

The relative arc density of a digraph $D = (V, A)$ of order $|V| = n$, denoted $\rho(D)$, is defined as

$$\rho(D) = \frac{|A|}{n(n-1)},$$

where $|\cdot|$ denotes the set cardinality functional (Janson et al., 2000). Thus $\rho(D)$ represents the ratio of the number of arcs in the digraph $D$ to the number of arcs in the complete symmetric digraph of order $n$, which is $n(n-1)$. For brevity of notation we use relative density rather than relative arc density henceforth.

If $X_1, \ldots, X_n \overset{iid}{\sim} F$, the relative density of the associated data-random proximity catch digraph $D$, denoted $\rho(X_n; h, N_Y)$, is a U-statistic,

$$\rho(X_n; h, N_Y) = \frac{1}{n(n-1)} \sum_{i<j} h(X_i, X_j; N_Y), \tag{1}$$

where

$$h(X_i, X_j; N_Y) = I\{(X_i, X_j) \in A\} + I\{(X_j, X_i) \in A\} = I\{X_j \in N_Y(X_i)\} + I\{X_i \in N_Y(X_j)\}, \tag{2}$$

and $I(\cdot)$ is the indicator function. We denote $h(X_i, X_j; N_Y)$ as $h_{ij}$ for brevity of notation.
Although the digraph is asymmetric, $h_{ij}$ is defined as the number of arcs in $D$ between vertices $X_i$ and $X_j$, in order to produce a symmetric kernel with finite variance (Lehmann, 1988). The random variable $\rho_n := \rho(X_n; h, N_Y)$ depends on $n$ and $N_Y$ explicitly and on $F$ implicitly. The expectation $E[\rho_n]$, however, is independent of $n$ and depends on only $F$
and $N_Y$:

$$E[\rho_n] = \tfrac{1}{2} E[h_{12}] \quad \text{for all } n \ge 2. \tag{3}$$

The variance $\mathrm{Var}[\rho_n]$ simplifies to

$$\mathrm{Var}[\rho_n] = \frac{1}{2n(n-1)} \mathrm{Var}[h_{12}] + \frac{n-2}{n(n-1)} \mathrm{Cov}[h_{12}, h_{13}]. \tag{4}$$

A central limit theorem for U-statistics (Lehmann, 1988) yields

$$\sqrt{n}\,\bigl(\rho_n - E[\rho_n]\bigr) \xrightarrow{\mathcal{L}} N\bigl(0, \mathrm{Cov}[h_{12}, h_{13}]\bigr), \tag{5}$$

provided $\mathrm{Cov}[h_{12}, h_{13}] > 0$. The asymptotic variance of $\rho_n$, $\mathrm{Cov}[h_{12}, h_{13}]$, depends on only $F$ and $N_Y$. Thus, we need determine only $E[h_{12}]$ and $\mathrm{Cov}[h_{12}, h_{13}]$ in order to obtain the normal approximation

$$\rho_n \overset{approx}{\sim} N\bigl(E[\rho_n], \mathrm{Var}[\rho_n]\bigr) = N\left(\frac{E[h_{12}]}{2}, \frac{\mathrm{Cov}[h_{12}, h_{13}]}{n}\right) \quad \text{for large } n. \tag{6}$$

2.5. Null and alternative hypotheses

In a two-class setting, the phenomenon known as segregation occurs when members of one class have a tendency to repel members of the other class. For instance, it may be the case that one type of plant does not grow well in the vicinity of another type of plant, and vice versa. This implies, in our notation, that the $X_i$ are unlikely to be located near any elements of $Y$. Alternatively, association occurs when members of one class have a tendency to attract members of the other class, as in symbiotic species, so that the $X_i$ will tend to cluster around the elements of $Y$, for example. See, for instance, Dixon (1994) and Coomes et al. (1999).

The null hypothesis for spatial patterns has been a controversial topic in ecology from the early days. Gotelli and Graves (1996) have collected a voluminous literature to present a comprehensive analysis of the use and misuse of null models in the ecology community. They also define and attempt to clarify the null model concept as "a pattern-generating model that is based on randomization of ecological data or random sampling from a known or imagined distribution. ... The randomization is designed to produce a pattern that would be expected in the absence of a particular ecological mechanism."
In other words, the hypothesized null models can be viewed as thought experiments, which are conventionally used in the physical sciences, and these models provide a statistical baseline for the analysis of the patterns. For statistical testing for segregation and association, the null hypothesis we consider is a type of complete spatial randomness; that is,

$$H_0 : X_i \overset{iid}{\sim} U(T(Y)),$$

where $U(T(Y))$ is the uniform distribution on $T(Y)$. If it is desired to have the sample size be a random variable, we may consider a spatial Poisson point process on $T(Y)$ as our null hypothesis.
We define two classes of alternatives, $H^S_\varepsilon$ and $H^A_\varepsilon$ with $\varepsilon \in (0, \sqrt{3}/3)$, for segregation and association, respectively. For $y \in Y$, let $e(y)$ denote the edge of $T(Y)$ opposite vertex $y$, and for $x \in T(Y)$ let $\ell_y(x)$ denote the line parallel to $e(y)$ through $x$. Then define $T(y, \varepsilon) = \{x \in T(Y) : d(y, \ell_y(x)) \le \varepsilon\}$. Let $H^S_\varepsilon$ be the model under which $X_i \overset{iid}{\sim} U\bigl(T(Y) \setminus \cup_{y \in Y} T(y, \varepsilon)\bigr)$ and $H^A_\varepsilon$ be the model under which $X_i \overset{iid}{\sim} U\bigl(\cup_{y \in Y} T(y, \sqrt{3}/3 - \varepsilon)\bigr)$. Thus the segregation model excludes the possibility of any $X_i$ occurring near a $y_j$, and the association model requires that all $X_i$ occur near a $y_j$. The $\sqrt{3}/3 - \varepsilon$ in the definition of the association alternative is so that $\varepsilon = 0$ yields $H_0$ under both classes of alternatives.

Remark. These definitions of the alternatives are given for the standard equilateral triangle. The geometry invariance result of Theorem 1 from Section 3 still holds under the alternatives, in the following sense. If, in an arbitrary triangle, a small percentage $\delta \times 100\%$ of the area, where $\delta \in (0, 4/9)$, is carved away as forbidden from each vertex using line segments parallel to the opposite edge, then under the transformation to the standard equilateral triangle this will result in the alternative $H^S_{\sqrt{3\delta/4}}$. This argument is for segregation with $\delta < 1/4$; a similar construction is available for the other cases.

3. Asymptotic normality under the null and alternative hypotheses

First we present a geometry invariance result which allows us to assume $T(Y)$ is the standard equilateral triangle, $T((0,0), (1,0), (1/2, \sqrt{3}/2))$, thereby simplifying our subsequent analysis.

Theorem 1. Let $Y = \{y_1, y_2, y_3\} \subset \mathbb{R}^2$ be three non-collinear points. For $i = 1, \ldots, n$ let $X_i \overset{iid}{\sim} F = U(T(Y))$, the uniform distribution on the triangle $T(Y)$. Then for any $r \in [1, \infty]$ the distribution of $\rho(X_n; h, N_Y^r)$ is independent of $Y$, hence the geometry of $T(Y)$.

Proof.
A composition of translation, rotation, reflections, and scaling will transform any given triangle $T_o = T(y_1, y_2, y_3)$ into the basic triangle $T_b = T((0,0), (1,0), (c_1, c_2))$ with $0 < c_1 \le 1/2$, $c_2 > 0$, and $(1 - c_1)^2 + c_2^2 \le 1$, preserving uniformity. The transformation $\phi_e : \mathbb{R}^2 \to \mathbb{R}^2$ given by $\phi_e(u, v) = \left(u + \frac{1 - 2c_1}{2c_2}\, v, \ \frac{\sqrt{3}}{2c_2}\, v\right)$ takes $T_b$ to the equilateral triangle $T_e = T((0,0), (1,0), (1/2, \sqrt{3}/2))$: it fixes $(0,0)$ and $(1,0)$ and sends $(c_1, c_2)$ to $(1/2, \sqrt{3}/2)$. Investigation of the Jacobian shows that $\phi_e$ also preserves uniformity. Furthermore, the composition of $\phi_e$ with the rigid motion transformations maps the boundary of the original triangle $T_o$ to the boundary of the equilateral triangle $T_e$, the median lines of $T_o$ to the median lines of $T_e$, and lines parallel to the edges of $T_o$ to lines parallel to the edges of $T_e$. Since the joint distribution of any collection of the $h_{ij}$ involves only probability content of unions and intersections of regions bounded by precisely such lines, and the probability content of such regions is preserved since uniformity is preserved, the desired result follows.
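The linear map used in the proof is easy to check numerically. The sketch below is illustrative only: the name `to_equilateral` is hypothetical, and the shear coefficient $(1 - 2c_1)/(2c_2)$ is the one determined by requiring that $(0,0)$ and $(1,0)$ stay fixed while $(c_1, c_2)$ is sent to $(1/2, \sqrt{3}/2)$. Because the map is linear, its Jacobian determinant is the constant $\sqrt{3}/(2c_2)$, which is why a uniform density stays uniform.

```python
import math


def to_equilateral(u, v, c1, c2):
    """Map a point of T_b = T((0,0), (1,0), (c1, c2)) into T_e.

    Hypothetical helper illustrating the affine step of the proof.
    """
    shear = (1.0 - 2.0 * c1) / (2.0 * c2)
    scale = math.sqrt(3.0) / (2.0 * c2)
    return (u + shear * v, scale * v)


def jacobian_determinant(c2):
    """Determinant of the constant Jacobian [[1, shear], [0, scale]]."""
    return math.sqrt(3.0) / (2.0 * c2)


# Example with an assumed basic triangle (c1, c2) = (0.3, 0.6).
apex = to_equilateral(0.3, 0.6, 0.3, 0.6)
```

With these assumed values, `apex` is (0.5, sqrt(3)/2) up to rounding, and the centroid of $T_b$ is sent to the centroid of $T_e$, consistent with median lines being mapped to median lines.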
Based on Theorem 1 and our uniform null hypothesis, we may assume that $T(Y)$ is the standard equilateral triangle with $Y = \{(0,0), (1,0), (1/2, \sqrt{3}/2)\}$ henceforth. For our r-factor proximity map and uniform null hypothesis, the asymptotic null distribution of $\rho_n(r) = \rho(X_n; h, N_Y^r)$ can be derived as a function of $r$. Let $\mu(r) := E[\rho_n(r)]$ and $\nu(r) := \mathrm{Cov}[h_{12}, h_{13}]$. Notice that $\mu(r) = E[h_{12}]/2 = P(X_2 \in N_Y^r(X_1))$ is the probability of an arc occurring between any pair of vertices.

3.1. Asymptotic normality under the null hypothesis

By detailed geometric probability calculations, provided in Appendix A, the mean and the asymptotic variance of the relative density of the r-factor proximity catch digraph can explicitly be computed. The central limit theorem for U-statistics then establishes the asymptotic normality under the uniform null hypothesis. These results are summarized in the following theorem:

Theorem 2. For $r \in [1, \infty)$,

$$\frac{\sqrt{n}\,\bigl(\rho_n(r) - \mu(r)\bigr)}{\sqrt{\nu(r)}} \xrightarrow{\mathcal{L}} N(0, 1), \tag{7}$$

where

$$\mu(r) = \begin{cases} \frac{37}{216}\, r^2 & \text{for } r \in [1, 3/2), \\ -\frac{1}{8}\, r^2 + 4 - 8 r^{-1} + \frac{9}{2}\, r^{-2} & \text{for } r \in [3/2, 2), \\ 1 - \frac{3}{2}\, r^{-2} & \text{for } r \in [2, \infty), \end{cases} \tag{8}$$

and

$$\nu(r) = \nu_1(r)\, I(r \in [1, 4/3)) + \nu_2(r)\, I(r \in [4/3, 3/2)) + \nu_3(r)\, I(r \in [3/2, 2)) + \nu_4(r)\, I(r \in [2, \infty]) \tag{9}$$

with

ν 1 r = 7 r1 184 r r r r r r r 888 r r 4,
ν r = 5467 r1 78 r r r r r r 1555 r r 4,
ν r = 7 r 1 7 r r 1 5 r r r r r r + 1 r ]/ 7648 r r 6],

$$\nu_4(r) = \frac{15 r^4 - 11 r^2 - 48 r + 25}{15 r^6}.$$

For $r = \infty$, $\rho_n(r)$ is degenerate.

See Appendix A for proof.

Consider the form of the mean and variance functions, which are depicted in Fig. 2. Note that $\mu(r)$ is monotonically increasing in $r$, since the proximity region of any data point
increases with $r$. In addition, $\mu(r) \to 1$ as $r \to \infty$, since the digraph becomes complete asymptotically, which explains why $\rho_n(r)$ is degenerate, i.e., $\nu(r) = 0$, when $r = \infty$. Note also that $\mu(r)$ is continuous, with the value at $r = 1$ being $\mu(1) = 37/216$. Regarding the asymptotic variance, note that $\nu(r)$ is continuous in $r$ with $\lim_{r \to \infty} \nu(r) = 0$ and $\nu(1) = 34/58320 \approx .000583$, and observe that $\sup_{r \ge 1} \nu(r) \approx .1305$ at $\arg\sup_{r \ge 1} \nu(r) \approx 2.045$.

Fig. 2. Asymptotic null mean $\mu(r)$ (left) and variance $\nu(r)$ (right), from Eqs. (8) and (9) in Theorem 2, respectively. The vertical lines indicate the endpoints of the intervals in the piecewise definition of the functions. Notice that the vertical axes are differently scaled.

To illustrate the limiting distribution, $r = 2$ yields

$$\frac{\sqrt{n}\,\bigl(\rho_n(2) - \mu(2)\bigr)}{\sqrt{\nu(2)}} = \sqrt{\frac{192\, n}{25}} \left(\rho_n(2) - \frac{5}{8}\right) \xrightarrow{\mathcal{L}} N(0, 1),$$

or equivalently

$$\rho_n(2) \overset{approx}{\sim} N\left(\frac{5}{8}, \frac{25}{192\, n}\right).$$

Fig. 3 indicates that, for $r = 2$, the normal approximation is accurate even for small $n$ (although kurtosis may be indicated for $n = 10$). Fig. 4 demonstrates, however, that severe skewness obtains for small values of $n$ and extreme values of $r$. The finite sample variance in Eq. (4) and skewness may be derived analytically in much the same way as was $\mathrm{Cov}[h_{12}, h_{13}]$ for the asymptotic variance. In fact, the exact distribution of $\rho_n(r)$ is, in principle, available by successively conditioning on the values of the $X_i$. Alas, while the joint distribution of $h_{12}, h_{13}$ is available, the joint distribution of $\{h_{ij}\}_{1 \le i < j \le n}$, and hence the calculation for the exact distribution of $\rho_n(r)$, is extraordinarily tedious and lengthy for even small values of $n$.
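A small simulation makes the null mean concrete. The sketch below is an illustration under stated assumptions rather than the authors' code: all function names are hypothetical, points are represented directly by barycentric coordinates (which is legitimate under the uniform null by the geometry invariance of Theorem 1), and the arc rule is the barycentric form of the r-factor definition, with the vertex region given by the largest coordinate.

```python
import random


def sample_barycentric(rng):
    """Uniform point of a triangle, expressed in barycentric coordinates."""
    u, v = rng.random(), rng.random()
    if u + v > 1.0:  # reflect points of the unit square into the lower triangle
        u, v = 1.0 - u, 1.0 - v
    return (u, v, 1.0 - u - v)


def catches(x, z, r):
    """True if z lies in the r-factor proximity region of x (arc from x to z)."""
    i = max(range(3), key=lambda k: x[k])  # vertex region containing x
    return 1.0 - z[i] <= r * (1.0 - x[i])


def relative_density(points, r):
    """Number of arcs divided by n(n-1)."""
    n = len(points)
    arcs = sum(
        catches(points[i], points[j], r)
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return arcs / (n * (n - 1))


def simulate(n, r, replicates, seed=0):
    """Relative densities of independent uniform samples of size n."""
    rng = random.Random(seed)
    return [
        relative_density([sample_barycentric(rng) for _ in range(n)], r)
        for _ in range(replicates)
    ]
```

Averaging `simulate(20, 2.0, 300)` should give a value close to $\mu(2) = 5/8$, and averaging `simulate(20, 1.0, 300)` a value close to $\mu(1) = 37/216$, since $E[\rho_n]$ does not depend on $n$.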
Fig. 3. Depicted are the distributions of $\rho_n(2) \overset{approx}{\sim} N\left(\frac{5}{8}, \frac{25}{192\, n}\right)$ for $n = 10, 20, 100$ (left to right). Histograms are based on 1 Monte Carlo replicates. Solid curves represent the approximating normal densities given by Theorem 2. Again, note that the vertical axes are differently scaled.

Fig. 4. Depicted are the histograms for 1, Monte Carlo replicates of $\rho_{10}(1)$ (left) and $\rho_{10}(5)$ (right) indicating severe small sample skewness for extreme values of $r$.

Letting $H_n(r) = \sum_{i=1}^{n} h(X_i, X_{n+1})$, the exact distribution of $\rho_n(r)$ can be evaluated based on the recurrence

$$(n+1)\, n\, \rho_{n+1}(r) \overset{d}{=} n (n-1)\, \rho_n(r) + H_n(r)$$

by noting that the conditional random variable $H_n(r) \mid X_{n+1}$ is the sum of $n$ independent and identically distributed random variables. Alas, this calculation is also tedious for large $n$.

3.2. Asymptotic normality under the alternatives

Asymptotic normality of relative density of the proximity catch digraphs under the alternative hypotheses of segregation and association can be established by the same method as under the null hypothesis. Let $E^S_\varepsilon[\cdot]$ ($E^A_\varepsilon[\cdot]$) be the expectation with respect to the uniform distribution under the segregation (association) alternatives with $\varepsilon \in (0, \sqrt{3}/3)$.
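The two alternative models of Section 2.5 can be sampled by simple rejection from the uniform distribution. The sketch below is a hedged illustration with hypothetical function names; it assumes the standard equilateral triangle, whose altitude is $\sqrt{3}/2$, and uses the fact that the distance from vertex $y_i$ to the line through a point parallel to the opposite edge equals the altitude times one minus the point's $i$th barycentric coordinate.

```python
import math
import random

ALTITUDE = math.sqrt(3.0) / 2.0  # altitude of the standard equilateral triangle
EPS_MAX = math.sqrt(3.0) / 3.0   # upper end of the range of epsilon


def uniform_barycentric(rng):
    """Uniform point of the triangle in barycentric coordinates."""
    u, v = rng.random(), rng.random()
    if u + v > 1.0:
        u, v = 1.0 - u, 1.0 - v
    return (u, v, 1.0 - u - v)


def near_some_vertex(lam, eps):
    """True if the point lies in T(y, eps) for at least one vertex y."""
    return any(ALTITUDE * (1.0 - l) <= eps for l in lam)


def sample_segregation(n, eps, rng):
    """n points uniform on T(Y) minus the three corner regions T(y, eps)."""
    points = []
    while len(points) < n:
        lam = uniform_barycentric(rng)
        if not near_some_vertex(lam, eps):
            points.append(lam)
    return points


def sample_association(n, eps, rng):
    """n points uniform on the union of the regions T(y, sqrt(3)/3 - eps)."""
    points = []
    while len(points) < n:
        lam = uniform_barycentric(rng)
        if near_some_vertex(lam, EPS_MAX - eps):
            points.append(lam)
    return points
```

Rejection sampling is chosen here only for brevity; its acceptance rate drops as $\varepsilon$ approaches $\sqrt{3}/3$, where the support of either alternative shrinks.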
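Theorem 3. Let $\mu_S(r, \varepsilon)$ and $\mu_A(r, \varepsilon)$ be the mean and $\nu_S(r, \varepsilon)$ and $\nu_A(r, \varepsilon)$ be the covariance, $\mathrm{Cov}[h_{12}, h_{13}]$, for $r \in [1, \infty)$ and $\varepsilon \in (0, \sqrt{3}/3)$ under segregation and association. Then under $H^S_\varepsilon$,

$$\sqrt{n}\,\bigl(\rho_n(r) - \mu_S(r, \varepsilon)\bigr) \xrightarrow{\mathcal{L}} N\bigl(0, \nu_S(r, \varepsilon)\bigr)$$

for the values of the pair $(r, \varepsilon)$ for which $\nu_S(r, \varepsilon) > 0$. Likewise, under $H^A_\varepsilon$,

$$\sqrt{n}\,\bigl(\rho_n(r) - \mu_A(r, \varepsilon)\bigr) \xrightarrow{\mathcal{L}} N\bigl(0, \nu_A(r, \varepsilon)\bigr)$$

for the values of the pair $(r, \varepsilon)$ for which $\nu_A(r, \varepsilon) > 0$.

Proof Sketch. Under the alternatives, i.e., $\varepsilon > 0$, $\rho_n(r)$ is a U-statistic with the same symmetric kernel $h_{ij}$ as in the null case. The mean $\mu_S(r, \varepsilon) = E_\varepsilon[\rho_n(r)] = E_\varepsilon[h_{12}]/2$ (and $\mu_A(r, \varepsilon)$), now a function of both $r$ and $\varepsilon$, is again in $[0, 1]$. The asymptotic variance $\nu_S(r, \varepsilon) = \mathrm{Cov}_\varepsilon[h_{12}, h_{13}]$ (and $\nu_A(r, \varepsilon)$), also a function of both $r$ and $\varepsilon$, is bounded above by 1/4, as before. The explicit forms of $\mu_S(r, \varepsilon)$ and $\mu_A(r, \varepsilon)$ are given, defined piecewise, in Appendix B. Sample values of $\mu_S(r, \varepsilon)$, $\nu_S(r, \varepsilon)$ and $\mu_A(r, \varepsilon)$, $\nu_A(r, \varepsilon)$ are given in Appendix C for segregation with $\varepsilon = \sqrt{3}/4$ and for association with $\varepsilon = \sqrt{3}/12$. Thus asymptotic normality obtains provided $\nu_S(r, \varepsilon) > 0$ ($\nu_A(r, \varepsilon) > 0$); otherwise $\rho_n(r)$ is degenerate. Note that under $H^S_\varepsilon$,

$$\nu_S(r, \varepsilon) > 0 \quad \text{for } (r, \varepsilon) \in \bigl[1, \sqrt{3}/(2\varepsilon)\bigr) \times \bigl(0, \sqrt{3}/4\bigr] \ \cup \ \bigl[1, \sqrt{3}/\varepsilon - 2\bigr) \times \bigl(\sqrt{3}/4, \sqrt{3}/3\bigr),$$

and under $H^A_\varepsilon$,

$$\nu_A(r, \varepsilon) > 0 \quad \text{for } (r, \varepsilon) \in (1, \infty) \times \bigl(0, \sqrt{3}/3\bigr) \ \cup \ \{1\} \times \bigl(0, \sqrt{3}/12\bigr).$$

Notice that for the association class of alternatives any $r \in (1, \infty)$ yields asymptotic normality for all $\varepsilon \in (0, \sqrt{3}/3)$, while for the segregation class of alternatives only $r = 1$ yields this universal asymptotic normality.

4. The test and analysis

The relative density of the proximity catch digraph is a test statistic for the segregation/association alternative; rejecting for extreme values of $\rho_n(r)$ is appropriate since under segregation we expect $\rho_n(r)$ to be large, while under association we expect $\rho_n(r)$ to be small. Using the test statistic

$$R = \frac{\sqrt{n}\,\bigl(\rho_n(r) - \mu(r)\bigr)}{\sqrt{\nu(r)}}, \tag{10}$$

the asymptotic critical value for the one-sided level $\alpha$ test against segregation is given by

$$z_{1-\alpha} = \Phi^{-1}(1 - \alpha), \tag{11}$$

where $\Phi(\cdot)$ is the standard normal distribution function.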
Against segregation, the test rejects for $R > z_{1-\alpha}$ and against association, the test rejects for $R < z_\alpha$.
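As a worked illustration of the rejection rule, the sketch below evaluates the standardized statistic and both one-sided decisions for $r = 2$, where the null mean and asymptotic variance are $\mu(2) = 5/8$ and $\nu(2) = 25/192$. The function names and the returned dictionary are hypothetical conveniences; the normal quantiles come from Python's standard `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

MU_2 = 5.0 / 8.0      # null mean mu(r) at r = 2
NU_2 = 25.0 / 192.0   # asymptotic null variance nu(r) at r = 2


def standardized_density(rho, n, mu=MU_2, nu=NU_2):
    """Test statistic R = sqrt(n) * (rho - mu) / sqrt(nu)."""
    return math.sqrt(n) * (rho - mu) / math.sqrt(nu)


def one_sided_tests(rho, n, alpha=0.05, mu=MU_2, nu=NU_2):
    """Decisions of the level-alpha tests against segregation and association."""
    r_stat = standardized_density(rho, n, mu, nu)
    z_upper = NormalDist().inv_cdf(1.0 - alpha)  # z_{1-alpha}
    z_lower = NormalDist().inv_cdf(alpha)        # z_alpha
    return {
        "R": r_stat,
        "reject_for_segregation": r_stat > z_upper,  # large relative density
        "reject_for_association": r_stat < z_lower,  # small relative density
    }
```

For instance, an observed relative density of 0.75 with n = 100 gives R of roughly 3.46, so the test rejects in favor of segregation at level .05, while an observed value of 0.5 gives R of roughly -3.46 and rejects in favor of association. Other values of $r$ would need the corresponding $\mu(r)$ and $\nu(r)$ passed in explicitly.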
Fig. 5. Two Monte Carlo experiments against the segregation alternative $H^S_{\sqrt{3}/8}$. Depicted are kernel density estimates for $\rho_n(11/10)$ for $n = 10$ (left) and $n = 100$ (right) under the null (solid) and alternative (dashed).

4.1. Consistency

Theorem 4. The test against $H^S_\varepsilon$ which rejects for $R > z_{1-\alpha}$ and the test against $H^A_\varepsilon$ which rejects for $R < z_\alpha$ are consistent for $r \in [1, \infty)$ and $\varepsilon \in (0, \sqrt{3}/3)$.

Proof. Since the variance of the asymptotically normal test statistic, under both the null and the alternatives, converges to 0 as $n \to \infty$ (or is degenerate), it remains to show that the mean under the null, $\mu(r) = E[\rho_n(r)]$, is less than (greater than) the mean under the alternative, $\mu_S(r, \varepsilon) = E_\varepsilon[\rho_n(r)]$ ($\mu_A(r, \varepsilon)$), against segregation (association) for $\varepsilon > 0$. Whence it will follow that power converges to 1 as $n \to \infty$. Detailed analysis of $\mu_S(r, \varepsilon)$ and $\mu_A(r, \varepsilon)$ in Appendix B indicates that under segregation $\mu_S(r, \varepsilon) > \mu(r)$ for all $\varepsilon > 0$ and $r \in [1, \infty)$. Likewise, detailed analysis of $\mu_A(r, \varepsilon)$ in Appendix C indicates that under association $\mu_A(r, \varepsilon) < \mu(r)$ for all $\varepsilon > 0$ and $r \in [1, \infty)$. Hence the desired result follows for both alternatives.

In fact, the analysis of $\mu(r, \varepsilon)$ under the alternatives reveals more than what is required for consistency. Under segregation, the analysis indicates that $\mu_S(r, \varepsilon_1) < \mu_S(r, \varepsilon_2)$ for $\varepsilon_1 < \varepsilon_2$. Likewise, under association, the analysis indicates that $\mu_A(r, \varepsilon_1) > \mu_A(r, \varepsilon_2)$ for $\varepsilon_1 < \varepsilon_2$.

4.2. Monte Carlo power analysis

In Fig. 5, we present a Monte Carlo investigation against the segregation alternative $H^S_{\sqrt{3}/8}$ for $r = 11/10$ and $n = 10, 100$. With $n = 10$, the null and alternative probability density functions for $\rho_n(11/10)$ are very similar, implying small power (1, Monte Carlo replicates yield $\hat{\beta}^S_{mc} = .0787$, which is based on the empirical critical value). With $n = 100$, there is more separation between null and alternative probability density functions; for this
case, 1 Monte Carlo replicates yield $\hat{\beta}^S_{mc} = .77$. Notice also that the probability density functions are more skewed for $n = 10$, while approximate normality holds for $n = 100$.

For a given alternative and sample size, we may consider analyzing the power of the test (using the asymptotic critical value) as a function of the proximity factor $r$. In Fig. 6, we present a Monte Carlo investigation of power against $H^S_{\sqrt{3}/8}$ and $H^S_{\sqrt{3}/4}$ as a function of $r$ for $n = 10$. The empirical significance level is about .05 for r =, which have the empirical power $\hat{\beta}^S_{10}(r, \sqrt{3}/8) \approx$ .5, and $\hat{\beta}^S_{10}(r, \sqrt{3}/4) = 1$. So, for small sample sizes, moderate values of $r$ are more appropriate for normal approximation, as they yield the desired significance level, and the more severe the segregation, the higher the power estimate.

Fig. 6. Monte Carlo power using the asymptotic critical value against segregation alternatives $H^S_{\sqrt{3}/8}$ (left) and $H^S_{\sqrt{3}/4}$ (right) as a function of $r$, for $n = 10$. The circles represent the empirical significance levels while triangles represent the empirical power values. The r values plotted are 1, 11/1, 1/1, 4/,,,,, 5, 1.

In Fig. 7, we present a Monte Carlo investigation against the association alternative $H^A_{\sqrt{3}/12}$ for $r = 11/10$ and $n = 10$ and 100. The analysis is the same as in the analysis of Fig. 5. In Fig. 8, we present a Monte Carlo investigation of power against $H^A_{\sqrt{3}/12}$ and $H^A_{5\sqrt{3}/24}$ as a function of $r$ for $n = 10$. The empirical significance level is about .05 for $r = 3/2, 2, 3, 5$, which have the empirical power $\hat{\beta}^A_{10}(r, \sqrt{3}/12) \approx$ .5 with maximum power at r =, and $\hat{\beta}^A_{10}(r, 5\sqrt{3}/24) = 1$ at r =. So, for small sample sizes, moderate values of $r$ are more appropriate for normal approximation, as they yield the desired significance level, and the more severe the association, the higher the power estimate.

4.3. Pitman asymptotic efficiency

Pitman asymptotic efficiency (PAE) provides for an investigation of local asymptotic power (local around $H_0$).
This involves the limit as $n \to \infty$ as well as the limit as $\varepsilon \to 0$. A detailed discussion of PAE can be found in Kendall and Stuart (1979) and Eeden (1963).

Fig. 7. Two Monte Carlo experiments against the association alternative $H^A_{\sqrt{3}/12}$. Depicted are kernel density estimates for $\rho_n(11/10)$ for $n = 10$ (left) and 100 (right) under the null (solid) and alternative (dashed).

Fig. 8. Monte Carlo power using the asymptotic critical value against association alternatives $H^A_{\sqrt{3}/12}$ (left) and $H^A_{5\sqrt{3}/24}$ (right) as a function of $r$, for $n = 10$. The r values plotted are 1, 11/1, 1/1, 4/,,,,, 5, 1.

For segregation or association alternatives the PAE is given by

$$PAE(\rho_n(r)) = \frac{\bigl(\mu^{(k)}(r, \varepsilon = 0)\bigr)^2}{\nu(r)},$$

where $k$ is the minimum order of the derivative with respect to $\varepsilon$ for which $\mu^{(k)}(r, \varepsilon = 0) \ne 0$. That is, $\mu^{(k)}(r, \varepsilon = 0) \ne 0$ but $\mu^{(l)}(r, \varepsilon = 0) = 0$ for $l = 1, 2, \ldots, k - 1$. Then under segregation alternative $H^S_\varepsilon$ and association alternative $H^A_\varepsilon$, the PAE of $\rho_n(r)$ is
given by

$$PAE_S(r) = \frac{\bigl(\mu_S''(r, \varepsilon = 0)\bigr)^2}{\nu(r)} \quad \text{and} \quad PAE_A(r) = \frac{\bigl(\mu_A''(r, \varepsilon = 0)\bigr)^2}{\nu(r)},$$

respectively, since $\mu_S'(r, \varepsilon = 0) = \mu_A'(r, \varepsilon = 0) = 0$. Eq. (9) provides the denominator; the numerator requires $\mu(r, \varepsilon)$, which is provided in Appendix B under both segregation and association alternatives, where we only use the intervals of $r$ that do not vanish as $\varepsilon \to 0$.

Fig. 9. Pitman asymptotic efficiency against segregation (left) and association (right) as a function of $r$. Notice that vertical axes are differently scaled.

In Fig. 9, we present the PAE as a function of $r$ for both segregation and association. Notice that $PAE_S(r = 1) = 160/7 \approx 22.8571$, $\lim_{r \to \infty} PAE_S(r) = \infty$, PAE A r = 1 = 1744/ , lim r PAE A r =, argsup r 1, PAE A r 1.6 with sup r 1, PAE A r PAE A r has also a local supremum at r l with PAE A r l

Based on the asymptotic efficiency analysis, we suggest, for large $n$ and small $\varepsilon$, choosing $r$ large for testing against segregation and choosing $r$ small for testing against association.

4.4. Hodges-Lehmann asymptotic efficiency

Hodges-Lehmann asymptotic efficiency (HLAE) of $\rho_n(r)$ (see, e.g., Hodges and Lehmann, 1956) under $H^S_\varepsilon$ is given by

$$HLAE_S(r, \varepsilon) := \frac{\bigl(\mu_S(r, \varepsilon) - \mu(r)\bigr)^2}{\nu_S(r, \varepsilon)}.$$

HLAE for association is defined similarly. Unlike PAE, HLAE does not involve the limit as $\varepsilon \to 0$. Since this requires the mean and, especially, the asymptotic variance of $\rho_n(r)$ under an alternative, we investigate HLAE for specific values of $\varepsilon$. Fig. 10 contains a graph
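of HLAE against segregation as a function of $r$ for $\varepsilon = \sqrt{3}/8, \sqrt{3}/4, 2\sqrt{3}/7$. See Appendix C for explicit forms of $\mu_S(r, \varepsilon)$ and $\nu_S(r, \varepsilon)$ for $\varepsilon = \sqrt{3}/4$.

Fig. 10. Hodges-Lehmann asymptotic efficiency against segregation alternative $H^S_\varepsilon$ as a function of $r$ for $\varepsilon = \sqrt{3}/8, \sqrt{3}/4, 2\sqrt{3}/7$ (left to right).

Fig. 11. Hodges-Lehmann asymptotic efficiency against association alternative $H^A_\varepsilon$ as a function of $r$ for $\varepsilon = \sqrt{3}/21, \sqrt{3}/12, 5\sqrt{3}/24$ (left to right).

From Fig. 10, we see that, against $H^S_\varepsilon$, $HLAE_S(r, \varepsilon)$ appears to be an increasing function, dependent on $\varepsilon$, of $r$. Let $r_d(\varepsilon)$ be the minimum $r$ such that $\rho_n(r)$ becomes degenerate under the alternative $H^S_\varepsilon$. Then $r_d(\sqrt{3}/8) = 4$, $r_d(\sqrt{3}/4) = 2$, and $r_d(2\sqrt{3}/7) = 3/2$. In fact, for $\varepsilon \in (0, \sqrt{3}/4]$, $r_d(\varepsilon) = \sqrt{3}/(2\varepsilon)$ and for $\varepsilon \in (\sqrt{3}/4, \sqrt{3}/3)$, $r_d(\varepsilon) = \sqrt{3}/\varepsilon - 2$. Notice that $\lim_{r \to r_d(\varepsilon)} HLAE_S(r, \varepsilon) = \infty$, which is in agreement with $PAE_S$ as $\varepsilon \to 0$; since as $\varepsilon \to 0$, HLAE becomes PAE and $r_d(\varepsilon) \to \infty$, and under $H_0$, $\rho_n(r)$ is degenerate for $r = \infty$. So HLAE suggests choosing $r$ large against segregation, but in fact choosing $r$ too large will reduce power since $r \ge r_d(\varepsilon)$ guarantees the complete digraph under the alternative and, as $r$ increases therefrom, provides an ever greater probability of seeing the complete digraph under the null.

Fig. 11 contains a graph of HLAE against association as a function of $r$ for $\varepsilon = 5\sqrt{3}/24, \sqrt{3}/12, \sqrt{3}/21$. See Appendix C for explicit forms of $\mu_A(r, \varepsilon)$ and $\nu_A(r, \varepsilon)$ for $\varepsilon = \sqrt{3}/12$. Notice that since $\nu_A(r = 1, \varepsilon) = 0$ for $\varepsilon \ge \sqrt{3}/12$, $HLAE_A(r = 1, \varepsilon) = \infty$ for $\varepsilon \ge \sqrt{3}/12$ and $\lim_{r \to \infty} HLAE_A(r, \varepsilon) = 0$.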
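The degeneracy threshold $r_d(\varepsilon)$ quoted above is simple enough to evaluate directly. The helper below is a minimal sketch with a hypothetical name; it just encodes the two-branch formula and checks the range of $\varepsilon$.

```python
import math

SQRT3 = math.sqrt(3.0)


def degenerate_r(eps):
    """Smallest r at which the digraph is complete under segregation H^S_eps."""
    if not 0.0 < eps < SQRT3 / 3.0:
        raise ValueError("eps must lie in (0, sqrt(3)/3)")
    if eps <= SQRT3 / 4.0:
        return SQRT3 / (2.0 * eps)
    return SQRT3 / eps - 2.0
```

Evaluating it at $\varepsilon = \sqrt{3}/8$, $\sqrt{3}/4$, and $2\sqrt{3}/7$ reproduces the values 4, 2, and 3/2 stated in the text, and the two branches agree at $\varepsilon = \sqrt{3}/4$.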
In Fig. 11 we see that, against $H^A_\varepsilon$, $HLAE_A(r, \varepsilon)$ has a local supremum for $r$ sufficiently larger than 1. Let $\tilde{r}$ be the value at which this local supremum is attained. Then r 5 /1 /1 /4., r , and r 1.5. Note that, as $\varepsilon$ gets smaller, $\tilde{r}$ gets smaller. Furthermore, $HLAE_A(r = 1, \sqrt{3}/21) < \infty$ and as $\varepsilon \to 0$, $\tilde{r}$ becomes the global supremum, and PAE A r =1= and argsup r 1 PAE A r = So, when testing against association, HLAE suggests choosing moderate $r$, whereas PAE suggests choosing small $r$.

4.5. Asymptotic power function analysis

The asymptotic power function (see, e.g., Kendall and Stuart, 1979) can also be investigated as a function of $r$, $n$, and $\varepsilon$ using the asymptotic critical value and an appeal to normality. Under a specific segregation alternative $H^S_\varepsilon$, the asymptotic power function is given by

$$\Pi_S(r, n, \varepsilon) = 1 - \Phi\left(\frac{z_{1-\alpha} \sqrt{\nu(r)} + \sqrt{n}\,\bigl(\mu(r) - \mu_S(r, \varepsilon)\bigr)}{\sqrt{\nu_S(r, \varepsilon)}}\right),$$

where $z_{1-\alpha} = \Phi^{-1}(1 - \alpha)$. Under $H^A_\varepsilon$, we have

$$\Pi_A(r, n, \varepsilon) = \Phi\left(\frac{z_{\alpha} \sqrt{\nu(r)} + \sqrt{n}\,\bigl(\mu(r) - \mu_A(r, \varepsilon)\bigr)}{\sqrt{\nu_A(r, \varepsilon)}}\right).$$

Fig. 12. Asymptotic power function against segregation alternative $H^S_{\sqrt{3}/8}$ as a function of $r$ for $n = 10$ (first from left) and $n = 100$ (second) and association alternative $H^A_{\sqrt{3}/12}$ as a function of $r$ for $n = 10$ (third) and 100 (fourth).

Analysis of Fig. 12 shows that, against $H^S_{\sqrt{3}/8}$, a large choice of $r$ is warranted for $n = 100$ but, for smaller sample size, a more moderate $r$ is recommended. Against $H^A_{\sqrt{3}/12}$, a moderate choice of $r$ is recommended for both $n = 10$ and 100. This is in agreement with Monte Carlo investigations.
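The two asymptotic power functions can be evaluated once the null and alternative moments are known. The sketch below is illustrative: the function names are hypothetical, and the caller must supply $\mu(r)$, $\nu(r)$ and the alternative moments (for example from Theorem 2 and the appendices). The alternative values used in the usage lines are made-up placeholders chosen only to exercise the formulas, not values from the paper.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_segregation(n, mu0, nu0, mu_s, nu_s, alpha=0.05):
    """Asymptotic power of the test rejecting for R > z_{1-alpha}."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)
    arg = (z * math.sqrt(nu0) + math.sqrt(n) * (mu0 - mu_s)) / math.sqrt(nu_s)
    return 1.0 - _STD_NORMAL.cdf(arg)


def power_association(n, mu0, nu0, mu_a, nu_a, alpha=0.05):
    """Asymptotic power of the test rejecting for R < z_alpha."""
    z = _STD_NORMAL.inv_cdf(alpha)
    arg = (z * math.sqrt(nu0) + math.sqrt(n) * (mu0 - mu_a)) / math.sqrt(nu_a)
    return _STD_NORMAL.cdf(arg)


# Null moments at r = 2; the alternative moments below are hypothetical.
MU0, NU0 = 5.0 / 8.0, 25.0 / 192.0
example_seg = power_segregation(100, MU0, NU0, mu_s=0.70, nu_s=0.12)
example_assoc = power_association(100, MU0, NU0, mu_a=0.55, nu_a=0.14)
```

A quick sanity check on the formulas: when the alternative moments equal the null moments, both functions return $\alpha$, and the power grows with $n$ whenever the alternative mean lies on the rejection side of the null mean.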
Multiple triangle case

Suppose Y is a finite collection of points in $\mathbb{R}^2$ with $|Y| \ge 3$. Consider the Delaunay triangulation (assumed to exist) of Y, where $T_j$ denotes the jth Delaunay triangle, J denotes the number of triangles, and $C_H(Y)$ denotes the convex hull of Y. We wish to test $H_0: X_i \overset{iid}{\sim} U(C_H(Y))$ against segregation and association alternatives. The digraph D is constructed using $N^r_{Y_j}(\cdot)$ as described in Section ., where for $X_i \in T_j$ the three points in Y defining the Delaunay triangle $T_j$ are used as $Y_j$. Let $\rho_n(r, J)$ be the relative density of the digraph based on $X_n$ and Y which yields J Delaunay triangles, and let $w_j := A(T_j)/A(C_H(Y))$ for j = 1,...,J, where $A(C_H(Y)) = \sum_{j=1}^{J} A(T_j)$ with $A(\cdot)$ being the area functional. Then we obtain the following as a corollary to Theorem .

Corollary 1. The asymptotic null distribution for $\rho_n(r, J)$ conditional on $W = \{w_1, \ldots, w_J\}$ for $r \in [1, \infty]$ is given by $N(\mu(r, J), \nu(r, J)/n)$ provided that $\nu(r, J) > 0$, with

$$\mu(r, J) := \mu(r) \sum_{j=1}^{J} w_j^2 \quad\text{and}\quad \nu(r, J) := \nu(r) \sum_{j=1}^{J} w_j^3 + 4\mu(r)^2 \left[\sum_{j=1}^{J} w_j^3 - \Bigl(\sum_{j=1}^{J} w_j^2\Bigr)^2\right],$$

where μ(r) and ν(r) are given by Eqs. 8 and 9, respectively.

Proof. See Appendix D.

By an appropriate application of Jensen's inequality, we see that $\bigl(\sum_{j=1}^{J} w_j^2\bigr)^2 \le \sum_{j=1}^{J} w_j^3$. Therefore, $\nu(r, J) = 0$ iff $\nu(r) = 0$ and $\sum_{j=1}^{J} w_j^3 = \bigl(\sum_{j=1}^{J} w_j^2\bigr)^2$, so asymptotic normality may hold even when $\nu(r) = 0$.

Similarly, for the segregation (association) alternatives with $(4\varepsilon^2/3) \times 100\%$ of the triangles around the vertices of each triangle forbidden (allowed), we obtain the above asymptotic distribution of $\rho_n(r)$ with μ(r) being replaced by $\mu_S(r, \varepsilon)$, ν(r) by $\nu_S(r, \varepsilon)$, μ(r, J) by $\mu_S(r, J, \varepsilon)$, and ν(r, J) by $\nu_S(r, J, \varepsilon)$. Likewise for association.

Thus in the case of J > 1, we have a conditional test of $H_0: X_i \overset{iid}{\sim} U(C_H(Y))$ which once again rejects against segregation for large values of $\rho_n(r, J)$ and rejects against association for small values of $\rho_n(r, J)$.

Depicted in Fig. 1 are the segregation (with δ = 1/16, i.e. ε = /8), null, and association (with δ = 1/4, i.e. ε = /1) realizations (from left to right) with n = 1, |Y| = 1, and J = 1. For the null realization, the p-value is greater than .1 for all r values and both alternatives. For the segregation realization, we obtain p < .1 for 1 < r 5 and p > .4 for r = 1 and r 1. For the association realization, we obtain p < .15 for 1 < r , p = .14 for r = 1, and p > .5 for r 5. Note that this is only for one realization of $X_n$.
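The conditional moments in Corollary 1 depend on the triangulation only through the relative areas $w_j$. A minimal sketch of that computation follows, assuming the single-triangle values μ(r) and ν(r) are supplied by the caller; the helper names and the numbers used in the usage note are hypothetical.

```python
def relative_areas(areas):
    """Convert Delaunay triangle areas A(T_j) into weights w_j that sum to 1."""
    total = sum(areas)
    return [a / total for a in areas]


def conditional_null_moments(mu_r, nu_r, weights):
    """Return (mu(r, J), nu(r, J)) as in Corollary 1.

    mu_r, nu_r : single-triangle mean mu(r) and asymptotic variance nu(r)
    weights    : relative areas w_1, ..., w_J of the Delaunay triangles
    """
    s2 = sum(w ** 2 for w in weights)  # sum of w_j^2
    s3 = sum(w ** 3 for w in weights)  # sum of w_j^3
    mu_j = mu_r * s2
    nu_j = nu_r * s3 + 4.0 * mu_r ** 2 * (s3 - s2 ** 2)
    return mu_j, nu_j
```

With a single triangle (`weights = [1.0]`) the function returns the inputs unchanged. With J equal-area triangles the Jensen gap $\sum w_j^3 - (\sum w_j^2)^2$ vanishes, so ν(r, J) reduces to ν(r)/J².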
Fig. 1. Realizations of segregation (left), $H_0$ (middle), and association (right) for |Y| = 1, J = 1, and n = 1.

We implement the above described Monte Carlo experiment 1 times with n = 1, n = , and n = 5 and find the empirical significance levels $\alpha_S(n, J)$ and $\alpha_A(n, J)$ and the empirical powers $\beta^S_n(r, /8, J)$ and $\beta^A_n(r, /1, J)$. These empirical estimates are presented in Table 1 and plotted in Figs. 14 and 15. Notice that the empirical significance levels are all larger than .5 for both alternatives, so this test is liberal in rejecting $H_0$ against both alternatives for the given realization of Y and n values. The smallest empirical significance levels and highest empirical power estimates occur at moderate r values (r = /, , ) against segregation and at smaller r values (r = , /) against association. Based on this analysis, for the given realization of Y, we suggest the use of moderate r values for segregation and slightly smaller ones for association. Notice also that as n increases, the empirical power estimates get larger for both alternatives.

The conditional test presented here is appropriate when the W are fixed, not random. An unconditional version requires the joint distribution of the number and relative size of Delaunay triangles when Y is, for instance, a Poisson point pattern. Alas, this joint distribution is not available (Okabe et al., ).

Related test statistics in multiple triangle case

For J > 1, we have derived the asymptotic distribution of $\rho_n(r, J) = |A|/(n(n-1))$. Let $|A_j|$ be the number of arcs, $n_j := |X_n \cap T_j|$, and $\rho_{n_j}(r)$ be the arc density for triangle $T_j$ for j = 1,...,J. So

$$\sum_{j=1}^{J} \frac{n_j(n_j - 1)}{n(n-1)}\, \rho_{n_j}(r) = \rho_n(r, J), \quad\text{since}\quad \sum_{j=1}^{J} \frac{n_j(n_j - 1)}{n(n-1)}\, \rho_{n_j}(r) = \frac{\sum_{j=1}^{J} |A_j|}{n(n-1)} = \frac{|A|}{n(n-1)} = \rho_n(r, J).$$

Let $\hat{U}_n := \sum_{j=1}^{J} w_j^2\, \rho_{n_j}(r)$ where $w_j = A(T_j)/A(C_H(Y))$. Since the $\rho_{n_j}(r)$ are asymptotically independent, $\sqrt{n}\,(\hat{U}_n - \mu(r, J))$ and $\sqrt{n}\,(\rho_n(r, J) - \mu(r, J))$ both converge in distribution to $N(0, \nu(r, J))$.
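The identity relating the pooled arc density to the per-triangle arc densities can be checked on toy counts. The sketch below assumes that arcs occur only between points in the same Delaunay triangle; the function names and the counts in the usage note are hypothetical.

```python
def pooled_arc_density(arc_counts, point_counts):
    """rho_n(r, J) = |A| / (n (n - 1)), pooling arcs over all triangles."""
    n = sum(point_counts)
    return sum(arc_counts) / (n * (n - 1))


def per_triangle_densities(arc_counts, point_counts):
    """rho_{n_j}(r) = |A_j| / (n_j (n_j - 1)); a triangle with fewer than
    two points cannot contain an arc, so its density is taken as 0."""
    return [a / (m * (m - 1)) if m > 1 else 0.0
            for a, m in zip(arc_counts, point_counts)]


def recombined_arc_density(arc_counts, point_counts):
    """Weighted sum of per-triangle densities with weights n_j(n_j-1)/(n(n-1))."""
    n = sum(point_counts)
    rhos = per_triangle_densities(arc_counts, point_counts)
    return sum(m * (m - 1) / (n * (n - 1)) * rho
               for m, rho in zip(point_counts, rhos))
```

For example, with arc counts `[6, 2, 0]` and point counts `[4, 3, 1]`, both `pooled_arc_density` and `recombined_arc_density` evaluate to 8/56.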
In the denominator of $\rho_n(r, J)$, we use n(n − 1) as the maximum number of arcs possible. However, by definition, we can at most have a digraph with J complete symmetric components of order $n_j$, for j = 1,...,J. Then the maximum number possible is $n_t := \sum_{j=1}^{J} n_j(n_j - 1)$. Then the adjusted arc density is $\rho^{adj}_{n,J} := |A|/n_t$. Then

$$\rho^{adj}_{n,J}(r) = \frac{\sum_{j=1}^{J} |A_j|}{n_t} = \sum_{j=1}^{J} \frac{n_j(n_j - 1)}{n_t}\, \rho_{n_j}(r).$$

Since $n_j(n_j - 1)/n_t \ge 0$ for each j, and $\sum_{j=1}^{J} n_j(n_j - 1)/n_t = 1$, $\rho^{adj}_{n,J}(r)$ is a mixture of the $\rho_{n_j}(r)$'s. Then $\rho^{adj}_{n,J}(r)$ is asymptotically normal with mean $E[\rho^{adj}_{n,J}(r)] = \mu(r, J)$ and the variance of $\rho^{adj}_{n,J}(r)$ is

$$\frac{1}{n}\left[\nu(r)\, \frac{\sum_{j=1}^{J} w_j^3}{\bigl(\sum_{j=1}^{J} w_j^2\bigr)^2} + 4\mu(r)^2 \left(\frac{\sum_{j=1}^{J} w_j^3}{\bigl(\sum_{j=1}^{J} w_j^2\bigr)^2} - 1\right)\right].$$

Table 1. The empirical significance level and empirical power values under $H^S_{/8}$ and $H^A_{/1}$, N = 1, n = 1, and J = 1, at α = .5 for the realization of Y in Fig. 1. Columns: r = 1 11/1 6/5 4/ / 5 1. Rows, for each of n = 1, n = , and n = 5 (N = 1): $\alpha_S(n, J)$, $\beta^S_n(r, /8, J)$, $\alpha_A(n, J)$, $\beta^A_n(r, /1, J)$.

Fig. 14. Monte Carlo power using the asymptotic critical value against $H^S_{/8}$, as a function of r, for n = 1 (left), n = (middle), and n = 5 (right) conditional on the realization of Y in Fig. 1. The circles represent the empirical significance levels while triangles represent the empirical power values.

Fig. 15. Monte Carlo power using the asymptotic critical value against $H^A_{/1}$, as a function of r, for n = 1 (left), n = (middle), and n = 5 (right) conditional on the realization of Y in Fig. 1. The circles represent the empirical significance levels while triangles represent the empirical power values.

5.. Asymptotic efficiency analysis for J>1

The PAE, HLAE, and asymptotic power function analysis are given for J = 1 in Sections , respectively. For J > 1, the analysis will depend on both the number of triangles and the size of the triangles. So the optimal r values with respect to these efficiency criteria for J = 1 do not necessarily hold for J > 1, hence the analyses need to be updated, given the values of J and W.
Fig. 16. Pitman asymptotic efficiency against segregation (left) and association (right) as a function of r with J = 1. Notice that vertical axes are differently scaled.

Under segregation alternative $H^S_\varepsilon$, the PAE of $\rho_n(r, J)$ is given by

$$\mathrm{PAE}^S_J(r) = \frac{\bigl(\mu_S''(r, J, \varepsilon = 0)\bigr)^2}{\nu(r, J)} = \frac{\Bigl(\mu_S''(r, \varepsilon = 0) \sum_{j=1}^{J} w_j^2\Bigr)^2}{\nu(r) \sum_{j=1}^{J} w_j^3 + 4\mu_S(r, \varepsilon = 0)^2 \Bigl[\sum_{j=1}^{J} w_j^3 - \bigl(\sum_{j=1}^{J} w_j^2\bigr)^2\Bigr]}.$$

Under association alternative $H^A_\varepsilon$ the PAE of $\rho_n(r, J)$ is similar.

In Fig. 16, we present the PAE as a function of r for both segregation and association conditional on the realization of Y in Fig. 1. Notice that, unlike the J = 1 case, $\mathrm{PAE}^S_J(r)$ is bounded. Some values of note are PAE S J ρn 1 =.884, lim r PAE S J r=8 J Jj=1 Jj=1 j=1 wj /56 wj wj 19.4, argsup r 1,] PAE S J r As for association, PAEA J r = 1 = , lim r PAE A J r=, argsup r 1 PAEA J r=1.5 with PAEA J r = Based on the asymptotic efficiency analysis, we suggest, for large n and small ε, choosing moderate r for testing against segregation and association.

Under segregation, the HLAE of $\rho_n(r, J)$ is given by

$$\mathrm{HLAE}^S_J(r, \varepsilon) := \frac{\bigl(\mu_S(r, J, \varepsilon) - \mu(r, J)\bigr)^2}{\nu_S(r, J, \varepsilon)} = \frac{\Bigl(\mu_S(r, \varepsilon) \sum_{j=1}^{J} w_j^2 - \mu(r) \sum_{j=1}^{J} w_j^2\Bigr)^2}{\nu_S(r, \varepsilon) \sum_{j=1}^{J} w_j^3 + 4\mu_S(r, \varepsilon)^2 \Bigl[\sum_{j=1}^{J} w_j^3 - \bigl(\sum_{j=1}^{J} w_j^2\bigr)^2\Bigr]}.$$
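As a worked illustration of how the Hodges-Lehmann efficiency for J > 1 depends on the weights, the sketch below evaluates the ratio of the squared mean shift to the alternative variance, with both terms scaled by the sums of squared and cubed relative areas. It assumes the single-triangle quantities μ(r), $\mu_S(r, \varepsilon)$, and $\nu_S(r, \varepsilon)$ are supplied by the caller; the function name and all numerical inputs are hypothetical.

```python
def hlae_multi_triangle(mu0, mu_alt, nu_alt, weights):
    """Hodges-Lehmann asymptotic efficiency for J >= 1 triangles.

    mu0     : single-triangle null mean mu(r)
    mu_alt  : single-triangle mean under the alternative, e.g. mu_S(r, eps)
    nu_alt  : single-triangle asymptotic variance under the alternative
    weights : relative areas w_1, ..., w_J (summing to 1)
    """
    s2 = sum(w ** 2 for w in weights)  # sum of w_j^2
    s3 = sum(w ** 3 for w in weights)  # sum of w_j^3
    numerator = (mu_alt * s2 - mu0 * s2) ** 2
    denominator = nu_alt * s3 + 4.0 * mu_alt ** 2 * (s3 - s2 ** 2)
    return numerator / denominator
```

With `weights = [1.0]` this reduces to the J = 1 ratio $(\mu_S - \mu)^2/\nu_S$. With equal weights the Jensen gap is zero and the value coincides with the J = 1 ratio, while unequal weights add the nonnegative term $4\mu_S^2\,[\sum w_j^3 - (\sum w_j^2)^2]$ to the denominator.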
Fig. 17. Hodges-Lehmann asymptotic efficiency against segregation alternative $H^S_\varepsilon$ as a function of r for ε = /8, /4, /7 (left to right) and J = 1.

Notice that $\mathrm{HLAE}^S_J(r, \varepsilon = 0) = 0$ and $\lim_{r \to \infty} \mathrm{HLAE}^S_J(r, \varepsilon) = 0$, and HLAE is bounded provided that $\nu(r, J) > 0$. We calculate HLAE of $\rho_n(r, J)$ under $H^S_\varepsilon$ for ε = /8, ε = /4, and ε = /7. In Fig. 17 we present $\mathrm{HLAE}^S_J(r, \varepsilon)$ for these ε values conditional on the realization of Y in Fig. 1. Note that with ε = /8, HLAE S J r =1, /8.4 and argsup r 1, ] HLAE S J r, / with the supremum .544. With ε = /4, HLAE S J r=1, /4.45 and argsup r 1, ] HLAE S J r, / with the supremum With ε = /7, HLAE S J r = 1, /7.45 and argsup r 1, ] HLAE S J r, /7 1.88 with the supremum Furthermore, we observe that $\mathrm{HLAE}^S_J(r, /7) > \mathrm{HLAE}^S_J(r, /4) > \mathrm{HLAE}^S_J(r, /8)$. Based on the HLAE analysis for the given Y we suggest moderate r values for moderate segregation and small r values for severe segregation.

The explicit form of $\mathrm{HLAE}^A_J(r, \varepsilon)$ is similar to $\mathrm{HLAE}^S_J(r, \varepsilon)$, which implies $\mathrm{HLAE}^A_J(r, \varepsilon = 0) = 0$ and $\lim_{r \to \infty} \mathrm{HLAE}^A_J(r, \varepsilon) = 0$. We calculate HLAE of $\rho_n(r, J)$ under $H^A_\varepsilon$ for ε = /1, ε = /1, and ε = 5 /4. In Fig. 18 we present $\mathrm{HLAE}^A_J(r, \varepsilon)$ for these ε values conditional on the realization of Y in Fig. 1. Note that with ε = /1, HLAE A J r = 1, /1.9 and argsup r 1, ] HLAE A J r, / with the supremum .157. With ε = /1, HLAE A J r = 1, /1.168 and argsup r 1, ] HLAE A J r, / with the supremum With ε = 5 /4, HLAE A J r = 1, 5 /4.17 and argsup r 1, ] HLAE A J r, 5 /4.96 with the supremum Furthermore, we observe that $\mathrm{HLAE}^A_J(r, 5/4) > \mathrm{HLAE}^A_J(r, /1) > \mathrm{HLAE}^A_J(r, /1)$. Based on the HLAE analysis for the given Y we suggest moderate r values for moderate association and large r values for severe association.
Fig. 18. Hodges-Lehmann asymptotic efficiency against association alternative $H^A_\varepsilon$ as a function of r for ε = /1, /1, 5 /4 (left to right) and J = 1.

6. Discussion and conclusions

In this article we investigate the mathematical properties of a random digraph method for the analysis of spatial point patterns. The first proximity map in the literature similar to the r-factor proximity map $N^r_Y$ is the spherical proximity map $N_S(x) := B(x, r(x))$; see the references for CCCD in the Introduction. A slight variation of $N_S$ is the arc-slice proximity map $N_{AS}(x) := B(x, r(x)) \cap T(x)$, where T(x) is the Delaunay cell that contains x (see Ceyhan and Priebe, ). Furthermore, Ceyhan and Priebe introduced the central similarity proximity map $N_{CS}$ in Ceyhan and Priebe and $N^r_Y$ in Ceyhan and Priebe (2005). The r-factor proximity map, when compared to the others, has the advantages that the asymptotic distribution of the domination number $\gamma_n(N^r_Y)$ is tractable (see Ceyhan and Priebe, 2005) and that an exact minimum dominating set can be found in polynomial time. Moreover, $N^r_Y$ and $N_{CS}$ are geometry invariant for uniform data over triangles. Additionally, the mean and variance of relative density $\rho_n$ are not analytically tractable for $N_S$ and $N_{AS}$. While $N^r_Y(x)$, $N_{CS}(x)$, and $N_{AS}(x)$ are well defined only for $x \in C_H(Y)$, the convex hull of Y, $N_S(x)$ is well defined for all $x \in \mathbb{R}^d$. The proximity maps $N_S$ and $N_{AS}$ require no effort to extend to higher dimensions.

The map $N_S$ (the proximity map associated with CCCD) is used in classification in the literature, but not for testing spatial patterns between two or more classes. We develop a technique to test the patterns of segregation or association. There are many tests available for segregation and association in the ecology literature. See Dixon (1994) for a survey of these tests and relevant references. Two of the most commonly used tests are Pielou's $\chi^2$ test of independence and Ripley's test based on the K(t) and L(t) functions.
However, the test we introduce here is not comparable to either of them. Our test is a conditional test, conditional on a realization of J (the number of Delaunay triangles) and W (the set of relative areas of the Delaunay triangles), and we require that the number of triangles J be fixed and relatively small compared to $n = |X_n|$. Furthermore, our method deals with a slightly different type of data than most methods for examining spatial patterns. The sample size for one type of points (type X points) is much larger than that for the other (type Y points). This implies that in practice, Y could be stationary or have a much longer life span than members of X. For example, a special type of fungi might constitute the X points, while the tree species around which the fungi grow might be viewed as the Y points.

There are two major types of asymptotic structures for spatial data (Lahiri, ). In the first, any two observations are required to be at least a fixed distance apart; hence, as the number of observations increases, the region on which the process is observed eventually becomes unbounded. This type of sampling structure is called increasing domain asymptotics. In the second type, the region of interest is a fixed bounded region and more and more points are observed in this region. Hence the minimum distance between data points tends to zero as the sample size tends to infinity. This type of structure is called infill asymptotics, due to Cressie. The sampling structure for our asymptotic analysis is infill, as only the size of the type X process tends to infinity, while the support, the convex hull $C_H(Y)$ of a given set of points from the type Y process, is a fixed bounded region. Moreover, our statistic can be written as a U-statistic based on the locations of type X points with respect to type Y points. This is one advantage of the proposed method: most statistics for spatial patterns cannot be written as U-statistics. The U-statistic form gives us asymptotic normality, once the mean and variance are obtained by tedious, detailed geometric calculations.

The null hypothesis we consider is considerably more restrictive than current approaches, which can be used much more generally. The null hypothesis for testing segregation or association can be described in two slightly different forms (Dixon, 1994):

(i) complete spatial randomness, that is, each class is distributed randomly throughout the area of interest. It describes both the arrangement of the locations and the association between classes.

(ii) random labeling of locations, which is less restrictive than spatial randomness, in the sense that the arrangement of the locations can either be random or non-random.

Our conditional test is closer to the former in this regard. Pielou's test provides insight only on the interaction between classes, hence there is no assumption on the allocation of the observations, which makes it more appropriate for testing the null hypothesis of random labeling. Ripley's test can be used for both types of null hypotheses; in particular, it can be used to test a type of spatial randomness against another type of spatial randomness.

The test based on the mean domination number in Ceyhan and Priebe (2005) is not a conditional test, but requires both n and the number of Delaunay triangles J to be large. The comparison for a large but fixed J is possible. Furthermore, under segregation alternatives, the Pitman asymptotic efficiency is not applicable to the mean domination number case; however, for large n and J we suggest the use of it over arc density, since for each ε > 0 the Hodges-Lehmann asymptotic efficiency is unbounded for the mean domination number case, while it is bounded for the arc density case with J > 1. As for the association alternative, HLAE suggests moderate r values, which have finite Hodges-Lehmann asymptotic efficiency. So again, for large J and n the mean domination number is preferable. The basic advantage of $\rho_n(r)$ is that it does not require J to be large, so for small J it is preferable.

Although the statistical analysis and the mathematical properties related to the r-factor proximity catch digraph are done in $\mathbb{R}^2$, the extension to $\mathbb{R}^d$ with d > 2 is straightforward.
See Ceyhan and Priebe (2005) for more detail on the construction of the associated proximity region in higher dimensions. Moreover, the geometry invariance, asymptotic normality of the U-statistic, and consistency of the tests hold for d > 2.

Acknowledgements

This research was supported by the Defense Advanced Research Projects Agency as administered by the Air Force Office of Scientific Research under contract DOD F and by Office of Naval Research Grant N. The authors thank anonymous referees for valuable comments and suggestions.

Appendix A. Derivation of μ(r) and ν(r)

In the standard equilateral triangle, let $y_1 = (0, 0)$, $y_2 = (1, 0)$, $y_3 = (1/2, \sqrt{3}/2)$, $M_C$ be the center of mass, and $M_j$ be the midpoints of the edges $e_j$ for j = 1, 2, 3. Then $M_C = (1/2, \sqrt{3}/6)$, $M_1 = (3/4, \sqrt{3}/4)$, $M_2 = (1/4, \sqrt{3}/4)$, $M_3 = (1/2, 0)$.

Recall that

$$E[\rho_n(r)] = \frac{1}{n(n-1)} \sum\sum_{i<j} E[h_{ij}] = \frac{1}{2} E[h_{12}] = \mu(r) = P\bigl(X_j \in N^r_Y(X_i)\bigr).$$

Let $X_n$ be a random sample of size n from $U(T(Y))$. For $x_1 = (u, v)$, $\ell_r(x_1) = rv + r\sqrt{3}\,u - \sqrt{3}\,x$. Next, let $N_1 := \ell_r(x_1) \cap e_3$ and $N_2 := \ell_r(x_1) \cap e_2$. Then for $x_1 \in T_s := T(y_1, M_3, M_C)$, $N^r_Y(x_1) = T(y_1, N_1, N_2)$ provided that $\ell_r(x_1)$ is not outside of T(Y), where

$$N_1 = \left(\frac{r\,(y_1 + \sqrt{3}\,x_1)}{\sqrt{3}},\ 0\right) \quad\text{and}\quad N_2 = \left(\frac{r\,(\sqrt{3}\,y_1 + 3x_1)}{6},\ \frac{(y_1 + \sqrt{3}\,x_1)\,r}{2}\right).$$

Now we find μ(r) for $r \in [1, \infty)$. First, observe that, by symmetry,

$$\mu(r) = P\bigl(X_2 \in N^r_Y(X_1)\bigr) = 6\,P\bigl(X_2 \in N^r_Y(X_1),\ X_1 \in T_s\bigr).$$

Let $\ell_s(r, x)$ be the line such that $r\,d(y_1, \ell_s(r, x)) = d(y_1, e_1)$ and $\ell_s(r, x) \cap T(Y) \ne \emptyset$, so $\ell_s(r, x) = \sqrt{3}\,(1/r - x)$. Then if $x_1 \in T_s$ is above $\ell_s(r, x)$ then $N^r_Y(x_1) = T(Y)$; otherwise, $N^r_Y(x_1) = T_r(x_1) \subsetneq T(Y)$.

For $r \in [1, 3/2)$, $\ell_s(r, x) \cap T_s = \emptyset$, so $N^r_Y(x_1) = T_r(x_1) \subsetneq T(Y)$ for all $x_1 \in T_s$. Then

$$P\bigl(X_2 \in N^r_Y(X_1),\ X_1 \in T_s\bigr) = \int_0^{1/2} \int_0^{x/\sqrt{3}} \frac{A\bigl(N^r_Y(x_1)\bigr)}{A(T(Y))^2}\, dy\, dx = \frac{37}{1296}\, r^2,$$

where $A\bigl(N^r_Y(x_1)\bigr) = \frac{\sqrt{3}}{12}\, r^2\, (y + \sqrt{3}\,x)^2$ and $A(T(Y)) = \sqrt{3}/4$. Hence for $r \in [1, 3/2)$, $\mu(r) = \frac{37}{216}\, r^2$.
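The value of μ(r) on the first range of r can be checked by simulation. The following is a rough Monte Carlo sketch and not the authors' code. It relies on two assumptions made here for illustration: by geometry invariance, uniform points in the triangle can be drawn as uniform barycentric coordinates, and with $M_C$-vertex regions a point x lies in the region of the vertex with the largest barycentric coordinate $\lambda_i(x)$, so that z is in the proximity region of x exactly when $1 - \lambda_i(z) \le r\,(1 - \lambda_i(x))$. All function names are hypothetical.

```python
import random


def sample_barycentric(rng):
    """Uniform point in a triangle as barycentric coordinates
    (Dirichlet(1, 1, 1) via normalized exponential variates)."""
    e = [rng.expovariate(1.0) for _ in range(3)]
    total = sum(e)
    return [v / total for v in e]


def has_arc(x, z, r):
    """True if z lies in the r-factor proximity region of x (assumed form).

    i is the vertex whose region contains x; 1 - lambda_i is proportional
    to the distance from that vertex to the line through the point that is
    parallel to the opposite edge."""
    i = max(range(3), key=lambda k: x[k])
    return (1.0 - z[i]) <= r * (1.0 - x[i])


def estimate_mu(r, n_pairs=100_000, seed=0):
    """Monte Carlo estimate of mu(r) = P(X_2 in N(X_1)) from independent pairs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        x = sample_barycentric(rng)
        z = sample_barycentric(rng)
        hits += has_arc(x, z, r)
    return hits / n_pairs
```

Under these assumptions, `estimate_mu(1.0)` should land close to 37/216 ≈ 0.1713, and `estimate_mu(1.25)` close to (37/216)(1.25)² ≈ 0.2677, up to Monte Carlo error.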
Fig. 19. The cases for relative position of $\ell_s(r, x)$ with various r values.

For $r \in [3/2, 2)$, $\ell_s(r, x)$ crosses through $M_3 M_C$. Let the x coordinate of $\ell_s(r, x) \cap y_1 M_C$ be $s_1$, then $s_1 = 3/(4r)$. See Fig. 19 for the relative position of $\ell_s(r, x)$ and $T_s$. Then

$$P\bigl(X_2 \in N^r_Y(X_1),\ X_1 \in T_s\bigr) = \int_0^{s_1} \int_0^{x/\sqrt{3}} \frac{A\bigl(N^r_Y(x_1)\bigr)}{A(T(Y))^2}\, dy\, dx + \int_{s_1}^{1/2} \int_0^{\ell_s(r, x)} \frac{A\bigl(N^r_Y(x_1)\bigr)}{A(T(Y))^2}\, dy\, dx + \int_{s_1}^{1/2} \int_{\ell_s(r, x)}^{x/\sqrt{3}} \frac{1}{A(T(Y))}\, dy\, dx$$

$$= -\frac{-36 + r^4 + 64r - 32r^2}{48\, r^2}.$$

Hence for $r \in [3/2, 2)$, $\mu(r) = -\frac{1}{8}\, r^2 - 8\, r^{-1} + \frac{9}{2}\, r^{-2} + 4$.

For $r \in [2, \infty)$, $\ell_s(r, x)$ crosses through $y_1 M_3$. Let the x coordinate of $\ell_s(r, x) \cap y_1 M_3$ be $s_2$, then $s_2 = 1/r$. See Fig. 19.
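A quick numerical check of the two pieces of μ(r) is that they agree at the breakpoint r = 3/2. The sketch below assumes the closed forms $\mu(r) = \frac{37}{216} r^2$ on [1, 3/2) and $\mu(r) = -\frac{1}{8} r^2 + 4 - 8 r^{-1} + \frac{9}{2} r^{-2}$ on [3/2, 2); the function name is hypothetical, and the case r ≥ 2 is deliberately left out.

```python
def mu_closed_form(r):
    """Piecewise mean of the relative density for r in [1, 2) (assumed forms)."""
    if 1.0 <= r < 1.5:
        return 37.0 / 216.0 * r * r
    if 1.5 <= r < 2.0:
        return -r * r / 8.0 + 4.0 - 8.0 / r + 9.0 / (2.0 * r * r)
    raise ValueError("this sketch only covers 1 <= r < 2")


def six_times_probability(r):
    """6 * P(X_2 in N(X_1), X_1 in T_s) on [3/2, 2), from the integral result."""
    return 6.0 * -(-36.0 + r ** 4 + 64.0 * r - 32.0 * r * r) / (48.0 * r * r)
```

Both pieces give 37/96 at r = 3/2, and `six_times_probability` reproduces `mu_closed_form` throughout [3/2, 2), which is the symmetry relation μ(r) = 6 P(·) used in the derivation.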
More informationConcentration of Measures by Bounded Couplings
Concentration of Measures by Bounded Couplings Subhankar Ghosh, Larry Goldstein and Ümit Işlak University of Southern California [arxiv:0906.3886] [arxiv:1304.5001] May 2013 Concentration of Measure Distributional
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationCases Where Finding the Minimum Entropy Coloring of a Characteristic Graph is a Polynomial Time Problem
Cases Where Finding the Minimum Entropy Coloring of a Characteristic Graph is a Polynomial Time Problem Soheil Feizi, Muriel Médard RLE at MIT Emails: {sfeizi,medard}@mit.edu Abstract In this paper, we
More informationExact goodness-of-fit tests for censored data
Exact goodness-of-fit tests for censored data Aurea Grané Statistics Department. Universidad Carlos III de Madrid. Abstract The statistic introduced in Fortiana and Grané (23, Journal of the Royal Statistical
More informationAPPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.
APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product
More informationarxiv: v2 [math.ds] 9 Jun 2013
SHAPES OF POLYNOMIAL JULIA SETS KATHRYN A. LINDSEY arxiv:209.043v2 [math.ds] 9 Jun 203 Abstract. Any Jordan curve in the complex plane can be approximated arbitrarily well in the Hausdorff topology by
More informationWe simply compute: for v = x i e i, bilinearity of B implies that Q B (v) = B(v, v) is given by xi x j B(e i, e j ) =
Math 395. Quadratic spaces over R 1. Algebraic preliminaries Let V be a vector space over a field F. Recall that a quadratic form on V is a map Q : V F such that Q(cv) = c 2 Q(v) for all v V and c F, and
More informationRelationships between upper exhausters and the basic subdifferential in variational analysis
J. Math. Anal. Appl. 334 (2007) 261 272 www.elsevier.com/locate/jmaa Relationships between upper exhausters and the basic subdifferential in variational analysis Vera Roshchina City University of Hong
More informationA vector identity for the Dirichlet tessellation
Math. Proc. Camb. Phil. Soc. (1980), 87, 151 Printed in Great Britain A vector identity for the Dirichlet tessellation BY ROBIN SIBSON University of Bath (Received 1 March 1979, revised 5 June 1979) Summary.
More informationNormal Fans of Polyhedral Convex Sets
Set-Valued Analysis manuscript No. (will be inserted by the editor) Normal Fans of Polyhedral Convex Sets Structures and Connections Shu Lu Stephen M. Robinson Received: date / Accepted: date Dedicated
More informationGlossary Common Core Curriculum Maps Math/Grade 9 Grade 12
Glossary Common Core Curriculum Maps Math/Grade 9 Grade 12 Grade 9 Grade 12 AA similarity Angle-angle similarity. When twotriangles have corresponding angles that are congruent, the triangles are similar.
More informationAhlswede Khachatrian Theorems: Weighted, Infinite, and Hamming
Ahlswede Khachatrian Theorems: Weighted, Infinite, and Hamming Yuval Filmus April 4, 2017 Abstract The seminal complete intersection theorem of Ahlswede and Khachatrian gives the maximum cardinality of
More informationLearning Objectives for Stat 225
Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:
More informationRenyi divergence and asymptotic theory of minimal K-point random graphs Alfred Hero Dept. of EECS, The University of Michigan, Summer 1999
Renyi divergence and asymptotic theory of minimal K-point random graphs Alfred Hero Dept. of EECS, The University of Michigan, Summer 999 Outline Rényi Entropy and Rényi Divergence ffl Euclidean k-minimal
More informationMA 323 Geometric Modelling Course Notes: Day 11 Barycentric Coordinates and de Casteljau s algorithm
MA 323 Geometric Modelling Course Notes: Day 11 Barycentric Coordinates and de Casteljau s algorithm David L. Finn December 16th, 2004 Today, we introduce barycentric coordinates as an alternate to using
More informationA Bootstrap Test for Conditional Symmetry
ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School
More informationImmerse Metric Space Homework
Immerse Metric Space Homework (Exercises -2). In R n, define d(x, y) = x y +... + x n y n. Show that d is a metric that induces the usual topology. Sketch the basis elements when n = 2. Solution: Steps
More informationEconomics 204 Fall 2011 Problem Set 1 Suggested Solutions
Economics 204 Fall 2011 Problem Set 1 Suggested Solutions 1. Suppose k is a positive integer. Use induction to prove the following two statements. (a) For all n N 0, the inequality (k 2 + n)! k 2n holds.
More informationTopological properties
CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological
More informationDA Freedman Notes on the MLE Fall 2003
DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar
More informationGraphs with few total dominating sets
Graphs with few total dominating sets Marcin Krzywkowski marcin.krzywkowski@gmail.com Stephan Wagner swagner@sun.ac.za Abstract We give a lower bound for the number of total dominating sets of a graph
More informationAlgorithms for Picture Analysis. Lecture 07: Metrics. Axioms of a Metric
Axioms of a Metric Picture analysis always assumes that pictures are defined in coordinates, and we apply the Euclidean metric as the golden standard for distance (or derived, such as area) measurements.
More informationJensen s inequality for multivariate medians
Jensen s inequality for multivariate medians Milan Merkle University of Belgrade, Serbia emerkle@etf.rs Given a probability measure µ on Borel sigma-field of R d, and a function f : R d R, the main issue
More informationCS5314 Randomized Algorithms. Lecture 18: Probabilistic Method (De-randomization, Sample-and-Modify)
CS5314 Randomized Algorithms Lecture 18: Probabilistic Method (De-randomization, Sample-and-Modify) 1 Introduce two topics: De-randomize by conditional expectation provides a deterministic way to construct
More informationControlling and Stabilizing a Rigid Formation using a few agents
Controlling and Stabilizing a Rigid Formation using a few agents arxiv:1704.06356v1 [math.ds] 20 Apr 2017 Abstract Xudong Chen, M.-A. Belabbas, Tamer Başar We show in this paper that a small subset of
More information9th Bay Area Mathematical Olympiad
9th Bay rea Mathematical Olympiad February 27, 2007 Problems with Solutions 1 15-inch-long stick has four marks on it, dividing it into five segments of length 1,2,3,4, and 5 inches (although not neccessarily
More informationConnected spatial random networks
February 4, 2010 Proximity graphs Network distance Proximity graphs Network distance L =1.25 d =2.5 Punctured lattice L =1.32 d =3 L =1.50 d =3 L =1.61 d =3 L =2.00 d =4 Square lattice L =2.71 d =5 L =2.83
More informationPrague, II.2. Integrability (existence of the Riemann integral) sufficient conditions... 37
Mathematics II Prague, 1998 ontents Introduction.................................................................... 3 I. Functions of Several Real Variables (Stanislav Kračmar) II. I.1. Euclidean space
More informationTHE information capacity is one of the most important
256 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 1, JANUARY 1998 Capacity of Two-Layer Feedforward Neural Networks with Binary Weights Chuanyi Ji, Member, IEEE, Demetri Psaltis, Senior Member,
More informationRandom Walks on Hyperbolic Groups III
Random Walks on Hyperbolic Groups III Steve Lalley University of Chicago January 2014 Hyperbolic Groups Definition, Examples Geometric Boundary Ledrappier-Kaimanovich Formula Martin Boundary of FRRW on
More informationBootstrap tests of multiple inequality restrictions on variance ratios
Economics Letters 91 (2006) 343 348 www.elsevier.com/locate/econbase Bootstrap tests of multiple inequality restrictions on variance ratios Jeff Fleming a, Chris Kirby b, *, Barbara Ostdiek a a Jones Graduate
More informationChapter 3. Betweenness (ordering) A system satisfying the incidence and betweenness axioms is an ordered incidence plane (p. 118).
Chapter 3 Betweenness (ordering) Point B is between point A and point C is a fundamental, undefined concept. It is abbreviated A B C. A system satisfying the incidence and betweenness axioms is an ordered
More information215 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that
15 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that µ ν tv = (1/) x S µ(x) ν(x) = x S(µ(x) ν(x)) + where a + = max(a, 0). Show that
More informationFunctions, Graphs, Equations and Inequalities
CAEM DPP Learning Outcomes per Module Module Functions, Graphs, Equations and Inequalities Learning Outcomes 1. Functions, inverse functions and composite functions 1.1. concepts of function, domain and
More informationMATHEMATICS. Higher 2 (Syllabus 9740)
MATHEMATICS Higher (Syllabus 9740) CONTENTS Page AIMS ASSESSMENT OBJECTIVES (AO) USE OF GRAPHING CALCULATOR (GC) 3 LIST OF FORMULAE 3 INTEGRATION AND APPLICATION 3 SCHEME OF EXAMINATION PAPERS 3 CONTENT
More information