A Tractable, Parsimonious and Highly Flexible Model for Cylindrical Data, with Applications



Toshihiro Abe, Nanzan University
Christophe Ley, SBS-EM, ECARES, Université libre de Bruxelles
June 5
ECARES working paper 5-
ECARES ULB - CP 4/4 5, F.D. Roosevelt Ave., B-5 Brussels BELGIUM

Toshihiro Abe and Christophe Ley
Nanzan University, Nagoya, Japan
Université libre de Bruxelles, Brussels, Belgium
May 9, 5

Abstract

In this paper, we propose cylindrical distributions obtained by combining the sine-skewed von Mises distribution (circular part) with the Weibull distribution (linear part). This new model, the WeiSSVM, enjoys numerous advantages: a simple normalizing constant and hence a very tractable density, parameter parsimony and interpretability, a good circular-linear dependence structure, easy random number generation thanks to known marginal/conditional distributions, flexibility illustrated via excellent fitting abilities, and a straightforward extension to the case of directional-linear data. Inferential issues, such as independence testing, can easily be tackled with our model, which we apply to two real data sets. We conclude the paper by discussing future applications of our model.

Key words: Circular-linear data, directional-linear data, distributions on the cylinder, sine-skewed von Mises distribution, Weibull distribution

1 Introduction

Cylindrical data are observations that consist of a directional part (a set of angles), which is often of a circular nature (a single angle), and a linear part (mostly a positive real number). This explains the alternative terminology of directional-linear or circular-linear data. Such data occur frequently in the natural sciences; typical examples are wind direction and another climatological variable such as wind speed or air temperature, the direction an animal moves and the distance moved, or wave direction and wave height. Recent studies of cylindrical data include the exploration of wind direction and SO$_2$ concentration ([7]), the analysis of Japanese earthquakes ([3]), the link between wildfire orientation and burnt area ([6]), and space-time modeling of sea currents in the Adriatic Sea ([], [5]).
A non-trivial yet fundamental problem is the joint modeling of the directional/circular and linear variables via the construction of cylindrical probability distributions. The best known examples stem from the seminal papers Mardia and Sutton (1978) [9], conditioning from a trivariate normal distribution, and Johnson and Wehrly (1978) [], invoking maximum entropy principles. The latter also provide in their paper a general way, based on copulas, to construct circular-linear distributions with specified marginals. We refer to [3] for a thorough study of this construction and for references having put it to use. One such example is [4], who uses circular distributions based on nonnegative trigonometric sums. A more flexible generalization

of the Mardia-Sutton model is given in [4]. All these models shall be described in detail in the course of this paper.

What desirable properties should a good cylindrical distribution possess? It should be able to model diverse shapes, in other words present good fitting aptitudes, yet it should ideally remain of a tractable form (this is crucial for stochastic properties, estimation purposes and for describing the circular-linear relationship) and be parsimonious in terms of the parameters at play. The marginal and conditional distributions should optimally lead to popular and flexible directional and linear models, respectively (e.g., there is no reason for the circular component to be always symmetric), whilst the dependence structure has to take care of a reasonable joint behavior. Indeed, numerous examples of cylindrical data require that the circular concentration tends to increase with the linear component.

All these conditions are well fulfilled by the new model we propose in the present paper. For the sake of simplicity and for the sake of comparison with the large majority of existing models from the literature, we shall present it and investigate its properties in the circular-linear setting (and discuss later the directional-linear extension), where the probability density function is of the form

$$(\theta, x) \mapsto \frac{\alpha\beta^{\alpha}}{2\pi\cosh(\kappa)}\,\{1+\lambda\sin(\theta-\mu)\}\,x^{\alpha-1}\exp\left[-(\beta x)^{\alpha}\{1-\tanh(\kappa)\cos(\theta-\mu)\}\right], \qquad (1)$$

where $(x,\theta)\in[0,\infty)\times[0,2\pi)$, $\alpha>0$ is a (linear) shape parameter, $\beta>0$ a (linear) scale parameter, $0\le\mu<2\pi$ a (circular) location parameter, $\kappa\ge 0$ controls the (circular) concentration and $-1\le\lambda\le 1$ is a (circular) skewing parameter. We term the distribution (1) WeiSSVM: Wei of course stands for the linear Weibull distribution with density $x\mapsto\alpha\beta^{\alpha}x^{\alpha-1}\exp\{-(\beta x)^{\alpha}\}$ over $\mathbb{R}^{+}$, which is a very popular distribution to model diverse natural phenomena, especially wind speed, whereas SSVM is an abbreviation for the sine-skewed von Mises distribution.
This circular distribution, presented and studied in detail in [], is a tractable skew extension (simple multiplication by $\{1+\lambda\sin(\theta-\mu)\}$, without altering the normalizing constant) of the von Mises density $\theta\mapsto\exp\{\kappa\cos(\theta-\mu)\}/\{2\pi I_0(\kappa)\}$, where $I_0(\kappa)$ is the modified Bessel function of the first kind and order zero. This explains our motivation for (1): a versatile linear distribution, combined with a flexible yet tractable circular one. 2D contour plots of the density (1) are given in Figure 1. The dependence structure is chosen in such a way that the normalizing constant is of an extremely simple form. Independence is attained when $\kappa=0$, in which case the density (1) becomes the product of the linear Weibull and the circular cardioid distribution.

The numerous good properties of the WeiSSVM will be studied in detail in Section 2, and compared with the well-known models from the literature in Section 3. In that same section, we shall also present further new circular-linear densities having the same flavor as the WeiSSVM. Maximum likelihood estimation and the ensuing efficient likelihood ratio tests (including tests for circular-linear independence) are discussed in Section 4. We will study two distinct cylindrical data sets in Section 5, and show the excellent modeling capacities of the WeiSSVM compared with other models. In Section 6 we provide the straightforward extension of the WeiSSVM to the general directional-linear setting, and we conclude the paper with some final comments in Section 7.
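The claim that the normalizing constant of the WeiSSVM density is simply $\alpha\beta^{\alpha}/\{2\pi\cosh(\kappa)\}$ can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names, the parameter values and the truncation of the linear axis at `x_max` are all illustrative assumptions.

```python
import math


def weissvm_pdf(theta, x, alpha, beta, mu, kappa, lam):
    """WeiSSVM density at angle theta (radians) and length x > 0."""
    skew = 1.0 + lam * math.sin(theta - mu)
    rate = 1.0 - math.tanh(kappa) * math.cos(theta - mu)
    const = alpha * beta ** alpha / (2.0 * math.pi * math.cosh(kappa))
    return const * skew * x ** (alpha - 1.0) * math.exp(-((beta * x) ** alpha) * rate)


def integrate_pdf(alpha, beta, mu, kappa, lam, x_max=12.0, n_theta=120, n_x=600):
    """Midpoint-rule approximation of the integral over [0, 2*pi) x [0, x_max)."""
    d_theta = 2.0 * math.pi / n_theta
    d_x = x_max / n_x
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_x):
            x = (j + 0.5) * d_x
            total += weissvm_pdf(theta, x, alpha, beta, mu, kappa, lam)
    return total * d_theta * d_x


# Hypothetical parameter values, chosen only for illustration.
mass = integrate_pdf(alpha=2.0, beta=1.0, mu=0.5, kappa=1.0, lam=0.7)
print(f"approximate total mass: {mass:.6f}")
```

Under these assumptions the approximate mass should be close to one, and setting `kappa=0.0` in `weissvm_pdf` returns the product of a cardioid and a Weibull density, in line with the independence case described above.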

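[Figure 1: three contour-plot panels (a), (b) and (c); axes: Direction and Length]

Figure 1: Contour plots of the WeiSSVM density (1) over [, π) [, 5) for fixed (α, β, µ, κ) and three values of λ, shown in panels (a), (b) and (c).

2 Properties of the WeiSSVM

2.1 The normalizing constant

As can be seen from (1), the normalizing constant is very simple, which is rather rare for cylindrical, or more generally, directional models. For a better understanding of the intricacies of our construction, we now briefly establish its expression:

$$
\begin{aligned}
\int_0^{\infty}\!\!\int_0^{2\pi} &\{1+\lambda\sin(\theta-\mu)\}\,x^{\alpha-1}\exp\left[-(\beta x)^{\alpha}\{1-\tanh(\kappa)\cos(\theta-\mu)\}\right]d\theta\,dx \\
&= \int_0^{\infty}\!\!\int_0^{2\pi} x^{\alpha-1}\exp\left[-(\beta x)^{\alpha}\{1-\tanh(\kappa)\cos(\theta-\mu)\}\right]d\theta\,dx \\
&= \frac{1}{\alpha\beta^{\alpha}}\int_0^{2\pi}\frac{1}{1-\tanh(\kappa)\cos(\theta-\mu)}\,d\theta \\
&= \frac{1}{\alpha\beta^{\alpha}}\int_0^{2\pi}\frac{1+\tanh^{2}(\kappa/2)}{1+\tanh^{2}(\kappa/2)-2\tanh(\kappa/2)\cos(\theta-\mu)}\,d\theta \\
&= \frac{2\pi\cosh(\kappa)}{\alpha\beta^{\alpha}},
\end{aligned}
$$

where we have used the facts that $\{1-\tanh^{2}(\kappa/2)\}/[2\pi\{1+\tanh^{2}(\kappa/2)-2\tanh(\kappa/2)\cos(\theta-\mu)\}]$ is the density of the wrapped Cauchy distribution and $\{1+\tanh^{2}(\kappa/2)\}/\{1-\tanh^{2}(\kappa/2)\}=\cosh(\kappa)$.

2.2 Special cases and parameter interpretability

When the circular skewness parameter $\lambda=0$, we obtain the Weibull von Mises distribution, which is, to the best of the authors' knowledge, also new. If moreover $\alpha=1$, that is, the linear Weibull is turned into the exponential distribution, (1) becomes

$$\frac{\beta}{2\pi\cosh(\kappa)}\exp\left[-\beta x\{1-\tanh(\kappa)\cos(\theta-\mu)\}\right],$$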

which coincides with the first model in [] and could be termed the exponential von Mises distribution, abbreviated ExpVM. (In the original Johnson-Wehrly parameterization, the role of their concentration parameter is played by $\beta\tanh(\kappa)<\beta$, and $\beta/\cosh(\kappa)=[\beta^{2}-\{\beta\tanh(\kappa)\}^{2}]^{1/2}$, which is exactly their expression.) When the circular concentration parameter $\kappa=0$, we rewrite the WeiSSVM density under the product form

$$\frac{1}{2\pi}\{1+\lambda\sin(\theta-\mu)\}\,\alpha\beta^{\alpha}x^{\alpha-1}\exp\{-(\beta x)^{\alpha}\}, \qquad (2)$$

which corresponds to the product of a Weibull with a cardioid density. Noting that $\sin(\theta-\mu)=\cos(\theta-\mu-\pi/2)$, the location parameter of the cardioid distribution here is $\mu+\pi/2$. Quoting [] about their first two distributions, "A major limitation of the two previous densities is that if X and Θ are independent, then Θ is forced to be uniformly distributed on the circle." Thanks to the sine-skewed circular structure, the WeiSSVM does not suffer from this drawback.

The preceding special cases also illustrate well the interpretation each parameter enjoys: $\alpha$ is the linear shape parameter, $\beta$ is the linear scale parameter, $\mu$ is the circular location parameter and $\lambda$ is the circular skewness parameter. The most interesting parameter in some sense is $\kappa$, which acts both as circular concentration parameter and as the parameter regulating the circular-linear dependence structure. In the independent setting (2), it is to be noted that $\lambda$ takes on the role of concentration parameter of the cardioid density.

2.3 Marginal and conditional distributions

The marginal density of the circular component $\Theta$ from pdf (1) is given by

$$
\begin{aligned}
f(\theta) &= \frac{1+\lambda\sin(\theta-\mu)}{2\pi\cosh(\kappa)}\int_0^{\infty}\alpha\beta^{\alpha}x^{\alpha-1}\exp\left[-(\beta x)^{\alpha}\{1-\tanh(\kappa)\cos(\theta-\mu)\}\right]dx \\
&= \frac{1}{2\pi\cosh(\kappa)}\,\frac{1+\lambda\sin(\theta-\mu)}{1-\tanh(\kappa)\cos(\theta-\mu)} \\
&= \frac{1-\tanh^{2}(\kappa/2)}{2\pi}\,\frac{1+\lambda\sin(\theta-\mu)}{1+\tanh^{2}(\kappa/2)-2\tanh(\kappa/2)\cos(\theta-\mu)},
\end{aligned}
$$

which is the sine-skewed wrapped Cauchy distribution ([]), a flexible extension of the symmetric wrapped Cauchy distribution. The marginal density of the linear component $X$ from pdf (1) in turn corresponds to

$$
\begin{aligned}
f(x) &= \frac{\alpha\beta^{\alpha}x^{\alpha-1}}{2\pi\cosh(\kappa)}\int_0^{2\pi}\{1+\lambda\sin(\theta-\mu)\}\exp\left[-(\beta x)^{\alpha}\{1-\tanh(\kappa)\cos(\theta-\mu)\}\right]d\theta \\
&= \frac{\alpha\beta^{\alpha}x^{\alpha-1}\exp\{-(\beta x)^{\alpha}\}}{2\pi\cosh(\kappa)}\int_0^{2\pi}\exp\left\{(\beta x)^{\alpha}\tanh(\kappa)\cos(\theta-\mu)\right\}d\theta \\
&= \frac{I_0\!\left(x^{\alpha}\beta^{\alpha}\tanh(\kappa)\right)}{\cosh(\kappa)}\,\alpha\beta^{\alpha}x^{\alpha-1}\exp\{-(\beta x)^{\alpha}\}.
\end{aligned}
$$

This is an extended version of the marginal density obtained for the ExpVM in []; as already noticed, it simplifies to the Weibull when $\kappa=0$. The conditional densities from (1) are now readily given by

$$f(\theta\mid x)=\frac{1}{2\pi I_0\!\left(x^{\alpha}\beta^{\alpha}\tanh(\kappa)\right)}\{1+\lambda\sin(\theta-\mu)\}\exp\left\{(\beta x)^{\alpha}\tanh(\kappa)\cos(\theta-\mu)\right\} \qquad (3)$$

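and

$$f(x\mid\theta)=\alpha\left[\beta\{1-\tanh(\kappa)\cos(\theta-\mu)\}^{1/\alpha}\right]^{\alpha}x^{\alpha-1}\exp\left[-\left\{\beta\left(1-\tanh(\kappa)\cos(\theta-\mu)\right)^{1/\alpha}x\right\}^{\alpha}\right]. \qquad (4)$$

Both densities are quite common; (3) is the sine-skewed von Mises distribution with concentration $(\beta x)^{\alpha}\tanh(\kappa)$ (note how large values of $x$ tend to increase concentration, as is often desirable), whereas (4) is the Weibull with shape parameter $\alpha$ and scale parameter $\beta\{1-\tanh(\kappa)\cos(\theta-\mu)\}^{1/\alpha}$.

2.4 Random number generation

Thanks to the results of the previous section, we can describe a simple random number generation algorithm by decomposing $f(\theta,x)$ into $f(x\mid\theta)f(\theta)$, in other words, by first generating $\Theta\sim f(\theta)$ and then $X\mid\Theta=\theta\sim f(x\mid\theta)$. The algorithm goes as follows.

Step 1: Generate a random variable $\Theta_1$ following a (symmetric) wrapped Cauchy law with location $\mu$ and concentration $\tanh(\kappa/2)$, and generate independently $U\sim\mathrm{Unif}[0,1]$.

Step 2: Define $\Theta$ as $\Theta_1$ if $U<\{1+\lambda\sin(\Theta_1-\mu)\}/2$, and as the reflection $2\mu-\Theta_1$ of $\Theta_1$ about $\mu$ (which reduces to $-\Theta_1$ when $\mu=0$) if $U\ge\{1+\lambda\sin(\Theta_1-\mu)\}/2$; $\Theta$ then follows the sine-skewed wrapped Cauchy distribution.

Step 3: Generate $X$ from a Weibull with shape parameter $\alpha$ and scale parameter $\beta\{1-\tanh(\kappa)\cos(\Theta-\mu)\}^{1/\alpha}$.

Random number generation from sine-skewed distributions follows from general skew-symmetric theory on $\mathbb{R}^{k}$; see also [].

2.5 Moment expressions

As is known, the moments of the Weibull distribution and the trigonometric moments of the sine-skewed von Mises distribution are given explicitly. These nice properties are inherited by our model. For $n=0,1,\ldots$ and $m=0,1,\ldots$, and taking $\mu=0$ for simplicity, we have

$$
\begin{aligned}
E[X^{n}\cos(m\Theta)] &= \frac{\alpha\beta^{\alpha}}{2\pi\cosh(\kappa)}\int_0^{2\pi}\!\!\int_0^{\infty}x^{n}\cos(m\theta)(1+\lambda\sin\theta)\,x^{\alpha-1}\exp\left[-(\beta x)^{\alpha}\{1-\tanh(\kappa)\cos\theta\}\right]dx\,d\theta \\
&= \frac{1}{2\pi\cosh(\kappa)}\int_0^{2\pi}\cos(m\theta)\int_0^{\infty}\alpha\beta^{\alpha}x^{n}x^{\alpha-1}\exp\left\{-(\beta x)^{\alpha}(1-\tanh(\kappa)\cos\theta)\right\}dx\,d\theta \\
&= \frac{1}{2\pi\cosh(\kappa)}\int_0^{2\pi}\cos(m\theta)\,\frac{\Gamma(1+n/\alpha)}{\beta^{n}\{1-\tanh(\kappa)\cos\theta\}^{n/\alpha+1}}\,d\theta \\
&= \frac{\Gamma(n/\alpha+1)\{\cosh(\kappa)\}^{n/\alpha+1}}{\cosh(\kappa)\,\beta^{n}}\,\frac{1}{2\pi}\int_0^{2\pi}\frac{\cos(m\theta)}{\{\cosh(\kappa)-\sinh(\kappa)\cos\theta\}^{n/\alpha+1}}\,d\theta \\
&= \frac{\Gamma(n/\alpha+1)\{\cosh(\kappa)\}^{n/\alpha}}{\beta^{n}}\,\frac{\Gamma(n/\alpha+1-m)}{\Gamma(n/\alpha+1)}\,P^{m}_{n/\alpha}(\cosh(\kappa)) \\
&= \frac{\{\cosh(\kappa)\}^{n/\alpha}\,\Gamma(n/\alpha+1-m)}{\beta^{n}}\,P^{m}_{n/\alpha}(\cosh(\kappa)),
\end{aligned}
$$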

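where $P^{m}_{\nu}(z)$ is the associated Legendre function of the first kind of degree $\nu$ and order $m$ given by (equation 8.7. of [8], p. 969)

$$P^{m}_{\nu}(z)=\frac{(-\nu)_m}{\pi}\int_0^{\pi}\frac{\cos mt}{\left(z+\sqrt{z^{2}-1}\cos t\right)^{\nu+1}}\,dt=\frac{\Gamma(\nu+1)}{\pi\,\Gamma(\nu-m+1)}\int_0^{\pi}\frac{\cos mt}{\left(z-\sqrt{z^{2}-1}\cos t\right)^{\nu+1}}\,dt.$$

Here, we used the relation

$$(-\nu)_m=\frac{\Gamma(m-\nu)}{\Gamma(-\nu)}=(-1)^{m}\frac{\Gamma(\nu+1)}{\Gamma(\nu-m+1)}.$$

Similarly,

$$
\begin{aligned}
E[X^{n}\sin(m\Theta)] &= \frac{\alpha\beta^{\alpha}}{2\pi\cosh(\kappa)}\int_0^{2\pi}\!\!\int_0^{\infty}x^{n}\sin(m\theta)(1+\lambda\sin\theta)\,x^{\alpha-1}\exp\left[-(\beta x)^{\alpha}\{1-\tanh(\kappa)\cos\theta\}\right]dx\,d\theta \\
&= \frac{\lambda}{2\pi\cosh(\kappa)}\int_0^{2\pi}\sin(m\theta)\sin\theta\int_0^{\infty}\alpha\beta^{\alpha}x^{n}x^{\alpha-1}\exp\left[-(\beta x)^{\alpha}\{1-\tanh(\kappa)\cos\theta\}\right]dx\,d\theta \\
&= \frac{\lambda}{2\pi\cosh(\kappa)}\int_0^{2\pi}\sin(m\theta)\sin\theta\,\frac{\Gamma(n/\alpha+1)}{\beta^{n}\{1-\tanh(\kappa)\cos\theta\}^{n/\alpha+1}}\,d\theta \\
&= \frac{\lambda\,\Gamma(n/\alpha+1)\{\cosh(\kappa)\}^{n/\alpha+1}}{2\cosh(\kappa)\,\beta^{n}}\,\frac{1}{2\pi}\int_0^{2\pi}\frac{\cos((m-1)\theta)-\cos((m+1)\theta)}{\{\cosh(\kappa)-\sinh(\kappa)\cos\theta\}^{n/\alpha+1}}\,d\theta \\
&= \frac{\lambda\,\Gamma(n/\alpha+1)\{\cosh(\kappa)\}^{n/\alpha}}{2\beta^{n}}\left\{\frac{\Gamma(n/\alpha+2-m)}{\Gamma(n/\alpha+1)}P^{m-1}_{n/\alpha}(\cosh(\kappa))-\frac{\Gamma(n/\alpha-m)}{\Gamma(n/\alpha+1)}P^{m+1}_{n/\alpha}(\cosh(\kappa))\right\} \\
&= \frac{\lambda\{\cosh(\kappa)\}^{n/\alpha}}{2\beta^{n}}\left\{\Gamma(n/\alpha+2-m)\,P^{m-1}_{n/\alpha}(\cosh(\kappa))-\Gamma(n/\alpha-m)\,P^{m+1}_{n/\alpha}(\cosh(\kappa))\right\}.
\end{aligned}
$$

Specifying choices for $m$ and $n$, and noting that the marginal of the circular part is the sine-skewed wrapped Cauchy density, we obtain the following simple moment expressions (we write $P_{\nu}(z)$ for $P^{0}_{\nu}(z)$):

$$
\begin{aligned}
E[X] &= \frac{\{\cosh(\kappa)\}^{1/\alpha}\,\Gamma(1/\alpha+1)}{\beta}\,P_{1/\alpha}(\cosh(\kappa)), \\
E[\cos(\Theta)] &= \tanh\left(\frac{\kappa}{2}\right), \\
E[\sin(\Theta)] &= \frac{\lambda}{2}\left\{1-\tanh^{2}\left(\frac{\kappa}{2}\right)\right\}, \\
E[X\cos(\Theta)] &= \frac{\{\cosh(\kappa)\}^{1/\alpha}\,\Gamma(1/\alpha)}{\beta}\,P^{1}_{1/\alpha}(\cosh(\kappa)), \\
E[X\sin(\Theta)] &= \frac{\lambda\{\cosh(\kappa)\}^{1/\alpha}}{2\beta}\left\{\Gamma(1/\alpha+1)\,P_{1/\alpha}(\cosh(\kappa))-\Gamma(1/\alpha-1)\,P^{2}_{1/\alpha}(\cosh(\kappa))\right\}.
\end{aligned}
$$

3 Comparison with other new models and existing models

For the sake of consistency with the original proposals in the literature, we shall use in what follows the same parameters as the authors of the diverse proposals. This entails, of course, that some of our parameters (e.g., the skewness parameter $\lambda$) will take on different roles in the following models; this should however not raise any concerns, as we explain the parameters of each model in detail.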
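Before turning to the competing models, the three-step generator of Section 2.4 can be sketched in code and compared with the circular moments of Section 2.5. This is an illustration under stated assumptions rather than the authors' implementation: the wrapped Cauchy draw uses the inverse of its distribution function, Step 2 is read as a reflection about $\mu$, the Weibull draw uses inverse-transform sampling, and the seed, sample size and parameter values are arbitrary.

```python
import math
import random


def sample_weissvm(n, alpha, beta, mu, kappa, lam, rng):
    """Draw n pairs (theta, x) following the three steps of Section 2.4."""
    rho = math.tanh(kappa / 2.0)  # wrapped Cauchy concentration
    t = math.tanh(kappa)
    draws = []
    for _ in range(n):
        # Step 1: symmetric wrapped Cauchy around mu, by inverting its CDF.
        v = rng.random() - 0.5
        theta = mu + 2.0 * math.atan((1.0 - rho) / (1.0 + rho) * math.tan(math.pi * v))
        # Step 2: keep theta with probability {1 + lam*sin(theta - mu)}/2,
        # otherwise reflect it about mu (assumed reading of the second branch).
        if rng.random() >= 0.5 * (1.0 + lam * math.sin(theta - mu)):
            theta = 2.0 * mu - theta
        theta %= 2.0 * math.pi
        # Step 3: conditional Weibull with shape alpha, by inverse transform.
        scale = beta * (1.0 - t * math.cos(theta - mu)) ** (1.0 / alpha)
        x = (-math.log(1.0 - rng.random())) ** (1.0 / alpha) / scale
        draws.append((theta, x))
    return draws


# Hypothetical parameter values and seed, for illustration only.
alpha, beta, mu, kappa, lam = 2.0, 1.5, 1.0, 1.2, 0.6
rng = random.Random(2015)
draws = sample_weissvm(50000, alpha, beta, mu, kappa, lam, rng)

n = len(draws)
rho = math.tanh(kappa / 2.0)
mean_cos = sum(math.cos(th - mu) for th, _ in draws) / n
mean_sin = sum(math.sin(th - mu) for th, _ in draws) / n
# Given Theta, (beta*X)^alpha * {1 - tanh(kappa)cos(Theta - mu)} is Exp(1).
mean_unit = sum((beta * x) ** alpha * (1.0 - math.tanh(kappa) * math.cos(th - mu))
                for th, x in draws) / n

print(mean_cos, rho)                           # compare with tanh(kappa/2)
print(mean_sin, 0.5 * lam * (1.0 - rho ** 2))  # compare with (lam/2){1 - tanh^2(kappa/2)}
print(mean_unit)                               # should be close to 1
```

The sample means of $\cos(\Theta-\mu)$ and $\sin(\Theta-\mu)$ should agree, up to Monte Carlo error, with $\tanh(\kappa/2)$ and $(\lambda/2)\{1-\tanh^{2}(\kappa/2)\}$, while the last quantity checks the conditional Weibull step.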

3.1 The Mardia-Sutton and Kato-Shimizu models

Kato and Shimizu (2008) [4] propose a cylindrical distribution as an extension of the distribution by Mardia and Sutton (1978) [9]. Their model has as density

$$f_{KS}(\theta,x)=C\exp\left[-\frac{\{x-\mu(\theta)\}^{2}}{2\sigma^{2}}+\kappa_1\cos(\theta-\mu_1)+\kappa_2\cos\{2(\theta-\mu_2)\}\right], \qquad (5)$$

where $0\le\theta<2\pi$, $-\infty<x<\infty$, $\sigma>0$, $\kappa_1,\kappa_2>0$, $0\le\mu_1<2\pi$, $-\pi/2\le\mu_2<\pi/2$, $\mu(\theta)=\mu+\lambda\cos(\theta-\nu)$, $-\infty<\mu<\infty$, $\lambda>0$, $0\le\nu<2\pi$, and its normalizing constant $C$ is provided by

$$C^{-1}=(2\pi)^{3/2}\sigma\left[I_0(\kappa_1)I_0(\kappa_2)+2\sum_{j=1}^{\infty}I_{j}(\kappa_2)I_{2j}(\kappa_1)\cos\{2j(\mu_1-\mu_2)\}\right].$$

The conditional distribution of $X$ given $\Theta=\theta$ is a normal distribution and the marginal distribution of $\Theta$ is the generalized von Mises distribution ([7]). The conditional distribution of $\Theta$ given $X=x$ is also the generalized von Mises distribution, and the marginal distribution of $X$ does not admit a simple form; see [4] for details. The dependence is obviously regulated via their parameter $\lambda$, independence occurring for $\lambda=0$, leading to the product of a normal and the generalized von Mises. A clear drawback of the Kato-Shimizu model is that the density involves an infinite sum in the normalizing constant which, in practice, must be approximated using a finite sum of central terms.

The Mardia-Sutton model is obtained by setting $\kappa_2=0$ in (5). The infinite sum in the normalizing constant then vanishes, resulting in a simpler density. All properties from above of course remain the same, except that the generalized von Mises is replaced with the von Mises.

3.2 The Johnson-Wehrly-2 and Johnson-Wehrly-3 models

Besides what we may now call the ExpVM model, [] have also proposed the density

$$f_{JW2}(\theta,x)=C\,\frac{e^{-\kappa^{2}/(4\sigma^{2})}}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\lambda)^{2}}{2\sigma^{2}}+\frac{\kappa x}{\sigma^{2}}\cos(\theta-\mu)\right\},$$

where $0\le\theta<2\pi$, $-\infty<x<\infty$, $-\infty<\lambda<\infty$, $\kappa\ge 0$, $\sigma>0$ and $0\le\mu<2\pi$, and with normalizing constant

$$C^{-1}=2\pi\left\{I_0\!\left(\frac{\kappa\lambda}{\sigma^{2}}\right)I_0\!\left(\frac{\kappa^{2}}{4\sigma^{2}}\right)+2\sum_{j=1}^{\infty}I_{j}\!\left(\frac{\kappa^{2}}{4\sigma^{2}}\right)I_{2j}\!\left(\frac{\kappa\lambda}{\sigma^{2}}\right)\right\}.$$
As in the previous model, the conditional distribution of $X$ given $\Theta=\theta$ is a normal distribution and the marginal distribution of $\Theta$ is the generalized von Mises, whereas the conditional distribution of $\Theta$ given $X=x$ is the von Mises distribution, and the marginal distribution of $X$ is proportional to $\exp\{-(x-\lambda)^{2}/(2\sigma^{2})\}\,I_0(\kappa x/\sigma^{2})$; see [4], who have studied the Johnson-Wehrly-2 model, for more details. [] have noticed as a drawback that, in case of independence (here, $\kappa=0$), the circular component is forced to be uniform.

In order to overcome the latter limitation, Johnson and Wehrly have further proposed the density

$$f_{JW3}(\theta,x)=C\exp\left\{-\lambda x+\kappa x\cos(\theta-\mu_1)+\nu\cos(\theta-\mu_2)\right\} \qquad (6)$$

where $0\le\theta<2\pi$, $0<x<\infty$, $\lambda>\kappa>0$ and $0\le\mu_1,\mu_2<2\pi$, and with normalizing constant

$$C^{-1}=\frac{2\pi}{\sqrt{\lambda^{2}-\kappa^{2}}}\left[I_0(\nu)+2\sum_{j=1}^{\infty}\left\{\frac{\kappa}{\lambda+\sqrt{\lambda^{2}-\kappa^{2}}}\right\}^{j}I_{j}(\nu)\cos\{j(\mu_1-\mu_2)\}\right].$$

This density has as conditional circular distribution the von Mises and as conditional linear distribution an exponential; the circular marginal density is proportional to $\exp\{\nu\cos(\theta-\mu_2)\}/\{\lambda-\kappa\cos(\theta-\mu_1)\}$, while the linear marginal is of a complicated form. Independence is attained at $\kappa=0$, with (6) becoming the product of an exponential and the von Mises density.

3.3 The Fernández-Durán model

[] have further proposed a general, copula-like way of defining a cylindrical density, namely via the expression

$$(\theta,x)\mapsto 2\pi\,g\{2\pi(F_{\Theta}(\theta)+F_{X}(x))\}\,f_{\Theta}(\theta)\,f_{X}(x) \qquad (7)$$

where $g$ and $f_{\Theta}$ are circular densities, $f_{X}$ is a linear density, and $F_{\Theta}$ and $F_{X}$ stand for the corresponding cumulative distribution functions. As established in Theorem 5 of [], such a formulation ensures that the marginal densities are $f_{\Theta}$ and $f_{X}$, respectively. When $g$ is uniform, (7) becomes the simple product of both marginals. This nice construction, which does not underpin our model (1), has been put to use by Fernández-Durán (2007) [4] with the Weibull as linear component $f_{X}$ and both $g$ and $f_{\Theta}$ circular densities based on nonnegative trigonometric sums, of the form

$$\frac{1}{2\pi}+\frac{1}{\pi}\sum_{j=1}^{n}\left(a_j\cos(j\theta)+b_j\sin(j\theta)\right)\quad\text{with}\quad a_j-ib_j=2\pi\sum_{\nu=0}^{n-j}c_{\nu+j}\,\bar{c}_{\nu}$$

for complex numbers $c_j$ such that $\sum_{j=0}^{n}|c_j|^{2}=(2\pi)^{-1}$; see [4] for details. The number $n$ of terms in the sum is not fixed, hence figures as an additional parameter ($n=0$ is the uniform, $n=1$ the cardioid). Conditional densities are given by standard copula theory, but they are usually not available in closed form and become complicated for larger $n$.
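The marginal-preserving property of construction (7) can be illustrated numerically. The sketch below is an assumption-laden example, not the Fernández-Durán fit: it takes cardioid densities (the $n=1$ nonnegative trigonometric sum) for both $g$ and $f_{\Theta}$ and a Weibull for $f_{X}$, with hypothetical parameter values and function names, and checks that integrating out $\theta$ returns the Weibull density.

```python
import math

TWO_PI = 2.0 * math.pi


def cardioid_pdf(theta, mu, rho):
    """Cardioid density on the circle; requires |rho| <= 1/2."""
    return (1.0 + 2.0 * rho * math.cos(theta - mu)) / TWO_PI


def cardioid_cdf(theta, mu, rho):
    """Cardioid CDF started at 0, so that F(theta + 2*pi) = F(theta) + 1."""
    return (theta + 2.0 * rho * (math.sin(theta - mu) + math.sin(mu))) / TWO_PI


def weibull_pdf(x, alpha, beta):
    return alpha * beta ** alpha * x ** (alpha - 1.0) * math.exp(-((beta * x) ** alpha))


def weibull_cdf(x, alpha, beta):
    return 1.0 - math.exp(-((beta * x) ** alpha))


def jw_density(theta, x, g_par, circ_par, lin_par):
    """Density (7) with cardioid g and f_Theta and Weibull f_X."""
    u = TWO_PI * (cardioid_cdf(theta, *circ_par) + weibull_cdf(x, *lin_par))
    return (TWO_PI * cardioid_pdf(u, *g_par)
            * cardioid_pdf(theta, *circ_par) * weibull_pdf(x, *lin_par))


def marginal_x(x, g_par, circ_par, lin_par, n_theta=400):
    """Midpoint-rule integral of the joint density over theta in [0, 2*pi)."""
    step = TWO_PI / n_theta
    return step * sum(jw_density((i + 0.5) * step, x, g_par, circ_par, lin_par)
                      for i in range(n_theta))


# Hypothetical parameters: (location, rho) for g and f_Theta, (alpha, beta) for f_X.
g_par, circ_par, lin_par = (0.4, 0.3), (2.0, 0.25), (1.7, 0.8)
x0 = 1.3
print(marginal_x(x0, g_par, circ_par, lin_par), weibull_pdf(x0, *lin_par))
```

With $g$ uniform (`rho = 0` in `g_par`) the joint density reduces to the plain product of the two marginals, which is the independence case mentioned in the text.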
3.4 A new model: the generalized Gamma sine-skewed von Mises distribution

A natural generalization of our WeiSSVM model consists in replacing the linear Weibull part with the generalized Gamma distribution ([]), resulting in the generalized Gamma sine-skewed von Mises (GGSSVM) density

$$(\theta,x)\mapsto C\,\{1+\lambda\sin(\theta-\mu)\}\,x^{\alpha-1}\exp\left[-(\beta x)^{\gamma}\{1-\tanh(\kappa)\cos(\theta-\mu)\}\right], \qquad (8)$$

with $\alpha,\gamma,\beta,\kappa>0$, $-1\le\lambda\le 1$ and $0\le\mu<2\pi$. The normalizing constant is calculated as follows:

$$
\begin{aligned}
C^{-1} &= \int_0^{2\pi}\!\!\int_0^{\infty}\{1+\lambda\sin(\theta-\mu)\}\,x^{\alpha-1}e^{-(\beta x)^{\gamma}\{1-\tanh(\kappa)\cos(\theta-\mu)\}}\,dx\,d\theta \\
&= \frac{\Gamma(\alpha/\gamma)}{\gamma\beta^{\alpha}}\int_0^{2\pi}\frac{d\theta}{\{1-\tanh(\kappa)\cos\theta\}^{\alpha/\gamma}} \\
&= \frac{2\pi\,\Gamma(\alpha/\gamma)\{\cosh(\kappa)\}^{\alpha/\gamma}\,P_{\alpha/\gamma-1}(\cosh(\kappa))}{\gamma\beta^{\alpha}}.
\end{aligned}
$$

The WeiSSVM clearly corresponds to $\gamma=\alpha$ in (8). An interesting submodel is the Gamma sine-skewed von Mises (GamSSVM), obtained for $\gamma=1$, where, as the name suggests, the linear part is Gamma distributed. All properties of the GGSSVM and GamSSVM are easily obtained along the same lines as our developments in Section 2. It is to be noted that the circular marginal distribution for the GGSSVM is the sine-skewed Jones-Pewsey distribution (see [], and []). The WeiSSVM occupies a particular role within the GGSSVM, as it has a much simpler normalizing constant (no associated Legendre functions are required), especially compared with the GamSSVM.

Table 1: Comparison, in terms of stochastic properties, of the different cylindrical densities: Weibull sine-skewed von Mises (WeiSSVM), Mardia-Sutton (MS), Kato-Shimizu (KS), Johnson-Wehrly-1 (JW1) or exponential von Mises, Johnson-Wehrly-2 (JW2), Johnson-Wehrly-3 (JW3), Fernández-Durán (FD), generalized Gamma sine-skewed von Mises (GGSSVM) and Gamma sine-skewed von Mises (GamSSVM). 1 means good, 2 means medium, 3 means bad.
[Table 1 body: columns WeiSSVM, MS, KS, JW1, JW2, JW3, FD, GGSSVM, GamSSVM; rows Simple density, Independence, Marginal in Θ, Marginal in X, Conditional in Θ, Conditional in X, Overall]

3.5 Comparison

In Table 1, we have drawn a comparison between the distinct proposed models, by having recourse to the following criteria: (i) is the density expressed in simple terms, hence tractable, (ii) is the independence structure good in the sense of [], (iii)-(vi) does the model give rise to reasonable (in the sense of well-known) marginal and conditional distributions. For each criterion, we have given points between 1 and 3 (1 means good, 2 means medium, 3 means bad). These criteria are based on commonly recognized (from the literature) good properties a cylindrical model should exhibit; the ranking, of course, may be subject to criticism in the sense that one may disagree with some of our marks. As can be seen from this comparison, our WeiSSVM model comes out first, followed by the Mardia-Sutton model. Now, clearly, this comparison lacks an important aspect, namely the inferential viewpoint (although, with regard to estimation purposes, criterion (i) is intimately related to reasonable estimation properties).
We deliberately do not add the important criterion Flexibility/Fitting, as this issue will be treated in Section 5, where we compare several models in terms of their fitting properties.

4 Statistical inference

4.1 Parameter estimation

Let $(\theta_1,x_1),\ldots,(\theta_n,x_n)$ be independent and identically distributed samples drawn from the distribution with density (1). Then the log-likelihood function can be expressed as

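$$
\ell(\alpha,\beta,\mu,\kappa,\lambda)=(\alpha-1)\sum_{i=1}^{n}\log x_i-\beta^{\alpha}\sum_{i=1}^{n}x_i^{\alpha}\{1-\tanh(\kappa)\cos(\theta_i-\mu)\}+\sum_{i=1}^{n}\log\{1+\lambda\sin(\theta_i-\mu)\}+n\{\alpha\log\beta+\log\alpha-\log(2\pi\cosh(\kappa))\}. \qquad (9)
$$

The elements of the score vector are just the first-order partial derivatives of (9) with respect to each of the parameters:

$$
\begin{aligned}
\frac{\partial\ell}{\partial\alpha} &= \sum_{i=1}^{n}\log x_i-\beta^{\alpha}\sum_{i=1}^{n}\log(\beta x_i)\,x_i^{\alpha}\{1-\tanh(\kappa)\cos(\theta_i-\mu)\}+n\left(\log\beta+\frac{1}{\alpha}\right), \\
\frac{\partial\ell}{\partial\beta} &= -\alpha\beta^{\alpha-1}\sum_{i=1}^{n}x_i^{\alpha}\{1-\tanh(\kappa)\cos(\theta_i-\mu)\}+\frac{n\alpha}{\beta}, \\
\frac{\partial\ell}{\partial\mu} &= \beta^{\alpha}\tanh(\kappa)\sum_{i=1}^{n}x_i^{\alpha}\sin(\theta_i-\mu)-\lambda\sum_{i=1}^{n}\frac{\cos(\theta_i-\mu)}{1+\lambda\sin(\theta_i-\mu)}, \\
\frac{\partial\ell}{\partial\kappa} &= \frac{\beta^{\alpha}}{\{\cosh(\kappa)\}^{2}}\sum_{i=1}^{n}x_i^{\alpha}\cos(\theta_i-\mu)-n\tanh(\kappa), \\
\frac{\partial\ell}{\partial\lambda} &= \sum_{i=1}^{n}\frac{\sin(\theta_i-\mu)}{1+\lambda\sin(\theta_i-\mu)}.
\end{aligned}
$$

Any numerical root-finding algorithm can readily solve the associated likelihood equations and yield the maximum likelihood estimates of the five parameters at play. Quite conveniently, some elements of the expected Fisher information matrix $I$ are zero, namely $I_{\alpha\lambda}$, $I_{\beta\lambda}$ and $I_{\kappa\lambda}$, implying that the maximum likelihood estimate of $\lambda$ is asymptotically independent of the maximum likelihood estimates of $\alpha$, $\beta$ and $\kappa$. This property is especially important when performing hypothesis tests about $\lambda$ under unspecified $\alpha$, $\beta$, $\kappa$, as their estimation then does not affect the power of such tests.

4.2 Submodel and independence testing

Testing for submodels of the WeiSSVM model is straightforward via likelihood ratio tests. For each parameter $\eta\in\{\alpha,\beta,\mu,\kappa,\lambda\}$, we denote by $\hat\eta$ the unconstrained maximum likelihood estimate and by $\hat\eta_0$ the maximum likelihood estimate under the respective null hypothesis. Two particular instances are of interest. On the one hand, testing for the Johnson-Wehrly-1, or ExpVM, submodel, which is taken care of by the test statistic

$$T_{JW1}=-2\left\{\ell(1,\hat\beta_0,\hat\mu_0,\hat\kappa_0,0)-\ell(\hat\alpha,\hat\beta,\hat\mu,\hat\kappa,\hat\lambda)\right\},$$

rejecting $H_0:\{\alpha=1\}\cap\{\lambda=0\}$ at asymptotic level $\alpha$ whenever $T_{JW1}$ exceeds $\chi^{2}_{2;1-\alpha}$, the $\alpha$-upper quantile of the chi-square distribution with 2 degrees of freedom. On the other hand, we are interested in testing for circular-linear independence via the test statistic

$$T_{Indep}=-2\left\{\ell(\hat\alpha_0,\hat\beta_0,\hat\mu_0,0,\hat\lambda_0)-\ell(\hat\alpha,\hat\beta,\hat\mu,\hat\kappa,\hat\lambda)\right\},$$

to be compared with $\chi^{2}_{1;1-\alpha}$.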
Such tests, or the goal of defining measures of angular-linear correlation, have a long-standing history in the statistical literature, initiated by [8] and [9]; see [6] for a recent proposal, based on directional-linear kernel density estimation, and for references.
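The log-likelihood, its score vector and the likelihood ratio p-values of this section translate directly into code. The sketch below is illustrative only: the data points and parameter values are made up, the function names are hypothetical, no optimizer is included, and the chi-square tail probabilities are written out only for the 1 and 2 degrees of freedom needed here. The analytic score is compared with a central finite-difference gradient as a sanity check.

```python
import math


def loglik(params, data):
    """WeiSSVM log-likelihood for a list of (theta, x) pairs."""
    alpha, beta, mu, kappa, lam = params
    t = math.tanh(kappa)
    total = len(data) * (alpha * math.log(beta) + math.log(alpha)
                         - math.log(2.0 * math.pi * math.cosh(kappa)))
    for theta, x in data:
        total += (alpha - 1.0) * math.log(x)
        total -= (beta * x) ** alpha * (1.0 - t * math.cos(theta - mu))
        total += math.log(1.0 + lam * math.sin(theta - mu))
    return total


def score(params, data):
    """Analytic first-order partial derivatives of the log-likelihood."""
    alpha, beta, mu, kappa, lam = params
    n = len(data)
    t = math.tanh(kappa)
    d_alpha = n * (math.log(beta) + 1.0 / alpha)
    d_beta = n * alpha / beta
    d_mu, d_kappa, d_lam = 0.0, -n * t, 0.0
    for theta, x in data:
        c, s = math.cos(theta - mu), math.sin(theta - mu)
        bxa = (beta * x) ** alpha
        d_alpha += math.log(x) - math.log(beta * x) * bxa * (1.0 - t * c)
        d_beta -= alpha / beta * bxa * (1.0 - t * c)
        d_mu += bxa * t * s - lam * c / (1.0 + lam * s)
        d_kappa += bxa * c / math.cosh(kappa) ** 2
        d_lam += s / (1.0 + lam * s)
    return [d_alpha, d_beta, d_mu, d_kappa, d_lam]


def numeric_grad(params, data, h=1e-6):
    """Central finite differences, used only to cross-check score()."""
    grad = []
    for k in range(len(params)):
        up, down = list(params), list(params)
        up[k] += h
        down[k] -= h
        grad.append((loglik(up, data) - loglik(down, data)) / (2.0 * h))
    return grad


def lr_statistic(loglik_null, loglik_full):
    """Likelihood ratio statistic -2{l(null) - l(full)}."""
    return -2.0 * (loglik_null - loglik_full)


def chi2_sf(t, df):
    """Upper tail probability of the chi-square law for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(t / 2.0))
    if df == 2:
        return math.exp(-t / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


# Made-up observations (theta in radians, x > 0) and parameter values.
data = [(0.3, 1.2), (1.1, 0.7), (5.9, 2.1), (2.4, 0.4), (0.8, 1.6), (4.7, 0.9)]
params = (1.8, 0.9, 0.6, 0.8, 0.3)
print(score(params, data))
print(numeric_grad(params, data))
```

In practice `loglik` would be handed to a numerical optimizer twice, once without constraints and once under the null hypothesis, and the two maximized values passed to `lr_statistic` and `chi2_sf` with 2 degrees of freedom for the ExpVM submodel test or 1 for the independence test.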

5 Fitting two circular-linear real data sets

In this section we shall illustrate the good fitting behavior of the WeiSSVM by analyzing two popular data sets from the literature. More concretely, we shall compare the WeiSSVM with the models Johnson-Wehrly-1 or ExpVM, Mardia-Sutton, Kato-Shimizu, the independence model (linear Weibull and circular sine-skewed densities), and the alternative new models, GamSSVM and GGSSVM. From the more complicated (in terms of tractability) models described in Section 3, we have chosen the Kato-Shimizu model since it is the most recent and the authors have shown its good fitting abilities in [4]. Our means of comparison shall be the Akaike Information Criterion (AIC).

5.1 Periwinkle data

We give an analysis of n 3 observations which consist of the movements of blue periwinkles after they had been transplanted downshore from the height at which they normally live. The data set was taken from Table of [5]; see that paper for details about the experiment. A visual inspection of Figure in [5] or of Figure.a) in [4] reveals that the concentration of the circular part tends to increase with length, which is precisely one of the features that the WeiSSVM model can incorporate well. Moreover, [4] have shown that, on the basis of the Pewsey test of symmetry (see []), the circular part of the data is asymmetric, which can be captured well by the sine-skewed von Mises distribution.

Table 2 presents the maximum likelihood estimates, maximized log-likelihood and Akaike Information Criterion values obtained from all models under investigation. As we can see, the location parameters of the GGSSVM and its submodels are close (note that, as remarked in Section 2.2, the location of the Independence model is.97 + π/.4) and the WeiSSVM has the lowest AIC value. It clearly improves on Johnson-Wehrly-1 and Mardia-Sutton, and even on the flexible Kato-Shimizu model.
It is quite remarkable to notice the tiny difference in the maximized log-likelihood between the WeiSSVM and the embedding model, the GGSSVM. The likelihood ratio test for the Johnson-Wehrly-1 submodel (w.r.t. the WeiSSVM) takes value T JW ( ) 8.7, with p-value., which emphatically rejects the Johnson-Wehrly-1 model. Even stronger, the independence test yields T Indep ( ) 37.36, clearly stressing the dependence between the angular and the linear part. As a conclusion, our WeiSSVM model (with 5 parameters) is a good-fitting and parsimonious model for the periwinkle data set. For visual impression, we have superimposed the contour plot of the fitted WeiSSVM model on a list plot of the data in the panel making up Figure 2.

Table 2: Maximum likelihood estimates (MLEs), maximized log-likelihood, $l_{max}$, and Akaike Information Criterion (AIC) values for the Weibull sine-skewed von Mises (WeiSSVM) and its competitor models, the generalized Gamma sine-skewed von Mises (GGSSVM), the Gamma sine-skewed von Mises (GamSSVM), the exponential von Mises (ExpVM), the independence (Indep.), Mardia-Sutton (MS) and Kato-Shimizu (KS) models, fitted to the blue periwinkle data.
[Table 2 body: rows WeiSSVM, GGSSVM, GamSSVM, JW1/ExpVM and Indep. with columns $\hat\alpha$, $\hat\beta$, $\hat\gamma$, $\hat\mu$, $\hat\kappa$, $\hat\lambda$, $l_{max}$, AIC; rows MS and KS with columns $\hat\mu$, $\hat\sigma$, $\hat\lambda$, $\hat\nu$, $\hat\mu_1$, $\hat\mu_2$, $\hat\kappa_1$, $\hat\kappa_2$, $l_{max}$, AIC]

5.2 Wind direction and temperature data

As second example, we consider the original data set from [9], consisting of 8 measurements of wind direction and temperature at Kew during the period The data are taken from Table in [9], whose Figure also provides a good idea of the distribution of the data. Although the effect noticed for the periwinkle data, namely high concentration for high linear values, is less marked here, Mardia and Sutton have noted (and established) a strong dependence between the circular and the linear component. It has been shown in [4] that the Mardia-Sutton model is extremely good for this data set; it is therefore very interesting to compare it with the WeiSSVM and related new models.

Table 3 contains the maximum likelihood estimates, maximized log-likelihood and Akaike Information Criterion values. The (circular) location parameters of our proposed models are almost the same (again, the location of the Independence model is.6 + π/.95). We see that our WeiSSVM model best incorporates the non-trivial behavior of this data set (its AIC value is clearly below that of the MS model), and again it is much better than the Johnson-Wehrly-1 model (which is clearly rejected as submodel). A contour plot of the fitted WeiSSVM model with a list plot of the data is provided in Figure 3. We finally note that the independence test of course heavily rejects (p-value.) the null of independence, hereby agreeing with [9].

6 Extension to the directional-linear setting

Yet another advantage of the WeiSSVM is its straightforward extension to the directional-linear setting. It is obtained by replacing the circular sine-skewed von Mises density with its equivalent on unit spheres $S^{k-1}=\{v\in\mathbb{R}^{k}:\|v\|=1\}$, $k\ge 3$, recently defined in [6]. The cosine part simply becomes the scalar product $\theta'\mu$ between $\theta\in S^{k-1}$ and the location parameter $\mu\in S^{k-1}$, while $\lambda\sin(\theta-\mu)$ is expressed as $\sqrt{1-(\theta'\mu)^{2}}\,\lambda'S_{\mu}(\theta)$, $\lambda\in S^{k-1}$, where $S_{\mu}(\theta)=(\theta-(\theta'\mu)\mu)/\|\theta-(\theta'\mu)\mu\|$ is the multivariate sign vector on the unit sphere. We refer to [6] for further information, and for more general skew-rotsymmetric distributions, as they are termed. The density of the Weibull sine-skewed Fisher-von Mises-Langevin, in short WeiSSFVML, distribution on $S^{k-1}\times\mathbb{R}^{+}$ (in higher dimensions, the VM is called Fisher-von Mises-Langevin and hence abbreviated FVML), for the directional part with respect to the usual surface area measure $d\sigma_{k-1}$, is defined as

$$(\theta,x)\mapsto f(\theta,x)=C_k\left(1+\sqrt{1-(\theta'\mu)^{2}}\,\lambda'S_{\mu}(\theta)\right)x^{\alpha-1}\exp\left\{-(\beta x)^{\alpha}\left(1-\tanh(\kappa)\,\theta'\mu\right)\right\}. \qquad (10)$$

The normalizing constant of the distribution (10) is simply given by

$$C_k=\frac{\alpha\beta^{\alpha}\{\sinh(\kappa)\}^{(k/2)-1}}{(2\pi)^{k/2}\cosh(\kappa)\,P^{1-(k/2)}_{(k/2)-2}(\cosh(\kappa))}.$$

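Figure 2: Contour plot of the blue periwinkle data (in lengths and radians), together with the fitted WeiSSVM density. The data are plotted over [, π) [, 5). (Axes: Length and Direction.)

Indeed

$$
\begin{aligned}
\int_{S^{k-1}}\!\int_0^{\infty} &\left(1+\sqrt{1-(\theta'\mu)^{2}}\,\lambda'S_{\mu}(\theta)\right)x^{\alpha-1}e^{-(\beta x)^{\alpha}(1-\tanh(\kappa)\theta'\mu)}\,dx\,d\sigma_{k-1}(\theta) \\
&= \int_{S^{k-1}}\!\int_0^{\infty}x^{\alpha-1}e^{-(\beta x)^{\alpha}(1-\tanh(\kappa)\theta'\mu)}\,dx\,d\sigma_{k-1}(\theta) \\
&= \int_{-1}^{1}\!\int_{S^{k-2}(\mu^{\perp})}\!\int_0^{\infty}x^{\alpha-1}e^{-(\beta x)^{\alpha}(1-\tanh(\kappa)t)}\,dx\,d\sigma_{k-2}(v)\,(1-t^{2})^{(k-3)/2}\,dt \\
&= \frac{2\pi^{k/2}}{\alpha\beta^{\alpha}\,\Gamma(k/2)\,B(1/2,(k-1)/2)}\int_{-1}^{1}\frac{(1-t^{2})^{(k-3)/2}}{1-\tanh(\kappa)t}\,dt \\
&= \frac{(2\pi)^{k/2}\cosh(\kappa)\,P^{1-(k/2)}_{(k/2)-2}(\cosh(\kappa))}{\alpha\beta^{\alpha}\{\sinh(\kappa)\}^{(k/2)-1}},
\end{aligned}
$$

where $B(\cdot,\cdot)$ denotes the beta function. We have used above the change of variables formula $d\sigma_{k-1}(\theta)=(1-t^{2})^{(k-3)/2}\,d\sigma_{k-2}(v)\,dt$ where $v\in S^{k-2}(\mu^{\perp})=\{v\in\mathbb{R}^{k}:\|v\|=1,\,v'\mu=0\}$, the equality $\omega_{k-1}=\omega_k/B(1/2,(k-1)/2)$ (with $\omega_k=2\pi^{k/2}/\Gamma(k/2)$ the surface area measure of $S^{k-1}$) as well as, like for the result of Section 5 in [], the following relationship of the associated Legendre function (equation 8.7. of [8], p. 969)

$$P^{-\mu}_{\nu}(z)=\frac{(z^{2}-1)^{\mu/2}}{2^{\mu}\sqrt{\pi}\,\Gamma(\mu+1/2)}\int_{-1}^{1}\frac{(1-t^{2})^{\mu-1/2}}{\left(z+t\sqrt{z^{2}-1}\right)^{\mu-\nu}}\,dt\qquad[\Re\mu>-1/2,\ |\arg(z\pm 1)|<\pi].$$

Clearly, the distribution (10) reduces to (1) when $k=2$, and as in [], the distribution also has a simpler form when $k=3$, namely,

$$f(\theta,x)=\frac{\alpha\beta^{\alpha}\tanh(\kappa)}{4\pi\kappa}\left(1+\sqrt{1-(\theta'\mu)^{2}}\,\lambda'S_{\mu}(\theta)\right)x^{\alpha-1}\exp\left\{-(\beta x)^{\alpha}\left(1-\tanh(\kappa)\,\theta'\mu\right)\right\}.$$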
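As a worked check of the case $k=3$ (a short derivation added for illustration, assuming $\kappa>0$), the integral over $t$ is elementary because the weight $(1-t^{2})^{(k-3)/2}$ equals one, and the remaining integration over $v$ contributes the circumference $2\pi$ of the unit circle orthogonal to $\mu$:

$$
\begin{aligned}
\int_{-1}^{1}\frac{dt}{1-\tanh(\kappa)\,t} &= \frac{1}{\tanh(\kappa)}\log\frac{1+\tanh(\kappa)}{1-\tanh(\kappa)}=\frac{1}{\tanh(\kappa)}\log e^{2\kappa}=\frac{2\kappa}{\tanh(\kappa)}, \\
C_3^{-1} &= \frac{2\pi}{\alpha\beta^{\alpha}}\cdot\frac{2\kappa}{\tanh(\kappa)}=\frac{4\pi\kappa}{\alpha\beta^{\alpha}\tanh(\kappa)},
\end{aligned}
$$

which agrees with the factor $\alpha\beta^{\alpha}\tanh(\kappa)/(4\pi\kappa)$ in the $k=3$ density.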

Table 3: Maximum likelihood estimates (MLEs), maximized log-likelihood, $l_{max}$, and Akaike Information Criterion (AIC) values for the Weibull sine-skewed von Mises (WeiSSVM) and its competitor models, the generalized Gamma sine-skewed von Mises (GGSSVM), the Gamma sine-skewed von Mises (GamSSVM), the exponential von Mises (ExpVM), the independence (Indep.), Mardia-Sutton (MS) and Kato-Shimizu (KS) models, fitted to the wind-temperature data.
[Table 3 body: rows WeiSSVM, GGSSVM, GamSSVM, JW1/ExpVM and Indep. with columns $\hat\alpha$, $\hat\beta$, $\hat\gamma$, $\hat\mu$, $\hat\kappa$, $\hat\lambda$, $l_{max}$, AIC; rows MS and KS with columns $\hat\mu$, $\hat\sigma$, $\hat\lambda$, $\hat\nu$, $\hat\mu_1$, $\hat\mu_2$, $\hat\kappa_1$, $\hat\kappa_2$, $l_{max}$, AIC]

The nice properties from Section 2 extend to the directional-linear setting, with the circular distributions replaced with their higher-dimensional directional counterparts. Maximum likelihood estimators for the parameters $\alpha$, $\beta$, $\mu$, $\kappa$, $\lambda$ can readily be derived, bearing in mind that $\mu$ is constrained to lie on $S^{k-1}$; one way to overcome the latter issue consists in using spherical coordinates to express the location.

7 Discussion and future research

In this paper, we have introduced a new distribution for circular-linear data, the WeiSSVM. We have presented its numerous good properties: a tractable density expression (in particular, a simple normalizing constant), a good dependence structure in the sense that, in case of independence, the circular part is not necessarily uniform but cardioid, nice expressions for the marginal and conditional circular and linear distributions (except for the marginal linear one, whose density is slightly more complex), much more flexibility than the first model in [], which is a special case of the WeiSSVM, and a direct extension to the general directional-linear setting, yielding the WeiSSFVML distribution. Last but certainly not least, the WeiSSVM exhibits very good fitting properties (shown by means of two distinct data sets), improving in particular on the more complicated models from the literature.
Thus, we can consider our model as a tractable, parsimonious (in terms of the number of parameters) and flexible model for cylindrical data. Given its fitting capacities and simple parameter interpretation, the WeiSSVM is a viable model for investigating further data sets in detail. Two concrete examples shall be elucidated in the future. The first concerns ecological data related to trees. Indeed, [] have only used the direction of fallen logs, hence a purely circular setting, to model the influence of neighborhood structure and directionality of radiation on crown asymmetry; a more detailed analysis can be obtained by adding as linear part the distance to each neighboring tree. The second data set concerns cylindrical data consisting of the burnt area and the direction of wildfires in Portugal, as analyzed in [3] and [6]. Our parametric model will be an interesting alternative especially to the non-parametric approach of the latter paper. Moreover, these data are both circular-linear and directional-linear, requiring our extension from Section 6.

Figure 3: Contour plot of the wind and temperature data (in lengths and radians), together with the fitted WeiSSVM density. Axes: Direction, Length.

Acknowledgement

Toshihiro Abe was supported in part by JSPS KAKENHI Grant Number 5K7593. Christophe Ley thanks the Fonds National de la Recherche Scientifique, Communauté française de Belgique, for financial support via a Mandat de Chargé de Recherche.

References

[1] T. Abe, Y. Kubota, K. Shimatani, T. Aakala, and T. Kuuluvainen. Circular distributions of fallen logs as an indicator of forest disturbance regimes. Ecological Indicators, 8: ,.
[2] T. Abe and A. Pewsey. Sine-skewed circular distributions. Statistical Papers, 5:683 77,.
[3] A. M. G. Barros, J. M. C. Pereira, and U. J. Lund. Identifying geographical patterns of wildfire orientation: A watershed-based analysis. Forest Ecology and Management, 64:98 7,.
[4] J. J. Fernández-Durán. Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums. Biometrics, 63: , 7.
[5] N. I. Fisher and A. J. Lee. Regression models for an angular response. Biometrics, 48: , 99.

[6] E. García-Portugués, A. M. G. Barros, R. M. Crujeiras, W. González-Manteiga, and J. M. C. Pereira. A test for directional-linear independence, with applications to wildfire orientation and size. Stochastic Environmental Research and Risk Assessment, 8:6 75, 4.
[7] E. García-Portugués, R. M. Crujeiras, and W. González-Manteiga. Exploring wind direction and SO2 concentration by circular-linear density estimation. Stochastic Environmental Research and Risk Assessment, 7:55 67, 3.
[8] I. S. Gradshteyn and I. M. Ryzhik. Tables of integrals, series, and products, 8th Edn. London: Academic Press, 5.
[9] R. A. Johnson and T. E. Wehrly. Measures and models for angular correlation and angular-linear correlation. Journal of the Royal Statistical Society Series B, 39: 9, 977.
[10] R. A. Johnson and T. E. Wehrly. Some angular-linear distributions and related regression models. Journal of the American Statistical Association, 73:6 66, 978.
[11] M. C. Jones and A. Pewsey. A family of symmetric distributions on the circle. Journal of the American Statistical Association, :4 48, 5.
[12] M. C. Jones and A. Pewsey. Inverse Batschelet distributions. Biometrics, 68:83 93,.
[13] M. C. Jones, A. Pewsey, and S. Kato. On a class of circulas: copulas for circular distributions. Annals of the Institute of Statistical Mathematics, DOI:.7/s , 4.
[14] S. Kato and K. Shimizu. Dependent models for observations which include angular ones. Journal of Statistical Planning and Inference, 38: , 8.
[15] F. Lagona, M. Picone, A. Maruotti, and S. Cosoli. A hidden Markov approach to the analysis of space-time environmental data with linear and circular components. Stochastic Environmental Research and Risk Assessment, 9:397 49, 5.
[16] C. Ley and T. Verdebout. Skew-rotsymmetric distributions on unit spheres and related efficient inferential procedures. ECARES Working Paper 4-46, 4.
[17] V. M. Maksimov. Necessary and sufficient statistics for a family of shifts of probability distributions on continuous bicompact groups. Rossiskaya Akademiya Nauk. Teor. Verojatnost. i Primenen., :37 3 (in Russian), English Translation: Theory of Probability and its Applications, 67 8, 967.
[18] K. V. Mardia. Linear-circular correlation coefficients and rhythmometry. Biometrika, 63:43 45, 976.
[19] K. V. Mardia and T. W. Sutton. A model for cylindrical variables with applications. Journal of the Royal Statistical Society Series B, 4:9 33, 978.
[20] A. Pewsey. Testing circular symmetry. Canadian Journal of Statistics, 3:59 6,.
[21] E. W. Stacy. A generalization of the Gamma distribution. Annals of Mathematical Statistics, 33:87 9, 96.
[22] F. Wang, A. E. Gelfand, and G. Jona-Lasinio. Joint spatio-temporal analysis of a linear and a directional variable: space-time modeling of wave heights and wave directions in the Adriatic Sea. Statistica Sinica, 5:5 39, 5.
[23] M.-Z. Wang, K. Shimizu, and K. Uesu. An analysis of earthquakes latitude, longitude and magnitude data by use of directional statistics. Japanese Journal of Applied Statistics, pages 9 44 (in Japanese), 3.
