Regression Models for Directional Data

Andrew Wood
School of Mathematical Sciences, University of Nottingham

ADISTA14 Workshop, Wednesday 21 May 2014

Joint work with Nottingham colleagues Simon Preston and Michail Tsagris. Supported by EPSRC.
Outline of Talk

1. Introduction and background discussion.
2. Regression models with Fisher error distribution.
3. Regression models with Kent error distribution.
4. Some numerical results.
5. Discussion.
Data Structure

Data structure: $\{y_1, x_1\}, \ldots, \{y_n, x_n\}$, where for each $i$:

- $y_i$ is a response vector and $x_i$ is a covariate vector;
- $y_i \in S^2 = \{y \in \mathbb{R}^3 : y^\top y = 1\}$, i.e. $y_i$ is a unit vector in 3D space;
- the covariate vectors are $p$-dimensional, i.e. $x_i \in \mathbb{R}^p$;
- it is assumed throughout the talk that $y_1, \ldots, y_n$ are independent.

The main purpose here is to develop parametric regression models on the sphere in which rotational symmetry of the error distribution is not assumed.
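In code, this data structure is simply an $n \times 3$ response matrix with unit rows paired with an $n \times p$ covariate matrix. A minimal sketch (assuming numpy; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2

# Covariates x_i in R^p, and raw 3D responses normalised onto S^2.
x = rng.normal(size=(n, p))
raw = rng.normal(size=(n, 3))
y = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# Each row y_i is a unit vector: y_i' y_i = 1.
print(np.allclose(np.sum(y * y, axis=1), 1.0))
```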
Multivariate linear model

Consider the standard multivariate linear model $Y = XB + E$, where

- $Y$ ($n \times p$) is the response matrix;
- $X$ ($n \times q$) is the covariate matrix;
- $B$ ($q \times p$) is the parameter matrix;
- $E$ ($n \times p$) is the unobserved error matrix, whose rows are assumed to be IID with common distribution $N_p(0, \Sigma)$.

In this situation, the least squares estimator (and MLE) of $B$,
$$\hat{B} = (X^\top X)^{-1} X^\top Y,$$
does not depend on $\Sigma$. However, we would still not want to assume that $\Sigma$ is a scalar multiple of the identity, which is the analogue of the assumption of rotational symmetry of the error distribution on the sphere.
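The estimator $\hat{B} = (X^\top X)^{-1} X^\top Y$ above is one line of linear algebra; a quick sketch (numpy, with simulated data of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, p = 100, 3, 2

X = rng.normal(size=(n, q))
B_true = rng.normal(size=(q, p))
E = rng.normal(scale=0.1, size=(n, p))
Y = X @ B_true + E

# Least squares / MLE: B_hat = (X'X)^{-1} X'Y; note Sigma never appears.
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(B_hat, B_true, atol=0.1))
```

Using `linalg.solve` rather than forming the inverse explicitly is the standard numerically stable choice.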
A general covariate vector $x_i$

A second purpose is to develop models which can handle a general covariate $x_i$: no special structure for $x_i$, such as $x_i \in S^2$ or $x_i \in \mathbb{R}$, is assumed.

The spirit of the modelling here is similar to that often used with generalised linear models (GLMs). The use of GLMs is sometimes a bit cavalier in the following respects:

- for simplicity and convenience, we assume covariate information combines in linear fashion (i.e. through the linear predictor) on a suitable scale (i.e. after applying the link function);
- we do not expect our models to extrapolate well to regions in which the data are not well represented.

We shall make similar assumptions here.
Fisher error distribution

We will be focusing on the following two error distributions on the sphere:

- Rotationally symmetric case: the Fisher (or von Mises–Fisher) distribution.
- Rotationally asymmetric case: the Kent distribution.

The Fisher distribution has pdf given by
$$f(y \mid \kappa, \mu) = c_F(\kappa)^{-1} \exp(\kappa\, y^\top \mu),$$
where $\mu \in S^2$ is a unit vector, the mean direction, and $\kappa \geq 0$ is a concentration parameter.

In the case of the Fisher distribution, we can construct a regression model with constant concentration parameter $\kappa$ by specifying the mean direction for each observation to be of the form
$$\mu_i = \mu(x_i, \gamma), \qquad i = 1, \ldots, n,$$
with $\gamma$ a parameter vector and $x_i$ the covariate vector for observation $i$.
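On $S^2$ the Fisher normalising constant has the closed form $c_F(\kappa) = 4\pi \sinh(\kappa)/\kappa$, so the log-density is easy to evaluate directly. A sketch (numpy; the function name is mine, and for very large $\kappa$ one would replace $\log\sinh\kappa$ by a numerically safer form):

```python
import numpy as np

def fisher_logpdf(y, mu, kappa):
    """Log-density of the Fisher (von Mises-Fisher) distribution on S^2:
    log f = kappa * y'mu - log c_F(kappa), c_F(kappa) = 4*pi*sinh(kappa)/kappa."""
    log_c = np.log(4 * np.pi) + np.log(np.sinh(kappa)) - np.log(kappa)
    return kappa * np.dot(y, mu) - log_c

mu = np.array([0.0, 0.0, 1.0])
# The density peaks at the mean direction mu and is lowest at the antipode.
print(fisher_logpdf(mu, mu, 5.0) > fisher_logpdf(-mu, mu, 5.0))
```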
Particular cases of the Fisher regression model

In cases (i)–(iii) below it is assumed that $y_i, x_i \in S^2$.

Case (i) [Chang, AoS, 1986; Rivest, AoS, 1989]. Here $\mu_i = A x_i$, where $A$ is an unknown rotation matrix to be estimated.

Case (ii) [Downs, Biometrika, 2003]. A regression model on the sphere based on Möbius transformations.

Case (iii) [Rosenthal et al., JASA, 2014]. In this case
$$\mu_i = \frac{A x_i}{\|A x_i\|},$$
where $A$ is a non-singular $3 \times 3$ matrix to be estimated.
Particular cases of the Fisher regression model (continued)

Case (iv). In this case $\mu_i = Q R_i \delta$, where $Q$ is a $3 \times 3$ orthogonal matrix and $\delta \in S^2$ is a unit vector. One way to define the $3 \times 3$ rotation matrix $R_i$ is by $R_i = \exp(C_i)$, where
$$C_i = \begin{pmatrix} 0 & c_{1i} & c_{2i} \\ -c_{1i} & 0 & c_{3i} \\ -c_{2i} & -c_{3i} & 0 \end{pmatrix}$$
is skew-symmetric, and $c_{ji} = \gamma_j^\top x_i$, $j = 1, 2, 3$.

A second possibility is to define $R_i = (I_3 + C_i)(I_3 - C_i)^{-1}$, which has the advantage that it is not a many-to-one mapping.
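Both constructions of $R_i$ are easy to compute: for a $3 \times 3$ skew-symmetric $C$ the matrix exponential has the closed Rodrigues form, and the second option is the Cayley transform. A sketch (numpy; function names are mine, and the sign convention in `skew` matches the matrix above):

```python
import numpy as np

def skew(c):
    """Skew-symmetric C_i built from c_i = (c_1i, c_2i, c_3i)."""
    c1, c2, c3 = c
    return np.array([[0.0,  c1,  c2],
                     [-c1, 0.0,  c3],
                     [-c2, -c3, 0.0]])

def rot_exp(C):
    """R = exp(C) via the Rodrigues formula, exact for 3x3 skew-symmetric C."""
    theta = np.sqrt(0.5 * np.sum(C * C))  # = ||(c1, c2, c3)||
    if theta < 1e-12:
        return np.eye(3)
    return (np.eye(3) + (np.sin(theta) / theta) * C
            + ((1.0 - np.cos(theta)) / theta**2) * (C @ C))

def rot_cayley(C):
    """R = (I + C)(I - C)^{-1}, the one-to-one (Cayley) alternative."""
    I = np.eye(3)
    return (I + C) @ np.linalg.inv(I - C)

C = skew([0.1, -0.2, 0.3])
for R in (rot_exp(C), rot_cayley(C)):
    # Both maps land in SO(3): R'R = I and det R = +1.
    print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```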
The Kent distribution on $S^2$

Kent (1982) proposed a 5-parameter family which contains the Fisher family as well as rotationally asymmetric distributions. It has pdf
$$f(y; \mu, \xi_1, \xi_2, \kappa, \beta) = c_K(\kappa, \beta)^{-1} \exp\left[\kappa\, \mu^\top y + \beta\{(\xi_1^\top y)^2 - (\xi_2^\top y)^2\}\right],$$
where

- $\mu$, $\xi_1$ and $\xi_2$ are mutually orthogonal unit vectors;
- $\kappa \geq 0$ and $\beta \geq 0$ are concentration and shape parameters.

The distribution is unimodal if $\kappa > 2\beta$.
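Up to the normalising constant $c_K(\kappa, \beta)$, the Kent log-density is a single inner-product expression. A sketch (numpy; the function name is mine):

```python
import numpy as np

def kent_logpdf_unnorm(y, mu, xi1, xi2, kappa, beta):
    """Kent log-density up to the normalising constant c_K(kappa, beta):
    kappa * mu'y + beta * ((xi1'y)^2 - (xi2'y)^2)."""
    return (kappa * np.dot(mu, y)
            + beta * (np.dot(xi1, y) ** 2 - np.dot(xi2, y) ** 2))

mu = np.array([0.0, 0.0, 1.0])
xi1 = np.array([1.0, 0.0, 0.0])
xi2 = np.array([0.0, 1.0, 0.0])
# At y = mu the quadratic term vanishes, leaving kappa * 1.
print(kent_logpdf_unnorm(mu, mu, xi1, xi2, 10.0, 2.0))  # -> 10.0
```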
Motivation for Kent distributions

- Datasets in directional statistics and shape analysis are often highly concentrated.
- When highly concentrated datasets are projected onto a suitable tangent space, a multivariate normal approximation in the tangent space is often reasonable.
- When the Kent distribution is highly concentrated and unimodal, the induced distribution in the tangent space at the mode is approximately bivariate normal.
Orthonormal basis determined by a unit vector

Consider a unit vector $\mu = (\sin\theta \cos\phi, \sin\theta \sin\phi, \cos\theta)^\top$. Provided $\sin\theta \neq 0$, the vectors
$$\xi_1 = (-\sin\phi, \cos\phi, 0)^\top \quad \text{and} \quad \xi_2 = (\cos\theta \cos\phi, \cos\theta \sin\phi, -\sin\theta)^\top$$
complete an orthonormal basis.

In Cartesian coordinates, writing $y = (y_1, y_2, y_3)^\top$ and assuming $y_1^2 + y_2^2 \neq 0$, we have
$$\xi_1 = \frac{(-y_2, y_1, 0)^\top}{\sqrt{y_1^2 + y_2^2}} \quad \text{and} \quad \xi_2 = \frac{(y_1 y_3,\; y_2 y_3,\; -(y_1^2 + y_2^2))^\top}{\sqrt{y_1^2 + y_2^2}}.$$
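The Cartesian formulas can be verified numerically: stacking $(y, \xi_1, \xi_2)$ as columns should give an orthogonal matrix. A sketch (numpy; the function name is mine, and the signs follow one consistent convention since the transcription lost them):

```python
import numpy as np

def tangent_basis(y):
    """Orthonormal pair (xi1, xi2) completing the unit vector y, using the
    Cartesian formulas; assumes y1^2 + y2^2 > 0."""
    y1, y2, y3 = y
    r = np.sqrt(y1**2 + y2**2)
    xi1 = np.array([-y2, y1, 0.0]) / r
    xi2 = np.array([y1 * y3, y2 * y3, -(y1**2 + y2**2)]) / r
    return xi1, xi2

y = np.array([1.0, 2.0, 2.0]) / 3.0   # a unit vector
xi1, xi2 = tangent_basis(y)
M = np.column_stack([y, xi1, xi2])
# (y, xi1, xi2) form an orthonormal triple: M'M = I.
print(np.allclose(M.T @ M, np.eye(3)))
```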
Model for axes

Given $\mu_i = \mu(x_i, \gamma)$, use the previous slide to define an orthonormal triple $\mu_i, \xi_{1i}, \xi_{2i}$, where $\xi_{1i} = \xi_{1i}(x_i, \gamma)$ and $\xi_{2i} = \xi_{2i}(x_i, \gamma)$. Now suppose
$$y_i \sim \text{Kent}(\mu_i, \xi_{1i}^*, \xi_{2i}^*, \kappa, \beta), \qquad i = 1, \ldots, n,$$
where
$$\xi_{1i}^* = (\cos\psi_i)\xi_{1i} + (\sin\psi_i)\xi_{2i}, \qquad \xi_{2i}^* = -(\sin\psi_i)\xi_{1i} + (\cos\psi_i)\xi_{2i},$$
and $\psi_i = \psi(x_i, \alpha)$, i.e. $\psi_i$ depends on the covariate vector $x_i$ and a further parameter vector $\alpha$.
Parameters in the Kent regression model

There are three parameter vectors: $\alpha$; $\gamma$; and $(\kappa, \beta)$. In particular:

- The parameter vector $\gamma$, along with the covariate vector $x_i$, determines the mean direction $\mu_i$, and also the orthonormal triple $\mu_i$, $\xi_{1i}$ and $\xi_{2i}$.
- The parameter vector $\alpha$, along with $x_i$, determines the angle $\psi_i$. Note that the angle $\psi_i$ is measured relative to the coordinate system determined by the orthonormal triple $\mu_i$, $\xi_{1i}$ and $\xi_{2i}$, which depends on $i$.
- The parameters $\kappa \geq 0$ and $\beta \geq 0$ are dispersion parameters. If so desired, one could allow $\kappa$ and $\beta$ to depend on $x_i$ and further parameters, but for simplicity we assume here that $\kappa$ and $\beta$ do not depend on $i$.
The log-likelihood under the Kent model

The log-likelihood under the Kent model is given by
$$\ell(\alpha, \beta, \gamma, \kappa) = -n \log c_K(\kappa, \beta) + \kappa \sum_{i=1}^n y_i^\top \mu_i + \beta \sum_{i=1}^n \left\{ (y_i^\top \xi_{1i}^*)^2 - (y_i^\top \xi_{2i}^*)^2 \right\},$$
where, as above, $\mu_i = \mu(x_i, \gamma)$,
$$\xi_{1i}^* = (\cos\psi_i)\xi_{1i} + (\sin\psi_i)\xi_{2i}, \qquad \xi_{2i}^* = -(\sin\psi_i)\xi_{1i} + (\cos\psi_i)\xi_{2i},$$
$\xi_{1i} = \xi_{1i}(x_i, \gamma)$, $\xi_{2i} = \xi_{2i}(x_i, \gamma)$, $\psi_i = \psi_i(x_i, \alpha)$, and $c_K(\kappa, \beta)$ is the Kent normalising constant.

A key point: this construction is generic, in the sense that we can use any suitable rotationally symmetric Fisher regression model for $\mu_i = \mu(x_i, \gamma)$, and any suitable (double) von Mises model for $\psi_i = \psi_i(x_i, \alpha)$.
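The data-dependent part of this log-likelihood (everything except the $-n \log c_K(\kappa, \beta)$ term) vectorises directly. A sketch, assuming numpy, with my own function and argument names:

```python
import numpy as np

def kent_reg_loglik_unnorm(Y, mu, xi1, xi2, psi, kappa, beta):
    """Kent regression log-likelihood without the -n*log c_K(kappa, beta)
    term.  Rows of Y, mu, xi1, xi2 are the per-observation vectors; psi
    holds the rotation angles psi_i = psi(x_i, alpha)."""
    xi1s = np.cos(psi)[:, None] * xi1 + np.sin(psi)[:, None] * xi2
    xi2s = -np.sin(psi)[:, None] * xi1 + np.cos(psi)[:, None] * xi2
    return (kappa * np.sum(Y * mu)
            + beta * np.sum(np.sum(Y * xi1s, axis=1) ** 2
                            - np.sum(Y * xi2s, axis=1) ** 2))

# Toy check: responses exactly at the mean directions, psi_i = 0.
n = 4
mu = np.tile([0.0, 0.0, 1.0], (n, 1))
xi1 = np.tile([1.0, 0.0, 0.0], (n, 1))
xi2 = np.tile([0.0, 1.0, 0.0], (n, 1))
psi = np.zeros(n)
# Quadratic term vanishes, leaving kappa * n.
print(kent_reg_loglik_unnorm(mu, mu, xi1, xi2, psi, 5.0, 1.0))  # -> 20.0
```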
A structured (or switching) optimisation algorithm

When maximising the log-likelihood, we have found it desirable to cycle between three steps:

Step 1: maximise $\ell(\alpha^{(m)}, \beta^{(m)}, \gamma, \kappa^{(m)})$ over $\gamma$ with $\alpha^{(m)}, \beta^{(m)}, \kappa^{(m)}$ held fixed, to obtain $\gamma^{(m+1)}$.

Step 2: maximise $\ell(\alpha, \beta^{(m)}, \gamma^{(m+1)}, \kappa^{(m)})$ over $\alpha$ with $\beta^{(m)}, \gamma^{(m+1)}$ and $\kappa^{(m)}$ held fixed, to obtain $\alpha^{(m+1)}$.

Step 3: maximise $\ell(\alpha^{(m+1)}, \beta, \gamma^{(m+1)}, \kappa)$ over $\beta$ and $\kappa$ with $\alpha^{(m+1)}$ and $\gamma^{(m+1)}$ held fixed, to obtain $\beta^{(m+1)}$ and $\kappa^{(m+1)}$.
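The cycling scheme can be sketched generically as block coordinate ascent. This toy version (assuming scipy is available; all names are mine) uses a black-box optimiser for each block, whereas in practice each step would call the specialised fitter described on the next slide:

```python
import numpy as np
from scipy.optimize import minimize

def switching_fit(loglik, gamma0, alpha0, kb0, n_cycles=20):
    """Sketch of the structured (switching) algorithm: cycle the three
    maximisation steps, each over one parameter block with the others
    held fixed.  loglik(gamma, alpha, kb) is the log-likelihood."""
    gamma, alpha, kb = map(np.asarray, (gamma0, alpha0, kb0))
    for _ in range(n_cycles):
        # Step 1: over gamma; Step 2: over alpha; Step 3: over (kappa, beta).
        gamma = minimize(lambda g: -loglik(g, alpha, kb), gamma).x
        alpha = minimize(lambda a: -loglik(gamma, a, kb), alpha).x
        kb = minimize(lambda t: -loglik(gamma, alpha, t), kb).x
    return gamma, alpha, kb

# Toy check on a separable concave objective with maximum at (1, 2, (3, 4)).
f = lambda g, a, t: -((g[0] - 1)**2 + (a[0] - 2)**2
                      + (t[0] - 3)**2 + (t[1] - 4)**2)
g, a, t = switching_fit(f, [0.0], [0.0], [0.0, 0.0])
print(np.allclose(np.concatenate([g, a, t]), [1.0, 2.0, 3.0, 4.0], atol=1e-3))
```

Because each step cannot decrease the log-likelihood, the cycle produces a monotone sequence of log-likelihood values.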
Comments on the algorithm

The normalising constant $c_K(\kappa, \beta)$ can be calculated exactly (using the holonomic gradient method), or approximately to a high level of accuracy (using e.g. the Kume and Wood (2005) saddlepoint approximation). So Step 3 in the algorithm is relatively straightforward.

Step 1 can be performed by modifying (because of the addition of the quadratic term) whatever algorithm is used to fit the rotationally symmetric Fisher model.

Step 2 is equivalent to fitting a weighted (double) von Mises regression model for the $\psi_i$. The weight for observation $i$ is proportional to $(y_i^\top \xi_{1i})^2 + (y_i^\top \xi_{2i})^2$. Step 2 is simplified by the following result.
A useful lemma

Consider a two-parameter natural exponential family log-likelihood based on an IID sample of size $n$:
$$\ell(\theta_1, \theta_2) = -n \log c(\theta_1, \theta_2) + \theta_1 t_1 + \theta_2 t_2,$$
where $\theta_1$ and $\theta_2$ are natural parameters and $(t_1, t_2)$ is the sufficient statistic. Let $h(\bar{t}_1, \bar{t}_2) = \ell\{\hat{\theta}_1(\bar{t}_1, \bar{t}_2), \hat{\theta}_2(\bar{t}_1, \bar{t}_2)\}$ denote the maximised log-likelihood, viewed as a function of $\bar{t}_1 = t_1/n$ and $\bar{t}_2 = t_2/n$.

Lemma. $h(\bar{t}_1, \bar{t}_2)$ is an increasing function of $\bar{t}_1$ at $(\bar{t}_1, \bar{t}_2)$ if and only if $\hat{\theta}_1 = \hat{\theta}_1(\bar{t}_1, \bar{t}_2)$ is strictly positive at $(\bar{t}_1, \bar{t}_2)$.
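The lemma follows from the envelope theorem; a short derivation consistent with the notation above (writing $t_j = n\bar{t}_j$):

```latex
h(\bar t_1, \bar t_2) = \ell\{\hat\theta_1(\bar t_1,\bar t_2),\,
                              \hat\theta_2(\bar t_1,\bar t_2)\},
\qquad
\ell(\theta_1,\theta_2) = -n\log c(\theta_1,\theta_2)
                          + n\theta_1\bar t_1 + n\theta_2\bar t_2 .
% The MLE satisfies the score equations, so differentiating h and using
% \partial\ell/\partial\theta_j |_{\hat\theta} = 0 leaves only the direct term:
\frac{\partial h}{\partial \bar t_1}
 = \sum_{j=1}^{2}
   \frac{\partial \ell}{\partial \theta_j}\bigg|_{\hat\theta}
   \frac{\partial \hat\theta_j}{\partial \bar t_1}
 + n\,\hat\theta_1
 = n\,\hat\theta_1(\bar t_1, \bar t_2),
% which is positive exactly when \hat\theta_1 > 0, proving the lemma.
```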
Application of the lemma

An application of the lemma to Step 2 of the computational algorithm shows that, because $\beta \geq 0$, Step 2 is equivalent to maximising over $\alpha$ the quadratic term in the log-likelihood, namely
$$\sum_{i=1}^n \left\{ (y_i^\top \xi_{1i}^*)^2 - (y_i^\top \xi_{2i}^*)^2 \right\}.$$
This is equivalent to maximising the log-likelihood of a weighted (double) von Mises regression model.
An alternative approach

Recall the Fisher regression model in Case (iv): $y_i \sim \text{Fisher}(\mu_i, \kappa)$, where $\mu_i = Q R_i \delta$ with $Q$ and $R_i$ ($3 \times 3$) rotation matrices and $\delta \in S^2$. We could modify this to a Kent distribution by writing $y_i \sim \text{Kent}(\mu_i, \xi_{1i}, \xi_{2i}, \kappa, \beta)$, where $\mu_i$, $\xi_{1i}$ and $\xi_{2i}$ are an orthonormal triple determined by
$$Q R_i = (\mu_i, \xi_{1i}, \xi_{2i}),$$
with $R_i = R(x_i, \gamma)$.
Animal (seal) tracking data

Some numerical results from the animal movement tracking data found in Jonsen et al. (2005). Estimated parameters and log-likelihood for the Fisher and Kent regression models, assuming the linear predictor is linear in time.

Fisher: log-likelihood = 377.6, $\hat{\kappa}$ = , $\hat{\gamma}$ = ( , , ).
Kent: log-likelihood = 432.8, $\hat{\kappa}$ = , $\hat{\beta}$ = , $\hat{\gamma}$ = ( , , ).
Animal (seal) tracking data plot
A single simulated example
True $\gamma = (0.02, 0.03, 0.004)$; estimated $\hat{\gamma}$ = ( , , ). True $\kappa = 50$ and the estimated $\kappa$, $\hat{\kappa}$, is , with $n = 50$. The $x_i$ are scalars from a normal distribution with mean 0 and standard deviation 10. The latitude goes from 4 to 66.8 degrees and the longitude from 9 to 329 degrees.
Some simulation results

Some simulation results from the Fisher model under Case (iv).

Table 1: Estimated parameters (averages over 500 simulations); columns give true betas, estimated betas, and estimated kappa, with rows for $n = 50$ and $n = 100$ at two values of kappa each.

Table 2: Square root of the MSEs of the parameters (500 simulations); columns give estimated betas and estimated kappa, with the same rows.
Selected References

Chang, T. (1986). Spherical regression. Annals of Statistics, 14.

Downs, T.D. (2003). Spherical regression. Biometrika, 90.

Jonsen, I.D., Flemming, J.M. and Myers, R.A. (2005). Robust state-space modelling of animal movement data. Ecology, 86.

Kent, J.T. (1982). The Fisher–Bingham distribution on the sphere. Journal of the Royal Statistical Society, Series B, 44.

Kume, A. and Wood, A.T.A. (2005). Saddlepoint approximations for the Bingham and Fisher–Bingham normalising constants. Biometrika, 92.

Rivest, L.-P. (1989). Spherical regression for concentrated Fisher–von Mises distributions. Annals of Statistics, 17.

Rosenthal, M., Wei, W., Klassen, E. and Srivastava, A. (2014). Spherical regression models using projective linear transformations. Journal of the American Statistical Association (to appear).
More informationChapter 5. The multivariate normal distribution. Probability Theory. Linear transformations. The mean vector and the covariance matrix
Probability Theory Linear transformations A transformation is said to be linear if every single function in the transformation is a linear combination. Chapter 5 The multivariate normal distribution When
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationReflections and Rotations in R 3
Reflections and Rotations in R 3 P. J. Ryan May 29, 21 Rotations as Compositions of Reflections Recall that the reflection in the hyperplane H through the origin in R n is given by f(x) = x 2 ξ, x ξ (1)
More informationLecture 5: A step back
Lecture 5: A step back Last time Last time we talked about a practical application of the shrinkage idea, introducing James-Stein estimation and its extension We saw our first connection between shrinkage
More informationAccurate directional inference for vector parameters
Accurate directional inference for vector parameters Nancy Reid February 26, 2016 with Don Fraser, Nicola Sartori, Anthony Davison Nancy Reid Accurate directional inference for vector parameters York University
More informationEstimation of parametric functions in Downton s bivariate exponential distribution
Estimation of parametric functions in Downton s bivariate exponential distribution George Iliopoulos Department of Mathematics University of the Aegean 83200 Karlovasi, Samos, Greece e-mail: geh@aegean.gr
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More informationHT Introduction. P(X i = x i ) = e λ λ x i
MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework
More informationp y (1 p) 1 y, y = 0, 1 p Y (y p) = 0, otherwise.
1. Suppose Y 1, Y 2,..., Y n is an iid sample from a Bernoulli(p) population distribution, where 0 < p < 1 is unknown. The population pmf is p y (1 p) 1 y, y = 0, 1 p Y (y p) = (a) Prove that Y is the
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationThreshold estimation in marginal modelling of spatially-dependent non-stationary extremes
Threshold estimation in marginal modelling of spatially-dependent non-stationary extremes Philip Jonathan Shell Technology Centre Thornton, Chester philip.jonathan@shell.com Paul Northrop University College
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationLecture 1 October 9, 2013
Probabilistic Graphical Models Fall 2013 Lecture 1 October 9, 2013 Lecturer: Guillaume Obozinski Scribe: Huu Dien Khue Le, Robin Bénesse The web page of the course: http://www.di.ens.fr/~fbach/courses/fall2013/
More informationBayesian Inference by Density Ratio Estimation
Bayesian Inference by Density Ratio Estimation Michael Gutmann https://sites.google.com/site/michaelgutmann Institute for Adaptive and Neural Computation School of Informatics, University of Edinburgh
More informationLecture 4 September 15
IFT 6269: Probabilistic Graphical Models Fall 2017 Lecture 4 September 15 Lecturer: Simon Lacoste-Julien Scribe: Philippe Brouillard & Tristan Deleu 4.1 Maximum Likelihood principle Given a parametric
More informationSTAT 730 Chapter 5: Hypothesis Testing
STAT 730 Chapter 5: Hypothesis Testing Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 28 Likelihood ratio test def n: Data X depend on θ. The
More informationKey Algebraic Results in Linear Regression
Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Elias Tragas tragas@cs.toronto.edu October 3, 2016 Elias Tragas Naive Bayes and Gaussian Bayes Classifier October 3, 2016 1 / 23 Naive Bayes Bayes Rules: Naive
More informationMax. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes
Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter
More information3.0.1 Multivariate version and tensor product of experiments
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 3: Minimax risk of GLM and four extensions Lecturer: Yihong Wu Scribe: Ashok Vardhan, Jan 28, 2016 [Ed. Mar 24]
More informationTAMS39 Lecture 2 Multivariate normal distribution
TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution
More informationLikelihood inference in the presence of nuisance parameters
Likelihood inference in the presence of nuisance parameters Nancy Reid, University of Toronto www.utstat.utoronto.ca/reid/research 1. Notation, Fisher information, orthogonal parameters 2. Likelihood inference
More informationThe formal relationship between analytic and bootstrap approaches to parametric inference
The formal relationship between analytic and bootstrap approaches to parametric inference T.J. DiCiccio Cornell University, Ithaca, NY 14853, U.S.A. T.A. Kuffner Washington University in St. Louis, St.
More informationStatistics & Data Sciences: First Year Prelim Exam May 2018
Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book
More informationLecture 3 September 1
STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have
More informationLocal Influence and Residual Analysis in Heteroscedastic Symmetrical Linear Models
Local Influence and Residual Analysis in Heteroscedastic Symmetrical Linear Models Francisco José A. Cysneiros 1 1 Departamento de Estatística - CCEN, Universidade Federal de Pernambuco, Recife - PE 5079-50
More information