Davy PAINDAVEINE Thomas VERDEBOUT

Similar documents
Optimal Rank-Based Tests for the Location Parameter of a Rotationally Symmetric Distribution on the Hypersphere

arxiv: v1 [math.st] 7 Nov 2017

SUPPLEMENT TO TESTING UNIFORMITY ON HIGH-DIMENSIONAL SPHERES AGAINST MONOTONE ROTATIONALLY SYMMETRIC ALTERNATIVES

arxiv: v2 [stat.ap] 27 Mar 2012

Independent Component (IC) Models: New Extensions of the Multinormal Model

Invariant HPD credible sets and MAP estimators

Semiparametric Gaussian Copula Models: Progress and Problems

Semiparametric Gaussian Copula Models: Progress and Problems

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and

On Independent Component Analysis

Directional Statistics

Graduate Econometrics I: Maximum Likelihood II

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

Multivariate Regression

By Bhattacharjee, Das. Published: 26 April 2018

Lecture 21. Hypothesis Testing II

Lecture 3. Inference about multivariate normal distribution

Likelihood-based inference with missing data under missing-at-random

Joint work with Nottingham colleagues Simon Preston and Michail Tsagris.

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests

Signed-rank Tests for Location in the Symmetric Independent Component Model

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Homework 7: Solutions. P3.1 from Lehmann, Romano, Testing Statistical Hypotheses.

Maximum likelihood characterization of rotationally symmetric distributions on the sphere

arxiv: v1 [math.st] 31 Jul 2007

Discussion Paper No. 28

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Goodness of Fit Test and Test of Independence by Entropy

University of California, Berkeley

5 Introduction to the Theory of Order Statistics and Rank Statistics

Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary

D I S C U S S I O N P A P E R

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

arxiv: v3 [math.st] 10 Jan 2014

Tutorial on Approximate Bayesian Computation

A copula goodness-of-t approach. conditional probability integral transform. Daniel Berg 1 Henrik Bakken 2

Stat 5101 Lecture Notes

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data

Lecture 7 Introduction to Statistical Decision Theory

Nonparametric Tests for Multi-parameter M-estimators

A nonparametric two-sample wald test of equality of variances

Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

Multivariate Time Series: VAR(p) Processes and Models

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

On Fisher Information Matrices and Profile Log-Likelihood Functions in Generalized Skew-Elliptical Models

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

When is a copula constant? A test for changing relationships

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Sharp Generalization Error Bounds for Randomly-projected Classifiers

Hypothesis testing in multilevel models with block circular covariance structures

A permutation test for comparing rotational symmetry in three-dimensional rotation data sets

Multivariate Distributions

On Multivariate Runs Tests. for Randomness

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Inference on reliability in two-parameter exponential stress strength model

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Frailty Models and Copulas: Similarities and Differences

HANDBOOK OF APPLICABLE MATHEMATICS

Robust Backtesting Tests for Value-at-Risk Models

The International Journal of Biostatistics

Multivariate Non-Normally Distributed Random Variables

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

A Comparison of Robust Estimators Based on Two Types of Trimming

Exact Statistical Inference in. Parametric Models

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error

On the Fisher Bingham Distribution

Random Eigenvalue Problems Revisited

STAT331. Cox s Proportional Hazards Model

A Note on UMPI F Tests

Covariance function estimation in Gaussian process regression

On prediction and density estimation Peter McCullagh University of Chicago December 2004

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Chapter 4. Theory of Tests. 4.1 Introduction

Financial Econometrics and Volatility Models Copulas

1 Mixed effect models and longitudinal data analysis

ROBUST TESTS BASED ON MINIMUM DENSITY POWER DIVERGENCE ESTIMATORS AND SADDLEPOINT APPROXIMATIONS

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

PRE-TEST ESTIMATION OF THE REGRESSION SCALE PARAMETER WITH MULTIVARIATE STUDENT-t ERRORS AND INDEPENDENT SUB-SAMPLES

A Goodness-of-fit Test for Copulas

Near-exact approximations for the likelihood ratio test statistic for testing equality of several variance-covariance matrices

c 2005 Society for Industrial and Applied Mathematics

ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Does k-th Moment Exist?

Derivation of Monotone Likelihood Ratio Using Two Sided Uniformly Normal Distribution Techniques

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Control of Directional Errors in Fixed Sequence Multiple Testing

Second-Order Inference for Gaussian Random Curves

Testing Algebraic Hypotheses

Tests Using Spatial Median

Issues on quantile autoregression

UNIVERSITÄT POTSDAM Institut für Mathematik

Tests for spatial randomness based on spacings

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia

Transcription:

2013/94 Optimal Rank-Based Tests for the Location Parameter of a Rotationally Symmetric Distribution on the Hypersphere Davy PAINDAVEINE Thomas VERDEBOUT

Optimal Rank-Based Tests for the Location Parameter of a Rotationally Symmetric Distribution on the Hypersphere Davy Paindaveine and Thomas Verdebout Université libre de Bruxelles, Brussels, Belgium Laboratoire EQUIPPE, Université Lille Nord de France, France Abstract Rotationally symmetric distributions on the unit hyperpshere are among the most successful ones in directional statistics. These distributions involve a finite-dimensional parameter θ and an infinite-dimensional parameter g 1, that play the role of location and angular density parameters, respectively. In this paper, we consider semiparametric inference on θ, under unspecified g 1. More precisely, we focus on hypothesis testing and develop tests for null hypotheses of the form H 0 : θ = θ 0, for some fixed θ 0. Using the uniform local and asymptotic normality result from Ley et al. (2013), we define parametric tests that achieve Le Cam optimality at a target angular density. To improve on the poor robustness of these parametric procedures, we then introduce a class of rank tests for the same problem. Parallel to parametric tests, the proposed rank tests achieve Le Cam optimality under correctly specified angular densi- 1

ties. We derive the asymptotic properties of the various tests and investigate their finite-sample behaviour in a Monte Carlo study. Keywords and phrases: Group invariance, rank-based tests, rotationally symmetric distributions, spherical statistics, uniform local and asymptotic normality 1 Introduction Spherical or directional data naturally arise in a plethora of earth sciences such as geology (see, e.g., Fisher (1989)), seismology (Storetvedt and Scheidegger (1992)), astrophysics (Briggs (1993)), oceanography (Bowers and Mould (2000)) or meteorology (Fisher (1987)), as well as in studies of animal behavior (Fisher and Embleton (1987)) or even in neuroscience (Leong and Carlile (1998)). For decades, spherical data were explored through linear approximations trying to circumvent the curved nature of the data. Then the seminal paper Fisher (1953) showed that linearization hampers a correct study of several phenomena (such as, e.g., the remanent magnetism found in igneous or sedimentary rocks) and that it was therefore crucial to take into account the non-linear, spherical, nature of the data. Since then, a huge literature has been dedicated to a more appropriate study of spherical data; we refer to Mardia (1975), Jupp and Mardia (1989), Mardia and Jupp (2000) or to the second chapter of Merrifield (2006) for a detailed overview. Spherical data are commonly viewed as realizations of a random vector X taking values in the unit hypersphere S k 1 := { x R k x := x x = 1 } (k 2). In the last decades, numerous (classes of) distributions on S k 1 have been proposed and investigated. In this paper, we focus on the class of rotationally symmetric distributions on S k 1, that were introduced in Saw (1978). This class is characterized by the fact that the probability mass at x is a monotone nondecreasing function of 2

the spherical distance x θ between x and a given θ S k 1. This implies that the resulting equiprobability contours are the (k 2)-hyperspheres x θ = c (c [ 1, 1]), and that this north pole θ may be considered as a spherical mode, hence may be interpreted as a location parameter. Of course, this assumption of rotational symmetry may seem very restrictive. Yet, the latter is often used to model real phenomena. Indeed, according to Jupp and Mardia (1989), rotationally symmetric spherical data appear inter alia in situations where the observation process imposes such symmetrization (e.g., the rotation of the earth; see Mardia and Edwards (1982)). Another instance where rotational symmetry is appropriate is obtained when the observation scheme does not allow to make a distinction between the measurements x and O θ x for any rotation matrix O θ such that O θ θ = θ. In such a case, indeed, only the projection of x onto the modal axis θ can be observed; see Clark (1983). In the absolutely continuous case (with the dominating measure being the uniform distribution on S k 1 ), rotationally symmetric distributions have a probability density function (pdf) of the form x cg 1 (x θ), for some nondecreasing function g 1 : [ 1, 1] R +. Hence, the problem is intrinsically of a semiparametric nature. While inference about θ has been considered in many papers (see, among others, Chang (2004) and Tsai and Sen (2007)), semiparametrically efficient inference procedures in the rotationally symmetric case have not been developed in the literature. The only exception is the very recent contribution by Ley et al. (2013), where rank-based estimators of θ that achieve semiparametric efficiency at a target angular density are defined. Their methodology, that builds on Hallin and Werker (2003), relies on invariance arguments and on the uniform local and asymptotic nature with respect to θ, at a fixed g 1 of the model considered. Ley et al. (2013), however, consider point estimation only, hence do not address 3

situations where one would like to test the null hypothesis that the location parameter θ is equal to a given θ 0 ( S k 1 ). In this paper, we therefore extend the results from Ley et al. (2013) to hypothesis testing. This leads to a class of rank tests for the aforementioned testing problem, that, when based on correctly specified scores, are semiparametrically optimal. The proposed tests are invariant both with respect to the group of continuous monotone increasing transformations (of spherical distances) and with respect to the group of orthogonal transformations fixing the null value θ 0. Their main advantage over studentized tests is that they are not only validity-robust but also are efficiency-robust. The outline of the paper is as follows. In Section 2, we carefully define the class of rotationally symmetric distributions considered, introduce the main assumptions needed, and state the uniform local and asymptotic normality result that will be the main technical tool for this work. Then we derive in Section 3 the resulting optimal parametric tests and study their asymptotic behaviour. In Section 4, we discuss the group invariance structure of the testing problem considered, propose a class of (invariant) rank tests, and study their asymptotic properties. Finally, we conduct in Section 5 a Monte Carlo study to investigate the finite-sample behaviour of the proposed tests. 2 Rotationally symmetric distributions and ULAN The random vector X, with values in the unit sphere S k 1 of R k, is said to be rotationally symmetric about θ( S k 1 ) if and only if, for all orthogonal k k matrices O satisfying Oθ = θ, the random vectors OX and X are equal in distribution. If X is further absolutely continuous (with respect to the usual surface area measure on S k 1 ), 4

then the corresponding density is of the form f θ,g1 : S k 1 R k (2.1) x c k,g1 g 1 (x θ), where c k,g1 (> 0) is a normalization constant and g 1 : [ 1, 1] R is some nonnegative function called an angular function in the sequel. Throughout the paper, we then adopt the following assumption on the data generating process. Assumption (A). The observations X 1,..., X n are mutually independent and admit a common density of the form (2.1), for some θ S k 1 and some angular function g 1 in the collection F of functions from [ 1, 1] to R + that are positive and monotone nondecreasing. The notation (instead of g 1 ) will be used when considering a fixed angular density. An angular function that plays a fundamental role in directional statistics is then t ;exp,κ (t) = exp(κt), (2.2) for some concentration parameter κ(> 0). Clearly, ;exp,κ satisfies the conditions in Assumption (A). The resulting rotationally symmetric distribution was introduced in Fisher (1953) and is known as the Fisher-von Mises-Langevin (FvML(κ)) distribution. Other examples are the so-called linear rotationally symmetric distributions (LIN(a)), that are obtained for angular densities defined by (t) = t + a, with a > 1. In the sequel, the joint distribution of X 1,..., X n under Assumption (A) will be denoted as P θ,g 1. Note that, under P θ,g 1, the random variables X 1θ,..., X nθ are mutually independent and admit the common density (with respect to the Lebesgue measure over the real line) t g 1 (t) := ω k c k,g1 B( 1 2, 1 2 (k 1)) g 1(t)(1 t 2 ) (k 3)/2 I [ 1,1] (t), 5

where B(, ) is the beta function, ω k = 2π k/2 /Γ(k/2) is the surface area of S k 1, and I A ( ) stands for the indicator function of the set A. The corresponding cdf will be denoted by t Still under P θ,g 1, the random vectors S 1 (θ),..., S n (θ), where we let G 1 (t) = t 1 g 1(t)dt. S i (θ) := X i (X iθ)θ i = 1,..., n, (2.3) X i (X iθ)θ, are independent of the X iθ s, and are i.i.d., with a common distribution that is uniform over the unit (k 2)-sphere S k 1 (θ ) := {x S k 1 : x θ = 0}. It is easy to check that the common mean vector and covariance matrix of the S i (θ) s are given by 0 and (I k θθ )/(k 1), respectively. Fix now an angular density and consider the parametric family of probability measures P := { P θ, θ S k 1}. For P to be uniformly locally and asymptotically normal (ULAN), the angular density needs to satisfy some mild regularity conditions; more precisely, as we will state in Proposition 2.1 below, ULAN holds if belongs to the collection F ULAN of angular densities in F (see Assumption (A) above) that (i) are absolutely continuous (with a.e. derivative f 1, say) and such that (ii), letting ϕ f1 := f 1/, the quantity is finite. J k ( ) := 1 1 ϕ 2 (t)(1 t 2 ) (t) dt = 1 0 ϕ 2 ( 1 1 F 1 (u))(1 ( F 1 (u)) 2 ) du As usual, uniform local and asymptotic normality describes the asymptotic behaviour of local likelihood ratios of the form P θ+n 1/2 τ,, P θ, where the sequence (τ ) is bounded. In the present curved setup, (τ ) should be such that θ + n 1/2 τ S k 1 for all n, which imposes that θ τ = O(n 1/2 ). In particular, if the sequence (τ ) converges to some τ( R k ), then τ is orthogonal to θ. 6

We then have the following result (see Ley et al. (2013) for the proof). Proposition 2.1 Fix F ULAN. Then the family P ULAN, with central sequence = { P θ, θ S k 1} is θ, := 1 n n ϕ f1 (X iθ) 1 (X i θ)2 S i (θ) (2.4) i=1 and Fisher information matrix More precisely, (i) for any sequence (τ ) as above, log Γ θ,f1 := J k( ) k 1 (I k θθ ). (2.5) ( ) P θ+n 1/2 τ, = (τ ) P θ, 1 2 (τ ) Γ θ,f1 (τ ) + o P (1) θ, as n under P θ,, and (ii) θ,, still under P θ,, is asymptotically normal with mean zero and covariance matrix Γ θ,f1. As we show in the next section, this ULAN result allows to define Le Cam optimal tests for H 0 : θ = θ 0 under specified angular density. 3 Optimal parametric tests For some fixed θ 0 S k 1 and F ULAN, consider the problem of testing H 0 : θ = θ 0 versus H 1 : θ θ 0 in P, that is, consider the testing problem { H0 : { P θ 0, } H 1 : θ θ 0 { P θ, }. (3.6) For this problem, we define the test φ that, at asymptotic level α, rejects the 7

null of (3.6) whenever Q := ( θ 0, ) Γ θ 0, θ 0, (3.7) k 1 n = ϕ f1 (X nj k ( ) iθ 0 )ϕ f1 (X jθ 0 ) 1 (X i θ 0 ) 2 (3.8) i,j=1 1 (X j θ 0 ) 2 (S i (θ 0 )) S j (θ 0 ) > χ 2 k 1,1 α, where A denotes the Moore-Penrose inverse of A and χ 2 k 1,1 α stands for the α- upper quantile of a chi-square distribution with k 1 degrees of freedom. Applying in the present context the general results in Hallin et al. (2010) about hypothesis testing in curved ULAN families, yields that φ is Le Cam optimal more precisely, locally and asymptotically maximin at asymptotic level α for the problem (3.6). The asymptotic properties of this test are stated in the following result. Theorem 3.1 Fix θ 0 S k 1 and f F ULAN. Then, (i) under P θ 0,, Q is asymptotically chi-square with k 1 degrees of freedom; (ii) under P θ 0 +n 1/2 τ,, where the sequence (τ ) in R k satisfies θ 0τ = O(n 1/2 ) and converges to τ, Q is asymptotically non-central chi-square, still with k 1 degrees of freedom, and non-centrality parameter (iii) the sequence of tests φ τ Γ θ0, τ = J k( ) k 1 τ 2 ; (3.9) has asymptotic size α under P θ 0, ; (iv) φ is locally asymptotically maximin, at asymptotic level α, when testing {P θ 0, } against alternatives of the form θ θ 0 {P θ, }. For the particular case of the fixed-κ Fisher-von Mises-Langevin (FvML(κ)) dis- 8

tribution (obtained for ;exp,κ ; see (2.2)), we obtain Q ;exp,κ = = κ 2 (k 1) nj k (;exp,κ ) κ 2 (k 1) nj k (;exp,κ ) n (X i (X iθ 0 )θ 0 ) (X j (X jθ 0 )θ 0 ) i,j=1 n X i(i k θ 0 θ 0)X j. (3.10) i,j=1 The main drawback of the parametric tests φ is their lack of (validity-)robustness : under angular density g 1, there is no guarantee that φ asymptotically meets the nominal level constraint. under P θ 0,g 1. Indeed Q is, in general, not asymptotically χ 2 k 1 In practice, however, the underlying angular density may hardly be assumed to be known, and it is therefore needed to define robustified versions of φ that will combine (a) Le Cam optimality at (Theorem 3.1(iv)) and (b) validity under a broad collection of angular densities {g 1 }. A first way to perform such a robustification is to rely on studentization. This simply consists in considering test statistics of the form Q ;Stud := ( θ 0, ) (ˆΓ g 1 θ 0, ) θ 0,, where ˆΓ g 1 θ 0, is an arbitrary consistent estimator of the covariance matrix in the asymptotic multinormal distribution of θ 0, under P θ 0,g 1. The resulting tests, that reject the null H 0 : θ = θ 0 (with unspecified angular density) whenever Q ;Stud > χ2 k 1,1 α, are validity-robust that is, they asymptotically meet the level constraint under a broad range of angular densities and remain Le Cam optimal at. Of special interest is the FvML studentized test φ ;exp ;Stud, say that rejects the null hypothesis whenever Qφ ;exp ;Stud = k 1 n ˆL k (g 1 ) n X i(i k θ 0 θ 0)X j, (3.11) where ˆL k (g 1 ) := 1 n 1 n i=1 (X iθ 0 ) 2 is a consistent estimator of L k (g 1 ) := 1 E θ 0,g 1 [(X iθ 0 ) 2 ] (the expectation is taken under P θ 0,g 1 ); this test was studied in Watson 9 i,j=1

(1983). From studentization, this test is valid under any rotationally symmetric distribution; moreover, since Q ;exp ;Stud = Q ;exp,κ + o P (1) as n under P θ 0,;exp,κ for any κ, this test is also optimal in the Le Cam sense under any FvML distribution. Studentization, however, typically leads to tests that fail to be efficiency-robust, in the sense that the resulting type 2 risk may dramatically increase when the underlying angular density g 1 much deviates from the target density or target densities, in the case of the FvML studentized test φ ;exp ;Stud at which they are optimal. That is why studentization will not be considered in this paper. Instead, we will take advantage of the group invariance structure of the testing problem considered, in order to introduce invariant tests that are both validity- and efficiency-robust. As we will see, invariant tests in the present context are rank tests. 4 Optimal rank tests We start by describing the group invariance structure of the testing problem considered (Section 4.1). Then we introduce (and study the properties of) rank-based versions of the central sequences from Proposition 2.1 (Section 4.2). This will allow us to develop the resulting (optimal) rank tests and to derive their asymptotic properties (Section 4.3). 4.1 Group invariance structure Still for some given θ 0 S k 1, consider the problem of testing H 0 : θ = θ 0 against H 1 : θ θ 0 under unspecified angular density g 1, that is, consider the testing problem { H0 : g 1 F { P θ 0,g 1 } H 1 : θ θ 0 g 1 F { P θ,g 1 }. (4.12) 10

This testing problem is invariant under two groups of transformations, which we now quickly describe. (i) To define the first group, we introduce the tangent-normal decomposition X i = (X iθ 0 )θ 0 + X i (X iθ 0 )θ 0 S i (θ 0 ), i = 1,..., n of the observations X i, i = 1,..., n. The first group of transformations we consider is then G = {g h : h H},, with g h : (R k ) n (R k ) n (X 1,..., X n ) (h(x 1θ 0 )θ 0 + X 1 h(x 1θ 0 )θ 0 S 1 (θ 0 ),..., h(x nθ 0 )θ 0 + X n h(x nθ 0 )θ 0 S n (θ 0 )), where H is the collection of mappings h : [ 1, 1] [ 1, 1] that are continuous, monotone increasing, and satisfy lim t ±1 h(t) = ±1. The null hypothesis of (4.12) is clearly invariant under the group G,. The invariance principle therefore suggests restricting to tests that are invariant with respect to this group. As it was shown in Ley et al. (2013), the maximal invariant I (θ 0 ) associated with G, is the sign-and-rank statistic (S 1 (θ 0 ),..., S n (θ 0 ), R 1 (θ 0 ),..., R n (θ 0 )), where R i (θ 0 ) denotes the rank of X iθ 0 among X 1θ 0,..., X nθ 0. Consequently, the class of invariant tests coincides with the collection of tests that are measurable with respect to I (θ 0 ), in short, with the class of (sign-and-)rank or, simply, rank tests. It is easy to check that G, is actually a generating group for the null hypothesis g 1 F {P θ 0,g 1 } in (4.12). As a direct corollary, rank tests are distribution-free under the whole null hypothesis. This explains why rank tests will be validity-robust. (ii) Of course, the null hypothesis in (4.12) is also invariant under orthogonal transformations fixing the null location value θ 0. More precisely, it is invariant under 11

the group G rot = {g O : O O θ0 },, with g O : (R k ) n (R k ) n (X 1,..., X n ) (OX 1,..., OX n ), where O θ0 is the collection of all k k matrices O satisfying Oθ 0 = θ 0. Clearly, the vectors of signs and ranks above is not invariant under G rot,, but the statistic ( (S1 (θ 0 ) ) S2 (θ 0 ), ( S 1 (θ 0 ) ) S3 (θ 0 ),..., ( S n 1 (θ 0 ) ) Sn (θ 0 ), R 1 (θ 0 ),..., R n (θ 0 )) (4.13) is. Tests that are measurable with respect to the statistic in (4.13) will therefore be invariant with respect to both groups considered above. 4.2 Rank-based cental sequences To combine validity-robustness/invariance with Le Cam optimality at a target angular density, we introduce rank-based versions of the central sequences that appear in the ULAN property above (Proposition 2.1). More precisely, we consider rank statistics of the form θ,k = 1 n n ( ) Ri (θ) K S i (θ), n + 1 i=1 where the score function K : [0, 1] R is throughout assumed to be continuous (which implies that it is bounded and square-integrable over [0, 1]). In order to state the asymptotic properties of the rank-based random vector we introduce the following notation. For any g 1 F, write θ,k,g 1 := 1 n n K ( G1 (X iθ) ) S i (θ), i=1 where G 1 denotes the cdf of X iθ under P θ,g 1. For any g 1 F ULAN, define further Γ θ,k := J k(k) k 1 (I k θθ ) and Γ θ,k,g1 := J k(k, g 1 ) (I k θθ ), k 1 12 θ,k,

with J k (K) := 1 0 K2 (u)du and J k (K, g 1 ) := 1 K(u)K 0 g 1 (u)du, where we wrote 1 K g1 (u) := ϕ g1 (u) 1 ( G 1 (u)) 2 for any u [0, 1]. We then have the following result (see Ley et al. (2013)). Proposition 4.1 Let Assumption (A) hold, and fix θ S k 1 and g 1 F. Then (i) under P θ,g 1, θ,k = θ,k,g 1 + o L 2(1) as n ; (ii) under P θ,g 1, θ,k is asymptotically multinormal with mean zero and covariance matrix Γ θ,k ; (iii) under P θ+n 1/2 τ,g 1, where the sequence (τ ) in R k satisfies θ 0τ = O(n 1/2 ) and converges to τ, θ,k is asymptotically multinormal with mean Γ θ,k,g 1 τ and covariance matrix Γ θ,k (the result in (iii) requires the reinforcement g 1 F ULAN ). For any FULAN C1 C1, where FULAN denotes the collection of angular densities in F ULAN that are continuously differentiable over [ 1, 1], the function u K f1 (u) := 1 1 ϕ f1 ( F 1 (u)) 1 ( F 1 (u)) 2 is a valid score function to be used in rank-based central sequences. Proposition 4.1(i) then entails that, under P θ,, ϑ,k f1 is asymptotically equivalent in L 2, hence also in probability to the parametric -central sequence θ, = θ,k f1,. This provides the key to develop rank tests that are Le Cam optimal at any given FULAN C1. As for Proposition 4.1(ii)-(iii), they allow to derive the asymptotic properties of the resulting optimal rank tests. 4.3 Rank tests Fix a score function K as above. The previous sections then make it natural to consider the rank test φk, say that, at asymptotic level α, rejects the null hypothesis H 0 : θ = θ 0 (with unspecified angular density g 1 ) whenever QK = ( = k 1 nj k (K) > χ 2 k 1,1 α. θ 0,K ) Γ θ 0,K n K i,j=1 θ 0,K ( Ri (θ 0 ) n + 1 13 ) K ( ) Rj (θ 0 ) (S i (θ 0 )) S θ0 (X j ) n + 1

Clearly, this test is invariant with respect to both groups introduced in Section 4.1, since it is measurable with respect to the statistic in (4.13). The following result, that summarizes the asymptotic properties of φ K, easily follows from Proposition 4.1. Theorem 1 Let Assumption (A) hold. Then, (i) under g 1 F {P θ 0,g 1 }, QK is asymptotically chi-square with k 1 degrees of freedom; (ii) under P θ 0 +n 1/2 τ,g 1, where g 1 F ULAN and where the sequence (τ ) in R k satisfies θ 0τ = O(n 1/2 ) and converges to τ, QK is asymptotically non-central chi-square, still with k 1 degrees of freedom, and non-centrality parameter τ Γ θ,k,g1 Γ θ 0,K Γ θ,k,g 1 τ = J 2 k (K, g 1) (k 1)J k (K) τ 2 ; (4.14) (iii) the sequence of tests φk has asymptotic size α under g 1 F {P θ 0,g 1 }; (iv) φ with FULAN C1, is locally and asymptotically maximin, at asymptotic level α, when testing g 1 F {P θ 0,g 1 } against alternatives of the form θ θ 0 {P θ, }. K f1, Note that, for K = K f1 and g 1 = (with FULAN C1 ), the non-centrality parameter in (4.14) above rewrites J 2 k (K, ) (k 1)J k (K f1 ) τ 2 = J k( ) k 1 τ 2, hence coincides with the non-centrality parameter in (4.14). This establishes the optimality statement in Theorem 1(iv). 5 Simulations In this final section, we investigate the finite-sample behaviour of the proposed rank tests φk through a Monte Carlo study. We generated M = 10, 000 (mutually independent) random samples of (k =)3-dimensional rotationally symmetric random vectors ε (l) i, i = 1,..., n = 250, l = 1, 2, 3, 4, 14

with location θ 0 = (1, 0, 0) and with angular densities that are FvML(1), FvML(3), LIN(1.1), and LIN(2), for l = 1, 2, 3, and 4, respectively; see Section 2. Each random vector ε (l) i was then transformed into X (l) i;ω := O ω ε (l) i, i = 1,..., n, l = 1, 2, 3, 4, ω = 0, 1, 2, 3, with O ω = cos(πω/50) sin(πω/50) 0 sin(πω/50) cos(πω/50) 0 0 0 1. Clearly, ω = 0 corresponds to the null hypothesis H 0 : θ = θ 0, whereas ω = 1, 2, 3 correspond to increasingly severe alternatives. On each replication of the samples (X (l) 1;ω,..., X (l) n;ω), l = 1, 2, 3, 4, ω = 0, 1, 2, 3, we performed the following tests, all at asymptotic level α = 5% : (i) the test φ ;exp,3, that is, the parametric test φ using a FvML(3) angular target density, (ii) the test φ ;exp ;Stud that is based on the FvML studentized statistic Q ;exp ;Stud defined in (3.11), and (iii) the rank-based tests φk FvML(1), φk FvML(3), φk LIN(2), and φk LIN(2) that are Le Cam optimal at FvML(1), FvML(3), Lin(1.1), and Lin(2) distributions, respectively. The resulting rejection frequencies are provided in Table 1. These empirical results perfectly agree with the asymptotic theory : the parametric test φ ;exp,3 is the most powerful one at the FvML(3) distribution, but its null size deviates quite much from the target size α = 5% away from the FvML(3). The studentized parametric test φ ;exp ;Stud, on the contrary, shows a null behaviour that is satisfactory under all distributions considered. It also dominates its rank-based competitors under FvML densities, which is in line with the fact that it is optimal in the class of FvML densities. Outside this class, however, the proposed rank tests are more powerful than the studentized test, which translates their better efficiencyrobustness. The null behaviour of the proposed rank tests is very satisfactory, and 15

ω Underlying Test 0 1 2 3 density FvML(1) FvML(3) LIN(1.1) LIN(2) φ ;exp,3.1162.1572.2673.4493 φ ;exp ;Stud.0486.0733.1502.2867 φ K.0494.0732.1516.2891 FvML(1) φ K.0525.0720.1467.2743 FvML(3) φ K.0504.0662.1237.2248 LIN(1.1) φ K.0527.0738.1505.2890 LIN(2) φ f.0528.2250.7104.9719 1;exp,3 φ ;exp ;Stud.0536.2239.7062.9701 φ K.0530.2177.6828.9615 FvML(1) φ K.0541.2244.7056.9695 FvML(3) φ K.0509.2066.6688.9598 LIN(1.1) φ K.0538.2220.7025.9674 LIN(2) φ f.1290.1661.2729.4556 1;exp,3 φ ;exp ;Stud.0498.0668.1372.2749 φ K.0493.0676.1396.2747 FvML(1) φ K.0496.0723.1588.3245 FvML(3) φ K.0496.0711.1608.3359 LIN(1.1) φ K.0501.0698.1499.3001 LIN(2) φ f.1384.1496.1727.2284 1;exp,3 φ ;exp ;Stud.0526.0579.0741.1074 φ K.0545.0585.0769.1100 FvML(1) φ K.0542.0600.0773.1069 FvML(3) φ K.0540.0600.0737.0968 LIN(1.1) φ K.0540.0610.0781.1110 LIN(2) Table 1: Rejection frequencies (out of M = 10, 000 replications), under the null (ω = 0) and increasingly severe alternatives (ω = 1, 2, 3), of the parametric FvML(3)- test (φ ;exp,3 ), the studentized FvML-test (φ ;exp ;Stud), and of the rank tests achieving optimality at FvML(1), FvML(3), LIN(1.1), and LIN(2) densities (φ K FvML(1), φk FvML(3), φk LIN(1.1), and φk LIN(2), respectively). The sample size is n = 250 and the nominal level is 5%; see Section 5 for further details. 16

their optimality under correctly specified angular densities is confirmed. Acknowledgments Davy Paindaveine s research is supported by an A.R.C. contract from the Communauté Française de Belgique and by the IAP research network grant nr. P7/06 of the Belgian government (Belgian Science Policy). References Bowers, J. A., M. I. D. and Mould, G. I. (2000), Directional statistics of the wind and waves, Applied Ocean Research, 22, 401 412. Briggs, M. S. (1993), Dipole and quadrupole tests of the isotropy of gamma-ray burst locations, Astrophys. J., 407, 126 134. Chang, T. (2004), Spatial statistics, Statist. Sci., 19, 624 635. Clark, R. M. (1983), Estimation of parameters in the marginal Fisher distribution, Austr. J. Statist., 25, 227 237. Fisher, N. I., L. T. and Embleton, B. J. J. (1987), Statistical Analysis of Spherical Data, Cambridge University Press, UK. Fisher, N. I. (1953), Dispersion on a sphere, Proceedings of the Royal Society of London A, 217, 295 305. (1987), Problems with the current definitions of the standard deviation of wind direction, J. Clim. Appl. Meteorol., 26, 1522 1529. (1989), Smoothing a sample of circular data, J. Struct. Geol., 11, 775 778. 17

Hallin, M., Paindaveine, D., and Verdebout, T. (2010), Optimal rank-based testing for principal components, Ann. Statist., 38, 3245 3299. Hallin, M. and Werker, B. J. (2003), Semi-parametric efficiency, distribution-freeness and invariance, Bernoulli, 9, 137 165. Jupp, P. and Mardia, K. (1989), A unified view of the theory of directional statistics, Internat. Statist. Rev., 57, 261 294. Leong, P. and Carlile, S. (1998), Methods for spherical data analysis and visualization, J. Neurosci. Meth., 80, 191 200. Ley, C., Swan, Y., Thiam, B., and Verdebout, T. (2013), Optimal R-estimation of a spherical location, Statist. Sinica, 23, 305 333. Mardia, K. (1975), Statistics of directional data (with discussion), J. Roy. Statist. Soc. Ser. B, 37, 349 393. Mardia, K. V. and Edwards, R. (1982), Weighted distributions and rotating caps, Biometrika, 69, 323 330. Mardia, K. V. and Jupp, P. (2000), Directional Statistics, Wiley, New York. Merrifield, A. (2006), An investigation of mathematical models for animal group movement, using classical and statistical approaches, Ph.D. thesis, The University of Sydney, Sydney, Australia. Saw, J. G. (1978), A family of distributions on the m-sphere and some hypothesis tests, Biometrika, 65, 69 73. Storetvedt, K. M. and Scheidegger, A. E. (1992), Orthogonal joint structures in the Bergen area, southwest Norway, and their regional significance, Phys. Earth Planet. In., 73, 255 263. 18

Tsai, M. and Sen, P. (2007), Locally best rotation-invariant rank tests for modal location, J. Multivariate Anal., 98, 1160 1179. Watson, G. (1983), Statistics on Spheres, Wiley, New York. 19