Nonparametric inference with directional and linear data


PhD Dissertation

Nonparametric inference with directional and linear data

Eduardo García Portugués

Departamento de Estatística e Investigación Operativa
Universidade de Santiago de Compostela

September


Nonparametric inference with directional and linear data

Eduardo García Portugués

Supervised by Wenceslao González Manteiga and Rosa M. Crujeiras

This work has been supported by FPU grant AP 957 from Ministerio de Educación, a graduate grant for research stays from Fundación Barrié, Project MTM8 from Ministerio de Ciencia e Innovación, Project MDS75PR from Dirección Xeral de I+D da Xunta de Galicia and the IAP network StUDyS from Belgian Science Policy. The author gratefully acknowledges the computational resources of the SVG cluster of the CESGA Supercomputing Center, which allowed most of the simulations to be run.

Supervisors' report (Informe de los directores)

Wenceslao González Manteiga, Professor at the Departamento de Estatística e Investigación Operativa of the Universidade de Santiago de Compostela, and Rosa M. Crujeiras Casais, Profesora Contratada Doctora in the same department, as supervisors of the thesis entitled "Nonparametric inference with directional and linear data", hereby declare:

That the doctoral thesis presented by Eduardo García Portugués is suitable to be submitted, in accordance with the article of the Doctoral Studies Regulations, under the modality of compendium of articles, in which the doctoral candidate carried the weight of the research and whose contribution was decisive for carrying out this work.

And that the coauthors participating in the articles, both doctors and non-doctors, are aware that none of the works gathered in this thesis will be presented by any of them in another doctoral thesis, which we sign under our responsibility.

Santiago de Compostela, September.

The supervisors: Prof. Dr. Wenceslao González Manteiga and Prof. Dr. Rosa M. Crujeiras Casais. The doctoral candidate: Eduardo García Portugués


Acknowledgements (Agradecimientos)

I would like to begin these acknowledgements by expressing my deep gratitude for the support received from my supervisors, Wenceslao González Manteiga and Rosa M. Crujeiras, throughout the whole process of writing this doctoral thesis. I thank Wences for having awakened my interest in statistics during the last years of my degree, for the confidence he placed in me from my first steps as a researcher and for the continuous motivation he has transmitted to me. I am very grateful to Rosa for her daily support, both professional and personal, over these years. I especially value her countless and thorough critical revisions, which have notably improved the work developed in this thesis. I truly feel enormously fortunate to have had her as co-supervisor.

I would also like to thank Ingrid Van Keilegom for her hospitality and her guidance in research during my stay at Louvain-la-Neuve.

Thanks to my mother and my father: I owe you everything in this life. Thanks also to my family for having supported me in this and other stages and, especially, to my grandmother Isabel Del Río Galza and my grandfather Rafael Portugués Carrera. May this thesis serve as my small tribute to the great efforts that both of them made for the well-being of the family.

Finally, I thank my colleagues at the Departamento de Estadística e Investigación Operativa and at the Facultad de Matemáticas for the good times shared during these years.

Santiago de Compostela, September.

Eduardo García Portugués


Contents

- Supervisors' report (Informe de los directores)
- Acknowledgements (Agradecimientos)
- Contents
- List of Figures
- List of Tables
- 1 Introduction: What is directional data?; Contributions of the thesis; Real datasets (Wind direction; Hipparcos dataset; Portuguese wildfires; Protein angles; Text mining); Manuscript distribution; References
- 2 Density estimation by circular-linear copulas: Introduction; Background (Some circular and circular-linear distributions; Some notes on copulas); Estimation algorithm; Some simulation results; Application to wind direction and SO2 concentration (Exploring SO2 and wind direction in the two periods); Final comments; References
- 3 Kernel density estimation for directional-linear data: Introduction; Background on linear and directional kernel density estimation (Kernel density estimation for directional data; Kernel density estimation for directional-linear data); Main results; Error measurement and optimal bandwidth (MISE for directional and directional-linear kernel density estimators; Some exact MISE calculations for mixture distributions); Conclusions; Appendices (Some technical lemmas; Proofs of the main results; Proofs of the technical lemmas); References
- 4 Bandwidth selectors for kernel density estimation with directional data: Introduction; Kernel density estimation with directional data; Available bandwidth selectors; A new rule of thumb selector; Selectors based on mixtures (Mixtures fitting and selection of the number of components); Comparative study (Directional models; Circular case; Spherical case; The effect of dimension); Data application (Wind direction; Position of stars); Conclusions; Appendices (Proofs; Models for the simulation study; Extended tables for the simulation study); References
- 5 A nonparametric test for directional-linear independence: Introduction; Background to kernel density estimation (Linear, directional and directional-linear kernel density estimation); A test for directional-linear independence (The test statistic; Calibration of the test); Simulation study; Real data analysis (Data description; Results); Discussion; Appendix (Proof of the main lemma); References
- 6 Central limit theorems for directional and linear random variables with applications: Introduction; Background; Central limit theorem for the integrated squared error (Main result; Extensions); Testing independence with directional random variables; Goodness-of-fit test with directional random variables (Testing a simple null hypothesis; Composite null hypothesis; Calibration in practice; Extensions to directional-directional models); Simulation study; Real data application; Appendix (Sketches of the main proofs: CLT for the integrated squared error; Testing independence with directional data; Goodness-of-fit test for models with directional data); References
- 7 Testing parametric models in linear-directional regression: Introduction; Nonparametric linear-directional regression (The projected local estimator; Properties); Goodness-of-fit test for linear-directional regression (Bootstrap calibration); Simulation study; Application to text mining; Appendix (Main results: Projected local estimator properties; Asymptotic results for the goodness-of-fit test); References
- 8 Future research: Bandwidth selection in nonparametric linear-directional regression; Kernel density estimation under rotational symmetry; R package DirStats; A goodness-of-fit test for the Johnson and Wehrly copula structure; References
- A Supplement to Chapter 6: Technical lemmas (CLT for the ISE; Testing independence with directional data; Goodness-of-fit test for models with directional data; General purpose lemmas); Further results for the independence test (Closed expressions; Extension to the directional-directional case; Some numerical experiments); Extended simulation study (Parametric models; Estimation; Simulation; Alternative models; Bandwidth choice; Further results); Extended data application; References
- B Supplement to Chapter 7: Particular cases of the projected local estimator (local constant and local linear fits); Technical lemmas (Projected local estimator properties; Asymptotic results for the goodness-of-fit test; General purpose lemmas); Further simulation results
- Resumen en castellano (summary in Spanish); Bibliografía del resumen

List of Figures

- Polar coordinates in Ω_1 and spherical coordinates in Ω_2 (just an octant of the sphere is represented for a better display).
- Von Mises densities on Ω_1 and Ω_2. From right to left, densities corresponding to q = 1 with concentrations κ_1 and κ_2 and to q = 2 with the same concentrations. The mean is µ = (0_q, 1), with 0_q denoting a vector of q zeros. Samples are drawn from each density.
- From right to left, densities corresponding to: a Mardia–Sutton circular-linear density; density (1.1) with a von Mises link and normal and circular uniform marginals; density (1.1) with a von Mises link and Jones–Pewsey and circular uniform marginals; a Wrapped Normal Torus density. Samples are drawn from each density.
- Diagram with the contributions of the thesis marked in bold and a short state of the art with the main references in the scope of the thesis. Future research lines described in Chapter 8 are included in italics.
- Rose diagrams for wind direction in station B for January of two different years (left and centre) and in station A Mourela in June (right). The first two show the average SO2 concentration (in µg/m³) associated with winds coming from each partition of the windrose.
- Number of observed stars per square degree, in galactic coordinates, extracted from Perryman (1997). The higher concentrations of stars are located around the equator (galactic plane) and two spots that represent the Orion's arm (left) and the Gould's Belt (right).
- Number of hectares burnt in each watershed delineated by Barros et al. (left plot) and main orientation of a wildfire obtained from the first principal component of the perimeter (right plot); extracted from Barros et al.
- Representation of the dihedral angles in the primary protein structure (left, adapted from Richardson) and Ramachandran plot of the dihedral angles of the alanine-alanine-alanine segments contained in the ProteinsAAA object of Fernández-Durán and Gregorio-Domínguez.
- View of the Slashdot website on July 7.
- Locations of the monitoring station (circle) and the power plant in Galicia (NW Spain).
- Rose diagrams for wind direction in station B in two different years (left and right plots), with average SO2 concentration.
- Copula density surfaces. Left plot: Johnson–Wehrly copula with a von Mises joining density. Middle plot: QS-copula. Right plot: reflected Frank copula.
- Illustration of data reflection for copula density estimation. The central square in each plot corresponds to the original data ranks. Left plot: mirror reflection from Gijbels and Mielniczuk (1990). Right plot: circular-mirror reflection for the proposed estimator.
- Density surfaces for the examples of the simulation study.
- Boxplots of the ISE for the examples of the simulation study and the different estimation procedures.
- Circular-linear density estimator for wind direction and SO2 concentration in monitoring station B, for the two years considered (left and right columns).
- Left: contour plot of a von Mises density. Right: contour plot of a mixture of von Mises densities.
- From left to right: a circular-linear mixture and the corresponding circular and linear marginal densities. Random samples are drawn.
- From left to right: exact MISE and AMISE for a linear mixture of three normals and for circular and spherical mixtures, for a range of bandwidths. Black curves correspond to the MISE and red ones to the AMISE; solid curves correspond to the smaller sample size and dotted ones to the larger. Vertical lines represent the bandwidth values minimizing each curve.
- Exact MISE versus AMISE for a circular-linear mixture (upper row) and a spherical-linear mixture (lower row), for two sample sizes. Solid surfaces correspond to the MISE and dashed ones to the AMISE. The pairs of bandwidths minimizing each error surface are denoted by (h, g)_MISE and (h, g)_AMISE.
- The effect of the extra term in h_ROT. Left plot: logarithm of the curves of MISE(h_TAY), MISE(h_ROT) and MISE(h_MISE), with h_MISE obtained exactly and the curves computed by Monte Carlo. The abscissa represents a parameter θ indexing the mixture of two von Mises used as reference density. Right plot: logarithm of h_TAY, h_ROT, h_MISE and their corresponding MISE for different values of κ.
- Simulation scenarios for the circular case. For each model, a sample is drawn.
- Simulation scenarios for the spherical case. For each model, a sample is drawn.
- Left: density of the wind direction in the meteorological station of A Mourela. Right: density of the stars collected in the Hipparcos catalogue, represented in galactic coordinates and Aitoff projection.
- Descriptive maps of wildfires in Portugal with the watersheds delineated by Barros et al. The left map shows the number of hectares burnt from fire perimeters associated with each watershed (each fire perimeter is associated with the watershed that contains its centroid). The centre map represents the mean slope of the fires of each watershed, where the slope is measured in degrees (0 stands for a flat slope and 90 for a vertical one). The right map shows the watersheds where fires display preferential alignment according to Barros et al.
- Random samples for the simulation models M1 to M6 in the circular-linear case, in the situation with dependence. Three of the models present a deviation from independence in terms of the conditional expectation, two account for a deviation in terms of the conditional variance and M6 includes deviations in both conditional expectation and variance.
- p-values from the independence test for the first principal component (PC1) of the fire perimeter and the burnt area (on a log scale), by watersheds. From left to right, the first and second maps represent the circular-linear p-values (PC1 in R²) and their corrected versions using the FDR; the third and fourth maps represent the spherical-linear situation (PC1 in R³), with uncorrected and FDR-corrected p-values.
- Left: density contour plot for the fires in one of the watersheds, showing that the size of the burnt area is related to the orientation of the fires. Right: scatter plot of the fires' slope and the burnt area for the whole dataset, with a nonparametric kernel regression curve showing the negative correlation between fire slope and size.
- Density models for the simulation study in the circular-linear case (CL models).
- Density models for the simulation study in the circular-circular case (CC models).
- Empirical size and power of the circular-linear (left) and circular-circular (right) goodness-of-fit tests for a logarithmically spaced bandwidth grid. The lower surface represents the empirical rejection rate under the null and the upper surface under the alternative. Green colour indicates that the percentage of rejections is inside the 95% confidence interval for α = 0.05, blue that it is smaller and orange that it is larger. Black points represent the empirical size and power obtained with the median of the LCV bandwidths.
- Left: parametric fit of the model from Mardia and Sutton (1978) to the circular mean orientation and mean log-burnt area of the fires in each of the watersheds of Portugal. Right: parametric fit of the model from Fernández-Durán (2007) for the dihedral angles of the alanine-alanine-alanine segments.
- QQ-plot comparing the quantiles of the asymptotic distribution of the test statistic with the sample quantiles, for two sample sizes (left and right).
- Parametric regression models for the simulation scenarios, for circular and spherical cases.
- Empirical sizes (first row) and powers (second row) for significance level α = 0.05 for the different scenarios, with two choices of p (solid and dashed lines). From left to right, columns represent the different dimensions q.
- Significance trace of the local constant goodness-of-fit test for the constrained linear model.
- Directional regression models proposed for the simulation study, with the first three rows for the circular versions and the last three for the spherical ones.
- Directional densities considered in the simulation study: circular versions in the first two columns and the corresponding spherical versions in the third and fourth. For each density, a sample is drawn.
- From right to left: a von Mises density, the rotational kernel density estimators (with known and with estimated rotation axis) and the usual kernel density estimator.
- From right to left: contour plots of the copula C_G, the copula C_{G_n} and the empirical copula C_n, with a circular von Mises density as the link g.
- Comparison of the asymptotic and empirical distributions of the standardized test statistic for increasing sample sizes. Black curves represent a kernel estimate obtained from simulations, green curves a normal fit to the unknown density and red curves the theoretical asymptotic distribution.
- Empirical size and power of the goodness-of-fit tests for a grid of bandwidths, for a selection of circular-linear (CL) and circular-circular (CC) models. The lower surface represents the empirical rejection rate under the null and the upper surface under the alternative, with the same colour code as before. Black points represent the sizes and powers obtained with the median of the LCV bandwidths.
- Upper row: parametric fits of the models from Mardia and Sutton (1978) and Fernández-Durán (2007) for the wildfires and protein angles datasets. Lower row: p-values of the goodness-of-fit tests for a grid of bandwidths, with the LCV bandwidth for the data.
- From left to right: directional densities for the regression scenarios, for circular and spherical cases.
- From left to right: deviations from the null and conditional standard deviation function σ, for circular and spherical cases.
- Densities of the response Y under the null (solid line) and under the alternative (dashed line) for the different scenarios (columns) and dimensions (rows).
- Empirical sizes for three significance levels (rows) for the different scenarios, with two choices of p (solid and dashed lines) and columns representing dimensions, for the first sample size considered.
- Empirical sizes for three significance levels (rows) for the different scenarios, with two choices of p (solid and dashed lines) and columns representing dimensions, for the second sample size considered.
- Empirical sizes for three significance levels (rows) for the different scenarios, with two choices of p (solid and dashed lines) and columns representing dimensions, for the third sample size considered.
- Empirical powers for the different scenarios, with two choices of p (solid and dashed lines). From top to bottom, rows represent sample sizes and, from left to right, columns represent dimensions.


List of Tables

- MISE for estimating the circular-linear density in the first two examples. Relative efficiencies for JWSP, JWNP, CSP and CNP are taken with respect to JWP.
- MISE for estimating the circular-linear density in the remaining examples.
- Comparative study for the circular case. Columns of each selector represent the MISE, with bold type for the minimum of the errors. The standard deviation of the ISE is given between parentheses.
- Ranking of the selectors for the circular and spherical cases, for the different sample sizes. The higher the score in the ranking, the better the performance of the selector. Bold type indicates the best selector.
- Comparative study for the spherical case. Columns of each selector represent the MISE, with bold type for the minimum of the errors. The standard deviation of the ISE is given between parentheses.
- Ranking of the selectors for higher dimensions. The larger the score in the ranking, the better the performance of the selector. Bold type indicates the best selector.
- Directional densities considered in the simulation study.
- Comparative study for the circular case, with top-to-bottom blocks corresponding to the different sample sizes. Columns of each selector represent the MISE, with bold type for the minimum of the errors. The standard deviation of the ISE is given between parentheses.
- Comparative study for the spherical case, with top-to-bottom blocks corresponding to the different sample sizes. Columns of each selector represent the MISE, with bold type for the minimum of the errors. The standard deviation of the ISE is given between parentheses.
- Comparative study for higher dimensions, with top-to-bottom blocks corresponding to the different dimensions. Columns of each selector represent the MISE, with bold type for the minimum of the errors. The standard deviation of the ISE is given between parentheses.
- Proportion of rejections for the R_n, U_n, λ_n, T_n^LCV and T_n^BLCV tests of independence for the smaller sample sizes, for a nominal significance level of 5%. For the six models, the deviation-from-independence parameter takes the value δ = 0 (independence) and two positive values. Each proportion was calculated using B permutations for each of M random samples of size n simulated from the specified model.
- Proportion of rejections for the R_n, U_n, λ_n, T_n^LCV and T_n^BLCV tests of independence for the largest sample size, for a nominal significance level of 5%, with the same deviation values and calculation scheme as in the previous table.
- Computing times in seconds for T_n^LCV and T_n^BLCV as a function of the sample size and the dimension q, with q = 1 for the circular-linear case and q = 2 for the spherical-linear case. The tests were run with B permutations and the times were measured on a single core.
- Empirical size and power of the circular-linear and circular-circular goodness-of-fit tests for the CL and CC models, respectively, with significance level α = 0.05 and different sample sizes and deviations.
- Specification of the simulation scenarios for the regression model.
- Fitted constrained linear model on the Slashdot dataset, with its R² and the significance of each coefficient.
- Simulation scenarios for comparing the bandwidth selectors. f_SN denotes the Skew Normal distribution of Azzalini (1985); the rest of the notation is described in the corresponding subsection.
- Notation for the densities described in the circular-linear and circular-circular model tables.
- Circular-linear models.
- Circular-circular models.
- Empirical size and power of the circular-linear goodness-of-fit test for the CL models with different sample sizes, deviations and significance levels.
- Empirical size and power of the circular-circular goodness-of-fit test for the CC models with different sample sizes, deviations and significance levels.

Chapter 1

Introduction

An introduction to basic concepts and statistical tools for analysing directional data is provided in this chapter. Section 1.1 presents a short introduction to the analysis of directional data and to the best known directional distributions. The contributions of this thesis are summarized in Section 1.2, together with a brief state of the art on nonparametric inference with directional and linear data. Section 1.3 describes the real datasets used along the manuscript. Finally, Section 1.4 presents the organization of the thesis, with short abstracts describing the contents of each chapter.

Contents
- 1.1 What is directional data?
- 1.2 Contributions of the thesis
- 1.3 Real datasets: Wind direction; Hipparcos dataset; Portuguese wildfires; Protein angles; Text mining
- 1.4 Manuscript distribution
- References

1.1 What is directional data?

The term directional data was coined in the first book of Mardia (1972) to refer to data whose support is a circumference, a sphere or, in general, a hypersphere of arbitrary dimension. This kind of data appears naturally in several applied disciplines: proteomics (angles in the structure of proteins; see for example Hamelryck et al.), environmental sciences (wind direction, Johnson and Wehrly, 1978; direction of waves, Jona-Lasinio et al.), biology (animal orientation; see Batschelet, 1981, for several examples), cyclic phenomena (arrival times at a care unit, Fisher, 1993; seasonality in freezing and thawing, Oliveira et al.), astronomy (position of stars; see Perryman, 1997), image analysis (Dryden, 2005) or even text mining (analysis of word frequency in texts; see for example Banerjee et al., 2005). The collection of statistical techniques intended to analyse directional

data was named directional statistics after the homonymous book by Mardia and Jupp (2000), the revised reprint of Mardia (1972).

Directional observations are represented as points on the Euclidean hypersphere of dimension q, denoted by $\Omega_q = \{\mathbf{x} \in \mathbb{R}^{q+1} : \|\mathbf{x}\| = 1\}$ (also referred to as $\mathbb{S}^q$), where the simplest cases correspond to the circumference (q = 1) and the sphere (q = 2). Inference with directional data is indeed constrained inference, since all the methods used for statistical analysis should take into account the special nature of Ω_q, something that is not required with the usual linear (i.e. Euclidean) data. A pedagogical example that illustrates this problem is the definition of an appropriate directional mean in the simplest situation: two observations X_1 and X_2 on the circumference Ω_1. A first attempt could be to consider the Euclidean mean $\bar{\mathbf{X}} = \frac{\mathbf{X}_1 + \mathbf{X}_2}{2}$, but then $\bar{\mathbf{X}}$ is not guaranteed to belong to Ω_1. Another possibility is to consider polar coordinates (see Figure 1.1), compute the usual mean of the corresponding angles θ_1, θ_2 ∈ [0, 2π) and then set the directional mean as the point $(\cos\bar\theta, \sin\bar\theta)$, where $\bar\theta = \frac{\theta_1 + \theta_2}{2}$. The problem with this approach is that if, for example, θ_1 = π/4 and θ_2 = 7π/4, then $\bar\theta = \pi$, which produces an output in the opposite direction of the obvious mean, corresponding to θ = 0. A reasonable definition of the directional mean is
$$\bar{\mathbf{X}} = \frac{\mathbf{X}_1 + \mathbf{X}_2}{\|\mathbf{X}_1 + \mathbf{X}_2\|} \quad \text{if } \mathbf{X}_1 + \mathbf{X}_2 \neq \mathbf{0}, \text{ not defined otherwise};$$
see Mardia and Jupp (2000) for further details.

Figure 1.1: Polar coordinates in Ω_1 (x = (cos θ, sin θ)) and spherical coordinates in Ω_2 (just an octant of the sphere is represented for a better display).

Two main approaches have been followed in the statistical literature for the analysis of directional data, differing in the kind of representation. The first one is based on polar and spherical coordinates (see Figure 1.1) to develop methods which are specifically designed to treat circular and spherical data, respectively, the most common types of directional data in practice. This is the approach followed by the books of Fisher (1993), Jammalamadaka and SenGupta (2001) and Pewsey et al. (2013) for circular data and of Fisher et al. (1987) for spherical data. Unfortunately, the extensions of these methods to an arbitrary dimension q are not straightforward, due to the nature of the spherical coordinates in higher dimensions. The second approach relies only on the Cartesian coordinates of the points in Ω_q, without assuming any particular dimension and

thus ensuring more generality. This is the approach followed in this thesis, except in Chapter 2, where the first one is employed.

Perhaps the most popular directional distribution is the von Mises–Fisher density (see Watson, 1983, and Mardia and Jupp, 2000), or shortly the von Mises. The von Mises density, denoted by vM(µ, κ), or by vM(µ, κ) if q = 1 and µ = (cos µ, sin µ), is given by
$$f_{\mathrm{vM}}(\mathbf{x}; \boldsymbol\mu, \kappa) = C_q(\kappa) \exp\left\{\kappa \mathbf{x}^T \boldsymbol\mu\right\}, \qquad C_q(\kappa) = \frac{\kappa^{(q-1)/2}}{(2\pi)^{(q+1)/2}\, I_{(q-1)/2}(\kappa)},$$
being µ ∈ Ω_q the directional mean, κ ≥ 0 the concentration parameter around the mean (κ = 0 gives the uniform density on Ω_q) and I_ν the modified Bessel function of the first kind and order ν, which can be written as (see Olver et al., 2010)
$$I_\nu(z) = \frac{(z/2)^\nu}{\pi^{1/2}\,\Gamma(\nu + 1/2)} \int_{-1}^{1} (1 - t^2)^{\nu - 1/2} e^{zt}\, dt.$$
This distribution is considered the Gaussian analogue for directional data for two main reasons. First, it presents the same Maximum Likelihood Estimator (MLE) characterization property that the Gaussian distribution has in the Euclidean case: it is the only directional distribution whose MLE of the location parameter is the directional sample mean (see Bingham and Mardia, 1975, for a proof). Second, the von Mises density can be obtained from a normally distributed random vector conditioned to have unit norm. That is, for a normal random vector $\mathbf{X} \sim \mathcal{N}_{q+1}(\boldsymbol\mu, \sigma^2 \mathbf{I}_{q+1})$, with µ ∈ R^{q+1}\{0} and σ > 0, it happens that
$$\mathbf{Y} = \mathbf{X} \mid \|\mathbf{X}\| = 1 \sim \mathrm{vM}\left(\boldsymbol\mu/\|\boldsymbol\mu\|,\ \|\boldsymbol\mu\|/\sigma^2\right).$$
This result shows that the inverse of the concentration parameter κ of a von Mises can be identified with the variance of a multivariate normal with covariance matrix proportional to the identity. See Gatto for a proof of this result in a more general situation.

Another remarkable directional distribution is the one given by Jones and Pewsey (2005), denoted by JP(µ, κ, ψ). Originally motivated by the circular case, its density can also be defined in Ω_q for an arbitrary dimension q:
$$f_{\mathrm{JP}}(\mathbf{x}; \boldsymbol\mu, \kappa, \psi) \propto \left\{\cosh(\kappa\psi) + \sinh(\kappa\psi)\, \mathbf{x}^T \boldsymbol\mu\right\}^{1/\psi},$$
where the normalizing constant involves the Legendre function of the first kind (see Olver et al., 2010), µ ∈ Ω_q is the location parameter, κ ≥ 0 is the concentration around µ and ψ ∈ R is a shape parameter that controls a kind of kurtosis with respect to a vM(µ, κ): ψ < 0 stands for more peaked densities, while ψ > 0 produces flatter ones. This parametric family has the interesting property of containing as particular cases the vM(µ, κ) (corresponding to ψ → 0) and, with q = 1 and taking polar coordinates, the Cardioid (ψ = 1), the Wrapped

Cauchy (ψ = −1) and Cartwright's power-of-cosine distributions. See Jammalamadaka and SenGupta (2001) for more information on these distributions.

Figure 1.2: Von Mises densities on Ω_1 and Ω_2. From right to left, densities corresponding to q = 1 with concentrations κ_1 and κ_2 and to q = 2 with the same concentrations. The mean is µ = (0_q, 1), with 0_q denoting a vector of q zeros. Samples are drawn from each density.

In many situations, directional random variables appear together with a linear or with another directional variable, the most common situations being circular-linear data (with support on the cylinder Ω_1 × R) and circular-circular data (with support on the torus Ω_1 × Ω_1). See Figure 1.3 for some examples of densities on these supports. For both cases, a semiparametric model for the joint density was proposed in Johnson and Wehrly (1978) and Wehrly and Johnson (1979):
$$f_{\Theta,X}(\theta, x) = 2\pi\, g\big(2\pi\left[F_\Theta(\theta) \pm F_X(x)\right]\big)\, f_\Theta(\theta)\, f_X(x), \qquad (1.1)$$
being Θ a circular variable with density f_Θ and distribution function F_Θ, and X either a linear or a circular variable with associated f_X and F_X. g is a circular density that acts as the link function between the two marginal densities f_Θ and f_X, with the choice of sign in ± determining the kind of dependency. g can be interpreted in terms of copulas (see Nelsen, 2006, for an introduction to copulas), since the copula density of (Θ, X) is given by $c_{\Theta,X}(u, v) = 2\pi\, g(2\pi[u \pm v])$. Different parametric models can be obtained from this semiparametric structure, for example the bivariate von Mises model of Shieh and Johnson (2005). However, not all parametric densities in this context satisfy (1.1). For example, the circular-linear density of Mardia and Sutton (1978), denoted by MS(µ, κ, m, σ², ρ_1, ρ_2), and the circular-circular Wrapped Normal Torus given in Johnson and Wehrly (1977), denoted by WNT(m_1, m_2, σ_1², σ_2², ρ), are parametric densities that do not verify (1.1). The expressions of these densities are, respectively:
$$f_{\mathrm{MS}}(\theta, x; \mu, \kappa, m, \sigma^2, \rho_1, \rho_2) = f_{\mathrm{vM}}(\theta; \mu, \kappa)\, f_{\mathcal{N}}\big(x; m(\theta), \sigma^2(1 - \rho_1^2 - \rho_2^2)\big),$$
$$f_{\mathrm{WNT}}(\theta, \psi; m_1, m_2, \sigma_1^2, \sigma_2^2, \rho) = \sum_{p_1=-\infty}^{\infty} \sum_{p_2=-\infty}^{\infty} f_{\mathcal{N}}(\theta + 2\pi p_1, \psi + 2\pi p_2; m_1, m_2, \sigma_1^2, \sigma_2^2, \rho),$$
where $m(\theta) = m + \sigma\kappa^{1/2}\{\rho_1(\cos\theta - \cos\mu) + \rho_2(\sin\theta - \sin\mu)\}$, f_N(·; m, σ²) stands for the density of a N(m, σ²) and f_N(·, ·; m_1, m_2, σ_1², σ_2², ρ) represents the density of a bivariate normal with mean vector (m_1, m_2)^T, marginal variances σ_1² and σ_2² and correlation coefficient ρ. Among others, these two distributions will be employed along the different chapters: for example, they appear in Chapter 6, and relation (1.1) is crucial for Chapter 2.
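To make the previous definitions concrete, the following is a minimal sketch in base R (the function names dvmf, dvm_circ and djw, as well as the parameter values, are illustrative and not taken from any package used in the thesis): it evaluates the von Mises–Fisher density on Ω_q and assembles a circular-linear density with the Johnson and Wehrly structure (1.1), using a von Mises link and circular uniform and normal marginals.

```r
# Von Mises-Fisher density on the q-sphere: f(x) = C_q(kappa) * exp(kappa * x'mu)
dvmf <- function(x, mu, kappa) {
  q <- length(mu) - 1
  # Normalizing constant C_q(kappa) = kappa^((q-1)/2) / ((2*pi)^((q+1)/2) * I_{(q-1)/2}(kappa))
  Cq <- kappa^((q - 1) / 2) / ((2 * pi)^((q + 1) / 2) * besselI(kappa, nu = (q - 1) / 2))
  Cq * exp(kappa * sum(x * mu))
}

# Circular von Mises density on [0, 2*pi), used below as the link g
dvm_circ <- function(theta, mu, kappa) {
  exp(kappa * cos(theta - mu)) / (2 * pi * besselI(kappa, nu = 0))
}

# Johnson-Wehrly circular-linear density, relation (1.1):
# f(theta, x) = 2*pi * g(2*pi*[F_Theta(theta) +/- F_X(x)]) * f_Theta(theta) * f_X(x)
djw <- function(theta, x, g, f_theta, F_theta, f_x, F_x, sign = 1) {
  2 * pi * g((2 * pi * (F_theta(theta) + sign * F_x(x))) %% (2 * pi)) *
    f_theta(theta) * f_x(x)
}

# Illustrative usage: circular uniform marginal for Theta, N(0, 1) marginal for X,
# and a vM(pi, 2) link (parameter values chosen arbitrarily)
f_theta <- function(theta) rep(1 / (2 * pi), length(theta))
F_theta <- function(theta) theta / (2 * pi)
g <- function(u) dvm_circ(u, mu = pi, kappa = 2)
djw(theta = pi / 3, x = 0.5, g = g, f_theta = f_theta, F_theta = F_theta,
    f_x = dnorm, F_x = pnorm)
```

Since g is 2π-periodic, its argument is reduced modulo 2π; either sign in (1.1) yields a valid joint density with the prescribed marginals.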

Figure 1.3: From right to left, densities corresponding to: a Mardia–Sutton circular-linear density; density (1.1) with a von Mises link and normal and circular uniform marginals; density (1.1) with a von Mises link and Jones–Pewsey and circular uniform marginals; a Wrapped Normal Torus density. Samples are drawn from each density.

Finally, it should be noted that directional data also arise in relation to, or as particular cases of, more general spaces, as happens for example in statistical shape analysis (see Dryden and Mardia, 1998, and Kendall et al., 1999, for reviews on the topic) or when considering statistics on Riemannian manifolds (see Bhattacharya and Bhattacharya and references therein).

1.2 Contributions of the thesis

Parametric methods have played a predominant role in the development of statistical inference for the analysis of directional data (see Mardia, 1972, and Watson, 1983). Later publications, such as Fisher (1993), Fisher et al. (1987), Mardia and Jupp (2000), Jammalamadaka and SenGupta (2001) and Pewsey et al. (2013), also devoted much of their attention to the use of parametric techniques. These methods rely on the assumption that a certain parametric hypothesis on the stochastic generating process holds. For example, inference on the unknown density of a directional random variable is usually done by assuming a certain density model, up to the determination of some unknown parameters which are estimated from the data. Whereas this procedure leads to optimal results in terms of efficiency if the parametric assumption holds, the estimation can be totally misleading if the assumption fails. On the other hand, nonparametric methods do not rely on strong parametric assumptions on the stochastic generating process, except for some mild smoothness conditions. Their main advantage is that they always provide reasonable solutions for inference, no matter whether a parametric assumption holds or not. Obviously, a nonparametric method is not optimal compared with a parametric competitor designed ad hoc for a parametric scenario, but it is still very useful. For example, the comparison of a parametric and a nonparametric fit leads to a so-called goodness-of-fit test, which formally checks whether the parametric hypothesis is plausible given the sample information.

The aim of this thesis is to provide new methodological tools for nonparametric inference with directional and linear data. Specifically, nonparametric methods are obtained for both estimation and testing, for the density and the regression curves, in situations where directional random variables are present, that is, for directional, directional-linear and directional-directional random variables. In what follows, short states of the art on these topics are given jointly with the contributions of the thesis, referring to the papers providing Chapters 2 to 7, the main core of the manuscript. See also Figure 1.4 for a diagram with the main references and contributions.

Figure 1.4: Diagram with the contributions of the thesis marked in bold and a short state of the art with the main references in the scope of the thesis. Future research lines described in Chapter 8 are included in italics. The diagram is organized as follows:
- Density function, directional: estimation (Hall et al., 1987; Bai et al., 1988; Taylor, 2008; Oliveira et al.; García-Portugués; future work in Chapter 8), testing (Zhao and Wu; Boente et al.; future work in Chapter 8) and software (Agostinelli and Lund; Oliveira et al.; future work in Chapter 8).
- Density function, directional-linear or directional-directional: estimation (Carnicero et al.; García-Portugués et al. (a); García-Portugués et al. (b); future work in Chapter 8) and testing (García-Portugués et al. (a); García-Portugués et al. (b); future work in Chapter 8).
- Regression function, directional predictor and linear response: estimation (Wang et al.; Di Marzio et al., 2009; Di Marzio et al.; García-Portugués et al.; future work in Chapter 8) and testing (Deschepper et al.; García-Portugués et al.).
- Regression function, linear predictor and directional response: estimation (Boente and Fraiman; Di Marzio et al.) and testing.
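As a complement to the estimation entries of the diagram, the following is a minimal sketch (not code from the thesis nor from the cited packages) of the kernel density estimator for directional data with a von Mises kernel: the estimate at a point x is the average of von Mises–Fisher densities centred at the observations with concentration 1/h². The bandwidth h = 0.4 and the simulated circular sample are purely illustrative.

```r
# Directional kernel density estimator with a von Mises-Fisher kernel:
# fhat_h(x) = (1/n) * sum_i vMF density at x with mean X_i and concentration 1/h^2
kde_dir <- function(x, data, h) {
  q <- ncol(data) - 1
  kappa <- 1 / h^2
  Cq <- kappa^((q - 1) / 2) / ((2 * pi)^((q + 1) / 2) * besselI(kappa, nu = (q - 1) / 2))
  mean(Cq * exp(kappa * drop(data %*% x)))
}

# Illustrative circular sample (q = 1) concentrated around the direction (1, 0)
set.seed(42)
theta <- rnorm(100, mean = 0, sd = 0.5)
data <- cbind(cos(theta), sin(theta))
kde_dir(x = c(1, 0), data = data, h = 0.4)   # high density near the mode
kde_dir(x = c(-1, 0), data = data, h = 0.4)  # low density at the antipode
```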

A. Density function. Consider X and Y two directional variables and Z a linear one. Let f_X, f_{X,Z} and f_{X,Y} be their directional, directional-linear and directional-directional density functions, respectively.

A.1. Estimation. The objective is the estimation of f_X, f_{X,Z} and f_{X,Y} by kernel smoothing methods. The copula density is also estimated for the circular-linear and circular-circular cases.

State of the art. Kernel density estimation for f_X was first considered by Hall et al. (1987) and Bai et al. (1988), who established its basic asymptotic properties, and was later studied by Klemelä. Taylor (2008) and Oliveira et al. proposed bandwidth selection rules for the circular case, the latter based on the asymptotic error expression stated in Di Marzio et al. (2009). For the estimation of the densities f_{X,Z} and f_{X,Y} with circular variables, Fernández-Durán (2007) introduced a parametric method based on the copula structure given by Johnson and Wehrly (1978). A fully nonparametric estimation of the copula density employing Bernstein polynomials was given in Carnicero et al.

Contributions. A new nonparametric method for estimating circular-linear and circular-circular densities from the estimation of the copula structure of Johnson and Wehrly (1978) is presented in García-Portugués et al. (a). The method considers, among other approaches, a modification of the kernel density estimator of Gijbels and Mielniczuk (1990). Since this procedure is hardly extensible to higher dimensions, García-Portugués et al. (b) propose a new kernel density estimator for f_{X,Z} which avoids the estimation via copulas. Exact error expressions are derived for the density estimators of f_{X,Z} and f_{X,Y}. These expressions are used in García-Portugués to set up new bandwidth selection rules for the kernel density estimator of f_X.

A.2. Testing. The two goals are: (i) to test whether X and Z are independent, i.e., whether H_0 : f_{X,Z} = f_X f_Z holds; (ii) to test whether f_{X,Z} has a particular parametric form, i.e., whether H_0 : f_{X,Z} ∈ {f_θ : θ ∈ Θ} holds. Similarly with f_{X,Y} instead of f_{X,Z}.

State of the art. To the best of the author's knowledge, the only goodness-of-fit test for parametric directional densities was proposed by Boente et al. It is based on the Central Limit Theorem (CLT) of Zhao and Wu for the Integrated Squared Error (ISE) of the kernel density estimator of f_X. These papers can be regarded as directional analogues of Fan (1994) and of Bickel and Rosenblatt (1973) and Hall (1984), respectively. There also exist correlation-based tests for detecting circular-linear association, such as the ones given by Mardia (1976), Johnson and Wehrly (1977) and Fisher and Lee (1981).

Contributions. A test for assessing the independence between a directional and a linear random variable (also adaptable to the directional-directional case) is given in García-Portugués et al. (a). The test statistic can be seen as an analogue of Rosenblatt and Wahlen (1992) or Rosenblatt (1975), since it considers the squared distance between the estimator of f_{X,Z} and the product of the estimators of f_X and f_Z. The CLT for the ISE of the directional-linear and directional-directional estimators is obtained in García-Portugués et al. (b). This serves as a keystone to derive the asymptotic distribution of the independence test and

also goodness-of-fit tests for directional-linear and directional-directional parametric densities. The consistency of a bootstrap resampling method for calibration is also proved.

B. Regression function. Let X be a directional variable and Y and ε two linear variables. It is assumed that a regression model for Y over X holds, that is, $Y = m(\mathbf{X}) + \sigma(\mathbf{X})\varepsilon$, with $m(\mathbf{x}) = \mathbb{E}[Y \mid \mathbf{X} = \mathbf{x}]$ and $\sigma^2(\mathbf{x}) = \mathrm{Var}[Y \mid \mathbf{X} = \mathbf{x}]$.

B.1. Estimation. The objective is the estimation of m by kernel smoothing methods using a local linear estimator.

State of the art. An adaptation of the Nadaraya–Watson estimator for the regression function m was given by Wang et al., who derived its law of the iterated logarithm. In Di Marzio et al. (2009), a local polynomial estimator for m, when the predictor is circular, is presented. This approach was later considered in Di Marzio et al. for regression with circular response. Di Marzio et al. proposed a different local linear estimator, with either directional predictor or response, based on Taylor expansions constructed with the tangent-normal decomposition. An earlier definition of a local estimator with directional response and linear predictor was given in Boente and Fraiman.

Contributions. A projected local linear estimator for the regression function m is considered in García-Portugués et al. The estimator is motivated by a modified Taylor expansion designed to avoid the overparametrization that appears when considering the classical local linear estimator. Asymptotic bias, variance and normality of the estimator are provided, as well as the equivalent kernel formulation. Particular cases of the estimator include the one of Wang et al. and the local linear estimator with circular predictor of Di Marzio et al. (2009).

B.2. Testing. The goal is to test whether m belongs to a class of parametric regression functions, i.e., whether the hypothesis H_0 : m ∈ {m_θ : θ ∈ Θ} holds.

State of the art. Deschepper et al. provided a test for the significance of a linear response on a circular predictor, which is, to the best of the author's knowledge, the only nonparametric test in the regression setting with directional variables. A resampling mechanism for the calibration of the test statistic was also proposed in that reference.

Contributions. A goodness-of-fit test for parametric regression models with directional predictor and linear response is presented in García-Portugués et al. The projected local linear estimator is used to construct a test statistic that measures the squared distance between this nonparametric estimator and a smoothed version of the parametric one (similar to the proposal of Härdle and Mammen, 1993), using either local constant or local linear fits. The asymptotic distribution of the test statistic and its power against local alternatives are addressed, together with a consistent resampling procedure.

1.3 Real datasets

Along the thesis, different data examples have been considered to motivate and illustrate the new methodologies. In this section, an exhaustive description of the real datasets is provided, most of them

original, except for the protein angles and part of the Portuguese wildfires. For a collection of classical datasets on directional data, see Fisher (1993), Fisher et al. (1987) and Mardia and Jupp (2000).

1.3.1 Wind direction

The coal power plant of As Pontes, located in the northwest of Spain, has one of the highest electricity generation capacities among the power plants in the country, but unfortunately at the expense of a considerable emission of pollutants. Due to the high concentration of pollutants, and the serious environmental consequences of the acid rain produced by high concentrations of sulphur dioxide, a series of precautionary measures designed to reduce the emissions of the power plant were applied. The concentration of different pollutants, including sulphur dioxide (SO2), is monitored by means of a network of stations located around the plant, which also measure meteorological variables of interest. Among these variables, the direction in which the wind blows is recorded, since it plays a predominant role in the dissemination of particles in the atmosphere.

The data application in Chapter 2 is focused on the relation between SO2 concentration and wind direction in a monitoring station located to the northeast of the power plant (station B). The aim is to check whether wind blowing from the power plant carries higher concentrations of SO2, and to assess the effectiveness of the implemented precautionary measures. For this purpose, two datasets were obtained from minutely recordings at station B for the month of January in two different years, before and after the precautionary measures. After that, the following steps were performed: (i) non-available observations were omitted; (ii) data were hourly averaged in order to mitigate serial dependence; (iii) a slight perturbation was applied to avoid repeated data arising from limitations of the measuring devices; (iv) the SO2 sample was transformed using a Box–Cox transformation to mitigate its skewness. This procedure results in a pair of datasets with 76 and 7 observations, respectively.

Figure 1.5: Rose diagrams for wind direction in station B for January of two different years (left and centre) and in station A Mourela in June (right). The first two show the average SO2 concentration (in µg/m³) associated with winds coming from each partition of the windrose.

On the other hand, the data application in Chapter 4 includes the study of the wind direction in a station closer to the power plant (A Mourela station), in order to determine the directions in which the pollutants are more likely to be spread. The

data acquisition was performed as described previously, but recording only the wind direction, omitting two of the previous steps and considering measurements from June. Figure 1.5 shows three rose diagrams (circular histograms) summarizing the information of the datasets. The first two are from station B, before and after the precautionary measures. While in the first period there is a high SO2 concentration associated with the south and southwest winds coming from the power plant, this is not the case for the second one, where average SO2 concentrations are roughly constant.

1.3.2 Hipparcos dataset

The Hipparcos space astrometry mission was carried out by the European Space Agency between 1989 and 1993 in order to pinpoint the position of more than one hundred thousand stars. The massive enumeration of stars obtained from the mission was collected in the Hipparcos catalogue (Perryman, 1997) and made openly accessible. A decade later, a new revision of the raw data was carried out by van Leeuwen (2007). During this period, the advances in measuring techniques allowed the exact position of the satellite during the mission to be determined with higher precision. As a consequence, this revised version of the dataset presents a significant improvement in the overall reliability of the astrometric catalogue. This is the dataset considered in the application of Chapter 4, and it can be downloaded from the VizieR catalogue service (Ochsenbein et al., 2000). The number of stars in the dataset exceeds one hundred thousand.

Since stars are objects that are continuously moving, the measurements in the Hipparcos catalogue were done with respect to their positions at a common reference date. This concept is known as epoch (Ep) in astronomy, and it was fixed to the median year with respect to the duration of the mission. The position of a star refers to the position it occupies in the celestial sphere, i.e., the location on the surface of the idealized (perfectly spherical) earth that arises as the intersection with the imaginary line joining the centre of the earth and the star. This is parametrized by a pair of angles (λ, β), λ ∈ (−π, π], β ∈ [−π/2, π/2], so that
$$x_1 = \cos\beta\cos\lambda, \quad x_2 = \cos\beta\sin\lambda, \quad x_3 = \sin\beta.$$
The centre of the celestial sphere is placed at the centre of the Milky Way and its equator corresponds to the galactic plane, that is, the rotation plane of the galaxy. Point representation in this coordinate system is known as galactic coordinates and is very popular in astronomy due to its easy interpretation.

The usual way of representing spherical surfaces in the Hipparcos catalogue, and in astronomy in general, is the Aitoff projection. This transformation projects the surface of the sphere inside an ellipse whose major semi-axis is twice the minor semi-axis, of length R. The point (x, y) inside the ellipse is given by
$$x = \frac{2R\cos\beta\sin(\lambda/2)}{\sqrt{1 + \cos\beta\cos(\lambda/2)}}, \qquad y = \frac{R\sin\beta}{\sqrt{1 + \cos\beta\cos(\lambda/2)}}.$$
This projection does not preserve distances, but it does preserve areas (the proportion between the area of a region on the sphere and the area of its projection remains constant).
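A minimal R sketch of this projection, implementing the two formulas above (the function name, R = 1 and the test grid are illustrative choices, not taken from the thesis):

```r
# Aitoff-type equal-area projection of galactic coordinates (lambda, beta), in radians,
# onto an ellipse with semi-axes 2R (horizontal) and R (vertical)
aitoff <- function(lambda, beta, R = 1) {
  denom <- sqrt(1 + cos(beta) * cos(lambda / 2))
  cbind(x = 2 * R * cos(beta) * sin(lambda / 2) / denom,
        y = R * sin(beta) / denom)
}

# Illustrative usage: project a small grid of galactic coordinates
grid <- expand.grid(lambda = seq(-pi, pi, length.out = 9),
                    beta = seq(-pi / 2, pi / 2, length.out = 5))
head(aitoff(grid$lambda, grid$beta))
```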

Figure 1.6 shows the Aitoff projection of the histogram of the star counts as presented in the Hipparcos catalogue. A smoothed version of this figure can be obtained by replacing the histogram with a kernel density estimator for spherical data, which takes into account the continuous nature of the observations. This is done in Chapter 4 by employing a suitable bandwidth selector in the kernel density estimator.

Figure 1.6: Number of observed stars per square degree, in galactic coordinates, extracted from Perryman (1997). The higher concentrations of stars are located around the equator (galactic plane) and two spots that represent the Orion's arm (left) and the Gould's Belt (right).

1.3.3 Portuguese wildfires

Chapters 5 and 6 analyse directional data arising from the main orientations of wildfires that occurred in Portugal between 1985 and 2005. This massive data collection contains the fire perimeters (see right plot of Figure 1.7) together with their log-burnt areas, and was acquired from the imagery of the Landsat satellites. Imagery covering the mainland of Portugal was obtained before and after the fire season, providing a snapshot of the fires that occurred during the season. Annual fire perimeters were derived through a semi-automatic procedure that starts with supervised image classification and is followed by manual editing (Barros et al.). The Minimum Mapping Unit (MMU) of the satellite is 5 hectares, which, although not able to capture the smallest fires, allows most of the total burnt area to be mapped.

Watersheds play an important role in the study of wildfires and their orientation. In Barros et al., the authors delimited the watersheds (see left plot of Figure 1.7) in which the wildfires are grouped, studying which of them showed a preferential alignment with the fires' orientation. The orientation of the different object perimeters (either watersheds or wildfires) is determined by the first principal component (PC1) obtained from the points that constitute the object's boundary, either in the bidimensional space defined by each vertex's latitude and longitude coordinates, or in the tridimensional space that also takes the altitude into account. Then, the PC1 corresponds to an axis that passes through the object's centre of mass and that maximizes the variance of the projected vertices, represented in R² or in R³ (see right plot of Figure 1.7).
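A minimal R sketch of this orientation computation (the elongated perimeter below is a made-up illustration, not part of the wildfire dataset): the first principal component of the boundary vertices gives an axis, which is reduced to an axial angle in [0, π). Note that the angle here is measured counterclockwise from the east axis, not clockwise from North as in the compass convention used by Barros et al.

```r
# Main orientation of a perimeter from the first principal component (PC1)
# of its boundary vertices (columns: longitude, latitude)
perimeter_orientation <- function(vertices) {
  pc1 <- prcomp(vertices)$rotation[, 1]   # direction of maximal variance
  atan2(pc1[2], pc1[1]) %% pi             # axial angle in [0, pi)
}

# Illustrative elongated perimeter (noisy ellipse stretched along the lon axis)
set.seed(1)
t <- seq(0, 2 * pi, length.out = 100)
vertices <- cbind(lon = 3 * cos(t) + rnorm(100, sd = 0.1),
                  lat = sin(t) + rnorm(100, sd = 0.1))
perimeter_orientation(vertices)  # close to 0 or pi: both ends describe the same E/W-type axis
```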

In the two-dimensional case, the orientation is an axial object (the orientation N/S is also S/N). These orientations can be encoded by an angular variable θ ∈ [0, π) with period π, so that 2θ is a usual circular variable. With this codification, the angles 0, π/4, π/2 and 3π/4 represent the E/W, NE/SW, N/S and NW/SE orientations, respectively. In the three-dimensional space, the orientation is coded by a pair of angles (θ, ϕ) using spherical coordinates (see Figure 1.1), where θ ∈ [0, π) plays the same role as in the previous case and ϕ measures the inclination (ϕ = 0 for a flat slope and ϕ = π/2 for a vertical one); only positive inclinations are considered, since negative ones lead to the same inclination. Therefore, the points with spherical coordinates (θ, ϕ), which lie on the upper semisphere, can be considered as a realization of a spherical variable.

Figure 1.7: Number of hectares burnt in each watershed delineated by Barros et al. (left plot) and main orientation of a wildfire obtained from the PC1 of the perimeter (right plot); extracted from Barros et al.
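A minimal sketch of this encoding (the angles below are illustrative): axial orientations in [0, π) are doubled to obtain ordinary circular data, summarized on the circle, and halved back. This is the standard angle-doubling device for axial data, not code from the thesis.

```r
# Mean orientation of axial data via angle doubling:
# double the angles, compute the circular mean direction, and halve the result
axial_mean <- function(theta) {
  # theta: axial angles in [0, pi)
  s <- sum(sin(2 * theta))
  c <- sum(cos(2 * theta))
  (atan2(s, c) %% (2 * pi)) / 2   # mean orientation in [0, pi)
}

# Illustrative orientations near the E/W axis (0 and values close to pi are the same axis)
theta <- c(0.05, 3.10, 0.10, 3.05)
axial_mean(theta)  # close to 0 (E/W), whereas mean(theta) ~ 1.58 would wrongly suggest N/S
```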
From all possible axis passing through the object center of mass, the first principal component axis PC axis, corresponds to the axis that maximizes the variance among projection of all points that constitute the object boundary and also reflects the longest diagonal of the object. The second principal component axis PC axis is orthogonal to the PC axis. In this example principal component analysis of the vertices resulted in a PC axis with NE/SW.8 orientation. This orientation is measured considering True North as and rotating clockwise. Figure.7: Number of hectares burnt in in each watershed delineated by Barros et al. left plot and main orientation of a wildfire obtained from the PC of the perimeter right plot; extracted from Barros et al.. In Chapter 5 the independence between circular or spherical orientations of the different objects and burnt areas is tested with the whole dataset. Chapter 6 analyses a reduced dataset obtained by considering the average wildfire circular orientation in each of the watersheds and the mean of the burnt area, yielding a dataset of size n =. A goodness-of-fit test is applied for a parametric circular-linear model to assess its suitability for explaining the dataset... Protein angles A scientific field where directional statistics is called to play an important role is proteomics. Biomolecular structures like proteins are often expressed in terms of the dihedral angles that describe the rotations of the backbone around the bonds between atoms N-C α angle φ and C α -C angle Ψ. The scatterplot of these pairs of angles in a protein, known as the Ramachadran plot, provides an easy way to view the allowed torsion combinations of the backbone. The distribution of the dihedral angles and its modelling is a key step in the study of the so-called protein folding problem, one of the main open problems in biology nowadays. See Hamelryck et al. and references therein for deeper insights on proteomics and directional methods used in the field.

Protein angles

A scientific field where directional statistics is called to play an important role is proteomics. Biomolecular structures like proteins are often described in terms of the dihedral angles of the backbone, namely the rotation φ around the N-Cα bond and the rotation ψ around the Cα-C bond. The scatterplot of these pairs of angles in a protein, known as the Ramachandran plot, provides an easy way to view the allowed torsion combinations of the backbone. The distribution of the dihedral angles and its modelling is a key step in the study of the so-called protein folding problem, one of the main open problems in biology nowadays. See Hamelryck et al. (2012) and references therein for deeper insights into proteomics and the directional methods used in the field.

The dataset analysed in Chapter 6 contains pairs of dihedral angles measured in the primary structure of a sample of proteins. This dataset was initially studied in Fernández-Durán (2007) by the use of the copula structure of Wehrly and Johnson (1979) together with parametric models for the marginal densities and the link function; the application in Chapter 6 concerns a goodness-of-fit test for such parametric models. The dataset is formed by pairs of angles from segments of the type alanine-alanine-alanine in alanine amino acids, extracted from a representative sample of proteins retrieved from the July list of recommended proteins of the Protein Data Bank (Berman et al., 2000). The dataset is available in the ProteinsAAA object of the R package CircNNTSR (Fernández-Durán and Gregorio-Domínguez).

Figure 1.8: Representation of the dihedral angles φ and ψ in the primary protein structure (left, adapted from Richardson) and Ramachandran plot of the dihedral angles of the alanine-alanine-alanine segments contained in the ProteinsAAA object of the CircNNTSR package.

Text mining

Directional statistics also has interesting applications in high dimensional settings. A good example is text mining, where documents are usually represented as normalized vectors on the sphere Ω_{D-1}, D being the number of words in a dictionary. This is the celebrated vector space model: a collection of documents, known as corpus, d_1, ..., d_n, is codified by the set of vectors {(d_{i1}, ..., d_{iD})}_{i=1}^n (the document-term matrix) with respect to a dictionary {w_1, ..., w_D}, such that d_{ij} represents the frequency of the dictionary's j-th word in the document d_i. In this scenario a normalization is required to avoid unbalances between large and small documents. Consider for example the case where a document is produced by copying and pasting another document N times: both vectors will have the same direction, but the length of the former will be N times larger, thus making them appear to be rather different elements of R^D, although they share the same information. Dividing by the Euclidean norm, d_i/||d_i|| ∈ Ω_{D-1}, and hence the corpus can be regarded as a sample of directional data.

The data application in Chapter 7 concerns a corpus extracted from the news aggregator Slashdot (www.slashdot.org): a goodness-of-fit test is applied for a linear model in order to assess its suitability for explaining the popularity of the news from their content. The website publishes daily news and stories that are submitted and evaluated by users. The site is mainly focused on technology and science, but it also covers news related to politics or digital rights.

34 Chapter. Introduction can be seen in Figure.9, which essentially can be regarded as a series of news entries. Each of them includes a title, a short abstract of the news and information regarding the submission user, date, keywords,.... Under the abstract a link shows the number of comments in the discussion section attached to the entry. Obviously, the degree of participation in the discussion section varies notably according to the news topic. For example, news related with politics and with proprietary software tend to be more controversial and generate more discussion, whereas news with high specific scientific content often have fewer comments. Figure.9: View of the Slashdot website on July 7,. The dataset was acquired as follows. First, titles, summaries and number of comments in all the news appeared in were downloaded by parsing automatically the news archive, obtaining a collection of n = 8 documents. After that, the next steps were performed using the text mining R library tm Meyer et al., 8 and self-programmed code: merge titles and summaries in the same document, omitting user submission details; deletion of HTML codes; conversion to lowercase; deletion of stop words retrieved from the stop words lists given in tm and MySQL, punctuation, white spaces and numbers; 5 stemming of words to reduce them to their root form; 6 pruning to remove words too rare or too frequent more than the 5% of the processed words only appeared in a single document. The last step was performed by considering only the words that appear within 58 and 96 documents, which were D = 58. These quantities correspond to quantiles 95% and 99.95% of the document frequency i.e., the number of documents containing a particular word empirical distribution. Finally, the corpus was stored as a normalized document-term matrix using the dictionary formed by the D words.

Manuscript distribution

The main core of the thesis is formed by Chapters 2 to 7. Each of them contains an original article on a specific topic related to the main guiding theme: nonparametric inference with directional and linear data. Therefore, each chapter is presented as a self-contained paper with its own abstract, sections, appendices and references, in the same way as it was published, accepted for publication or submitted. The reference of the paper that gives rise to each chapter is included in its front page. At the time of presenting this manuscript, the papers from Chapters 2 to 5 have been published, the paper from Chapter 6 has been accepted and the one from Chapter 7 has been submitted. Short summaries of all the chapters and their associated papers are given below.

Chapter 1: Introduction. An introduction to the field of directional statistics is presented in this first chapter. The introduction describes the state of the art and the main references in the topics where the thesis presents new contributions. For a better understanding, an explanatory diagram of the contributions of this thesis is given. The chapter also describes the datasets used throughout the manuscript and the structure of the thesis.

Chapter 2: Density estimation by circular-linear copulas (García-Portugués et al., 2013a). This chapter presents different approaches to the estimation of a circular-linear or circular-circular density by the use of copulas and the dependence structure of Johnson and Wehrly (1978). A nonparametric procedure is used to analyse the relationship between the wind direction and the SO2 concentration in a monitoring station near the As Pontes coal power plant, in order to study the effectiveness of precautionary measures for the reduction of pollutants.

Chapter 3: Kernel density estimation for directional-linear data (García-Portugués et al., 2013b). A natural alternative to the nonparametric method given in the previous chapter is a kernel density estimator applied to the data directly, i.e. without requiring copula modelling. The kernel density estimator for directional-linear data is presented in this chapter, providing results for its bias, variance and asymptotic normality. Exact error expressions are obtained for the directional-linear kernel density estimator, but also for the directional one, setting the basis for Chapter 4.

Chapter 4: Bandwidth selectors for kernel density estimation with directional data (García-Portugués, 2013). Based on the asymptotic and exact error expressions given in Chapter 3, three new bandwidth selectors are proposed. The first selector is a natural analogue of the circular selector given in Taylor (2008), while the other two arise from combining mixtures of von Mises distributions with the asymptotic or exact error criteria. The performance of the proposed selectors is compared in an extensive simulation study, and the best selector is illustrated with the wind direction and Hipparcos satellite datasets.

Chapter 5: A nonparametric test for directional-linear independence (García-Portugués et al., 2014a). Using the estimator given in Chapter 3, a test based on the squared distance between the joint kernel estimator and the product of the marginal directional and linear kernel density estimators is considered. A closed expression is given for the statistic, and a resampling method based on permutations is employed to calibrate the test statistic.
The test performance is analysed in a simulation study under a variety of situations and is applied to account for the influence of the orientation of the wildfires on their size in the Portuguese wildfires dataset.

36 6 Chapter. Introduction Chapter 6: Central limit theorems for directional and linear random variables with applications García-Portugués et al., b. This chapter is devoted to derive a CLT for the ISE of the kernel density estimator in Chapter. The result is used to establish the convergence in distribution of the independence test given in Chapter 5 and to analyse a new goodness-offit test for parametric families of directional-linear densities. A consistent bootstrap strategy is given for the goodness-of-fit test and illustrated in an extensive simulation study. The goodnessof-fit test is applied to the protein angles and Portuguese wildfires datasets. Chapter 7: Testing parametric models in linear-directional regression García-Portugués et al.,. A new local linear estimator is proposed for the estimation of the regression function with directional predictor and linear response, establishing its different properties. Based on this estimator, a goodness-of-fit test is constructed to check the null hypothesis that the unknown regression function belongs to a certain parametric family. The asymptotic distribution of the test statistic is obtained jointly with the power under local alternatives and a consistent bootstrap algorithm. The test is illustrated in a simulation study and is applied to the Slashdot dataset. Chapter 8: Future research. Different ideas for future projects are outlined in this chapter: new bandwidth selectors in nonparametric linear-directional regression, a kernel density estimator for directional data under rotational symmetry, an R package implementing the methods described in this thesis and a goodness-of-fit test for the Johnson and Wehrly 978 copula structure. Appendix A: Supplement to Chapter 6. This supplement contains the proofs of the technical lemmas used in Chapter 6, exhaustive details of the simulation study, further results for the independence test and an extended data application. Appendix B: Supplement to Chapter 7. Particular cases of the projected local linear estimator, proofs of the technical lemmas and further results for the simulation study for Chapter 7 are collected in this appendix. Resumen en castellano. This last part presents a short summary of the thesis in Spanish. References Agostinelli, C. and Lund, U.. R package circular: circular statistics version.-7. Bai, Z. D., Rao, C. R., and Zhao, L. C Kernel estimators of density function of directional data. J. Multivariate Anal., 7: 9. Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. 5. Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res., 6:5 8. Barros, A. M. G., Pereira, J. M. C., and Lund, U. J.. Identifying geographical patterns of wildfire orientation: a watershed-based analysis. Forest. Ecol. Manag., 6:98 7. Batschelet, E. 98. Circular statistics in biology. Academic Press, Inc., London-New York.

37 References 7 Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., and Bourne, P. E.. The protein data bank. Nucleic Acids Res., 8:5. Bhattacharya, A. and Bhattacharya, R.. Nonparametric inference on manifolds, volume of Institute of Mathematical Statistics Monographs. Cambridge University Press, Cambridge. Bickel, P. J. and Rosenblatt, M. 97. On some global measures of the deviations of density function estimates. Ann. Statist., 6:7 95. Bingham, M. S. and Mardia, K. V Maximum likelihood characterization of the von Mises distribution. In Patil, G. P., Kotz, S., and Ord, J. K., editors, A modern course on statistical distributions in scientific work, volume 7 of NATO Advanced Study Institutes Series. Series C, Mathematical and Physical Sciences., pages D. Reidel, Dordrecht. Boente, G. and Fraiman, R. 99. Nonparametric regression for directional data. Trabajos de Matemática, 76. Boente, G., Rodríguez, D., and González-Manteiga, W.. Goodness-of-fit test for directional data. Scand. J. Stat., : Carnicero, J. A., Ausín, M. C., and Wiper, M. P.. Non-parametric copulas for circularlinear and circular-circular data: an application to wind directions. Stoch. Environ. Res. Risk Assess., 78:99. Deschepper, E., Thas, O., and Ottoy, J. P. 8. Tests and diagnostic plots for detecting lack-of-fit for circular-linear regression models. Biometrics, 6:9 9. Di Marzio, M., Panzera, A., and Taylor, C. C. 9. Local polynomial regression for circular predictors. Statist. Probab. Lett., 799: Di Marzio, M., Panzera, A., and Taylor, C. C.. Non-parametric regression for circular responses. Scand. J. Stat., :8 55. Di Marzio, M., Panzera, A., and Taylor, C. C.. Nonparametric regression for spherical data. J. Amer. Statist. Assoc., 956: Dryden, I. L. 5. Statistical analysis on high-dimensional spheres and shape spaces. Ann. Statist., : Dryden, I. L. and Mardia, K. V Statistical shape analysis. Wiley Series in Probability and Statistics. Probability and Statistics. John Wiley & Sons, Chichester. Fan, Y. 99. Testing the goodness of fit of a parametric density function by kernel method. Economet. Theor., :6 56. Fernández-Durán, J. J. 7. Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums. Biometrics, 6: Fernández-Durán, J. J. and Gregorio-Domínguez, M. M.. CircNNTSR: an R package for the statistical analysis of circular data using NonNegative Trigonometric Sums NNTS models. R package version..

38 8 Chapter. Introduction Fisher, N. I. 99. Statistical analysis of circular data. Cambridge University Press, Cambridge. Fisher, N. I. and Lee, A. J. 98. Biometrika, 68: Nonparametric measures of angular-linear association. Fisher, N. I., Lewis, T., and Embleton, B. J. J. 99. Statistical analysis of spherical data. Cambridge University Press, Cambridge. García-Portugués, E.. Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electron. J. Stat., 7: García-Portugués, E., Barros, A. M. G., Crujeiras, R. M., González-Manteiga, W., and Pereira, J. a. A test for directional-linear independence, with applications to wildfire orientation and size. Stoch. Environ. Res. Risk Assess., 85:6 75. García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. a. Exploring wind direction and SO concentration by circular-linear density estimation. Stoch. Environ. Res. Risk Assess., 75: García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. b. Kernel density estimation for directional-linear data. J. Multivariate Anal., :5 75. García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. b. Central limit theorems for directional and linear data with applications. Statist. Sinica, to appear. García-Portugués, E., Van Keilegom, I., Crujeiras, R. M., and González-Manteiga, W.. Testing parametric models in linear-directional regression. arxiv:9.56. Gatto, R.. The generalized von Mises-Fisher distribution. In Wells, M. T. and SenGupta, A., editors, Advances in directional and linear statistics, pages Physica, Heidelberg. Gijbels, I. and Mielniczuk, J. 99. Estimating the density of a copula function. Comm. Statist. Theory Methods, 9:5 6. Hall, P. 98. Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivariate Anal., : 6. Hall, P., Watson, G. S., and Cabrera, J Kernel density estimation with spherical data. Biometrika, 7: Hamelryck, T., Mardia, K., and Ferkinghoff-Borg, J., editors. Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer, Berlin. Härdle, W. and Mammen, E. 99. Comparing nonparametric versus parametric regression fits. Ann. Statist., : Jammalamadaka, S. R. and SenGupta, A.. Topics in circular statistics, volume 5 of Series on Multivariate Analysis. World Scientific Publishing, River Edge. Johnson, R. A. and Wehrly, T Measures and models for angular correlation and angularlinear correlation. J. Roy. Statist. Soc. Ser. B, 9: 9. Johnson, R. A. and Wehrly, T. E Some angular-linear distributions and related regression models. J. Amer. Statist. Assoc., 76:6 66.

39 References 9 Jona-Lasinio, G., Gelfand, A., and Jona-Lasinio, M.. Spatial analysis of wave direction data using wrapped gaussian processes. Ann. Appl. Stat., 6: Jones, M. C. and Pewsey, A. 5. A family of symmetric distributions on the circle. J. Amer. Statist. Assoc., 7: 8. Kendall, D. G., Barden, D., Carne, T. K., and Le, H Shape and shape theory. Wiley Series in Probability and Statistics. John Wiley & Sons, Chichester. Klemelä, J.. Estimation of densities and derivatives of densities with directional data. J. Multivariate Anal., 7:8. Mardia, K. V. 97. Statistics of directional data, volume of Probability and Mathematical Statistics. Academic Press, London. Mardia, K. V Linear-circular correlation coefficients and rhythmometry. Biometrika, 6: 5. Mardia, K. V. and Jupp, P. E.. Directional statistics. Wiley Series in Probability and Statistics. John Wiley & Sons, Chichester, second edition. Mardia, K. V. and Sutton, T. W A model for cylindrical variables with applications. J. Roy. Statist. Soc. Ser. B, :9. Meyer, D., Hornik, K., and Feinerer, I. 8. Text mining infrastructure in R. J. Stat. Softw., 55: 5. Nelsen, R. B. 6. An introduction to copulas. Springer Series in Statistics. Springer, New York, second edition. Ochsenbein, F., Bauer, P., and Marcout, J.. The VizieR database of astronomical catalogues. Astron. Astrophys. Supp. Ser., :. Oliveira, M., Crujeiras, R. M., and Rodríguez-Casal, A.. A plug-in rule for bandwidth selection in circular density estimation. Comput. Statist. Data Anal., 56: Oliveira, M., Crujeiras, R. M., and Rodríguez-Casal, A.. Nonparametric circular methods for exploring environmental data. Environ. Ecol. Stat., : 7. Oliveira, M., Crujeiras, R. M., and Rodríguez-Casal, A.. NPCirc: nonparametric circular methods. J. Stat. Softw., to appear. Olver, F. W. J., Lozier, D. W., Boisvert, R. F., and Clark, C. W., editors. NIST handbook of mathematical functions. Cambridge University Press, Cambridge. Perryman, M. A. C The Hipparcos and Tycho catalogues, volume of ESA SP. ESA Publication Division, Noordwijk. Pewsey, A., Neuhäuser, M., and Ruxton, G. D.. Circular statistics in R. Oxford University Press, Oxford. Richardson, J. S.. Protein backbone PhiPsiOmega drawing. Retrieved August 6, from drawing.jpg.

40 Chapter. Introduction Rosenblatt, M A quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann. Statist., :. Rosenblatt, M. and Wahlen, B. E. 99. A nonparametric measure of independence under a hypothesis of independent components. Statist. Probab. Lett., 5:5 5. Shieh, G. S. and Johnson, R. A. 5. Inferences based on a bivariate distribution with von Mises marginals. Ann. Inst. Statist. Math., 57: Taylor, C. C. 8. Automatic bandwidth selection for circular density estimation. Comput. Statist. Data Anal., 57:9 5. van Leeuwen, F. 7. Hipparcos, the new reduction of the raw data, volume 5 of Astrophysics and Space Science Library. Springer, Dordrecht. Wang, X., Zhao, L., and Wu, Y.. Distribution free laws of the iterated logarithm for kernel estimator of regression function based on directional data. Chinese Ann. Math. Ser. B, : Watson, G. S. 98. Statistics on spheres, volume 6 of University of Arkansas Lecture Notes in the Mathematical Sciences. John Wiley & Sons, New York. Wehrly, T. E. and Johnson, R. A Bivariate models for dependence of angular observations and a related Markov process. Biometrika, 67: Zhao, L. and Wu, C.. Central limit theorem for integrated square error of kernel estimators of spherical density. Sci. China Ser. A, :7 8.

41 Chapter Density estimation by circular-linear copulas Abstract The study of environmental problems usually requires the description of variables with different nature and the assessment of relations between them. In this work, an algorithm for flexible estimation of the joint density for a circular-linear variable is proposed. The method is applied for exploring the relation between wind direction and SO concentration in a monitoring station close to a power plant located in Galicia NW-Spain, in order to compare the effectiveness of precautionary measures for pollutants reduction in two different years. Reference García-Portugués, E., Crujeiras, R. M. and González-Manteiga, W.. Exploring wind direction and SO concentration by circular-linear density estimation. Stoch. Environ. Res. Risk Assess., 7: Contents. Introduction Background Some circular and circular-linear distributions Some notes on copulas Estimation algorithm Some simulation results Application to wind direction and SO concentration Exploring SO and wind direction in and Final comments References Introduction Air pollution studies require the investigation of relationships between emission sources and pollutants concentration in nearby sites. In addition, the effectiveness of environmental policies

42 Chapter. Density estimation by circular-linear copulas with the aim of pollution reduction should be checked, at least in a descriptive way, in order to assess whether the implemented precautionary measurements had worked out. Different statistical methods have been considered for the study of the relation between wind direction and pollutants concentration, both for exploratory and for inferential analysis, taking into account that wind direction is a circular variable requiring a proper statistical treatment. Wind potential assessment using descriptive methods and spectral analysis has been carried out by Shih 8. In addition, wind direction has been proved to play a significant role in detection of emission sources see Chen et al. and air quality studies see Bayraktar et al., although the wind direction is not treated as a circular variable, but discretized as a factor. Jammalamadaka and Lund 6 considered regression models for the pollutants concentration linear response over the wind direction circular explanatory variable, constructing the regression function in terms of the sine and cosine components of the circular variable. Recently, Deschepper et al. 8 introduced a graphical diagnostic tool, jointly with a test, for fit assessment in parametric circular-linear regression, illustrating the technique in an air quality environmental study. Figure.: Locations of monitoring station circle and power plant in Galicia NW-Spain. Location of station B: 7 W, 5 N. Power plant location: W, 6 6 N. From a more technical perspective, Johnson and Wehrly 978 and Wehrly and Johnson 979 presented a method for obtaining joint circular-linear and circular-circular densities with specified marginals, respectively. Fernández-Durán introduced a new family of circular distributions based on nonnegative trigonometric sums, and this idea is used in Fernández-Durán 7 in the construction of circular-linear densities, adapting the proposal of Johnson and Wehrly 978. The introduction of nonnegative trigonometric sums for modelling the circular distributions involved in the formulation of Johnson and Wehrly 978 allows for more flexible models, that may present skewness or multimodality, features that cannot be reflected through

43 .. Introduction the von Mises distribution the classical model for circular variables. However, the flexibility claimed in this proposal can be also obtained by a completely nonparametric approach. In this work, a procedure for modelling the relation between a circular and a linear variable is proposed. The relation is specified by a circular-linear density which, represented in terms of copulas, can be estimated nonparametrically. The estimation algorithm can be also adapted to a semiparametric framework, when an underlying model for the marginal distributions can be imposed, and it also comprises the classical Johnson and Wehrly 978 model. It also enables the construction of an estimation framework without imposing an underlying parametric model. The copula approach presents some computational advantages, in order to carry out a simulation study. The practical aim of this work is to explore the relation between wind incidence direction wind blowing from this direction and SO levels in a monitoring station close to a power plant located in Galicia NW-Spain. The monitoring station and the thermal power plant locations are shown in Figure.. In the power plant, energy was usually produced from the combustion of local coal, which also generates pollutants as sulphur dioxide SO. In order to reduce the emission of SO to the atmosphere, and to comply with European regulations, coal with less sulphur content has been used since 5. In addition, the power plant changed the energy production system by settling a combined process of coal and gas burning in 8. These measures were aimed to reduce the SO emissions. January January.. N N W E W E S S.9.. Figure.: Rose diagrams for wind direction in station B for left plot and right plot, with average SO concentration. The analysed data corresponds to SO and wind incidence direction measurements taken during January and January, with one minute frequency. The monitoring station B see Figure. is located in a wind farm. kilometres in the NE direction with respect to the power plant. In Figure., rose diagrams for wind direction are shown, including also the average SO concentration for each wind direction sector. It can be seen that higher values of SO are shown in. Note also that there are two dominant wind incidence directions, specifically, blowing from SW and from NE. In addition, note that the average SO values for

44 Chapter. Density estimation by circular-linear copulas winds blowing from SW are remarkably high for, which is explained by the position of the monitoring station with respect to the power plant. However, the relation between wind direction and SO levels is not clear from these representations, and the dependency or lack of dependency between them should be investigated. This work is organized as follows. Section. provides some background on circular and circularlinear random variables and a brief review on copula methods. The algorithm for estimating a circular-linear density is detailed and discussed in Section.. The finite sample properties of the algorithm, combining parametric and nonparametric approaches, are illustrated by a simulation study. The completely nonparametric version of this algorithm is applied for analysing the relation between wind direction and SO concentration in Section.. Some final comments are given in Section.5.. Background The main goal of this work is to describe the relation between wind direction and SO concentration in a monitoring station close to a power plant at two different time moments, before and after precautionary measurements to reduce SO emissions were applied. Bearing in mind the different nature of the variables and noticing that measurements from wind direction are angles, some background on circular and circular-linear random variables is introduced. This methodology will be needed in order to describe the wind direction itself and the joint relation between the two variables. For that purpose, the Johnson and Wehrly 978 family J&W in what follows of circular-linear distributions will be introduced. A general procedure, based on the copula representation of a density, allows for a more flexible estimation framework. The goal of this copula representation is twofold: firstly, the classical J&W family can be written in such a way, just involving univariate circular and linear densities; secondly, with the copula representation, flexible circular-linear relations beyond this specific model are also possible... Some circular and circular-linear distributions Denote by Θ a circular random variable with support in the unit circle S. A circular distribution Ψ for Θ assigns a probability to each direction cosθ, sinθ in the plane R, characterized by the angle θ, π see Mardia and Jupp for a survey on statistical methods for circular data. The von Mises distribution is the analogue of the normal distribution in circular random variables. This family of distributions, usually denoted by vmµ, κ, is characterized by two parameters: µ, π, the circular mean and κ, a circular concentration parameter around µ. The corresponding density function is given by ϕ vm θ; µ, κ = πi κ eκ cosθ µ, θ, π,. being I κ = π π e κ cos ω dω the modified Bessel function of first kind and order zero. The von Mises cumulative distribution function, considering the zero angle as the starting point, is defined as: Ψ vm θ; µ, κ = The uniform circular distribution, θ ϕ vm ω; µ, κ dω, θ, π. ϕ U θ =, θ, π,. π

Circular density estimation can be performed by parametric methods, such as Maximum Likelihood Estimation (MLE), or using nonparametric techniques based on kernel approaches, as will be seen later.

In order to explain the relation between a circular and a linear random variable (in our example, wind direction and SO2 concentration), the construction of a joint circular-linear density will be considered. A circular-linear random variable (Θ, X) is supported on the cylinder S^1 × R (or on a subset of it) and a circular-linear density for (Θ, X), namely p, must satisfy a periodicity condition in the circular argument, that is,

p(θ, x) = p(θ + 2πk, x), θ ∈ [0, 2π), x ∈ R, k ∈ Z,

as well as the usual assumptions of taking nonnegative values and integrating to one. Johnson and Wehrly (1978) proposed a method for obtaining circular-linear densities with specified marginals. Denote by ϕ and f the circular and linear marginal densities, respectively, and by Ψ and F their corresponding distribution functions. Let also g be another circular density. Then

p(θ, x) = 2π g(2π[Ψ(θ) − F(x)]) ϕ(θ) f(x) (2.3)

is a density for a circular-linear distribution of a random variable (Θ, X) with specified marginal densities ϕ and f (see Johnson and Wehrly, 1978, Theorem 5). Circular-linear densities with specified marginals can also be obtained by considering the sum of the marginal distributions in the argument of the joining density in (2.3). Circular-linear densities may include von Mises and Gaussian marginals, but the dependence between them will be specified by the joining density g. In fact, the independence model corresponds to taking ϕ_U as the joining density in (2.3). From a data sample of (Θ, X), assuming that the joint density can be represented as in (2.3), an estimator of p can be obtained from estimates of the marginals and of the joining density. Wehrly and Johnson (1979) proved that the construction of circular-circular distributions (that is, distributions on the torus) can be done similarly to (2.3), just considering prespecified circular marginal distributions.

Note that (2.3) is just a construction method and not a characterization of circular-linear densities. In addition, there are no testing procedures available for checking whether a certain dataset follows such a distribution. Hence, a more general approach to circular-linear density construction would be helpful. In the next section, some background on copulas is introduced, allowing for a more flexible procedure for obtaining circular-linear densities with specified marginals, in which the representation (2.3) fits as a particular case. With this proposal, a fully nonparametric estimation procedure can be applied.
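To fix ideas, the construction (2.3) can be coded generically for arbitrary marginals and joining density. The sketch below is only illustrative: the chosen marginals and joining density are assumptions for the example, not those of the data application.

```r
# Johnson and Wehrly (1978) construction (2.3):
# p(theta, x) = 2*pi * g(2*pi*(Psi(theta) - F(x))) * phi(theta) * f(x)
jw_density <- function(theta, x, g, Psi, Fdist, phi, f) {
  2 * pi * g(2 * pi * (Psi(theta) - Fdist(x))) * phi(theta) * f(x)
}

# Example: circular uniform marginal, standard normal marginal and a
# von Mises joining density with mean pi and concentration 1
g_vm  <- function(omega) exp(cos(omega - pi)) / (2 * pi * besselI(1, 0))
phi_U <- function(theta) rep(1 / (2 * pi), length(theta))
Psi_U <- function(theta) theta / (2 * pi)

th <- seq(0, 2 * pi, length.out = 50)
xx <- seq(-3, 3, length.out = 50)
p  <- outer(th, xx, jw_density, g = g_vm, Psi = Psi_U, Fdist = pnorm,
            phi = phi_U, f = dnorm)
persp(th, xx, p, xlab = "theta", ylab = "x", zlab = "density")
```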

46 6 Chapter. Density estimation by circular-linear copulas If F and F are continuous distributions, then C is unique. Conversely, if C is a copula and F and F are distribution functions, then F defined by. is a bivariate distribution with marginals F and F. If the marginal random variables are absolutely continuous, Sklar s result can be interpreted in terms of the corresponding densities. Denoting by c the copula density, the bivariate density of F in. can be written as fx, y = cf x, F y f xf y, x, y R. As pointed out by Nazemi and Elshorbagy, copula modelling provides a simple but powerful way for describing the interdependence between environmental variables. In our particular setting, the nature of the variables is different, being the variables of interest linear SO concentration and circular wind direction. Circular-linear copulas, that will be denoted by C Θ,X, can be also defined taking into account the characteristics of the circular marginal, satisfying c Θ,X, v = c Θ,X, v, v,, where c Θ,X is the corresponding circular-linear copula density. Hence, a circular-linear density with marginals ϕ and f is given by: pθ, x = c Θ,X Ψθ, F x ϕθfx, θ, π, x R..5 Note that J&W s proposal can be seen as a particular case of.5. For a certain joining density g in., the corresponding copula density is given by: c Θ,X Ψθ, F x = πg π Ψθ ± F x,.6 where the sign ± refers to the possibility of considering the sum or the difference of the marginal distributions in the argument of g, as it was previously mentioned. A circular-circular density, following the result of Wehrly and Johnson 979, can be constructed similarly just considering circular marginals for Θ, Ω and guaranteeing that the copula density satisfies c Θ,Ω, v = c Θ,Ω, v and c Θ,Ω u, = c Θ,Ω u,, u, v,. For simplicity, the copula density in.6 will be denoted as J&W copula see Figure., left plot. The representation of a circular-linear density in.5 enables the construction of new families of circular-linear distributions. From Corollary..5. in Nelsen 6, a new family of circularlinear copulas with quadratic section QS copula in the linear component can be constructed. The copula densities in this family are given by c α Θ,Xu, v = + πα cosπu v, α π..7 The QS copula family is parametrized by α, which accounts for the deviation from the independence copula corresponding to α =. Figure. middle plot shows the wavy surface corresponding to α = π. The position of the three modes in the density, centred along u =, u =.5 and u =, as well as their concentration, is controlled by the value of α. A possible way to derive new copulas is through mixtures of other copulas see Nelsen 6. Thus, for any copula c, the mixture c Θ,X u, v = cu, v + c u, v + cu, v + c u, v,.8

leads to a new copula satisfying c_{Θ,X}(0, v) = c_{Θ,X}(1, v), v ∈ [0, 1]. This construction also satisfies c_{Θ,X}(u, 0) = c_{Θ,X}(u, 1), u ∈ [0, 1], and therefore provides a circular-circular copula which is also circular-linear. A parametrized copula density c^α_{Θ,X} can be obtained by considering, for example, the parametric Frank copula, whose density is

c_α(u, v) = α(1 − e^{−α}) e^{−α(u+v)} / [(1 − e^{−α}) − (1 − e^{−αu})(1 − e^{−αv})]^2, α ≠ 0.

The mixture copula (2.8) will be referred to as the reflected copula of c. The parameter α also measures the deviation from independence, which arises as a limit case when α tends to zero. The copula density surface can be seen in Figure 2.3 (right plot). In this example, the copula density surface shows five modes, concentrated in the corners and the middle point of the unit square, and the peakedness of the modes increases as α grows. These three families will be considered in the simulation study. It should also be mentioned that the copula representation poses some computational advantages when it comes to reproducing by simulation data samples from circular-linear distributions (see Section 2.3.1). Finally, although this work is focused on the circular-linear case, some comments will also be made about circular-circular distributions.

Figure 2.3: Copula density surfaces. Left plot: J&W copula with a von Mises joining density with mean µ = π. Middle plot: QS copula. Right plot: reflected Frank copula.
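As an illustration of the reflection construction (2.8), the following R sketch builds a reflected Frank copula density from the standard Frank copula density. It is a minimal sketch under assumptions (the value of α is arbitrary), not the code behind Figure 2.3.

```r
# Reflected copula density (2.8): average of the four reflections of a base
# copula density, which enforces c(0, v) = c(1, v) and c(u, 0) = c(u, 1)
frank_density <- function(u, v, alpha) {
  num <- alpha * (1 - exp(-alpha)) * exp(-alpha * (u + v))
  den <- ((1 - exp(-alpha)) - (1 - exp(-alpha * u)) * (1 - exp(-alpha * v)))^2
  num / den
}

reflected_density <- function(u, v, base, ...) {
  (base(u, v, ...) + base(1 - u, v, ...) +
   base(u, 1 - v, ...) + base(1 - u, 1 - v, ...)) / 4
}

u <- v <- seq(0, 1, length.out = 51)
z <- outer(u, v, reflected_density, base = frank_density, alpha = 10)
persp(u, v, z, xlab = "u", ylab = "v", zlab = "copula density")
```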

48 8 Chapter. Density estimation by circular-linear copulas ii. Compute an artificial sample { ˆΨ Θi, ˆF Xi } n i= and estimate the copula density ĉ Θ,X. iii. Obtain the circular-linear density estimator as ˆp θ, x = ĉ Θ,X ˆΨθ, ˆF x ˆϕθ ˆfx. The estimation of the marginal densities in step i can be done by parametric methods or by nonparametric procedures. For instance, a parametric estimator for ˆf respectively, for ˆF can be obtained by MLE. In the circular case, that is, for obtaining ˆϕ, MLE approaches are also possible see Jammalamadaka and SenGupta, Chapter. These estimators are consistent, although restricted to parametric families such as the von Mises distribution or mixtures of von Mises. Nonparametric kernel density estimation for linear random variables was introduced by Parzen and Rosenblatt see Wand and Jones 995 for references on kernel density estimation and the properties of this estimator have been well studied in the statistical literature. Consider {X i } n i= a random sample of a linear variable X with density f. The kernel density estimator of f in a point x R is given by ˆf h x = nh n x Xi K h i=,.9 where K is a kernel function usually a symmetric and unimodal density and h is the bandwidth parameter. One of the crucial problems in kernel density estimation is the bandwidth choice. There exist several alternatives for obtaining a global bandwidth minimizing a certain error criterion, usually the Mean Integrated Squared Error MISE, such as the rule-of-thumb, least-squares cross-validatory procedures see Wand and Jones 995 or other plug-in rules, like the one proposed by Sheather and Jones 99. Hall et al. 987 introduced a nonparametric kernel density estimator for directional data in the q-dimensional sphere S q. For the circular case q =, denoting by Θ a random variable with density ϕ, the circular kernel density estimation from a sample {Θ i } n i= is given by ˆϕ ν θ = c ν n n L ν cosθ Θ i, θ, π,. i= where L is the circular kernel, ν is the circular bandwidth and c ν is a constant such that ˆϕ ν is a density. Some differences should be noted in contrast to the linear kernel density estimator in.9. First, the kernel function L must be a rapidly varying function, such as the exponential see Hall et al Secondly, the behaviour of ν is opposite to h: in linear kernel density estimation, small values of the bandwidth h produce undersmoothed estimators small values of ν oversmooth the density, whereas large values of h give oversmoothed curves large values of ν produce undersmoothing. See Hall et al. 987 for a detailed description of the estimator and its properties. As in the linear case, bandwidth selection is also an issue in circular kernel density estimation. Although in the linear case it is a well-studied problem, for circular density estimation there are still some open questions. Hall et al. 987 proposed selecting the smoothing parameter by maximum likelihood cross-validation. There are other recent proposals, such as the automatic bandwidth selection method introduced by Taylor 8, but based on his results none of the selectors proposed seems to show a superior behaviour.

Although for the marginal distributions it may be reasonable to assume a parametric model, this is not that clear for the copula function, which captures the dependence structure between the marginals. Hence, in a general situation, the copula estimation in step ii should be carried out by a nonparametric procedure, which is explained below. However, for the J&W density in (2.3), the copula density c_{Θ,X} is linked with a joining circular density g through (2.6), and this circular density can be estimated in the same way as the marginal circular density. Note that, in this family, all the estimators involved in the algorithm are obtained in a strictly univariate way, which simplifies their computation.

Figure 2.4: Illustration of the data reflection for copula density estimation. The central square in each plot corresponds to the original data ranks. Left plot: mirror reflection from Gijbels and Mielniczuk (1990), with both u and v treated as linear ranks. Right plot: circular-mirror reflection for the estimator (2.11), with u treated as circular ranks and v as linear ranks.

Nonparametric copula density estimation can also be done by kernel methods, as proposed by Gijbels and Mielniczuk (1990). Their estimator is similar to the classical bivariate kernel density estimator, with a product kernel and a mirror-image data modification. This mirror image (see Figure 2.4, left plot) consists in reflecting the data with respect to all the edges and corners of the unit square, in order to reduce the edge effect. In our particular case of circular-linear copula densities, the reflection must be done accounting for the circular nature of the first component, as shown in Figure 2.4 (right plot). The copula density kernel estimator can be defined as

ĉ_{Θ,X}(u, v) = (1/n) Σ_{i=1}^n Σ_{l=1}^9 K_B(u − Ψ̂(Θ_i)^{(l)}, v − F̂(X_i)^{(l)}), (2.11)

where Ψ̂ and F̂ are the marginal kernel distribution functions, K_B(u, v) = |B|^{−1/2} K(B^{−1/2}(u, v)^T) is a bivariate rescaled kernel (see Ruppert and Wand, 1994, for this notation), B is the bandwidth matrix and |B| denotes its determinant. A plug-in rule for selecting the bandwidth matrix B was proposed by Duong and Hazelton (2003). Writing (u, v) = (Ψ̂(Θ_i), F̂(X_i)), the reflected data sample in each square l = 1, ..., 9, namely {(Ψ̂(Θ_i)^{(l)}, F̂(X_i)^{(l)})}_{i=1}^n, is obtained by rows in the plot (Figure 2.4, right plot) as follows: (u − 1, −v), (u, −v), (u + 1, −v) (bottom row); (u − 1, v), (u, v), (u + 1, v) (middle row); (u − 1, 2 − v), (u, 2 − v), (u + 1, 2 − v) (top row).
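The circular-mirror reflection and a kernel estimate on the reflected pseudo-sample can be sketched as follows. This is an illustration under assumptions: pseudo denotes a hypothetical n × 2 matrix of pseudo-observations (Ψ̂(Θ_i), F̂(X_i)), and the ks package (with its plug-in bandwidth Hpi) is used as a stand-in for the bivariate kernel estimator and the plug-in bandwidth matrix, not necessarily the implementation used in the paper.

```r
library(ks)

# Circular-mirror reflection for (2.11): periodic copies in the circular
# component u and mirror copies in the linear component v (9 copies in total)
reflect_cl <- function(pseudo) {
  u <- pseudo[, 1]; v <- pseudo[, 2]
  out <- NULL
  for (s in c(-1, 0, 1))                  # shifts u - 1, u, u + 1
    for (w in list(-v, v, 2 - v))         # reflections -v, v, 2 - v
      out <- rbind(out, cbind(u + s, w))
  out
}

# pseudo: hypothetical n x 2 matrix of (Psi_hat(Theta_i), F_hat(X_i))
reflected <- reflect_cl(pseudo)
H   <- Hpi(pseudo)                        # plug-in bandwidth matrix
fit <- kde(x = reflected, H = H, xmin = c(0, 0), xmax = c(1, 1))
# kde() averages over the 9n reflected points; rescaling by 9 recovers (2.11)
c_hat <- 9 * fit$estimate
```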

50 Chapter. Density estimation by circular-linear copulas As pointed out by Omelka et al. 9, the estimator proposed by Gijbels and Mielniczuk 99 in the linear case still suffers from corner bias, which could be corrected by bandwidth shrinking. This issue is not so important in the circular-linear setting, given the periodicity condition. It should be also noticed that, with this construction, the estimator obtained in step iii, with the proposed reflection, is guaranteed to be a density as long as the distribution and density estimators satisfy that ˆF = ˆf and ˆΨ = ˆϕ. The estimation algorithm can be extended for circular-circular densities, just with suitable circular density estimators for the marginals and a slight modification of the data reflection for the copula estimation. Specifically, reflection in scheme as the one shown in Figure., the middle one corresponding to the original circular-circular data quantiles will be done as follows: u, v, u, v, u +, v bottom row; u, v, u, v, u +, v middle row; u, v, u, v, u +, v top row... Some simulation results In order to check the performance of the estimation algorithm for circular-linear densities, the following scenarios are reproduced. Examples. and. were proposed by Johnson and Wehrly 978. Example. corresponds to the QS-copula family given by.7 and Example. to the reflected Frank copula constructed in.8. Example. J&W copula with circular uniform and normal marginal distributions. Let ϕ U denote the circular uniform density. and φ the standard normal density Φ the standard normal distribution. Take the joining density g = ϕ vm ; µ, κ. A circular-linear density with marginals ϕ and φ is given by p θ, x = I κ exp {κ cosθ πφx µ} ϕ Uθφx. Example. J&W copula with von Mises and normal marginal distributions. Consider g = ϕ vm ; µ, κ and the density marginals φ and ϕ vm ; µ, κ with corresponding distributions Φ and Ψ vm ; µ, κ, respectively. A joint circular-linear density is given by p θ, x = I κ exp { κ cos πψ vm θ; µ, κ Φx µ } ϕ vm θ; µ, κ φx. Example. QS-copula with von Mises and normal marginal distributions. Take ϕ vm ; µ, κ and φ as marginals. The circular-linear density with copula.7 and α = π is given by p θ, x = + π cosπψ vmθ; µ, κ Φx ϕ vm θ; µ, κ φx. Example. Reflected Frank copula with von Mises and normal marginal distributions. Take ϕ vm ; µ, κ and φ as marginals. The circular-linear density with reflected Frank copula α = is obtained by the mixture construction.8: p θ, x = c α Θ,X Ψ vm θ; µ, κ, Φx ϕ vm θ; µ, κ φx. As commented in Section, the formulation of the joint circular-linear density in terms of copulas simplifies the simulation of random samples. The general idea is to split the joint distribution

51 Density Density Density Density.. Estimation algorithm P by Sklar s theorem in a copula C Θ,X and marginals Ψ and F. Therefore, if we simulate a sample from U, V uniform random variables with copula C Θ,X and we apply the marginal quantiles transformations, then Ψ U, F V will be a sample from the distribution P. The simulation of U, V values from the copula C Θ,X can be performed by the conditional method for simulating multivariate distributions see Johnson 987. The conditional distribution of V given U = u, denoted by C u, can be expressed as C u v = C Θ,Xu, v u = v c Θ,X u, t dt,. where the first equality is an immediately property of copulas. So, for simulating random samples for the examples, or more generally, for simulating random samples of circular-linear random variables with density.5, we may proceed with the following algorithm. x x theta theta x x theta theta Figure.5: Density surfaces for the simulation study. Top left: Example. with µ = π and κ =. Top right: Example. with µ = π, κ = 5, µ = π/ and κ =. Bottom left: Example. with α = π, µ = π/ and κ =.5. Bottom right: Example. with α =, µ = π/ and κ =.5.

52 Chapter. Density estimation by circular-linear copulas Algorithm. Simulation algorithm. i. Simulate U, W U,, where U, stands for the standard uniform distribution. ii. Compute V = C u W. iii. Obtain Θ = Ψ U, X = F V. In step i, two independent and uniformly distributed random variables are simulated. The conditional simulation method for obtaining U, V from the circular-linear copula C Θ,X is performed in step ii. Finally, quantile transformations from the marginals are applied, obtaining a sample from Θ, X following the joint density.. For Examples. and., the conditional distribution C u in. is related to a von Mises vmµ, κ and in step ii, for each u, V = π Ψ vm W ; πu µ, κ. For Example., for each u, V = a+ a+ aw a, with a = πα cos πu. Example. is simulated from the mixture of copulas.8, using that for the Frank copula, step ii becomes V = α log + W e α W e αu e αu. The estimation algorithm proposed will be applied to estimate the densities in the examples. Different versions of the estimation algorithm can be implemented, considering parametric and nonparametric estimation methods. In addition, for the J&W densities Examples. and., the estimation of the circular-linear density can be approached by representation., in terms of a circular joining density, or by the more general representation.5. Summarizing, the following variants of the estimation algorithm will be presented: i. J&W, parametric JWP: based on representation., marginals as well as joining density are parametrically estimated. The results will be used as a benchmark for the J&W models Examples. and.. ii. J&W, semiparametric JWSP: based on representation., marginals are estimated parametrically and a nonparametric kernel method is used for the joining density. iii. J&W, nonparametric JWNP: based on representation., marginals and joining density are estimated by kernel methods. iv. Copula, semiparametric CSP: based on representation.5, parametric estimation is considered for marginals. The copula density is estimated by kernel methods. v. Copula, nonparametric CNP: based on representation.5, marginals and copula density are estimated by kernel methods. In the parametric case, density estimators have been obtained by MLE, specifying the von Mises family for the circular distributions and the normal family for the linear marginal. Nonparametric estimation has been carried out using kernel methods. The kernel density estimator in.9, with Gaussian kernel and Sheather and Jones 99 bandwidth, has been used for obtaining ˆf. For ˆϕ and ĝ, the circular kernel density. has been implemented, with exponential kernel and likelihood cross-validatory bandwidth. In the semiparametric approaches parametric marginals and nonparametric joining density or copula, Maximum Likelihood has been used for obtaining ˆϕ and ˆf. The circular kernel estimator. has been considered for ĝ. The copula density kernel estimator. has been used for ĉ Θ,X, with bivariate Gaussian kernel and restricted bandwidth matrix. Specifically, following the procedure of Duong and Hazelton,

53 .. Estimation algorithm full bandwidth matrices have been tried, but with non significant differences in the values of the principal diagonal. Hence, a restricted bandwidth with two smoothing parameter values is used, considering the same element in the principal diagonal and a second element in the secondary diagonal, regarding for the kernel orientation. In order to check the performance of the procedure for estimating circular-linear densities, the MISE criterion is considered: MISE = π E ˆpθ, x pθ, x dx dθ. The MISE is approximated by Monte Carlo simulations, taking replicates. Four sample sizes have been used: n = 5, n =, n = 5 and n =. In the first example, the set-up parameters are µ = π and κ =. For the second example, we take µ = π, κ = 5, µ = π/ and κ =. Both in Examples. and. the parameters in the von Mises marginal were set to µ = µ = π/ and κ = κ =.5, and the linear marginal is a standard normal. J&W-based Copula-based Relative efficiency n JWP JWSP JWNP CSP CNP JWSP JWNP CSP CNP Example Example Table.: MISE for estimating the circular-linear density in Examples. and.. efficiencies for JWSP, JWNP, CSP and CNP are taken with respect to JWP. Relative J&W-based Copula-based n JWSP JWNP CSP CNP Example Example Table.: MISE for estimating the circular-linear density in Examples. and..

54 Chapter. Density estimation by circular-linear copulas In Figure.5, surface plots for the example densities are shown, the top row corresponding to the J&W family Example. and Example.. Simulation results for these examples can be seen in Table.. For all the alternatives of the algorithm, the MISE is reduced when increasing the sample size. Example. presents higher values for the MISE, and it is due to the estimation of a more complex structure in the circular marginal density circular uniform in Example. and von Mises in Example.. In both models, the estimation methods providing the information about the J&W structure that is, based on representation. work better, as expected. Nevertheless, the copula based approaches, CSP and CNP, are competitive with the JWNP. JWP JWSP JWNP CSP CNP ISE x JWP JWSP JWNP CSP CNP ISE x JWSP JWNP CSP CNP ISE x JWSP JWNP CSP CNP.5..5 ISE x Figure.6: Boxplots of the ISE for Example. top left, Example. top right, Example. bottom left and Example. bottom right for n = 5 and different estimation procedures. The parametric method JWP presents the lowest MISE values for all sample sizes in both examples, so it will be taken as a benchmark for computing the relative efficiencies of the nonparametric and semiparametric approaches, both based on the density. or on the copula density.5. Relative efficiencies are obtained as the ratio between the MISE of the parametric method and the MISE of the nonparametric and semiparametric procedures. The relative efficiencies see Table. are higher for the semiparametric approach, with better results for Example.. Boxplots for the ISE for sample size n = 5 can be seen in Figure.6, top row.

55 .. Application to wind direction and SO concentration 5 The larger variability of the nonparametric methods JWNP and CNP can be appreciated for both examples. For Example. see Figure.6, top right plot, note that JWNP shows the highest values for the ISE. Obviously, the first three proposals JWP, JWSP, JWNP make sense for distributions belonging to the J&W family. However, since the marginals considered in Examples. and. also belong to the von Mises circular and the normal linear classes, as in Examples. and., JWSP, JWNP, CSP and CNP variants of the algorithm will be applied in the last two cases. Results are reported in Table.. MISE values decrease with sample size, showing CSP and CNP a similar behaviour. Note also that assuming a representation like., increases dramatically the MISE: for instance, in Example., for n = 5 or n =, the MISE for JWSP or JWNP is four times the one provided by CNP. The effect of this misspecification in the copula structure can be clearly seen in the bottom row of Figure.6.. Application to wind direction and SO concentration The goal of this work is to explore the relation between wind incidence direction and SO concentration in monitoring station B near a power plant see Figure. for location of station B. SO is measured in µg/m and wind direction as a counterclockwise angle in, π. With this codification,, π, π and π represent east, north, west and south direction, respectively. The dataset contains observations recorded minutely in January and January, but due to technical limitations in the measuring device, SO is only registered when it is higher than µg/m. Concentration values below this threshold are considered as non significant and are recorded as the lower detection limit µg/m. Data have been hourly averaged, resulting 76 observations for and 7 for. In order to avoid repeated data, perturbation procedures have been applied to both marginals, and will be detailed below. Afterwards, a Box-Cox transformation for the SO concentration with λ =.8 for and λ = 7. for have been applied. For the sake of simplicity, we will refer to these transformed data as SO concentration, but note that figures are shown in the transformed scale. Measurement devices, both for the wind direction and for SO concentration, did not present a sufficient precision to avoid repeated data, and this problem was inherited also for the hourly averages. The appearance of repeated measurements posed a problem in the application of the procedure, specifically, in the bandwidth computation. Perturbation in the linear variable, the SO concentration, was carried out following Azzalini 98. A pseudo-sample of SO levels is obtained as follows: X i = X i + bε i, where X i denote the observed values, b =.ˆσn / and ε i, i =,..., n, are independent and identically distributed random variables from the Epanechnikov kernel in 5, 5. ˆσ is a robust estimator of the variance, which has been computed using the standardized interquartile range. Azzalini 98 showed that this choice of b for the data perturbation allows for consistent estimation of the distribution function, getting a mean squared error with the same magnitude as the one from the empirical cumulative distribution function. The problem of repeated measures also occurs for wind direction. In this case, a perturbation procedure similar to the linear variable case can be used, just considering the circular variable

56 6 Chapter. Density estimation by circular-linear copulas the angle in the real line. Then, a pseudo-sample of wind direction was obtained as θ i = θ i + dε i, with θ i denoting the wind direction measurements and ε i, i =,..., n, were independently generated from a von Mises distribution with µ = and κ =, with d = n /. We have checked by simulations that the applied perturbation did not affect the distribution of the data... Exploring SO and wind direction in and The estimation procedure is applied to data from and in station B, considering the fully nonparametric approach. Specifically, a nonparametric kernel density estimator for the SO concentration was used, with the plug-in rule bandwidth obtained by method of Sheather and Jones 99 h =.75 6 for, h =. for. For the wind direction, circular kernel density estimation has been also performed, with likelihood cross-validatory bandwidth ν = 6. for, ν = for. In Figure.7, the estimation of the circular-linear density surfaces, with the corresponding contour plots, is shown. For two modes in the SW and NE directions see Figure.7, right column can be identified, both with a similar behaviour and with quite low values of SO concentrations. Recall that the scale is Box-Cox transformed for data in the original scale, see Figure.. A different situation occurs in see Figure.7, left column where, in addition to the two modes that appear in for low values of SO, a third mode arises. This mode is related to winds blowing from SW from the thermal power plant and to SO values significantly higher than for. This relation suggests that the additional mode represents SO pollutants coming from the power plant, and its disappearance for illustrates the effectiveness of the control measures applied during 5 8 to reduce the SO emissions from the power plant..5 Final comments A flexible algorithm for estimating circular-linear densities is proposed based on a copula density representation. The method provides a completely nonparametric estimator, but it can be modified to accommodate the classical Johnson and Wehrly family of circular-linear distributions. In the purely nonparametric version of the algorithm, circular and linear kernel density estimators have been used for the marginals, although other nonparametric density estimators could be considered. In addition, the extension of the algorithm for circular-circular density estimation is straightforward. In our air quality data application the precision of the measurement devices posed some extra problems in the data analysis. The lack of precision resulted in the appearance of repeated values for the wind direction, and a data perturbation procedure was needed in order to apply the algorithm. The perturbation method proposed has been checked empirically, and it is inspired by the results for kernel distribution estimation, but our guess is that similar results could be obtained with just perturbing the data by summing errors from a highly concentrated distribution e.g. a von Mises distribution with large κ. Nevertheless, data perturbation in the circular setting needs further investigation. Another possible problem that may be encountered in practice, for linear variables, is censoring, that may be due to detection limits or other

Another possible problem that may be encountered in practice, for linear variables, is censoring, which may be due to detection limits or other phenomena. Under censoring, the observation values are only partially known, and suitable procedures for density estimation with censored data should be applied.

Figure .7: Circular-linear density estimator for wind direction and SO2 concentration in monitoring station B. Left column: first year. Right column: second year. (Axes: wind direction in [0, 2π), Box-Cox-transformed SO2 concentration and estimated density; each column shows the density surface and its contour plot.)

Finally, a natural criticism of our analysis of the air quality data is that the measurements were not independent of one another. Temporal dependence could be accounted for in the estimation procedure by using a proper bandwidth selector, such as a k-fold cross-validatory bandwidth for the linear kernel density estimator. However, to the authors' knowledge no such alternatives exist for circular data, and the study of bandwidth selection rules for dependent circular data is beyond the scope of this paper.

The simulation study and the real data analysis have been carried out in R (R Development Core Team), using self-programmed code and the package circular (Agostinelli and Lund). For the real data analysis, the computing time for the first year is .77 seconds, with the computation of the copula estimator taking 8.58 seconds. The same procedures take .57 seconds for the second-year data. All computations were done on a computer with a .6 GHz core. This shows that the computational cost of the method is not high and that its application is feasible in practice.
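For reference, marginal fits of the kind used above can be reproduced with a few lines of standard R. The sketch below is illustrative only: the data are simulated stand-ins for the hourly SO2 and wind series, the Box-Cox λ is a placeholder, and the bandwidth selectors are the Sheather-Jones plug-in rule from base R and the likelihood cross-validation routine that, to the best of our recollection, the circular package provides.

# Illustrative sketch of the two marginal estimators (simulated placeholder data).
library(circular)

set.seed(1)
so2  <- rgamma(500, shape = 2, rate = 0.5)   # stand-in for hourly SO2 levels
wind <- runif(500, 0, 2 * pi)                # stand-in for hourly wind directions

# Linear marginal: Box-Cox transform and kernel density with Sheather-Jones bandwidth.
boxcox <- function(x, lambda) if (lambda == 0) log(x) else (x^lambda - 1) / lambda
so2.t  <- boxcox(so2, lambda = 0.2)          # lambda is a placeholder value
h.sj   <- bw.SJ(so2.t)
f.lin  <- density(so2.t, bw = h.sj)

# Circular marginal: von Mises kernel density with likelihood cross-validated bandwidth.
wind.c <- circular(wind, units = "radians")
nu.lcv <- bw.cv.ml.circular(wind.c)
f.circ <- density.circular(wind.c, bw = nu.lcv)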

58 8 Chapter. Density estimation by circular-linear copulas Acknowledgements The authors acknowledge the support of Project MTM8, from the Spanish Ministry of Science and Innovation and Project MDS75PR from Dirección Xeral de I+D, Xunta de Galicia. Work of E. García-Portugués has been supported by FPU grant AP 957 from the Spanish Ministry of Education. We also thank the referee and the Associate Editor for providing constructive comments and help in improving this paper. References Agostinelli, C. and Lund, U.. R package circular: circular statistics version.-. Azzalini, A. 98. A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika, 68:6 8. Bayraktar, H., Turalioğlu, F. S., and Tuncel, G.. Average mass concentrations of TSP, PM and PM.5 in Erzurum urban atmosphere, Turkey. Stoch. Environ. Res. Risk Assess., : Chen, C.-C., Wu, K.-Y., and Chang-Chien, G.-P.. Point source identification using a simple permutation test: a case study of elevated PCDD/F levels in ambient air and soil and their relation to the distance to a local municipal solid waste incinerator. Stoch. Environ. Res. Risk Assess., 6:5. Deschepper, E., Thas, O., and Ottoy, J. P. 8. Tests and diagnostic plots for detecting lack-of-fit for circular-linear regression models. Biometrics, 6:9 9. Duong, T. and Hazelton, M. L.. Plug-in bandwidth matrices for bivariate kernel density estimation. J. Nonparametr. Stat., 5:7. Fernández-Durán, J. J.. Circular distributions based on nonnegative trigonometric sums. Biometrics, 6:99 5. Fernández-Durán, J. J. 7. Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums. Biometrics, 6: Gijbels, I. and Mielniczuk, J. 99. Estimating the density of a copula function. Comm. Statist. Theory Methods, 9:5 6. Hall, P., Watson, G. S., and Cabrera, J Kernel density estimation with spherical data. Biometrika, 7: Jammalamadaka, S. R. and Lund, U. J. 6. The effect of wind direction on ozone levels: a case study. Environ. Ecol. Stat., : Jammalamadaka, S. R. and SenGupta, A.. Topics in circular statistics, volume 5 of Series on Multivariate Analysis. World Scientific Publishing, River Edge. Johnson, M. E Multivariate statistical simulation. Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics. John Wiley & Sons, New York.

59 References 9 Johnson, R. A. and Wehrly, T. E Some angular-linear distributions and related regression models. J. Amer. Statist. Assoc., 76:6 66. Mardia, K. V. and Jupp, P. E.. Directional statistics. Wiley Series in Probability and Statistics. John Wiley & Sons, Chichester, second edition. Nazemi, A. and Elshorbagy, A.. Application of copula modelling to the performance assessment of reconstructed watersheds. Stoch. Environ. Res. Risk Assess., 6:89 5. Nelsen, R. B. 6. An introduction to copulas. Springer Series in Statistics. Springer, New York, second edition. Omelka, M., Gijbels, I., and Veraverbeke, N. 9. Improved kernel estimation of copulas: weak convergence and goodness-of-fit testing. Ann. Statist., 75B: 58. R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. Ruppert, D. and Wand, M. P. 99. Multivariate locally weighted least squares regression. Ann. Statist., :6 7. Sheather, S. J. and Jones, M. C. 99. A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. Ser. B, 5: Shih, D. C.-F. 8. Wind characterization and potential assessment using spectral analysis. Stoch. Environ. Res. Risk Assess., :7 56. Taylor, C. C. 8. Automatic bandwidth selection for circular density estimation. Comput. Statist. Data Anal., 57:9 5. Wand, M. P. and Jones, M. C Kernel smoothing, volume 6 of Monographs on Statistics and Applied Probability. Chapman & Hall, London. Wehrly, T. E. and Johnson, R. A Bivariate models for dependence of angular observations and a related Markov process. Biometrika, 67:55 56.


Chapter

Kernel density estimation for directional-linear data

Abstract

A nonparametric kernel density estimator for directional-linear data is introduced. The proposal is based on a product kernel accounting for the different nature of the directional and linear components of the random vector. Expressions for the bias, the variance and the Mean Integrated Squared Error (MISE) are derived, together with an asymptotic normality result for the proposed estimator. For some particular distributions, an explicit formula for the MISE is obtained and compared with its asymptotic version, both for directional and directional-linear kernel density estimators. In this same setting, a closed expression for the bootstrap MISE is also derived.

Reference

García-Portugués, E., Crujeiras, R. M. and González-Manteiga, W. (2013). Kernel density estimation for directional-linear data. J. Multivariate Anal., 121:152-175.

Contents

  Introduction
  Background on linear and directional kernel density estimation
    Kernel density estimation for directional data
    Kernel density estimation for directional-linear data
  Main results
  Error measurement and optimal bandwidth
    MISE for directional and directional-linear kernel density estimators
    Some exact MISE calculations for mixture distributions
  Conclusions
  A Some technical lemmas
  B Proofs of the main results
  C Proofs of the technical lemmas
  References

62 Chapter. Kernel density estimation for directional-linear data. Introduction Kernel density estimation, and kernel smoothing methods in general, is a classical topic in nonparametric statistics. Starting from the first papers by Akaike 95, Rosenblatt 956 and Parzen 96, extensions of the kernel density methodology have been brought up in different contexts, dealing with other smoothers, more complex data censorship, truncation, dependence or dynamical models see Müller 6 for a review. Some comprehensive references in this topic include the books by Silverman 986, Scott 99 and Wand and Jones 995, among others. Beyond the linear case, kernel density estimation has been also adapted to directional data, that is, data in the q-dimensional sphere see Jupp and Mardia 989 for a complete review of the theory of directional statistics. Hall et al. 987 defined two type of kernel estimators and give asymptotic formulae of bias, variance and square loss. Almost simultaneously, Bai et al. 988 established the pointwise, uniformly strong consistency and L consistency of a quite similar estimator in the same context. Later, Zhao and Wu stated a central limit theorem for the integrated squared error of the previous kernel density estimator based on the U-statistic martingale ideas developed by Hall 98. Some of the results by Hall et al. 987 were extended by Klemelä, who studied the estimation of the Laplacian of the density and other types of derivatives. All these references consider the data lying on a general q-sphere of arbitrary dimension q, which comprises as particular cases circular data q = and spherical data q =. For the particular case of circular data, there are more recent works dealing with the problem of smoothing parameter selection in kernel density estimation, such as Taylor 8 and Oliveira et al.. Di Marzio et al. study the kernel density estimator on the q-dimensional torus, and propose some bandwidth selection methods. A more general approach has been followed by Hendriks 99, who discusses the estimation of the underlying distribution by means of Fourier expansions in a Riemannian manifold. This differential geometry viewpoint has been exploited recently by Pelletier 5 and Henry and Rodriguez 9. Nevertheless, the original approach seems to present a good balance between generality and complexity. The aim of this work is to introduce and derive some basic properties of a joint kernel density estimator for directional-linear data, i.e. data with a directional and a linear component. This type of data arise in a variety of applied fields such as meteorology when analysing the relation between wind direction and wind speed, oceanography in the study of sea currents and environmental sciences, among others. As an example, such an estimator has been used by García-Portugués et al. for studying the relation between pollutants and wind direction in the presence of an emission source. Specifically, the novelty of this work comprises the analysis of asymptotic properties of the directional-linear kernel density estimator, deriving bias, variance and asymptotic normality. As a by-product, the Mean Integrated Squared Error MISE follows, as well as the expression for optimal Asymptotic MISE AMISE bandwidths. In addition, for a particular class of densities consisting of mixtures of directional von Mises and normals, it is possible to compare the AMISE with the exact MISE. 
These results have also been obtained for the purely directional case, considering mixtures of von Mises distributions on the q-dimensional sphere, thereby completing the existing results for directional data.

This paper is organized as follows. Section . presents some background on kernel density estimation for linear and directional data. The proposed directional-linear kernel density estimator and the main results of this paper are included in Section ., where the bias, variance and asymptotic normality are derived. Section . focuses on the issue of error measurement; expressions for the AMISE of the estimator and for the exact MISE in particular cases of mixtures are obtained, both in the directional and directional-linear contexts. Conclusions and final comments are given in Section .5. The proofs of the results and some technical lemmas are given in the Appendix.

. Background on linear and directional kernel density estimation

This section is devoted to a brief introduction to kernel density estimation for linear and directional data. For the sake of simplicity, f will denote the target density in this paper, which may be linear, directional, or directional-linear, depending on the context.

Let Z denote a linear random variable with support supp(Z) ⊆ R and density f, and let Z_1, ..., Z_n be a random sample of Z of size n. The linear kernel density estimator, introduced by Akaike (1954), Rosenblatt (1956) and Parzen (1962), is defined as

\hat{f}_g(z) = \frac{1}{ng} \sum_{i=1}^{n} K\left(\frac{z - Z_i}{g}\right),  z \in R,   (.1)

where K denotes the kernel, usually a symmetric density about the origin, and g > 0 is the bandwidth parameter, which controls the smoothness of the estimator. Specifically, large values of the bandwidth parameter produce oversmoothed estimates of f, whereas small values provide undersmoothed curves. The asymptotic properties of this estimator and its adaptation to different contexts have yielded a remarkably prolific field within the statistical literature, as noted in the introduction. It is well known that, under some regularity conditions on the kernel and the target density, the bias of the estimator (.1) is of order O(g^2), whereas the variance is of order O((ng)^{-1}), clearly showing the need to account for a trade-off between bias and variance in any bandwidth selection procedure. Specifically, the expected value of the linear kernel estimator at z \in R is

E[\hat{f}_g(z)] = f(z) + \frac{1}{2} \mu_2(K) f''(z) g^2 + o(g^2),

where \mu_p(K) = \int_R z^p K(z) dz represents the p-th moment of the kernel K. Similarly, the variance of (.1) at z \in R is given by

Var[\hat{f}_g(z)] = \frac{R(K) f(z)}{ng} + o((ng)^{-1}),

where R(K) = \int_R K(z)^2 dz. Further details on these computations for the linear kernel density estimator can be found in Wand and Jones (1995).
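For illustration, (.1) can be coded directly in a few lines of R; in practice one would rely on density(), but the sketch below (with arbitrary sample, grid and bandwidth values) makes the role of the bandwidth g explicit.

# Direct implementation of the linear kernel density estimator (.1), for illustration.
kde_linear <- function(z, data, g, K = dnorm) {
  sapply(z, function(z0) mean(K((z0 - data) / g)) / g)
}

set.seed(1)
x    <- rnorm(200)
grid <- seq(-4, 4, length.out = 201)
f.under <- kde_linear(grid, x, g = 0.1)        # small g: undersmoothed, wiggly estimate
f.over  <- kde_linear(grid, x, g = 1.5)        # large g: oversmoothed estimate
f.sj    <- kde_linear(grid, x, g = bw.SJ(x))   # data-driven bandwidth, for comparison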

.. Kernel density estimation for directional data

As previously mentioned, kernel density estimation has also been adapted to other contexts, such as directional data, that is, data on a q-dimensional sphere, with circular data (q = 1) and spherical data (q = 2) as particular cases. Let X denote a directional random variable with density f. The support of such a variable is the q-dimensional sphere, denoted by Ω_q = {x \in R^{q+1} : x_1^2 + ··· + x_{q+1}^2 = 1}. The Lebesgue measure on Ω_q will be denoted by ω_q and, therefore, a directional density satisfies \int_{Ω_q} f(x) ω_q(dx) = 1.

Remark .. When there is no possible misunderstanding, ω_q will also denote the surface area of Ω_q:

ω_q = ω_q(Ω_q) = \frac{2 π^{(q+1)/2}}{Γ((q+1)/2)},  q ≥ 1,

where Γ represents the Gamma function, defined as Γ(p) = \int_0^∞ x^{p-1} e^{-x} dx, for p > 0.

The directional kernel density estimator was proposed by Hall et al. (1987) and Bai et al. (1988), following two different perspectives in the treatment of directional data. In this paper, the definition in Bai et al. (1988) will be considered, although it can also be related to one of the proposals in Hall et al. (1987). Given a random sample X_1, ..., X_n of a directional variable X with density f, the directional kernel density estimator is given by

\hat{f}_h(x) = \frac{c_{h,q}(L)}{n} \sum_{i=1}^{n} L\left(\frac{1 - x^T X_i}{h^2}\right),  x \in Ω_q,   (.2)

where L is the directional kernel, h > 0 is the bandwidth parameter and c_{h,q}(L) is a normalizing constant depending on the kernel L, the bandwidth h and the dimension q. The scalar product of two vectors x and y is denoted by x^T y, where T is the transpose operator. In this setting, directional kernels are not directional densities but rapidly decaying functions. Therefore, to ensure that the resulting estimator is indeed a directional density, the normalizing constant c_{h,q}(L) is needed. Specifically (see Bai et al. (1988)), the inverse of this normalizing constant for any x \in Ω_q is given by

c_{h,q}(L)^{-1} = \int_{Ω_q} L\left(\frac{1 - x^T y}{h^2}\right) ω_q(dy) = h^q λ_{h,q}(L) ~ h^q λ_q(L),   (.3)

with λ_{h,q}(L) = ω_{q-1} \int_0^{2h^{-2}} L(r) r^{q/2-1} (2 - r h^2)^{q/2-1} dr and λ_q(L) = 2^{q/2-1} ω_{q-1} \int_0^∞ L(r) r^{q/2-1} dr. The asymptotic behaviour of λ_{h,q}(L) is established in Lemma ., and the notation a_n ~ b_n indicates that a_n / b_n → 1 as n → ∞ (see also Bai et al. (1988) and Zhao and Wu (2001)).

Properties of the directional kernel density estimator (.2) have been analysed by Bai et al. (1988), who proved pointwise, uniform and L^1-norm consistency. A central limit theorem for the integrated squared error of the estimator has been established by Zhao and Wu (2001), as well as the expression of the bias under some regularity conditions, stated below:

D1. Extend f from Ω_q to R^{q+1} \ {0} by defining f(x) ≡ f(x / ||x||) for all x ≠ 0, where ||·|| denotes the Euclidean norm. Assume that the gradient vector ∇f(x) = (∂f(x)/∂x_1, ..., ∂f(x)/∂x_{q+1})^T and the Hessian matrix Hf(x) = (∂^2 f(x)/(∂x_i ∂x_j))_{1 ≤ i,j ≤ q+1} exist, are continuous on R^{q+1} \ {0} and square integrable.

65 .. Background on linear and directional kernel density estimation 5 D. Assume that L :,, is a bounded and Riemann integrable function such that < L k rr q dr <, q, for k =,. D. Assume that h = h n is a sequence of positive numbers such that h n and nh q n as n. Remark.. L must be a rapidly decreasing function, quite different from the bell-shaped kernels K involved in the linear estimator.. To verify D, L must decrease faster than any power function, since r α r q dr =, α R, q. Lemma in Zhao and Wu states that, under the previous conditions D D, the expected value of the directional kernel density estimator in a point x Ω q, is E ˆfh x = fx + b q LΨf, xh + o h, where Ψf, x = x T fx + q fx x T Hfxx,. b q L = / Lrr q dr Lrr q dr,.5 being fx = q+ i= fx x i the Laplacian of f. Note that the bias is of order Oh, but in., apart from the curvature of the target density which is captured by the Hessian matrix, a gradient vector also appears. On the other hand, the scaling constant b q L can be interpreted as a kind of moment of the directional kernel L. Note that, condition D with k = is needed for the bias computation. The same condition with k = is required for deriving the pointwise variance of the estimator., which was also given by Hall et al. 987 and Klemelä. Proposition.. Under conditions D D, the variance of ˆf h x at x Ω q is given by where Var ˆfh x = c h,ql d q Lfx + o nh q, n d q L = / L rr q dr Lrr q dr. Regarding the normalizing constant expression., the order of the variance is O nh q, where q is the dimension of the sphere. This order coincides with the corresponding one for a multivariate kernel density estimator in R q see Scott 99. A popular choice for the directional kernel is Lr = e r, r, also known as the von Mises kernel due to its relation with the von Mises-Fisher distribution see Watson 98. In a q-dimensional sphere, the von Mises model vmµ, κ has density { } f vm x; µ, κ = C q κ exp κx T µ, C q κ = π q+ κ q I q κ,.6

66 6 Chapter. Kernel density estimation for directional-linear data being µ Ω q the directional mean and κ the concentration parameter around the mean. In Figure. left plot, the contour plot of a spherical von Mises is shown. I ν is the modified Bessel function of order ν, z ν I ν z = t ν π / Γ ν + e zt dt. For the particular case of the target density being a q-dimensional von Mises vmµ, κ, the term. in the bias computation becomes: Ψ f vm ; µ, κ, x = κc q κe κxt µ x T µ + κq x T µ. As κ, which means that the distribution is approaching a uniform model in the sphere, the previous term also tends to zero. Considering the von Mises kernel in the directional estimator. allows for its interpretation as a mixture of von Mises-Fisher densities ˆf h x = n f vm x; X i, /h,.7 n i= where, for each von Mises component, the mean value is i-th observation X i and the concentration is given by h, involving the smoothing parameter. Figure.: Left: contour plot of a von Mises density vmµ, κ, with µ =,, and κ =. Right: contour plot of the mixture of von Mises densities.. In addition, the normalizing constant. appearing in the construction of the directional kernel estimator. has a simple expression for a von Mises kernel, given by c h,q L = π q Γ q { } + t exp t q dt = C q /h e /h..8 h For a general kernel, the asymptotic behaviour of c h,q L was remarked in. and it can be specified for the von Mises kernel. In this case,.8 depends on C q /h, which involves

67 .. Main results 7 a Bessel function of order q /. Applying a Taylor expansion for I ν, it can be seen that I ν z = e z z π + O z, z and c h,q L presents also a simple form: c h,q L = π q+ e h h q e h h + O π h = π q h q + O h q+. Finally, the other terms involved in bias and variance, namely b q L and d q L, become for the von Mises kernel. b q L = q, d ql = q +, q.. Kernel density estimation for directional-linear data Consider a directional-linear random variable, X, Z with support suppx, Z Ω q R and joint density f. For the simple case of circular data q =, the support of the variable is the cylinder. Following the ideas in the previous section for the linear and directional cases, given a random sample X, Z,..., X n, Z n, the directional-linear kernel density estimator can be defined as: ˆf h,g x, z = c h,ql ng n x T X i LK h i=, z Z i, x, z Ω q R,.9 g where LK is a directional-linear kernel, g is the bandwidth parameter for the linear component, h the bandwidth parameter for the directional component and c h,q L is the normalizing constant for the directional part, defined in.. For the sake of simplicity, a product kernel LK, = L K will be considered throughout this paper. Although a product kernel formulation has been adopted, the results could be generalized for a directional-linear kernel, with the suitable modifications in the required conditions.. Main results Before stating the main results, some notation will be introduced. The target directional-linear density will be denoted by f. The gradient vector and Hessian matrix of f, with respect to both components directional and linear are defined in this setting as: fx, z fx, z =,..., x fx,z x..... Hfx, z = fx,z x q+ x fx,z z x fx, z x q+, fx,z x x q+ fx,z x q+ fx,z z x q+ T fx, z = x fx, z, z fx, z T, z fx,z x z. fx,z x q+ z fx,z z = H xfx, z H x,z fx, z, H x,z fx, z T H z fx, z where subscripts x and z are used to denote the derivatives with respect to the directional and linear components, respectively. The Laplacian of f restricted to the directional component is denoted by xfx, z = q+ fx,z i=. The following conditions will be required in order to xi prove the main results:

68 8 Chapter. Kernel density estimation for directional-linear data DL. Extend f from Ω q R to R q+ \A, A = { x, z R q+ : x = }, by defining fx, z f x/ x, z for all x and z R, where denotes the Euclidean norm. Assume that fx, z and Hfx, z exist, are continuous and square integrable. DL. Assume that the directional kernel L satisfies condition D and the linear kernel K is a symmetric around zero and bounded linear density function with finite second order moment. DL. Assume that h = h n and g = g n are sequences of positive numbers such that h n, g n and nh q ng n as n. The next two results provide the expressions for the bias and the variance of the directional-linear kernel density estimator.9. Proposition.. Under conditions DL DL, the expected value of the directional-linear kernel density estimator.9 in a point x, z Ω q R is given by E ˆfh,g x, z = fx, z + b q LΨ x f, x, zh + µ KH z fx, zg + o h + g, where Ψ x f, x, z = x T x fx, z + q xfx, z x T H x fx, zx. Proposition.. Under conditions DL DL, the variance for the directional-linear kernel density estimator.9 in a point x, z Ω q R is given by Var ˆfh,g x, z = c h,ql RKd q Lfx, z + o nh q g. ng In view of the previous results, some comments must be done. Firstly, the effects of the directional and linear part can be clearly identified. For the bias, marginal contributions appear as two addends and also the remaining orders from each part are separated. For the variance, the terms corresponding to both parts can be also identified, although turning up in a product form. In addition, the respective orders for bias and variance are analogous to those ones obtained with a q + -multivariate estimator in R q+ see Scott 99. It can be also proved that the directional-linear kernel density estimator.9 is asymptotically normal, under the same conditions as those ones used for deriving the expected value and the variance, and a further smoothness property on the product kernel. Theorem.. Under conditions DL DL, if LK +δ r, v r q dv dr < for some δ >, then the directional-linear kernel density estimator.9 is asymptotically normal: nh q d g ˆfh,g x, z fx, z ABias ˆfh,g x, z N, RKd q Lfx, z, pointwise in x, z Ω q R, where ABias ˆfh,g x, z = b q LΨ x f, x, zh + µ KH z fx, zg. The smoothness condition on the directional-linear kernel is required in order to ensure Lyapunov s condition and obtain the asymptotic normal distribution. Again, the effect of the two parts can be identified in the previous equation, as well as in the rate of convergence of the estimator. R
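To fix ideas, the estimators (.2) and (.9) can be evaluated directly in the circular case q = 1, where x^T X_i = cos(θ - Θ_i) and, for the von Mises kernel L(r) = e^{-r}, the normalizing constant has a closed form. The R sketch below is an illustration under these assumptions (von Mises kernel for the circular part, Gaussian kernel for the linear part), not a general-purpose implementation.

# Circular kernel density estimator (.2) with a von Mises kernel (q = 1).
kde_circ <- function(theta, th_data, h) {
  kappa <- 1 / h^2
  # 2*pi*besselI(kappa, 0, expon.scaled = TRUE) = 2*pi*exp(-kappa)*I_0(kappa), so each
  # summand below is a von Mises density with mean Theta_i and concentration 1/h^2.
  nc <- 2 * pi * besselI(kappa, nu = 0, expon.scaled = TRUE)
  sapply(theta, function(th) mean(exp(kappa * (cos(th - th_data) - 1))) / nc)
}

# Circular-linear estimator (.9) with the von Mises x Gaussian product kernel (q = 1).
kde_circ_lin <- function(theta, z, th_data, z_data, h, g) {
  kappa <- 1 / h^2
  nc <- 2 * pi * besselI(kappa, nu = 0, expon.scaled = TRUE)
  mapply(function(th0, z0) {
    w_circ <- exp(kappa * (cos(th0 - th_data) - 1)) / nc   # directional weights
    w_lin  <- dnorm((z0 - z_data) / g) / g                 # linear weights
    mean(w_circ * w_lin)
  }, theta, z)
}

# Sanity check: the circular estimate integrates (numerically) to one.
set.seed(1)
th <- runif(50, 0, 2 * pi)
grid.th <- seq(0, 2 * pi, length.out = 201)[-201]
sum(kde_circ(grid.th, th, h = 0.5)) * (2 * pi / 200)   # approximately 1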

69 .. Error measurement and optimal bandwidth 9. Error measurement and optimal bandwidth The analysis of the performance of the kernel density estimator requires the specification of appropriate error criteria. Consider a generic kernel density estimator ˆf, which can be linear, directional or directional-linear. A global error measurement for quantifying the overall performance of this estimator is given by the MISE: MISE ˆf = E ˆfu fu du. The MISE can be interpreted as a function of the bandwidth and its minimization yields an optimal bandwidth in the sense of the quadratic loss. For the linear kernel density estimator. and under some regularity conditions see Wand and Jones 995, the MISE is given by: MISE ˆfg = µ K Rf g + ng RK + o g + ng. The asymptotic version of the MISE, namely the AMISE, can be used to derive an optimal bandwidth that minimizes this error. This optimal bandwidth is given by g AMISE = RK µ K Rf n Although the previous expression does not provide a bandwidth value in practice, given that it depends on the curvature of the target density Rf, some interesting issues should be noticed. For instance, the order of the asymptotic optimal bandwidth is On /5. Also, this result is the starting point of more sophisticated bandwidth selectors such as the ones given by Sheather and Jones 99 and Cao 99. A comparison of the performance of different bandwidth selectors can be found in Cao et al. 99, whereas Jones et al. 996 provides a review on bandwidth selection methods MISE for directional and directional-linear kernel density estimators In the previous sections, the bias and variance for the directional kernel estimator see Zhao and Wu for the bias and Proposition. for the variance and for the directional-linear kernel estimator Propositions. and. were obtained. Hence, it is straightforward to get the MISE for these estimators. Proposition.. Under conditions D D, the MISE for the directional kernel density estimator. is given by MISE ˆfh = b q L Ωq Ψf, x ω q dxh + c h,ql d q L + o h + nh q. n Following Wand and Jones 995, MISE ˆfh = AMISE ˆfh + o h + nh q, providing AMISE ˆfh a suitable large sample approximation that allows for the computation of an optimal bandwidth with closed expression, minimizing this asymptotic error criterion.
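In the linear case, the AMISE-optimal bandwidth above becomes fully explicit once R(f'') is available. For instance, taking a Gaussian kernel and a N(0, σ²) reference density (an illustrative assumption, which yields the classical normal-reference rule g = (4/3)^{1/5} σ n^{-1/5}), it can be evaluated in a few lines of R.

# AMISE-optimal bandwidth g = [ R(K) / (mu_2(K)^2 R(f'') n) ]^(1/5), evaluated for a
# Gaussian kernel and a N(0, sigma^2) reference density (illustration only).
g_amise_normal <- function(n, sigma = 1) {
  RK  <- 1 / (2 * sqrt(pi))               # R(K) for the Gaussian kernel
  mu2 <- 1                                # mu_2(K) for the Gaussian kernel
  Rf2 <- 3 / (8 * sqrt(pi) * sigma^5)     # R(f'') for the N(0, sigma^2) density
  (RK / (mu2^2 * Rf2 * n))^(1 / 5)
}
g_amise_normal(100)   # = (4/3)^(1/5) * 100^(-1/5), about 0.42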

70 5 Chapter. Kernel density estimation for directional-linear data Corollary.. The AMISE optimal bandwidth for the directional kernel density estimator. is given by h AMISE = +q qd q L b q L, λ q LRΨf, n where RΨf, = Ω q Ψf, x ω q dx and λ q L = q ω q Lrr q dr. Expressions for MISE and AMISE can be also derived for the directional-linear estimator. In order to simplify the notation, let denote I φ = φx, z dz ω qdx, for a function φ : Ω q R R. Proposition.5. Under conditions DL DL, the MISE for the directional-linear kernel density estimator.9 is given by MISE ˆfh,g = b q L I Ψ x f,, h + µ K I H z f, g + b q Lµ KI Ψ x f,, H z f, h g + c h,ql d q LRK ng + o h + g + nh q g. Unfortunately, it is not straightforward to derive a full closed expression for the optimal pair of bandwidths h, g AMISE, although it is possible to compute them by numerical optimization. However, such a closed expression can be obtained for the particular case q =, where the circular and linear bandwidths can be considered as proportional. Corollary.. Consider the parametrization g = βh. The optimal AMISE pair of bandwidths h, g AMISE = h AMISE, βh AMISE can be obtained from h AMISE = 5+q q + d q LRK βλ q LR b q LΨ x f,, + β µ KH z f,, n where R b q LΨ x f,, + β µ KH z f, = Ω bq q R LΨ x f, x, z + β µ KH z fx, z dz ω q dx and λ q L is defined as in the previous corollary. For the circular-linear data case q =, the parameter β is given by: β = µ K I H z f, b q L I Ψ x f,, Despite a formal way for deriving the orders of the AMISE bandwidths has not been derived, a quite plausible conjecture is that for q >, h, g AMISE = O n /+q, O n /5 or, equivalently, that β = β n = O n q /5+q. Indeed, this is satisfied for q =. Finally, it is interesting to note that considering g = βh, a single bandwidth for the kernel estimator.9 is required, having the optimal bandwidth under this formulation order O n /5+q. This coincides with the order of the kernel linear estimator in R p, with p = dimω q R = q +..

71 .. Error measurement and optimal bandwidth 5.. Some exact MISE calculations for mixture distributions Closed expressions for the MISE for the directional and directional-linear estimators can be obtained for some particular distribution models, and they will be derived in this section. In the linear setting, Marron and Wand 99 obtained a closed expression for the MISE of. if the kernel K is a normal density and the underlying model is a mixture of normal distributions. Specifically, the density of an r-mixture of normal distributions with respective means m j and variances σj, for j =,..., r is given by r f r z = p j φ σj z m j, j= r p j =, p j, j= where p j, j =,..., r denote the mixture weights and φ σ is the density of a normal with zero mean and variance σ, i.e., φ σ z = πσ exp { }. z Marron and Wand 99 showed that σ the exact MISE of the linear kernel estimator is MISE ˆfg = π gn + p T n Ω g Ω g + Ω g p,. where p = p,..., p r T and Ω a g are matrices with entries Ω a g = φ σa m i m j ij, σ a = ag + σ i + σ j, for a =,,. Similar results can be obtained for the directional and directional-linear estimators, when considering mixtures of von Mises for the directional case, and mixtures of von Mises and normals for the directional-linear scenario see Figure. for some examples. For the directional setting, an r-mixture of von Mises with means µ j and concentration parameters κ j, for j =,..., r is given by r f r x = p j f vm x; µ j, κ j, j= r p j =, p j.. j= Consider a random sample X,..., X n, of a directional variable X with density f r see Figure., right plot. The following result gives a closed expression for the MISE of the directional kernel estimator. Proposition.6. Let f r be the density of an r-mixture of directional von Mises.. The exact MISE of the directional kernel estimator., obtained from a random sample of size n, with von Mises kernel Lr = e r is MISE ˆfh = D q hn + p T n Ψ h Ψ h + Ψ h p,. where p = p,..., p r T and D q h = C q /h Cq /h. The matrices Ψa h, a =,, have entries: Cq κ i C q κ j Ψ h =, C q κ i µ i + κ j µ j ij Ψ h = C q /h e κjxt µj C q κ i C q κ j Ω q C q x/h + κ i µ i ω qdx, ij

72 z 5 Chapter. Kernel density estimation for directional-linear data Ψ h = C q /h C q κ i C q κ j C q x/h + κ i µ i C q x/h + κ j µ j ωq dx, Ω q ij where C q is defined in equation.6. The matrices involved in. are not as simple as the ones for the linear case, due to the convolution properties of the von Mises density. For practical implementation of the exact MISE, it should be noticed that matrices Ψ h and Ψ h can be evaluated using numerical integration in q-spherical coordinates. For clarity purposes, constants C q κ i are included inside matrices Ψ h, Ψ h and Ψ h but it is computationally more efficient to consider them within the weights, that is, take p = p C q κ,..., p r C q κ r. π π.5 Density..... π π π π θ π 6 z Figure.: From left to right: circular-linear mixture.5 and corresponding circular and linear marginal densities, respectively. Random samples of size n = are drawn. From Proposition.6, it is easy to derive an analogous result for the case of a r-mixture of directional-linear independent von Mises and normals: r f r x, z = p j f vm x; µ j, κ j φ σj z m j, j= r p j =, p j.. j= Proposition.7. Let f r be the density of an r-mixture of directional-linear independent von Mises and normals densities given in.. For a random sample of size n, the exact MISE of the directional-linear kernel density estimator.9 with von Mises-normal kernel LKr, t = e r φ t is MISE ˆfh,g = D q hπ gn + p T n Ψ h Ω g Ψ h Ω g + Ψ h Ω g p, where denotes the Hadamard product between matrices and the involved terms are defined as in Proposition.6 and equation.. Once the exact MISE and the AMISE for mixtures of von Mises and normals are derived, it is possible to compare these two error criteria. To that end, let consider the following directional mixture 5 vm, q, + 5 vm q,, + 5 vm, q,,.

where 0_q represents a vector of q zeros, and the directional-linear mixture

5 N( , ) vM( , 0_q, ) + 5 N( , ) vM(0_q, , ) + 5 N( , ) vM( , 0_q, ).   (.5)

Figure . shows the comparison between the exact and the asymptotic MISE for the linear, circular and spherical cases. As first noted by Marron and Wand (1992) for the linear estimator, there are significant differences between these two errors, the most remarkable being the rapid growth of the AMISE with respect to the MISE for larger values of the bandwidth. This effect is due to the fact that AMISE(\hat{f}_h) → ∞ as h → ∞, since the leading bias term of the AMISE is proportional to h^4, whereas the MISE levels off at a finite limit as the bandwidth grows. Besides, for the directional case this effect seems to be amplified, probably because of a scale effect in the bandwidths, in the sense that the support of the directional variables is bounded, which is not the case for the linear ones considered. However, although the AMISE and MISE curves differ significantly, the corresponding optimal bandwidths get closer for increasing sample sizes.

Figure .: From left to right: exact MISE and AMISE for the linear mixture 5 N( , ) + 5 N( , ) + 5 N( , ) and the circular and spherical mixtures (.), for a range of bandwidths. The black curves are for the MISE, whereas the red ones are for the AMISE. Solid and dotted curves correspond to the two sample sizes considered. Vertical lines represent the bandwidth values minimizing each curve.

Figure . contains the contour plots of the exact and asymptotic MISE for the circular-linear and spherical-linear cases. The conclusions are essentially the same as for Figure .: the asymptotic MISE grows more quickly than the exact MISE for large values of h or g. On the other hand, the contour lines of both surfaces are quite close for small values of the bandwidths, and the optimal bandwidths also get closer for larger sample sizes.

As an immediate application of Propositions .6 and .7, a bootstrap version of the MISE for the directional and directional-linear estimators can be derived. The bootstrap MISE is an estimator of the true MISE obtained by considering a smooth bootstrap resampling scheme, which will be briefly detailed. In the linear case, the bootstrap MISE is given by

MISE^*_{g_P}(\hat{f}_g) = E^*\left[ \int_R \left( \hat{f}^*_g(z) - \hat{f}_{g_P}(z) \right)^2 dz \right],

where \hat{f}^*_g(z) = \frac{1}{ng} \sum_{i=1}^{n} K\left(\frac{z - Z_i^*}{g}\right), the sample Z_1^*, ..., Z_n^* being distributed as \hat{f}_{g_P}. In this case, g_P is a pilot bandwidth and the expectation E^* is taken with respect to the density estimator \hat{f}_{g_P}.
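Although the closed-form expressions derived below make resampling unnecessary, the smooth bootstrap scheme behind MISE^*_{g_P} is easy to state in code: a bootstrap sample from \hat{f}_{g_P} is obtained by resampling the data and adding kernel noise scaled by the pilot bandwidth. The following R sketch is a Monte Carlo illustration for the linear case with a Gaussian kernel, not something that is needed for the results below.

# Monte Carlo approximation of the linear bootstrap MISE (illustration only; the
# text derives exact expressions that avoid resampling altogether).
boot_mise_mc <- function(z, g, gP, B = 200) {
  kde  <- function(pts, dat, bw) sapply(pts, function(p) mean(dnorm((p - dat) / bw)) / bw)
  grid <- seq(min(z) - 3, max(z) + 3, length.out = 401)
  step <- grid[2] - grid[1]
  f.pilot <- kde(grid, z, gP)                      # \hat f_{g_P}, the resampling density
  ise <- replicate(B, {
    zb <- z[sample(length(z), replace = TRUE)] + gP * rnorm(length(z))  # smooth bootstrap
    sum((kde(grid, zb, g) - f.pilot)^2) * step     # numerical ISE of the bootstrap estimate
  })
  mean(ise)
}

# Example: set.seed(1); z <- rnorm(100); boot_mise_mc(z, g = 0.3, gP = 0.5)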

For the linear case, Cao (1993) derived an exact closed expression for MISE^*_{g_P}(\hat{f}_g) that actually avoids the need for resampling, and obtained a bandwidth that minimizes the bootstrap MISE after computing a suitable pilot bandwidth g_P.

Figure .: Upper row, from left to right: exact MISE versus AMISE for the circular-linear mixture (.5) for the two sample sizes considered. Lower row, from left to right: the same comparison for the spherical-linear mixture (.5). The solid curves are for the MISE, whereas the dashed ones are for the AMISE. The pairs of bandwidths minimizing each error surface are denoted by (h, g)_MISE and (h, g)_AMISE.

The following two results show the bootstrap MISE expressions for the estimators (.2) and (.9) in the case where the kernels are von Mises and normal. As in the linear case, no resampling is needed for computing the bootstrap MISE. These bootstrap versions of the error provide an overall summary of the estimator behaviour, with no restriction on the underlying densities, as long as von Mises and normal kernels are considered. In addition, the following results could be used to derive a bandwidth selector, but this will depend on the selection of pilot bandwidths for both components, which is not an easy problem.
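The practical value of these closed forms is that error curves and their minimizers, like the ones displayed in the figures above, can be computed without any simulation. As a sketch of the simplest instance, the Marron and Wand (1992) exact-MISE formula for a normal mixture, quoted earlier in this section, can be coded and minimized directly; the mixture parameters below are arbitrary illustrative choices.

# Exact MISE of the Gaussian-kernel estimator for an r-component normal mixture
# (Marron and Wand, 1992): (2 sqrt(pi) g n)^(-1) + w' [ (1 - 1/n) O2 - 2 O1 + O0 ] w,
# with (O_a)_{ij} = dnorm(mu_i - mu_j, sd = sqrt(a g^2 + sg_i^2 + sg_j^2)).
exact_mise_norm_mix <- function(g, n, w, mu, sg) {
  Om <- function(a) outer(seq_along(mu), seq_along(mu),
                          Vectorize(function(i, j)
                            dnorm(mu[i] - mu[j], sd = sqrt(a * g^2 + sg[i]^2 + sg[j]^2))))
  1 / (2 * sqrt(pi) * g * n) +
    drop(t(w) %*% ((1 - 1 / n) * Om(2) - 2 * Om(1) + Om(0)) %*% w)
}

# Example: MISE-optimal bandwidth for an arbitrary three-component mixture, n = 100.
w <- c(0.4, 0.3, 0.3); mu <- c(-1, 0, 2); sg <- c(0.6, 0.3, 0.8)
optimize(exact_mise_norm_mix, interval = c(0.01, 2), n = 100, w = w, mu = mu, sg = sg)$minimum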

75 .5. Conclusions 55 Corollary.. The bootstrap MISE for directional data, given a sample of length n, the von Mises kernel Lr = e r and a pilot bandwidth h P, is: MISE h ˆfh P = D q hn + n T n Ψ h Ψ h + Ψ h, where the matrices Ψ ah, a =,, have the same entries as Ψ a h but with κ i = /h P and µ i = X i for i =,..., n. Remark.. The particular case where q = and h P = h, Corollary. corresponds to the expression of the bootstrap MISE given in Di Marzio et al.. Corollary.. The bootstrap MISE for directional-linear data, given a sample of length n, the von Mises-normal kernel LKr, t = e r φ t and a pair of pilot bandwidths h P, g P, is: MISE h P,g ˆfh,g P = D q hπ gn + n T n Ψ h Ω g Ψ h Ω g + Ψ h Ω g, where the matrices Ψ ah and Ω ag, a =,, have the same entries as Ψ a h and Ω a g but with κ i = /h P, µ i = X i, m i = Z i and σ i = g P for i =,..., n..5 Conclusions A kernel density estimator for directional-linear data is proposed. Bias, variance and asymptotic normality of the estimator are derived, as well as expressions for the MISE and AMISE. For the particular case of mixtures of von Mises, for directional data, and mixtures of von Mises and normals, in the directional-linear case, the exact expressions for the MISE are obtained, which enables the comparison with their asymptotic versions. Undoubtedly, one of the main issues in kernel estimation is the appropriate selection of the bandwidth parameter. Although an optimal pair of bandwidths in the AMISE sense has been derived, further research must be done in order to obtain a bandwidth selection method that could be applied in practice. This problem extends somehow to the directional setting, where likelihood and least squares cross-validation methods seem to be the available procedures. However, the exact MISE computations open a route to develop bandwidth selectors, for instance, following the ideas in Oliveira et al.. In fact, a bootstrap version for the MISE when assuming that the underlying mode is a mixture allows for the derivation of bootstrap bandwidths, as in Cao 99 for the linear case. A straightforward extension of the proposed estimator can be found in the directional-multidimensional setting, considering a multidimensional random variable. In this case, the linear part of the estimator should be properly adapted including a multidimensional kernel and possibly a bandwidth matrix. Acknowledgements The authors acknowledge the support of Project MTM8, from the Spanish Ministry of Science and Innovation, Project MDS75PR from Dirección Xeral de I+D, Xunta de

76 56 Chapter. Kernel density estimation for directional-linear data Galicia and IAP network StUDyS, from Belgian Science Policy. Work of E. García-Portugués has been supported by FPU grant AP 957 from the Spanish Ministry of Education. The authors also acknowledge the suggestions by two anonymous referees that helped improving this paper..a Some technical lemmas Some technical lemmas that will be used along the proofs of the main results are introduced in this section. To begin with, Lemma. establishes the asymptotic behaviour of λ h,q L in.. With the aim of clarifying the computation of the integrals in the proofs of the main results, Lemma. details a change of variables in Ω q, whereas Lemma. is used to simplify integrals in Ω q. Lemma. shows some of the constants introduced along the work for the case where the kernel is von Mises and, finally, Lemma.5 states the Lemma of Zhao and Wu. Detailed proofs of these lemmas can be found in Appendix.C. This appendix also includes a rebuild of the proof of the Lemma.5, using the same techniques as for the other results, which presents some differences from the original proof. Lemma.. Under condition D, the limit of λ h,q L = ω h q Lrr q rh q dr, when h, is lim h,ql = λ q L = q ω q h Lrr q dr,.6 where ω q is the surface area of Ω q, for q. Lemma. A change of variables in Ω q. Let f be a function defined in Ω q and y Ω q a fixed point. The integral Ω q fx ω q dx can be expressed in one of the following equivalent integrals: fx ω q dx = f t, t ξ t q ω q dξ dt.7 Ω q Ω q = ty + t By ξ t q ω q dξ dt,.8 Ω q f where B y = b,..., b q q+ q is the semi-orthonormal matrix B T y B y = I q and B y B T y = I q+ yy T resulting from the completion of y to the orthonormal basis {y, b,..., b q }. Lemma.. Consider x Ω q, a point in the q-dimensional sphere with entries x,..., x q+. For all i, j =,..., q +, it holds that {, i j, x i ω q dx =, x i x j ω q dx = ω q Ω q Ω q q+, i = j, where ω q is the surface area of Ω q, for q. Lemma.. For the von Mises kernel, i.e., Lr = e r, r, c h,q L = e /h h q π q+ I q /h, λ q L = π q, bq L = q, d ql = q. Lemma.5 Lemma in Zhao and Wu. Under the conditions D D, the expected value of the directional kernel density estimator in a point x Ω q, is E ˆfh x = fx + b q LΨf, xh + o h, where Ψf, x and b q L are given in. and.5, respectively.
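Some of the closed-form constants for the von Mises kernel collected above can be verified numerically. For instance, in the circular case (q = 1) the inverse normalizing constant reduces to 2π e^{-1/h^2} I_0(1/h^2) and λ_1(L) = (2π)^{1/2}; the following R snippet (an illustrative check, not part of the proofs) confirms both.

# Numerical checks of two von Mises kernel constants for q = 1.
h   <- 0.5
num <- integrate(function(t) exp(-(1 - cos(t)) / h^2), 0, 2 * pi)$value   # c_{h,1}(L)^{-1}
cls <- 2 * pi * besselI(1 / h^2, nu = 0, expon.scaled = TRUE)             # 2*pi*exp(-1/h^2)*I_0(1/h^2)
all.equal(num, cls)                 # TRUE up to numerical tolerance

# lambda_1(L) = 2^(1/2 - 1) * omega_0 * Gamma(1/2), with omega_0 = 2, i.e. (2*pi)^(1/2).
lam1 <- 2^(1 / 2 - 1) * 2 * gamma(1 / 2)
all.equal(lam1, sqrt(2 * pi))       # TRUE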

77 .B. Proofs of the main results 57.B Proofs of the main results Proof of Proposition.. The variance can be decomposed in two terms as follows: Var ˆfh x = c h,ql x E L T X n h n E ˆfh x,.9 where the calculus of the first term is quite similar to the calculus of the bias given in Lemma.5 and the second is given by the same result. Therefore, analogously to the equation.7 of Lemma.5, the first addend can be expressed as c h,q L h h q L rr q h r q f x + α x,ξ ω q dξ dr,. n Ω q just replacing the kernel L by the squared kernel L and where α x,ξ = rh x + h r h r B x ξ Ω q with B x defined as in Lemma.. By condition D, the Taylor expansion of f at x is fx + α x,ξ fx = α T x,ξ fx + αt x,ξhfxα x,ξ + o α T x,ξα x,ξ. Hence,. = c h,ql h h q L rr q h r q {fx rh ω q x T fx n + r h ω q x T Hfxx+ h r h rω q fx x T Hfxx + rω q o h } dr q = c h,ql h {ω q c h,q Lh q L rr q h r q dr fx n h h ω q c h,q Lh q L rr q h r q dr x T fx + h ω q + h ω q h h c h,q Lh q L rr q + h r q dr x T Hfxx c h,q Lh q L rr q h r q dr q fx x T Hfxx h +ω q c h,q Lh q L rr q h r q dr o h }.. The integrals in. can be simplified. For that purpose, define for h > and indices i =,,, j =, the following function: φ h,i,j r = c h,q Lh q L rr q +i h r q j,h r, r,. As n, the bandwidth h and the limit of φ h,i,j is given by φ i,j r = lim h φ h,i,j r = λ q L L rr q +i q j, r.

78 58 Chapter. Kernel density estimation for directional-linear data Applying the Dominated Convergence Theorem DCT and the same techniques of the proof of Lemma. see Remark.5, it can be seen that: lim h φ h,i,j r dr = λ q L q j L rr q +i dr.6 = j ω q d q L, i =, j ω q e q L, i =, j L rr q + dr ω q Lrr q dr, i =, where e q L = L rr q dr / Lrr q dr. Then, taking into account that ϕ h,i,j r dr = ϕ i,j r dr + o the integrals in brackets of. can be replaced, obtaining that. = c h,ql n d q Lfx + e q Lh Ψf, x + o nh q.. The second term in.9 is given by E ˆfh x = fx + b q Lh Ψf, x + o h.. The result holds from. and.: Var ˆfh x = c h,ql d q Lfx + e q Lh Ψf, x n fx + b q Lh Ψf, x + o nh q, n which can be simplified into Var ˆfh x = c h,ql d q Lfx + o nh q. n Proof of Proposition.. Denote by Bias ˆfh,g x, z = E ˆfh,g x, z fx, z the bias of the kernel estimator. Applying the change of variables stated in Lemma. and then an ordinary change of variables given by r = t, the bias results in: h Bias ˆfh,g x, z = c h,ql x T X E LK g h, z Z fx, z g = c h,ql x T y LK g Ω q R h, z t fy, t fx, z dt ω q dy g x T y = c h,q L LK, v fy, z gv fx, z dv ω q dy = c h,q L Ω q R Ω q R h u LK, v f ux + u Bx ξ, z gv fx, z h u q dv ω q dξ du h = c h,q Lh q LK r, v f x, z + α x,z,ξ fx, z dv ω q dξ Ω q r q h r q dr R

79 .B. Proofs of the main results 59 where α x,z,ξ = h = c h,q Lh q L r r q h r q K v R f x, z + α x,z,ξ fx, z ω q dξ dv dr,. Ω q rh x + h r h r B x ξ, gv Ω q R. The computation of the last integral in. is achieved using the multivariate Taylor expansion of f at x, z, in virtue of condition DL: fx, z + α x,z,ξ fx, z = α T x,z,ξ fx, z + αt x,z,ξhfx, zα x,z,ξ + o α T x,z,ξα x,z,ξ. Let denote by γ x,ξ = rh x + h r h r B x ξ. Bearing in mind the directional and linear components of the gradient fx, z and the Hessian matrix Hfx, z, it follows fx, z + α x,z,ξ fx, z = γ T x,ξ x fx, z gv z fx, z + γ T x,ξh x fx, zγ x,ξ gvγ T x,ξh x,z fx, z + g v H z fx, z + o α T x,z,ξα x,z,ξ. Then, the calculus of the integral Ω q f x, z + α x,z,ξ fx, z ω q dξ can be split into six addends. Second and sixth terms are computed straightforward: Ω q gv z fx, z ω q dξ = ω q gv z fx, z,.5 Ω q g v H z fx, z ω q dξ = ω q g v H z fx, z..6 For the first and fourth addends, by Lemma., the integration of ξ i with respect to ξ is zero: Ω q γ T ξ,z x fx, z ω q dξ = ω q h rx T x fx, z,.7 Ω q gvγ x,ξ H x,z fx, z ω q dξ = gvω q h rx T H x,z fx, z..8 Finally, in the fifth term, the integrand can be decomposed as follows: q γ T x,ξh x fx, zγ x,ξ = h r x T H x fx, zx + h r h r ξ i ξ j b T i H x fx, zb j i,j= q h r h r ξ i x T H x fx, zb i. i= In virtue of Lemma., the third addend vanishes as well as the second, except for the diagonal terms. Next, as {x, b,..., b q } is an orthonormal basis in R q+, the sum of the diagonal terms can be computed by simple algebra: q q b T i H x fx, zb i = tr H x fx, z b i b T i i= i= = tr H x fx, z I q+ xx T

80 6 Chapter. Kernel density estimation for directional-linear data = xfx, z x T H x fx, zx, where xfx, z is the Laplacian of f restricted to the directional component x, I q+ is the identity matrix of order q + and tr is the trace operator. By Lemma. and the previous calculus, the fifth term is γ T ξ,zh x fx, zγ ξ,z ω q dξ Ω q = ω q h r x T H x fx, zx + ω q h r h rq xfx, z x T H x fx, zx..9 Note also that the order of α T x,z,ξ α x,z,ξ is easily computed: o α T x,z,ξα x,z,ξ = ro h + v o g.. Combining.5., and using condition DL on the kernel K: h {. = c h,q Lh q L r r q h r q K v γ ξ,x x fx, z gv z fx, z R Ω q + γ T ξ,xh x fx, zγ ξ,x gvγ T ξ,xh x,z fx, z + g v H z fx, z + ro h + v o g } ω q dξ dv dr h = ω q c h,q Lh q L r r q h r q R { Kv h rx T x fx, z gv z fx, z + h r x T H x fx, z + h r h rq xfx, z x T H x fx, zx + gvh rx T H x,z fx, z + g v H zfx, z + ro h + v o g } dv dr h { = ω q c h,q Lh q L r r q h r q h rx T x fx, z + h r x T H x fx, z + h r h rq xfx, z x T H x fx, zx + g H z fx, zµ K + ro h + µ Ko g } dr.. For h >, i =,,, j =,, consider the following functions ϕ h,i,j r = c h,q Lh q Lrr q +i h r q j,h r, r,. When n, h and the limit of ϕ h,i,j is given by Applying Remark.5 of Lemma., lim h ϕ i,j r = lim h ϕ h,i,j r = λ q L Lrr q +i q j, r. ϕ i,j,h r dr = λ q L q j Lrr q +i dr.6 = j ω q, i =, j ω q b q L, i =, j Lrr q ω + dr q Lrr q dr, i =.

81 .B. Proofs of the main results 6 Then, the six integrals in. can be written using ϕ i,j,h r dr = ϕ i,j r dr + o. Replacing this in. leads to. = h bq L ω q + o x T x fx, z ω q + h ω q bq L Lrr q + dr Lrr q dr + o x T H x fx, zx ω q + h ω q bq L + o ω q + ω q + o ω q q xfx, z x T H x fx, zx g H z fx, zµ K bq L + ω q + o o h + ω q + o o g ω q ω q = h b q L x T x fx, z + q xfx x T H x fx, zx + g H z fx, zµ K + O h + o h + o g = h b q LΨ x f, x, z + g H zfx, zµ K + o h + g. Proof of Proposition.. The variance can be decomposed as Var ˆfh,g x, z = c h,ql x ng E LK T X h, z Z g n E ˆfh,g x, z,. where the calculus of the first term is quite similar to the calculus of the bias and the second is given in the previous result. Analogous to., c h,q L x ng E LK T X h, z Z g = c h,ql h h q L rr q h r q K v ng R fx, z + α x,z,ξ ω q dξ dv dr,. Ω q just replacing LK by LK. Then, using that K is a symmetric function around zero: K v dv = RK, vk v dv =, v K v dv = µ K,. R R Applying the multivariate Taylor expansion of f at x, z and by., equation. results in c h,q L h. = ω q h q L rr q h r q ng gv z fx, z + R R { K v fx, z h rx T x fx, z h r x T H x fx, z + h r h rq xfx, z x T H x fx, zx

82 6 Chapter. Kernel density estimation for directional-linear data. + gvh rx T H x,z fx, z + g v H zfx, z + ro h + v o g } dv dr c h,q L h = ω q h q L rr q h r q {RKfx, z RKh rx T x fx, z ng + RK h r x T H x fx, z + h r h rq xfx, z x T H x fx, zx + µ K g H zfx, z + ro h + µ K o g } dr..5 Define the following functions, for h >, i =,, and j =, : φ h,i,j r = c h,q Lh q L rr q +i h r q j,h r, r,. When n, h and the limit of φ h,i,j is given by φ i,j r = lim h φ h,i,j r = λ q L L rr q +i q j, r. Applying the same techniques of the proof of Lemma. to the functions φ h,i,j with the different values of i, j and L instead of L, and using the relation., it follows: j ω q d q L, i =, lim φ h,i,j r dr = λ q L q j L rr q +i dr.6 j = ω q e q L, i =, h j L rr q + dr ω q Lrr q, i =, dr where e q L = L rr q dr / Lrr q dr. So, for the terms between square brackets of.5, φ h,i,j r dr = φ i,j r dr + o. Replacing this leads to.5 = c { h,ql dq L RKω q + o fx, z RKh eq L ω q + o x T x fx, z ng ω q ω q + RKh ω q L rr q + dr ω q Lrr q dr + o x T H x fx, z + RKh ω q eq L + o q ω xfx, z x T H x fx, zx q + µ K g ω q dq L + o H z fx, z ω q eq L + ω q + o o h dq L + ω q + o o } g ω q ω q = c h,ql RKd q Lfx, z + RKe q Lh Ψ x fx, z + µ K d q L g ng H zfx, z + o nh q g..6 The second term of. is E ˆfh,g x, z = fx, z + b q Lh Ψ x f, x, z + g µ KH z fx, z + o h + g..7

83 .B. Proofs of the main results 6 Joining.6 and.7, Var ˆfh,g x, z = c h,ql ng which can be simplified into RKd q Lfx, z + RKe q Lh Ψ x fx, z + µ K d q L g H zfx, z fx, z + b q Lh Ψ x f, x, z + g n µ KH z fx, z + o nh q g + o n h + g, Var ˆfh,g x, z = c h,ql RKd q Lfx, z + o nh q g. ng Proof of Theorem.. Let {X i, Z i } n i= be a random sample from the directional-linear random variable X, Z, whose support is contained in Ω q R. The directional kernel estimator in a fixed point x, z Ω q R can be written as ˆf hn,gn x, z = n V n,i, n i= V n,i = c h,ql x T X i LK g h, z Z i, n g n where notation h n and g n for the bandwidths remarks their dependence on the sample size n given by condition DL. As {X i, Z i } n i= is a collection of independent and identically distributed iid copies of X, Z, then {V n,i } n i= is also an iid collection of copies of the random variable V n = LK x T X, z Z h n g n. Then, the Lyapunov s condition ensures that, if for some δ > the next condition holds: E V n E V n +δ lim n n δ Var V n + δ =, then the following central limit theorem is valid: n Vn E V n Var Vn d N,, where V n = n ni= V n,i. This condition will be proved for V n = LK x T X, z Z h n g n. First of all, the order of E V n +δ is E V n +δ = chn,ql = g n c hn,ql x T y LK g n +δ h n, z t g n x LK +δ T y h, z t n g n +δ fy, t dt ω q dy fy, t dt ω q dy

84 6 Chapter. Kernel density estimation for directional-linear data chn,ql +δ h = g n h q n n g n q Ω q R LK +δ r, v fx, z + α x,z,ξ dv ω q dξ r q h nr dr chn,ql +δ h = g n h q n n LK +δ r, v dv ω q dξ r q h q g nr dr n Ω q R fx, z + o h n + gn chn,ql +δ g n h q g n q ω q LK +δ r, v r q dv dr fx, z n R λq L h q +δ n g n h q g n q ω q LK +δ r, v r q dv dr fx, z n R = h q ng n +δ R LK+δ r, v r q dv dr fx, z q +δ ω q Lrr q +δ dr = O h q ng n +δ. On the other hand, by Proposition., the variance of V n has order Var V n = c h,ql RKd q Lfx, z + o h q g ng n RKd qlfx, z n λ q L h q = O ng n h q ng n. Using that E V n E V n +δ = O E V n +δ see Remark. and by condition DL, it follows that the Lyapunov s condition is satisfied: E V n E V n +δ = O hq ng n +δ = O nh q ng n δ Var V n + δ n δ h q ng n + δ n δ, as n. Therefore, ˆf hn,gn x, z E ˆfhn,g n x, z Var ˆfhn,g n x, z d N,, pointwise for every x, z Ω q R note that n is included in the variance term. Plugging-in the asymptotic expressions for the bias and the variance results nh q d ng n ˆfhn,g n x, z fx, z ABias ˆfhn,g n x, z N, RKd q Lfx, z. Remark.. The proof of E V n E V n +δ = O E V n +δ is simple. For example, using the Newton Binomial: for any r R, x + y r = r k= k x r k y k, with r k := rr r k+ k!. As + δ >, by the triangular inequality and the Newton Binomial, E V n E V n +δ E V n E V n +δ + δ = E V n +δ k E V n k k k=

85 .B. Proofs of the main results 65 + δ = E V n +δ k E V n k. k k= Now, as E V n = fx, z + O h n + g n = O by Proposition. and by condition DL, the terms E V n are constants asymptotically. Also, as E V n r E V n s for r s, it follows E V n E V n +δ = O + δ E k k= V n +δ k = O E V n +δ. Proof of Proposition.. It is straightforward from Proposition. and Lemma.5. For a point x in Ω q : MSE ˆfh x = E ˆfh x fx + Var ˆfh x = b q LΨf, xh + o h c h,q L + d q Lfx + o nh q n = b q L Ψf, x h + c h,ql d q Lfx + o h + nh q. n Integrating over Ω q in the previous equation, MISE ˆfh = b q L Ωq Ψf, x ω q dxh + c h,ql d q L + o h + nh q. n Proof of Corollary.. To obtain the bandwidth that minimizes AMISE consider. in the previous equation and derive it with respect to h: d dh AMISE ˆfh = b q L R Ψf, h qλ q L h q+ d q Ln =. The solution of this equation results in h AMISE = +q qd q L b q L. λ q LRΨf, n Proof of Proposition.5. It is straightforward from Propositions. and.: MSE ˆfh,g x, z = E ˆfh,g x, z = fx, z + Var ˆfh,g x, z h b q LΨ x f, x, z + g H zfx, zµ K + o h + o g + c h,ql RKd q Lfx, z + o nh q g ng = h b q L Ψ x f, x, z + g µ K H z fx, z

86 66 Chapter. Kernel density estimation for directional-linear data + h g b q Lµ KH z fx, zψ x f, x, z + c h,ql RKd q Lfx, z + o h + g + nh q g. ng Integrating the previous equation and denoting by I φ = φx, z dz ω qdx for a function φ : Ω q R R, MISE ˆfh,g = b q L I Ψ x f,, h + g µ K I H z f, + h g b q Lµ KI Ψ x f,, H z f, + c h,ql d q LRK ng + o h + g + nh q g. Proof of Corollary.. Suppose that g = βh in the previous equation. Again, use that c h,q L λ q L h q and derive with respect to h to obtain where d dh AMISE ˆfh,βh = c h + c h + c h q + c h q+ =, c = b q L I It follows immediately that Ψ x f,,, c = µ K I c = b q Lµ KI Ψ x f,, H z f, β, h AMISE = q + c c + c + c H z f, β, c = d qlrk λ q Lnβ. 5+q. Given that R b q LΨ x f,, + β µ KH z f, = c + c + c, the desired expression is obtained. In the case where q = it is possible to derive the form of β by solving h AMISE ˆfh,g = and g AMISE ˆfh,g =. For this case, β has the closed form β = µ K I H z f, b q L I Ψ x f,,. Proof of Proposition.6. Consider the r-mixture of directional von Mises densities given in.. Then: MISE ˆfh = E = E Ω q Ω q ˆfh x f r x ωq dx ˆfh x ˆf h xf r x + f r x ω q dx

87 .B. Proofs of the main results 67 = c h,ql n Ω q Ω q L + c h,ql n n x T y f r y ω q dx ω q dy Ω q h Ω q Ω q L x T y x T z L f r yf r z ω q dx ω q dy ω q dz x T y c h,q L f r xf r y ω q dx ω q dy Ω q Ω q L + f r x ω q dx Ω q = The four terms of the previous equation will be computed separately. The first one is The second one is 8 = c h,ql n Ω q n c h,q L = p j n = = = 9 = c h,ql n n = c h,ql n n j= n j= n j= n j= p j c h,q L n Ω q L Ω q p j c h,q L c h/,q Ln p j c h,q L c h/,q Ln = D q hn. h h h x T y f r y ω q dx ω q dy h Ω q e e x T y Ω q Ω q x T y h C q κ j e κ jy T µ j ω q dx ω q dy h/ ω q dxc q κ j e κ jy T µ j ω q dy Ω q C q κ j e κ jyt µ j ω q dy x T y x T z L Ω q Ω q Ω q h L h f r yf r z ω q dx ω q dy ω q dz r r e /h e xt y/h e xt z/h p j p l C q κ j C q κ l e κ jy T µ j e κ lz T µ l Ω q Ω q Ω q j= l= ω q dx ω q dy ω q dz = c h,ql n r r e /h p j p l C q κ j C q κ l n j= l= xt y/h e e xt z/h e κ jy T µ j e κ lz T µ l ω q dx ω q dy ω q dz Ω q Ω q Ω q n = π q+ r r h q I q /h p j p l C q κ j C q κ l n j= l= e xt y/h +κ j yt µ j ω q dy e xt z/h +κ l zt µ l ω q dz ω q dx Ω q Ω q Ω q

88 68 Chapter. Kernel density estimation for directional-linear data = n C q /h r r j= l= e Ωq x/h +κ l µ l z T x/h +κl µ l x/h +κ l µ l p j p l C q κ j C q κ l ω q dz x/h+κjµ j yt e Ω q Ω q ω q dx x/h +κj µ j x/h +κ j µ j ω q dy = n r r C q /h C q κ j C q κ l p j p l j= l= Ω q C q x/h + κ j µ j C q x/h + κ l µ l ω qdx = n p T Ψ hp, where Ψ h r r is the matrix with ij-th entry C q /h C qκ j C qκ l Ω q C q x/h +κ j µ j C q x/h +κ l µ l ω qdx. The third one results in: x T y = c h,q L f r xf r y ω q dx ω q dy = c h,q L Ω q r j= l= L Ω q r = c h,q Le /h r p j p l r j= l= Ω q h Ω q e x T y h C q κ j C q κ l e κ jx T µ j e κ ly T µ l ω q dx ω q dy p j p l C q κ j C q κ l e Ω q Ωq y/h +κ j µ j x T y/h +κj µ j y/h +κ j µ j ω q dxe κ ly T µ l ω q dy r r = C q /h e κlyt µl p j p l C q κ j C q κ l Ω q C q y/h + κ j µ j ω qdy = p T Ψ hp, j= l= where the matrix Ψ h r r has ij-th entry C q /h C q κ j C q κ l Ω q Finally, the fourth term is: r = p j f vm x; µ j, κ j ω q dx = Ω q j= r r Ω q j= l= p j p l f vm x; µ j, κ j f vm x; µ l, κ l ω q dx r r = p j p l C q κ j C q κ l e κjxt µ j e κlxt µ l ω q dx j= l= Ω q = = r r j= l= r r j= l= = p T Ψ hp, p j p l C q κ j C q κ l e Ωq κ jµ j +κ l µ l x T κj µ j +κ l µ l κ j µ j +κ l µ l p j p l C q κ j C q κ l C q κ j µ j + κ l µ l e κ l yt µ l C q y/h +κ j µ j ω qdy. ω q dx where Ψ h r r represents the matrix with ij-th entry C q /h C qκ j C qκ l Ω q C q y/h +κ j µ j ω qdy. Note that if κ j µ j + κ l µ l =, then Ω q ω q dx = C q = ω q so the result is consistent in this situation.

89 .B. Proofs of the main results 69 Proof of Proposition.7. Consider the r-mixture of directional-linear independent von Mises and normals f r x, z = r j= p j f vm x; µ j, κ j φ σj z m j. Hence: MISE ˆfh,g = E = E = c h,ql ng ˆfh,g x, z f r x, z dz ωq dx ˆf h,g x, z ˆf h,g x, zf r x, z + f r x, z dz ω q dx x LK T y h, z t f r y, t dz ω q dx dt ω q dy g + c h,ql n x T y LK ng h, z t g x T u LK h, z s f r y, tf r u, s dz ω q dx dt ω q dy ds ω q du g c h,ql x T y LK g h, z t f r x, zf r y, t g dz ω q dx dt ω q dy + f r x, z dz ω q dx.. As the directional-kernel is a product kernel and the mixtures are independent the directional and linear parts can be easily disentangled:. = n n j= p j c h,q L Ω q Ω q L x T y f vm y; µ j, κ j ω q dx ω q dy h z t g K φ σj t m j dz dt R R g + n n n p j p l c h,q L j= l= Ω q Ω q Ω q L f vm y; µ j, κ j f vm u; µ l, κ l ω q dx ω q dy ω q du K g R R R n n j= l= n n p j p l j= l= R x T y x T u L z t z s K φ σj t m j φ σl s m l dz dt ds g g x p j p l c T y h,q L f vm x; µ j, κ j f vm y; µ l, κ l Ω q Ω q L z t ω q dx ω q dy K g R R g + φ σj z m j φ σl z m l dz h h h φ σj z m j φ σl z m l dz dt f vm x; µ j, κ j Ω q f vm x; µ l, κ l ω q dx..

90 7 Chapter. Kernel density estimation for directional-linear data The directional parts were calculated in the previous theorem and the linear ones were studied in Marron and Wand 99 see also Wand and Jones 995, page 6. The combination of these two results yields. = D q hπ ng + n p T Ψ h Ω g p + p T Ψ h Ω g p + p T Ψ h Ω g p, where the r r matrices Ω a g have the ij-th entry equal to φ σa m i m j, σ a = ag +σi +σ j for a =,, and Ψ a h are the matrices of Proposition.6. The notation denotes the Hadamard product between matrices, i.e., if A ij = a ij, B ij = b ij, then A B ij = a ij b ij. Proof of Corollary.. In virtue of equation.7, if the kernel of the density estimator. is Lr = e r, r, then the kernel estimator is the n-mixture of von Mises with means X i, i =,..., n, and common concentrations /h P given by.7, where h P is the pilot bandwidth parameter Proof of Corollary.. It follows immediately from the previous proposition and corollary..c Proofs of the technical lemmas Proof of Lemma.. Consider the functions ϕ h r = Lrr q h r q,h r, ϕr = lim h ϕ h r = Lrr q q, r. Then, proving lim h λ h,q L = λ q L is equivalent to proving lim h ϕ h r dr = ϕr dr. Consider first the case q. As q, then h r q q, h >, r, h. Then: ϕ h r Lrr q q,h r ϕr, r,, h >. Because ϕr dr < by condition D on the kernel L, then by the DCT it follows that lim h ϕ h r dr = ϕr dr. For the case q =, ϕ h r = Lrr h r. Consider now the following decomposition: ϕ h r dr = Lrr h r,h r dr + Lrr h r h,h r dr. The limit of the first integral can be derived analogously with the DCT. As h r monotone increasing, then h r, r, h, h >. Therefore: Lrr h r,h r Lrr,h r ϕr, r,, h >. is Then, as lim h Lrr h r,h r = ϕr and ϕr dr < by condition D, DCT guarantees that lim h Lrr h r,h r dr = ϕr dr.

91 .C. Proofs of the technical lemmas 7 For the second integral, as a consequence of D and Remark., L must decrease faster than any power function. In particular, for some fixed h >, Lr r, r h, h, h, h. Using this, it results in: h lim h h This completes the proof. Lrr h h r dr lim h h r h r dr = lim h =. h Remark.5. It is possible to apply the same techniques to prove the result with the functions ϕ h,i,j,k r = L k rr q +i h r q j,h r, ϕ i,j,k r = lim h ϕ h,i,j,k r = L k rr q +i q j, r, with i =,,, j =, and k =,. For the cases where q j, use DCT. For the other cases, subdivide the integral over, h into the intervals, h and h, h. Then apply DCT in the former and use a suitable power function to make the latter tend to zero in the same way as described previously. Proof of Lemma.. Following Blumenson 96, if x is a vector of norm r with components x j, j =,..., n, with respect to an orthonormal basis in R n, then the n-dimensional spherical coordinates of x are given by x = r cos φ, j x j = r cos φ j sin φ k, j =,..., n, x n = r sin θ x n = r cos θ k= n k= n k= sin φ k, sin φ k, n J = r n k= sin k φ n k.. where φ j π, j =,..., n, θ < π and r <. J denotes the Jacobian of the transformation. Special cases of this parametrization are the polar coordinates n =, { x = r cos θ, J = r, x = r sin θ, and the spherical coordinates n =, x = r cos φ, x = r sin θ sin φ, x = r cos θ sin φ, J = r sin φ. Note that sometimes this parametrization appears with the roles of x and x swapped. To continue with the previous notation, let denote q = n. Using the spherical coordinates r =, as the integration is on Ω n and then applying the change of variables t = cos φ, dφ = t dt,.5

92 7 Chapter. Kernel density estimation for directional-linear data it follows that fx ω n dx Ω n = fx,..., x n dx,..., x n Ω n. π π = n π n f cos φ, cos φ sin φ,..., cos θ sin φ k n sin k φ n k k=.5 π π = = n k= π π n j=n π n dφ j dθ sin k φ n k t n t k=. = f Ω n f n π f sin k φ n k t n k= t, cos φ t,..., cos θ n j=n dφ j dt dθ k= t, cos φ t,..., cos θ n j=n dφ j dθ dt t, t ξ,..., t ξn t n dξ,..., ξ n dt = t, t ξ t n ω n dξ dt. Ω n f k= sin φ k t sin φ k t So, for the q-dimensional sphere Ω q, equation.7 follows. Note that as the parametrization. is invariant to coordinates permutations and t can be placed in any argument of the function. The rest of the arguments will remain having the entries t n ξ. This expression can be improved using an adequate basis representation. From a fixed point y Ω q, it is possible to complete an orthonormal basis of R q+, say {y, b,..., b q }. So an element x Ω q will be expressed as: q x = x, y y + x, b i b i = ty + t ξ, i= where t = x, y, and ξ T y = {η Ω q : η y}. Related to the basis {y, b,..., b q }, there are the orthogonal matrix B = y, b,..., b q q+ q+ and the semi-orthogonal matrix B y = b,..., b q q+ q. Using the fact that B is an orthonormal matrix, is possible to make the change x = Bz, with det B = and B Ω q = B T Ω q = Ω q as B preserves distances. Then, the relation.8 holds: fx ω q dx = fbz det B ω q dz Ω q B Ω q

93 .C. Proofs of the technical lemmas 7 = fbz ω q dz Ω q.7 = = f Ω q f Ω q B t, t ξ T t q ω q dξ dt ty + t By ξ t q ω q dξ dt. Proof of Lemma.. Without loss of generality, assume that, by the q-spherical coordinates., x i = cos φ and x j = cos φ sin φ. Using this, the calculus are straightforward for the integrands x i and x i x j it is assumed that only the terms with positive index are taken into account in the products: Ω q x i ω q dx = = Ω q x i x j ω q dx = π π π π = ω q =, = π π j=q π π π q π q π q q dφ j dθ π q cos φ k= q π q k= sin k φ q k sin k φ q k sin q φ j=q q cos φ cos φ sin φ k= sin k φ q k j=q j=q dφ j dθ π dφ j dθ cos φ sin q φ dφ k= sin k φ q k sin q φ sin q φ dφ j dθ π cos φ sin q φ dφ cos φ sin q φ dφ = ω q =. The integrand x i is even simpler, using the fact that the integration is over Ω q: Ω q x i ω q dx = q + q+ k= Ω q x k ω q dx = q + q+ Ω q k= x k ω q dx = ω q q +. Proof of Lemma.. For a =,, p =, and q, the properties of the Gamma function ensure that L a rr q p dr = e ar r q p dr = Γ q p + a q p+. Therefore: λ q L = q π q q Γ q Γ =π q q q q, bq L =Γ Γ / = q, d ql = Γ q / q Γ = q. The expression for c h,q L arises from the fact that c h,q L = C q /h e /h. q

94 7 Chapter. Kernel density estimation for directional-linear data Proof of Lemma.5. This proof is a rebuild of the one given in Zhao and Wu and is included for the aim of completeness of this work. Furthermore, many techniques used in this proof are also helpful for the proofs of other results in this paper. Let denote Bias ˆfh x = E ˆfh x fx. To compute the bias, use Lemma. for the change of variables with the orthonormal and semi-orthonormal matrices B = x, b,..., b q and B x = b,..., b q, and then apply the ordinary change of variables This results in: x T X Bias ˆfh x = c h,q LE L fx h x T y = c h,q L L Ω q h x T y = c h,q L L Ω q h t = c h,q L L Ω q h r = t h, dr = h dt..6 t q ω q dξ dt.6 h = c h,q Lh q fy ω q dy c h,q L L Ω q fy fx ω q dy f x T y ω q dyfx h tx + t Bx ξ fx Ω q Lr f x + α x,ξ fx r q h r q ω q dξ dr h = c h,q Lh q Lrr q h r q Ω f x + α x,ξ fx q ω q dξ dr,.7 where α x,ξ = rh x + h r h r B x ξ Ω q. By condition D, the Taylor expansion of f at x is fx + α x,ξ fx = α T x,ξ fx + αt x,ξhfxα x,ξ + o α T x,ξα x,ξ, so the calculus of.7 can be split in three parts. For the first use that the integration of ξ i vanishes by Lemma.: α T x,ξ fx ω q dξ = rh x T fx ω q dξ Ω q Ω q + h r h r ξ T B T x fx ω q dξ Ω q In the second, by the results of Lemma., α T x,ξhfxα x,ξ ω q dξ = r h Ω q = rh ω q x T fx.8 Ω q x T Hfxx ω q dξ

95 .C. Proofs of the technical lemmas 75 rh r h r x T HfxB x ξ ω q dξ Ω q + h r h r ξ T B T x HfxB x ξ ω q dξ Ω q = r h ω q x T Hfxx + h r h r q Ω q i,j= b T i Hfxb j ξ i ξ j ω q dξ = r h ω q x T Hfxx q + h r h r b T i Hfxb i ξi Ω ω q dξ q i= = r h ω q x T Hfxx + h r h rω q q fx x T Hfxx..9 In the last step it is used that by q i= b ib T i + xx T = B x B T x = I q+ xx T, q q b T i Hfxb i = tr Hfx b i b T i = tr Hfx I q+ xx T = fx x T Hfxx. i= i= Apart from this, the order of the Taylor expansion is o α T x,ξα x,ξ = o r h + h r h r = o r h + h r h r = ro h..5 Adding.8.5, { h.7 = ω q c h,q Lh q Lrr q h r q rh x T fx + r h + h r h r q fx x T Hfxx + r o } h dr xt Hfxx h = h ω q c h,q Lh q Lrr q h r q dr x T fx + h ω h q c h,q Lh q Lrr q + h r q dr x T Hfxx + h ω h q c h,q Lh q Lrr q h r q dr q fx x T Hfxx h + ω q c h,q Lh q Lrr q h r q dr o h..5 Consider the following functions for h > and i, j =, : ϕ h,i,j r = c h,q Lh q Lrr q +i h r q j,h r, r,. When n, h and the limit of ϕ h,i,j is given by ϕ i,j r = lim h ϕ h,i,j r = λ q L Lrr q +i q j, r.

96 76 Chapter. Kernel density estimation for directional-linear data Then, by Remark.5 and Lemma.: lim h ϕ h r dr = λ q L q j Lrr q i dr.6 = j ω q b q L, i =, j Lrr q ω + dr q Lrr q dr, i =. So, for the terms between square brackets of.5, ϕ h r dr = ϕr dr + o. Replacing this in.5 leads to.5 = h bq L ω q + o x T fx ω q + h ω q bq L Lrr q + dr Lrr q dr + o x T Hfxx + h ω q = h b q L ω q bq L ω q + o + o o h ω q + O h + o h q fx x T bq L Hfxx + ω q x T fx + q fx x T Hfxx = h b q LΨf, x + o h. References Akaike, H. 95. An approximation to the density function. Ann. Inst. Statist. Math., 6:7. Bai, Z. D., Rao, C. R., and Zhao, L. C Kernel estimators of density function of directional data. J. Multivariate Anal., 7: 9. Blumenson, L. E. 96. Classroom notes: a derivation of n-dimensional spherical coordinates. Amer. Math. Monthly, 67:6 66. Cao, R. 99. Bootstrapping the mean integrated squared error. J. Multivariate Anal., 5:7 6. Cao, R., Cuevas, A., and Gonzalez Manteiga, W. 99. A comparative study of several smoothing methods in density estimation. Comput. Statist. Data Anal., 7:5 76. Di Marzio, M., Panzera, A., and Taylor, C. C.. Kernel density estimation on the torus. J. Statist. Plann. Inference, 6:56 7. García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W.. Exploring wind direction and SO concentration by circular-linear density estimation. Stoch. Environ. Res. Risk Assess., 75: Hall, P. 98. Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivariate Anal., : 6.

Hall, P., Watson, G. S., and Cabrera, J. (1987). Kernel density estimation with spherical data. Biometrika, 74(4):751-762.
Hendriks, H. (1990). Nonparametric estimation of a probability density on a Riemannian manifold using Fourier expansions. Ann. Statist., 18(2):832-849.
Henry, G. and Rodriguez, D. (2009). Kernel density estimation on Riemannian manifolds: asymptotic results. J. Math. Imaging Vision, 34(3):235-239.
Jones, M. C., Marron, J. S., and Sheather, S. J. (1996). Progress in data-based bandwidth selection for kernel density estimation. Comput. Stat., 11(3):337-381.
Jupp, P. E. and Mardia, K. V. (1989). A unified view of the theory of directional statistics, 1975-1988. Int. Stat. Rev., 57(3):261-294.
Klemelä, J. (2000). Estimation of densities and derivatives of densities with directional data. J. Multivariate Anal., 73(1):18-40.
Marron, J. S. and Wand, M. P. (1992). Exact mean integrated squared error. Ann. Statist., 20(2):712-736.
Müller, H.-G. (2006). Density estimation II. In Kotz, S., Balakrishnan, N., Read, C., and Vidakovic, B., editors, Encyclopedia of Statistical Sciences. John Wiley & Sons, Hoboken, second edition.
Oliveira, M., Crujeiras, R. M., and Rodríguez-Casal, A. (2012). A plug-in rule for bandwidth selection in circular density estimation. Comput. Statist. Data Anal., 56(12):3898-3908.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist., 33(3):1065-1076.
Pelletier, B. (2005). Kernel density estimation on Riemannian manifolds. Statist. Probab. Lett., 73(3):297-304.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist., 27(3):832-837.
Scott, D. W. (1992). Multivariate Density Estimation. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York.
Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. Ser. B, 53(3):683-690.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London.
Taylor, C. C. (2008). Automatic bandwidth selection for circular density estimation. Comput. Statist. Data Anal., 52(7):3493-3500.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing, volume 60 of Monographs on Statistics and Applied Probability. Chapman & Hall, London.
Watson, G. S. (1983). Statistics on Spheres, volume 6 of University of Arkansas Lecture Notes in the Mathematical Sciences. John Wiley & Sons, New York.

Zhao, L. and Wu, C. (2001). Central limit theorem for integrated square error of kernel estimators of spherical density. Sci. China Ser. A, 44(4):474-483.

Chapter

Bandwidth selectors for kernel density estimation with directional data

Abstract

New bandwidth selectors for kernel density estimation with directional data are presented in this work. These selectors are based on asymptotic and exact error expressions for the kernel density estimator combined with mixtures of von Mises distributions. The performance of the proposed selectors is investigated in a simulation study and compared with other existing rules for a large variety of directional scenarios, sample sizes and dimensions. The selector based on the exact error expression turns out to have the best behaviour among the studied selectors in almost all the situations. This selector is illustrated with real data for the circular and spherical cases.

Reference

García-Portugués, E. (2013). Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electron. J. Stat., 7:1655-1685.

Contents

.1 Introduction
.2 Kernel density estimation with directional data
    Available bandwidth selectors
.3 A new rule of thumb selector
.4 Selectors based on mixtures
    Mixtures fitting and selection of the number of components
.5 Comparative study
    Directional models
    Circular case
    Spherical case
    The effect of dimension
.6 Data application
    Wind direction
    Position of stars

100 8 Chapter. Bandwidth selectors for directional kernel density estimation.7 Conclusions A Proofs B Models for the simulation study C Extended tables for the simulation study References Introduction Bandwidth selection is a key issue in kernel density estimation that has deserved considerable attention during the last decades. The problem of selecting the most suitable bandwidth for the nonparametric kernel density estimator introduced by Rosenblatt 956 and Parzen 96 is the main topic of the reviews of Cao et al. 99, Jones et al. 996 and Chiu 996, among others. Comprehensive references on kernel smoothing and bandwidth selection include the books by Silverman 986, Scott 99 and Wand and Jones 995. Bandwidth selection is still an active research field in density estimation, with some recent contributions like Horová et al. and Chacón and Duong in the last years. Kernel density estimation has been also adapted to directional data, that is, data in the unit hypersphere of dimension q. Due to the particular nature of directional data periodicity for q = and manifold structure for any q, the usual multivariate techniques are not appropriate and specific methodology that accounts for their characteristics has to be considered. The classical references for the theory of directional statistics are the complete review of Jupp and Mardia 989 and the book by Mardia and Jupp. The kernel density estimation with directional data was firstly proposed by Hall et al. 987, studying the properties of two types of kernel density estimators and providing cross-validatory bandwidth selectors. Almost simultaneously, Bai et al. 988 provided a similar definition of kernel estimator, establishing its pointwise and L consistency. Some of the results by Hall et al. 987 were extended by Klemelä, who studied the estimation of the Laplacian of the density and other types of derivatives. Whereas the framework for all these references is the general q-sphere, which comprises as particular case the circle q =, there exists a remarkable collection of works devoted to kernel density estimation and bandwidth selection for the circular scenario. Specifically, Taylor 8 presented the first plug-in bandwidth selector in this context and Oliveira et al. derived a selector based on mixtures and on the results of Di Marzio et al. 9 for the circular Asymptotic Mean Integrated Squared Error AMISE. Recently, Di Marzio et al. proposed a product kernel density estimator on the q-dimensional torus and cross-validatory bandwidth selection methods for that situation. Another nonparametric approximation for density estimation with circular data was given in Fernández-Durán and Fernández-Durán and Gregorio-Domínguez. In the general setting of spherical random fields Durastanti et al. derived an estimation method based on a needlet basis representation. Directional data arise in many applied fields. For the circular case q = a typical example is wind direction, studied among others in Jammalamadaka and Lund 6, Fernández-Durán 7 and García-Portugués et al. a. The spherical case q = poses challenging applications in astronomy, for example in the study of stars position in the celestial sphere or in the study of the cosmic microwave background radiation Cabella and Marinucci, 9. Finally, a novel field where directional data is present for large q is text mining Banerjee et al., 5,

where documents are usually codified as high-dimensional unit vectors. For all these situations, a reliable method for choosing the bandwidth parameter seems necessary to trust the density estimate.

The aim of this work is to introduce new bandwidth selectors for the kernel density estimator for directional data. The first one is a rule of thumb which assumes that the underlying density is a von Mises and is intended to be the directional analogue of the rule of thumb proposed by Silverman (1986) for data in the real line. This selector uses the AMISE expression that can be seen, among others, in García-Portugués et al. (2013b). The novelty of the selector is that it is more general and robust than the previous proposal by Taylor (2008), although both rules exhibit an unsatisfactory behaviour when the reference density spreads off from the von Mises. To overcome this problem, two new selectors based on the use of mixtures of von Mises for the reference density are proposed. One of them uses the aforementioned AMISE expression, whereas the other one uses the exact MISE computation for mixtures of von Mises densities given in García-Portugués et al. (2013b). Both of them use the Expectation-Maximization algorithm of Banerjee et al. (2005) to fit the mixtures and, to select the number of components, the BIC criterion is employed. These selectors based on mixtures are inspired by the earlier ideas of Ćwik and Koronacki (1997), for the multivariate setting, and Oliveira et al. (2012) for the circular scenario.

This paper is organized as follows. Section .2 presents some background on kernel density estimation for directional data and the available bandwidth selectors. The rule of thumb selector is introduced in Section .3 and the two selectors based on mixtures of von Mises are presented in Section .4. Section .5 contains a simulation study comparing the proposed selectors with the ones available in the literature. Finally, Section .6 illustrates a real data application and some conclusions are given in Section .7. Supplementary materials with proofs, simulated models and extended tables are given in the appendix.

.2 Kernel density estimation with directional data

Denote by X a directional random variable with density f. The support of such a variable is the q-dimensional sphere, namely Ω_q = {x ∈ R^{q+1} : x_1^2 + ... + x_{q+1}^2 = 1}, endowed with the Lebesgue measure in Ω_q, which will be denoted by ω_q. Then, a directional density is a nonnegative function that satisfies ∫_{Ω_q} f(x) ω_q(dx) = 1. Also, when there is no possible confusion, the area of Ω_q will be denoted by

ω_q = ω_q(Ω_q) = 2 π^{(q+1)/2} / Γ((q+1)/2),   q ≥ 1,

where Γ represents the Gamma function, defined as Γ(p) = ∫_0^∞ x^{p-1} e^{-x} dx, p > 0.

Among the directional distributions, the von Mises-Fisher distribution (see Watson, 1983) is perhaps the most widely used. The von Mises density, denoted by vM(µ, κ), is given by

f_vM(x; µ, κ) = C_q(κ) exp{κ x^T µ},   C_q(κ) = κ^{(q-1)/2} / ((2π)^{(q+1)/2} I_{(q-1)/2}(κ)),

where µ ∈ Ω_q is the directional mean, κ ≥ 0 the concentration parameter around the mean, T stands for the transpose operator and I_ν is the modified Bessel function of order ν,

I_ν(z) = ((z/2)^ν / (π^{1/2} Γ(ν + 1/2))) ∫_{-1}^{1} (1 - t^2)^{ν - 1/2} e^{zt} dt.

This distribution is the main reference for directional models and, in that sense, plays the role of the normal distribution for directional data (it is also a multivariate normal N(µ, κ^{-1} I_{q+1}) conditioned to Ω_q; see Mardia and Jupp (2000)). A particular case of this density sets κ = 0, which corresponds to the uniform density that assigns probability ω_q^{-1} to any direction in Ω_q.

Given a random sample X_1, ..., X_n from the directional random variable X, the proposal of Bai et al. (1988) for the directional kernel density estimator at a point x ∈ Ω_q is

f̂_h(x) = (c_{h,q}(L)/n) Σ_{i=1}^{n} L((1 - x^T X_i)/h^2),

where L is a directional kernel (a rapidly decaying function with nonnegative values defined in [0, ∞)), h > 0 is the bandwidth parameter and c_{h,q}(L) is a normalizing constant. This constant is needed in order to ensure that the estimator is indeed a density, and satisfies

c_{h,q}(L)^{-1} = ∫_{Ω_q} L((1 - x^T y)/h^2) ω_q(dy) = O(h^q).

As usual in kernel smoothing, the selection of the bandwidth is a crucial step that affects notably the final estimation: large values of h result in a uniform density in the sphere, whereas small values of h provide an undersmoothed estimator with high concentrations around the sample observations. On the other hand, the choice of the kernel is not seen as important for practical purposes, and the most common choice is the so-called von Mises kernel, L(r) = e^{-r}. Its name is due to the fact that the kernel estimator can then be viewed as a mixture of von Mises-Fisher densities as follows:

f̂_h(x) = (1/n) Σ_{i=1}^{n} f_vM(x; X_i, 1/h^2),

where, for each von Mises component, the mean value is the i-th observation X_i and the common concentration parameter is given by 1/h^2.

The classical error measurement in kernel density estimation is the L_2 distance between the estimator f̂_h and the target density f, the so-called Integrated Squared Error (ISE). As this is a random quantity depending on the sample, its expected value, the Mean Integrated Squared Error (MISE), is usually considered:

MISE(h) = E[ISE(f̂_h)] = E[∫_{Ω_q} (f̂_h(x) - f(x))^2 ω_q(dx)],

which depends on the bandwidth h, the kernel L, the sample size n and the target density f. Whereas the two last elements are fixed when estimating a density from a random sample, the bandwidth has to be chosen (also the kernel, although this does not present a big impact in the

103 .. Kernel density estimation with directional data 8 performance of the estimator. Then, a possibility is to search for the bandwidth that minimizes the MISE: h MISE = arg min h> MISEh. To derive an easier form for the MISE that allows to obtain h MISE, the following conditions on the elements of the estimator. are required: D. Extend f from Ω q to R q+ \ {} by fx f x/ x for all x R q+ \ {}, where denotes the Euclidean norm. Assume that the gradient vector fx and the Hessian matrix Hfx exist, are continuous and square integrable. D. Assume that L :,, is a bounded and integrable function such that < L k rr q dr <, q, for k =,. D. Assume that h = h n is a positive sequence such that h n and nh q n as n. The following result, available from García-Portugués et al. b, provides the MISE expansion for the estimator.. It is worth mentioning that, under similar conditions, Hall et al. 987 and Klemelä also derived analogous expressions. Proposition. García-Portugués et al. b. Under conditions D D, the MISE for the directional kernel density estimator. is given by MISEh =b q L RΨf, h + c h,ql d q L + o h + nh q, n where RΨf, = Ω q Ψf, x ω q dx, b q L = Lrr q dr Lrr q dr, d ql = L rr q dr Lrr q dr and Ψf, x = x T fx + q fx x T Hfxx.. This results leads to the decomposition MISEh = AMISEh+o h + nh q, where AMISE stands for the Asymptotic MISE. It is possible to derive an optimal bandwidth for the AMISE in this sense, h AMISE = arg min h> AMISEh, that will be close to h MISE when h + nh q is small enough. Corollary. García-Portugués et al. b. The AMISE optimal bandwidth for the directional kernel density estimator. is given by h AMISE = +q qd q L b q L,. λ q LRΨf, n where λ q L = q ω q Lrr q dr. Unfortunately, expression. can not be used in practise since it depends on the curvature term RΨf, of the unknown density f.
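To make the estimator above concrete, the following is a minimal Python (NumPy/SciPy) sketch of the von Mises-Fisher density and of the kernel estimator with von Mises kernel, which, as noted above, is simply the average of vM(X_i, 1/h^2) densities. This is not the author's implementation and the function names are ours.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_nu


def vmf_density(x, mu, kappa):
    """von Mises-Fisher density on the q-sphere embedded in R^(q+1).

    x: (m, q+1) array of unit evaluation points; mu: unit mean direction; kappa > 0."""
    q = mu.shape[0] - 1
    # log C_q(kappa) = (q-1)/2 log(kappa) - (q+1)/2 log(2 pi) - log I_{(q-1)/2}(kappa),
    # with log I_nu(kappa) = log(ive(nu, kappa)) + kappa for numerical stability
    log_cq = ((q - 1) / 2) * np.log(kappa) - ((q + 1) / 2) * np.log(2 * np.pi) \
             - (np.log(ive((q - 1) / 2, kappa)) + kappa)
    return np.exp(log_cq + kappa * (x @ mu))


def directional_kde(x, data, h):
    """Directional KDE with von Mises kernel: fhat_h(x) = mean_i f_vM(x; X_i, 1/h^2)."""
    kappa = 1.0 / h ** 2
    return np.mean(np.stack([vmf_density(x, xi, kappa) for xi in data]), axis=0)
```

For instance, directional_kde(grid, sample, h) evaluates the estimate on a grid of unit row vectors; by construction the estimate integrates to one over Ω_q.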

104 8 Chapter. Bandwidth selectors for directional kernel density estimation.. Available bandwidth selectors The first proposals for data-driven bandwidth selection with directional data are from Hall et al. 987, who provide cross-validatory selectors. Specifically, Least Squares Cross-Validation LSCV and Likelihood Cross-Validation LCV selectors are introduced, arising as the minimizers of the cross-validated estimates of the squared error loss and the Kullback-Leibler loss, respectively. The selectors have the following expressions: ˆf i h LSCV = arg max h> CV h, CV h = n h LCV = arg max h> CV KLh, CV KL h = n i= n i= log ˆf h i X i i ˆf h X i, Ω q ˆfh x ω q dx, where h represents the kernel estimator computed without the i-th observation. See Remark. for an efficient computation of h LSCV. Recently, Taylor 8 proposed a plug-in selector for the case of circular data q = for the estimator with the von Mises kernel. The selector of Taylor 8 uses from the beginning the assumption that the reference density is a von Mises to construct the AMISE. This contrasts with the classic rule of thumb selector of Silverman 986, which supposes at the end i.e., after deriving the AMISE expression that the reference density is a normal. The bandwidth parameter is chosen by first obtaining an estimation ˆκ of the concentration parameter κ in the reference density for example, by maximum likelihood and using the formula h TAY = π I ˆκ ˆκ I ˆκn Note that the parametrization of Taylor 8 has been adapted to the context of the estimator. by denoting by h the inverse of the squared concentration parameter employed in his paper. More recently, Oliveira et al. proposed a selector that improves the performance of Taylor 8 allowing for more flexibility in the reference density, considering a mixture of von Mises. This selector is also devoted to the circular case and is mainly based on two elements. First, the AMISE expansion that Di Marzio et al. 9 derived for the circular kernel density estimator by the use of Fourier expansions of the circular kernels. This expression has the following form when the kernel is a circular von Mises the estimator is equivalent to consider Lr = e r, q = and h as the inverse of the squared concentration parameter in.: AMISEh = I h / π 6 I h / f θ dθ + I h / nπi h /..5 The second element is the Expectation-Maximization EM algorithm of Banerjee et al. 5 for fitting mixtures of directional von Mises. The selector, that will be denoted by h OLI, proceeds as follows i. Use the EM algorithm to fit mixtures from a determined range of components. ii. Choose the fitted mixture with minimum AIC. iii. Compute the curvature term in.5 using the fitted mixture and seek for the h that minimizes this expression, that will be h OLI. 5.
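As an illustration of the cross-validatory selectors just described, a sketch of the likelihood cross-validation rule h_LCV (the maximizer of the leave-one-out log-likelihood CV_KL) follows. It reuses directional_kde from the earlier sketch, replaces the exact maximization by a crude grid search, and the grid limits are arbitrary illustrative choices, not values from the text.

```python
import numpy as np


def lcv_bandwidth(data, h_grid=None):
    """Likelihood cross-validation: maximize sum_i log fhat_h^{-i}(X_i) over h."""
    n = data.shape[0]

    def loo_loglik(h):
        dens = np.empty(n)
        for i in range(n):
            loo = np.delete(data, i, axis=0)          # leave-one-out sample
            dens[i] = directional_kde(data[i:i + 1], loo, h)[0]
        return np.sum(np.log(dens))

    hs = h_grid if h_grid is not None else np.exp(np.linspace(np.log(0.05), np.log(2), 50))
    return hs[np.argmax([loo_loglik(h) for h in hs])]
```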

105 .. A new rule of thumb selector 85. A new rule of thumb selector Using the properties of the von Mises density it is possible to derive a directional analogue to the rule of thumb of Silverman 986, which is the optimal AMISE bandwidth for normal reference density and normal kernel. The rule is resumed in the following result. Proposition. Rule of thumb. The curvature term for a von Mises density vmµ, κ is RΨf vm ; µ, κ, = q+ π q+ κ q+ I q κ q qi q+ κ + + qi q+ κ. If ˆκ is a suitable estimator for κ, then the rule of thumb selector for the kernel estimator. with a directional kernel L is h ROT = ˆκ q+ b q L λ q L q d q L q+ π q+ qi q+ I q ˆκ ˆκ + + qi q+ ˆκ n +q. If L is the von Mises kernel, then: π I ˆκ 5, q =, ˆκ I ˆκ + ˆκI ˆκ n 8 sinh 6 ˆκ h ROT = ˆκ + ˆκ, q =, sinhˆκ ˆκ coshˆκ n π I q ˆκ +q ˆκ q+ qi q+ ˆκ + + qˆκi q+ ˆκ n, q..6 The parameter κ can be estimated by maximum likelihood. In view of the expression for h ROT in.6, it is interesting to compare it with h TAY when q =. As it can be seen, both selectors coincide except for one difference: the term I ˆκ in the sum in the denominator of h ROT. This extra term can be explained by examining the way that both selectors are derived. Whereas the selector h ROT derives the bandwidth supposing that the reference density is a von Mises when the AMISE is already derived in a general way, the selector h TAY uses the von Mises assumption to compute it. Therefore, it is expected that the selector h ROT will be more robust against deviations from the von Mises density. Figure. collects two graphs exposing these comments, that are also corroborated in Section.5. The left plot shows the MISE for h TAY and h ROT for the density vm,, + vm cosθ, sinθ,, where θ π, π. This model represents two equally concentrated von Mises densities that spread off from being the same to being antipodal. As it can be seen, the h ROT selector is slightly more accurate when the von Mises model holds θ = π and when the deviation is large θ π, π. When θ π, π, both selectors perform similar. This graph also illustrates the main problem of these selectors: the von Mises density is not flexible enough to capture densities with multimodality and it approximates them by the flat uniform density.
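Both h_ROT above and h_TAY require the maximum likelihood estimate κ̂ of the von Mises concentration. Before continuing with the comparison in Figure ., a minimal sketch of that computation is given (Python, our function names): κ̂ solves the usual mean-resultant-length equation A_p(κ) = R̄, with p = q+1 the ambient dimension, and the closed-form starting value is the approximation of Banerjee et al. (2005).

```python
import numpy as np
from scipy.special import ive
from scipy.optimize import brentq


def vmf_mle_kappa(data):
    """ML estimate of the von Mises-Fisher concentration, assuming 0 < Rbar < 1.

    Solves A_p(kappa) = I_{p/2}(kappa) / I_{p/2-1}(kappa) = Rbar, p = q + 1."""
    p = data.shape[1]                                   # ambient dimension
    rbar = np.linalg.norm(data.mean(axis=0))            # mean resultant length
    a = lambda k: ive(p / 2, k) / ive(p / 2 - 1, k)     # scaling factors e^{-k} cancel
    hi = max(rbar * (p - rbar ** 2) / (1 - rbar ** 2), 1.0)  # Banerjee et al. (2005) start
    while a(hi) < rbar:                                 # expand bracket until it holds the root
        hi *= 2
    return brentq(lambda k: a(k) - rbar, 1e-8, hi)
```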

When the density is a vM(µ, κ), the right plot of Figure . shows the output of h_TAY, h_ROT and h_MISE and their corresponding errors with respect to κ. The effect of the extra term is visible for low values of κ, where MISE(h_TAY) presents a local maximum. This corresponds to higher values of h_TAY with respect to h_ROT and h_MISE, which means that the former produces oversmoothed estimations of the density (i.e., they tend to the uniform case faster). Despite the worse behaviour of h_TAY, when the concentration parameter increases the effect of the extra term is mitigated and both selectors are almost the same.

Figure .: The effect of the extra term in h_ROT. Left plot: logarithm of the curves of MISE(h_TAY), MISE(h_ROT) and MISE(h_MISE) for sample size n = 5. The curves are computed by Monte Carlo samples and h_MISE is obtained exactly. The abscissae axis represents the variation of the parameter θ, which indexes the reference density given by a balanced mixture of two equally concentrated von Mises densities with means (1, 0) and (cos θ, sin θ). Right plot: logarithm of h_TAY, h_ROT and h_MISE and their corresponding MISE for different values of κ, with n = 5.

.4 Selectors based on mixtures

The results of the previous section show that, although the rule of thumb presents a significant improvement with respect to the Taylor (2008) selector in terms of generality and robustness, it also shares the same drawbacks when the underlying density is not the von Mises model (see Figure .). To overcome these problems, two alternatives for improving h_ROT will be considered.

The first one is related to improving the reference density that is plugged into the curvature term. The von Mises density has been proved to be not flexible enough to estimate properly the curvature term in (.). This is especially visible when the underlying model is a mixture of antipodal von Mises, where the estimated curvature term is close to zero (the curvature of a uniform density). A modification in this direction is to consider a suitable mixture of von Mises for the reference density, which will be able to capture the curvature of rather complex underlying densities. This idea was employed first by Ćwik and Koronacki (1997), considering mixtures of multivariate normals, and by Oliveira et al. (2012) in the circular setting.

107 .. Selectors based on mixtures 87 The second improvement is concerned with the error criterion for the choice of the bandwidth. Until now, the error criterion considered was the AMISE, which is the usual in the literature of kernel smoothing. However, as Marron and Wand 99 showed for the linear case and García-Portugués et al. b did for the directional situation, the AMISE and MISE may differ significantly for moderate and even large sample sizes, with a potential significative misfit between h AMISE and h MISE. Then, a substantial decreasing of the error of the estimator. is likely to happen if the bandwidth is obtained from the exact MISE, instead of the asymptotic version. Obviously, the problem of this new approach is how to compute exactly the MISE, but this can be done if the reference density is a mixture of von Mises. The previous two considerations, improve the reference density and the error criterion, will lead to the bandwidth selectors of Asymptotic MIxtures AMI, denoted by h AMI, and Exact MIxtures EMI, denoted by h EMI. Before explaining in detail the two proposed selectors, it is required to introduce some notation on mixtures of von Mises. An M-mixture of von Mises densities with means µ j, concentration parameters κ j and weights p j, with j =,..., M, is denoted by M f M x = p j f vm x; µ j, κ j, j= M p j =, p j..7 j= When dealing with mixtures, the tuning parameter is the number of components, M, which can be estimated from the sample. The notation f M will be employed to represent the mixture of M components where the parameters are estimated and M is obtained from the sample. The details of this fitting are explained later in Algorithm.. Then, the AMI selector follows from modifying the rule of thumb selector to allow fitted mixtures of von Mises. It is stated in the next procedure. Algorithm. AMI selector. Let X,..., X n be a random sample of a directional variable X. i. Compute a suitable estimation f M using Algorithm.. ii. For a directional kernel L, set and for the von Mises kernel, qd q L h AMI = b q L λ q LR Ψ f M, n h AMI = q q π q R Ψ f +q M, n. +q Remark.. Unfortunately, the curvature term R Ψ f M, does not admit a simple closed expression, unless for the case where M =, i.e., when h AMI is equivalent to h ROT. This is due to the cross-product terms between the derivatives of the mixtures that appear in the integrand. However, this issue can be bypassed by using either numerical integration in q-spherical coordinates or Monte Carlo integration to compute R Ψ f M, for any M.
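The Monte Carlo route mentioned in the remark above can be sketched generically: to approximate R(φ) = ∫_{Ω_q} φ(x)^2 ω_q(dx) for any square-integrable φ (in particular φ = Ψ(f̂_M, ·), whose gradient and Hessian evaluation is not coded here), draw uniform points on Ω_q and rescale by the area ω_q. The function names below are ours.

```python
import numpy as np
from scipy.special import gamma


def runif_sphere(n, q, rng=np.random.default_rng()):
    """Uniform sample on Omega_q: normalize i.i.d. standard normal vectors in R^(q+1)."""
    x = rng.standard_normal((n, q + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mc_curvature(phi, q, n_mc=100_000):
    """Monte Carlo approximation of R(phi) = int_{Omega_q} phi(x)^2 w_q(dx),
    using int phi^2 dw_q = w_q * E[phi(U)^2] with U uniform on Omega_q."""
    u = runif_sphere(n_mc, q)
    w_q = 2 * np.pi ** ((q + 1) / 2) / gamma((q + 1) / 2)
    return w_q * np.mean(phi(u) ** 2)
```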

108 88 Chapter. Bandwidth selectors for directional kernel density estimation The EMI selector relies on the exact expression of the MISE for densities of the type.7, that will be denoted by MISE M h = E ˆfh x f M x ωq dx. Ω q Similarly to what Marron and Wand 99 did for the linear case, García-Portugués et al. b derived the closed expression of MISE M h when the directional kernel is the von Mises one. The calculations are based on the convolution properties of the von Mises, which unfortunately are not so straightforward as the ones for the normal, resulting in more complex expressions. Proposition. García-Portugués et al. b. Let f M be the density of an M-mixture of directional von Mises.7. The exact MISE of the directional kernel estimator. with von Mises kernel and obtained from a random sample of size n is MISE M h = D q hn + p T n Ψ h Ψ h + Ψ h p,.8 where p = p,..., p M T and D q h = C q /h Cq /h. The matrices Ψa h, a =,, have entries: Cq κ i C q κ j Ψ h = C q κi µ i + κ j µ j C q /h C q κ i C q κ j, Ψ h = ij Ω q C q x/h e κ jx T µ j ω q dx, + κ i µ i ij C q /h Cq κ i C q κ j Ψ h = C q x/h + κ i µ i C q x/h + κ j µ j ω qdx, Ω q where C q is defined in equation.. Remark.. A more efficient way to implement.8, specially for large sample sizes and higher dimensions, is the following expression: { MISE M h = D q hn } + E ˆfh x f M x E ˆfh x ω q dx, Ω q where the integral is either evaluated numerically using q-spherical coordinates or Monte Carlo integration and E ˆfh x is computed using E ˆfh x = M j= ij p j C q κ j C q /h C q x/h + κ j µ j. Remark.. By the use of similar techniques, when the kernel is von Mises, the LSCV selector i admits an easier expression for the CV loss that avoids the calculation of the integral of ˆf h : CV h = C q /h n n n n ext i X j/h C q /h nc q X i + X j /h C q/h nc q /h. i= j>i Based on the previous result, the philosophy of the EMI selector is the following: using a suitable pilot parametric estimation of the unknown density given by Algorithm., build the exact MISE and obtain the bandwidth that minimizes it. This is summarized in the following procedure.

109 .. Selectors based on mixtures 89 Algorithm. EMI selector. Consider the von Mises kernel and let X,..., X n be a random sample of a directional variable X. i. Compute a suitable estimation f M using Algorithm.. ii. Obtain h EMI = arg min h> MISE Mh... Mixtures fitting and selection of the number of components The EM algorithm of Banerjee et al. 5, implemented in the R package movmf see Hornik and Grün, provides a complete solution to the problem of estimation of the parameters in a mixture of directional von Mises of dimension q. However, the issue of selecting the number of components of the mixture in an automatic and optimal way is still an open problem. The propose considered in this work is an heuristic approach based on the Bayesian Information Criterion BIC, defined as BIC = l + k log n, where l is the log-likelihood of the model and k is the number of parameters. The procedure looks for the fitted mixture with a number of components M that minimizes the BIC. This problem can be summarized as the global minimization of a function BIC defined on the naturals number of components. The heuristic procedure starts by fitting mixtures from M = to M = M B, computing their BIC and providing M, the number of components with minimum BIC. Then, in order to ensure that M is a global minimum and not a local one, M N neighbours next to M are explored i.e. fit mixture, compute BIC and update M, if they were not previously explored. This procedure continues until M has at least M N neighbours at each side with larger BICs. A reasonable compromise for M B and M N, checked by simulations, is to set M B = log n and M N =. In order to avoid spurious solutions, fitted mixtures with any κ j > 5 are removed. The procedure is detailed as follows. Algorithm. Mixture estimation with data-driven selection of the number of components. Let X,..., X n be a random sample of a directional variable X with density f. i. Set M B = log n and M N as the user supplies, usually M N =. ii. For M varying from to M B, i. estimate the M-mixture with the EM algorithm of Banerjee et al. 5 and ii. compute the BIC of the fitted mixture. iii. Set M as the number of components of the mixture with lower BIC. iv. If M B M N < M, set M B = M B + and turn back to step ii. Otherwise, end with the final estimation f M. Other informative criteria, such as the Akaike Information Criterion AIC and its corrected version, AICc, were checked in the simulation study together with BIC. The BIC turned out to be the best choice to use with the AMI and EMI selectors, as it yielded the minimum errors.
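A sketch of the BIC search in the algorithm above follows. The EM fit itself is delegated to a placeholder fit_vmf_mixture(data, M), standing in for, e.g., the routine of the movMF R package mentioned in the text, and assumed to return the fitted weights, means, concentrations and log-likelihood. The convention BIC = -2ℓ + k log n, the parameter count k = M(q+2) - 1 and reading the initial range as M_B ≈ log n are our assumptions.

```python
import numpy as np


def fit_vmf_mixture(data, M):
    """Placeholder for an EM fit of an M-mixture of von Mises-Fisher densities;
    must return (weights, mus, kappas, loglik)."""
    raise NotImplementedError


def select_mixture_by_bic(data, max_extra=2):
    """Heuristic BIC search over the number of components M, in the spirit of the
    algorithm above: explore M = 1, ..., M_B and keep extending until the best M
    has `max_extra` explored neighbours with larger BIC on its right."""
    n, p = data.shape                  # p = q + 1
    q = p - 1

    def bic(M):
        _, _, _, loglik = fit_vmf_mixture(data, M)   # hypothetical EM fit
        k = M * (q + 2) - 1                          # means (q each) + kappas + weights
        return -2 * loglik + k * np.log(n)

    m_max = max(2, int(np.ceil(np.log(n))))
    scores = {M: bic(M) for M in range(1, m_max + 1)}
    best = min(scores, key=scores.get)
    while best > m_max - max_extra and m_max < 30:   # hard cap avoids endless search
        m_max += 1
        scores[m_max] = bic(m_max)
        best = min(scores, key=scores.get)
    return best, scores
```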

110 9 Chapter. Bandwidth selectors for directional kernel density estimation.5 Comparative study Along this section, the three new bandwidth selectors will be compared with the already proposed selectors described in Subsection... A collection of directional models, with their corresponding simulation schemes, are considered. Subsection.5. is devoted to comment the directional models used in the simulation study all of them are defined for any arbitrary dimension q, not just for the circular or spherical case. These models are also described in the appendix. For each of the different combinations of dimension, sample size and model, the MISE of each selector was estimated empirically by Monte Carlo samples, with the same seed for the different selectors. This is used in the computation of MISEh MISE, where h MISE is obtained as a numerical minimization of the estimated MISE. The calculus of the ISE was done by: Simpson quadrature rule with discretization points for q = ; Lebedev and Laikov 995 rule with 58 nodes for q = and Monte Carlo integration with sampling points for q > same seed for all the integrations. Finally, the kernel considered in the study is the von Mises..5. Directional models The first models considered are the uniform density in Ω q and the von Mises density given in.. The analogous of the von Mises for axial data i.e., directional data where fx = f x is the Watson distribution Wµ, κ Mardia and Jupp, : { f W x; µ, κ = M q κ exp κx T µ }, where M q κ = ω q eκt t q dt. This density has two antipodal modes: µ and µ, both of them with concentration parameter κ. A further extension of this density is the called Small Circle distribution SCµ, τ, ν Bingham and Mardia, 978: { f SC x; µ, τ, ν = A q τ, ν exp τx T µ ν }, where A q τ, ν = ω q e τt ν t q dt, ν, and τ R. For the case τ, this density has a kind of modal strip along the q -sphere { x Ω q : x T µ = ν }. A common feature of all these densities is that they are rotationally symmetric, that is, their contourlines are q -spheres orthogonal to a particular direction. This characteristic can be exploited by means of the so called tangent-normal decomposition see Mardia and Jupp, that leads to the change of variables { x = tµ + t B µ ξ, ω q dx = t q dt ω q dξ,.9 where µ Ω q is a fixed vector, t = µ T x measures the distance of x from µ, ξ Ω q and B µ = b,..., b q q+ q is the semi-orthonormal matrix B T µb µ = I q and B µ B T µ = I q+ µµ T, with I q the q-identity matrix resulting from the completion of µ to the orthonormal basis {µ, b,..., b q }. The family of rotationally symmetric densities can be parametrized as f gθ,µx = g θ µ T x,.

111 .5. Comparative study 9 where g θ is a function depending on a vector parameter θ Θ R p and such that ω q g θ t t q is a density in,, for all θ Θ. Using this property, it is easy to simulate from.. Algorithm. Sampling from a rotationally symmetric density. Let be the rotationally symmetric density. and consider the notation of.9. i. Sample T from the density ω q g θ t t q. ii. Sample ξ from a uniform in Ω q Ω = {, }. iii. T µ + T B µ ξ is a sample from f gθ,µ. Remark.. Step i can always be performed using the inversion method Johnson, 987. This approach can be computationally expensive: it involves solving the root of the distribution function, which is computed from an integral evaluated numerically if no closed expression is available. A reasonable solution to this for a fixed choice of g θ and µ is to evaluate once the quantile function in a dense grid for example, points equispaced in,, save the grid and use it to interpolate using cubic splines the new evaluations, which is computationally fast. Extending these ideas for rotationally symmetric models, two new directional densities are proposed. The first one is the Directional Cauchy density DCµ, κ, defined as an analogy with the usual Cauchy distribution as f DC x; µ, κ = D q κ + κ x T µ, D qκ = π + κ /, q =, π log + κκ, q =, ω q t q +κ t dt, q >, where µ is the mode direction and κ the concentration parameter around it κ = gives the uniform density. This density shares also some of the characteristics of the usual Cauchy distribution: high concentration around a peaked mode and a power decay of the density. The other proposed density is the Skew Normal Directional density SNDµ, m, σ, λ, f SND x; µ, m, σ, λ = S q m, σ, λg m,σ,λ µ T x, S q m, σ, λ= ω q g m,σ,λ t t q dt, where g m,σ,λ is the skew normal density of Azzalini 985 with location m, scale σ and shape λ that is truncated to the interval,. The density is inspired by the wrapped skew normal distribution of Pewsey 6, although it is based on the rotationally symmetry rather than in wrapping techniques. A particular form of this density is an homogeneous cap in a neighbourhood of µ that decreases very fast outside of it. Non rotationally symmetric densities can be created by mixtures of rotationally symmetric. However, it is interesting to introduce a purely non rotationally symmetric density: the Projected Normal distribution of Pukkila and Rao 988. Denoted by PNµ, Σ, the corresponding density is f PN x; µ, Σ = π p Σ p Q I p Q Q { } exp Q Q Q, where Q = x T Σ x, Q = µ T Σ x, Q = µ T Σ µ and I p α = t p exp { t α } dt.
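The sampling algorithm for rotationally symmetric densities given above can be sketched as follows (Python, our function names): t is drawn by rejection from the marginal density proportional to g(t)(1 - t^2)^{q/2-1}, ξ is drawn uniformly on the (q-1)-sphere orthogonal to µ, and the two pieces are combined through the tangent-normal decomposition. The rejection step assumes a vectorized, bounded target (which holds for q ≥ 2 and bounded g); the grid-based inversion suggested in the remark of the text would be a more robust alternative.

```python
import numpy as np


def sample_rotsym(n, mu, g, q, rng=np.random.default_rng()):
    """Sample from f(x) = g(mu^T x) on Omega_q via x = t*mu + sqrt(1-t^2)*B_mu*xi."""
    mu = mu / np.linalg.norm(mu)
    target = lambda t: g(t) * (1 - t ** 2) ** (q / 2 - 1)   # unnormalized marginal of t
    grid = np.linspace(-0.999, 0.999, 2048)
    bound = 1.1 * target(grid).max()                        # crude envelope constant
    ts = []
    while len(ts) < n:                                      # rejection from Unif(-1, 1)
        t = rng.uniform(-1, 1, size=n)
        keep = rng.uniform(0, bound, size=n) < target(t)
        ts.extend(t[keep].tolist())
    t = np.array(ts[:n])
    # B_mu: orthonormal completion of mu (columns orthogonal to mu), via QR
    B = np.linalg.qr(np.column_stack([mu, rng.standard_normal((q + 1, q))]))[0][:, 1:]
    xi = rng.standard_normal((n, q))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)         # uniform on the (q-1)-sphere
    return t[:, None] * mu + np.sqrt(1 - t ** 2)[:, None] * (xi @ B.T)
```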

112 9 Chapter. Bandwidth selectors for directional kernel density estimation Sampling from this distribution is extremely easy: just sample X N µ, Σ and then project X to Ω q by X/ X. The whole collection of models, with densities in total, are detailed in Table.5 in Appendix.B. Figures. and. show the plots of these densities for the circular and spherical cases..5. Circular case For the circular case, the comparative study has been done for the models described in Figure. see Table.5 to see their densities, for the circular selectors h LCV, h LSCV, h TAY, h OLI, h ROT, h AMI and h EMI and for the sample sizes, 5, 5 and. Due to space limitations, only the results for sample size 5 are shown in Table., and the rest of them are relegated to Appendix.C. In addition, to help summarizing the results a ranking similar to Ranking B of Cao et al. 99 will be constructed. The ranking will be computed according to the following criteria: for each model, the m bandwidth selectors h,..., h m considered are sorted from the best performance lowest error to the worst performance largest error. The best bandwidth receives m points, the second m and so on. These points, denoted by r, are standardized by m and multiplied by the relative performance of each selector compared with the best one. In other words, the points of the selector h k, if h opt is the best one, are r k MISEh opt m MISEh k. The final score for each selector is the sum of the ranks obtained in all the twenty models thus, a selector which is the best in all models will have points. With this ranking, it is easy to group the results in a single and easy to read table. In view of the results, the following conclusions can be extracted. Firstly, h ROT performs well in certain unimodal models such as M von Mises and M6 skew normal directional, but its performance is very poor with multimodal models like M5 Watson. In its particular comparison with h TAY, it can be observed that both selectors share the same order of error, but being h ROT better in all the situations except for one: the uniform model M. This is due to the extra term commented in Section.: its absence in the denominator makes that h TAY faster than h ROT when the concentration parameter κ and, what is a disadvantage for κ >, turns out in an advantage for the uniform case. With respect to h AMI and h EMI, although their performance becomes more similar when the sample size increases, something expected, h EMI seems to be on average a step ahead from h AMI, specially for low sample sizes. Among the cross-validated selectors, h LCV performs better than h LSCV, a fact that was previously noted by simulation studies carried out by Taylor 8 and Oliveira et al.. Finally, h OLI presents the most competitive behaviour among the previous proposals in the literature when the sample size is reasonably large see Table.. The comparison between the circular selectors is summarized in the scores of Table.. For all the sample sizes considered, h EMI is the most competitive selector, followed by h AMI for all the sample sizes except n =, where h LCV is the second. The effect of the sample effect is also interesting to comment. For n =, h LCV and h ROT perform surprisingly well, in contrast with h OLI, which is the second worst selector for this case. When the sample size increases, h ROT and h TAY have a decreasing performance and h OLI stretches differences with h AMI, showing a similar behaviour. This was something expected as both selectors are based on error criteria

113 .5. Comparative study 9 that are asymptotically equivalent. The cross-validated selectors show a stable performance for sample sizes larger than n =. Figure.: Simulation scenarios for the circular case. From left to right and up to down, models M to M. For each model, a sample of size 5 is drawn.

114 9 Chapter. Bandwidth selectors for directional kernel density estimation Model h MISE h LCV h LSCV h TAY h OLI h ROT h AMI h EMI M M M M M M M M M M M M M M M M M M M M Table.: Comparative study for the circular case, with sample size n = 5. Columns of the selector represent the MISE, with bold type for the minimum of the errors. The standard deviation of the ISE is given between parentheses. q n h LCV h LSCV h TAY h OLI h ROT h AMI h EMI Table.: Ranking for the selectors for the circular and spherical cases, for sample sizes n =, 5, 5,. The higher the score in the ranking, the better the performance of the selector. Bold type indicates the best selector..5. Spherical case The comparative study for the spherical case has been done for the directional selectors h LSCV, h ROT, h AMI and h EMI, in the models given in Figure.. As in the previous case, Table. contains the scores of the selectors for the different sample sizes, Table. includes the detailed results for n = 5 and the rest of the sample sizes are shown in Appendix.C. In this case the results are even more clear. The h EMI selector is by far the best, with an important gap between its competitors for all the sample sizes considered. Further, the effect of computing the exact error instead of the asymptotic one can be appreciated: h AMI only is competitive against the cross-validated selectors for sample sizes larger than n = 5, while h EMI remains always the most competitive. In addition, the performance of h AMI seems to decrease
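The ranking scores reported above can be reproduced as follows under one reading of the scoring rule described in the text (points m, m-1, ..., 1 from best to worst per model, divided by m and multiplied by MISE(h_opt)/MISE(h_k), then summed over models); ties are not handled specially and the function name is ours.

```python
import numpy as np


def ranking_scores(mise):
    """Ranking of selectors from an (n_models, n_selectors) array of estimated MISEs."""
    n_models, m = mise.shape
    scores = np.zeros(m)
    for row in mise:
        order = np.argsort(row)                 # best (lowest MISE) first
        points = np.empty(m)
        points[order] = np.arange(m, 0, -1)     # m points to the best, ..., 1 to the worst
        scores += (points / m) * (row.min() / row)
    return scores
```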

115 .5. Comparative study 95 due to the effect of the dimension in the asymptotic error and does not converge so quick as in the circular case to the performance of h EMI. Figure.: Simulation scenarios for the spherical case. From left to right and up to down, models M to M. For each model, a sample of size 5 is drawn. An interesting fact is that h LSCV performs better than h LCV, contrarily to what happens in the circular case. This phenomena is strengthen with higher dimensions, as it can be seen in the next subsection. A possible explanation is the following. For the standard linear case, LCV

116 96 Chapter. Bandwidth selectors for directional kernel density estimation has been proved to be a bad selector in densities with heavy tails see Cao et al. 99 that are likely to produce outliers. In the circular case, the compact support jointly with periodicity may mitigate this situation, something that does not hold when the dimension increases and the sparsity of the observations is more likely. This makes that among the cross-validated selectors h LCV works better for q = and h LSCV for q >. Model h MISE h LCV h LSCV h ROT h AMI h EMI M M M M M M M M M M M M M M M M M M M M Table.: Comparative study for the spherical case, with sample size n = 5. Columns of the selector represent the MISE, with bold type for the minimum of the errors. The standard deviation of the ISE is given between parentheses..5. The effect of dimension Finally, the previous selectors are tested in higher dimensions. Table. summarizes the information for dimensions q =,, 5 and sample size n = see Table.8 in Appendix.C for whole results. As it can be seen, h EMI continues performing better than its competitors. Also, as previously commented in the spherical case, h AMI has a lower performance due to the misfit between AMISE and MISE, which gets worse when the sample size is fixed and the dimension increases. h LSCV arises as the second best selector for higher dimensions, outperforming h LCV, as happened in the spherical case. q h LCV h LSCV h ROT h AMI h EMI Table.: Ranking for the selectors for dimensions q =,, 5 and sample size n =. The larger the score in the ranking, the better the performance of the selector. Bold type indicates the best selector.
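The ISE values behind the tables were approximated numerically; for the circular case the study uses a Simpson rule over a fine grid of angles. A rough stand-in for that computation (trapezoidal rather than Simpson, our function names) is the following sketch.

```python
import numpy as np


def circular_ise(fhat, f, n_grid=512):
    """ISE on the circle (q = 1): integrate (fhat - f)^2 over theta in [0, 2*pi),
    evaluating both densities at x = (cos theta, sin theta)."""
    theta = np.linspace(0, 2 * np.pi, n_grid, endpoint=False)
    x = np.column_stack([np.cos(theta), np.sin(theta)])
    diff2 = (fhat(x) - f(x)) ** 2
    return np.trapz(np.append(diff2, diff2[0]), dx=2 * np.pi / n_grid)  # wraparound point
```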

117 .6. Data application 97.6 Data application According with the comparative study of the previous section, the h EMI selector poses in average the best performance of all the considered selectors. In this section it will be applied to estimate the density of two real datasets..6. Wind direction Wind direction is a typical example of circular data. The data of this illustration was recorded in the meteorological station of A Mourela W, N, located near the coal power plant of As Pontes, in the northwest of Spain. The wind direction has a big impact on the dispersion of the pollutants from the coal power plant and a reliable estimation of its unknown density is useful for a further study of the pollutants transportation. The wind direction was measured minutely at the top of a pole of 8 metres during the month of June,. In order to mitigate serial dependence, the data has been hourly averaged by computing the circular mean, resulting in a sample of size 67. The resulting bandwidth is h EMI =.896, obtained from the data-driven mixture of von Mises. Left plot of Figure. represents the estimated density, which shows a clear predominance of the winds from the west and three main modes. Running time, measured in a.5 GHz core, is. seconds.89 for the mixtures fitting and. for bandwidth optimization. Density.... E N W S E Wind direction Figure.: Left: density of the wind direction in the meteorological station of A Mourela. Right: density of the stars collected in the Hipparcos catalogue, represented in galactic coordinates and Aitoff projection..6. Position of stars A challenging field where spherical data is present is astronomy. Usually, the position of stars is referred to the position that occupy in the celestial sphere, i.e., the location in the earth surface that arises as the intersection with the imaginary line that joins the centre of the earth with the star. A massive enumeration of near stars is given in the Hipparcos catalogue Perryman, 997, that collects the findings of the Hipparcos mission carried out by the European Space Agency in An improved version of the original dataset, available from van Leeuwen

118 98 Chapter. Bandwidth selectors for directional kernel density estimation 7, contains a corrected collection of the position of the stars on the celestial sphere as well as other star variables. For many years, most of the statistical tools used to describe this kind of data were histograms adapted to the spherical case, where the choice of the bin width was done manually see page 8 of Perryman 997. In this illustration, a smooth estimation of the spherical density is given using the optimal smoothing of the h EMI selector. Using the 7955 star positions from the dataset of van Leeuwen 7, the underlying density is approximated with components automatically obtained, resulting the bandwidth h EMI =.6. Note that the analysis of such a large dataset by cross-validatory techniques would demand an enormous amount of computing time and memory resources, whereas the running time for h EMI is reasonable, with 56. seconds 7. for the mixtures fitting and 8.67 for bandwidth optimization. The right plot of Figure. shows the density of the position of the measured stars. This plot is given in the Aitoff projection see Perryman 997 and in galactic coordinates, which means that the equator represents the position of the galactic rotation plane. The higher concentrations of stars are located around two spots, that represent the Orion s arm left and the Gould s Belt right of our galaxy..7 Conclusions Three new bandwidth selectors for directional data are proposed. The rule of thumb extends and improves significantly the previous proposal of Taylor 8, but also fails estimating densities with multimodality. On the other hand, the selectors based on mixtures are competitive with the previous proposals in the literature, being the EMI selector the most competitive on average among all, for different sample sizes and dimensions, but specially for low or moderate sample sizes. The performance of AMI selector is one step behind EMI, a difference that is reduced when sample size increases. In the comparison study, new rotationally symmetric models have been introduced and other interesting conclusions have been obtained. First, LCV is also a competitive selector for the circular case and outperforms LSCV, something that was known in the literature of circular data. However, this situation is reversed for the spherical case and higher dimensions, where LSCV is competitive and performs better than LCV. The final conclusion of this paper is simple: the EMI bandwidth selector presents a reliable choice for kernel density estimation with directional data and its performance is at least as competitive as the existing proposals until the moment. Acknowledgements The author gratefully acknowledges the comments and guidance of professors Rosa M. Crujeiras and Wenceslao González-Manteiga. The work of the author has been supported by FPU grant AP 957 from the Spanish Ministry of Education. Support of Project MTM8, from the Spanish Ministry of Science and Innovation, Project MDS75PR from Dirección Xeral de I+D, Xunta de Galicia and IAP network StUDyS, from Belgian Science Policy, are acknowledged. The author acknowledges the helpful comments by the Editor and an Associate Editor.

.A Proofs

Proof of Proposition .. By simple differentiation, the operator (.) applied to a von Mises density vM(µ, κ) gives an explicit expression for Ψ(f_vM(·; µ, κ), x), of the form κ C_q(κ) e^{κ x^T µ} times a polynomial in x^T µ. Then, by the change of variables of (.9),

R(Ψ(f_vM(·; µ, κ), ·)) = \int_{Ω_q} Ψ(f_vM(·; µ, κ), x)^2 ω_q(dx),

which reduces to a one-dimensional integral in t = x^T µ over (-1, 1), weighted by (1 - t^2)^{q/2 - 1}. This integral can be divided into three terms by expanding the square. After two integrations by parts, the sum of the first two terms collapses into a single integral, and this integral, together with the last term, follows immediately from the integral form of the modified Bessel function, yielding a closed expression for R(Ψ(f_vM(·; µ, κ), ·)) in terms of modified Bessel functions.

The particular case q = 2 follows by using I_{1/2}(z) = (2/(πz))^{1/2} sinh(z), I_{-1/2}(z) = (2/(πz))^{1/2} cosh(z) and the relations I_{ν-1}(z) = I_{ν+1}(z) + (2ν/z) I_ν(z) and I_{ν+1}(z) = I_{ν-1}(z) - (2ν/z) I_ν(z). Also, for the von Mises kernel L(r) = e^{-r}, it is easy to see that the constants λ_q(L), b_q(L) and d_q(L) admit simple closed forms.

.B Models for the simulation study

Table .5 collects the densities of the different models used in the simulation study. Apart from the notation introduced in Section .5 for the families of directional densities, the following terminology is used. First, 0_q represents a vector of q zeros. Second, the functions ρ_1 and ρ_2 give the polar and spherical parametrizations of a vector from a single angle and from a pair of angles, respectively:

ρ_1(θ) = (cos θ, sin θ),   ρ_2(θ, φ) = (cos θ sin φ, sin θ sin φ, cos φ),   θ ∈ [0, 2π), φ ∈ [0, π].

Thirdly, the notation #i, for an index i varying in an ordered set S, represents the position of i in S. Finally, the matrix Σ is diagonal, with its first three diagonal entries differing from the remaining ones, and the matrix Σ̄ is just like Σ but with the diagonal reversed.
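For reference, the two parametrizations ρ_1 and ρ_2 above translate directly into code. The following Python sketch simply evaluates them; the function names are illustrative.

import numpy as np

def rho1(theta):
    # polar parametrization of a point on the circle
    return np.array([np.cos(theta), np.sin(theta)])

def rho2(theta, phi):
    # spherical parametrization of a point on the sphere
    return np.array([np.cos(theta) * np.sin(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(phi)])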

.C Extended tables for the simulation study

Tables .6 and .7 show the results for the remaining sample sizes for the circular and spherical cases, respectively. The case n = 5 is collected in Tables . and .. Finally, Table .8 contains the simulation results for a single sample size and dimensions q = , , 5.

Table .5: Directional densities considered in the simulation study: uniform, von Mises, projected normal (rotationally symmetric and non rotationally symmetric), directional Cauchy, skew normal directional, Watson, several mixtures of von Mises, projected normal, directional Cauchy and uniform components, trimodal, small circle, double small circle, spiral, claw, double spiral and windmill models.

Table .6: Comparative study for the circular case, comparing the selectors h_MISE, h_LCV, h_LSCV, h_TAY, h_OLI, h_ROT, h_AMI and h_EMI, with top-to-bottom blocks corresponding to increasing sample sizes. Each selector column reports the MISE, with bold type for the minimum of the errors; the standard deviation of the ISE is given between parentheses.

Table .7: Comparative study for the spherical case, comparing the selectors h_MISE, h_LCV, h_LSCV, h_ROT, h_AMI and h_EMI, with top-to-bottom blocks corresponding to increasing sample sizes. Each selector column reports the MISE, with bold type for the minimum of the errors; the standard deviation of the ISE is given between parentheses.

Table .8: Comparative study for higher dimensions with a fixed sample size: top-to-bottom blocks correspond to dimensions q = , , 5, and the columns compare the selectors h_MISE, h_LCV, h_LSCV, h_ROT, h_AMI and h_EMI. Each selector column reports the MISE, with bold type for the minimum of the errors; the standard deviation of the ISE is given between parentheses.

124 Chapter. Bandwidth selectors for directional kernel density estimation References Azzalini, A A class of distributions which includes the normal ones. Scand. J. Statist., :7 78. Bai, Z. D., Rao, C. R., and Zhao, L. C Kernel estimators of density function of directional data. J. Multivariate Anal., 7: 9. Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. 5. Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res., 6:5 8. Bingham, C. and Mardia, K. V A small circle distribution on the sphere. Biometrika, 65: Cabella, P. and Marinucci, D. 9. Statistical challenges in the analysis of cosmic microwave background radiation. Ann. Appl. Stat., :6 95. Cao, R., Cuevas, A., and Gonzalez Manteiga, W. 99. A comparative study of several smoothing methods in density estimation. Comput. Statist. Data Anal., 7:5 76. Chacón, J. E. and Duong, T.. Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting. Electron. J. Stat., 7:99 5. Chiu, S.-T A comparative review of bandwidth selection for kernel density estimation. Statist. Sinica, 6:9 5. Ćwik, J. and Koronacki, J A combined adaptive-mixtures/plug-in estimator of multivariate probability densities. Comput. Statist. Data Anal., 6:99 8. Di Marzio, M., Panzera, A., and Taylor, C. C. 9. Local polynomial regression for circular predictors. Statist. Probab. Lett., 799: Di Marzio, M., Panzera, A., and Taylor, C. C.. Kernel density estimation on the torus. J. Statist. Plann. Inference, 6:56 7. Durastanti, C., Lan, X., and Marinucci, D.. Needlet-Whittle estimates on the unit sphere. Electron. J. Stat., 7: Fernández-Durán, J. J.. Circular distributions based on nonnegative trigonometric sums. Biometrics, 6:99 5. Fernández-Durán, J. J. 7. Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums. Biometrics, 6: Fernández-Durán, J. J. and Gregorio-Domínguez, M. M.. Maximum likelihood estimation of nonnegative trigonometric sum models using a Newton-like algorithm on manifolds. Electron. J. Stat., :. García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. a. Exploring wind direction and SO concentration by circular-linear density estimation. Stoch. Environ. Res. Risk Assess., 75:55 67.

125 References 5 García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. b. Kernel density estimation for directional-linear data. J. Multivariate Anal., :5 75. Hall, P., Watson, G. S., and Cabrera, J Kernel density estimation with spherical data. Biometrika, 7: Hornik, K. and Grün, B.. movmf: an R package for fitting mixtures of von Mises-Fisher distributions. J. Stat. Softw., 58:. Horová, I., Koláček, J., and Vopatová, K.. Full bandwidth matrix selectors for gradient kernel density estimate. Comput. Statist. Data Anal., 57:6 76. Jammalamadaka, S. R. and Lund, U. J. 6. The effect of wind direction on ozone levels: a case study. Environ. Ecol. Stat., : Johnson, M. E Multivariate statistical simulation. Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics. John Wiley & Sons, New York. Jones, C., Marron, J. S., and Sheather, S. J Progress in data-based bandwidth selection for kernel density estimation. Computation. Stat., :7 8. Jupp, P. E. and Mardia, K. V A unified view of the theory of directional statistics, Int. Stat. Rev., 57:6 9. Klemelä, J.. Estimation of densities and derivatives of densities with directional data. J. Multivariate Anal., 7:8. Lebedev, V. I. and Laikov, D. N A quadrature formula for the sphere of the st algebraic order of accuracy. Dokl. Math., 59:77 8. Mardia, K. V. and Jupp, P. E.. Directional statistics. Wiley Series in Probability and Statistics. John Wiley & Sons, Chichester, second edition. Marron, J. S. and Wand, M. P. 99. Exact mean integrated squared error. Ann. Statist., :7 76. Oliveira, M., Crujeiras, R. M., and Rodríguez-Casal, A.. A plug-in rule for bandwidth selection in circular density estimation. Comput. Statist. Data Anal., 56: Parzen, E. 96. On estimation of a probability density function and mode. Ann. Math. Statist., : Perryman, M. A. C The Hipparcos and Tycho catalogues, volume of ESA SP. ESA Publication Division, Noordwijk. Pewsey, A. 6. Modelling asymmetrically distributed circular data using the wrapped skewnormal distribution. Environ. Ecol. Stat., : Pukkila, T. M. and Rao, C. R Pattern recognition based on scale invariant discriminant functions. Inform. Sci., 5: Rosenblatt, M Remarks on some nonparametric estimates of a density function. Ann. Math. Statist., 7:8 87.

126 6 Chapter. Bandwidth selectors for directional kernel density estimation Scott, D. W. 99. Multivariate density estimation. Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics. John Wiley & Sons, New York. Silverman, B. W Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London. Taylor, C. C. 8. Automatic bandwidth selection for circular density estimation. Comput. Statist. Data Anal., 57:9 5. van Leeuwen, F. 7. Hipparcos, the new reduction of the raw data, volume 5 of Astrophysics and Space Science Library. Springer, Dordrecht. Wand, M. P. and Jones, M. C Kernel smoothing, volume 6 of Monographs on Statistics and Applied Probability. Chapman & Hall, London. Watson, G. S. 98. Statistics on spheres, volume 6 of University of Arkansas Lecture Notes in the Mathematical Sciences. John Wiley & Sons, New York.

Chapter 5

A nonparametric test for directional-linear independence

Abstract

A nonparametric test for assessing the independence between a directional random variable (circular or spherical, as particular cases) and a linear one is proposed in this paper. The statistic is based on the squared distance between nonparametric kernel density estimates, and its calibration is done by a permutation approach. The size and power characteristics of various variants of the test are investigated and compared with those of classical correlation-based tests of independence in an extensive simulation study. Finally, the best-performing variant of the new test is applied in the analysis of the relation between the orientation and size of Portuguese wildfires.

Reference

García-Portugués, E., Barros, A. M. G., Crujeiras, R. M., González-Manteiga, W. and Pereira, J. M. C.. A test for directional-linear independence, with applications to wildfire orientation and size. Stoch. Environ. Res. Risk Assess., 85:6 75.

Contents

5. Introduction
5. Background to kernel density estimation
   Linear kernel density estimation
   Directional kernel density estimation
   Directional-linear kernel density estimation
5. A test for directional-linear independence
   The test statistic
   Calibration of the test
   Simulation study
5. Real data analysis
   Data description
   Results
5.5 Discussion
5.A Proof of Lemma 5.
References

5. Introduction

Characterization of wildfire orientation patterns at landscape scale has important management implications (Moreira et al.; Lloret et al.; Moreira et al.). It has been shown that landscape fuel reduction treatments will only be successful if strategically placed in order to intersect fire spread in the heading direction (Finney; Schmidt et al., 2008). Barros et al. assessed the existence of preferential fire perimeter orientation at the watershed level, to support the spatial layout of fuelbreak networks. Their analysis identified clusters of watersheds where fire perimeters were preferentially aligned along the NE/SW and the SE/NW axes. Those watersheds included fire perimeters that together account for roughly 65% of the overall burnt area in Portugal over the period from 1975 to 2005, while in the remaining watersheds fire perimeters were randomly aligned. In Figure 5., some descriptive maps of the data of interest are displayed. The left plot shows the total area burnt in each watershed, whereas the middle plot represents the mean slope of the fires in each region. Finally, the right plot indicates which watersheds exhibit a preferred fire orientation, versus a random orientation, according to Barros et al.. The authors argued that the spatial patterns of fire perimeter orientation found in the dataset could be explained by dominant weather during the Portuguese fire season (Pereira et al., 2005). However, given that the fire perimeter orientation analysis is event-based (i.e., it is based on the orientation of each fire event), all perimeters are treated equally, independently of their size. In this paper, a test for assessing independence between wildfire size and orientation is presented, complementing the work of Barros et al.. Furthermore, the orientation of the wildfire will be considered both in two-dimensional and in three-dimensional space.

The spatial characterization of a wildfire, by means of its main orientation and the associated burnt area, must be handled by non-standard statistical approaches, given the special nature of fire orientation. Specifically, it can be measured as an angle in the plane (two-dimensional orientation) or as a pair of angles identifying a direction on the three-dimensional sphere, if the main slope of the wildfire is taken into account. Hence, appropriate methods for handling circular and, more generally, directional data must be considered, jointly with suitable combinations of directional and linear techniques.

The analysis of the relation between directional and linear variables has been classically approached through the construction of circular-linear correlation coefficients. The adaptation of the classical linear correlation coefficient to the circular-linear setting was introduced by Mardia (1976) and Johnson and Wehrly (1977) and further studied by Liddell and Ord (1978), who obtained its exact distribution under certain parametric assumptions. For the circular-linear case, a rank-based test of association was also proposed by Mardia (1976), who derived its asymptotic distribution. Later, Fisher and Lee (1981) adapted Kendall's τ as a measure of circular-linear association based on the notion of concordance on the cylinder. To the best of the authors' knowledge, these three tests are the only ones available for testing independence between directional and linear variables.
As they are based on correlation coefficients, these tests are only powerful against deviations in the conditional expectation that can be measured by the corresponding coefficient. As a consequence, none of these tests is able to capture all possible types of dependence, whether in the conditional expectation or of more complex forms.
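To fix ideas, a minimal sketch of the first of these classical coefficients is given below. It computes one common form of the squared circular-linear correlation of Mardia (1976) and Johnson and Wehrly (1977), built from the ordinary correlations of (cos Θ, sin Θ) with Z; the exact expression is not reproduced in the chapter, so the formula used here is an assumption to be checked against those references.

import numpy as np

def circ_lin_corr2(theta, z):
    # squared circular-linear correlation between an angle theta (radians)
    # and a linear variable z, assembled from ordinary correlation coefficients
    c, s = np.cos(theta), np.sin(theta)
    r_cz = np.corrcoef(c, z)[0, 1]
    r_sz = np.corrcoef(s, z)[0, 1]
    r_cs = np.corrcoef(c, s)[0, 1]
    return (r_cz**2 + r_sz**2 - 2 * r_cz * r_sz * r_cs) / (1 - r_cs**2)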

Figure 5.: Descriptive maps of wildfires in Portugal with the watersheds delineated by Barros et al.. The left map shows the number of hectares burnt from fire perimeters associated with each watershed. Each fire perimeter is associated with the watershed that contains its centroid. The centre map represents the mean slope of the fires of each watershed, where the slope is measured in degrees (0 stands for a flat slope and 90 for a vertical one). Finally, the right map shows the watersheds where fires display preferential alignment according to Barros et al..

From a different perspective, circular and linear variables can also be jointly modeled through the construction of circular-linear distributions. Johnson and Wehrly (1978) introduced a method for deriving circular-linear densities with specified marginals. A new family of circular-linear distributions based on nonnegative trigonometric sums, which proved to be more flexible in capturing the data structure, was proposed by Fernández-Durán (2007), adapting the method of Johnson and Wehrly (1978). More recently, García-Portugués et al. (2013a) exploited the copula representation of the Johnson and Wehrly (1978) family, allowing for a completely nonparametric estimator, which was applied to analyze SO2 concentrations and wind direction. Nevertheless, the aforementioned methods are designed for the circular-linear case, whereas in our context a more general tool for handling directional-linear relations is needed, provided that wildfire orientation may be reported in two or three dimensions.

In this paper, the assessment of the relation between a directional (circular or spherical, as particular cases) and a linear variable is approached through the construction of a formal test of directional-linear independence. Inspired by the ideas of Rosenblatt (1975) and Rosenblatt and Wahlen (1992) for the linear setting (see also Ahmad and Li, 1997), the proposed test statistic is based on a nonparametric directional-linear kernel density estimator, and an L2 distance is taken as the discrepancy measure between the joint estimator and the one constructed under the independence hypothesis. The new test presents some interesting advantages: it is designed in a general fashion for directional variables of all dimensions and it is able to capture all kinds of deviations from independence, by virtue of the nonparametric density estimation. Besides, one gets a kernel density estimate as a spin-off, which provides further information about the form of dependence when independence is rejected.

The remainder of the paper is organized as follows. In Section 5., some background on kernel density estimation for linear, directional and directional-linear data is presented. Section 5. is devoted to the test statistic, introducing a simplified version of the test and describing its practical application in detail. The finite sample performance of the test, in terms of size and power, is assessed through a simulation study for circular-linear and spherical-linear variables. Application to real data is provided in Section 5., including data description and results, focusing on the assessment of independence between wildfire orientation and burnt area size in Portugal. Some discussion and final comments are given in Section 5.5.

5. Background to kernel density estimation

In the linear setting, the basic building block for the independence test introduced by Rosenblatt (1975) is a kernel density estimator. Independence between two linear random variables is assessed through an L2 distance between a bidimensional kernel density estimator and the product of the marginal kernel density estimators. In order to extend such a procedure to the directional-linear case, kernel density estimation for linear, directional and directional-linear variables is required. A brief background on kernel density estimators is provided in this section.

5.. Linear kernel density estimation

The well-known kernel density estimator for linear data was introduced by Rosenblatt (1956) and Parzen (1962). Given a random sample Z_1, ..., Z_n from a linear random variable Z (i.e., with support supp(Z) ⊆ R) with density f, the kernel density estimator at a point z ∈ R is defined as

\hat{f}_g(z) = \frac{1}{ng} \sum_{i=1}^{n} K\left( \frac{z - Z_i}{g} \right),

where K is a kernel function, usually a symmetric density about the origin, and g > 0 is the smoothing or bandwidth parameter, which controls the roughness of the estimator. Properties of this estimator have been deeply studied (see Silverman (1986) or Wand and Jones (1995) for comprehensive reviews). It is also well known that the choice of kernel (normal, Epanechnikov, etc.) has little effect on the overall shape of the kernel density estimate. However, the bandwidth is a key tuning parameter: large values produce oversmoothed estimates of f, whereas small values provide undersmoothed curves. Comprehensive reviews on bandwidth selection are given in Cao et al. (1994), Chiu (1996) and Jones et al. (1996), among others.

5.. Directional kernel density estimation

Denote by X a directional random variable with density f. The support of such a variable is the q-dimensional sphere, namely Ω_q = { x ∈ R^{q+1} : x_1^2 + ··· + x_{q+1}^2 = 1 }, endowed with the Lebesgue measure in Ω_q, which will be denoted by ω_q. Therefore, a directional density is a nonnegative function that satisfies \int_{Ω_q} f(x) ω_q(dx) = 1. The directional kernel density estimator was introduced by Hall et al. (1987) and Bai et al. (1988). Given a random sample X_1, ..., X_n of a directional variable X with supp(X) ⊆ Ω_q and density f,

the directional kernel density estimator at a point x ∈ Ω_q is given by

\hat{f}_h(x) = \frac{c_{h,q}(L)}{n} \sum_{i=1}^{n} L\left( \frac{1 - x^T X_i}{h^2} \right),   (5.)

where L is the directional kernel, h > 0 is the bandwidth parameter and c_{h,q}(L) is a normalizing constant depending on the kernel L, the bandwidth h and the sphere dimension q. The scalar product of two vectors x and y is denoted by x^T y, where ^T denotes the transpose operator.

A common choice for the directional kernel is L(r) = e^{-r}, r ≥ 0, also known as the von Mises kernel due to its relation with the von Mises-Fisher distribution (see Watson, 1983). On the q-dimensional sphere, the von Mises density vM(µ, κ) is given by

f_vM(x; µ, κ) = C_q(κ) exp{ κ x^T µ },   C_q(κ) = \frac{κ^{(q-1)/2}}{(2π)^{(q+1)/2} I_{(q-1)/2}(κ)},   (5.)

where µ ∈ Ω_q is the mean direction, κ is the concentration parameter around the mean and I_ν is the modified Bessel function of order ν,

I_ν(z) = \frac{(z/2)^ν}{π^{1/2} Γ(ν + 1/2)} \int_{-1}^{1} (1 - t^2)^{ν - 1/2} e^{zt} dt.

For the von Mises kernel, the value of c_{h,q}(L) is C_q(1/h^2) e^{1/h^2} and the directional estimator (5.) can be interpreted as a mixture of von Mises-Fisher densities:

\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} f_vM(x; X_i, 1/h^2).

Note that large values of h provide a small concentration parameter, which results in a uniform model on the sphere, whereas small values of h give high concentrations around the sample observations, providing an undersmoothed curve. Cross-validation rules based on Likelihood Cross Validation (LCV) and Least Squares Cross Validation (LSCV) for bandwidth selection were discussed by Hall et al. (1987).

5.. Directional-linear kernel density estimation

Consider a directional-linear random variable (X, Z) with support supp(X, Z) ⊆ Ω_q × R and joint density f. For the simple case of circular data (q = 1), the support of the variable is the cylinder and, in general, the support is a multidimensional cylinder. Following the ideas of the previous sections for the linear and directional cases, given a random sample (X_1, Z_1), ..., (X_n, Z_n), the directional-linear kernel density estimator at a point (x, z) ∈ Ω_q × R can be defined as

\hat{f}_{h,g}(x, z) = \frac{c_{h,q}(L)}{ng} \sum_{i=1}^{n} LK\left( \frac{1 - x^T X_i}{h^2}, \frac{z - Z_i}{g} \right),   (5.)

where LK is a directional-linear kernel, g > 0 is the linear bandwidth parameter, h > 0 is the directional bandwidth and c_{h,q}(L) is the directional normalizing constant. The estimator (5.) was introduced by García-Portugués et al. (2013b), who also studied its asymptotic properties in terms of bias and variance, and established its asymptotic normality.
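As an illustration of the mixture-of-von-Mises form of the directional estimator above, the following Python sketch evaluates it at a set of points. It assumes the standard von Mises-Fisher normalizing constant C_q(κ) = κ^{(q-1)/2} / ((2π)^{(q+1)/2} I_{(q-1)/2}(κ)) and is not the code used in the chapter.

import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_nu

def log_Cq(kappa, q):
    # log of the von Mises-Fisher normalizing constant on the q-sphere
    nu = (q - 1) / 2
    # ive(nu, k) = I_nu(k) * exp(-k), which keeps the computation stable for large kappa
    return nu * np.log(kappa) - (q + 1) / 2 * np.log(2 * np.pi) - (np.log(ive(nu, kappa)) + kappa)

def directional_kde(x, X, h):
    # von Mises kernel density estimate at the rows of x (m x (q+1)) from data X (n x (q+1))
    q = X.shape[1] - 1
    kappa = 1 / h**2
    log_dens = log_Cq(kappa, q) + kappa * (x @ X.T)   # m x n matrix of log vM densities
    return np.exp(log_dens).mean(axis=1)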

A product kernel LK(·, ·) = L(·) K(·), specifically the von Mises-normal kernel LK(r, t) = e^{-r} φ(t), r ≥ 0, t ∈ R, will be considered throughout this paper in order to simplify computations, where φ_σ denotes the density of a normal with zero mean and standard deviation σ. Similarly to the linear and directional kernel density estimators, a smoothing parameter (bidimensional, in this case) is involved in the construction of the estimator. The cross-validation procedures introduced by Hall et al. (1987) can be adapted to the directional-linear setting, yielding the following bandwidth selectors:

(h, g)_{LCV} = \arg\max_{h,g>0} \sum_{i=1}^{n} \log \hat{f}_{h,g}^{-i}(X_i, Z_i),

(h, g)_{LSCV} = \arg\max_{h,g>0} \left[ \frac{2}{n} \sum_{i=1}^{n} \hat{f}_{h,g}^{-i}(X_i, Z_i) - \int_{Ω_q} \int_{R} \hat{f}_{h,g}(x, z)^2 \, dz \, ω_q(dx) \right],

where \hat{f}_{h,g}^{-i} represents the kernel density estimator computed without the i-th datum.

5. A test for directional-linear independence

The new test statistic for assessing independence between a directional and a linear variable is described in this section.

5.. The test statistic

Consider the joint directional-linear density f_{X,Z} for the variable (X, Z), and let f_X and f_Z denote the directional and linear marginal densities, respectively. The null hypothesis of independence between both components can be stated as

H_0 : f_{X,Z}(x, z) = f_X(x) f_Z(z), for all (x, z) ∈ Ω_q × R,

and the alternative hypothesis as

H_a : f_{X,Z}(x, z) ≠ f_X(x) f_Z(z), for some (x, z) ∈ Ω_q × R.

Following the idea of Rosenblatt (1975), a natural statistic to test H_0 arises from considering the L2 distance between the nonparametric estimate of the joint density f_{X,Z} given by the directional-linear kernel estimator (5.), denoted by \hat{f}_{X,Z;h,g}, and the nonparametric estimate of f_{X,Z} under H_0, given by the product of the marginal directional and linear kernel estimators, denoted by \hat{f}_{X;h} and \hat{f}_{Z;g}, respectively. We therefore propose the following test statistic:

T_n = \int_{Ω_q} \int_{R} \left( \hat{f}_{X,Z;h,g}(x, z) - \hat{f}_{X;h}(x) \hat{f}_{Z;g}(z) \right)^2 dz \, ω_q(dx),   (5.)

that is, the squared L2 distance in Ω_q × R between the joint estimator and the product of the marginal estimators. The test statistic depends on a pair of bandwidths (h, g), which is used for the directional-linear estimator and whose components are also considered for the marginal directional and linear kernel density estimators.
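A compact numerical sketch of the von Mises-normal estimator and of the LCV criterion above is given next. The helper log_Cq assumes the standard von Mises-Fisher normalizing constant, and all function names and argument conventions are illustrative.

import numpy as np
from scipy.special import ive
from scipy.stats import norm

def log_Cq(kappa, q):
    # log von Mises-Fisher normalizing constant on the q-sphere (standard form assumed)
    nu = (q - 1) / 2
    return nu * np.log(kappa) - (q + 1) / 2 * np.log(2 * np.pi) - (np.log(ive(nu, kappa)) + kappa)

def dirlin_kde(x, z, X, Z, h, g):
    # directional-linear estimate with the von Mises-normal product kernel:
    # average over i of f_vM(x; X_i, 1/h^2) * phi_g(z - Z_i); x is m x (q+1), z is length m
    q = X.shape[1] - 1
    kappa = 1 / h**2
    w_dir = np.exp(log_Cq(kappa, q) + kappa * (x @ X.T))    # m x n
    w_lin = norm.pdf(z[:, None] - Z[None, :], scale=g)      # m x n
    return (w_dir * w_lin).mean(axis=1)

def lcv_score(X, Z, h, g):
    # leave-one-out log-likelihood maximized by the (h, g)_LCV selector
    n = len(Z)
    q = X.shape[1] - 1
    kappa = 1 / h**2
    W = np.exp(log_Cq(kappa, q) + kappa * (X @ X.T)) * norm.pdf(Z[:, None] - Z[None, :], scale=g)
    np.fill_diagonal(W, 0.0)
    return np.sum(np.log(W.sum(axis=1) / (n - 1)))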

Under the null hypothesis of independence, H_0, it holds that E[\hat{f}_{X,Z;h,g}(x, z)] = E[\hat{f}_{X;h}(x)] E[\hat{f}_{Z;g}(z)]. Asymptotic properties of (5.) have been studied by García-Portugués et al., who proved its asymptotic normality under independence, but with a slow rate of convergence that does not encourage its use in practice. For that reason, a calibration mechanism is needed for the practical application of the test. In addition, the construction of T_n requires the calculation of an integral over Ω_q × R, which may pose computational problems since it involves the calculation of several nested integrals. However, if the kernel estimators are obtained using von Mises and normal kernels, then an easy-to-compute expression for T_n can be obtained, as stated in the following lemma.

Lemma 5.. If the kernel estimators involved in (5.), obtained from a random sample {(X_i, Z_i)}_{i=1}^{n} of (X, Z), are constructed with von Mises and normal kernels, the following expression for T_n holds:

T_n = \frac{1}{n^2} 1_n^T (Ψ(h) ∘ Ω(g)) 1_n - \frac{2}{n^3} 1_n^T Ψ(h) Ω(g) 1_n + \frac{1}{n^4} (1_n^T Ψ(h) 1_n)(1_n^T Ω(g) 1_n),   (5.5)

where ∘ denotes the Hadamard product and Ψ(h) and Ω(g) are n × n matrices given by

Ψ(h) = \left( \frac{C_q(1/h^2)^2}{C_q(\|X_i + X_j\|/h^2)} \right)_{ij},   Ω(g) = \left( φ_{\sqrt{2} g}(Z_i - Z_j) \right)_{ij},

where 1_n is a vector of n ones and C_q is the normalizing function in (5.).

The proof of this result can be seen in Appendix 5.A. Note that expression (5.5) for T_n only requires matrix operations; this is the expression that will be used for computing the test statistic. It should also be noted that the effect of the dimension q appears only in the definition of C_q and in \|X_i + X_j\|, and both are easily scalable for large q. Thus, an important advantage of (5.5) is that its computational requirements are similar for different dimensions q, something which is not the case if (5.) is employed with numerical integration.

5.. Calibration of the test

The null hypothesis of independence is stated in a nonparametric way, which determines the resampling methods used for calibration. However, as the null hypothesis is of a non-interaction kind, a permutation approach (which is not at all foreign to hypothesis testing) seems a reliable option. If {(X_i, Z_i)}_{i=1}^{n} is a random sample from the directional-linear variable (X, Z) and σ is a random permutation of n elements, then {(X_i, Z_{σ(i)})}_{i=1}^{n} represents the resulting σ-permuted sample, and T_n^σ denotes the test statistic computed from this σ-permuted sample. Under the assumption of independence between the directional and linear components, it is reasonable to expect the distribution of T_n to be similar to that of T_n^σ, which can easily be approximated by Monte Carlo methods. In addition to its simplicity, the main advantage of the use of permutations is its easy implementation using Lemma 5., as it is possible to reuse the computation of the matrices Ψ(h) and Ω(g) needed for T_n when computing a σ-permuted statistic T_n^σ.
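The matrices and the closed form in Lemma 5. translate almost literally into code. The following Python sketch computes Ψ(h), Ω(g) and T_n as stated above; the von Mises-Fisher constant C_q is assumed to have its standard form and the implementation is only a minimal illustration (for instance, it does not guard against near-antipodal pairs X_i ≈ -X_j, for which the corresponding entry of Ψ(h) degenerates).

import numpy as np
from scipy.special import ive
from scipy.stats import norm

def log_Cq(kappa, q):
    # log von Mises-Fisher normalizing constant on the q-sphere (standard form assumed)
    nu = (q - 1) / 2
    return nu * np.log(kappa) - (q + 1) / 2 * np.log(2 * np.pi) - (np.log(ive(nu, kappa)) + kappa)

def psi_omega(X, Z, h, g):
    # matrices Psi(h) and Omega(g) of Lemma 5.1 for the von Mises-normal kernels
    q = X.shape[1] - 1
    kappa = 1 / h**2
    norms = np.linalg.norm(X[:, None, :] + X[None, :, :], axis=2)   # ||X_i + X_j||
    Psi = np.exp(2 * log_Cq(kappa, q) - log_Cq(norms * kappa, q))
    Omega = norm.pdf(Z[:, None] - Z[None, :], scale=np.sqrt(2) * g)
    return Psi, Omega

def T_n(Psi, Omega):
    # closed matrix expression (5.5) for the test statistic
    n = Psi.shape[0]
    return (np.sum(Psi * Omega) / n**2
            - 2 * np.sum(Psi @ Omega) / n**3
            + np.sum(Psi) * np.sum(Omega) / n**4)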

By virtue of expression (5.5) and the definition of T_n^σ, the σ-permuted test statistic is given by

T_n^σ = \frac{1}{n^2} 1_n^T (Ψ(h) ∘ Ω^σ(g)) 1_n - \frac{2}{n^3} 1_n^T Ψ(h) Ω^σ(g) 1_n + \frac{1}{n^4} (1_n^T Ψ(h) 1_n)(1_n^T Ω^σ(g) 1_n),

where the ij-th entry of the matrix Ω^σ(g) is the σ(i)σ(j)-th entry of Ω(g). For the computation of Ψ(h) and Ω(g), symmetry properties reduce the number of computations and can also be used to optimize the products Ψ(h) ∘ Ω^σ(g) and Ψ(h) Ω^σ(g). The last addend of T_n^σ is the same as that of T_n, so there is no need to recompute it. The testing procedure can be summarized in the following algorithm.

Algorithm 5.. Let {(X_i, Z_i)}_{i=1}^{n} be a random sample from a directional-linear variable (X, Z).

i. Obtain a suitable pair of bandwidths (h, g).
ii. Compute the observed value of T_n from (5.5), with kernel density estimators using the bandwidths (h, g).
iii. Permutation calibration. For b = 1, ..., B (B ≤ n!), compute T_n^{σ_b} with bandwidths (h, g) for a random permutation σ_b.
iv. Approximate the p-value by #{T_n ≤ T_n^{σ_b}} / B, where # denotes the cardinality of the set.

In steps ii and iii, a pair of bandwidths must be chosen. For the directional-linear case, as commented in Section 5., the cross-validation bandwidths (h, g)_{LCV} and (h, g)_{LSCV} can be considered. However, as usually happens with cross-validatory bandwidths, these selectors tend to provide undersmoothed estimators, something which a priori is not desirable, as it introduces substantial variability in the statistic T_n. To mitigate this problem, a more sophisticated bandwidth selector will be introduced.

Considering the von Mises-normal kernel, the bootstrap version of the Mean Integrated Squared Error (MISE) of the directional-linear kernel density estimator (5.) was derived by García-Portugués et al. (2013b). It admits a closed expression, MISE*_{h_p, g_p}(h, g), depending on the estimation bandwidths (h, g), a given pair of pilot bandwidths (h_p, g_p), the normalizing constant C_q and certain n × n matrices Ψ_a(h) and Ω_a(g) constructed from the sample in the same spirit as the matrices of Lemma 5.. Then, the estimation bandwidths are obtained as

(h, g)_{bo} = \arg\min_{h,g>0} MISE*_{h_p, g_p}(h, g).
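Algorithm 5. can be implemented directly on top of the matrices of Lemma 5., as the following Python sketch shows. It takes the precomputed matrices Ψ(h) and Ω(g) as inputs and approximates the p-value exactly as in step iv; the default number of permutations and all names are illustrative.

import numpy as np

def permutation_pvalue(Psi, Omega, B=500, rng=None):
    # Algorithm 5.1, steps iii-iv: permutation calibration of T_n, reusing the
    # precomputed matrices Psi(h) and Omega(g) of Lemma 5.1
    rng = np.random.default_rng() if rng is None else rng
    n = Psi.shape[0]

    def stat(Om):
        return (np.sum(Psi * Om) / n**2
                - 2 * np.sum(Psi @ Om) / n**3
                + np.sum(Psi) * np.sum(Om) / n**4)

    t_obs = stat(Omega)
    t_perm = np.empty(B)
    for b in range(B):
        sigma = rng.permutation(n)
        # the sigma-permuted Omega has (i, j) entry Omega[sigma(i), sigma(j)]
        t_perm[b] = stat(Omega[np.ix_(sigma, sigma)])
    return np.mean(t_obs <= t_perm)   # p-value as in step iv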

The choice of (h_p, g_p) is needed in order to compute (h, g)_{bo}. This must be done by a joint criterion for two important reasons. Firstly, to avoid a predominance of smoothing in one component that may dominate the other (this could happen, for example, if the directional variable is uniform, as in that case the optimal bandwidth tends to infinity). Secondly, to obtain a test with more power against deviations from independence. Based on these comments, a new bandwidth selector, named Bootstrap Likelihood Cross Validation (BLCV), is introduced:

(h, g)_{BLCV} = \arg\min_{h,g>0} MISE*_{(h,g)_{MLCV}}(h, g),

where the pair of bandwidths (h, g)_{MLCV} is obtained by enlarging the order of (h, g)_{LCV} to be of the kind (O(n^{-1/(6+q)}), O(n^{-1/7})), the order that one would expect for a pair of directional-linear pilot bandwidths. For the linear component, this can be seen in the paper by Cao (1994), where the pilot bandwidth is proved to be g_p = O(n^{-1/7}), larger than the order of the optimal estimation bandwidth, n^{-1/5}. For the directional case there is no pilot bandwidth available but, considering that the order of the optimal estimation bandwidth is n^{-1/(4+q)} (García-Portugués et al., 2013b), a plausible conjecture is h_p = O(n^{-1/(6+q)}).

5.. Simulation study

Six different directional-linear models were considered in the simulation study. The models are indexed by a parameter δ that measures the degree of deviation from independence, where δ = 0 represents independence and δ > 0 accounts for different degrees of dependence. The models show three kinds of possible deviations from independence: first order deviations, that is, deviations in the conditional expectation (M1, M2 and M3); second order deviations, or conditional variance deviations (M4 and M5); and first and second order deviations (M6). In order to clarify the notation, φ(·; m, σ) and f_LN(·; m, σ) represent the densities of a normal and of a log-normal, with mean/log-scale m and standard deviation/shape σ. The notation 0_q represents a vector of q zeros.

M1. f_1(x, z) = φ(z; δ + x^T µ, σ) f_vM(x; µ, κ), for fixed µ, κ and σ.
M2. f_2(x, z) = f_LN(z; δ + x^T µ, σ) f_vM(x; µ, κ), for fixed µ, κ and σ.
M3. f_3(x, z) = (r f_LN(z; δ + x^T µ_1, σ_1) + (1 - r) φ(z; m, σ_2))(p f_vM(x; µ_1, κ_1) + (1 - p) f_vM(x; µ_2, κ_2)), for fixed values of the remaining parameters.
M4. f_4(x, z) = φ(z; m, σ_δ(x)) f_vM(x; µ, κ), where the conditional standard deviation σ_δ(x) depends on δ x^T µ_r, for fixed m, µ, µ_r and κ.
M5. f_5(x, z) = f_LN(z; m, s_δ(x))(p f_vM(x; µ_1, κ_1) + (1 - p) f_vM(x; µ_2, κ_2)), where the shape s_δ(x) depends on δ and on x through µ_1 and µ_2.
M6. f_6(x, z) = (r f_LN(z; m, σ) + (1 - r) φ(z; δ + x^T µ_r, σ_δ(x))) f_vM(x; µ, κ), where both the conditional mean and the conditional standard deviation of the second mixture component depend on δ.

The choice of the models was done in order to capture situations with heteroskedasticity, skewness in the linear component and different types of von Mises mixtures in the directional component.
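To make the role of δ concrete, the short Python sketch below draws a toy circular-linear sample in which the conditional mean of Z is shifted by δ, in the spirit of the first-order-deviation models above. The distributions and constants are purely illustrative and do not reproduce the exact specifications of M1-M6.

import numpy as np

def toy_circular_linear_sample(n, delta, mu=0.0, kappa=1.0, sigma=1.0, rng=None):
    # Theta is von Mises; Z is normal with mean shifted by delta * cos(Theta - mu),
    # so delta = 0 corresponds to independence and delta > 0 to first-order dependence
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.vonmises(mu, kappa, size=n)
    z = rng.normal(loc=delta * np.cos(theta - mu), scale=sigma)
    return theta, z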

Figure 5.: Random samples of n = 5 points from the simulation models in the circular-linear case, with δ = .5 (situation with dependence). From left to right and from top to bottom, M1 to M6. M1, M2 and M3 present a deviation from independence in terms of the conditional expectation; M4 and M5 account for a deviation in terms of the conditional variance; and M6 includes deviations in both the conditional expectation and the variance.

For the proposed models, different deviations from independence have been considered by taking δ = 0 (independence) and two increasing positive values. The proposed test statistic has been computed for all the models and several sample sizes. The new test, based on the permutation resampling described in Algorithm 5. and depending on the bandwidth choice, is denoted by T_n^{LCV} or T_n^{BLCV}. The number of permutations considered was B and the number of Monte Carlo replicates was M. Both the circular-linear and the spherical-linear cases were explored. For the circular-linear case, the test was compared with the three tests available for circular-linear association: the circular-linear correlation coefficient of Mardia (1976) and Johnson and Wehrly (1977), denoted by R_n; the rank circular-linear correlation coefficient of Mardia (1976), denoted by U_n; and the λ_n measure of cylindrical association of Fisher and Lee (1981), implemented in its incomplete version considering a fixed number of random tuples.

Although there exists an exact distribution for R_n under certain normality assumptions on the linear response, and asymptotic distributions for U_n and λ_n, for a fair comparison the calibration of these tests has also been done by permutations. The exact and asymptotic distributions for R_n and U_n were also tried instead of the permutation approach, providing empirical levels and powers quite similar to those based on permutations.

The proportion of rejections under H_{k,δ}, for model number k with deviation δ, is reported in Tables 5.1 and 5.2 for the circular-linear and spherical-linear cases, with different sample sizes. In the circular-linear case, the empirical size is close to the nominal level for all the competing tests. The T_n^{LCV} test shows in general a satisfactory behavior under the null hypothesis, except for some cases in M, M and M5, where the test tends to reject the null hypothesis more often than expected. This is mostly corrected by T_n^{BLCV}, with a decrease of power with respect to T_n^{LCV} in M. For the spherical-linear case, the improvement in the size approximation of T_n^{BLCV} is notable, especially for small sample sizes. If the tests maintain the nominal significance level of 5%, it is expected that approximately 95% of the observed proportions of rejections under the null hypothesis (i.e., when δ = 0) lie within a narrow interval around .05 (to three decimal places), determined by the binomial variability of the M Monte Carlo replicates.

Regarding power, the test based on R_n is the most powerful one for M and M, although the performance of T_n^{LCV} and T_n^{BLCV}, especially for M, is quite similar. This was to be expected, as the circular-linear association tests should present more power against deviations of the first order. However, for M, M and M5, none of these tests is able to distinguish the alternatives and the rejection ratios remain close to the nominal level, λ_n being the best behaved among them. In contrast, T_n^{LCV} and T_n^{BLCV} correctly detect the deviations from the null. In M6, R_n is the most competitive only for the situation with n = 5, with T_n^{LCV} and T_n^{BLCV} the most competitive in the remaining situations. U_n shows a performance similar to R_n, but with more power in M and less in M5. λ_n is less affected than R_n and U_n by the change of models, but it also has lower power than them for M, M and M6. The results for the spherical-linear case are quite similar to the previous ones in terms of empirical size, but with lower power in comparison with the circular-linear scenario, something expected as a consequence of the difference in dimensionality.

Some final comments on the simulation results follow. For the different sample sizes and dimensions, the running times for T_n^{LCV} and T_n^{BLCV} are collected in Table 5.3. Computation times for T_n^{LCV} are very similar for different dimensions q, whereas T_n^{BLCV} is affected by q due to the choice of the bandwidths (h, g)_{BLCV}. The choice of the kernels was corroborated to be of little importance for testing, as similar results were obtained for the test T_n^{LCV} using an alternative directional-linear kernel. The cross-validatory bandwidths LSCV and BLSCV were also tried in the simulation study, providing worse results (this is also what usually happens with directional data, as can be seen in García-Portugués, 2013). Finally, it is worth mentioning that bootstrap calibration was also tried as an alternative to the permutation approach, using a pair of bandwidths for estimation and another pair for the smooth resampling. The results in terms of size, power and computing times were substantially worse than the ones obtained for permutations.
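The acceptance band alluded to above follows from the binomial variability of the empirical rejection proportion. The following short Python sketch computes it for a nominal level α and M Monte Carlo replicates; the values shown in the usage comment are illustrative, not those of the study.

import numpy as np

def rejection_band(alpha, M, z=1.96):
    # normal-approximation band in which the empirical rejection proportion of an
    # exact alpha-level test should fall about 95% of the time over M replicates
    half = z * np.sqrt(alpha * (1 - alpha) / M)
    return alpha - half, alpha + half

# illustrative usage: low, high = rejection_band(0.05, 1000)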
In conclusion, both the T_n^{LCV} and T_n^{BLCV} tests show a competitive behavior in all the simulation models, sample sizes and dimensions considered, being outperformed by R_n only in M and M. Nevertheless, for those models, the rejection rates of both tests are in general close to the ones of R_n. The test T_n^{BLCV} corrects the over-rejection of T_n^{LCV} in certain simulation models, without a significant loss of power but at the expense of a high computational cost. Finally, the classical tests R_n, U_n and λ_n presented critical problems in detecting second order and some first order deviations from independence.

Table 5.1: Proportion of rejections for the R_n, U_n, λ_n, T_n^{LCV} and T_n^{BLCV} tests of independence for the three smallest sample sizes and a nominal significance level of 5%. For the six models, the deviation-from-independence parameter takes the value δ = 0 (independence) and two increasing positive values. Each proportion was calculated using B permutations for each of M random samples of size n simulated from the specified model.

Table 5.2: Proportion of rejections for the R_n, U_n, λ_n, T_n^{LCV} and T_n^{BLCV} tests of independence for the two largest sample sizes and a nominal significance level of 5%. For the six models, the deviation-from-independence parameter takes the value δ = 0 (independence) and two increasing positive values. Each proportion was calculated using B permutations for each of M random samples of size n simulated from the specified model.

Table 5.3: Computing times in seconds for T_n^{LCV} and T_n^{BLCV} as a function of the sample size and the dimension q, with q = 1 for the circular-linear case and q = 2 for the spherical-linear case. The tests were run with B permutations and the times were measured on a .5 GHz core.

For all those reasons, the final recommendation is to preferably use the test T_n^{BLCV} for inference on directional-linear independence and T_n^{LCV} for a less computationally intensive exploratory analysis.

5. Real data analysis

5.. Data description

The original Portuguese fire atlas, covering the period from 1975 to 2005, is the longest annual and country-wide cartographic fire database in Europe (Pereira and Santos). Annual wildfire maps were derived from Landsat data, which represent the world's longest continuously acquired collection of moderate-resolution land remote sensing data, providing a unique resource for those who work in forestry, mapping and global change research. For each year in the dataset, Landsat imagery covering Portugal's mainland was acquired after the end of the fire season, thus providing a snapshot of the fires that occurred during the season. Annual fire perimeters were derived through a semi-automatic procedure that starts with supervised image classification, followed by manual editing of the classification results. The Minimum Mapping Unit (MMU), i.e., the size of the smallest fire mapped, changed according to the available data. During the MultiSpectral Scanner era, the spatial resolution of the satellite images is 80 metres, with a correspondingly coarser MMU. From the Thematic Mapper and Enhanced Thematic Mapper era onwards, with data available at a spatial resolution of 30 metres, the MMU is 5 hectares, allowing a larger number of smaller fires to be mapped than in the earlier era. Below an MMU of approximately 5 hectares the burnt area classification errors increase substantially and, given the very skewed nature of the fire size distribution, the 5 hectares threshold ensures that the vast majority of the total area actually burnt is mapped. For consistency, and due to the discrepancies in the minimum mapping unit between the two eras, in this study only fire perimeters mapped in the latter period (1985-2005) were considered, which results in 687 fire perimeters.

This application is based on the watershed delineation proposed by Barros et al.. In their work, watersheds were derived from the Shuttle Radar Topography Mission (SRTM) digital terrain model (Farr et al., 2007) using the ArcGIS hydrology toolbox (ESRI, 2009). The minimum watershed size was interactively increased so that each watershed contained a minimum of 5 fire observations (see the cited work for more details). Fire perimeters straddling watershed boundaries were allocated to the watershed containing their centroid. The orientation of fire perimeters and watersheds was determined by principal component analysis, following the approach proposed by Luo (1998). Specifically, principal component analysis was applied to the points that constitute the object's boundary (fire or watershed), with the orientation given by the first principal component (PC). The boundary points can be represented either in a bidimensional space defined by each vertex's latitude and longitude coordinates, or in a tridimensional space that also takes the altitude into account. The PC then corresponds to an axis that passes through the centre of mass of the object and maximizes the variance of the projected vertices, represented in R^2 or in R^3. Computing the PC also in R^3 aims to take into account the variability of the fires according to their slope which, as the centre plot of Figure 5. shows, presents marked differences between regions. The orientation of the object is then taken as the direction given by its PC.
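The PCA step described above admits a very small implementation. The following Python sketch returns the unit vector along the first principal component of a perimeter's boundary vertices in R^2 or R^3; it is a generic sketch and not the ArcGIS-based procedure actually used by Barros et al..

import numpy as np

def perimeter_orientation(vertices):
    # vertices: m x d array of boundary points, d = 2 (lon, lat) or d = 3 (lon, lat, altitude)
    centred = vertices - vertices.mean(axis=0)
    # the first right singular vector of the centred coordinates is the first principal axis
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    pc1 = vt[0]
    return pc1 / np.linalg.norm(pc1)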

It is important to notice that an orientation is an axial observation, and that some conversion is needed before applying the directional-linear independence test. In the two-dimensional case, the orientations can be encoded by an angular variable Θ ∈ [0, π), with period π, so that Θ is a circular variable. With this codification, the angles 0, π/4, π/2 and 3π/4 represent the E/W, NE/SW, N/S and NW/SE orientations, respectively. In the three-dimensional space, the orientation is coded by a pair of angles (Θ, Φ) using spherical coordinates, where Θ plays the same role as in the previous setting and Φ measures the inclination, from a flat to a vertical slope; only positive angles are considered, as the slope of a certain angle ω equals the slope of -ω. Therefore, the points with spherical coordinates (Θ, Φ), which lie on the upper hemisphere, can be regarded as realizations of a spherical variable.

5.. Results

The null hypothesis of independence between the wildfire orientation and its burnt area in log scale is rejected, either using orientations in R^2 or in R^3, with a common, very small p-value. The test is carried out using the BLCV bandwidth selector (considered from now on) and all the 687 observations for the years 1985-2005, ignoring the stratification by watershed, and with B permutations. The p-values for the null hypothesis of independence between the orientation of a watershed and the total burnt area of the fires within the region are also small, for orientations both in R^2 and in R^3. Therefore, the null hypothesis is emphatically rejected.

Figure 5.: p-values from the independence test between the first principal component (PC) of the fire perimeter and the burnt area on a log scale, by watersheds. From left to right, the first and second maps represent the circular-linear p-values (PC in R^2) and their corrected versions using the FDR, respectively. The third and fourth maps represent the spherical-linear situation (PC in R^3), with uncorrected and FDR-corrected p-values, respectively.

After identifying the presence of dependence between wildfire orientation and size, it is possible to carry out a watershed-based spatial analysis by applying the test to each watershed, in order to detect whether the presence of dependence is homogeneous or whether it is only related to some particular areas. Figure 5. represents maps of the p-values of the test applied to the observations of each watershed, using the PC in R^2 and in R^3 (from left to right, first and third plots of Figure 5., respectively). The maps reveal a number of watersheds where the null hypothesis of independence is rejected at the significance level α = .05, both for the circular-linear and for the spherical-linear cases. This shows that the presence of dependence between fire orientation and size is not homogeneous and is located in specific watersheds (see Figure 5.). It is also interesting to note that the inclusion of the altitude coordinate in the computation of the PC leads to a richer detection of dependence between wildfire orientation and size at the watershed level. This is due to the negative relation between fire slope and size (see Figure 5.), as large fires tend to have a flatter PC in R^3 because they occur over highly variable terrain. Finally, the resulting p-values from the watershed analysis can also be adjusted using the False Discovery Rate (FDR) procedure of Benjamini and Yekutieli (2001) (from left to right, second and fourth plots of Figure 5.). It is also possible to combine the p-values of the unadjusted maps with the FDR to test for independence between the wildfire orientation and the log-burnt area; the resulting p-values for the circular-linear and spherical-linear cases again support the rejection of independence.

Figure 5.: Left: density contour plot for the fires in one of the watersheds where independence is rejected (highlighted in the second map of Figure 5.). The contour plot shows that the size of the burnt area is related to the orientation of the fires in the watershed. Right: scatter plot of fire slope against burnt area for the whole dataset, with a nonparametric kernel regression curve showing the negative correlation between fire slope and size.

5.5 Discussion

A nonparametric test for assessing independence between a directional and a linear component has been proposed, and its finite sample performance has been investigated in a simulation study. Simulation results support a satisfactory behavior of the permutation test implemented

with LCV and BLCV bandwidths, in most cases outperforming the available circular-linear testing proposals and being competitive in the remaining cases. The proposed BLCV bandwidths present better results in terms of empirical size, although further study on bandwidth selection is required. In addition, when the null hypothesis of independence is rejected, the kernel density estimate can be used to explore the form of the dependence, at least for the circular-linear and spherical-linear cases.

The application of the test to the entire wildfire orientation and size dataset makes possible the detection of dependence between these two variables, for both the two-dimensional and the three-dimensional orientations. The same conclusion holds for watershed orientation and total area burnt. A detailed study of each watershed allows for a more specific insight into the problem. The evidence of independence between fire size and fire orientation in some watersheds suggests that an event-based analysis such as the work of Barros et al. should yield results similar to those that would be expected from an area-based analysis. On the other hand, the detection of dependence between fire size and orientation in watersheds with uniform orientation (Barros et al.) highlights cases where there may be a mixture of orientations. In such cases, an analysis taking fire size into account might find evidence of preferential orientation in fire perimeters. In watersheds where fire events show preferential orientation (non-uniform distribution) and there is dependence between size and orientation, fire orientation distributions are structured in relation to fire size, especially considering the typically asymmetric nature of fire size distributions, dominated by a small number of very large events (Strauss et al., 1989). In these cases, an area-weighted analysis of fire perimeter orientation might lead to different results than those found by Barros et al.. When altitude is included in the calculation of the PC in R^3, it highlights the negative relation between fire slope and size, which is mostly due to the fact that larger fires present a flatter PC. Slope has a skewed distribution, with a low mean value and a relatively long right tail. Thus, while small fires usually occur on high slopes, large fires on consistently steep areas are unlikely.

Finally, it can be argued that the data are probably not independent and identically distributed over space and time. Unfortunately, given the data gathering procedure detailed at the beginning of Section 5., dependence patterns cannot be clearly identified. Accounting for temporal or spatial dependence directly in the directional-linear kernel estimator and in the testing procedure is an open problem.

Acknowledgments

The authors acknowledge the support of Project MTM8 from the Spanish Ministry of Science and Innovation, Project MDS75PR from the Dirección Xeral de I+D, Xunta de Galicia, and the IAP network StUDyS from Belgian Science Policy. The work of E. García-Portugués has been supported by FPU grant AP 957 from the Spanish Ministry of Education and the work of A. M. G. Barros by Ph.D. Grant SFRH/BD/98/7 from the Fundação para a Ciência e Tecnologia. J. M. C. Pereira participated in this research under the framework of the research projects Forest fire under climate, social and economic changes in Europe, the Mediterranean and other fire-affected areas of the world (FUME, EC FP7 Grant Agreement No.
888) and Fire-Land-Atmosphere Inter-Relationships: understanding processes to predict wildfire regimes in Portugal (FLAIR, PTDC/AAC/AMB/7/8). The authors gratefully acknowledge an anonymous referee for the suggestion of employing permutations for the test calibration and

the careful revision of the paper. The authors also acknowledge the suggestions raised by another referee.

5.A Proof of Lemma 5.

Proof. The closed expression for T_n involving only matrix computations is obtained by splitting the integral into three addends:

T_n = \int_{Ω_q} \int_{R} \left( \hat{f}_{X,Z;h,g}(x, z) - \hat{f}_{X;h}(x) \hat{f}_{Z;g}(z) \right)^2 dz \, ω_q(dx)
    = \int_{Ω_q} \int_{R} \hat{f}_{X,Z;h,g}(x, z)^2 \, dz \, ω_q(dx)
      - 2 \int_{Ω_q} \int_{R} \hat{f}_{X,Z;h,g}(x, z) \hat{f}_{X;h}(x) \hat{f}_{Z;g}(z) \, dz \, ω_q(dx)
      + \int_{Ω_q} \int_{R} \hat{f}_{X;h}(x)^2 \hat{f}_{Z;g}(z)^2 \, dz \, ω_q(dx).

For the first addend, using the von Mises and normal kernels together with the facts that c_{h,q}(L) = C_q(1/h^2) e^{1/h^2}, that \int_{Ω_q} e^{x^T (X_i + X_j)/h^2} ω_q(dx) = C_q(\|X_i + X_j\|/h^2)^{-1} and that g^{-2} \int_{R} K((z - Z_i)/g) K((z - Z_j)/g) dz = φ_{\sqrt{2} g}(Z_i - Z_j),

\int_{Ω_q} \int_{R} \hat{f}_{X,Z;h,g}(x, z)^2 \, dz \, ω_q(dx)
  = \frac{c_{h,q}(L)^2}{n^2 g^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \int_{Ω_q} L\left( \frac{1 - x^T X_i}{h^2} \right) L\left( \frac{1 - x^T X_j}{h^2} \right) ω_q(dx) \int_{R} K\left( \frac{z - Z_i}{g} \right) K\left( \frac{z - Z_j}{g} \right) dz
  = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{C_q(1/h^2)^2}{C_q(\|X_i + X_j\|/h^2)} φ_{\sqrt{2} g}(Z_i - Z_j).

For the second addend, proceeding analogously,

\int_{Ω_q} \int_{R} \hat{f}_{X,Z;h,g}(x, z) \hat{f}_{X;h}(x) \hat{f}_{Z;g}(z) \, dz \, ω_q(dx)
  = \frac{1}{n^3} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{C_q(1/h^2)^2}{C_q(\|X_i + X_j\|/h^2)} φ_{\sqrt{2} g}(Z_i - Z_k).

Finally, the third addend factorizes into the product of a directional and a linear integral,

\int_{Ω_q} \int_{R} \hat{f}_{X;h}(x)^2 \hat{f}_{Z;g}(z)^2 \, dz \, ω_q(dx)
  = \int_{Ω_q} \hat{f}_{X;h}(x)^2 \, ω_q(dx) \int_{R} \hat{f}_{Z;g}(z)^2 \, dz
  = \left( \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{C_q(1/h^2)^2}{C_q(\|X_i + X_j\|/h^2)} \right) \left( \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} φ_{\sqrt{2} g}(Z_i - Z_j) \right).

From the previous results and after applying some matrix algebra, it turns out that

T_n = \frac{1}{n^2} 1_n^T (Ψ(h) ∘ Ω(g)) 1_n - \frac{2}{n^3} 1_n^T Ψ(h) Ω(g) 1_n + \frac{1}{n^4} (1_n^T Ψ(h) 1_n)(1_n^T Ω(g) 1_n),

where

Ψ(h) = \left( \frac{C_q(1/h^2)^2}{C_q(\|X_i + X_j\|/h^2)} \right)_{ij},   Ω(g) = \left( φ_{\sqrt{2} g}(Z_i - Z_j) \right)_{ij}.

References

Ahmad, I. A. and Li, Q. Testing independence by nonparametric kernel method. Statist. Probab. Lett., :.

Bai, Z. D., Rao, C. R., and Zhao, L. C. Kernel estimators of density function of directional data. J. Multivariate Anal., 7: 9.

Barros, A. M. G., Pereira, J. M. C., and Lund, U. J. Identifying geographical patterns of wildfire orientation: a watershed-based analysis. Forest. Ecol. Manag., 6:98 7.

Benjamini, Y. and Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Statist., 9:.

Cao, R. 99. Bootstrapping the mean integrated squared error. J. Multivariate Anal., 5:7 6.

146 6 Chapter 5. A nonparametric test for directional-linear independence Cao, R., Cuevas, A., and Gonzalez Manteiga, W. 99. A comparative study of several smoothing methods in density estimation. Comput. Statist. Data Anal., 7:5 76. Chiu, S.-T A comparative review of bandwidth selection for kernel density estimation. Statist. Sinica, 6:9 5. ESRI 9. ArcMap 9.l. Environmental Systems Resource Institute, Redlands. Farr, T. G., Rosen, P. A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M., Rodriguez, E., Roth, L., Seal, D., Shaffer, S., Shimada, J., Umland, J., Werner, M., Oskin, M., Burbank, D., and Douglas, A. 7. The shuttle radar topography mission. Rev. Geophys., 5. Fernández-Durán, J. J. 7. Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums. Biometrics, 6: Finney, M. A.. Design of regular landscape fuel treatment patterns for modifying fire growth and behavior. Forest Sci., 7:9 8. Fisher, N. I. and Lee, A. J. 98. Biometrika, 68: Nonparametric measures of angular-linear association. García-Portugués, E.. Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electron. J. Stat., 7: García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. a. Exploring wind direction and SO concentration by circular-linear density estimation. Stoch. Environ. Res. Risk Assess., 75: García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. b. Kernel density estimation for directional-linear data. J. Multivariate Anal., :5 75. García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W.. Central limit theorems for directional and linear data with applications. Statist. Sinica, to appear. Hall, P., Watson, G. S., and Cabrera, J Kernel density estimation with spherical data. Biometrika, 7: Johnson, R. A. and Wehrly, T Measures and models for angular correlation and angularlinear correlation. J. Roy. Statist. Soc. Ser. B, 9: 9. Johnson, R. A. and Wehrly, T. E Some angular-linear distributions and related regression models. J. Amer. Statist. Assoc., 76:6 66. Jones, C., Marron, J. S., and Sheather, S. J Progress in data-based bandwidth selection for kernel density estimation. Computation. Stat., :7 8. Liddell, I. G. and Ord, J. K Linear-circular correlation coefficients: some further results. Biometrika, 65:8 5. Lloret, F., Calvo, E., Pons, X., and Diaz-Delgado, R.. Wildfires and landscape patterns in the Eastern Iberian Peninsula. Landscape Ecol., 78:

147 References 7 Luo, D Pattern recognition and image processing. Horwood Series in Engineering Science. Horwood, Chichester. Mardia, K. V Linear-circular correlation coefficients and rhythmometry. Biometrika, 6: 5. Moreira, F., Rego, F. C., and Ferreira, P. G.. Temporal pattern of change in a cultural landscape of northwestern Portugal: implications for fire occurrence. Landscape Ecol., 6: Moreira, F., Viedma, O., Arianoutsou, M., Curt, T., Koutsias, N., Rigolot, E., Barbati, A., Corona, P., Vaz, P., Xanthopoulos, G., Mouillot, F., and Bilgili, E.. Landscapewildfire interactions in southern Europe: implications for landscape management. J. Environ. Manag., 9:89. Parzen, E. 96. On estimation of a probability density function and mode. Ann. Math. Statist., : Pereira, J. and Santos, T.. Fire risk and burned area mapping in Portugal. Technical report, Direção-Geral das Florestas, Lisboa. Pereira, M. G., Trigo, R. M., da Camara, C. C., Pereira, J. M. C., and Leite, S. M. 5. Synoptic patterns associated with large summer forest fires in portugal. Agr. Forest Meteorol., 9: 5. Rosenblatt, M Remarks on some nonparametric estimates of a density function. Ann. Math. Statist., 7:8 87. Rosenblatt, M A quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann. Statist., :. Rosenblatt, M. and Wahlen, B. E. 99. A nonparametric measure of independence under a hypothesis of independent components. Statist. Probab. Lett., 5:5 5. Schmidt, D. A., Taylor, A. H., and Skinner, C. N. 8. The influence of fuels treatment and landscape arrangement on simulated fire behavior, Southern Cascade range, California. Forest. Ecol. Manag., 558 9:7 8. Silverman, B. W Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London. Strauss, D., Bednar, L., and Mees, R Do one percent of forest fires cause ninety-nine percent of the damage? Forest Sci., 5:9 8. Wand, M. P. and Jones, M. C Kernel smoothing, volume 6 of Monographs on Statistics and Applied Probability. Chapman & Hall, London. Watson, G. S. 98. Statistics on spheres, volume 6 of University of Arkansas Lecture Notes in the Mathematical Sciences. John Wiley & Sons, New York.


Chapter 6

Central limit theorems for directional and linear random variables with applications

Abstract

A central limit theorem for the integrated squared error of the directional-linear kernel density estimator is established. The result enables the construction and analysis of two testing procedures based on the squared loss: a nonparametric independence test for directional and linear random variables and a goodness-of-fit test for parametric families of directional-linear densities. Limit distributions for both test statistics, and a consistent bootstrap strategy for the goodness-of-fit test, are developed for the directional-linear case and adapted to the directional-directional setting. Finite sample performance for the goodness-of-fit test is illustrated in a simulation study. This test is also applied to real datasets from biology and environmental sciences.

Reference

García-Portugués, E., Crujeiras, R. M. and González-Manteiga, W. Central limit theorems for directional and linear random variables with applications. Statist. Sinica, to appear.

Contents

6.1 Introduction
6.2 Background
6.3 Central limit theorem for the integrated squared error
6.3.1 Main result
6.3.2 Extensions of Theorem 6.1
6.4 Testing independence with directional random variables
6.5 Goodness-of-fit test with directional random variables
6.5.1 Testing a simple null hypothesis
6.5.2 Composite null hypothesis
6.5.3 Calibration in practice
6.5.4 Extensions to directional-directional models

6.6 Simulation study
6.7 Real data application
6.A Sketches of the main proofs
6.A.1 CLT for the integrated squared error
6.A.2 Testing independence with directional data
6.A.3 Goodness-of-fit test for models with directional data
References

6.1 Introduction

Statistical inference on random variables comprises estimation and testing procedures that allow the underlying distribution to be characterized, regardless of the nature and/or dimension of the variables. Specifically, density estimation stands out as one of the fundamental problems in statistical inference, where both parametric and nonparametric approaches have been explored. Among the nonparametric alternatives, kernel density estimation (see Silverman (1986), Scott (1992) or Wand and Jones (1995) as comprehensive references for scalar random variables) provides a simple and intuitive way to explore and do inference on random variables. Among other contexts, kernel density estimation has also been adapted to directional data (see Mardia and Jupp (2000)), that is, data on the q-dimensional sphere. This kind of data arises, for example, in meteorology, when measuring wind direction, or in proteomics, when studying the angles in protein structure (circular data, q = 1; see Fernández-Durán (2007)); in astronomy, star positions on the celestial sphere (q = 2; see García-Portugués (2013)) are also directional data, and even in text mining (large q; see Chapter 6 in Srivastava and Sahami (2009)) directional data examples can be found.

Some early works on kernel density estimation with directional data are the papers by Hall et al. (1987) and Bai et al. (1988), who introduced kernel density estimators and derived their properties (bias, variance and uniform strong consistency, among others). The estimation of density derivatives was studied by Klemelä (2000), and Zhao and Wu (2001) stated a Central Limit Theorem (CLT) for the Integrated Squared Error (ISE) of the directional kernel density estimator. Some more recent works deal with the bandwidth selection problem, such as Taylor (2008) and Oliveira et al. (2012), devoted to circular data, and García-Portugués (2013), for a general dimension. In some contexts, joint density models for directional and linear random variables are useful, e.g. for describing wind direction and SO2 concentration (García-Portugués et al., 2013a). In this setting, a kernel density estimator for directional-linear data was proposed and analysed by García-Portugués et al. (2013b).

Regardless of estimation purposes, kernel density estimators have been extensively used for the development of goodness-of-fit tests (see González-Manteiga and Crujeiras (2013) for a review) and independence tests. For example, Bickel and Rosenblatt (1973) and Fan (1994) provided goodness-of-fit tests for parametric densities of real random variables. Similarly, in the directional setting, Boente et al. (2014) presented a goodness-of-fit test for parametric directional densities. For assessing independence between two linear random variables, Rosenblatt (1975) proposed a test statistic based on the squared difference between the joint kernel density estimator and the product of the marginal ones (see also Rosenblatt and Wahlen (1992)). This idea was adapted to the directional-linear setting by García-Portugués et al. (2014), who derived a permutation independence test and compared its performance in practice with the testing

proposals given by Mardia (1976), Johnson and Wehrly (1978) and Fisher and Lee (1981) in this context.

The main device for the goodness-of-fit and independence tests presented above is the CLT for the ISE of the kernel density estimator. Hence, the main aim of this work is to provide such a result for the directional-linear kernel estimator introduced by García-Portugués et al. (2013b) and to use it to derive a goodness-of-fit test for parametric families of directional-linear densities and an independence test for directional and linear variables. The CLT is obtained by proving an extended version of Theorem 1 in Hall (1984). The goodness-of-fit test follows by taking as a test statistic the ISE between the joint kernel estimator and a smoothed parametric estimate of the unknown density. For the independence test, the test statistic introduced in García-Portugués et al. (2014) will be considered and its asymptotic properties will be studied. Jointly with the asymptotic distribution, a bootstrap resampling strategy to calibrate the goodness-of-fit test will be investigated. Finite sample performance of the goodness-of-fit test is checked through an extensive simulation study, and the methodology is applied to analyse real datasets from forestry and proteomics. In addition, the results obtained for the directional-linear case are also adapted to the directional-directional context.

This paper is organized as follows. Section 6.2 presents some background on kernel density estimation for directional and linear random variables. Section 6.3 includes the CLT for the ISE of the directional-linear estimator and its extension to the directional-directional setting. The independence test for directional and linear variables is presented in Section 6.4. The goodness-of-fit test for simple and composite null hypotheses, its bootstrap calibration and extensions are given in Section 6.5. The empirical performance of the goodness-of-fit test is illustrated with a simulation study in Section 6.6 and with applications to real datasets in Section 6.7. Appendix 6.A collects the outline of the main proofs. Technical lemmas and further details on simulations and data analysis are provided as supplementary material, as well as the extensions of the independence test.

6.2 Background

For the sake of simplicity, $f$ will denote the target density along the paper, which may be linear, directional, directional-linear or directional-directional, depending on the context.

Let $Z$ denote a linear random variable with support $\mathrm{supp}(Z)\subseteq\mathbb{R}$ and density $f$, and let $Z_1,\ldots,Z_n$ be a random sample of $Z$. The linear kernel density estimator is defined as
\[
\hat f_g(z) = \frac{1}{ng}\sum_{i=1}^{n} K\left(\frac{z-Z_i}{g}\right), \quad z\in\mathbb{R},
\]
where $K$ denotes the kernel function and $g>0$ is the bandwidth parameter, which controls the smoothness of the estimator (see Silverman (1986), among others).

Let $X$ denote a directional random variable with density $f$ and support the $q$-dimensional sphere, denoted by $\Omega_q=\{x\in\mathbb{R}^{q+1}: x_1^2+\cdots+x_{q+1}^2=1\}$. The Lebesgue measure in $\Omega_q$ is denoted by $\omega_q$ and, therefore, a directional density satisfies $\int_{\Omega_q} f(x)\,\omega_q(dx)=1$. When there is no possible confusion, $\omega_q$ will also denote the surface area of $\Omega_q$: $\omega_q=\omega_q(\Omega_q)=2\pi^{(q+1)/2}/\Gamma\left(\frac{q+1}{2}\right)$.

The directional kernel density estimator introduced by Hall et al. (1987) and Bai et al. (1988) for a directional density $f$, based on a random sample $X_1,\ldots,X_n$ in the $q$-sphere, is defined as
\[
\hat f_h(x) = \frac{c_{h,q}(L)}{n}\sum_{i=1}^{n} L\left(\frac{1-x^T X_i}{h^2}\right), \quad x\in\Omega_q,
\]
where $L$ is the directional kernel, $h>0$ is the bandwidth parameter and the scalar product of two vectors, $x$ and $y$, is denoted by $x^T y$, where $x^T$ is the transpose of the column vector $x$. $c_{h,q}(L)$ is a normalizing constant depending on the kernel $L$, the bandwidth $h$ and the dimension $q$. Specifically, Bai et al. (1988) indicate that the inverse of the normalizing constant is given by
\[
c_{h,q}(L)^{-1} = \lambda_{h,q}(L) h^q \sim \lambda_q(L) h^q, \tag{6.1}
\]
where $\lambda_{h,q}(L)=\omega_{q-1}\int_0^{2h^{-2}} L(r)\, r^{q/2-1}(2-rh^2)^{q/2-1}\, dr$ and $\lambda_q(L)=2^{q/2-1}\omega_{q-1}\int_0^\infty L(r)\, r^{q/2-1}\, dr$. The notation $a_n\sim b_n$ means that $a_n=b_n(1+o(1))$, as $n\to\infty$. A usual choice for the directional kernel is $L(r)=e^{-r}$, also known as the von Mises kernel due to its relation with the von Mises-Fisher density (see Watson (1983)), $\mathrm{vM}(\mu,\kappa)$, given by
\[
f_{\mathrm{vM}}(x;\mu,\kappa) = C_q(\kappa)\exp\left\{\kappa x^T\mu\right\}, \qquad C_q(\kappa)=\frac{\kappa^{(q-1)/2}}{(2\pi)^{(q+1)/2} I_{(q-1)/2}(\kappa)},
\]
$\mu\in\Omega_q$ being the directional mean, $\kappa>0$ the concentration parameter around the mean and $I_\nu$ the modified Bessel function of order $\nu$.

The kernel estimator for a directional-linear density $f$ based on a random sample $(X_1,Z_1),\ldots,(X_n,Z_n)$, with $(X_i,Z_i)\in\Omega_q\times\mathbb{R}$, $i=1,\ldots,n$, was proposed by García-Portugués et al. (2013b):
\[
\hat f_{h,g}(x,z) = \frac{c_{h,q}(L)}{ng}\sum_{i=1}^{n} LK\left(\frac{1-x^T X_i}{h^2},\frac{z-Z_i}{g}\right), \quad (x,z)\in\Omega_q\times\mathbb{R}, \tag{6.2}
\]
where $LK$ is a directional-linear kernel, $h$ and $g$ are the bandwidths for the directional and the linear components, respectively, and $c_{h,q}(L)$ is the normalizing constant. For the sake of simplicity, the product kernel $LK(\cdot,\cdot)=L(\cdot)\times K(\cdot)$ is considered. Asymptotic properties of this estimator were studied by García-Portugués et al. (2013b). In order to quantify the error of the density estimator, the ISE, defined as $\mathrm{ISE}[\hat f_{h,g}] = \int_{\Omega_q}\int_{\mathbb{R}}\left(\hat f_{h,g}(x,z)-f(x,z)\right)^2 dz\,\omega_q(dx)$, can be used. In this expression, the integral is taken with respect to the product measure $\omega_q\times m_{\mathbb{R}}$, with $m_{\mathbb{R}}$ denoting the usual Lebesgue measure in $\mathbb{R}$.

Finally, it is possible to define a directional-directional kernel density estimator at $(x,y)\in\Omega_{q_1}\times\Omega_{q_2}$ from a random sample $(X_1,Y_1),\ldots,(X_n,Y_n)$, with $(X_i,Y_i)\in\Omega_{q_1}\times\Omega_{q_2}$, $i=1,\ldots,n$, that comes from a directional-directional density $f$:
\[
\hat f_{h_1,h_2}(x,y) = \frac{c_{h_1,q_1}(L_1)\, c_{h_2,q_2}(L_2)}{n}\sum_{i=1}^{n} L_1\left(\frac{1-x^T X_i}{h_1^2}\right) L_2\left(\frac{1-y^T Y_i}{h_2^2}\right).
\]
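As an illustration of the estimators above, the following minimal Python sketch (not part of the original paper) evaluates the circular-linear estimator (6.2) for q = 1 with the von Mises and normal kernels, for which the directional factor reduces to a von Mises density with concentration 1/h^2. The function name vm_gauss_kde and the toy data are illustrative choices.

import numpy as np

def vm_gauss_kde(theta_eval, z_eval, theta, z, h, g):
    """Circular-linear kernel density estimator (q = 1) on a grid.

    theta_eval, z_eval : 1d arrays defining the evaluation grid.
    theta, z           : observed angles (radians) and linear observations.
    h, g               : directional and linear bandwidths.
    """
    kappa = 1.0 / h ** 2
    # von Mises weights: exp(kappa * cos(theta_eval - theta_i)) / (2 pi I_0(kappa)),
    # computed through exp(kappa * (cos - 1)) for numerical stability
    ang = theta_eval[:, None] - theta[None, :]
    w_dir = np.exp(kappa * (np.cos(ang) - 1.0))
    w_dir /= 2.0 * np.pi * np.i0(kappa) * np.exp(-kappa)
    # Gaussian weights in the linear component
    lin = (z_eval[:, None] - z[None, :]) / g
    w_lin = np.exp(-0.5 * lin ** 2) / (np.sqrt(2.0 * np.pi) * g)
    # average of products of weights gives f_hat(theta_k, z_l)
    return np.einsum('ki,li->kl', w_dir, w_lin) / len(theta)

# toy usage: a sample with dependence between the angle and the linear part
rng = np.random.default_rng(1)
theta = rng.vonmises(0.0, 2.0, size=200)
z = np.cos(theta) + 0.3 * rng.standard_normal(200)
tg, zg = np.linspace(-np.pi, np.pi, 100), np.linspace(-2, 2, 100)
f_hat = vm_gauss_kde(tg, zg, theta, z, h=0.5, g=0.3)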

153 6.. Central limit theorem for the integrated squared error To fix the notation that will be used along the paper, note that Rϕ denotes the integral of the squared function ϕ along its domain. The following integrals will be also needed: µ K = R z Kz dz, b q L = Lrr q dr Lrr q dr. Density derivatives of different orders will be denoted as follows: fx, z fx, z =,..., x Hfx, z = fx, z x q+, fx,z fx,z x i x j x z fx,z z x T fx,z z T fx, z = x fx, z, z fx, z T, z = H xfx, z H x,z fx, z. H x,z fx, z T H z fx, z 6. Central limit theorem for the integrated squared error The main result in this paper, which is the basis for the tests presented in the next sections, is the CLT for the ISE of the kernel density estimator Main result The following conditions are required to derive the asymptotic distribution for the ISE: A. Extend f from Ω q R to R q+ \ {, z : z R}, by defining fx, z f x/ x, z for all x and z R. Assume that f and its first three derivatives are bounded and uniformly continuous with respect to the product Euclidean norm in Ω q R, x, z = x + z. A. Assume that L :,, and K : R, are continuous and bounded. Assume that L is nonincreasing and such that < λ q L, λ q L <, q and that K is a linear density, symmetric around zero and with µ K <. A. Assume that h = h n and g = g n are sequences of positive numbers such that h n, g n and nh q ng n as n. The uniform continuity and boundedness up to the second derivatives of f is a common assumption that appears, among others, in Hall 98 and Rosenblatt and Wahlen 99, while the assumption on the third derivatives is needed for the uniformity convergence. The assumption of compact support for the directional kernel L, stated in Zhao and Wu, is replaced by the nonincreasing requirement Lr Ls, for r s and the finiteness of λ q L and λ q L. These two conditions are less restrictive and allow for the consideration of the von Mises kernel. The next theorem provides the limit distribution of the ISE for 6.. Its proof is based on a generalization of Theorem in Hall 98, stated as Lemma A. in Appendix 6.A. Theorem 6. CLT for the directional-linear ISE. Denote the ISE of ˆf h,g by I n. Then, under conditions A A, it holds that i. n φh, g I n E I n ii. nh q g I n E I n d N,, if nφh, gh q g, d N, σ, if nφh, gh q g,

154 Chapter 6. Central limit theorems for directional and linear random variables iii. nh q g I n E I n where < δ < and φh, g = b ql d N, δ + σ, if nφh, gh q g δ, q σxh + µ K σzg + b qlµ K σ X,Z h g, q with σ X,Z = Cov tr H x f, X, Z, H z fx, Z, σ X = Var tr H xf, X, Z and σ Z = Var H z fx, Z. The remaining constants are given by: { σ = Rf γ q λ q L r q ρ q Lρϕ q r, ρ dρ} dr { KuKu + v du} dv, R R L r + ρ rρ + L r + ρ + rρ, q =, ϕ q r, ρ = θ q L r + ρ θrρ dθ, q, {, q =, γ q = ω q ωq q, q. The same limit distributions hold in i iii if E I n is replaced by E ˆfh,g x, z fx, z dz ωq dx + λ ql λ q L RK nh q. g Bearing in mind the CLT result in Hall 98 for the linear case, a bandwidth-free rate of convergence should be expected in iii. Nevertheless, when nφh, gh q g δ, the analytical difficulty of joining the two rates of convergence of the dominant terms forces the normalizing rate to be nh q g, although the sequence of bandwidths is restricted to satisfy the constraint nφh, gh q g δ. To clarify this point, the next corollary presents a special case with proportional bandwidth sequences where the rate of convergence can be analytically stated in a bandwidth-free form. Corollary 6.. Under conditions A A, and assuming g n < δ <, = βh n for a fixed β > and i. n h I n E I n d N, φ, β, if nh q+5, ii. nh q+ d I n E I n N, σ, if nh q+5, iii. n q+9 d q+5 I n E I n N, φ, βδ 6.. Extensions of Theorem 6. q+5 + σ δ q+ q+5, if nh q+5 δ. The previous results can be adapted to other contexts involving directional variables, such as directional-directional or directional-multivariate random vectors. Once the common structure and the effects of each component are determined, it is easy to reproduce the computations duplicating a certain component or modifying it. This will be used to derive the directional-directional versions of the most relevant results along the paper. By considering a single bandwidth for the

155 6.. Testing independence with directional random variables 5 estimator defined in R p as in Hall 98, for example, Theorem 6. can be easily adapted to account for the multivariate component. Considering the directional-directional estimator ˆf h,h, the corresponding analogues of conditions A A are obtained extending f from Ω q Ω q to {x, y R q +q + : x, y } and assuming nh q,n hq,n. Then, it is possible to derive a directional-directional version of Theorem 6.. Corollary 6. CLT for the directional-directional ISE. Denote the ISE of ˆfh,h by I n = Ω q Ω q ˆf h,h x, y fx, y ω q y ω q x. Then, under the directional-directional analogues of conditions A A, i. n φh, h I n E I n d Z, if nφh, h h q hq, ii. nh q hq d I n E I n σz, if nφh, h h q hq, iii. nh q hq d I n E I n δ + σ Z, if nφh, h h q hq δ, where < δ < and φh, h = b q L q σxh + b q L q σyh + 8b q L b q L σ X,Y h q q h. { σ = Rf γ q λ q L r q ρ q L ρϕ q r, ρ dρ} dr γ q λ q L { r q ρ q L ρϕ q r, ρ dρ} dr, with σ X,Y = Cov tr H x f, X, Y, tr H y f, X, Y, σ X = Var tr H xf, X, Y and σ Y = Var tr H y f, X, Y. The same limit distributions hold in i iii if E I n is replaced by ωq λ q L E ˆfh,h x, y fx, y x ω q y+ λ q L Ω q Ω q λ q L λ q L nh q. hq 6. Testing independence with directional random variables Given a random sample X, Z,..., X n, Z n from a directional-linear variable X, Z, one may be interested in the assessment of independence between both components. If such a hypothesis is rejected, the joint kernel density estimator may give an idea of the dependence structure between them. Let denote by f X,Z the directional-linear density of X, Z and by f X and f Z the directional and linear marginal densities, respectively. In this setting, the null hypothesis of independence is stated as H : f X,Z x, z = f X xf Z z, x, z Ω q R and the alternative hypothesis as H : f X,Z x, z f X xf Z z, for some x, z Ω q R. A statistic to test H can be constructed considering the squared distance between the nonparametric estimator of joint density, denoted in this setting by ˆf X,Z;h,g, and the product of the corresponding marginal kernel estimators, denoted by ˆf X,h and ˆf Z,g. That is: T n = ˆfX,Z;h,g x, z ˆf X;h x ˆf Z;g z dz ωq dx.
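The statistic T_n can be approximated numerically once the joint and marginal estimators are available. The Python sketch below is an assumed implementation for the circular-linear case (q = 1, von Mises and normal kernels) that discretizes the integral on a grid; the permutation calibration shown afterwards mimics the permutation test recalled above rather than the asymptotic calibration derived in this section, and all function names are illustrative.

import numpy as np

def vm_weights(theta_grid, theta, h):
    kappa = 1.0 / h ** 2
    w = np.exp(kappa * (np.cos(theta_grid[:, None] - theta[None, :]) - 1.0))
    return w / (2.0 * np.pi * np.i0(kappa) * np.exp(-kappa))

def gauss_weights(z_grid, z, g):
    u = (z_grid[:, None] - z[None, :]) / g
    return np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * g)

def indep_statistic(theta, z, h, g, n_grid=128):
    tg = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    zg = np.linspace(z.min() - 3 * g, z.max() + 3 * g, n_grid)
    Wd, Wl = vm_weights(tg, theta, h), gauss_weights(zg, z, g)
    n = len(theta)
    f_joint = np.einsum('ki,li->kl', Wd, Wl) / n        # joint estimator
    f_dir, f_lin = Wd.mean(axis=1), Wl.mean(axis=1)     # marginal estimators
    diff2 = (f_joint - f_dir[:, None] * f_lin[None, :]) ** 2
    dtheta, dz = tg[1] - tg[0], zg[1] - zg[0]
    return diff2.sum() * dtheta * dz                    # Riemann approximation of T_n

def perm_pvalue(theta, z, h, g, B=500, seed=0):
    rng = np.random.default_rng(seed)
    t_obs = indep_statistic(theta, z, h, g)
    t_perm = np.array([indep_statistic(theta, rng.permutation(z), h, g)
                       for _ in range(B)])
    return (1 + np.sum(t_perm >= t_obs)) / (B + 1)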

156 6 Chapter 6. Central limit theorems for directional and linear random variables This kind of test was firstly introduced by Rosenblatt 975 and Rosenblatt and Wahlen 99 for real bivariate random variables, considering the same bandwidths for smoothing both components. The directional-linear context will require the following assumption on the degree of smoothness in each component. A. Assume that h q ng n c, with < c <, as n. Theorem 6. Directional-linear independence test. Under conditions A A and under the null hypothesis of independence, where nh q g Tn A n A n = λ ql λ q L RK nh q g d N, σ I, λ ql λ q L Rf Z nh q RKRf X, ng and σ I is defined as σ in Theorem 6. but with Rf = Rf X Rf Z. Since the leading term is the same as in Theorem 6. for nφh, gh q g, the asymptotic variance is also the same. As in the CLT for the ISE, the effect of both components can be clearly disentangled in the asymptotic variance and in the bias term. The a priori complex contribution of the directional part in Theorems 6. and 6. is explained for a particular scenario in the supplementary material, together with some numerical experiments for the illustration of Theorem Goodness-of-fit test with directional random variables Testing methods for a specific parametric directional-linear density simple H or for a parametric family composite H will be presented in this section Testing a simple null hypothesis Given a random sample {X i, Z i } n i= from an unknown directional-linear density f, the simple null hypothesis testing problem is stated as H : f = f θ, θ Θ, where f θ is a certain parametric density with known parameter θ belonging to the parameter space Θ R p, with p. The alternative hypothesis will be taken as H : fx, z f θ x, z, for some x, z Ω q R in a set of positive measure. The proposed test statistic is given by R n = ˆfh,g x, z LK h,g f θ x, z dz ωq dx, 6. where LK h,g f θ x, z represents the expected value of ˆfh,g x, z under H. In general, for a function f, this expected value is given by LK h,g fx, z = c h,ql x T y LK g h, z t fy, t dt ω q dy. 6. g Smoothing the parametric density was considered by Fan 99, in the linear setting, to avoid the bias effects in the integrand of the square error between the nonparametric estimator under the alternative and the parametric estimate under the null. A modification of the smoothing proposal by Fan 99 was used by Boente et al. for the directional case.
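Since LK_{h,g} f_theta in (6.4) is an expectation of kernel weights under f_theta, it can be approximated by Monte Carlo when no closed form is available. The sketch below (hypothetical helper names; q = 1 with von Mises and normal kernels assumed) averages the kernel weights over draws from the null density, supplied through a user-provided sampler rsample_theta0.

import numpy as np

def smoothed_parametric(theta_eval, z_eval, rsample_theta0, h, g,
                        n_mc=5000, seed=0):
    """Monte Carlo approximation of LK_{h,g} f_theta on a grid.

    rsample_theta0(m, rng) must return m draws (angles, zs) from f_theta.
    """
    rng = np.random.default_rng(seed)
    ang_mc, z_mc = rsample_theta0(n_mc, rng)
    kappa = 1.0 / h ** 2
    # directional kernel weight c_{h,q}(L) L((1 - x'X)/h^2) = von Mises density
    w_dir = np.exp(kappa * (np.cos(theta_eval[:, None] - ang_mc[None, :]) - 1.0))
    w_dir /= 2.0 * np.pi * np.i0(kappa) * np.exp(-kappa)
    # linear kernel weight K_g(z - Z)
    u = (z_eval[:, None] - z_mc[None, :]) / g
    w_lin = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * g)
    return np.einsum('ki,li->kl', w_dir, w_lin) / n_mc

# example null model: independent von Mises(0, 2) x N(0, 1)
def rsample_null(m, rng):
    return rng.vonmises(0.0, 2.0, m), rng.standard_normal(m)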

157 6.5. Goodness-of-fit test with directional random variables 7 Theorem 6.. Under conditions A A and under the simple null hypothesis H : f = f θ, with θ Θ known, nh q g R n λ ql λ q L RK d nh q N, σθ g, where σ θ follows from replacing f = f θ in σ from Theorem Composite null hypothesis Consider the testing problem H : f F Θ = {f θ : θ Θ}, where F Θ is a class of parametric densities indexed by the p-dimensional parameter θ, vs. H : f / F Θ. Under H which is equivalent to H : f = f θ, θ Θ, where θ is unknown, a parametric density estimator fˆθ can be obtained by Maximum Likelihood ML. The following conditions are required: A5. The function f θ is twice continuously differentiable with respect to θ and the derivatives are bounded and uniformly continuous for x, z. A6. There exists θ Θ such that ˆθ θ = O P n and if H : f = f θ holds for a certain θ Θ, then θ = θ. A5 is a regularity assumption on the parametric density, whereas A6 states that the estimation of the unknown parameter must be n-consistent in order to ensure that the effects of parametric estimation can be neglected. Note that the n-consistency is required both under H for Theorem 6. and H for Theorem 6.6, which is satisfied by the ML estimator. The test statistic is an adaptation of 6., but plugging-in the estimator of the unknown parameter θ under H in the test statistic expression: R n = ˆfh,g x, z LK h,g fˆθx, z dz ωq dx. 6.5 Theorem 6. Goodness-of-fit test for directional-linear densities. Under conditions A A, A5 A6 and under the composite null hypothesis H : f = f θ, with θ Θ unknown, nh q g R n λ ql λ q L RK d nh q N, σθ g. Families of local alternatives, also known as Pitman alternatives, are a common way to measure power for tests based on kernel smoothers e.g. Fan 99. For the directional-linear case, these alternatives can be written as H P : fx, z = f θ x, z + nh q g x, z, 6.6 where x, z : Ω q R R is such that x, z dz ω qdx =. A necessary condition to derive the limit distribution of R n under H P estimator for θ : is that the estimator ˆθ is a n-consistent A7. For the family of alternatives 6.6, it holds that ˆθ θ = O P n. Theorem 6.5 Local power under Pitman alternatives. Under conditions A A, A5 A7 and under the alternative hypothesis 6.6, nh q g R n λ ql λ q L RK d nh q N R, σθ g.

158 8 Chapter 6. Central limit theorems for directional and linear random variables 6.5. Calibration in practise In order to effectively calibrate the proposed test in practise, a parametric bootstrap procedure is investigated. The bootstrap statistic is defined as Rn = ˆf h,g x, z LK h,g fˆθ x, z dz ωq dx, where the superscript indicates that the estimators are computed from the bootstrap sample {X i, Z i }n i= obtained from the density fˆθ, with ˆθ computed from the original sample. The bootstrap procedure, considering the composite null hypothesis testing problem, is detailed in the following algorithm. Calibration for the simple null hypothesis test can be done replacing ˆθ and ˆθ by θ. Algorithm 6. Testing procedure. Let {X i, Z i } n i= be a random sample from f. To test H : f = f θ, with θ Θ unknown, proceed as follows: i. Obtain ˆθ, a n-consistent estimator of θ. ii. Compute R n = Ω ˆfh,g q R x, z LK h,g fˆθx, z dz ωq dx. iii. Bootstrap strategy. For b =,..., B: a Obtain a random sample {X i, Z i }n i= from fˆθ. b Compute ˆθ as in step i, from the bootstrap sample in a. c Compute Rn b = ˆf h,g x, z LK h,g fˆθ x, z dz ωq dx, where ˆf h,g from the bootstrap sample in a. is obtained iv. Approximate the p-value of the test as p-value # { R b n R n } /B. The consistency of this testing procedure is proved in the next result, using the bootstrap analogue of assumption A6. A8. ˆθ ˆθ = O P n, where P represents the probability of X, Z conditioned on the sample {X i, Z i } n i=. Theorem 6.6 Bootstrap consistency. Under conditions A A, A5 A6 and A8, for a certain θ Θ, nh q g Rn λ ql λ q L RK d nh q N, σθ g in probability. Then, the probability distribution function pdf of R n conditioned on the sample converges in probability to a Gaussian pdf, regardless H holds or not. The asymptotic distribution coincides with the one of R n if H holds θ = θ Extensions to directional-directional models The directional-directional versions of the previous results follow under analogous assumptions modifying A5, 6. and 6.6 accordingly. The directional-directional test statistic for the composite hypothesis testing problem is R n = ˆfh,h x, y L L,h,h fˆθx, y ωq dy ω q dx. Ω q Ω q
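A schematic implementation of the parametric bootstrap calibration of Algorithm 6.1 is sketched below in Python (not the authors' code). The callables fit_mle, rsample and gof_statistic are placeholders for the model-specific ingredients: a root-n consistent estimator of theta, a sampler from f_theta, and the statistic R_n of (6.5) for given bandwidths; the sample is assumed to be a pair of arrays (angles, linear observations).

import numpy as np

def bootstrap_gof_test(sample, fit_mle, rsample, gof_statistic, h, g,
                       B=1000, seed=0):
    rng = np.random.default_rng(seed)
    theta_hat = fit_mle(sample)                        # step (i): estimate theta
    r_obs = gof_statistic(sample, theta_hat, h, g)     # step (ii): observed R_n
    n = len(sample[0])
    r_boot = np.empty(B)
    for b in range(B):                                 # step (iii): bootstrap loop
        boot_sample = rsample(theta_hat, n, rng)           # (a) resample from f_theta_hat
        theta_boot = fit_mle(boot_sample)                  # (b) re-estimate theta
        r_boot[b] = gof_statistic(boot_sample, theta_boot, h, g)  # (c) bootstrap R_n
    p_value = np.mean(r_boot >= r_obs)                 # step (iv): p-value
    return r_obs, p_value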

159 6.6. Simulation study 9 Corollary 6. Goodness-of-fit test for directional-directional densities. Under the directionaldirectional analogues of conditions A A, A5 A6 and under the composite null hypothesis H : f = f θ, with θ Θ unknown, nh q hq R n λ q L λ q L λ q L λ q L nh q hq d N, σθ. 6.6 Simulation study The finite sample performance of the directional-linear and directional-directional goodness-offit tests is illustrated in this section for a variety of models, sample sizes and bandwidth choices. The study considers circular-linear and circular-circular scenarios, although these tests can be easily applied in higher dimensions, such as spherical-linear or spherical-circular, due to their general definition and resampling procedure. Details on simulated models and further results are included as supplementary material. Circular-Linear CL and Circular-Circular CC parametric scenarios are considered. Figures 6. and 6. show the density contours in the cylinder CL and in the torus CC for the different models. The detailed description of each model is given in the supplementary material. Deviations from the composite null hypothesis H : f F Θ are obtained by mixing the true density f θ with a density such that the resulting density does not belong to F Θ : H δ : f = δf θ +δ, δ. The goodness-of-fit tests are applied using the bootstrap strategy, for the whole collection of models, sample sizes n =, 5, and deviations δ =,.,.5 δ = for the null hypothesis. The number of bootstrap and Monte Carlo replicates is. In each case model, sample size and deviation, the performance of the goodness-of-fit test is shown for a fixed pair of bandwidths, obtained from the median of simulated Likelihood Cross Validation LCV bandwidths: h, g LCV = arg max ni= h,g> log fh,g i X i, Z i, h, h LCV = arg max ni= h,h > log fh i,h X i, Y i, 6.7 where f i... denotes the kernel estimator computed without the i-th datum. A deeper insight on the bandwidth effect is provided for some scenarios, where percentage of rejections are plotted for a grid of bandwidths see Figure 6. for two cases, and supplementary material for extended results. The kernels considered are the von Mises and the normal ones. Table 6. collects the results of the simulation study for each combination of model CL or CC, deviation δ and sample size n. When the null hypothesis holds, significance levels are correctly attained for α =.5 see supplementary material for α =.,., for all sample sizes, models and deviations. When the null hypothesis does not hold, the tests perform satisfactorily, having in both cases a quick detection of the alternative when only a % and a 5% of the data come from a density out of the parametric family. As expected, the rejection rates grow as the sample size and the deviation from the alternative do.
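The likelihood cross-validation bandwidths in (6.7) can be computed by a direct grid search over the leave-one-out log-likelihood. A minimal Python sketch for the circular-linear case (q = 1, von Mises and normal kernels; function names are illustrative) follows.

import numpy as np

def loo_log_likelihood(theta, z, h, g):
    """Leave-one-out log-likelihood of the circular-linear kernel estimator."""
    n = len(theta)
    kappa = 1.0 / h ** 2
    w_dir = np.exp(kappa * (np.cos(theta[:, None] - theta[None, :]) - 1.0))
    w_dir /= 2.0 * np.pi * np.i0(kappa) * np.exp(-kappa)
    u = (z[:, None] - z[None, :]) / g
    w_lin = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * g)
    w = w_dir * w_lin
    np.fill_diagonal(w, 0.0)                        # remove the i-th datum
    f_loo = w.sum(axis=1) / (n - 1)                 # leave-one-out density at (X_i, Z_i)
    return np.sum(np.log(np.maximum(f_loo, 1e-300)))

def lcv_bandwidths(theta, z, h_grid, g_grid):
    scores = [(loo_log_likelihood(theta, z, h, g), h, g)
              for h in h_grid for g in g_grid]
    _, h_best, g_best = max(scores)
    return h_best, g_best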

Figure 6.1: Density models for the simulation study in the circular-linear case. From left to right and top to bottom, models CL1 to CL12.

Figure 6.2: Density models for the simulation study in the circular-circular case. From left to right and top to bottom, models CC1 to CC12.

Table 6.1: Empirical size and power of the circular-linear and circular-circular goodness-of-fit tests for models CL1-CL12 and CC1-CC12, respectively, with significance level α = 0.05 and different sample sizes n and deviations δ.

Figure 6.3: Empirical size and power of the circular-linear (left, model CL) and circular-circular (right, model CC8) goodness-of-fit tests for a logarithmically spaced grid of bandwidths. The lower surface represents the empirical rejection rate under H_0 and the upper surface under H_δ for the largest deviation δ considered. Green colour indicates that the percentage of rejections is within the 95% confidence interval for α = 0.05, blue that it is smaller and orange that it is larger. Black points represent the empirical size and power obtained with the median of the LCV bandwidths.

Finally, the effect of the bandwidths is explored in Figure 6.3. For models CL and CC8, the empirical size and power of the tests are computed on a bivariate grid of bandwidths, for a fixed sample size n, under the null hypothesis (δ = 0, lower surface) and under the largest deviation considered (upper surface). As can be seen, the tests are correctly calibrated regardless of the value of the bandwidths. However, the power is notably affected by the choice of the bandwidths, with different behaviours depending on the model and the alternative. Reasonable choices of the bandwidths, such as the median of the LCV bandwidths (6.7), present a competitive power. Further results supporting the same conclusions are available in the supplementary material.

6.7 Real data application

The proposed goodness-of-fit tests are applied to study two real datasets (see the supplementary material for further details). The first dataset comes from forestry and contains orientations and log-burnt areas of 687 wildfires that occurred in Portugal between 1985 and 2005. The data were aggregated into watersheds, giving observations of the circular mean orientation and mean log-burnt area for each watershed (a circular-linear example). Further details on the data acquisition procedure, the measurement of fire orientation and the watershed delimitation can be found in Barros et al. (2012) and García-Portugués et al. (2014). The model proposed by Mardia and Sutton (1978) was tested for this dataset (Figure 6.4, left) using the LCV bandwidths and B bootstrap replicates, resulting in a p-value of .56 and showing no evidence against the null hypothesis.

Figure 6.4: Left: parametric fit of the model from Mardia and Sutton (1978) to the circular mean orientation and mean log-burnt area of the fires in each of the watersheds of Portugal. Right: parametric fit of the model from Fernández-Durán (2007) for the dihedral angles of the alanine-alanine-alanine segments.

The second dataset contains pairs of dihedral angles of segments of the type alanine-alanine-alanine in alanine amino acids in 9 proteins. The dataset, formed by pairs of angles (a circular-circular example), was studied by Fernández-Durán (2007) using Nonnegative Trigonometric Sums (NNTS) for the marginal and link functions of the model of Wehrly and Johnson (1979). The best model in terms of BIC described in Fernández-Durán (2007) was implemented using a two-step Maximum Likelihood Estimation (MLE) procedure and the tools of the CircNNTSR package (Fernández-Durán and Gregorio-Domínguez) for fitting the NNTS parametric densities (Figure 6.4, right). The resulting p-value with the LCV bandwidths indicates that the dependence model of Wehrly and Johnson (1979) is not flexible enough to capture the

163 6.A. Sketches of the main proofs dependence structure between the two angles. The reason for this lack of fit may be explained by a poor fit in a secondary cluster of data around Ψ = 9, as it can be seen in the contour plot shown in Figure 6.. Acknowledgements This research has been supported by project MTM8 from the Spanish Ministry of Science and StuDyS network, from the Interuniversity Attraction Poles Programme IAP network P7/6, Belgian Science Policy Office. First author s work has been supported by FPU grant AP 957 from the Spanish Ministry of Education. Authors acknowledge the computational resources used at the SVG cluster of the CESGA Supercomputing Center. The editors and two anonymous referees are acknowledged for their contributions. 6.A Sketches of the main proofs This section contains the sketches of the main proofs. Proofs for technical lemmas, complete numerical experiments and simulation results and further details on data analysis are given in supplement. 6.A. CLT for the integrated squared error Proof of Theorem 6.. The ISE can be decomposed into four addends, I n = I n, +I n, +I n, +I n, : I n, = c h,ql ng I n, = c h,ql n g I n, = c h,ql n g I n, = n i= n i= i<j n E ˆfh,g x, z where LK n x, z, y, t = LK x T y h LK n x, z, X i, Z i E ˆfh,g x, z LK n x, z, X i, Z i dz ω q dx, fx, z dz ω q dx, LK n x, z, X i, Z i LK n x, z, X j, Z j dz ω q dx, fx, z dz ωq dx,, z t g E LK x T X h., z Z g Except for the fourth term, which is deterministic, the CLT for the ISE is derived by examining the asymptotic behaviour of each addend. The first two ones can be written as I n, = n i= I i n, and I n, = c h,ql ni= I i n g n,, where Ii n, and Ii n, can be directly extracted from the previous expressions. Then, by Lemma A., and by Lemma A., n φh, g In, I n, = λ ql λ q L RK nh q g d N, O P n h q g. 6.9

164 Chapter 6. Central limit theorems for directional and linear random variables The third term can be written as follows: I n, = c h,ql n g i<j n H n X i, Z i, X j, Z j = c h,ql n g U n, 6. where U n is an U-statistic with kernel function H n given in Lemma A.. U n is degenerate since E LK n x, z, X, Z =. In order to properly apply Lemma A. for obtaining the asymptotic distribution of U n in 6., Lemma A. provides the explicit expressions for the required elements. Then, considering ϕ n in Lemma A., condition A n Bn is satisfied by assumption A and as a consequence B d n U n N,. Since the variance of In, is Var I n, = c h,ql n g Var U n = σ n h q + o, 6. g by Slutsky s theorem, results 6. and 6., n h q g In, From 6.8, 6.9 and 6., it follows that: d N, σ. 6. I n E I n = n φh, g Nn, + O P n h q g + σn h q g Nn,, 6. d where N n,, N n, N,. By A, n h q g q = o nh g and the second addend I n, is asymptotically negligible compared with I n,. In order to determine the dominant term between I n, and I n,, the squared quotient between the coefficients of both of them is examined, being of order nφh, gh q g. Then if nφh, gh q g the last term on 6. is asymptotically negligible in comparison with the first, while if nφh, gh q g, the first term is negligible in comparison with the last. Note also that by 6.9, expression 6. can also be stated as I n E ˆfh,g x, z fx, z dz ωq dx + λ ql λ q L RK nh q g = n φh, g Nn, + O P n h q g + σn h q g Nn,. The case where nφh, gh q g δ, < δ <, needs a special treatment because none of the terms can be neglected. In this case, I n E I n = n φh, g Nn, + σn h q g Nn, + O P n h q g = n h q g δ Nn, + σnn, + O P n h q g. In order to apply Lemma A., set Ũn = I n, + I n, with n Ũ n = ϕ n X i, Z i + i= i<j n H n X i, Z i, X j, Z j, where ϕ n X, Z = I n,, H n x, z, y, t = c h,ql H n g n x, z, y, t and G n x, z, y, t = E H n X, Z, x, z H n X, Z, y, t.

165 6.A. Sketches of the main proofs 5 By Lemma A. and the definitions of H n, G n, ϕ n and M n, E H n X, Z, X, Z = n h q g σ + o, E H n X, Z, X, Z = O n 8 h q g, E G n X, Z, X, Z = O n 8 h q g, E ϕ nx, Z = n φh, g + o, E ϕ nx, Z = O n h 8 + g 8, E MnX, Z = O n 6 h + g h q g. Applying these orders and using that nφh, gh q g δ, it follows A n B n = O n + O nh q g h q g + O nh q g + O h q g. Then, by A, the four previous orders tend to zero and therefore B n Ũ n N,, where B n n φh, g + n h q g σ n h q g δ + σ. Finally, n h q g δ + σ d I n, + I n, N, by Slutsky s theorem. Proof of Corollary 6.. As g = βh, for a fixed β >, then nφh, gh q g = O nh q+5 and the cases in Theorem 6. are given by the asymptotic behaviour of this sequence. When nh q+5 and nh q+5, the result is obtained immediately, whereas for nh q+5 δ, < δ < Lemma A. gives B n φ, βn h + σ n h q+ n q+9 q+5 φ, βδ q+5 + σ δ q+ q+5. Therefore, n q+9 q+5 φ, βδ q+5 + σ δ q+ q+5 I n, + I n, d N,. Proof of Corollary 6.. The proof follows from an adaptation of the proof of Theorem 6. to the directional-directional context. 6.A. Testing independence with directional data Proof of Theorem 6.. The test statistic is decomposed as T n = T n, + T n, + T n, taking into account that, under independence, E ˆfh,g x, z = E ˆfh x E ˆfg z : T n, = ˆfh,g x, z E ˆfh,g x, z dz ωq dx, T n, = T n, = ˆfh x ˆf g z E ˆfh x E ˆfg z dz ωq dx, ˆfh,g x, z E ˆfh,g x, z ˆfh x ˆf g z E ˆfh x E ˆfg z dz ω q dx. By Chebychev s inequality and Lemmas A.6 and A.7, the sum of the second and third addends is E T n, + O P n h q + g. Considering the test statistic decomposition and using Lemma A.5 yields T n = T n, E T n, + O P n h q + g d

166 6 Chapter 6. Central limit theorems for directional and linear random variables = λ ql λ q L RK nh q g RKRf X ng + σn h q g Nn λ ql λ q L Rf Z nh q, + o n h q + g + O P n h q + g. Now, O P n h q + g is negligible in comparison with the second addend by condition A and the deterministic order o n h q + g is also negligible by A and A. Therefore, nh q g d T n A n N, σi. 6.A. Goodness-of-fit test for models with directional data Proof of Theorem 6.. Under H : f = f θ, the test statistic R n = I n, + I n,, where I n, and I n, are given by 6.9 and 6. in the proof of Theorem 6., so nh q g I n, + I n, λ ql λ q L RK nh q g d N, σθ. 6. Proof of Theorem 6.. The test statistic is decomposed as R n = R n, + I n, + I n, + R n, by adding and subtracting E ˆfh,g x, z = LK h,g fx, z, with R n, = R n, = ˆfh,g x, z LK h,g fx, z LK h,g fx, z fˆθx, z dz ω q dx, LKh,g fx, z fˆθx, z dz ω q dx. The limit of I n, + I n, is given by 6. whereas by Lemma A.8 R n, and R n, are negligible in probability. Then, the limit distribution of R n is determined by I n, + I n,. Proof of Theorem 6.5. As in the proof of Theorem 6., R n = R n, + I n, + I n, + R n,, where I n, + I n, behaves as 6.. The asymptotic variance remains σθ since Rf = Rf θ + R Ω + fx, z x, z dz ω q R qdx nh q g n h q g and then the second and third addends are negligible with respect to the first one by assumption A, remaining the same asymptotic variance. The terms R n, = R n, + R n, and R n, = R n, + R n, + R n, are decomposed as follows: R n, = ˆfh,g x, z LK h,g fx, z LK h,g x, z dz ω q dx, nh q g Ω q R R n, = LK nh q g h,g x, z dz ω q dx, R n, = LK h,g fx, z fˆθx, z LK h,g x, z dz ω q dx. nh q g Ω q R The remaining terms follow from Lemma A.9.

167 References 7 Proof of Theorem 6.6. Similarly to the proof of Theorem 6., R n = R n, + I n, + I n, + R n,, where the terms involved are the bootstrap versions of the ones defined in the aforementioned proof: Rn, = ˆf h,g x, z LK h,g z fˆθx, LK h,g fˆθx, z fˆθ x, z dz ω q dx, In, = c h,ql n n g LKn x, z, X i, Zi dz ω q dx, i= In, = c h,ql n g LKn x, z, X i, Zi LKnx, z, X j, Zj dz ω q dx, i<j n Rn, = LK h,g fˆθx, z z fˆθ x, dz ωq dx, with LKnx, z, y, t = LK x T y, z t h g E LK x T X, z Z h g and where E represents the expectation with respect to fˆθ, which is obtained from the original sample. Using the same arguments as in Lemma A.8, but replacing assumption A6 by A8, it follows that nh q g Rn, and nhq g Rn, converge to zero conditionally on the sample, that is, in probability P. On the other hand, the terms In, and I n, follow from considering similar arguments to the ones used for deriving 6.9 and 6., but conditionally on the sample. Specifically, it follows that In, = λql λ ql RK nh q g +O P n h q g and, for a certain θ Θ, nh q g In, d N, σθ. The main difference with the proof of Theorem 6. concerns the asymptotic variance given by nh q g In, : Var nh q g In, p σθ, since by assumption A5, Rfˆθ = Rf θ + O P n. Hence, nh q g R n λ ql λ q L RK nh q g and the bootstrap consistency follows. = o P + O P nh q g + σθ N n + o P Proof of Corollary 6.. The proof follows by adapting the proof of Theorem 6.. References Bai, Z. D., Rao, C. R., and Zhao, L. C Kernel estimators of density function of directional data. J. Multivariate Anal., 7: 9. Barros, A. M. G., Pereira, J. M. C., and Lund, U. J.. Identifying geographical patterns of wildfire orientation: a watershed-based analysis. Forest. Ecol. Manag., 6:98 7. Bickel, P. J. and Rosenblatt, M. 97. On some global measures of the deviations of density function estimates. Ann. Statist., 6:7 95. Boente, G., Rodríguez, D., and González-Manteiga, W.. Goodness-of-fit test for directional data. Scand. J. Stat., :59 75.

168 8 Chapter 6. Central limit theorems for directional and linear random variables Fan, Y. 99. Testing the goodness of fit of a parametric density function by kernel method. Economet. Theor., :6 56. Fernández-Durán, J. J. 7. Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums. Biometrics, 6: Fernández-Durán, J. J. and Gregorio-Domínguez, M. M.. CircNNTSR: an R package for the statistical analysis of circular data using NonNegative Trigonometric Sums NNTS models. R package version.. Fisher, N. I. and Lee, A. J. 98. Biometrika, 68: Nonparametric measures of angular-linear association. García-Portugués, E.. Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electron. J. Stat., 7: García-Portugués, E., Barros, A. M. G., Crujeiras, R. M., González-Manteiga, W., and Pereira, J.. A test for directional-linear independence, with applications to wildfire orientation and size. Stoch. Environ. Res. Risk Assess., 85:6 75. García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. a. Exploring wind direction and SO concentration by circular-linear density estimation. Stoch. Environ. Res. Risk Assess., 75: García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. b. Kernel density estimation for directional-linear data. J. Multivariate Anal., :5 75. González-Manteiga, W. and Crujeiras, R. M.. An updated review of goodness-of-fit tests for regression models. Test, :6. Hall, P. 98. Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivariate Anal., : 6. Hall, P., Watson, G. S., and Cabrera, J Kernel density estimation with spherical data. Biometrika, 7: Johnson, R. A. and Wehrly, T. E Some angular-linear distributions and related regression models. J. Amer. Statist. Assoc., 76:6 66. Klemelä, J.. Estimation of densities and derivatives of densities with directional data. J. Multivariate Anal., 7:8. Mardia, K. V Linear-circular correlation coefficients and rhythmometry. Biometrika, 6: 5. Mardia, K. V. and Jupp, P. E.. Directional statistics. Wiley Series in Probability and Statistics. John Wiley & Sons, Chichester, second edition. Mardia, K. V. and Sutton, T. W A model for cylindrical variables with applications. J. Roy. Statist. Soc. Ser. B, :9.

169 References 9 Oliveira, M., Crujeiras, R. M., and Rodríguez-Casal, A.. A plug-in rule for bandwidth selection in circular density estimation. Comput. Statist. Data Anal., 56: Rosenblatt, M A quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann. Statist., :. Rosenblatt, M. and Wahlen, B. E. 99. A nonparametric measure of independence under a hypothesis of independent components. Statist. Probab. Lett., 5:5 5. Scott, D. W. 99. Multivariate density estimation. Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics. John Wiley & Sons, New York. Silverman, B. W Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London. Srivastava, A. N. and Sahami, M., editors 9. Text mining: classification, clustering, and applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series. CRC Press, Boca Raton. Taylor, C. C. 8. Automatic bandwidth selection for circular density estimation. Comput. Statist. Data Anal., 57:9 5. Wand, M. P. and Jones, M. C Kernel smoothing, volume 6 of Monographs on Statistics and Applied Probability. Chapman & Hall, London. Watson, G. S. 98. Statistics on spheres, volume 6 of University of Arkansas Lecture Notes in the Mathematical Sciences. John Wiley & Sons, New York. Wehrly, T. E. and Johnson, R. A Bivariate models for dependence of angular observations and a related Markov process. Biometrika, 67: Zhao, L. and Wu, C.. Central limit theorem for integrated square error of kernel estimators of spherical density. Sci. China Ser. A, :7 8.


Chapter 7

Testing parametric models in linear-directional regression

Abstract

This paper presents a goodness-of-fit test for parametric regression models with scalar response and directional predictor, that is, vectors in a sphere of arbitrary dimension. The testing procedure is based on the weighted squared distance between a smooth and a parametric regression estimator, where the smooth regression estimator is obtained by a projected local approach. Asymptotic behavior of the test statistic under the null hypothesis and local alternatives is provided, jointly with a consistent bootstrap algorithm for application in practice. A simulation study illustrates the performance of the test in finite samples. The procedure is also applied to a real data example from text mining.

Reference

García-Portugués, E., Van Keilegom, I., Crujeiras, R. M. and González-Manteiga, W. Testing parametric models in linear-directional regression. arXiv preprint.

Contents

7.1 Introduction
7.2 Nonparametric linear-directional regression
7.2.1 The projected local estimator
7.2.2 Properties
7.3 Goodness-of-fit test for linear-directional regression
7.3.1 Bootstrap calibration
7.4 Simulation study
7.5 Application to text mining
7.A Main results
7.A.1 Projected local estimator properties
7.A.2 Asymptotic results for the goodness-of-fit test
References

7.1 Introduction

Directional data (data on a general sphere of dimension q) appear in a variety of contexts, the simplest one being provided by observations of angles on a circle (circular data), for instance from wind directions or animal orientation (Mardia and Jupp, 2000). Star positions can be seen as data on a two-dimensional sphere and, quite recently, directional data in higher dimensions have been considered in text mining (Srivastava and Sahami, 2009). In order to identify a statistical pattern within a certain collection of texts, these objects may be represented by a vector on a sphere where each vector component gives the relative frequency of a certain word. For instance, from this vector-space representation, text classification (see Banerjee et al. (2005)) can be performed, but other interesting problems such as popularity prediction can also be tackled. Such a characterization can be done, for instance, for articles in news aggregators, where the popularity of web news can be quantified by the number of comments or views (see Tatar et al.). In addition, in order to model or predict the popularity of a certain web entry based on its contents in vector-space form, a linear-directional regression model can be used.

When dealing with directional and linear variables at the same time, the joint behavior can be modeled by considering a flexible density estimator, such as the one proposed by García-Portugués et al. (2013). Nevertheless, a regression approach may be more useful, allowing at the same time for explaining the relation between the variables and for making predictions. Nonparametric regression estimation methods for linear-directional models have been proposed by different authors. For instance, Cheng and Wu (2013) introduced a general local linear regression method on manifolds and, quite recently, Di Marzio et al. (2014) presented a local polynomial method for the regression function when both the predictor and the response are defined on spheres. Despite the fact that these methods provide flexible estimators which may capture the regression shape, purely parametric models may be more convenient in terms of interpretation of the results. In this context, goodness-of-fit testing methods can be designed, providing a tool for assessing a certain parametric linear-directional regression model. To the best of the authors' knowledge, such a problem has not been considered in the statistical literature, with the exception of Deschepper et al. (2008), who propose an exploratory tool and a lack-of-fit test for circular-linear regression.

A goodness-of-fit test for parametric regression models will be presented in this paper. The test is based on a squared distance between the parametric fit and a nonparametric one. Specifically, a modified local linear estimator, similar to the proposal of Di Marzio et al. (2014), will be introduced for this purpose. Theoretical properties of the test statistic will be studied, and its effectiveness in practice will be confirmed by simulation results.

The paper is organized as follows. Basic notation is introduced in Section 7.2, where the projected local regression estimator is also analyzed. Section 7.3 includes the main results, regarding the asymptotic behavior of the test statistic. A consistent bootstrap strategy is also presented. The performance of the proposed method is assessed for finite samples in a simulation study, provided in Section 7.4. Section 7.5 shows a real data application on news popularity prediction.
An appendix contains the proofs of the main results. Proofs of technical lemmas and further information on the simulation study, jointly with more simulation results, are provided as supplementary material.

173 7.. Nonparametric linear-directional regression 5 7. Nonparametric linear-directional regression Some basic concepts in nonparametric directional density and linear-directional regression estimation will be provided in this section. Basic notation will be introduced, jointly with some motivation for the regression estimator proposal that will be used in the testing procedure. Let Ω q = { x R q+ : x + + x q+ = } denote the q-sphere in R q+, with associated Lebesgue measure denoted by ω q when there is no possible confusion, the surface area of Ω q will be denoted by ω q = π q+ / Γ q+, q. A directional density f on Ωq satisfies Ω q fx ω q dx =. Consider a random sample X,..., X n in Ω q, from a directional random variable X with density f. A kernel density estimator for f was introduced by Hall et al. 987 and Bai et al For a given point x Ω q, the kernel density estimator is defined as ˆf h x = n L h x, X i, n i= x T y L h x, y = c h,q LL, 7. where L is a directional kernel, h > is the bandwidth parameter and the normalizing constant c h,q L is given by c h,q L = λ h,q Lh q = λ q Lh q + o 7. with λ h,q L = ω q h Lrr q rh q dr and λ q L = q ω q Lrr q dr. In many practical situations, the interest lies in the analysis of the directional variable X jointly with a real random variable Y. The joint behavior of both variables X, Y may be studied by a density approach, as in García-Portugués et al.. However, a regression approach may be more suitable in some situations. Assume that the directional random variable X with density f may be the covariate in the following regression model Y = mx + σxε, 7. where Y is a scalar random response variable, m is the regression function given by the conditional mean mx = E Y X = x, and σ is the conditional variance σ x = Var Y X = x. Errors are collected by ε, a random variable such that E ε X =, E ε X = and E ε X and E ε X are bounded random variables. The regression and density functions m, f : Ω q R can be extended from Ω q to R q+ \ {} by considering a radial projection see Zhao and Wu for the density case. This allows for the consideration of derivatives of these functions and the use of Taylor expansions. A. m and f are extended from Ω q to R q+ \ {} by m x m x/ x and f x f x/ x. m is three times and f is twice continuously differentiable and f is bounded away from zero. The continuity up to the second derivatives of f and up to the third derivatives of m, together with the q-spherical compact support, guarantees that these functions are in fact uniformly bounded. As a consequence of the radial extension, the directional derivative of m in the direction x and evaluated at x is zero, that is, x T mx =. h

174 5 Chapter 7. Testing parametric models in linear-directional regression Consider, from now on, that a random sample from model 7., namely X, Y,..., X n, Y n independent and identically distributed iid vectors in Ω q R, is available. Given two points x, X i Ω q, under assumption A, the one-term Taylor expansion of m at X i, conditionally on X,..., X n, can be written as: mx i = mx + mx T X i x + O X i x = mx + mx T I q+ xx T X i x + O X i x = mx + mx T B x B T x X i x + O X i x β + β T B T x X i x, with B x = b,..., b q q+ q the projection matrix. For a given x Ω q, let {b,..., b q } be a collection of q resulting vectors that complete x to an orthonormal basis {x, b,..., b q } of R q+ given, for example, by Gram-Schmidt. The projection matrix B x = b,..., b q is a q + q semiorthogonal matrix, i.e. B T x B x = I q, with I q the identity matrix of dimension q. By the spectral decomposition theorem, B x B T x = q i= b ib T i = I q+ xx T. With this setting, the first coefficient β R captures the constant effect in mx while the second one, β R q, contains the linear effects of the projected gradient of m given by B T x mx. It should be noted that β has dimension q an appropriate dimension in the q-sphere Ω q, instead of having dimension q +, the one that will arise from an usual Taylor expansion in R q+. The previous Taylor expansion provides the motivation for the projected local estimator of the regression function m at x Ω q that will be introduced and analyzed in the next section. 7.. The projected local estimator As in the previous section, consider X, Y,..., X n, Y n a random sample from model 7. and recall the expansion of the regression function derived under assumption A: mx i β + β T B T x X i x. The projected local estimator proposed in this work is obtained as a local fit by weighting the constants β or the hyperplanes β + β T B T x X i x according to the influence of X i over x. Both situations can be formulated together as a weighted least squares problem: min β R q+ n i= Y i β δ p, β,..., β q T B T x X i x Lh x, X i, 7. where δ p,q is the Kronecker Delta and is used to control both the local constant p = and local linear p = fits and L h are the directional kernels, defined as in 7.. The solution to the minimization problem 7. is given by ˆβ = X T x,pw x X x,p X T x,p W x Y, 7.5 where Y is the vector of observed responses, W x is the weight matrix and X x,p is the design matrix. Specifically: Y X x T B x Y =., W x = diag L h x, X,..., L h x, X n, X x, =.. Y n X n x T B x

175 7.. Nonparametric linear-directional regression 55 and X x, = n, with d denoting a column vector of length d with all entries equal to one and whose dimension will be omitted and determined by the context. The projected local constant or linear estimator at x is given by the estimated coefficient ˆβ = ˆm h,p x and is a weighted linear combination of the responses: ˆm h,p x = ˆβ = e T where W p n x, X i = e T X T x,pw x X x,p X T x,p W x Y = n Wn p x, X i Y i, 7.6 i= X T x,pw x X x,p X T x,p W x e i e i denotes a unit canonical vector. It should be noted that, for p = local constant, Wn L x, X i = h x,x i n j= L. This corresponds hx,x j to the Nadaraya-Watson estimator for directional predictor and scalar response, introduced by Wang et al.. Remark 7.. Di Marzio et al. have recently proposed a local linear estimator for model 7., but from a different approach. Specifically, these authors consider another alternative for developing the Taylor expansions of m based on the tangent-normal decomposition: X i = x cos η i + ξ i sin η i, with η i, π and ξ i Ω q satisfying that ξ T i x =. This approach leads to an overparametrized design matrix of q + columns which makes X T x,pw x X x,p exactly singular. This problem can be avoided in practice by computing a pseudo-inverse. Although both approaches come from different motivations, it can be seen that their asymptotic behavior is the same conditional bias, variance and asymptotic normality. This can be checked by considering the different parametrization used in Di Marzio et al., where the smoothing parameter is κ /h. It should be also noted that, for the circular case q =, the projected local estimator corresponds to the proposal by Di Marzio et al. 9. See supplement for a detailed discussion of particular cases. 7.. Properties Asymptotic bias and variance for the estimator 7.5, jointly with its asymptotic normality, will be derived in this section. Some further assumptions will be required: A. σ is uniformly continuous and bounded away from zero. A. The directional kernel L is a bounded function L :,, with exponential decay: Lr Me αr, r,, with M, α >. A. The sequence of bandwidths h = h n is positive and satisfies h and nh q. Conditions A and A are the usual ones for the multivariate local linear estimator see Ruppert and Wand 99. Condition A allows for the use of non-compactly supported kernels, such as the von Mises kernel Lr = e r, and implies condition A in García-Portugués et al.. Theorem 7. Conditional bias and variance. Under assumptions A A, the conditional bias and variance for the projected local estimator with p =, are given by E ˆm h,p x mx X,..., X n = b ql B p xh + o P h h + δ p, O P, q nh q Var ˆm h,p x X,..., X n = λ ql λ q L nh q σ x + o P nh q, fx

Remark 7.2. As happens in the Euclidean setting, the conditional bias is reduced from the local constant fit to the local linear one, whereas the variance remains the same for both estimators. The expressions and residual terms obtained in this setting agree with their Euclidean analogues (see Fan et al. (1996)), with the role of the second moment of the kernel played by b_q(L)/q and that of the integral of the squared kernel by lambda_q(L^2)/lambda_q(L)^2.

From Theorem 7.1, an equivalent kernel expression can be obtained. Such a result allows for an explicit form of the weights W_n^p(x, X_i), resulting in an estimator asymptotically equivalent in probability to (7.6). This formulation, for p = 1, provides a simpler form for the weighting kernel than the one given in (7.6). In addition, the asymptotic expression only depends on the datum (X_i, Y_i) and not on the whole sample. This feature will be crucial for developing the goodness-of-fit test in Section 7.3 and the asymptotic normality of the estimators, collected in the next results.

Corollary 7.1 (Equivalent kernel). Under assumptions A1-A4, the projected local estimator hat m_{h,p}(x) = sum_{i=1}^n W_n^p(x, X_i) Y_i, for p = 0, 1, satisfies uniformly in x in Omega_q

hat m_{h,p}(x) = ( sum_{i=1}^n tilde L_h(x, X_i) Y_i ) (1 + o_P(1)),   tilde L_h(x, X_i) = L((1 - x^T X_i)/h^2) / (n h^q lambda_q(L) f(x)).

Note that the equivalent kernel is the same for p = 0 and p = 1, as happens in the Euclidean case when estimating the regression function (see Fan and Gijbels (1996), with nu = 0 and p = 0, 1).

Theorem 7.2 (Asymptotic normality). Under assumptions A1-A4 and for p = 0, 1, for every fixed point x in Omega_q such that E[|Y - m(x)|^{2+delta} | X = x] < infinity for some delta > 0,

sqrt(n h^q) ( hat m_{h,p}(x) - m(x) - (b_q(L)/q) B_p(x) h^2 ) ->_d N(0, (lambda_q(L^2)/lambda_q(L)^2) sigma^2(x)/f(x)).

7.3 Goodness-of-fit test for linear-directional regression

In this section, a test statistic for assessing whether the regression function m belongs to a class of parametric functions M_Theta = {m_theta : theta in Theta subset of R^s} is introduced. Assuming that model (7.1) holds, the goal is to test the null hypothesis

H_0: m(x) = m_{theta_0}(x) for all x in Omega_q,   versus   H_1: m(x) != m_{theta_0}(x) for some x in Omega_q,

with theta_0 in Theta known (simple hypothesis) or unknown (composite hypothesis), and where the statement "for all" holds except for a set of probability zero and "for some" holds for a set of positive probability. The proposed statistic for testing H_0 compares the projected local estimator introduced in Subsection 7.2.1 with a parametric estimator in M_Theta through a squared weighted norm:

T_n = integral_{Omega_q} ( hat m_{h,p}(x) - L_{h,p} m_{hat theta}(x) )^2 hat f_h(x) w(x) omega_q(dx),

where L_{h,p} m(x) = sum_{i=1}^n W_n^p(x, X_i) m(X_i) represents the local smoothing of the function m from the measurements {X_i}_{i=1}^n, and hat theta denotes either the known parameter theta_0 (simple null hypothesis) or a consistent estimator (composite null hypothesis; see condition A6 below). This smoothing of the (possibly estimated) parametric regression function is included to reduce the asymptotic bias (Härdle and Mammen, 1993). In order to mitigate the effect of the difference between hat m_{h,p} and m_{hat theta} in sparse areas of the covariate, the squared difference is weighted by a kernel density estimate of the density of X, namely hat f_h. Furthermore, by the inclusion of hat f_h, the effects of the unknown density on both the asymptotic bias and variance are removed. Optionally, a weight function w: Omega_q -> [0, infinity) can also be considered.

Two additional assumptions, regarding the smoothness of the parametric regression function and the estimation of theta_0 in the composite hypothesis, are required for deriving the distribution of T_n under the null hypothesis:

A5. m_theta is continuously differentiable as a function of theta, and this derivative is also continuous in x in Omega_q.
A6. Under H_0, there exists a sqrt(n)-consistent estimator hat theta of theta_0, i.e. hat theta - theta_0 = O_P(n^{-1/2}), and, under H_1, hat theta - theta_1 = O_P(n^{-1/2}) for a certain theta_1.

Theorem 7.3 (Limit distribution of T_n). Under conditions A1-A6 and under the null hypothesis H_0: m in M_Theta (that is, m(x) = m_{theta_0}(x) for all x in Omega_q),

n h^{q/2} ( T_n - (lambda_q(L^2)/lambda_q(L)^2) (n h^q)^{-1} integral_{Omega_q} sigma_{theta_0}^2(x) w(x) omega_q(dx) ) ->_d N(0, 2 nu_{theta_0}^2),

where sigma_theta^2(x) = E[(Y - m_theta(X))^2 | X = x] is the conditional variance under H_0 and

nu_theta^2 = integral_{Omega_q} sigma_theta^4(x) w(x)^2 omega_q(dx) x gamma_q lambda_q(L)^{-4} integral_0^infinity r^{q/2-1} { integral_0^infinity rho^{q/2-1} L(rho) phi_q(r, rho) drho }^2 dr,

phi_q(r, rho) = { L(r + rho - 2 sqrt(r rho)) + L(r + rho + 2 sqrt(r rho)),  q = 1;   integral_{-1}^{1} (1 - theta^2)^{(q-3)/2} L(r + rho - 2 theta sqrt(r rho)) dtheta,  q >= 2 },

and gamma_q is a constant depending only on q, equal to one in the circular case and expressible in terms of omega_{q-2} and omega_{q-1} for q >= 2.

It should be noted that the convergence rate, as well as the asymptotic bias and variance, agree with the results in the multivariate setting given by Härdle and Mammen (1993) and Alcalá et al. (1999), except for the cancellation of the design density effects in bias and variance, which is achieved by the inclusion of hat f_h in the test statistic. The use of a local estimator with p = 0 or p = 1 (local constant or local linear) does not affect the limiting distribution, given that the equivalent kernel is the same, as stated in Corollary 7.1. Finally, the rather complex structure of the asymptotic variance (see also Zhao and Wu and García-Portugués et al. for the density context) becomes much simpler if the von Mises kernel (see Section 7.2) is used:

nu_0^2 = integral_{Omega_q} sigma^4(x) w(x)^2 omega_q(dx) / (8 pi)^{q/2}.

The contribution of this kernel to the asymptotic bias is lambda_q(L^2)/lambda_q(L)^2 = (4 pi)^{-q/2}.
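For the circular case q = 1, the integral defining T_n can be approximated by a simple quadrature over a grid of angles. The following R sketch is illustrative only: it is restricted to the local constant fit (p = 0), hard-codes the von Mises kernel, and the function name Tn_stat and the argument m_par (a fitted parametric regression function) are not part of the thesis code.

Tn_stat <- function(X, Y, h, m_par, w = function(x) 1, n_grid = 500) {
  # X: n x 2 matrix of unit vectors (q = 1); Y: responses; m_par: function
  # returning m_thetahat at a unit vector; w: optional weight function.
  th <- seq(0, 2 * pi, length.out = n_grid + 1)[-1]
  vals <- sapply(th, function(ang) {
    x <- c(cos(ang), sin(ang))
    ker <- c(exp((X %*% x - 1) / h^2))        # unnormalized vM kernel values
    W0 <- ker / sum(ker)                      # Nadaraya-Watson weights W_n^0(x, X_i)
    m_hat <- sum(W0 * Y)                      # nonparametric fit at x
    m_sm <- sum(W0 * apply(X, 1, m_par))      # smoothed parametric fit L_{h,0} m_thetahat(x)
    # kernel density estimate: mean of normalized vM kernel values (overflow-safe)
    f_hat <- mean(ker) / (2 * pi * besselI(1 / h^2, 0, expon.scaled = TRUE))
    (m_hat - m_sm)^2 * f_hat * w(x)
  })
  mean(vals) * 2 * pi                         # approximates the omega_1 integral
}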

The power of the proposed test is also investigated for a family of local Pitman alternatives that are asymptotically close to H_0. Denote these local alternatives by H_P:

H_P: m(x) = m_{theta_0}(x) + (n h^{q/2})^{-1/2} g(x), for all x in Omega_q,

where m_{theta_0} in M_Theta, g: Omega_q -> R and, by assumption A4, n h^{q/2} -> infinity. With this notation, H_P becomes H_0 when the component g is such that m_{theta_0} + (n h^{q/2})^{-1/2} g in M_Theta (g = 0, for example) and H_1 when the previous statement does not hold for a set of positive probability. The following conditions are required for deriving the limiting distribution of T_n under H_P:

A7. The function g is continuous.
A8. Under H_P, the sqrt(n)-consistent estimator hat theta also satisfies hat theta - theta_0 = O_P(n^{-1/2}).

Theorem 7.4 (Power under local alternatives). Under conditions A1-A5 and A7-A8, and under the hypothesis H_P,

n h^{q/2} ( T_n - (lambda_q(L^2)/lambda_q(L)^2) (n h^q)^{-1} integral_{Omega_q} sigma_{theta_0}^2(x) w(x) omega_q(dx) ) ->_d N( integral_{Omega_q} g(x)^2 f(x) w(x) omega_q(dx), 2 nu_{theta_0}^2 ).

Under local alternatives, the effect of g on the limiting distribution of the test statistic is clearly seen in the asymptotic bias. Specifically, the shift is given by the squared norm of g, weighted with respect to the product of f and w, and therefore the test asymptotically detects all kinds of local alternatives to H_0 whose component g has a positive squared weighted norm.

To illustrate the effective convergence of the statistic to its asymptotic distribution, a simple numerical experiment is provided. The regression setting is the model Y = c + epsilon, with c a constant, epsilon ~ N(0, sigma^2) and X uniformly distributed on the circle (q = 1). The composite hypothesis H_0: m = c, with c in R unknown (test for no effect), is checked using the local constant estimator (p = 0) with von Mises kernel and weight function w = 1. Figure 7.1 presents two QQ-plots computed from Monte Carlo samples of the statistic n h^{1/2} T_n, recentred by its asymptotic bias, obtained for two different sample sizes n. Two bandwidth sequences of the form h_n proportional to n^{-r}, with two choices of r, are considered to illustrate the effect of the bandwidth on the convergence to the asymptotic distribution and, specifically, the fact that undersmoothing boosts the convergence since the bias is mitigated. The Kolmogorov-Smirnov (K-S) and Shapiro-Wilk (S-W) tests are applied to measure the closeness of the empirical distribution of the statistic to a N(0, 2 nu_{theta_0}^2) and to normality, respectively.

7.3.1 Bootstrap calibration

As usually happens with smoothing-based tests (see Härdle and Mammen (1993) or García-Portugués et al.), the asymptotic distribution cannot be used to calibrate the test statistic for small or moderate sample sizes, due to the slow convergence rate and to the presence of unknown quantities depending on the design density and the error structure. In this situation, bootstrap calibration is an alternative. The main idea is to approximate the distribution of T_n under H_0 by that of a bootstrapped version T_n^*, which can be approximated arbitrarily well by Monte Carlo by generating bootstrap samples {(X_i, Y_i^*)}_{i=1}^n. Under H_0, the bootstrap responses are obtained from the parametric

fit and from bootstrap errors that imitate the conditional variance by a wild bootstrap procedure: Y_i^* = m_{hat theta}(X_i) + hat epsilon_i V_i^*, where hat epsilon_i = Y_i - m_{hat theta}(X_i) and the variables V_1^*, ..., V_n^* are independent of the observed sample and iid with E[V_i^*] = 0, Var[V_i^*] = 1 and finite third and fourth moments. A common choice is a binary variable with probabilities P{V_i^* = (1 - sqrt(5))/2} = (5 + sqrt(5))/10 and P{V_i^* = (1 + sqrt(5))/2} = (5 - sqrt(5))/10, which corresponds to the golden section bootstrap. The bootstrap test statistic is

T_n^* = integral_{Omega_q} ( hat m_{h,p}^*(x) - L_{h,p} m_{hat theta^*}(x) )^2 hat f_h(x) w(x) omega_q(dx),

where hat m_{h,p}^* and hat theta^* are the analogues of hat m_{h,p} and hat theta, respectively, obtained from the bootstrapped sample {(X_i, Y_i^*)}_{i=1}^n.

Figure 7.1: QQ-plots comparing the quantiles of the asymptotic distribution given by Theorem 7.3 with the sample quantiles of the standardized statistic, for two sample sizes (left and right panels) and two bandwidth sequences h_n proportional to n^{-r}; the p-values of the Kolmogorov-Smirnov and Shapiro-Wilk tests are reported for each case.

The testing procedure for calibrating the test is summarized in the next algorithm, stated for the composite hypothesis. If the simple hypothesis is considered, then set theta_0 = hat theta = hat theta^*.

Algorithm 7.1 (Test in practice). Let {(X_i, Y_i)}_{i=1}^n be a random sample from model (7.1). To test H_0: m in M_Theta, set a bandwidth h and a weight function w and proceed as follows:
i. Compute hat theta and obtain the fitted residuals hat epsilon_i = Y_i - m_{hat theta}(X_i), i = 1, ..., n.
ii. Compute T_n = integral_{Omega_q} ( hat m_{h,p}(x) - L_{h,p} m_{hat theta}(x) )^2 hat f_h(x) w(x) omega_q(dx).
iii. Bootstrap resampling. For b = 1, ..., B:
  (a) Obtain a bootstrap random sample {(X_i, Y_i^*)}_{i=1}^n, where Y_i^* = m_{hat theta}(X_i) + hat epsilon_i V_i^* and the V_i^* are iid golden section binary variables, i = 1, ..., n.
  (b) Compute hat theta^* as in step i, but now from the bootstrap sample obtained in (a).
  (c) Compute T_n^{*b} = integral_{Omega_q} ( hat m_{h,p}^*(x) - L_{h,p} m_{hat theta^*}(x) )^2 hat f_h(x) w(x) omega_q(dx).
iv. Approximate the p-value by (1/B) sum_{b=1}^B 1{T_n^{*b} >= T_n}.
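A compact R sketch of Algorithm 7.1 for the circular no-effect test H_0: m = c (so that m_{hat theta}(x) = bar Y and re-estimating theta from a bootstrap sample is just a sample mean) could look as follows. It is only meant to convey the structure of the resampling scheme, reuses the toy Tn_stat function sketched above, and the function name gof_test_boot is hypothetical.

gof_test_boot <- function(X, Y, h, B = 500) {
  theta_hat <- mean(Y)                        # parametric fit under H0: m = c
  eps <- Y - theta_hat                        # fitted residuals
  Tn <- Tn_stat(X, Y, h, m_par = function(x) theta_hat)
  # Golden section wild bootstrap multipliers: E V = 0, Var V = 1
  v_vals <- c((1 - sqrt(5)) / 2, (1 + sqrt(5)) / 2)
  v_prob <- c((5 + sqrt(5)) / 10, (5 - sqrt(5)) / 10)
  Tn_boot <- replicate(B, {
    V <- sample(v_vals, length(Y), replace = TRUE, prob = v_prob)
    Y_star <- theta_hat + eps * V             # bootstrap responses under H0
    theta_star <- mean(Y_star)                # step iii (b): refit from the bootstrap sample
    Tn_stat(X, Y_star, h, m_par = function(x) theta_star)
  })
  list(Tn = Tn, p_value = mean(Tn_boot >= Tn))  # step iv
}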

Bootstrap strategies may be computationally expensive. In this case, it should be noted that the test statistic can be written, using the equivalent kernel notation, as

T_n = integral_{Omega_q} ( sum_{i=1}^n W_n^p(x, X_i) hat epsilon_i )^2 hat f_h(x) w(x) omega_q(dx).

The bootstrap test statistic T_n^* has the same form, just taking hat epsilon_i^* = Y_i^* - m_{hat theta^*}(X_i) instead of hat epsilon_i, so there is no need to recompute the remaining elements of T_n in the bootstrap resampling.

In order to prove the consistency of the resampling mechanism detailed in Algorithm 7.1, that is, that the bootstrapped statistic T_n^* has the same asymptotic distribution as the original statistic T_n, a bootstrap analogue of assumption A6 is required:

A9. The estimator hat theta^* computed from {(X_i, Y_i^*)}_{i=1}^n is such that hat theta^* - hat theta = O_{P^*}(n^{-1/2}), where P^* is the probability law conditional on {(X_i, Y_i)}_{i=1}^n.

Based on this assumption and on the previous ones, it is proved from Theorem 7.3 that the probability distribution function (pdf) of T_n^*, conditional on the sample, always converges in probability to a Gaussian pdf, which is the same asymptotic pdf of T_n when H_0 holds.

Theorem 7.5 (Bootstrap consistency). Under conditions A1-A6 and A9, and conditionally on {(X_i, Y_i)}_{i=1}^n,

n h^{q/2} ( T_n^* - (lambda_q(L^2)/lambda_q(L)^2) (n h^q)^{-1} integral_{Omega_q} sigma_{theta_1}^2(x) w(x) omega_q(dx) ) ->_d N(0, 2 nu_{theta_1}^2) in probability.

If the null hypothesis holds, then theta_1 = theta_0.

7.4 Simulation study

The finite sample performance of the goodness-of-fit test is explored in four regression models, considering different sample sizes, dimensions and bandwidths. Given the regression model (7.1) and taking T_n as test statistic, the following components must be specified: the density of the predictor X, the regression function m, the noise sigma(X) epsilon and the deviations from H_0. The parametric regression functions with directional covariate and scalar response are shown in Figure 7.2, with the following codification: the radius from the origin represents the response m(x) for a direction x, resulting in a distortion from a perfect circle or sphere. The noise considered is epsilon ~ N(0, 1), with two different conditional standard deviations: a constant one (homoscedastic) and one that varies with x through the density f_M6 of the M6 model in García-Portugués (2013) (heteroskedastic). In order to define the design densities, some of the models introduced in García-Portugués (2013) have been considered: the uniform model M1 and several non-uniform models are used as single densities or as part of mixture distributions (see Table 7.1). The alternative hypothesis H_1 is obtained by adding to the true regression function m_{theta_0} deviations Delta_1 and Delta_2 that combine a cosine wave in the first coordinates with a damping factor in the last coordinate x_{q+1}. The different combinations considered in scenarios S1 to S4 are given in Table 7.1 (see the supplementary material for further details).
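As an illustration of how one of the simpler scenarios could be generated, the sketch below simulates a linear regression model with a uniform directional design (the uniform law on Omega_q is obtained by normalizing standard Gaussian vectors). The function names and the default constants are arbitrary placeholders, not the values used in the actual study.

r_unif_sphere <- function(n, q) {
  # Uniform sample on Omega_q by normalizing standard Gaussian vectors
  Z <- matrix(rnorm(n * (q + 1)), n, q + 1)
  Z / sqrt(rowSums(Z^2))
}
simulate_linear_scenario <- function(n, q, c0 = 1, eta = rep(0.5, q + 1), sigma = 0.5) {
  # Scenario of the type m(x) = c + eta'x with homoscedastic Gaussian noise
  X <- r_unif_sphere(n, q)
  Y <- c(c0 + X %*% eta + rnorm(n, sd = sigma))
  list(X = X, Y = Y)
}
# Example: dat <- simulate_linear_scenario(n = 200, q = 2)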

Figure 7.2: Parametric regression models for scenarios S1 to S4, for the circular and spherical cases.

Table 7.1: Specification of the simulation scenarios for model (7.1). Each scenario combines a parametric regression function (a constant model, a linear model, a model with sine and cosine components in the first coordinates and a model with a sine component in the last coordinate), fixed parameter values, a design density (single models or mixtures of the directional models in García-Portugués (2013)), a homoscedastic or heteroskedastic noise and one of the deviations Delta_1 or Delta_2.

The tests based on the projected local constant and local linear estimators (p = 0, 1) are compared in these four scenarios, under H_0 and H_1, for a grid of bandwidths, different sample sizes n and dimensions q. M Monte Carlo trials and B bootstrap replicates are considered. Parametric estimation is done by nonlinear least squares, which is justified by its simplicity and asymptotic normality under certain conditions (Jennrich, 1969), hence satisfying assumption A6. The empirical sizes of the goodness-of-fit tests are shown using the so-called significance trace (Bowman and Azzalini, 1997), that is, the curve of percentages of empirical rejections for different bandwidths. These empirical rejections are computed from the same generated samples and bootstrap replicates. As shown in Figure 7.3, except for very small bandwidths, which result in a conservative test, the empirical size stabilizes around the 95% confidence band for the nominal level alpha = 0.05 in the different scenarios, dimensions and sample sizes. With respect to the power, given the mild deviations from the null hypotheses (see the supplementary material for a quantification), the power performance of the proposed tests seems quite competitive. Although the test based on the local linear estimator (p = 1) provides better power for large bandwidths in certain scenarios, the overall impression is that the test with p = 0 is hard to beat: the powers with p = 0 and p = 1 are almost the same for low dimensions, whereas, as the dimension increases, the local constant estimator performs better for a wider range of bandwidths. This effect could be explained by the spikes that local linear regression tends to show in the low-density boundary regions of some of the design densities, which become more important as the dimension increases. More simulation results for different sample sizes and significance levels are available in the supplement.
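A rough sketch of how a significance trace could be computed for the no-effect test in a single scenario is given below. It relies on the toy functions sketched earlier in this chapter, is restricted to the circular case, and the grid sizes and replication numbers are placeholders rather than the values used in the study.

significance_trace <- function(h_grid, M = 200, n = 100, alpha = 0.05) {
  sapply(h_grid, function(h) {
    mean(replicate(M, {
      dat <- simulate_linear_scenario(n, q = 1, eta = c(0, 0))   # data under H0: m = c
      gof_test_boot(dat$X, dat$Y, h, B = 100)$p_value < alpha
    }))                                    # empirical rejection rate at bandwidth h
  })
}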

Figure 7.3: Empirical sizes (first row) and powers (second row) for significance level alpha = 0.05 in the different scenarios, with p = 0 (solid line) and p = 1 (dashed line). From left to right, the columns represent increasing dimensions q, for a fixed sample size n.

7.5 Application to text mining

A challenging field where directional data techniques may be applied is text mining. In different applications within this context, it is quite common to consider a corpus (a collection of documents) and to determine the so-called vector space model: a corpus d_1, ..., d_n is codified by the set of vectors {(d_{i1}, ..., d_{iD})}_{i=1}^n (the document-term matrix) with respect to a dictionary or bag of words {w_1, ..., w_D}, such that d_{ij} represents the frequency of the dictionary's j-th word in the document d_i. Since large documents are expected to have higher word frequencies, a normalization is required. For instance, if the Euclidean norm is used, d_i / ||d_i||, then the documents can be regarded as points in Omega_{D-1}, providing therefore a set of directional data. Some recent references using directional statistics in text mining are Banerjee et al. (2005), Buchta et al. and Surian and Chawla.

In this example, the corpus analysed was acquired from the news aggregator Slashdot (www.slashdot.org). This website publishes summaries of news about technology and science that are submitted and evaluated by users. Each news entry includes a title, a summary with links to other related news and a discussion thread gathering users' comments.

Obviously, not all the news entries have the same impact in terms of popularity or participation, measuring this variable as the number of comments for each entry. The goal of this application is to test a linear model that takes as predictor the topic of the news (a directional variable) and as response the log-number of comments. The consideration of simple linear models seems frequent in this context. For instance, Tatar et al. consider linear regression models for ranking online news based on the number of comments at two different time moments after the news publication. Asur and Huberman (2010) present simple linear models for predicting the box-office revenue of a film from tweet information, and show that a simple linear regression performs better in this prediction than artificial money markets. Also in favour of linear models, it can be argued that, in text classification, non-linear classifiers have been shown to hardly provide any advantage with respect to linear ones (Joachims, 2002).

Titles, summaries and numbers of comments of the news entries published in the period under study were downloaded, resulting in a collection of n documents. After that, the next steps were performed with the help of the text mining R library tm (Meyer et al., 2008): (1) merging of titles and summaries into the same document, omitting user submission details; (2) deletion of HTML codes; (3) conversion to lowercase; (4) deletion of stop words (as defined in tm and MySQL), punctuation, white spaces and numbers; (5) stemming of words. The distribution of the document frequency (df; the number of documents containing a particular word) is highly right skewed and more than half of the processed words appeared in a single document, while, in contrast, a few words are repeated in many documents. To overcome this problem, a pruning was applied such that only the words with df between the 95% and 99.95% quantiles were retained. After this process, the documents are represented in a document-term matrix formed by D words.

Figure 7.4: Significance trace of the local constant goodness-of-fit test for the constrained linear model.

In order to construct a plausible linear model, a preliminary variable selection was performed using LASSO regression, with tuning parameter lambda selected by an overpenalized three-standard-error rule (see Hastie et al. (2009)). After removing some extra variables by a backward stepwise method with BIC, a fitted vector hat eta in R^D with d = 77 non-zero entries is obtained. The test is applied to check the null hypothesis of a candidate linear model with coefficient eta constrained to be zero except in these previously selected d words, that is,

H_0: m(x) = c + eta^T x, with eta subject to A eta = 0,

for an adequate choice of the (D - d) x D matrix A.
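For completeness, a hypothetical sketch of the preprocessing pipeline and of the projection of the documents onto the sphere is given below, using the tm package. Function names, the order of the cleaning steps and the pruning thresholds are illustrative assumptions and do not reproduce the exact pipeline used in the application (in particular, stemDocument requires the SnowballC package).

library(tm)
build_dtm <- function(texts) {
  corp <- Corpus(VectorSource(texts))
  corp <- tm_map(corp, content_transformer(tolower))
  corp <- tm_map(corp, removeWords, stopwords("english"))
  corp <- tm_map(corp, removePunctuation)
  corp <- tm_map(corp, removeNumbers)
  corp <- tm_map(corp, stripWhitespace)
  corp <- tm_map(corp, stemDocument)
  as.matrix(DocumentTermMatrix(corp))
}
# dtm <- build_dtm(paste(titles, summaries))     # hypothetical input character vectors
# df <- colSums(dtm > 0)                         # document frequencies
# keep <- df >= quantile(df, 0.95) & df <= quantile(df, 0.9995)
# X <- dtm[, keep]                               # pruned document-term matrix
# X <- X / sqrt(rowSums(X^2))                    # unit-norm rows: directional data
# (documents with no retained words should be removed before normalizing)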

The significance trace in Figure 7.4 shows no evidence against the linear model for a wide grid of bandwidths, using the local constant approach (the local linear one was not implemented due to its higher computational cost). Table 7.2 shows the fitted linear model under the null hypothesis. As can be seen, news where stemmed words like kill, climat or polit appear have a strong positive impact on the number of comments, since these news are likely more controversial and generate broader discussions. On the other hand, science-related words like mission, abstract or lab have a negative impact, since they tend to raise more objective and more specific discussions.

Table 7.2: Stemmed words entering the fitted constrained linear model on the Slashdot dataset (all the estimated coefficients are statistically significant): int, conclud, gun, kill, refus, averag, lose, obama, declin, climat, snowden, stop, wrong, war, polit, senat, tesla, violat, concern, slashdot, ban, reason, health, pay, window, american, told, worker, man, comment, state, think, movi, ask, job, drive, know, problem, employe, nsa, charg, feder, money, sale, need, microsoft, project, network, cell, imag, avail, video, process, data, materi, nasa, launch, electron, robot, satellit, detect, planet, help, cloud, hack, open, lab, mobil, techniqu, vulner, mission, team, supercomput, abstract, simul, demo, guid.

Acknowledgments

We would like to thank Professor David E. Losada, from CiTIUS at the University of Santiago de Compostela, for his guidance in the data application. This research was supported by project MTM8 from the Spanish Ministry of Science and Innovation, by project MDS75PR from the Dirección Xeral de I+D of the Xunta de Galicia, by the IAP research network grant nr. P7/6 of the Belgian government (Belgian Science Policy), by the European Research Council under the European Community's Seventh Framework Programme (ERC Grant agreement) and by the contract "Projet d'Actions de Recherche Concertées" (ARC) of the Communauté française de Belgique, granted by the Académie universitaire Louvain. The work of E. García-Portugués has been supported by a grant from Fundación Barrié and by FPU grant AP 957 from the Spanish Ministry of Education. The authors gratefully acknowledge the computational resources of the SVG cluster of the CESGA Supercomputing Center.

Supplement

Three extra appendices are included as supplementary material, containing particular cases of the projected local estimator, the technical lemmas and further results for the simulation study.

185 7.A. Main results 65 7.A Main results 7.A. Projected local estimator properties Proof of Theorem 7.. The proof is divided in three sections: the conditional bias is first obtained for p =, then the result for p = follows by restricting the computations to the first column of X x,p and the variance is proved to be common to both estimators. Bias of ˆm h,. Working conditionally, by 7. and 7.5, E ˆm h,p x X,..., X n = e T X T x,pw x X x,p X T x,p W x m, 7.7 where m = mx,..., mx n T. The proof is based on Theorem. in Ruppert and Wand 99 but adapted to the projected local estimator. First of all, consider the Taylor expansion of mx i of second order around the point x Ω q, which follows naturally by extending the one given in Section 7. since x T H m xx =, where H m x is the Hessian of m: mx i = mx + mx T B x B T x X i x + X i x T B x B T x H m xb x B T x X i x + o X i x. The Taylor expansion can be expressed componentwise also for the orders as T m = X x, mx, B T x mx + Q mx + o R m x, with Q m x the vector with i-th entry given by X i x T B x B T x H m xb x B T x X i x and remainder term of order R m x = X x,..., X n x T, uniform in x Ωq since the third derivative of m is bounded by assumption A. Then, by 7.7, the first term in the Taylor expansion is e T X T x,pw x X x,p X T x,p W x X x, mx, B T x mx T, 7.8 which for p = equals mx and hence the conditional bias is given by e T X T x,p W x X x,p X T x,p times the remaining vector. By using the results i, ii and iv of Lemma B., it follows that, componentwise, n X T x,pw x X x,p = n L h x, X i n L i= h x, X i B T x X i x fx = L h x, X i X i x T B x L h x, X i B T x X i xx i x T B x b ql q fx T B x h b ql q B T x fxh b ql q I q fxh + o P T. 7.9 This matrix can be inverted by the inversion formula of a block matrix, resulting in fx fx fx T B n X T x x,pw x X x,p = fx B T bql x fx q fxh + o P T. 7. Iq

186 66 Chapter 7. Testing parametric models in linear-directional regression Now the quadratic term of the Taylor expansion yields by results v vi of Lemma B.: n X T x,pw x Q m x = n L h x, X i X i x T B x B T x H m xb x B T x X i x n L i= h x, X i B T x X i xx i x T B x B T x H m xb x B T x X i x bql = q tr H m x fxh + o P h o P h. 7. Finally, the remaining order is o P h, because X e T X T T x,pw x X x,p x,p W x R m x = fx + o P, fx fx T B x + o P T O P h, o P h T T, by setting H m x I q+ and using v vi from Lemma B.. Joining 7. and 7., then E ˆm h, x mx X,..., X n = b ql tr H m x h + o P h. q Bias of ˆm h,. For the case p = the product in 7.8 is not mx but slightly different. By 7., e T n X T x,w x X x, = fx + op and also by 7.9 and i ii in Lemma B., n X T x,w x X x, = fx + o P, b ql fx T B x h + o h P h T + O P q nh q T. Then, 7.8 turns into mx + bql fx T mx q fx h + o P h T + O h P nh q T because the coefficient in mx is exactly one. Adding this to the bias of ˆm h,, the result follows since the contribution of the linear part in 7. and in the remaining order is negligible. Variance of ˆm h,p. By the variance property for linear combinations, Var ˆm h,p x X,..., X n = e T X T x,pw x X x,p X T x,p W x VW x X x,p X T x,pw x X x,p e, where V = diag σ X,..., σ X n T. By results vii ix of Lemma B., n X T x,pw x VW x X x,p = n L h x, X iσ X i L h x, X ix i x T B x σ X i n L i= h x, X ib T x X i xσ X i L h x, X ib T x X i xx i x T B x σ X i λql λ ql = h σ xfx T q T + o P h q T. 7. Therefore, by 7. and 7., the common variance expression follows. Proof of Corollary 7.. Note that Wn p x, X i = e T X T x,p W x X x,p, δp, X i x T T B x L h x, X i. Then, by expression 7., uniformly in x Ω q it follows that ˆm h,p x = nfx n L h x, X i Y i + o P i=

187 7.A. Main results 67 + δ p, fx T B x fx n n L h x, X i B T x X i xy i + o P. i= By 7. and 7., the first addend is nh q λ qlfx ni= L h x, X i Y i + o P. The second term is o P see iii in Lemma B. and negligible in comparison with the first one, which is O P. Then, it can be absorbed inside the factor + o P, proving the corollary. Proof of Theorem 7.. For a fixed x Ω q, the next decomposition is studied: nh q ˆm h,p x mx = nh q ˆm h,p x E ˆm h,p x X,..., X n + nh q E ˆm h,p x X,..., X n mx = N + N. Term N. From the proof of Theorem 7., N = nh q e T X T X T x,p W x X x,p x,p W x Y m. By the Cramér-Wold device, if n a T X T x,pw x Y m = n n i= V n,i = V n is asymptotically normal for any a R pq+, then n X T x,pw x Y m is also asymptotically normal. To obtain the asymptotic normality of V n the Lyapunov s Central Limit Theorem CLT for triangular arrays {V n,i } n i= is employed, this is: if for δ > lim n δ Var Vn + δ E V n E V n +δ =, n where V n = L h x, XY mxa T, δ p, B T x X x, then n V n EV n d N,. From VarVn E V n E V n +δ = O E V n +δ and the use of Lemma B.6, it holds that: E V n +δ = c h,ql +δ +δ c h,q L +δ y mxa T, δ p, B T x x x fx,y x, y dy + o R = λ ql +δ a +δ λ q L +δ h +δq fxe Y mx +δ X = x + o = O h +δq. Note that E Y mx +δ X = x < is required for a δ > and that by A the kernel L +δ plays the same role as L. By using this result with δ =, it follows that Var V n E Vn = O h q. Therefore, E V n E V n +δ h +δq = O n δ Var V n + δ n δ h + δ q = O nh q δ, so by A and the Cramér-Wold device nh q X T d x,pw x Y m N pq+, Σ, with the covariance matrix arising from 7.: λq L Σ = λ q L σ xfx T T. On the other hand, by 7., e T n X T x,pw x X x,p converges in probability to fx, fx fx T B x if p = and to fx if p =. The desired result then follows by the use of

188 68 Chapter 7. Testing parametric models in linear-directional regression Slutsky s theorem: N = nh q e T X T x,pw x X x,p X T x,p W x Y m d N, λ ql λ q L σ x fx Term N. By the conditional bias expansion of Theorem 7. N converges in probability as N = nh q b ql B q xh + o P + δ p, o P, q so adding this bias to N the asymptotic normality is proved by Slutsky s theorem. 7.A. Asymptotic results for the goodness-of-fit test Proof of Theorems 7. and 7.. Both theorems are proved at the same time by assuming that H P holds and considering H a particular case with g. The proof follows the steps of Härdle and Mammen 99 and Alcalá et al. 999 and makes use of the equivalent kernel representation for simplifying the computations and applying de Jong 987 s CLT. The test statistic T n can be separated into three addends by adding and subtracting the true smoothed regression function: T n = ˆmh,p x L h,p mˆθx ˆfh xwx ω q dx Ω q = T n, + T n, T n, + o P, 7. where, thanks to result i from Lemma B., the addends are: n T n, = Wn p x, X i Y i m θ X i fxwx ω q dx, Ω q i= T n, = Lh,p mθ mˆθ x fxwx ω q dx, Ω q T n, = ˆm h,p x L h,p m θ x L h,p mθ mˆθ xfxwx ωq dx. Ω q By Slutsky s theorem, the asymptotic distribution of T n will be the one of T n, + T n, T n,, so the proof is divided in the examination of each addend. Terms T n, and T n,. By a Taylor expansion on m θ x as a function of θ see A5, T n, = = Ω q Ω q T m x θ L h,p ˆθ θ fxwx ω q dx θ θ=θn T ˆθ θ Lh,p O P x fxwx ωq dx = O P n, with θ n Θ. The second equality holds by the boundedness of m θx θ for x Ω q, where the last holds by A6 and A8. On the other hand, T n, = θ Ω ˆθ T mθ ˆm h,p x L h,p m θ x L h,p xfxwx ω q dx q θ θ=θn.

189 7.A. Main results 69 = O P n ˆm h,p x L h,p m θ x fxwx ω q dx Ωq = O P n, because of the previous considerations used and i from Lemma B.. As a consequence, nh q T n, = q q q O P h and nh T n, = O P h, so by A it happens that nh q Tn, p and nh q p Tn,. 7. Term T n,. Now, T n, can be dealt with the equivalent kernel of Corollary 7.: T n, = Ω q n L h x, X i + o P Y i m θ X i fxwx ω q dx i= = T n, + o P. Using again Slutsky s theorem, the asymptotic distribution of T n,, and hence of T n, will be the one of T n,. Now it is possible to split T n, = T n, + T n, + T n, 7.5 by recalling that Y i m θ X i = σx i ε i + nh q gx i by the model definition 7. and hypothesis H P. Specifically, under H P the conditional variance can be expressed as σ x = E Y m θ X nh q gx X = x = σ θ x + o, uniformly in x Ω q since g and σ θ are continuous and bounded by A and A7. Therefore, T n, = Ω q T n, = nh q Ω q T n, = nh q n L h x, X i σx i ε i fxwx ω q dx, i= n L h x, X i gx i fxwx ω q dx, i= n n L h x, X i L h x, X j σx i ε i gx j fxwx ω q dx. Ω q i= j= By results ii and iii of Lemma B., the behavior of the two last terms is nh q T n, = gx fxwx ω q dx + o P and nh q T n, = o P. 7.6 Ω q For the first addend, consider now n T n, = L h x, X i σx i ε i fxwx ω q dx = Ω q i= + T a n, Ω q i j + T b n,. L h x, X i L h x, X j σx i σx j ε i ε j fxwx ω q dx From result iv of Lemma B. and because σ x = σ θ x + o uniformly, nh q T a n, = λ ql λ q L h q Ω q σ θ xwx ω q dx + o + o P. 7.7

190 7 Chapter 7. Testing parametric models in linear-directional regression T b n, The asymptotic behavior of is obtained using Theorem. in de Jong 987. This result states that the sum W n = n i,j= W ijn, with W ijn random variables depending on the sample d size and on independent variables X i and X j, converges as W n N, v under the following conditions: a the random variables W ijn are clean, i.e. E W ijn + W jin X i = for i < j n, b Var W n v, c max nj= i n Var W ijn v, d E W n v. In order to apply this result, let us denote first { q nh W ijn = Ω q L h x, X i L h x, X j σx i σx j ε i ε j fxwx ω q dx, i j,, i = j. Then, nh q b T n, = W n = i j W ijn and the random variables on which W ijn depends are X i, ε i and X j, ε j. Condition a is easily seen to hold by E ε X = and the tower property, which implies that E W ijn =. Because of this, the fact that W ijn = W jin and Lemma. in de Jong 987, we have for condition b: Var W n = E i j W ijn = E i j Wijn = nn E Wijn. 7.8 Then, by v in Lemma B. and the fact that σ x = σ θ x+o, E W ijn = n ν θ + o and as a consequence Var W n ν θ. Condition c follows easily from the previous computation: max n i n j= Var W ijn v max i n n νθ + o νθ = n + o. To check condition d, note that E Wn can be split in the following form in virtue of Lemma. in de Jong 987, as Härdle and Mammen 99 stated: E Wn = E W i j nw i j nw i j nw i j n i j i j i j i j = 8 E Wijn + E WijnW kln + 8 i,j,k E W ijn WiknW jkn i,j i,j,k,l + 9 E W ijn W jkn W kln W lin, 7.9 i,j,k,l where the notation stands for the summation over all pairwise different indexes i.e., indexes that satisfy i j for their associated W ijn. By the results given in v of Lemma B., E Wijn = O n h q, E W ijn W jkn W kln W lin = O n h q and E W ijn Wikn W jkn = O n. Therefore, by 7.8 and 7.9, E Wn = i j k l E WijnW kln + o = E Wijn + o = Var Wn + o i j

191 7.A. Main results 7 and by A, E W n = Var Wn + o, so condition d is satisfied, having that nh q T b n, d N, νθ. 7. Finally, using decompositions 7. and 7.5 and results 7. and 7.6, it holds nh q Tn = nh q T a b n, + T n, + T n, λq L λ q L = σ h q θ xwx ω q dx + nh q Ω q + gx fxwx ω q dx + o P Ω q T n, + o P + T n, + T n, T b n, and the limit distribution follows by Slutsky s theorem and result 7.. Proof of Theorem 7.5. The proof mimics the steps of the proof of Theorem 7.. First of all, the bootstrap test statistic T n can be separated as T n = T n, + T n, T n,, 7. where Tn, = Tn, = Tn, = Ω q Ω q Ω q n i= W p n x, X i Y i mˆθx i ˆfh xwx ω q dx, L h,p mˆθ mˆθ x ˆfh xwx ω q dx, ˆm h,px L h,p mˆθx L h,p mˆθ mˆθ x ˆfh xwx ω q dx. Terms Tn, and T n,. By assumption A9 and analogous computations to the ones in the proof of Theorem 7., it follows that nh q Tn, p and nh q Tn, p, where the convergence is stated in the probability law P that is conditional on the sample {X i, Y i } n i=. Term Tn,. By the resampling procedure of Algorithm 7., ˆε ivi dominant term can be split into Tn, = n Ω q i= + Ω q i j = T n, + T n,. W p n x, X i ˆε i V i ˆfh xwx ω q dx = Y i mˆθx i V i Wn p x, X i Wn p x, X j ˆε i Vi ˆε j Vj ˆf h xwx ω q dx and the From result i of Lemma B., the first term is nh q T n, = λ ql λ q L h q Ω q σ θ xwx ω q dx + o P + o P, 7.

192 7 Chapter 7. Testing parametric models in linear-directional regression so the dominant term is T n,, whose asymptotic behavior is obtained using Theorem. in de Jong 987 conditionally on the sample. Let us denote now Vi E W W ijn = { q nh Ω q Wn p x, X i Wn p x, X j ˆε i Vi ˆε jvj h xwx ω q dx, i j,, i = j. Then, nh q T n, = Wn = i j W ijn and the random variables on which W ijn depends are now and Vj. Condition a of the theorem follows immediately by the properties of the V i s: ijn + W jin V i =. On the other hand, analogously to 7.8, Var W n = i j E Wijn = n h q Wn p x, X i Wn p x, X j ˆε iˆε j ˆfh xwx ω q dx i j Ω q and by result ii of Lemma B., Var Wn p νθ, resulting in the verification of condition c in probability. Condition d is checked using the same decomposition for E Wn and the results collected in ii of Lemma B.. Hence E Wn = Var Wn + o P and d is satisfied in probability, from which it follows that, conditionally on {X i, Y i } n i= the pdf of nh q T n, converges in probability to the pdf of N, νθ, that is: nh q T n, d N, νθ in probability. 7. Joining 7. and 7. and applying Slutsky s theorem conditionally on the sample, the theorem is proved: nh q T n = nh q T n, + T n, + T n, + Tn, λq L λ q L = h q σθ xwx ω q dx + nh q T n, Ω q + o P + o P. References Alcalá, J. T., Cristóbal, J. A., and González-Manteiga, W Goodness-of-fit test for linear models based on local polynomials. Statist. Probab. Lett., :9 6. Asur, S. and Huberman, B. A.. Predicting the future with social media. In Huang, J. X., King, I., Raghavan, V., and Rueger, S., editors, Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pages IEEE. Bai, Z. D., Rao, C. R., and Zhao, L. C Kernel estimators of density function of directional data. J. Multivariate Anal., 7: 9. Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. 5. Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res., 6:5 8.

193 References 7 Bowman, A. W. and Azzalini, A Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations. Oxford Statistical Science Series. Clarendon Press, Oxford. Buchta, C., Kober, M., Feinerer, I., and Hornik, K.. Spherical k-means clustering. J. Stat. Softw., 5:. Cheng, M.-Y. and Wu, H.-T.. Local linear regression on manifolds and its geometric interpretation. J. Amer. Statist. Assoc., 85:. de Jong, P A central limit theorem for generalized quadratic forms. Probab. Theory Related Fields, 75:6 77. Deschepper, E., Thas, O., and Ottoy, J. P. 8. Tests and diagnostic plots for detecting lack-of-fit for circular-linear regression models. Biometrics, 6:9 9. Di Marzio, M., Panzera, A., and Taylor, C. C. 9. Local polynomial regression for circular predictors. Statist. Probab. Lett., 799: Di Marzio, M., Panzera, A., and Taylor, C. C.. Nonparametric regression for spherical data. J. Amer. Statist. Assoc., 956: Fan, J. and Gijbels, I Local polynomial modelling and its applications, volume 66 of Monographs on Statistics and Applied Probability. Chapman & Hall, London. Fan, J., Gijbels, I., Hu, T.-C., and Huang, L.-S A study of variable bandwidth selection for local polynomial regression. Statist. Sinica, 6: 7. García-Portugués, E.. Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electron. J. Stat., 7: García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W.. estimation for directional-linear data. J. Multivariate Anal., :5 75. Kernel density García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W.. Central limit theorems for directional and linear data with applications. Statist. Sinica, to appear. Hall, P., Watson, G. S., and Cabrera, J Kernel density estimation with spherical data. Biometrika, 7: Härdle, W. and Mammen, E. 99. Comparing nonparametric versus parametric regression fits. Ann. Statist., : Hastie, T., Tibshirani, R., and Friedman, J. 9. The elements of statistical learning. Springer Series in Statistics. Springer, New York, second edition. Jennrich, R. I Asymptotic properties of non-linear least squares estimators. Ann. Math. Statist., :6 6. Joachims, T.. Learning to classify text using support vector machines: Methods, theory and algorithms, volume 668 of Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Boston.

194 7 Chapter 7. Testing parametric models in linear-directional regression Mardia, K. V. and Jupp, P. E.. Directional statistics. Wiley Series in Probability and Statistics. John Wiley & Sons, Chichester, second edition. Meyer, D., Hornik, K., and Feinerer, I. 8. Text mining infrastructure in R. J. Stat. Softw., 55: 5. Ruppert, D. and Wand, M. P. 99. Multivariate locally weighted least squares regression. Ann. Statist., :6 7. Srivastava, A. N. and Sahami, M., editors 9. Text mining: classification, clustering, and applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series. CRC Press, Boca Raton. Surian, D. and Chawla, S.. Mining outlier participants: Insights using directional distributions in latent models. In Blockeel, H., Kersting, K., Nijssen, S., and Ztelezný, F., editors, Machine learning and knowledge discovery in databases, volume 888 of Lecture Notes in Artificial Intelligence, pages 7 5. Springer, Heidelberg. Tatar, A., Antoniadis, P., De Amorim, M. D., and Fdida, S.. Ranking news articles based on popularity prediction. In Proceedings of the International Conference on Advances in Social Networks Analysis and Mining ASONAM, pages 6. IEEE. Wang, X., Zhao, L., and Wu, Y.. Distribution free laws of the iterated logarithm for kernel estimator of regression function based on directional data. Chinese Ann. Math. Ser. B, : Zhao, L. and Wu, C.. Central limit theorem for integrated square error of kernel estimators of spherical density. Sci. China Ser. A, :7 8.

Chapter 8

Future research

This chapter presents some ideas for future contributions in different areas related with the scope of the thesis. The first sections are focused on further contributions to kernel smoothing with directional data: Section 8.1 studies new bandwidth selectors for the projected local linear estimator presented in Chapter 7 and Section 8.2 introduces a new kernel density estimator under rotational symmetry, motivated by the results of previous chapters. Section 8.3 describes the structure of an R package containing all the code developed for the implementation of the methods presented in the thesis. Finally, Section 8.4 looks back to the models of Wehrly and Johnson (1979) and Johnson and Wehrly (1978) to present a test for their copula structure using empirical processes.

Contents
8.1 Bandwidth selection in nonparametric linear-directional regression
8.2 Kernel density estimation under rotational symmetry
8.3 R package DirStats
8.4 A goodness-of-fit test for the Johnson and Wehrly copula structure
References

8.1 Bandwidth selection in nonparametric linear-directional regression

In Chapter 7 a projected local estimator was proposed to estimate the regression function of a linear response on a directional predictor. From the results derived for the bias and variance (see Theorem 7.1) it is possible to obtain bandwidth selection rules for the proposed estimator. The next corollary provides a starting point, stating the conditional Mean Integrated Squared Error (MISE) and the bandwidth that minimizes the Asymptotic MISE (AMISE) of the projected local estimator. A weight function w is included in the conditional MISE definition to allow an easy adaptation of the global error criterion to the case where the interest lies in the estimation of m on specific areas of the support of the predictor (for example, the areas with larger density).

Corollary 8.1 (AMISE optimal bandwidth). Under assumptions A1-A4 from Chapter 7, for a given weight function w: Omega_q -> R, the conditional weighted MISE of the projected local

estimator is

MISE[hat m_{h,p} | X_1, ..., X_n] = (b_q(L)/q)^2 ( integral_{Omega_q} B_p(x)^2 w(x) omega_q(dx) ) h^4
  + (lambda_q(L^2)/lambda_q(L)^2) (n h^q)^{-1} integral_{Omega_q} (sigma^2(x)/f(x)) w(x) omega_q(dx)
  + o_P(h^4 + (n h^q)^{-1}) + delta_{p,0} O_P(h^2 (n h^{q-2})^{-1/2}).

The bandwidth that minimizes the weighted AMISE of the local estimator is

h_AMISE = [ q lambda_q(L^2) lambda_q(L)^{-2} integral_{Omega_q} (sigma^2(x)/f(x)) w(x) omega_q(dx) / ( 4 (b_q(L)/q)^2 integral_{Omega_q} B_p(x)^2 w(x) omega_q(dx) n ) ]^{1/(4+q)}.

If the von Mises kernel is used, then

h_AMISE = [ q (4 pi)^{-q/2} integral_{Omega_q} (sigma^2(x)/f(x)) w(x) omega_q(dx) / ( integral_{Omega_q} B_p(x)^2 w(x) omega_q(dx) n ) ]^{1/(4+q)}.

The order of the resulting weighted AMISE for the h_AMISE bandwidth is n^{-4/(4+q)}. However, this bandwidth depends on the unknown functions f(x), B_p(x) (see Theorem 7.1) and sigma^2(x). A first idea to improve its usability is to consider a weight function proportional to the density, w(x) = w_0(x) f(x), with w_0: Omega_q -> R. With this weight function (which penalizes the estimation error on the most likely predictor values), the density in the denominator cancels out while it appears in the integral of the numerator, enabling a Monte Carlo approximation. In addition, assuming homoscedasticity and a pilot parametric model for the regression function, it is possible to approximate the unknown terms using their corresponding parametric estimates. These two considerations lead to the following rule-of-thumb bandwidth selector:

h_ROT = [ q lambda_q(L^2) lambda_q(L)^{-2} hat sigma^2 integral_{Omega_q} w_0(x) omega_q(dx) / ( 4 (b_q(L)/q)^2 sum_{i=1}^n B_{p, hat theta}(X_i)^2 w_0(X_i) ) ]^{1/(4+q)}.

This selector depends on a homoscedastic parametric model Y = m_theta(X) + sigma epsilon, which can be, for example, a quadratic one:

m_theta^Q(x) = m_0 + m_1^T x + q_1^T (x^2)_{-(q+1)} = (m_0, m_1^T, q_1^T)(1, x^T, ((x^2)_{-(q+1)})^T)^T,   (8.2)

where x^2 stands for the vector with squared components of x and (x^2)_{-(q+1)} omits its last entry. The estimation of the parameters is easily accomplished by

(hat m_0, hat m_1^T, hat q_1^T)^T = (X_m^T X_m)^{-1} X_m^T Y   and   hat sigma^2 = (1/(n - 2(q+1))) sum_{i=1}^n (Y_i - m^Q_{hat theta}(X_i))^2,   (8.3)

where X_m is the n x 2(q+1) matrix whose i-th row is (1, X_i^T, ((X_i^2)_{-(q+1)})^T).
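In R, the pilot quadratic fit (8.2)-(8.3) amounts to an ordinary least squares problem. A possible sketch (the function name is not from the thesis and no checks for rank deficiency are included):

fit_pilot_quadratic <- function(X, Y) {
  # X: n x (q+1) matrix of unit vectors; Y: responses
  q <- ncol(X) - 1
  Xm <- cbind(1, X, X[, 1:q, drop = FALSE]^2)      # rows (1, X_i', ((X_i^2)_{-(q+1)})')
  theta <- solve(crossprod(Xm), crossprod(Xm, Y))  # least squares estimate (8.3)
  sigma2 <- sum((Y - Xm %*% theta)^2) / (length(Y) - 2 * (q + 1))
  list(m0 = theta[1],
       m1 = theta[2:(q + 2)],
       q1 = theta[(q + 3):(2 * q + 2)],
       sigma2 = sigma2)
}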

The choice of this parametric model is justified because it is the simplest model that allows an estimation of the first and second derivatives of m and also allows for the computation of the bias integrand in closed form:

B_{p,theta}(x) = { grad f(x)^T (m_1 + 2(q_{1,1} x_1, ..., q_{1,q} x_q, 0)^T)/f(x) + 2 sum_{i=1}^q q_{1,i},  p = 0;   2 sum_{i=1}^q q_{1,i},  p = 1 }.

For the estimation of f(x), needed in the previous expression when p = 0, it is possible to consider an r-mixture of von Mises densities for the unknown density f, fitted by the expectation-maximization algorithm presented earlier in the thesis, resulting in

hat f_r(x) = sum_{j=1}^r hat p_j C_q(hat kappa_j) exp{hat kappa_j x^T hat mu_j}.

The h_ROT is a simple bandwidth selector that will work better when the regression function m is close to m^Q_theta. Since this assumption may not be realistic in many situations, it can be considered the starting point for more sophisticated selectors based on plug-in ideas. Under homoscedasticity, there are several possibilities, described by Ruppert et al. (1995), mimicking in the regression context the ideas given by Sheather and Jones (1991) for the density. A first possibility is to estimate the unknown quantity in the denominator of h_AMISE, assuming w(x) = w_0(x) f(x),

B = integral_{Omega_q} B_p(x)^2 f(x) w_0(x) omega_q(dx),

by a blocked quadratic fit (Härdle and Marron, 1995). This estimator could be computed from N estimates of the form (8.3) obtained on N subsamples of the ordered sample, i.e. m^Q_{hat theta_j} obtained from X_j = {X_{(j-1)t+1}, ..., X_{jt}}, t = n/N, j = 1, ..., N. Then, an estimate of B is

hat B^Q(N) = (1/n) sum_{i=1}^n sum_{j=1}^N B^Q_{p, hat theta_j}(X_i)^2 w_0(X_i) 1{X_i in X_j}

and the estimate of the variance is obtained as

hat sigma^2(N) = (1/(n - 2(q+1)N)) sum_{i=1}^n sum_{j=1}^N (Y_i - m^Q_{hat theta_j}(X_i))^2 1{X_i in X_j}.

The choice of the tuning parameter N can be done by the Bayesian Information Criterion (BIC). Another possibility to improve h_ROT is to focus on the local linear case p = 1 and estimate B by

hat B(g, nu) = (1/n) sum_{i=1}^n tr(H_{hat m_{g,nu}}(X_i))^2 w_0(X_i), with nu >= 2,

being tr the trace operator and H_{hat m_{g,nu}} the Hessian matrix of a projected local polynomial estimator hat m_{g,nu}. This induces a new problem of choosing a suitable bandwidth g for estimating the functional B, which can be done by means of the bandwidth g_AMSE that minimizes the Asymptotic Mean Squared Error (AMSE) of hat B(g, nu). This expression depends on higher order derivatives of m, which should be estimated by a parametric fit of higher order (cubic or quartic), constructed from a blocked cubic or quartic fit. Further research on these two bandwidth selectors will be required and, particularly, on the extension of the projected local estimator of Chapter 7 to higher polynomial orders.

Finally, adaptations of the classical cross-validatory bandwidth selectors can also be defined in this setting. The usual Cross-Validation (CV) bandwidth is given by

h_CV = arg min_{h>0} (1/n) sum_{i=1}^n (Y_i - hat m_{h,p}^{-i}(X_i))^2,

whose computation can be optimized using that hat m_{h,p}^{-i}(X_i) = sum_{j != i} W_n^p(X_i, X_j) Y_j / (1 - W_n^p(X_i, X_i)), by the theory of linear smoothers (Hastie and Tibshirani, 1990). Then it is only necessary to compute the hat matrix H = (W_n^p(X_i, X_j))_{ij} once for each bandwidth h, without refitting, yielding

h_CV = arg min_{h>0} (1/n) sum_{i=1}^n ( (Y_i - hat m_{h,p}(X_i)) / (1 - H_{ii}) )^2.

In addition, a bandwidth selector based on the Generalized Cross-Validation (GCV) criterion (see Hastie and Tibshirani (1990)) can be defined as

h_GCV = arg min_{h>0} (1/n) sum_{i=1}^n ( (Y_i - hat m_{h,p}(X_i)) / (1 - tr(H)/n) )^2.

In order to compare the available selectors h_ROT, h_CV and h_GCV and the possible improvements of h_ROT, an extensive simulation study, similar to the one carried out for the density case, will be performed. For the design of the simulation study three components must be chosen: the regression function m, the generating process of the directional covariate X and the noise structure in the response, sigma(X) epsilon. To account for different combinations, twelve regression models with directional covariate and scalar response, six directional densities and two kinds of noise are proposed. Figure 8.1 represents the collection of regression models with the following codification: the radius of the q-sphere represents the response m(x) for a direction x. As the responses of the models lie in a bounded interval, a translation is applied so that the minimum response is mapped to a zero radius and the maximum response to a fixed positive radius; the graphs then show the response as a distortion from a perfect circle or sphere, for a sequence of regression models of increasing complexity. Figure 8.2 shows the six kinds of directional densities, encoded as D1 to D6 and with increasing complexity: D1 is the uniform model, the following ones are mixtures of the directional models used in the density simulation studies, and the last ones are single multimodal models. The noise considered is epsilon ~ N(0, 1), with two different conditional variances: a constant one (homoscedastic) and one that depends on x through small circle densities f_SC (heteroskedastic). Finally, the combination of these three components is shown in Table 8.1. With these simulation scenarios a large variety of situations for homoscedasticity and heteroskedasticity is covered, with different degrees of complexity: from simple linear models (the first scenarios) to highly non-linear ones (the last ones), and from uniform or unimodal design densities (D1, D2) to complex multimodal ones (D5, D6). The error criterion proposed for measuring the performance of each selector is the Averaged Squared Error (ASE), defined as

ASE(h, p) = (1/n) sum_{i=1}^n (m(X_i) - hat m_{h,p}(X_i))^2,
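To make the comparison concrete, a minimal sketch of the cross-validatory selector exploiting the linear smoother identity is given below, written for the local constant fit with unnormalized von Mises weights on a bandwidth grid. The function name is hypothetical and no safeguards against near-zero leverages are included.

h_cv_nw <- function(X, Y, h_grid) {
  # Local constant (p = 0) cross-validation using the hat matrix shortcut
  cv <- sapply(h_grid, function(h) {
    K <- exp((tcrossprod(X) - 1) / h^2)   # K[i, j] proportional to L((1 - X_i'X_j)/h^2)
    H <- K / rowSums(K)                   # hat matrix of the Nadaraya-Watson smoother
    r <- Y - c(H %*% Y)                   # residuals of the full fit
    mean((r / (1 - diag(H)))^2)           # leave-one-out criterion
  })
  h_grid[which.min(cv)]
}
# The GCV variant replaces diag(H) by its average tr(H)/n in the denominator, and
# the ASE criterion above is simply mean((m_true_vals - m_hat_vals)^2).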

Figure 8.1: Directional regression models proposed for the simulation study. From left to right and top to bottom, models used in scenarios S1 to S12, with the first three rows showing the circular versions and the last three the spherical ones.

Figure 8.2: Directional densities to be considered in the simulation study. From left to right and top to bottom, the first two columns represent the directional densities D1 to D6 in the circular case; the corresponding spherical versions are in the third and fourth columns. For each density, a moderate-size sample is drawn.

where m is the true regression function. This criterion can be seen as the Monte Carlo version of the conditional MISE with weight w(x) = f(x). However, unlike the latter, the ASE does not require computing an integral, avoiding numerical instabilities when the design density is almost zero and the regression function takes large values (this happens for the projected local linear estimator, as well as for the usual Euclidean local linear estimator). For M replicates, the ASE values will be averaged over the different scenarios, for several sample sizes n, several dimensions q, the two estimators (p = 0, 1) and all the bandwidth selectors. The improvements related to the h_ROT bandwidth, the comparison of all the selectors in a simulation study and the examination of the empirical results constitute the remaining issues to be addressed in this future work.

8.2 Kernel density estimation under rotational symmetry

A common feature among the most popular distributions in directional statistics is rotational symmetry. For example, in the circular case most of the parametric distributions proposed in the

statistical literature are symmetric around an angle, although skewed distributions have gained certain attention in recent years (see Pewsey (2006) and Abe and Pewsey, among others). Directional distributions that are rotationally symmetric include the von Mises, specific cases of the projected normal (Pukkila and Rao, 1988) and the directional skew normal and directional Cauchy defined earlier in the thesis. Furthermore, axial distributions (that is, with density f satisfying f(x) = f(-x), x in Omega_q) can also be regarded as rotationally symmetric, with a location parameter that is not uniquely defined. Some examples of axial distributions are the Watson (Mardia and Jupp, 2000) and small circle (Bingham and Mardia, 1978) distributions.

Table 8.1: Simulation scenarios for comparing the bandwidth selectors. The twelve scenarios combine regression functions of increasing complexity (constant, linear, linear-quadratic, interaction, ratio, exponential-logarithmic, and functions built from von Mises, directional Cauchy, skew normal and trigonometric components), fixed parameter values, the design densities D1 to D6, and homoscedastic or heteroskedastic noise; f_SN denotes the skew normal density of Azzalini (1985), while the rest of the notation follows earlier chapters.

In general, a directional random variable X with density f is said to be rotationally symmetric around a location theta in Omega_q if the corresponding density is of the form

f(x) = g(x^T theta), x in Omega_q,

where g: [-1, 1] -> [0, infinity) is a function such that omega_{q-1} integral_{-1}^{1} g(t)(1 - t^2)^{q/2 - 1} dt = 1. This class of distributions was first proposed by Saw (1978), requiring g to be monotone increasing, a necessary assumption to avoid identifiability issues but one that excludes the class of axial distributions. The rotational symmetry property above, which amounts to saying that f(x) depends on x only through x^T theta, is intimately related to the tangent-normal decomposition around theta:

x = (x^T theta) theta + (1 - (x^T theta)^2)^{1/2} B_theta xi,

where xi in Omega_{q-1} and B_theta = (b_1, ..., b_q)_{(q+1) x q} is the semi-orthonormal matrix resulting from the completion of theta to the orthonormal basis {theta, b_1, ..., b_q}. Using this decomposition, it is possible to define the rotsymmetrizer operator around theta, which transforms any directional density into a rotationally symmetric one. For a function f: Omega_q -> R, the rotsymmetrizer operator around theta is

S_theta f(x) = (1/omega_{q-1}) integral_{Omega_{q-1}} f((x^T theta) theta + (1 - (x^T theta)^2)^{1/2} B_theta xi) omega_{q-1}(dxi).   (8.5)

For each point x in Omega_q, the rotsymmetrizer operator computes the average value of the function f over the (q-1)-sphere orthogonal to the location theta obtained from the tangent-normal decomposition, guaranteeing that the final output is a rotationally symmetric density around theta. The next result collects some of the properties of this operator.

Proposition 8.1 (Properties of the rotsymmetrizer operator). Let f, f_1, f_2: Omega_q -> R be directional densities and theta in Omega_q. The operator (8.5) has the following properties:
i. Invariance with respect to B_theta: for two different semi-orthonormal matrices B_{theta,1} and B_{theta,2} orthogonal to theta, if S_{theta,1} and S_{theta,2} denote the rotsymmetrizer operators constructed from B_{theta,1} and B_{theta,2}, respectively, then S_{theta,1} f = S_{theta,2} f.
ii. Linearity: S_theta(lambda_1 f_1 + lambda_2 f_2) = lambda_1 S_theta f_1 + lambda_2 S_theta f_2, for lambda_1, lambda_2 in R.
iii. Density preservation: S_theta f is a density.
iv. Rotational symmetry: S_theta f is rotationally symmetric around theta. If f is rotationally symmetric around theta, then S_theta f = f.
v. For the von Mises density,

S_theta f_vM(x; mu, kappa) = C_q(kappa) exp{kappa (x^T theta)(theta^T mu)} / ( omega_{q-1} C_{q-1}(kappa (1 - (x^T theta)^2)^{1/2} (1 - (mu^T theta)^2)^{1/2}) ).

Recall that if theta = +/- mu, then S_theta f_vM(x; mu, kappa) = f_vM(x; mu, kappa). If mu^T theta = 0, then S_theta f_vM(x; mu, kappa) = C_q(kappa) / (omega_{q-1} C_{q-1}(kappa (1 - (x^T theta)^2)^{1/2})).
vi. Particular case q = 1: denoting theta = (theta_1, theta_2) and theta_perp = (-theta_2, theta_1), then

S_theta f(x) = (1/2) { f((x^T theta) theta + (1 - (x^T theta)^2)^{1/2} theta_perp) + f((x^T theta) theta - (1 - (x^T theta)^2)^{1/2} theta_perp) }.

From the rotsymmetrizer definition and its properties, it is easy to derive a rotational kernel density estimator intended for use when rotational symmetry holds. For a random sample {X_i}_{i=1}^n from a directional random variable, applying the rotsymmetrizer to the usual kernel density estimator (see the earlier chapters on kernel density estimation) yields

hat f_{h,theta}(x) = S_theta hat f_h(x) = (1/n) sum_{i=1}^n L_{h,theta}(x, X_i),   (8.6)

where

L_{h,theta}(x, X_i) = (c_{h,q}(L)/omega_{q-1}) integral_{Omega_{q-1}} L( (1 - ((x^T theta) theta + (1 - (x^T theta)^2)^{1/2} B_theta xi)^T X_i)/h^2 ) omega_{q-1}(dxi).

In addition, due to property v in Proposition 8.1, for the von Mises kernel

L_{h,theta}(x, X_i) = C_q(1/h^2) exp{(x^T theta)(theta^T X_i)/h^2} / ( omega_{q-1} C_{q-1}((1 - (x^T theta)^2)^{1/2} (1 - (X_i^T theta)^2)^{1/2}/h^2) ),

which provides a closed expression for the rotational kernel density estimator that does not require numerical integration.
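A hypothetical R sketch of the estimator (8.6) for the circular case and the von Mises kernel is given next; it uses the closed form that follows from property v (which for q = 1 reduces to a hyperbolic cosine), rewritten in an overflow-safe way for small bandwidths. The function name is illustrative and kappa = 0 is not handled.

rot_kde_circ <- function(x, X, theta, h) {
  # Rotational kernel density estimator (8.6) on the circle (q = 1), vM kernel.
  # x, theta: unit vectors in R^2; X: n x 2 matrix of unit vectors; h: bandwidth.
  kappa <- 1 / h^2
  t0 <- sum(x * theta)                              # x' theta
  u <- c(X %*% theta)                               # theta' X_i
  s <- sqrt(pmax(1 - t0^2, 0) * pmax(1 - u^2, 0))
  # C_1(kappa) exp(kappa * t0 * u) cosh(kappa * s), scaled by exp(-kappa) in both
  # numerator and denominator to avoid overflow
  li <- (exp(kappa * (t0 * u + s - 1)) + exp(kappa * (t0 * u - s - 1))) /
        (4 * pi * besselI(kappa, 0, expon.scaled = TRUE))
  mean(li)
}
# If theta is unknown, it could be replaced by the normalized sample mean (the MLE
# under a monotone g): theta_hat <- colMeans(X); theta_hat / sqrt(sum(theta_hat^2))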

The estimator (8.6) can be regarded as a semiparametric estimator of the unknown rotationally symmetric density, as it depends on the location parameter theta. Therefore, either theta is known or it is unknown. In the latter case, theta has to be estimated first, for example by the Maximum Likelihood Estimator (MLE), given by the normalized directional sample mean hat theta_MLE = sum_{i=1}^n X_i / ||sum_{i=1}^n X_i|| if g is monotone (Duerinckx and Ley). Future research will involve the development of an estimator that holds for a non-monotone g and that can deal with the identifiability issue. Furthermore, bandwidth selection rules for this estimator can be obtained by adapting the classical Likelihood Cross-Validation (LCV) and Least Squares Cross-Validation (LSCV) rules:

h_LSCV = arg min_{h>0} CV(h),   CV(h) = integral_{Omega_q} hat f_{h,theta}(x)^2 omega_q(dx) - (2/n) sum_{i=1}^n hat f_{h,theta}^{-i}(X_i),
h_LCV = arg max_{h>0} CV_KL(h),   CV_KL(h) = sum_{i=1}^n log hat f_{h,theta}^{-i}(X_i).

The bias of the estimator (8.6) follows under regularity assumptions similar to the ones used in previous chapters (see also Appendix A), yielding

E[hat f_{h,theta}(x)] = S_theta f(x) + (b_q(L)/q) tr(S_theta Hf(x)) h^2 + o(h^2),

uniformly in x in Omega_q, where S_theta acts componentwise on the Hessian Hf. If f is rotationally symmetric around theta, then the bias is the same as for the usual kernel density estimator, since in that case S_theta f(x) = f(x) and S_theta Hf(x) = S_theta [g''(x^T theta) theta theta^T] = g''(x^T theta) theta theta^T = Hf(x). The main advantage of the rotational kernel density estimator is the improvement in variance, which is conjectured to be reduced by a factor of omega_{q-1} with respect to the original kernel density estimator. The derivation of the variance expression, as well as of the bias and variance when theta is estimated, will constitute the next steps in the development of the theory for this estimator. An idea of the performance of the rotational estimator can be seen in Figure 8.3.

Finally, it is worth remarking that the rotational symmetry assumption has played an important role in the development of new methodology for directional data, as can be seen for example

204 8 Chapter 8. Future research in Ley et al., Ley et al. and Ley and Verdebout. Therefore, it is natural to ask whether this assumption holds or not for a directional random variable, to see if these new methodological tools including the rotational kernel density estimator itself can be applied. An interesting future work will be to test the null hypothesis of rotational symmetry { H : f R = g T θ : θ Ω q, g :,,, } ω q gt t q =, with θ either known or unknown. Following previous approaches, a possible test statistic from a random sample could be R n = ˆfh x ˆf h,ˆθx ωq dx. Ω q Figure 8.: From right to left: vmθ, κ density, rotational kernel density estimators ˆf hlcv,θ and ˆf hlcv,ˆθ MLE and usual kernel density estimator ˆf hlcv. Sample size is n =, θ = q, and κ =. A decomposition of this statistic for studying its asymptotic distribution could be the following: R n = ˆfh x L h fx + L h,θ fx ˆf h,θ x + ˆf h,θ x ˆf Ω h,ˆθx ωq dx q = ˆfh x L h fx ωq dx + L h,θ fx ˆf h,θ xx ωq dx Ω q + Ω q ˆfh,θ x ˆf h,ˆθx ωq dx ˆfh x L h fx L h,θ fx ˆf h,θ xx ω q dx Ω q Ω q Ω q ˆfh x L h fx ˆfh,θ x ˆf h,ˆθx ω q dx Ω q L h,θ fx ˆf h,θ xx ˆfh,θ x ˆf h,ˆθx ω q dx

205 8.. R package DirStats 85 = R n, + R n, + R n, + R n, + R n,5 + R n,6, where the smoothing operators L h and L h,θ are included to remove asymptotic bias and are defined as L h fx = L h x, yfy ω q dy, Ω q L h,θ fx = L h,θ x, yfy ω q dy, Ω q where L h x, y = c h,q LL x T y and L h h,θ was defined in 8.6. For understanding the previous splitting, it is important to note that L h,θ fx = L h S θ fx H = L h fx, x Ω q. The asymptotics of R n, were given by Zhao and Wu, whereas it is expected that R n, presents a similar limit distribution to R n, and that R n, contributes on the asymptotic bias. On the other hand, R n,, R n,5 and R n,6 are very likely to be negligible in probability if ˆθ θ = O P n holds without requiring that g is monotone. The answers to these unknowns and the design of a bootstrap mechanism to calibrate the test statistic under H will constitute a new research line. 8. R package DirStats All the methods proposed in the thesis have been implemented in R R Development Core Team, using calls to FORTRAN to speed up critical computations, tested in simulation studies and applied to real data examples. As the source codes of these methods have not been made public yet, an obvious future step is to collect, organize and document the code in an R package. There exist libraries to analyse circular data in R, such as CircStats Lund and Agostinelli,, circular Agostinelli and Lund,, CircNNTSR Fernández-Durán and Gregorio-Domínguez,, isocir Barragán et al., and NPCirc Oliveira et al.,. Since these libraries are only designed to handle circular data, the focus of the DirStats package will be on basic and advanced methods to deal with directional and linear data. A preliminary skeleton of the DirStats package, organized by topics, is given below: A. Class dir and related functions. A new class named dir, intended for directional data, will be the basis for all the methods employing a directional input. The class will verify if the squared norm of the elements is indeed one, store the dimension, sample size and information about the coordinate system. In addition to the typical functions linked to a class, others like dir.to.rad, rad.to.cir, rad.to.sph, sph.to.aitoff and aitoff.to.sph will allow to make conversions between the most important coordinate systems. B. Descriptive statistics. Basic descriptive measures given in Mardia and Jupp such as the directional mean mean.dir, the mean resultant length R.dir and the moment of inertia MI.dir will be included. The function summary.dir will collect these statistics and maybe more advanced ones, such as the quantiles for rotational symmetric directional data described in Ley et al.. C. Graphical functions. The main graphic function in this topic will be plot.dir, that creates representations for circular, spherical, cylindrical and toroidal data from modifications of codes coming from the plotrix Lemon, 6 and rgl Adler et al.,
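As a complement to the closed-form von Mises expression given earlier in this section, the following R sketch illustrates how the rotational kernel density estimator could be computed in the circular case q = 1, together with the standardized directional sample mean as a plug-in for θ. It is only a minimal sketch under the stated assumptions: the function names rot_kde_circ and theta_mle are hypothetical and do not belong to any existing package, and the data-generating process in the example is merely illustrative.

```r
## Minimal sketch (hypothetical functions, not the thesis implementation):
## rotational von Mises kernel density estimator on the circle (q = 1), using the
## closed form of property v specialised to q = 1, i.e. the average of the kernel
## at x and at its reflection with respect to the axis theta.

theta_mle <- function(X) {
  # Standardized directional sample mean (MLE of the location when g is monotone)
  m <- colMeans(X)
  m / sqrt(sum(m^2))
}

rot_kde_circ <- function(x, X, h, theta) {
  # x: evaluation points (m x 2 matrix of unit vectors); X: sample (n x 2 matrix);
  # h: bandwidth; theta: axis of rotational symmetry (unit vector in R^2).
  kappa <- 1 / h^2
  a <- drop(x %*% theta)   # x^T theta for each evaluation point
  b <- drop(X %*% theta)   # X_i^T theta for each observation
  # Closed form: C_1(kappa) * exp(kappa * a * b) * cosh(kappa * sqrt(1 - a^2) * sqrt(1 - b^2));
  # the exponentially scaled Bessel function avoids overflow of besselI for large kappa.
  C1 <- 1 / (2 * pi * besselI(kappa, nu = 0, expon.scaled = TRUE))
  K <- exp(kappa * (outer(a, b) - 1)) *
    cosh(kappa * outer(sqrt(pmax(1 - a^2, 0)), sqrt(pmax(1 - b^2, 0))))
  C1 * rowMeans(K)
}

## Illustrative usage (the sample below is only a crude stand-in for circular data)
set.seed(1)
ang <- rnorm(200, mean = pi / 2, sd = 0.5)
X <- cbind(cos(ang), sin(ang))
grid <- seq(0, 2 * pi, length.out = 200)
fhat <- rot_kde_circ(cbind(cos(grid), sin(grid)), X, h = 0.4, theta = theta_mle(X))
```

For general q the same construction would apply with the C_{q−1} factor of the closed form above playing the role of the cosh term.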

8. R package DirStats

All the methods proposed in the thesis have been implemented in R (R Development Core Team), using calls to FORTRAN to speed up critical computations, tested in simulation studies and applied to real data examples. As the source code of these methods has not been made public yet, an obvious future step is to collect, organize and document the code in an R package. There exist libraries to analyse circular data in R, such as CircStats (Lund and Agostinelli), circular (Agostinelli and Lund), CircNNTSR (Fernández-Durán and Gregorio-Domínguez), isocir (Barragán et al., 2013) and NPCirc (Oliveira et al.). Since these libraries are only designed to handle circular data, the focus of the DirStats package will be on basic and advanced methods to deal with directional and linear data. A preliminary skeleton of the DirStats package, organized by topics, is given below (a minimal illustrative sketch of two of the planned functions is shown after the list).

A. Class dir and related functions. A new class named dir, intended for directional data, will be the basis for all the methods employing a directional input. The class will verify that the squared norm of the elements is indeed one, and will store the dimension, the sample size and information about the coordinate system. In addition to the typical functions linked to a class, others like dir.to.rad, rad.to.cir, rad.to.sph, sph.to.aitoff and aitoff.to.sph will allow conversions between the most important coordinate systems.

B. Descriptive statistics. Basic descriptive measures given in Mardia and Jupp (2000), such as the directional mean (mean.dir), the mean resultant length (R.dir) and the moment of inertia (MI.dir), will be included. The function summary.dir will collect these statistics and possibly more advanced ones, such as the quantiles for rotationally symmetric directional data described in Ley et al.

C. Graphical functions. The main graphic function in this topic will be plot.dir, which creates representations for circular, spherical, cylindrical and toroidal data from modifications of code coming from the plotrix (Lemon, 2006) and rgl (Adler et al.) packages. Second-level graphical functions will be points.dir and lines.dir, to add information in any of the four possible manifolds. The function jitter.dir will allow avoiding the plotting of repeated points by perturbing the data in any of the manifolds.

D. Nonparametric estimation tools. The estimation procedures are divided into two parts, considering the density and the regression functions.

D.1. Density estimation. Kernel density estimation on Ω_q (density.dir, with a flag rot=TRUE to account for the rotational estimator of Section 8.), on Ω_q × R (density.dir.lin) and on Ω_q × Ω_q (density.dir.dir) will be the key functions in this part. Bandwidth selection for the directional kernel density estimator will be performed by bw.dir, with option type=c("emi","ami","lcv","lscv","rt") for choosing among the selectors studied in the thesis and flag rot=TRUE to account for the rotational versions. Cross-validation bandwidths for the other situations will be available through the functions bw.dir.lin and bw.dir.dir, both with option type=c("lcv","lscv","rt").

D.2. Regression estimation. The function locreg.lin.dir will implement the projected local regression estimator of a linear variable Y on a directional predictor X given in Chapter 7, with an option p specifying the type of local fit (local constant or local linear). Bandwidth selection will be done by means of the function bw.reg, with type=c("gcv","cv","rt","pi") providing the kind of selector discussed in Section 8.. It could also be possible to implement the regression estimators with directional response and linear (locreg.dir.lin) or directional (locreg.dir.dir) predictor, both proposed in Di Marzio et al. (2014).

E. Nonparametric testing tools. The testing methods are also divided into two main subtopics, depending on the kind of curve that generates them.

E.1. Tests for the density. The goodness-of-fit tests for parametric models of Boente et al. (directional) and of Chapter 6 (directional-linear and directional-directional) will be included, using bootstrap resampling, through gof.dir.test, gof.dir.lin.test and gof.dir.dir.test, respectively. The independence tests given in Chapter 5, based on permutations for calibration, will be available in indep.dir.lin.test and indep.dir.dir.test. A possible future inclusion is the test for rotational symmetry presented in Section 8. (rotsym.test). All the tests will be computed both in a deterministic grid of bandwidths (significance trace) and for data-driven bandwidths. The test for the Johnson and Wehrly copula structure for circular-linear and circular-circular random variables outlined in Section 8. could be added as the function gof.jw.

E.2. Tests for the regression. The goodness-of-fit tests for parametric regression functions, including the test for significance, that were studied in Chapter 7 will be accessible through the gof.reg.test function, with an option p controlling the kind of local estimator used in the fit. The test is performed in a grid of bandwidths and also for data-driven bandwidths.

F. Parametric models. Different collections of ready-to-use density and regression models, including simulation and fitting methods, are included in this part.

F.1. Density. The collection of directional parametric models used in the simulation studies will be provided by the functions ddir (density), rdir (simulation) and fitdir (fitting), with an option type indicating the model. Similarly, the collections of circular-linear and circular-circular parametric densities used in Chapter 6 will be accessible via dcirlin, dcircir, rcirlin, rcircir, fitcirlin and fitcircir.

F.2. Regression. The collection of parametric regression models proposed in Section 8. will be implemented by the functions freg (regression function), fitreg (estimation of parameters) and rreg (simulation), all of them using an option type to select the regression function. The simulation routine rreg will depend on the functions noise.dir.lin and rdir.

G. Datasets. The datasets described in the thesis will be included in the package or linked to the original source. The object windso will store the hourly measures of wind direction and SO2 concentration. The wind and hipparcos objects will either contain or download the corresponding wind data and the Hipparcos dataset (van Leeuwen, 2007). The fires object will contain either the complete Portuguese wildfires dataset considered in Chapter 5 or the short version used in Chapter 6. Finally, the slashdot and slashdot.dtm objects will store the original dataset and the document term matrix used in Chapter 7. The protein dataset from Chapter 6 is available via ProteinsAAA in the CircNNTSR package.

H. Auxiliary functions. This part comprises, among others: numerical integration rules on Ω_q, Ω_q × R and Ω_q × Ω_q; deterministic grids on the circle, sphere, cylinder and torus; functions to control multiple 3D devices; and specific circular methods not available in other packages and included for comparison, such as the correlation coefficients and tests of Mardia (1976), Johnson and Wehrly (1977) and Fisher and Lee (1981) considered in the thesis.
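As a minimal illustration of the kind of functions listed under topics A and B, the sketch below shows a possible dir-style constructor together with the directional mean and the mean resultant length. The names as.dir, mean.dir and R.dir follow the planned skeleton, but the bodies are only illustrative and are not an existing implementation of the package.

```r
## Illustrative sketch only: a basic "dir" constructor and two descriptive
## statistics from topic B (directional mean and mean resultant length).

as.dir <- function(x, tol = 1e-8) {
  # Coerce a numeric matrix to directional data: each row must be a unit vector
  x <- as.matrix(x)
  stopifnot(all(abs(sqrt(rowSums(x^2)) - 1) < tol))
  structure(x, class = c("dir", "matrix"))
}

mean.dir <- function(x, ...) {
  # Directional mean: standardized vector of column means (Mardia and Jupp, 2000)
  m <- colMeans(unclass(x))
  m / sqrt(sum(m^2))
}

R.dir <- function(x) {
  # Mean resultant length: norm of the vector of column means
  sqrt(sum(colMeans(unclass(x))^2))
}

## Example on the circle (q = 1)
ang <- c(0.10, 0.30, 6.20, 0.05)
X <- as.dir(cbind(cos(ang), sin(ang)))
mean.dir(X)  # close to (1, 0) for these concentrated angles
R.dir(X)     # close to 1, indicating high concentration
```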

8. A goodness-of-fit test for the Johnson and Wehrly copula structure

One of the starting points of this thesis was the Johnson and Wehrly (1978) model for circular-linear densities and the Wehrly and Johnson (1979) model for circular-circular densities, which were introduced in the introduction of the thesis (and references therein) and recently studied by Jones et al. Essentially, this model assumes that the underlying copula (Nelsen, 2006) has a specific semiparametric structure, indexed by the set of all possible circular densities. A natural question to ask is whether this semiparametric structure is suitable for modelling a given circular-linear or circular-circular variable. In the particular case where the marginal densities and the link function g belong to certain parametric families, this question can be answered by the goodness-of-fit tests given in Chapter 6. However, if these functions are not fixed, the interest lies in checking whether the relation between both variables could be modelled by a copula belonging to the Johnson and Wehrly family.

Denote by (X_1, X_2) any combination of circular and/or linear variables with copula C, by U_j = F_j(X_j) the ranks of X_j, j = 1, 2, with F_j the corresponding cumulative distribution function (cdf), and by 𝒞 the class of copula functions. Testing the copula structure of Johnson and Wehrly is equivalent to testing

    H_{0,q} : C ∈ 𝒞_{JW,q} = { C ∈ 𝒞 : ∂²C(u_1, u_2)/(∂u_1 ∂u_2) = 2π g(2π(u_1 + q u_2)), g a circular density },

with q = ±1 the index for the two possible kinds of correlation, positive and negative, between U_1 and U_2. Recall that a circular density g : R → [0, ∞) should satisfy: (i) ∫_0^{2π} g(θ) dθ = 1 and (ii) g(θ + 2kπ) = g(θ), for all θ ∈ [0, 2π) and k ∈ Z. Note also that independence (when g is the circular uniform density) is included in both H_{0,1} and H_{0,−1}.

A possible approach to assess whether H_0 holds is to consider the empirical version of the copula C and compare it with an empirical copula under H_0, leading to an empirical process (van der Vaart and Wellner, 1996). If the weak convergence of this empirical process is proved, then the asymptotic distribution of continuous functionals such as the Kolmogorov-Smirnov and Cramér-von Mises ones will follow easily from the continuous mapping theorem, providing two suitable goodness-of-fit tests. The empirical copula defined from a sample {(X_{1i}, X_{2i})}_{i=1}^n is given by

    C_n(u_1, u_2) = (1/n) Σ_{i=1}^n 1{ F_{1n}(X_{1i}) ≤ u_1, F_{2n}(X_{2i}) ≤ u_2 },

where F_{jn}(x) = (1/n) Σ_{i=1}^n 1{X_{ji} ≤ x} is the empirical cdf of X_j, j = 1, 2. It can easily be seen that C_n is not a copula (Nelsen, 2006). Under H_{0,q}, the copula can be analytically computed from its density, yielding

    C_G(u_1, u_2) = ∫_0^{u_1} ∫_0^{u_2} 2π g(2π(s + q t)) dt ds
                  = (1/q) ∫_0^{u_1} [ G'(2π(s + q u_2)) − G'(2π s) ] ds
                  = (1/(2πq)) [ G(2π(u_1 + q u_2)) − G(2π u_1) − G(2π q u_2) + G(0) ]
                  = (1/(2πq)) [ G(2π(u_1 + q u_2)) − G(2π u_1) − G(2π q u_2) ],    (8.7)

where G is the function defined by G(x) = ∫_0^x ∫_0^s g(t) dt ds, that is, the integral from 0 to x ∈ R of the cdf associated with the circular density g. Based on properties of the circular density g, namely that ∫_0^s g(t + 2πk) dt = ∫_0^s g(t) dt and ∫_0^{2πk} g(t) dt = k with k ∈ Z, it happens that the function G satisfies

    G(x + 2πk) = G(x) + G(2πk) + kx,  for all x ∈ R, k ∈ Z.    (8.8)

This is an important relation that helps in proving that C_G is indeed a copula. Then, in view of expression (8.7), an estimator of C_G will arise by plugging in a suitable estimator G_n of G, which must satisfy relation (8.8). To account for this, recall another important fact related to g: if the copula of (U_1, U_2) is C_G, then 2π(U_1 + q U_2) (mod 2π) is a circular random variable with density g. This motivates an estimator of G by means of the integration of the empirical cdf of the pseudo-observations {Θ_i}_{i=1}^n, where Θ_i = 2π(F_{1n}(X_{1i}) + q F_{2n}(X_{2i})) (mod 2π), i = 1, …, n, using the fact that

    ∫_0^x (1/n) Σ_{i=1}^n 1{Θ_i ≤ s} ds = (1/n) Σ_{i=1}^n 1{Θ_i ≤ x} (x − Θ_i).

Then, a possible estimator G_n of G that satisfies (8.8) can be constructed as

    G_n(x) = (1/n) Σ_{i=1}^n 1{Θ_i ≤ x} (x − Θ_i),     x ∈ [0, 2π),
    G_n(x) = G_n(x − 2π) + G_n(2π) + (x − 2π),          x ∈ [2π, 4π],
    G_n(x) = G_n(x + 2π) + G_n(−2π) − (x + 2π),         x ∈ [−2π, 0),
    G_n(x) = G_n(x + 4π) + G_n(−4π) − 2(x + 4π),        x ∈ [−4π, −2π),

where G_n(−2π) = 2π − G_n(2π), G_n(4π) = 2 G_n(2π) + 2π and G_n(−4π) = 8π − G_n(4π). The explanation of this estimator is the following: using the modulus 2π, all the data {Θ_i}_{i=1}^n belong to the interval [0, 2π), where G is estimated by G_n and later extended to the remaining intervals by means of relation (8.8). The empirical copula C_{G_n} has two main advantages over C_n when H_{0,q} holds: first, it is always a copula, for any sample size n; second, it seems more efficient in estimating C_G (see Figure 8.). Future research will involve the study of the properties of C_{G_n}, such as its bias and variance.

Figure 8.: From right to left: contour plots of the copula C_G, the copula estimator C_{G_n} and the empirical copula C_n, for a simulated sample with g(θ) = (2π I_0(κ))^{−1} exp{κ cos(θ − μ)}, a circular von Mises density with μ = π.

The empirical copula process for assessing H_{0,q} can be written as

    ℂ_n = n^{1/2} (C_n − C_{G_n}) = n^{1/2} (C_n − C) + n^{1/2} (C − C_G) + n^{1/2} (C_G − C_{G_n}) = ℂ_{n,1} + ℂ_{n,2} + ℂ_{n,3}.

The weak convergence of ℂ_{n,1} was established by Fermanian et al. (2004), whereas ℂ_{n,2} is a deterministic bias term that vanishes if H_{0,q} holds. The study of the weak convergence of ℂ_{n,3}, its relation with ℂ_{n,1} and an effective resampling procedure for the calibration of the test statistic under H_{0,q} are the main parts of this future work.
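To fix ideas on the quantities just introduced, the following R sketch computes the pseudo-observations Θ_i, the empirical copula C_n and the basic piece of G_n on [0, 2π). It is only a minimal sketch: the extension of G_n to the remaining intervals through relation (8.8), and hence the construction of C_{G_n}, is omitted, and the function names (pseudo_ranks, emp_copula, Gn_basic) are hypothetical.

```r
## Illustrative sketch only (hypothetical helper names): pseudo-observations,
## empirical copula and the [0, 2*pi) piece of G_n.

pseudo_ranks <- function(x) ecdf(x)(x)   # F_{jn}(X_{ji}) for the j-th margin

emp_copula <- function(u1, u2, U1, U2) {
  # C_n(u1, u2): proportion of pairs with both pseudo-ranks below (u1, u2)
  mean(U1 <= u1 & U2 <= u2)
}

Gn_basic <- function(x, Theta) {
  # G_n(x) on [0, 2*pi): integrated empirical cdf of the pseudo-observations,
  # i.e. (1/n) * sum of 1{Theta_i <= x} * (x - Theta_i)
  mean((Theta <= x) * (x - Theta))
}

## Example with a circular-linear pair (X1 circular, X2 linear) and q = 1
set.seed(1)
n <- 100
X1 <- runif(n, 0, 2 * pi)
X2 <- rnorm(n)
U1 <- pseudo_ranks(X1)
U2 <- pseudo_ranks(X2)
Theta <- (2 * pi * (U1 + U2)) %% (2 * pi)  # Theta_i with q = 1
emp_copula(0.5, 0.5, U1, U2)
Gn_basic(pi, Theta)
```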

References

Abe, T. and Pewsey, A. (2011). Sine-skewed circular distributions. Statist. Papers, 52.

Adler, D., Murdoch, D., and others. rgl: 3D visualization device system (OpenGL). R package.

Agostinelli, C. and Lund, U. R package circular: Circular Statistics.

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scand. J. Statist., 12:171–178.

Barragán, S., Fernández, M. A., Rueda, C., and Peddada, S. D. (2013). isocir: an R package for constrained inference using isotonic regression for circular data, with an application to cell biology. J. Stat. Softw., 54.

Bingham, C. and Mardia, K. V. (1978). A small circle distribution on the sphere. Biometrika, 65.

Boente, G., Rodríguez, D., and González-Manteiga, W. (2014). Goodness-of-fit test for directional data. Scand. J. Stat., 41.

Di Marzio, M., Panzera, A., and Taylor, C. C. (2014). Nonparametric regression for spherical data. J. Amer. Statist. Assoc., 109:748–763.

Duerinckx, M. and Ley, C. (2012). Maximum likelihood characterization of rotationally symmetric distributions on the sphere. Sankhya A, 74.

Fermanian, J.-D., Radulović, D., and Wegkamp, M. (2004). Weak convergence of empirical copula processes. Bernoulli, 10:847–860.

Fernández-Durán, J. J. and Gregorio-Domínguez, M. M. CircNNTSR: an R package for the statistical analysis of circular data using Nonnegative Trigonometric Sums (NNTS) models. R package.

Fisher, N. I. and Lee, A. J. (1981). Nonparametric measures of angular-linear association. Biometrika, 68.

Härdle, W. and Marron, J. S. Fast and simple scatterplot smoothing. Comput. Statist. Data Anal.

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive models, volume 43 of Monographs on Statistics and Applied Probability. Chapman & Hall, London.

Johnson, R. A. and Wehrly, T. (1977). Measures and models for angular correlation and angular-linear correlation. J. Roy. Statist. Soc. Ser. B, 39:222–229.

Johnson, R. A. and Wehrly, T. E. (1978). Some angular-linear distributions and related regression models. J. Amer. Statist. Assoc., 73:602–606.

Jones, M. C., Pewsey, A., and Kato, S. On a class of circulas: copulas for circular distributions. Ann. Inst. Statist. Math., to appear.

Lemon, J. (2006). Plotrix: a package in the red light district of R. R-News, 6:8–12.

Ley, C., Sabbah, C., and Verdebout, T. (2014). A new concept of quantiles for directional data and the angular Mahalanobis depth. Electron. J. Stat., 8.

Ley, C., Swan, Y., Thiam, B., and Verdebout, T. (2013). Optimal R-estimation of a spherical location. Statist. Sinica, 23.

Ley, C. and Verdebout, T. Local powers of optimal one-sample and multi-sample tests for the concentration of Fisher-von Mises-Langevin distributions. Int. Stat. Rev., to appear.

Lund, U. and Agostinelli, C. CircStats: Circular Statistics, from "Topics in circular statistics". R package.

Mardia, K. V. (1976). Linear-circular correlation coefficients and rhythmometry. Biometrika, 63.

Mardia, K. V. and Jupp, P. E. (2000). Directional statistics. Wiley Series in Probability and Statistics. John Wiley & Sons, Chichester, second edition.

Nelsen, R. B. (2006). An introduction to copulas. Springer Series in Statistics. Springer, New York, second edition.

Oliveira, M., Crujeiras, R. M., and Rodríguez-Casal, A. NPCirc: nonparametric circular methods. J. Stat. Softw., to appear.

Pewsey, A. (2006). Modelling asymmetrically distributed circular data using the wrapped skew-normal distribution. Environ. Ecol. Stat., 13.

Pukkila, T. M. and Rao, C. R. Pattern recognition based on scale invariant discriminant functions. Inform. Sci.

R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna.

Ruppert, D., Sheather, S. J., and Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc., 90:1257–1270.

Saw, J. G. (1978). A family of distributions on the m-sphere and some hypothesis tests. Biometrika, 65:69–73.

Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. Ser. B, 53:683–690.

van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes: With applications to statistics. Springer Series in Statistics. Springer, New York.

van Leeuwen, F. (2007). Hipparcos, the new reduction of the raw data, volume 350 of Astrophysics and Space Science Library. Springer, Dordrecht.

Watson, G. S. (1983). Statistics on spheres, volume 6 of University of Arkansas Lecture Notes in the Mathematical Sciences. John Wiley & Sons, New York.

Wehrly, T. E. and Johnson, R. A. Bivariate models for dependence of angular observations and a related Markov process. Biometrika, 67.

Zhao, L. and Wu, C. (2001). Central limit theorem for integrated square error of kernel estimators of spherical density. Sci. China Ser. A, 44.

213 Appendix A Supplement to Chapter 6 This supplement is organized as follows. Section A. contains the detailed proofs of the required technical lemmas used to prove the main results in the paper. The section is divided into four subsections to classify the lemmas used in the CLT of the ISE, the independence test and the goodness-of-fit test, with an extra subsection for general purpose lemmas. Section A. presents closed expressions that can be used in the independence test, the extension of the results to the directional-directional situation and some numerical experiments to illustrate the convergence to the asymptotic distribution. Section A. describes in detail the simulation study of the goodnessof-fit test to allow its reproducibility: parametric models employed, estimation and simulation methods, the construction of the alternatives, the bandwidth choice and further results omitted in the paper. Finally, Section A. shows deeper insights on the real data application. A. Technical lemmas A.. CLT for the ISE Lemma A. presents a generalization of Theorem in Hall 98 for degenerate U-statistics that, up to the authors knowledge, was first stated by Zhao and Wu under different conditions, but without providing a formal proof. This lemma, written under a general notation, is used to prove asymptotic convergence of the ISE when the variance is large relative to the bias nφh, gh q g and when the bias is balanced with the variance nφh, gh q g δ. Lemma A.. Let {X i } n i= be a sequence of independent and identically distributed random variables. Assume that H n x, y is symmetric in x and y, E H n X, X X = almost surely and E HnX, X <, n. A. Define G n x, y = E H n x, X H n y, X and ϕ n, satisfying E ϕ n X = and E ϕ nx <. Define also: M n X = E ϕ n X H n X, X X, A n = ne ϕ nx + n E MnX B n = ne ϕ nx + n E + n E H nx, X HnX, X If A n B n as n and U n = n i= ϕ n X i + i<j n H n X i, X j, B n U n d N, n E G nx, X,

214 9 Appendix A. Supplement to Chapter 6 Note that when ϕ n, U n is an U-statistic and Theorem in Hall 98 is a particular case of Lemma A.. Proof of Lemma A.. To begin with, let consider the sequence of random variables {Y ni } n i=, defined by { ϕn X, i =, Y ni = ϕ n X i + i j= H nx i, X j, i n. This sequence generates a martingale S i = i j= Y nj, i n with respect to the sequence of random variables {X i } n i=, with differences Y n i and with S n = U n. To see that S i = i j= Y nj, i n is indeed a martingale with respect to {X i } n i=, recall that i+ i+ j E S i+ X,..., X i = E ϕ n X j X,..., X i + E H n X j, X k X,..., X i j= j= k= i i j = ϕ n X j + H n X j, X k j= j= k= = S i because of the null expectations of E ϕ n X and E H n X, X X. The main idea of the proof is to apply the martingale CLT of Brown 97 see also Theorem. of Hall and Heyde 98, in the same way as Hall 98 did for the particular case where ϕ n. Theorem of Brown 97 ensures that if the conditions C. lim n s n C. s n Vn n i= p, E Yn i { Yni >εs n} =, ε >, are satisfied, with s n = E U n and V n = n i= E Y n i X,..., X i, then s n U n d N,. The aim of this proof is to prove separately both conditions. From now on, expectations will be taken with respect to the random variables X,..., X n, except otherwise is stated. Proof of C. The key idea is to give bounds for E Yn i and prove that s ni= n E Yn i as n. In that case, the Lindenberg s condition C follows immediately: lim n s n n i= E Yn i { Yni >εs n} lim n s n In order to compute s n = E U n, it is needed n i= = ε lim n s n =. E Yn i ε s n n i= E Y n i E Y n i = { E ϕ n X, i =, E ϕ nx i + i E H nx, X, i n,

215 A.. Technical lemmas 95 where the second case holds because the independence of the variables, the tower property of the conditional expectation and A. ensure that E ϕ n X H n X, X = E H n X, X H n X, X =. Using these relations and the null expectation of ϕ n X, it follows that for j k, k E Y nj Y nk = E ϕ n X j E ϕ n X k + E ϕ n X j H n X k, X l + =. j m= l= k E ϕ n X k H n X j, X m + j l= m= E H n X k, X l H n X j, X m Then: s n = ne ϕ nx n + j E j= HnX, X = O B n. A. On the other hand, E Y n i i = E ϕ n X i + H n X i, X j j= = O E ϕ i nx i + O E H n X i, X j = O E ϕ nx + i i O + i O E j= E HnX, X H nx, X H nx, X where the equalities are true in virtue of Lemma A. and because E H n X, X H n X, X H n X, X H n X, X 5 = E Hn X, X H n X, X =. Finally, n i= Then, joining A. and A., and C is satisfied., E Y n i = no E ϕ nx + nn O E HnX, X + n no E G nx, X = O A n. A. s n n i= E Y n i = O Bn A n n

216 96 Appendix A. Supplement to Chapter 6 Proof of C. Now it is proved the convergence in squared mean of s n Vn to, which implies that s n Vn p, by obtaining bounds for E Vn. First of all, let denote Vn = n i= ν ni, where ν ni = E Y n i X,..., X i = E ϕ i i i nx i + ϕ n X i H n X i, X j + H n X i, X j H n X i, X k X,..., X i = E ϕ nx i = E ϕ nx i + j= i + j= j= i i j= k= M n X j + E H n X i, X j H n X i, X k X j, X k j= k= i M n X j + G n X j, X j + j= j<k i G n X j, X k. Using Lemma A., the Jensen inequality and that for j k, j k, E G nx, X, j = k = j = k, E G E G n X j, X k G n X j, X k = n X, X, j = k j = k, E G nx, X, j = j < k = k,, otherwise, it follows: E ν n i = O E ϕ nx i i i i + O E M n X j M n X k j= k= + O E G n X j, X j G n X k, X k + j= k= = O E j <k i j <k i O E G n X j, X k G n X j, X k ϕ nx + i O E MnX + i i O E M n X + i O E G nx, X + i i O E G n X, X + i i O E G nx, X. Applying again the Lemma A., E Vn n = E ν ni = i= n i= O E ν n i. By the two previous calculus and bearing in mind that E G n X, X = O E HnX, X by the Cauchy-Schwartz inequality and E M n X = by the tower property, it yields: E Vn = no E ϕ nx + nn O E MnX + nn n O E HnX, X + nn n O E G nx, X = O A n.

217 A.. Technical lemmas 97 Then, using the bound for E V n, that s n = B n and that E V n = s n, it results E Then s n s n Vn V = s n E n s n = s n E Vn s n Vn converges to in squared mean, which implies s n Vn Lemma A.. Under conditions A A, n φh, g In, d N,. s n E Vn = O Bn A n. p. Proof of Lemma A.. The asymptotic normality of I n, = n i= I i n, will be derived checking the Lindenberg s condition. To that end, it is needed to prove the following relations: E I i n, =, E I i E n, = O I i n, n h 8 + g 8 = n φh, g + o,, s n = O n h 8 + g 8 where s n = n I i i= E n, and φh, g is defined as in Theorem 6.. If these relations hold, the Lindenberg s condition is satisfied: s n lim n s n n I i E { } n, I i =, ε > n, >εsn i= n I i n E { } n, I i E n, >εsn i= i= I = ε i ne n, = ε O n. I i ε n, s n, O n h 8 + g 8 Therefore s n I n, d N,, which, by Slutsky s theorem, implies that n φh, g In, d N,. In order to prove the moment relations for I i n, and bearing in mind the smoothing operator 6., let denote Ĩ i n, = c h,ql x T X i LK ng h, z Z i E ˆfh,g x, z fx, z dz ω q dx, g = n LK h,g E ˆfh,g X i, Z i fx i, Z i so that I i n, = Ĩi n, EĨi i n,. Therefore, E I n, = and Ĩi n, can be decomposed in two addends by virtue of Lemma A.: Ĩ i n, = bq L n LK h,g tr H x fx i, Z i h + µ K H z fx i, Z i g + o n h + g q = Ĩi, n, + Ĩi, n, + o n h + g,

218 98 Appendix A. Supplement to Chapter 6 where Ĩi,j n, = δ jlk h,g ϕ j f, X i, Z i and ϕ j f, x, z = { tr Hx fx, z, j =, H z fx, z, j =, δ j = { bql q h n, j =, µ Kg n, j =. Note that as the order o h + g is uniform in x, z Ω q R, then it is possible to extract it from the integrand of Ĩi n,. Applying Lemma A. to the functions ϕ jf,,, that by A are uniformly continuous and bounded, it yields LK h,g ϕ j f, y, t ϕ j f, y, t uniformly in y, t Ω q R as n. So, for any integers k and k : lim n δ k δ k E = lim = = E Ĩi, n n, k Ĩi, k n, LK h,g ϕ f, y, t k LK h,g ϕ f, y, t k fy, t dt ω q dy ϕ f, y, t k ϕ f, y, t k fy, t dt ω q dy ϕ f, X, Z k ϕ f, X, Z k. Here the limit can commute with the integral by the Dominated Convergence Theorem DCT, since the functions LK h,g ϕ j f, y, t k are bounded by condition A and the construction of the smoothing operator 6., being this dominating function integrable: LK h,g ϕ f, y, t k LK h,g ϕ f, y, t k fy, t sup ϕ f, x, z k ϕ f, x, z k fy, t. Recapitulating, the relation obtained is: E Ĩi, n, k Ĩi, n, Now it is easy to prove: E I i n, k k n k +k b ql k Ĩi, E = E E q k x,z µ K k h k g k E tr H x f, X, Z k H z fx, Z k. n, + Ĩi, Ĩi, Ĩi, n, E n, + E n, Ĩi, Ĩi, Ĩi, n, + E n, E n, Ĩi, n, Ĩi, Ĩi, Ĩi, Ĩi, n, E n, E n, E n, n bq L q Var tr H x f, X, Z h + µ K Var H z fx, Z g + b qlµ K Cov tr H x f, X, Z, H z fx, Z h g q = n φh, g. With the previous expression, it follows E I n, i = O n h + g see the first point of Lemma A. and s n = n φh, g + o = O n h + g. Then by the fourth point of Lemma A.: s n = s n = O n h + g = O n h 8 + g 8,

219 A.. Technical lemmas 99 E I i n, = O E Ĩi n, + E Ĩi n, = O E Ĩi n,, where E Ĩi n, = O E Ĩi n, + E Ĩi n, = O n h 8 + g 8. Lemma A.. Under conditions A A, I n, = E I n, + O P n h q g = λ ql λ q L RK nh q + O P n h q g. g Proof of Lemma A.. To prove the result the Chebychev inequality will be used. To that end, the expectation and variance of I n, = c h,ql ni= I i n g n, have to be computed. But first recall that, by Lemma A. and 6., for i and j naturals, x LK j T y h, z t ϕ i y, t dt ω q dy h q gλ q L j ϕ i x, z, g A. uniformly in x, z Ω q R, with ϕ a uniformly continuous and bounded function and λ q L j = ω q q L j rr q dr. The following particular cases of this relation are useful to shorten the next computations: i. E LK x T X h ii. LK x T y h, z Z g h q gλ q Lfx, z,, z t g dz ωq dx h q gλ q L RK. Expectation of I n,. The expectation is divided in two addends, which can be computed by applying the relations i ii: E I i n, = E = = E LKn x, z, X, Z dz ω q dx E LK x T X h, z Z dz ω q dx g dz ω q dx x T X E LK h, z Z g x LK T X h, z Z dz ω q dx h q g λ q L Rf + o g = h q gλ q L RK + O Therefore, the expectation of I n, is h q g. E I n, = λ ql nh q g h q gλ q L RK + O h q g = λ ql λ q L RK nh q + O n. g

220 Appendix A. Supplement to Chapter 6 Variance of I n,. For the variance it suffices to compute its order, which follows considering the third point of Lemma A.: E I i n, = = O { I i, n, + Ii, n,, where the involved terms are { I i, x n, = LK T y h { I i, x T X n, = E LK h LK n x, z, y, t dz ω q dx} fy, t dt ω q dy, z t dz ω q dx} fy, t dt ω q dy, g dz ω q dx, z Z g Using relations i ii the orders of the addends I i,k n,, k =,, follow easily: I i, { n, h q gλ q L RK} fy, t dt ωq dy } fy, t dt ω q dy. = h q g λ q L RK, { I i, n, h q gλ q Lfx, z dz ω q dx} fy, t dt ω q dy = h q g λ q L Rf. Therefore I i, n, = O h q g, I i, n, = O h q g and E I i n, The variance of I n, is so by Chebychev s inequality which, by definition, is Var I n, n c h,q L g n i= E i, = O I n, I i n, = O n h q g, { P I n, E I n, kn h q g }, k >, k i, +O I n, = O h q g. I n, = E I n, + O P n h q g = λ ql λ q L RK nh q + O P n h q g, g because O n = O P n h q g. Lemma A.. Let be H n x, z, y, t = LK n u, v, x, z LK n u, v, y, t dv ω q du, G n x, z, y, t = E H n X, Z, x, z H n X, Z, y, t, M n X, Z = c h,ql n g E I n, H n X, Z, X, Z X, Z.

221 A.. Technical lemmas Then, under conditions A A, E Hn X, Z, X, Z = h q g λ q L σ + o, A.5 E Hn X, Z, X, Z = O h 5q g 5, A.6 E G n X, Z, X, Z = O h 7q g 7, A.7 E MnX, Z = O n 6 h + g h q g. A.8 Proof of Lemma A.. The proof is divided in four sections. Proof of A.5. E H n X, Z, X, Z can be split into three addends: E Hn X, Z, X, Z = E LK n x, z, X, Z LK n x, z, X, Z dz ω q dx = E LK n x, z, X, Z LK n x, z, X, Z = = LK n y, t, X, Z LK n y, t, X, Z dz ω q dx dt ω q dy = A A + A, E LK n x, z, X, Z LK n y, t, X, Z dz ω q dx dt ω q dy E x, z, y, t E x, z, y, t dz ω q dx dt ω q dy where: x T X E x, z, y, t = E LK h, z Z y T X LK g h, t Z, g x T X E x, z, y, t = E LK h, z Z y T X E LK g h, t Z. g The dominant term of the three is A, which has order O h q g, as it will be seen. The terms A and A have order O h q g, which can be seen applying iteratively the relation A.: A = E x, z, y, te x, z, y, t dt ω q dy dz ω q dx h q gλ q LLK x T u h h q g λ q L fx, zfy, t h q g λ q L fx, z dz ω q dx, A =, z t fy, t g dt ω q dy dz ω q dx L x, z, y, t dt ω q dy dz ω q dx

222 Appendix A. Supplement to Chapter 6 = h q g λ q L Rf. h q g λ q L fx, z fy, t dt ω q dy dz ω q dx Let recall now on the term A. In order to clarify the following computations, let denote by x, x, y, y and z, z the three variables in Ω q R that play the role of x, z, y, t and u, v, respectively. The addend A in this new notation is: x T z A = LK h, x z y T z LK g h, y z g fz, z dz ω q dz dy ω q dy dx ω q dx. The calculus of A will be divided in the cases q and q =. There are several changes of variables involved, which will be detailed in i iv. To begin with, let suppose q : A i = Ω q R Ω q t +τ < R t LK, x z st τ s LK f h g tx + τb q ξ + h t τ A ξ η, z, y z g t τ q ii = dz dt dτ ω q dη s q dy ds ω q dξ dx ω q dx LK f Ω q h R r + ρ h rρ θ Ω q h R rρ h r h ρ LK ρ, x z g, y z g h ρx + h ρ h ρ θb x ξ + θ Aξ η, z θ q h q ρ h q ρ h ρ h ρ dz dθ dρ ω q dη h q r q h r q h dy dr ω q dξ dx ω q dx h h iii = h q g LK ρ, u LK f Ω q r + ρ h rρ θ R Ω q rρ h r h ρ R, u + v h ρx + h ρ h ρ θb x ξ + θ Aξ η, x ug θ q dv dr ω q dξ dx ω q dx ρ h q ρ du dθ dρ ω q dη r q h r q

223 A.. Technical lemmas iv h q g Ω q R Ω q R LK ρ, u LK r + ρ θ rρ, u + v f x, x θ q ρ q du dθ dρ ω q dη r q dv dr ω q dξ dx ω q dx = h q g Rfω q ω q q r q R R ρ q LKρ, u θ q LK r + ρ θrρ, u + v dθ dρ du dv dr = h q g λ q L σ. The steps for the calculus of the case q are the following: i. Let x a fixed point in Ω q, q. Let be the change of variables: y = sx + s Bx ξ, ω q dy = s q ds ω q dξ, where s,, ξ Ω q and B x = b,..., b q q+ q is the semi-orthonormal matrix B T x B x = I q and B x B T x = I q+ xx T resulting from the completion of x to the orthonormal basis {x, b,..., b q } of R q+. Here I q represents the identity matrix with dimension q. See Lemma of García-Portugués et al. b for further details. Consider also the other change of variables z = tx + τb x ξ + t τ Aξ η, ω q dz = t τ q dt dτ ω q dη, where t, τ,, t + τ <, η Ω q and A ξ = a,..., a q q+ q is the semi-orthonormal matrix A T ξ A ξ = I q and A ξ A T ξ = I q+ xx T B x ξξ T B T x resulting from the completion of {x, B x ξ} to the orthonormal basis {x, B x ξ, a,..., a q } of R q+. This change of variables can be obtained by replicating the proof of Lemma in García- Portugués et al. b with an extra step for the case q. With these two changes of variables, y T z = st + τ s, x T B x ξ = x T A ξ η = B x ξ T A ξ η =. ii. Consider first the change of variables r = s and then h { ρ = t, h τ t, τ θ =, ρ, θ = h ρ h ρ hρ h ρ With this last change of variables, τ = hθ ρ h ρ, t = h ρ and, as a result: s = h r h r, t = h ρ h ρ, t τ = θ h ρ h ρ, st τ s h. = r + ρ h rρ θ rρ h r h ρ.

224 Appendix A. Supplement to Chapter 6 iii. Use u = x z g and v = y x g. iv. By expanding the square, A can be written as where A = h q g Ω q R Ω q R Ω q ϕ n x, x, r, ρ, θ, u, v, ξ, η ϕ n x, x, r, ρ, θ, u, v, ξ, η du dθ dρ ω q dη du dθ dρ ω q dη dv dr ω q dξ dx ω q dx, A.9 ϕ n x, x, r, ρ i,θ i, u i, v, ξ, η i = L ρ i L r + ρ i h rρ i θ rρ i h r h ρ i K u i K u i + v f x, x + α h,g θi q ρ q i h ρ i q r q h r q,h r,h ρ i, with α h,g = h ρ i x + h ρ i h ρ i θb x ξ + θi A ξ η i, ui g and i =,. A first step to apply the DCT is to see that by the Taylor s theorem, f x, x + α h,g = fx, x + O α T h,g fx, x, where the remaining order is O h ρ i + g u i fx, x because α h,g = h ρ i + g u i. Furthermore, the order is uniform for all points x, x because of the boundedness assumption of the second derivative given by A see the proof of Lemma A.. Next, as h, g, then the order becomes o ρ i + u i fx, x. R For bounding the directional kernel L, recall that by completing the square, h r h ρ i = h r + ρ i + h r + ρ i / h r + ρ i / rρ i h r + ρ i. Using this, and the fact that θ,, for all r, ρ i, h, r + ρ i h rρ i θ rρ i h r h ρ i r + ρ i h rρ i rρ i h r h ρ i r + ρ i h rρ i rρ i h r + ρ i = r + ρ i rρ i + h rρ i r + ρi rρ i r + ρ i rρ i,

225 A.. Technical lemmas 5 where the last inequality follows because the last addend is positive by the inequality of the geometric and arithmetic means. As L is a decreasing function by A, L r + ρ i h rρ i θ rρ i h r h ρ i L r + ρ i rρ i. Then for all the variables in the integration domain of A, ϕ n x, x, r, ρ i, θ i, u i, v, ξ, η i L ρ i L r + ρ i rρ i K u i K u i + v fx, x + o ρ i + u i fx, x θi q ρ q i q r q q, r, ρ i = Ψx, x, r, ρ i, θ i, u i, v. The product of functions ϕ n in A.9 is bounded by the respective product of functions Ψ. The product is also integrable as a consequence of assumptions A integrability of f and f, A integrability of kernels and that the product of integrable functions is integrable. To prove it, recall that by the integral definition of the modified Bessel function of order q see equation.. of Olver et al. : πγ q θ q dθ = Γ q <, q. The integral of the linear kernel is proved to be finite using the Cauchy-Schwartz inequality and A: Ku Ku + vku Ku + v du du dv R R R = Ku Ku Ku + vku + v dv du du R R R Ku Ku µ K µ K du du R R = µ K. For the directional situation, the following auxiliary result based on A is needed: L r ρi r q dr = = = O Using this and that Lρρ q 5 dr λ q+ q k= L s s + ρ i q s dr q L s s k q k ρ i k= λ k+ L ρ q k i ρ q i. L <, it follows: Lρ Lρ L r + ρ rρ L r + ρ rρ dr

226 6 Appendix A. Supplement to Chapter 6 = Then, by the DCT, ρ q ρ q = O. r q dr dρ dρ L A h q g Lρ Lρ ρ q ρ q r ρ L Lρ Lρ ρ q ρ q O Ω q R r ρ ρ q O Ω q r q dr ρ q LK ρ, u LK r + ρ θ rρ, u + v f x, x θ q ρ q du dθ dρ ω q dη r q dv dr ω q dξ dx ω q dx, because all the functions involved are continuous almost everywhere. R dρ dρ dρ dρ Turn now to the case q =. As before, the details of the case q = are explained in vi ix: vi t A = LK Ω R Ω R Ω R h, x z g st t s B x ξ T A x η LK h, y z g vii = f tx + t Ax η, z t dz dt ω dη s dy ds ω dξ dx ω dx h h LK ρ, x z Ω R Ω R Ω R g LK r + ρ h rρ rρ h r h ρ f B x ξ T A x η, y z g h ρx + h ρ h ρ A x η, z h ρ h ρ h dz dρ ω dη h r h r h dy dr ω dξ dx ω dx viii = h g LK f Ω R Ω h r + ρ h rρ R Ω h R LK ρ, u rρ h r h ρ h ρx + h ρ h ρ A x η, x ug B x ξ T A x η, u + v

227 A.. Technical lemmas 7 ρ h ρ du dρ ω dη r h r dv dr ω dξ dx ω dx h h vi = h g LK Ω R Ω R r + ρ h rρ + R LK ρ, u rρ h r h ρ, u + v f h ρx + h ρ h ρ B x ξ, x ug + LK r + ρ h rρ rρ h r h ρ, u + v f h ρx h ρ h ρ B x ξ, x ug ρ h ρ du dρ r h r dv dr ω dξ dx ω dx ix h g LK Ω R Ω R r + ρ + rρ, u + v + LK R LK ρ, u r + ρ rρ, u + v f x, x ρ du dρ r dv dr ω dξ dx ω dx = h g Rf LK r R R r + ρ + rρ, u + v + LK du dρ dv dr = h q g λ q L σ. The steps used for the calculus are the following: ρ LK ρ, u r + ρ rρ, u + v vi. Let x a fixed point in Ω q. For q =, let be the changes of variables y = sx + s Bx ξ, z = tx + t Ax η, ω dy = s q ds ω dξ, ω dz = t q dt ω dη, where s, t, and B x and A x are two semi-orthonormal matrices whose q columns are vectors that extend x to an orthonormal basis of R q+. Note that as q = and x T B x ξ = x T A x η =, then necessarily B x ξ = A x η or B x ξ = A x η. vii. Let be the changes of variables ρ = t and r = s. With this change, t = h h h ρ and s = h r. Then s = h r h r, t = h ρ h ρ and st s t B x ξ T A x η h = r + ρ h rρ rρ h r h ρ B x ξ T A x η.

228 8 Appendix A. Supplement to Chapter 6 viii. Use u = x z g and v = y x g. ix. A can be written as A = h q g Ω q R R R ϕ n x, x, r, ρ, u, v, ξ ϕ n x, x, r, ρ, u, v, ξ du dρ du dρ dv dr ω q dξ dx ω q dx, where ϕ n x, x, r, ρ i, u i, v, ξ = L ρ i K u i K u i + v f L L r + ρ i h rρ i + rρ i h r h ρ i x, x + α h,g + K u i + v f r + ρ i h rρ i rρ i h r h ρ i x, x + α h,g ρ i h ρ i r h r,h r,h ρ i, with α j h,g = h ρ i x + k j h ρ i h ρ i B x ξ, u i g and k =, k =. As before, by the Taylor s theorem, f x, x + α k h,g = fx, x + o ρ i + u i fx, x, where the order is uniform for all points x, x. By analogous considerations as for the case q, ϕ n x, x, r, ρ i, u i, v, ξ L ρ i L r + ρ i rρ i K u i K u i + v fx, x + o ρ i + u i fx, x ρ i r, r, ρ i = Ψx, x, r, ρ i, u i, v, Then the product of functions ϕ n is bounded by the respective product of functions Ψ, which is integrable, and by the DCT the limit commute with the integrals. Proof of A.6. E H n X, Z, X, Z can be decomposed in the sum of two terms: E Hn X,Z, X, Z = E LK n x, z, X, Z LK n x, z, X, Z dz ω q dx = E x, z, y, t E x, z, y, t dz ω q dx dt ω q dy = O B + B. The calculus of the orders of these terms is analogous to the calculus of A and A : B = E x, z, y, t dt ω q dy dz ω q dx

229 A.. Technical lemmas 9 x h q T u gλ q LLK h, z t fy, t dt ω q dy dz ω q dx g h 5q g 5 λ q L λ q L Rf, B = E x, z, y, t dt ω q dy dz ω q dx = h 8q g 8 λ q L 8 Rf. Then E H n X, Z, X, Z = O h 5q g 5. h 8q g 8 λ q L 8 fx, z fy, t dt ω q dy dz ω q dx Proof of A.7. The notation x, x, y, y, z, z and u, u for variables in Ω q R will be employed again: G n x, x, y, y = H n z, z, x, x H n z, z, y, y fz, z dz ω q dz Therefore: = { E G nx,z, X, Z = { fz, z dz ω q dz. } LK n u, u, x, xlk n u, u, z, z du ω q du } LK n u, u, y, ylk n u, u, z, z du ω q du { LK n u, u, x, xlk n u, u, z, z du ω q du } LK n u, u, y, ylk n u, u, z, z du ω q du fy, yfx, x dy ω q dy dx ω q dx. fz, z dz ω q dz Then, according to the expression of LK n, E G n X, Z, X, Z can be decomposed in 6 summands, which, in view of the symmetric roles of x, x and y, y can be reduced to 9 different summands. The first of all, C, is the dominant and has order O h 7q g 7. Again, the orders are computed using A. iteratively: { C = LK LK u T x h u T y h, u x u T z LK g h, u z du ω q du g u T x, u y g LK h, u x g fz, z dz ω q dz} fy, yfx, x dy ω q dy dx ω q dx du ω q du

230 Appendix A. Supplement to Chapter 6 λ q Lh q glk { x λ q Lh q T z glk h, x z g y T z h, y z } fz, z dz ω q dz g fy, yfx, x dy ω q dy dx ω q dx { y λ q L h q g λ q Lh q T x glk h, y x } fx, x g fy, yfx, x dy ω q dy dx ω q dx λ q L 6 h 6q g 6 λ q L RKh q gfx, xfx, x dx ω q dx = λ q L 6 λ q L RKh 7q g 7 Rf. The rest of them have order O h 8q g 8, something which can be seen by iteratively applying the Lemma A. as before. Proof of A.8. It suffices to apply the tower property, the Cauchy-Schwartz inequality, result E I n, = O n h 8 + g 8 from Lemma A. and A.6: E MnX, Z = c h,ql n g c h,ql n g E E E I n, H n X, Z, X, Z X, Z I H n, n X, Z, X, Z I n, E Hn X, Z, X, Z c h,ql n g E = O nh q g O n h 8 + g 8 = O n 6 h + g h q g. O h 5q g 5 A.. Testing independence with directional data Lemma A.5. Under conditions A A, nh q g T n, RKλ ql λ q L d nh q N, σ. g Proof of Lemma A.5. By the decomposition of I n in the proof of the Theorem 6., T n, = I n, + I n, and therefore by 6.9 and 6., T n, = E I n, + O P n h q g + σn h q g Nn, where N n is asymptotically a normal. On the other hand, by 6.9, E I n, = λ ql λ q L RK nh q g + O n

231 A.. Technical lemmas and then T n, = λ ql λ q L RK nh q g + σn h q g Nn + O P n h q g, because n h q g q = o nh g. As the last addend is asymptotically negligible compared with the second, nh q g T n, RKλ ql λ q L nh q g d N, σ. Lemma A.6. Under independence and conditions A A, E T n, = λ ql λ q L Rf Z nh q + RKRf X ng Var T n, = O n h q + g. + o n h q + g, Proof of Lemma A.6. The term T n, can be decomposed using the relation ˆf h x ˆf g z E ˆfh x E ˆfg z = S x, z + S x, z + S x, z, where: S x, z = S x, z = S x, z = ˆfh x E ˆfh x ˆfg z E ˆfg z, ˆfh x E ˆfh x E ˆfg z, ˆfg z E ˆfg z E ˆfh x. Hence, T n, = + + Sx, z dz ω q dx + S x, z dz ω q dx + S x, z dz ω q dx S x, zs x, z dz ω q dx + = T n, + T n, + T n, + T n, + T 5 n, + T 6 n,. S x, zs x, z dz ω q dx S x, zs x, z dz ω q dx To compute the expectation of each addend under independence, use the variance and expectation expansions for the directional and linear estimator see for example García-Portugués et al. b for both and relation 6.. Recall that due to assumption A it is possible to consider Taylor expansions on the marginal densities that have uniform remaining orders. E E T n, T n, = Var ˆfh x ω q dx Ω q R = O n h q g, = Var ˆfh x Ω q Var ˆfg z dz ω q dx E ˆfg z dz R

232 Appendix A. Supplement to Chapter 6 E T n, = λq L λ q L nh q + O n Rf Z + o = λ ql λ q L Rf Z nh q + o nh q, = Var ˆfg z dz ˆfh x ωq dx R RK = ng = RKRf X ng Ω q E + O n Rf X + o + o ng. The expectation of T n,, T 5 6 n, and T n, is zero because of the separability of the directional and linear components. Joining these results, E T n, = λ ql λ q L Rf Z nh q + RKRf X + o n h q + g, ng because n h q g = o n h q + g. Computing the variance is not so straightforward as the expectation and some extra results are needed. First of all, recall that by the formula of the variance of the sum, the Cauchy-Schwartz inequality and Lemma A., 6 6 Var T n, = Var T i n, = i= i= O Var Then the variance of each addend will be computed separately. For that purpose, recall that by the decomposition of the ISE given in Theorem 6., so by equations 6.9 and 6., T i n,. ˆfh,g x, z E ˆfh,g x, z dz ωq dx = I n, + I n,, Var I n, + I n, = O Var I n, + Var I n, = O n h q g + n h q g = O n h q g, E I n, + I n, = Var I n, + I n, + E I n, + I n, n h q g + nh q g = O = O nh q g. The marginal directional and linear versions of these relations will be required: E Ω q ˆfh x E ˆfh x ωq dx = O nh q,

233 A.. Technical lemmas Then: Var Var Var T n, T n, T n, Var E T E n, = E Ω q Ω q R ˆfg z E ˆfg z dz = O ng, ˆfh x E ˆfh x ωq dx Var R = O n h q, ˆfg z E ˆfg z dz = O n g. ˆfh x E ˆfh x ωq dx E = O nh q O ng = O n h q g, = Var = R Ω q ˆfh x E ˆfh x ωq dx E ˆfg z dz Var = O O n h q = O n h q, = Var = E Ω q R ˆfh x ωq dx = O O n g = O n g. = O E Ω q R R ˆfg z E ˆfg z dz E ˆfg z dz ˆfh x E ˆfh x ωq dx ˆfg z E ˆfg z dz E ˆfh x ωq dx Ω q Var ˆfg z E ˆfg z dz The next results follows from applying iteratively Cauchy-Schwartz and the previous orders: Var T T n, E n, E Sx, z dz ω q dx Sy, t dt ω q dy T T E n, E n, = O n h q, Var T 5 T T n, E n, E n, n g, Var T 6 n, T n, E T n, R

234 Appendix A. Supplement to Chapter 6 = O n. Therefore, the order of Var T n, is O n h q + g since it dominates O n h q g, O n h q + g and O n by assumption A. Lemma A.7. Under independence and conditions A A, E T n, = E T n, and Var T n, = O n h q + g. Proof of Lemma A.7. The term T n, can be split in a similar fashion to T n,. Let denote S x, z = ˆf h,g x, z E ˆfh,g x, z. Then: T n, = = S x, z S x, z + S x, z + S x, z dz ω q dx T n, + T n, + T n,. The key idea now is to use that, under independence, z Z LK n x, z, X, Z = L n x, X K n z, Z + L n x, X E K g x T X + K n z, Z E L, A. where L n and K n are the marginal versions of LK n : x T y x T X z t L n x, y = L h E L h, K n z, t = K E K g h z Z By repeated use of A. in the integrands of T n, and applying the Fubini theorem, it follows: E E T n, T n, = c h,ql n g = c h,ql n g E LK n x, z, X, Z L n x, X K n z, Z dz ω q dx E L n x, X K n z, Z + L n x, X K n z, Z E + L n x, X K n z, Z x T X E L dz ω q dx = c h,ql n g = E T n, = E, = c h,ql n g = c h,ql n g h E L n x, X E K n z, Z dz ω q dx S x, z dz ω q dx E LK n x, z, X, Z L n x, X E K E L n x, X K n z, Z E K z Z g z Z g K g z Z g dz ω q dx.

235 A.. Technical lemmas 5 E T n, + L n x, X K n z, Z E K + L n x, X E K = c h,ql n g = E = E T n,, = c h,ql n g = c h,ql n g z Z x T X E L g h z Z dz ω q dx g E L n x, X z Z E K dz ω q dx g S x, z dz ω q dx x T X E LK n x, z, X, Z K n z, Z E L dz ω q dx E L n x, X K n z, Z E L x T X x T X z Z + L n x, X K n z, Z E L h E K g + K n z, Z x T X E L h dz ω q dx = c h,ql n g E K n z, Z x T X E L h dz ω q dx = E S x, z dz ω q dx = E T n,. Then E T n, = E T n,. Computing the variance is much more tedious: the order obtained by bounding the variances by repeated use of the Cauchy-Schwartz inequality is not enough. Instead of, a laborious decomposition of the term T n, has to be done in order to compute separately the variance of each addend, by following the steps of Rosenblatt and Wahlen 99. The first step is to split the variance using the Cauchy-Schwartz inequality and Lemma A.: Var T n, = O Var T n, + Var T n, h + Var Each of the three terms will be also decomposed into other addends. To simplify their computation the following notation will be employed: CLK n x,z, X, Z ; x, z, X, Z x T = Cov LK X h, z Z, LK g T n, h. x T X h, z Z g and also its marginal versions: x T CL n x, X ; x, X = Cov L X x T, L X, h h

236 6 Appendix A. Supplement to Chapter 6 CK n z, Z ; z, Z = Cov K z Z g, K z Z Term T n,. To begin with, let examine T n, using the notation of LK n, L n and K n : T n, = c h,ql n g n n i= j= LK n x, z, X i, Z i L n x, X j E K g. z Z dz ω q dx. g where the double summation can be split into two summations a single sum plus the sum of the cross terms. Then, where: T, n, = T, n, = Var T n, = c h,ql n g O nvar T, n, LK n x, z, X, ZL n x, XE K + n Var LK n x, z, X, Z L n x, X E K T, n,, z Z dz ω q dx, g z Z dz ω q dx. g The first term is computed by Var T, T, n, E n, = O g E LK n x, z, X, ZL n x, X dz ω q dx = O g E L x T X h x T X LK h, z Z O h q g g O h q dz ω q dx = O g Ωq h LK r, t O h q g L r O h q R h q h r q r q g dz dr ω q dξ = O h q g, where the second equality follows from E LK x T X, z Z h g = O h q g and E L x T X h = O h q, and the third from applying the changes of variables of the proof of Lemma A.. The second addend is Var E LK n x, z, X, Z L n x, X T, n, E K E K LK n x, z, X, Z L n x, X z Z g z Z dz ω q dx dz ω q dx g

237 A.. Technical lemmas 7 = CLK n x, z, X, Z ; x, z, X, Z z Z z Z CL n x, X ; x, X E K E K g g dz ω q dx dz ω q dx CLK n x, z, X, Z ; x, z, X, Z dz ω q dx dz ω q dx O h q g, because CL n x, X ; x, X = O h q by Cauchy-Schwartz and the directional version of Lemma A., and E K z Z g = O g. Also, the integral of the covariance is CLK n x, z, X, Z ; x, z, X, Z dz ω q dx dz ω q dx x T = E LK X h, z Z LK g dz ω q dx dz ω q dx O h q g x T = LK y = O h, z t x T LK y g h h q g fy, t dt ω q dy dz ω q dx dz ω q dx O h q g, x T X h, z Z g, z t g as it follows that the order of the first addend is O h q g by applying i ix in the same way as in the calculus of A in Lemma A. recall that the square in A is not present here and therefore the order is larger. Then Var T, n, = O h q g and as a consequence, Var T n, = c h,ql n g O nh q g + n h q g = O n h q. A. Term T n,. This addend follows analogously from T n,, as the only difference is the swapping of the roles of the directional and linear components: T n, = c h,ql n g n n i= j= with the same decomposition that gives Var T n, x T X LK n x, z, X i, Z i K n z, Z j E L dz ω q dx, = c h,ql n g O nvar T, n, + n Var T, n, h, where: T, x T X n, = LK n x, z, X, ZK n z, ZE L h dz ω q dx,

238 8 Appendix A. Supplement to Chapter 6 T, x T X n, = LK n x, z, X, Z K n z, Z E L h dz ω q dx. Then, by similar computations to those of T n,, Var T, n, = O h q g, Var T, n, = O h q g and Var T n, = c h,ql n g O nh q g + n h q g = O n g. A. Term T n,. This is the hardest part, as it presents more combinations. As with the previous terms, T n, = c h,ql n g n n n i= j= k= LK n x, z, X i, Z i L n x, X j K n z, Z k dz ω q dx. and now the triple summation can be split into five summations Var T n, = c h,ql n 6 g + n Var O nvar T, n,, T, n, + n Var T,a n, + Var T,b n, + Var T,c n, where: T, n, = T,a n, = T,b n, = T,c n, = T, n, = LK n x, z, X, Z L n x, X K n z, Z dz ω q dx, LK n x, z, X, Z L n x, X K n z, Z dz ω q dx, LK n x, z, X, Z L n x, X K n z, Z dz ω q dx, LK n x, z, X, Z L n x, X K n z, Z dz ω q dx, LK n x, z, X, Z L n x, X K n z, Z dz ω q dx. The first term is computed by Var E T, n, = E LK n x, z, X, ZL n x, XK n z, Z dz ω q dx x T X LK h, z Z O h q g g x T X z Z L h O h q K g h = LK r, t O h q g L r O h q Ω q R O g dz ω q dx K t O g h q h r q r q g dr ω q dξ dz

239 A.. Technical lemmas 9 = O h q g, by the same arguments as for T, n,. The fifth addend is Var T, n, E LK n x, z, X, Z L n x, X K n z, Z = LK n x, z, X, Z L n x, X K n z, Z dz ω q dx dz ω q dx CLK n x, z, X, Z ; x, z, X, Z CL n x, X ; x, X CK n z, Z ; z, Z dz ω q dx dz ω q dx O h q g CLK n x, z, X, Z ; x, z, X, Z dz ω q dx dz ω q dx O h q g, again by the same arguments used for T,,a n,. It only remains to obtain the variance of T n,, T,b n, and T,c n,. The first one arises from Var T,a n, E LK n x, z, X, Z L n x, X K n z, Z = LK n x, z, X, Z L n x, X K n z, Z dz ω q dx dz ω q dx CLK n x, z, X, Z ; x, z, X, Z CL n x, X ; x, X CK n z, Z ; z, Z dz ω q dx dz ω q dx = O h q g, in virtue of the assumption of independence and the calculus of Var T, n,. The second one is Var T,b n, E LK n x, z, X, Z L n x, X K n z, Z = LK n x, z, X, Z L n x, X K n z, Z dz ω q dx dz ω q dx E LK n x, z, X, Z LK n x, z, X, Z L n x, X L n x, X CK n z, Z ; z, Z dz ω q dx dz ω q dx = O g E LK n x, z, X, Z L n x, X dz ω q dx = O h q g,

240 Appendix A. Supplement to Chapter 6 where the order of the expectation is obtained again using the change of variables described in the proof of Lemma A., E LK n x, z, X, Z L n x, X dz ω q dx x T X = E LK h, z Z O h q g g x T X L h O h q dz ω q dx h = E LK r, u O h q g L r O h q Ω q R h q h r q r q g du dr ω q dξ = O h q g. The variance of T,c n, is obtained analogously: Var T,c n, E LK n x, z, X, Z L n x, X K n z, Z = LK n x, z, X, Z L n x, X K n z, Z dz ω q dx dz ω q dx E LK n x, z, X, Z LK n x, z, X, Z K n z, Z K n z, Z CL n x, X ; x, X dz ω q dx dz ω q dx = O h q E LK n x, z, X, Z K n z, Z dz ω q dx = O h q g. Then, putting together the variances of T, n,, T,a n,, T,b n,, T,c n, and T, n,, it follows Var T n, = c h,ql n 6 g = c h,ql n 6 g = O O nh q g + n h q g + h q g + h q g + n h q g O n h q g n h q g. Finally, joining A., A. and A., Var T n, = O n h q g + O n h q + O n g = O n h q + g, which proves the lemma. A.

241 A.. Technical lemmas A.. Goodness-of-fit test for models with directional data Lemma A.8. Under H : f = f θ, with θ Θ unknown and conditions A A and A5 A6, nh q g p R n, and nh q g p R n,. Proof of Lemma A.8. Under the null f = f θ, for a known θ Θ. Term R n,. Using a first order Taylor expansion of fˆθ in θ, R n, = LKh,g fθ x, z fˆθx, z dz ω q dx T f θ x, z = LK h,g ˆθ θ dz ω q dx θ θ=θn ˆθ θ f θ x, z LK h,g dz ω q dx θ θ=θn = O P n O P = O P n, where θ n Θ is a certain parameter depending on the sample. The order holds because, on the one hand, ˆθ θ = OP n by assumption A6 and on the other, by A5 and Lemma A., f θ x, z LK h,g dz ω q dx θ θ=θn = f θ x, z dz ω q dx + o θ θ=θn = O P. Therefore, R n, = O P n and, by assumption A, nh q g R n, p. Term R n,. It follows also by a Taylor expansion of second order centred at θ : R n, = c h,ql ng n i= n LK n x, z, X i, Z i LK h,g fθ x, z fˆθx, z dz ω q dx = c h,ql T fx, z LK n x, z, X i, Z i LK h,g ˆθ θ ng i= θ θ=θ + ˆθ T fx, z θ θ θ T ˆθ θ dz ω q dx θ=θn c h,ql n LK n x, z, X i, Z i ˆθ θ fx, z LK h,g ng i= θ θ=θ + ˆθ θ fx, z F LKh,g θ θ T dz ω q dx θ=θn = ˆθ θ R n, + ˆθ θ R n,,

242 Appendix A. Supplement to Chapter 6 where A F stands for the Frobenious norm of the matrix A. By Lemma A. and assumption of A5, R i n, = O ch,q L n P LK n x, z, X i, Z i dz ω q dx, ng i= for i =,. As a consequence of this and of assumption A6, the first addend of R n, dominates the second. The proof now is based on proving that R n, = O P n using the Chebychev inequality and the fact that the integrand of R n, is deterministic. Now recall that E R n, i = and by the proof of A.5 in Lemma A., Var R n, = c h,ql ng E = c h,ql ng LK n x, z, X, Z dz ω q dx E LK n x, z, X, Z LK n y, t, X, Z dz ω q dx dt ω q dy = c h,ql ng E x, z, y, t E x, z, y, t dz ω q dx dt ω q dy = c h,ql ng O h q g = O n, so by the Chebychev inequality, R n, = O P n and as a consequence of A5, Rn, = O P n and nh q g p R n, follows. Lemma A.9. Under the alternative hypothesis 6.6 and conditions A A, A5 and A7, nh q g R p n, and nh q g R p n, R. Proof of Lemma A.9. The convergence in probability is obtained using the decompositions R n, = R n, + R n, and R n, = R n, + R n, + R n,. Terms R n, and R n,. The proofs of nh q g p R n, and nh q g p R n, are analogous to the ones of Lemma A.8 and follow just replacing assumption A6 by A7 and H by H P. Term R n,. Recall that E R n, = and its variance, using the same steps as in the proof of R n, in Lemma A.8, is Var R n, = c h,ql E n h q g x, z, y, t E x, z, y, t LK h,g x, zlk h,g y, t dz ω q dx dt ω q dy = c h,ql O h q g n h q g n = O h q g.

243 A.. Technical lemmas Then, R n, = O q P nh g and nh q g p R n,. Term R n,. Applying Lemma A., R n, = nh q g LK h,g x, z dz ω q dx = R + o nh q g and as a consequence nh q g R n, p R. Term R n,. Applying the Cauchy-Schwartz inequality: Therefore, A.. nh q g R n, R n, nh q g R n, = O P n O P = O P n, R n, = O q P nh g and nh q g R n, = O P h q g p. General purpose lemmas For the proofs of some lemmas, three auxiliary lemmas have been used. Lemma A.. Under conditions A A, for any function ϕ : Ω q R R that is uniformly continuous and bounded, the smoothing operator 6. satisfies sup LK h,g ϕx, z ϕx, z. A. x,z Ω n q R Thus, LK h,g ϕx, z converges to ϕx, z uniformly in Ω q R. Proof of Lemma A.. Let denote D n = LK h,g ϕx, z ϕx, z. Since ϕx, z can be written as c h,ql g Ω LK x T y q R, z t h g ϕy, t dz ωq dx, then D n = c h,q L g c h,ql g D n, + D n,, x T y LK h, z t ϕy, t ϕx, z dt ω q dy g x T y LK h, z t ϕy, t ϕx, z dt ω q dy g where: D n, = c h,ql x T y LK g A δ h, z t ϕy, t ϕx, z dt ω q dy, g D n, = c h,ql x T y LK g Ā δ h, z t ϕy, t ϕx, z dt ω q dy, g { } A δ = y, t Ω q R : max x T y, z t < δ, } A,δ = {y, t Ω q R : x T y < δ,

244 Appendix A. Supplement to Chapter 6 A,δ = {y, t Ω q R : z t < δ} and Āδ denotes the complementary set to A δ for a δ >. Recall that A δ = A,δ A,δ and as a consequence Āδ = Ā,δ Ā,δ. As stated in assumption A, the uniform continuity of the functions defined in Ω q R is understood with respect to the product Euclidean norm, that is x, z = x Ω q + z R, where Ω q = and R =. Nevertheless, given the equivalence between the product -norm and the product -norm, defined as x, z = max x Ωq, z R, and for the sake of simplicity, the second norm will be used in the proof. Then, by the uniform continuity of ϕ, it holds that for any ε >, there exists a δ > such that x, z, y, t Ω q R, x, z y, t < δ = ϕx, z ϕy, t < ε. Therefore the first term is dominated by D n, < ε c h,ql g A δ LK x T y h, z t dt ω q dy ε, g for any ε >, so as a consequence D n, = o uniformly in x, z Ω q R. For the second term, let consider the change of variables introduced in the proof of Lemma A. see Lemma of García-Portugués et al. b for a detailed derivation: { y = ux + u B x ξ, ω q dy = u q du ω q dξ, where u,, ξ Ω q and B x = b,..., b q q+ q is the semi-orthonormal matrix resulting from the completion of x to the orthonormal basis {x, b,..., b q }. Applying this change of variables and then using the standard changes of variables r = u for the first h addend and s = z t g second addend, it follows: D n, = c h,ql x T y LK g Ā δ h, z t ϕy, t ϕx, z dt ω q dy g c h,ql x T y LK g Ā,δ h, z t ϕy, t ϕx, z dt ω q dy g + c h,ql x T y LK g Ā,δ h, z t ϕy, t ϕx, z dt ω q dy g c { h,ql x T y sup ϕy, t LK g y,t Ā,δ h, z t dt ω q dy g x T y + LK Ā,δ h, z t } dt ω q dy g δ sup ϕy, t {c u h,q Lω q L u q du y,t h

245 A.. Technical lemmas 5 } + K s ds δg sup y,t ϕy, t } + K s ds δg { O { λ h,q L ω q q δ q = O O o + o = o, c h,q Lω q u q du sup Lrr q r q r δ /h } u q du sup Lrr q + o r δ /h by relation 6., the fact u q du < for all q and because by assumption A, λ q+ L <, which implies that lim r Lrr q =. Then, D n as n and this holds regardless the point x, z, since ϕ is uniformly continuous, so A. is satisfied and LK h,g ϕx, z converges to ϕx, z uniformly in Ω q R. Lemma A.. Under conditions A A, the bias and the variance for the directional-linear estimator in a point x, z Ω q R is given by E ˆfh,g x, z = fx, z + b ql tr H x fx, z h + q µ KH z fx, zg + o h + g, Var ˆfh,g x, z = λ ql λ q L RK nh q fx, z + o nh q g, g where the remainder orders are uniform. Proof of Lemma A.. The asymptotic expressions of the bias and the variance are given in García-Portugués et al. b. Recalling the extension of f in condition A, the partial derivative of f for the direction x and evaluated at x, z, that is x T x fx, z, is null: x T f + hx, z f x, z f x, z f x, z x fx, z = lim = lim =. h h h h Using this fact, it also follows that x T H x fx, zx =, since x T x xt x fx, z = x T x fx, z + H x fx, zx =. Therefore, the operator Ψ x f, x, z appearing in the bias expansion given in García-Portugués et al. b can be written in the simplified form Ψ x f, x, z = x T x fx, z + q fx, z x T H x fx, zx = q tr H xfx, z, because fx, z represents the directional Laplacian of f the trace of H x fx, z.

246 6 Appendix A. Supplement to Chapter 6 The uniformity of the orders, not considered in the above paper, can be obtained by using the extra-smoothness assumption A and the integral form of the remainder in the Taylor s theorem on f: fy + α fy = α T fy + αt Hfyα + R, with y x, z, α Ω q R and where the remainder has the exact form R = t q+ i,j,k= fx + tαα i α j α k dt q+ x i x j x k 6 M i,j,k= α i α j α k = o α T α, where M is the bound of the third derivatives of f and in the last equality it is used the second point of Lemma A.. Then the remainder does not depend on the point y x, z and following the proofs of García-Portugués et al. b the convergence of the bias and variance is uniform on Ω q R. Lemma A.. Let a n, b n and c n sequences of positive real numbers. Then: i. If a n, b n, then a n b n = o a n + b n. ii. If a n, b n, c n, then a n b n c n = o a n + b n + c n. iii. a i nb j n = O a k n + b k n, for any integers i, j such that i + j = k. iv. a n + b n k = O a k n + b k n, for any integer k. Proof of Lemma A.. The first statement follows immediately from the definition of o, a n b n = o a n + b n : lim n a n b n a n + b n = lim n b n + = a n =. For the second, suppose that, when n, a n = maxa n, b n, c n to fix notation. Then lim n a n b n c n a n + b n + c n lim n a n a n + b n + c n = lim n = a n + b n a + c n =. n a n Let C be a positive constant. The third statement follows from the definition of O, lim n a i nb j n a k n + b k n = lim n j i = an bn + bn an +, a n = o b n, +, b n = o a n, C j +C i, a n Cb n. Then the limit is bounded and a i nb j n = O a k n + b k n. The last statement arises as a consequence of this result and the Newton binomial: a n + b n k = k i= k a k i n b i n = i k i= k O a k n + b k n = O a k n + b k n. i
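To illustrate how the order arithmetic of the preceding lemma is typically used in the bias-variance expansions of this appendix (this worked example is an added illustration under assumed rates, not part of the original proofs): taking a_n = h^2 and b_n = (n h^q g)^{-1/2}, statement (iii) with i = j = 1 and k = 2 absorbs a bias-standard deviation cross term into the leading orders,

  O(h^2) · O((n h^q g)^{-1/2}) = O(h^4 + (n h^q g)^{-1}),

and, provided h → 0 and n h^q g → ∞, statement (i) gives h^4 (n h^q g)^{-1} = o(h^4 + (n h^q g)^{-1}).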

247 A.. Further results for the independence test 7 A. Further results for the independence test A.. Closed expressions Consider K and L a normal and a von Mises kernel, respectively. In this case RK = π, λ q L = π q and λ q L λ q L = π q. Furthermore, it is possible to compute exactly the form of the contributions of these two kernels to the asymptotic variance, resulting: { } γ q λ q L r q ρ q Lρϕ q r, ρ dρ dr = 8π q, { R R } KuKu + v du dv = 8π. Corollary A.. If Lr = e r and K is a normal density, then the asymptotic bias and variance in Theorem 6. are A n = q+ π q+ nh q g Rf Z q π q nh Rf X q π ng, σ I = 8π q+ RfX Rf Z. In addition, if f X = f vm ; µ, κ and f Z is the density of a N m, σ, then Rf X = π q+ κ q I q κi q κ and Rf Z = π σ. Proof of Corollary A.. The expressions for RK, Rf Z and R { R KuKu + v du} dv = 8π follow easily from the convolution properties of normal densities. The expressions for λ q L and λ q L can be derived from the definition of the Gamma function. Similarly, Rf X = C q κ γ q λ q L = e κxt µ ω q dx = C qκ q κ Ω q C q κ = I q π q+ I q κ κ, { 5 π, q =, q π q + Γ q Γ q, A. q >. For q = the contribution of the directional kernel to the asymptotic variance can be computed using A. and ρ e ρ± r rρ dρ = πe Φ r, where Φ is the cumulative distribution function of a N,. Then: { γ λ L r ρ Lρϕ r, ρ dρ} dr = γ λ L = π = 8π. { r e r r e r dr ρ e ρ rρ dρ + ρ e ρ+rρ dρ} dr For q >, the integral with respect to θ is computed from the definition of the modified Bessel function and the integral with respect to ρ is ρ q e ρ I q rρ dρ = q r q e r.

248 8 Appendix A. Supplement to Chapter 6 Using these two facts, it results: { γ q λ q L r q ρ q Lρϕ q r, ρ dρ} dr = q π q + q q Γ Γ { r q ρ q e r+ρ π q Γ rρ q I q rρ dρ} dr = q π q q { } Γ r q e r q r e dr = 8π q. A.. Extension to the directional-directional case Under the directional-directional analogue of condition A, that is, h q,n h q,n c, with < c <, the directional-linear independence test can be directly adapted to this setting, considering the following test statistic: T n = Ω q Ω q ˆfX,Y;h,h x, y ˆf X;h x ˆf Y;h y ωq dy ω q dx. Corollary A. Directional-directional independence test. Under the directional-directional analogues of conditions A A and under the null hypothesis of independence, where nh q hq d Tn A n N, σi, A n = λ q L λ q L λ q L λ q L nh q hq λ q L λ q L Rf Y nh q λ q L λ q L Rf X nh q, and σ I is defined as σ in Corollary 6. but with Rf = Rf X Rf Y. Further, if L and L are the von Mises kernel, A n = q q +q π +q nh q hq Rf Y q π q nh q Rf X q π q nh q and σ I = 8π q +q Rf X Rf Y. If f X and f Y are von Mises densities, Rf X and Rf Y are given as in Corollary A.. Proof of Corollary A.. The proof follows from adapting the proofs of Theorem 6. and Corollary A. to the directional-directional situation.
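For the circular-circular case (q1 = q2 = 1), the test statistic of the directional-directional corollary above can be approximated numerically by integrating, over a grid of the torus, the squared difference between the joint kernel density estimate and the product of the marginal ones. The following hedged R sketch illustrates this with von Mises kernels; the grid size, the bandwidths, the Riemann-sum integration and the kappa = 1/h^2 parametrization are assumptions of the sketch, not the implementation used in the dissertation:

```r
Tn_circ_circ <- function(Theta, Psi, h1, h2, n_grid = 100) {
  n <- length(Theta)
  th <- seq(0, 2 * pi, length.out = n_grid + 1)[-(n_grid + 1)]
  ps <- th
  # von Mises kernel evaluations, kappa = 1 / h^2
  kern <- function(ang, data, h) {
    k <- 1 / h^2
    sapply(ang, function(a) exp(k * cos(a - data)) / (2 * pi * besselI(k, 0)))
  }
  KT <- kern(th, Theta, h1)              # n x n_grid kernel matrix for Theta
  KP <- kern(ps, Psi, h2)                # n x n_grid kernel matrix for Psi
  f_joint <- (t(KT) %*% KP) / n          # joint kernel density estimate on the grid
  f_th <- colMeans(KT)                   # marginal estimate for Theta
  f_ps <- colMeans(KP)                   # marginal estimate for Psi
  diff2 <- (f_joint - outer(f_th, f_ps))^2
  sum(diff2) * (2 * pi / n_grid)^2       # Riemann approximation of the integral
}

# Example under independence
set.seed(1)
Theta <- runif(200, 0, 2 * pi)
Psi <- runif(200, 0, 2 * pi)
Tn_circ_circ(Theta, Psi, h1 = 0.5, h2 = 0.5)
```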

A.. Some numerical experiments

The purpose of this subsection is to provide some numerical experiments that illustrate the degree of misfit between the true distribution of the standardized statistic, approximated by Monte Carlo, and its asymptotic distribution, for increasing sample sizes. For simplicity, independence is assessed in a circular-linear framework (q = 1), with a von Mises density for the circular variable and a normal one for the linear variable. Kernel density estimation is done using von Mises and normal kernels, as in Corollary A.1. A sequence of increasing sample sizes is considered (see Figure A.1 and the supplementary material for the exact values). The bandwidth sequences are taken as h_n = g_n, decaying as a power of n, as a compromise between fast convergence and the avoidance of numerical problems. Figure A.1 presents the histogram of values of n h_n^{q/2} g_n^{1/2} (T_n − A_n) for the different sample sizes, jointly with the p-values of the Kolmogorov-Smirnov test for the distribution N(0, σ_I²) and of the Shapiro-Wilk test for normality. Both tests are significant until a very large sample size is reached. It should be noted that, in practical problems, the use of the asymptotic distribution does not seem feasible, and a resampling mechanism for the calibration of the test is required. This issue is addressed in García-Portugués et al. (2014) by considering a permutation approach. The reader is referred to that paper for the details concerning the practical application.

A. Extended simulation study

Some technical details concerning the simulation study, together with further results, are provided in this section. First, the simulated models considered are described. For constructing the test statistic, parametric estimators as well as simulation methods are required. Different Maximum Likelihood Estimators (MLE) and simulation approaches have been considered, with copulas playing a remarkable role in both problems (see Nelsen, 2006, for a comprehensive review). Some details on the construction of the alternative models and on the bandwidth choice are also given, jointly with extended results showing the performance of the tests in the circular-linear and circular-circular cases for different significance levels.

A.. Parametric models

Two collections of Circular-Linear (CL) and Circular-Circular (CC) parametric scenarios have been considered. The corresponding density contours can be seen in the figures of the paper. For the circular-linear case, the first five models (CL1-CL5) contain parametric densities with independent components and different kinds of marginals, for which estimation and simulation are easily accomplished. The models are based on von Mises, wrapped Cauchy, wrapped normal, normal, log-normal and gamma densities, and mixtures of them. Models CL6-CL7 represent two parametric choices of the model in Mardia and Sutton (1978) for cylindrical variables, which is constructed by conditioning a normal density on a von Mises one. Models CL8-CL9 include two parametric densities of the semiparametric circular-linear model given in Theorem 5 of Johnson and Wehrly (1978). This family is indexed by a circular density g that defines the underlying circular-linear copula density, allowing for flexibility both in the specification of the link density and of the marginals. CL10 is a model from Johnson and Wehrly (1978) which considers an exponential density conditioned on a von Mises.

Figure A.1: Comparison of the asymptotic and empirical distributions of n h_n^{q/2} g_n^{1/2} (T_n − A_n) for increasing sample sizes. Each panel reports the Kolmogorov-Smirnov and Shapiro-Wilk p-values for the corresponding sample size. Black curves represent a kernel estimation from the simulations, green curves represent a normal fit to the unknown density and red curves represent the theoretical asymptotic distribution.
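The kind of comparison summarized in Figure A.1 can be reproduced, once a vector of standardized statistics has been simulated, with a few lines of R. The following is a minimal, hedged sketch: the object T_std, its length and the value of sigma_I are placeholders, not the dissertation's actual simulation output.

```r
# Hypothetical standardized statistics; in the real experiment these would be
# Monte Carlo replicates of n * h^(q/2) * g^(1/2) * (T_n - A_n).
set.seed(1)
sigma_I <- 1                       # placeholder for the asymptotic standard deviation
T_std <- rnorm(1000, sd = sigma_I)

# Kolmogorov-Smirnov test against the asymptotic N(0, sigma_I^2) law
ks.test(T_std, "pnorm", mean = 0, sd = sigma_I)

# Shapiro-Wilk test of normality (valid in R for samples of size up to 5000)
shapiro.test(T_std)
```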

CL11 is constructed considering the QS copula density of García-Portugués et al. (2013a) and cardioid and log-normal marginals. Finally, CL12 is an adaptation of the circular-circular copula density of Kato (2009) to the circular-linear scenario, using an identity matrix in the joint structure and von Mises and log-normal marginals.

The first five models of the circular-circular case (CC1-CC5) also include parametric densities with independent components and different kinds of marginals (von Mises, wrapped Cauchy, cardioid and mixtures of them). Models CC6-CC7 represent two parametric choices of the sine model given by Singh et al. (2002). This model introduces elliptical contours for bivariate circular densities and also allows for certain multimodality. Models CC8-CC9 are two densities of the semiparametric models of Wehrly and Johnson (1979), which are based on the previous work of Johnson and Wehrly (1978) and comprise as a particular case the bivariate von Mises model of Shieh and Johnson (2005). Models CC10-CC11 are two parametric choices of the wrapped normal torus density given in Johnson and Wehrly (1977), a natural extension of the circular wrapped normal to the circular-circular setting. Finally, CC12 employs the copula density of Kato (2009) with von Mises marginals.

Table A.1: Notation for the densities described in Tables A.2 and A.3.

  Normal:            f_N(z; m, σ²) = (2πσ²)^{-1/2} exp{ -(z - m)² / (2σ²) }
  Log-normal:        f_LN(z; m, σ²) = (z (2πσ²)^{1/2})^{-1} exp{ -(log z - m)² / (2σ²) },  z > 0
  Gamma:             f_Γ(z; a, p) = a^p Γ(p)^{-1} z^{p-1} e^{-a z},  z > 0
  Bivariate normal:  f_N(z₁, z₂; m₁, m₂, σ₁², σ₂², ρ) = (2π σ₁ σ₂ (1-ρ²)^{1/2})^{-1}
                       × exp{ -[ (z₁-m₁)²/σ₁² - 2ρ(z₁-m₁)(z₂-m₂)/(σ₁σ₂) + (z₂-m₂)²/σ₂² ] / (2(1-ρ²)) }
  von Mises:         f_vM(θ; μ, κ) = (2π I₀(κ))^{-1} exp{ κ cos(θ - μ) }
  Cardioid:          f_Ca(θ; μ, ρ) = (2π)^{-1} (1 + 2ρ cos(θ - μ))
  Wrapped Cauchy:    f_WC(θ; μ, ρ) = (2π)^{-1} (1 - ρ²) / (1 + ρ² - 2ρ cos(θ - μ))
  Wrapped normal:    f_WN(θ; μ, σ²) = Σ_{p ∈ ℤ} f_N(θ + 2πp; μ, σ²)

The notation and density expressions used for the construction of the parametric models are collected in Table A.1, whereas Tables A.2 and A.3 show the explicit expressions and parameters of the circular-linear and circular-circular models displayed in the figures of the paper. Most of the circular densities considered in the simulation study are purely circular (and hence not directional), and their circular formulation has been used in order to simplify the expressions. The directional notation can be recovered by taking x = (cos θ, sin θ), y = (cos ψ, sin ψ) and μ = (cos μ, sin μ). The distribution function of a circular variable with density f, with θ ∈ [0, 2π), is denoted by F(θ) = ∫₀^θ f(φ) dφ.
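As a quick illustration of two of the circular densities in Table A.1, the following hedged R sketch evaluates the cardioid density and a truncated version of the wrapped normal sum. The truncation level p_max and the toy parameter values are assumptions of the sketch, not choices made in the simulation study.

```r
# Cardioid density, |rho| <= 1/2
dcardioid <- function(theta, mu, rho) (1 + 2 * rho * cos(theta - mu)) / (2 * pi)

# Wrapped normal density, truncating the wrapping sum at |p| <= p_max
dwrapnorm <- function(theta, mu, sigma, p_max = 10) {
  p <- -p_max:p_max
  sapply(theta, function(t) sum(dnorm(t + 2 * pi * p, mean = mu, sd = sigma)))
}

# Both integrate (approximately) to one over [0, 2*pi)
th <- seq(0, 2 * pi, length.out = 1000)
sum(dcardioid(th, mu = pi, rho = 0.3)) * diff(th[1:2])
sum(dwrapnorm(th, mu = pi, sigma = 1)) * diff(th[1:2])
```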

A.. Estimation

In the scenarios considered, MLE for most of the marginal densities are available through specific R libraries. For the normal and log-normal densities closed-form expressions are used, and for the gamma density the fitdistr function of the MASS package (Venables and Ripley, 2002) is employed. The estimation of the von Mises parameters is done exactly for the mean direction and numerically for the concentration parameter, whereas for the wrapped Cauchy and wrapped normal densities the numerical routines of the circular package (Agostinelli and Lund) are used. The MLE for the cardioid density are obtained by numerical optimization. Finally, the fitting of mixtures of normals and of von Mises densities was carried out using the Expectation-Maximization algorithms available in the packages nor1mix (Mächler) and movMF (Hornik and Grün), respectively.

The fitting of the independent models CL1-CL5 and CC1-CC5 is easily accomplished by marginal fitting of each component. For models CL6-CL7, the closed expressions for the MLE given in Mardia and Sutton (1978) are used. For models CL8-CL9, CL11-CL12, CC8-CC9 and CC12, a two-step Maximum Likelihood (ML) estimation procedure based on the copula density decomposition is used: first, the marginals are fitted by ML and then the copula is estimated by ML using the pseudo-observations computed from the fitted marginals. This procedure is described in more detail in García-Portugués et al. (2013a). In models CL8-CL9 and CC8-CC9 the MLE for the copula are obtained by estimating univariate von Mises densities or mixtures of von Mises, whereas numerical optimization is required for the remaining copula estimations. For models CC6-CC7 and CC10-CC11, MLE can also be carried out by numerical optimization. Finally, the MLE for model CL10 of Johnson and Wehrly (1978) were obtained analytically: given the circular-linear sample {(Θ_i, Z_i)}_{i=1}^n, the estimate μ̂ is defined by

  Σ_{i=1}^n Z_i sin(Θ_i − μ̂) = 0,

and, with Z̄ = n^{-1} Σ_{i=1}^n Z_i and Z̄_c = n^{-1} Σ_{i=1}^n Z_i cos(Θ_i − μ̂),

  λ̂ = Z̄ / (Z̄² − Z̄_c²),   κ̂ = λ̂ Z̄_c / Z̄.

A.. Simulation

Simulating from the linear marginals is easily accomplished with the built-in functions in R. The simulation of the wrapped Cauchy and wrapped normal densities is done with the circular library, the von Mises is sampled by implementing the algorithm described in Wood (1994) and the cardioid by the inversion method, whose equation is solved numerically. Sampling from the independence models is straightforward. Conditioning on the circular variable, it is easy to sample from models CL6-CL7 (sample the circular observation from a von Mises and then the linear one from a normal with mean depending on the circular observation), CL10 (von Mises marginal and exponential with varying rate) and CC6-CC7 (using the properties detailed in Singh et al. (2002) and the inversion method). Simulation in CC10-CC11 is straightforward: sample from a bivariate normal and then wrap around [0, 2π) by applying a modulus of 2π. Finally, simulation in two steps using copulas was required for models CL8-CL9, CL11, CC8-CC9 and CC12, where first a pair of uniform random variables (U, V) is sampled from the copula of the density and then the inversion method is applied marginally. See García-Portugués et al. (2013a) for more details. The simulation of the pair (U, V) was done by the conditional and inversion methods and, specifically, for the models based on the densities given by Johnson and Wehrly (1978) and Wehrly and Johnson (1979), a transformation method was obtained. It is summarized in the following algorithm.

Algorithm A.1. Let g be a circular density. A pair (U, V) of uniform variables with joint density c_g(u, v) = 2π g(2π(u ± v)) is obtained as follows:

253 A.. Extended simulation study Model Density Parameters Description CL fvmθ; µ, κ fn z; m, σ µ = π, κ =, m =, σ = Independent von Mises and normal CL fwcθ; µ, ρ fln z; m, σ µ = π, ρ = σ =, m = Independent wrapped Cauchy and log-normal CL pf vmθ; µ, κ + pfvmθ; µ, κ fγz; a, p µ = π, µ = 5π, κ = κ =, p = p = Independent mixture of von Mises, a =, p = and gamma CL fwnθ; m, σ m = π, σ = σ =, m =, σ =, pfn z; m, σ + pfn z; m, σ m =, p = p = CL5 pf vmθ; µ, κ + pfvmθ; µ, κ pfn z; m, σ + pfn z; m, σ µ = 5π, µ = 7π, κ =, κ =, m =, m =, σ =, σ = p = p =, p =, p = Independent wrapped normal and mixture of normals Independent mixture of von Mises and of normals CL6 fvmθ; µ, κ fn z; mθ, σ ρ ρ, with mθ = m + σκ {ρcosθ cosµ +ρsinθ sinµ} µ = π, κ =, m =, ρ = ρ = σ = See equation. of Mardia and Sutton 978 CL7 µ = π, κ = 5, m =, ρ =, ρ =, σ = CL8 π ; µg, θ CL9 g π θ π FN z; m, σ fn z; m, σ, with gθ = pgfvm θ; µg, κg +pgfvm θ; µg, κg fn z; m, σ m =, σ =, µg = 5π, κg = m =, σ = pg = pg =, µg = π, κg = κg =, µg = 5π See Theorem 5 of Johnson and Wehrly 978 considering a von Mises and a mixture of von Mises as the link functions CL CL CL λ κ exp { λz + κz cosθ µ} µ = π π { + πα cosπf Caθ; µ, ρ FN z; m, σ } fcaθ; µ, ρfn z; m, σ { π ρfvmθ; µ, κfln z; m, σ + ρ } ρ fvmθ; µ, κfln z; m, σ, κ =, λ = See Theorem of Johnson and Wehrly 978 µ = π, ρ = 9, m =, σ =, α = See equation 7 of García-Portugués π et al. a µ = π, κ =, m =, σ = ρ = See Section. in Kato 9 Table A.: Circular-linear models.

254 Appendix A. Supplement to Chapter 6 Model Density Parameters Description CC f π vmψ; µ, κ µ =, κ = Independent uniform and von Mises CC fvmθ; µ, κ fvmψ; µ, κ µ = π, κ =, µ = π, κ = Independent von Mises and von Mises CC fvmθ; µ, κ fwcψ; µ, ρ µ = π, κ =, µ = π, ρ = 7 Independent von Mises and wrapped Cauchy CC CC5 pf vmθ; µ, κ + pfvmθ; µ, κ fcaψ; µ, ρ µ =, κ = κ =, µ = π, µ =, ρ =, p = p = pf vmθ; µ, κ + pfvmθ; µ, κ µ =, κ = κ =, µ = π, µ = π, pfvmψ; µ, κ + pfvmψ; µ, κ κ = κ = 5, µ = 7π, p = p = p = p = Independent mixture von Mises and cardioid Independent mixture of von Mises and of von Mises CC6 C exp { κ cosθ µ + κ cosψ µ +λ sinθ µ sinψ µ } µ = 7π, 8 κ =, µ =, κ =, λ = See equation. of Singh et al. CC7 µ = µ =, κ = 5, κ =, λ = 5 CC8 fvm π F Caθ; µ, ρ ψ π ; µg, κg f Caθ; µ, ρ µ =, ρ =, µg = π, κg = 7 See equations and in Wehrly and Johnson 979 with a von CC9 pg π fvm θ + ψ; µg, κg + pgfvm θ + ψ; µg, κg µg = π, κg = κg =, µg = 7π, pg = pg = p= p= Mises and a mixture of von Mises as links CC m =, m = π, 6 σ =, σ =, ρ = See Example 7. in Johnson and fn θ + πp, ψ + πp; m, m, σ, σ, 977 { π ρfvmθ; µ, κfvmψ, µ, κ + ρ Section. of Kato 9 fvmθ; µ, κfvmψ; µ, κ CC ρ Wehrly m = m =, σ = σ =, ρ = 9 CC ρ } µ = π, κ = 5, µ =, κ =, ρ = See Table A.: Circular-circular models.

i. Sample Ψ, a random variable with circular density g.
ii. Sample V, a uniform variable in (0, 1).
iii. Set U = [(Ψ ∓ 2πV) mod 2π] / (2π) (an illustrative R sketch of this algorithm is given further below).

A.. Alternative models

The alternative hypothesis for the goodness-of-fit test, both in the circular-linear and the circular-circular cases, is stated as

  H_{k,δ}: f = (1 − δ) f_k + δ Δ,   δ ∈ [0, 1].

Three mixing densities are considered, two for the circular-linear situation and one for the circular-circular one:

  Δ₁(θ, z) = f_vM(θ; μ, κ) f_N(z; m, σ²),
  Δ₂(θ, z) = f_vM(θ; μ, κ) f_LN(z; m, σ²),
  Δ₃(θ, ψ) = f_vM(θ; μ, κ) f_vM(ψ; μ₂, κ),

where μ = π and the remaining parameters are fixed constants. To account for similar ranges of the linear data obtained under H_{k,0} and under H_{k,δ}, the log-normal mixing density Δ₂ is used in the circular-linear models whose linear marginal is supported on the positive real line, whereas Δ₁ is used in the other models. In the circular-circular case, the deviation for all models is Δ₃.

A..5 Bandwidth choice

The delicate issue of the bandwidth choice for the testing procedure has been approached as follows. In the simulation results presented in Section 6.6, a fixed pair of bandwidths was chosen based on a Likelihood Cross-Validation (LCV) criterion. Ideally, one would like to run the test on a grid of bandwidths to check how the test is affected by the bandwidth choice. This has been done for six circular-linear and six circular-circular models, as shown in Figure A.2. Specifically, Figure A.2 shows percentages of rejections under the null hypothesis (green) and under the alternative (orange), computed from M Monte Carlo samples for each pair of bandwidths (the same collection of samples for each pair) on a logarithmically spaced grid. The sample size and the number of bootstrap replicates are kept fixed across the grid. As can be seen, the test is correctly calibrated regardless of the value of the bandwidths: for all the models explored, the rejection rates for each pair of bandwidths in the grid are inside the 95% confidence interval of the proportion α = 0.05 (this happens for 95.75% of the bandwidths in the grid). However, the power is notably affected by the choice of the bandwidths, with rather different behaviours depending on the model and on the alternative. Reasonable choices of the bandwidths based on an estimation criterion, such as the median of the LCV bandwidths (6.7), lead in general to a competitive power.

A..6 Further results

Tables A.4 and A.5 collect the results of the simulation study for each combination of model (CL or CC), deviation δ, sample size n and significance level α.
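As an illustration of Algorithm A.1, the following hedged R sketch samples pairs (U, V) from the Johnson-Wehrly copula when the link density g is a von Mises. The choice of g, the grid-based sampler for g and the sign convention in step iii are assumptions of this sketch, not the dissertation's implementation.

```r
rjw_copula <- function(n, mu = pi, kappa = 1) {
  # i. Sample Psi from the circular link density g (here a von Mises),
  #    using a simple grid-based sampler; any exact vM sampler would do.
  grid <- seq(0, 2 * pi, length.out = 10000)
  dens <- exp(kappa * cos(grid - mu))          # unnormalized von Mises density
  Psi  <- sample(grid, n, replace = TRUE, prob = dens)
  # ii. Sample V uniformly on (0, 1).
  V <- runif(n)
  # iii. Set U so that 2*pi*(U - V) = Psi (mod 2*pi); then (U, V) has joint
  #      density 2*pi*g(2*pi*(u - v)).
  U <- ((Psi + 2 * pi * V) %% (2 * pi)) / (2 * pi)
  cbind(U = U, V = V)
}

UV <- rjw_copula(500)
```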

Figure A.2: Empirical size and power of the goodness-of-fit tests for a grid of bandwidths. First two rows, from left to right and top to bottom: six circular-linear models (among them CL5, CL7, CL8 and CL9); last two rows: six circular-circular models (among them CC5, CC7, CC8 and CC9). The lower surface represents the empirical rejection rate under the null and the upper surface under the alternative. Green colour indicates that the empirical rejection rate is inside the 95% confidence interval of α = 0.05, blue that it is lower and orange that it is larger. Black points represent the sizes and powers obtained with the median of the LCV bandwidths (for one of the CC models, the point under the null falls outside the grid).
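The LCV bandwidths mentioned in Figure A.2 can be computed, for a circular sample and a von Mises kernel, along the lines of the following hedged sketch. The bandwidth grid and the kappa = 1/h^2 parametrization are assumptions of the sketch, not necessarily the exact criterion (6.7).

```r
lcv_bandwidth <- function(Theta, h_grid = seq(0.05, 1, by = 0.05)) {
  n <- length(Theta)
  lcv <- sapply(h_grid, function(h) {
    k <- 1 / h^2
    # leave-one-out log-likelihood of the circular kernel density estimate
    sum(sapply(seq_len(n), function(i) {
      f_i <- mean(exp(k * cos(Theta[i] - Theta[-i]))) / (2 * pi * besselI(k, 0))
      log(f_i)
    }))
  })
  h_grid[which.max(lcv)]
}
```

For a sample of angles Theta, the selector would then be applied simply as h_LCV <- lcv_bandwidth(Theta).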

Table A.4: Empirical size and power of the circular-linear goodness-of-fit test for models CL1-CL12 with different sample sizes, deviations and significance levels.

Table A.5: Empirical size and power of the circular-circular goodness-of-fit test for models CC1-CC12 with different sample sizes, deviations and significance levels.

When the null hypothesis holds, the level of the test is correctly attained for all significance levels, sample sizes and models. Under the alternative, the tests perform satisfactorily, both of them detecting the alternative quickly when only a small percentage of the data comes from a density not belonging to the null parametric family.

Figure A.3: Upper row, from left to right: parametric fit (model from Mardia and Sutton, 1978) to the circular mean orientation and the mean log-burnt area of the fires in each of the watersheds of Portugal; parametric fit (model from Fernández-Durán, 2007) for the dihedral angles of the alanine-alanine-alanine segments. Lower row: p-values of the goodness-of-fit tests for a grid of bandwidths, together with the LCV bandwidths for each dataset.

A. Extended data application

The analysis of the two real datasets presented in Section 6.7 has been complemented by exploring the effect of different bandwidths on the test. To that aim, Figure A.3 shows the p-values computed from B bootstrap replicates for a logarithmically spaced grid, as well as the bandwidths obtained by LCV for each dataset. The graphs show that there is no evidence against the model of Mardia and Sutton (1978) for the wildfires data, and that the model used to describe the proteins dataset is not adequate.

This model employs the copula structure of Wehrly and Johnson (1979) with marginals and link function given by circular densities based on NNTS (nonnegative trigonometric sums), with Fernández-Durán (2007) specifying that the best fit in terms of BIC arises from considering three components for the NNTS densities in the marginals and two for the link function. The fitting of the NNTS densities was performed with the nntsmanifoldnewtonestimation function of the package CircNNTSR (Fernández-Durán and Gregorio-Domínguez), which computes the MLE of the NNTS parameters using a Newton algorithm on the hypersphere. The two-step ML procedure described above was employed to fit first the marginals and then the copula. The resulting contour levels of the parametric estimate are quite similar to the ones shown in Figure 5 of Fernández-Durán (2007). The dataset is available as ProteinsAAA in the CircNNTSR package.

References

Agostinelli, C. and Lund, U. R package circular: circular statistics.

Brown, B. M. (1971). Martingale central limit theorems. Ann. Math. Statist., 42:59-66.

Fernández-Durán, J. J. (2007). Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums. Biometrics, 63:579-585.

Fernández-Durán, J. J. and Gregorio-Domínguez, M. M. CircNNTSR: an R package for the statistical analysis of circular data using NonNegative Trigonometric Sums (NNTS) models. R package.

García-Portugués, E., Barros, A. M. G., Crujeiras, R. M., González-Manteiga, W., and Pereira, J. (2014). A test for directional-linear independence, with applications to wildfire orientation and size. Stoch. Environ. Res. Risk Assess., 28(5):1261-1275.

García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. (2013a). Exploring wind direction and SO2 concentration by circular-linear density estimation. Stoch. Environ. Res. Risk Assess., 27(5):1055-1067.

García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. (2013b). Kernel density estimation for directional-linear data. J. Multivariate Anal., 121:152-175.

Hall, P. (1984). Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivariate Anal., 14:1-16.

Hall, P. and Heyde, C. C. (1980). Martingale limit theory and its application. Academic Press, New York.

Hornik, K. and Grün, B. movMF: mixtures of von Mises-Fisher distributions. R package.

Johnson, R. A. and Wehrly, T. (1977). Measures and models for angular correlation and angular-linear correlation. J. Roy. Statist. Soc. Ser. B, 39:222-229.

Johnson, R. A. and Wehrly, T. E. (1978). Some angular-linear distributions and related regression models. J. Amer. Statist. Assoc., 73:602-606.

Kato, S. (2009). A distribution for a pair of unit vectors generated by Brownian motion. Bernoulli, 15.

Mächler, M. nor1mix: normal (1-d) mixture models (S3 classes and methods). R package.

Mardia, K. V. and Sutton, T. W. (1978). A model for cylindrical variables with applications. J. Roy. Statist. Soc. Ser. B, 40:229-233.

Nelsen, R. B. (2006). An introduction to copulas. Springer Series in Statistics. Springer, New York, second edition.

Olver, F. W. J., Lozier, D. W., Boisvert, R. F., and Clark, C. W., editors (2010). NIST handbook of mathematical functions. Cambridge University Press, Cambridge.

Rosenblatt, M. and Wahlen, B. E. (1992). A nonparametric measure of independence under a hypothesis of independent components. Statist. Probab. Lett., 15.

Shieh, G. S. and Johnson, R. A. (2005). Inferences based on a bivariate distribution with von Mises marginals. Ann. Inst. Statist. Math., 57.

Singh, H., Hnizdo, V., and Demchuk, E. (2002). Probabilistic model for two dependent circular variables. Biometrika, 89:719-723.

Venables, W. N. and Ripley, B. D. (2002). Modern applied statistics with S. Statistics and Computing. Springer, New York, fourth edition.

Wehrly, T. E. and Johnson, R. A. (1979). Bivariate models for dependence of angular observations and a related Markov process. Biometrika, 67.

Wood, A. T. A. (1994). Simulation of the von Mises Fisher distribution. Commun. Stat. Simulat., 23:157-164.

Zhao, L. and Wu, C. (2001). Central limit theorem for integrated square error of kernel estimators of spherical density. Sci. China Ser. A, 44.


Appendix B

Supplement to Chapter 7

This supplement is organized as follows. Section B.1 contains particular cases of the projected local estimator for the circular and spherical situations and their relations with polar and spherical coordinates. Section B.2 gives the technical lemmas used to prove the main results in the paper. Finally, Section B.3 provides further details about the simulation study and gives extra results omitted in the paper.

B.1 Particular cases of the projected local estimator

Some interesting cases and relations of the projected estimator are the following.

B.1.1 Local constant

If p = 0, then W_n^0(x, X_i) = L_h(x, X_i) / Σ_{j=1}^n L_h(x, X_j) and the Nadaraya-Watson estimator for a directional predictor and scalar response, first proposed by Wang et al., is obtained:

  \hat m_{h,0}(x) = Σ_{i=1}^n [ L_h(x, X_i) / Σ_{j=1}^n L_h(x, X_j) ] Y_i = Σ_{i=1}^n [ L((1 − x^T X_i)/h²) / Σ_{j=1}^n L((1 − x^T X_j)/h²) ] Y_i.

For q = 1, denoting x = (cos θ, sin θ)^T with θ ∈ [0, 2π), the circular sample can be identified with a set of angles {Θ_i}_{i=1}^n and the usual notation of circular statistics applies. Then the local constant estimator for circular data is given by

  \hat m_{h,0}(θ) = Σ_{i=1}^n [ L((1 − cos(Θ_i − θ))/h²) / Σ_{j=1}^n L((1 − cos(Θ_j − θ))/h²) ] Y_i
                  = Σ_{i=1}^n [ exp{cos(Θ_i − θ)/h²} / Σ_{j=1}^n exp{cos(Θ_j − θ)/h²} ] Y_i,

where the second equality holds if L is the von Mises kernel.

For q = 2, denoting x = (sin φ cos θ, sin φ sin θ, cos φ)^T with θ ∈ [0, 2π) and φ ∈ [0, π], the spherical sample can be identified with the pairs of angles {(Θ_i, Φ_i)}_{i=1}^n. Therefore, the local constant estimator for spherical data is given by

  \hat m_{h,0}(θ, φ) = Σ_{i=1}^n [ L((1 − sin φ sin Φ_i cos(Θ_i − θ) − cos φ cos Φ_i)/h²) / Σ_{j=1}^n L((1 − sin φ sin Φ_j cos(Θ_j − θ) − cos φ cos Φ_j)/h²) ] Y_i.
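A direct, hedged R sketch of the circular local constant estimator above, with von Mises kernel weights exp{cos(Θ_i − θ)/h²}, could look as follows. The toy regression function, the sample size and the bandwidth are assumptions of the example, not choices from the dissertation.

```r
nw_circular <- function(theta, Theta, Y, h) {
  # Nadaraya-Watson estimate at each evaluation angle in `theta`
  sapply(theta, function(t) {
    w <- exp(cos(Theta - t) / h^2)   # von Mises kernel weights
    sum(w * Y) / sum(w)
  })
}

set.seed(1)
Theta <- runif(200, 0, 2 * pi)                 # circular predictor
Y <- sin(Theta) + rnorm(200, sd = 0.3)         # toy regression with noise
theta_grid <- seq(0, 2 * pi, length.out = 100)
m_hat <- nw_circular(theta_grid, Theta, Y, h = 0.3)
```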

B.1.2 Local linear with q = 1

Let x = (cos θ, sin θ)^T, for θ ∈ [0, 2π). The matrix B_x reduces to the single vector b_1 = ±(−sin θ, cos θ)^T, the unit vector orthogonal to x. Then, by the sine subtraction formula, B_x^T (cos Θ_i − cos θ, sin Θ_i − sin θ)^T = ± sin(Θ_i − θ) and, as a consequence, the weighted least squares problem of the paper can be expressed as

  min_{(β_0, β_1) ∈ R²} Σ_{i=1}^n [Y_i − β_0 − β_1 sin(Θ_i − θ)]² c_{h,1}(L) L((1 − cos(Θ_i − θ))/h²),   (B.1)

and the solution (7.5) is given by the design matrix

  X_{x,1} = ( 1  sin(Θ_1 − θ) ; ... ; 1  sin(Θ_n − θ) ).

The resulting estimate is the local linear estimator proposed by Di Marzio et al. (2009) for circular predictors and circular kernels that are functions of κ cos θ (the change in notation is κ = 1/h²). The equivalence of both estimators can also be seen from the equality of their design matrices and weights, or from the Taylor expansions that motivate them. By the chain rule, the derivative of the regression function in the circular argument, as considered in Di Marzio et al. (2009), is the same as the projected gradient of m:

  d/dθ m(θ) = ∇m(x)^T (dx(θ)/dθ) |_{x = (cos θ, sin θ)^T} = ∇m(x)^T B_x |_{x = (cos θ, sin θ)^T}.

Finally, if θ is close to Θ_i modulo 2π, then sin(Θ_i − θ) ≈ Θ_i − θ and the linear coefficient of the local estimator indeed captures a linear effect of close angles on the response.

The circular case of the local linear estimator in Di Marzio et al. is different from the circular projected local estimator and from the one in Di Marzio et al. (2009). The minimum weighted squares problem, using the tangent-normal decomposition and translated to this paper's notation, is stated as follows:

  min_{(β_0, β_1, β_2) ∈ R³} Σ_{i=1}^n [Y_i − β_0 − (β_1, β_2)^T η_i ξ_i]² c_{h,1}(L) L((1 − cos(Θ_i − θ))/h²),   (B.2)

where η_i is such that cos η_i = x^T X_i, and ξ_i = (X_i − x cos η_i)/sin η_i if sin η_i ≠ 0 and ξ_i = ±X_i otherwise. After considering polar coordinates and some trigonometric algebra, it results that η_i ξ_i = (Θ_i − θ)(−sin θ, cos θ)^T, so after identifying β̃ = −β_1 sin θ + β_2 cos θ, the minimization (B.2) is equivalent to

  min Σ_{i=1}^n [Y_i − β_0 − β̃ (Θ_i − θ)]² c_{h,1}(L) L((1 − cos(Θ_i − θ))/h²).

Provided that sin(Θ_i − θ) ≈ Θ_i − θ for close angles, the practical difference between (B.1) and (B.2) arises only for small samples and large bandwidths. Finally, it is possible to compute the exact expression for the estimator using the exact inversion formula of the matrix X_{x,p}^T W_x X_{x,p}, as is done in Wand and Jones (1995), among others. This yields

  \hat m_{h,1}(θ) = ( s_2(θ) t_0(θ) − s_1(θ) t_1(θ) ) / ( s_0(θ) s_2(θ) − s_1(θ)² ),

265 B.. Particular cases of the projected local estimator 5 where, for j =,,, n cosθi θ s j θ = L h i= n cosθi θ t j θ = L h i= sin j Θ i θ, sin j Θ i θy i. B.. Local linear with q = Let denote x = sin φ cos θ, sin φ sin θ, cos φ T, for θ, π and φ, π. Now the matrix B x is given by vectors b = ± x + x x, x, T and b = ± x + x x x, x x, x + T x if x if x =, then b = ±,, and b = ±,, complete the orthonormal basis. Therefore, after some trigonometric identities, B T x X i x = sin Φ i sinθ i θ, cos φ sin Φ i cosθ i θ + sin φ cos Φ i. As a consequence, the solution 7.5 is given by the design matrix X x, = sin Φ sinθ θ cos φ sin Φ cosθ θ + sin φ cos Φ... sin Φ n sinθ n θ cos φ sin Φ n cosθ n θ + sin φ cos Φ n The second and third columns are almost linear in the angles θ and φ, respectively: if θ is close to Θ i in modulo π and φ i is close to Φ i in modulo π, then cosθ i θ and hence sin Φ i sinθ i θ sin Φ i Θ i θ, so cos φ sin Φ i cosθ i θ + sin φ cos Φ i sinφ i φ Φ i φ. Furthermore, as happens with the circular case, the projected gradient of m used in the projected local estimator comprises naturally the estimator that follows from considering the function m defined throughout spherical coordinates and taking the derivatives on them: mθ, φ, θ mθ, φ φ = mx T sin φ cos θ x= sin φ sin θ cos φ xθ, φ θ φ = mx T x=. sin φ cos θ Bx. sin φ sin θ cos φ Finally, the exact expression for the estimator can also be obtained using the exact inversion formula of the matrix X T x,pw x X x,p. To that end, recall that s θ, φ s θ, φ s θ, φ X T x,pw x X x,p = c h, L s θ, φ s θ, φ s θ, φ, X T x,pw x Y = c h, L s θ, φ s θ, φ s θ, φ where, for j, k =,,, n sin φ sin Φi cosθ i θ cos φ cos Φ i s jk θ, φ = L h i= sin Φ i sinθ i θ j cos φ sin Φ i cosθ i θ + sin φ cos Φ i k, t θ, φ t θ, φ, t θ, φ

266 6 Appendix B. Supplement to Chapter 7 n sin φ sin Φi cosθ i θ cos φ cos Φ i t jk θ, φ = L h i= sin Φ i sinθ i θ j cos φ sin Φ i cosθ i θ + sin φ cos Φ i k Y i. Therefore, after some matrix algebra it turns out that ˆm h, θ, φ = s s s t θ, φ s s s s t θ, φ + s s s s t θ, φ s s s s θ, φ s s s s s θ, φ + s s s s s θ, φ. B. Technical lemmas B.. Projected local estimator properties Lemma B.. Under assumptions A A, for a random sample {X i, Y i } n i= statements hold with uniform orders for any point x Ω q : the following i. ˆfh x = fx + o P. ii. ni= n L h x, X i B T x X i x = bql q B T x fxh + o h + O h P nh. q iii. ni= n L h x, X i B T x X i xy i = O h + O h P nh. q iv. ni= n L h x, X i B T x X i xx i x T B x = bql q I q fxh + o P h T. v. ni= n L h x, X i X i x T B x B T x H m xb x B T x X i x = bql q tr H m x fxh +o P h. vi. n ni= L h x, X i B T x X i xx i x T B x B T x H m xb x B T x X i x = o P h. vii. ni= n L h x, X iσ X i = λql λ ql h σ xfx + o q P h q. viii. n ni= L h x, X ib T x X i xσ X i = o P h q. ix. n ni= L h x, X ib T x X i xx i x T B x σ X i = o P h q. Proof of Lemma B.. Chebychev s inequality, Lemma B.6 and Taylor expansions will be used for each statement in which the proof is divided. Proof of i. By Chebychev s inequality, ˆfh x = E ˆfh x + O P Var ˆfh x. It follows by Lemma B.6 that E ˆfh x = fx + o and that Var ˆfh x = nh q λ ql fx + o, with the remaining orders being uniform in x Ω q. Then, as f is continuous in Ω q by assumption A it is also bounded, so by A Var ˆfh x = o uniformly, which results in ˆf h x = fx + o P uniformly in x Ω q. Proof of ii. Applying Lemma B. and the change of variables r = t n E L h x, X i B T x X i x n i= = E L h x, XB T x X x h,

267 B.. Technical lemmas 7 x T y = c h,q L L Ω q h B T x y xfy ω q dy t = c h,q L L Ω q h ξf tx + t Bx ξ t q ω q dξ dt h = c h,q Lh L r ξf x + α x,ξ rh rh q ω q dξ dr Ω q = c h,q Lh q+ h L r r q rh q Ω q f x + α x,ξ ξ ω q dξ dr, B. where α x,ξ = rh x+ rh rh B x ξ. The inner integral in B. is computed by a Taylor expansion fx + α x,ξ = fx + α T x,ξ fx + O α T x,ξα x,ξ, B. where the remaining order involves the second derivative of f, which is bounded, thus being the order uniform in x. Using Lemma B.5, the first and second addends are: f x ξ ω q dξ =, Ω q α T x,ξ f x ξ ω q dξ = rh rh B x ξ T f x ξ ω q dξ Ω q Ω q = rh rh = ω q q q Ω q i,j= rh rh B T x fx. ξ i B T x f x ξ j ω q dξ The third addend is O α T x,ξ α x,ξ = O h, because B T x x = and B x ξ T B x ξ = ξ T I q ξ =. Therefore, B. becomes B. = c h,q Lh q+ ω q q h + c h,q Lh q+ h L r r q rh q dr B T x fx L r r q rh q dr O h = b q L + o b ql B T x fxh + O q = b ql B T x fxh + o h q h, B.5 where the second last equality follows from applying the Dominated Convergence Theorem DCT, 7. and the definition of b q L. See the proof of Theorem in García-Portugués et al. for the technical details involved in a similar situation. As the Chebychev inequality is going to be applied componentwise, the interest is now in the order of the variance vector. To that end, the square of a vector will denote the vector with correspondent squared components. By analogous computations, n Var L h x, X i B T x X i x n i=

268 8 Appendix B. Supplement to Chapter 7 n E L h x, X B T x X x = c h,ql x T y n h B T x y x fy ω q dy = c h,ql n = c h,ql h n L Ω q = c h,ql h q+ n = c h,ql h q+ n h = O nh q. L Ω q h h h t h ξ f tx + t Bx ξ t q ωq dξ dt Ω q L r ξ f x + α x,ξ L r r q rh q L r r q rh q O dr The result follows from Chebychev s inequality, B.5 and B.6. rh rh q ω q dξ dr Ω q f x + α x,ξ ξ ω q dξ dr B.6 Proof of iii. The result is proved form the previous proof and the tower property of the conditional expectation. The expectation can be expressed as E n n i= L h x, X i B T x X i xy i = E E L h x, XB T x X xy X = E L h x, XB T x X xmx x T y = c h,q L B T x y xmyfy ω q dy. Ω q L Then, replicating the proof of ii, it is easily seen that the order is O h. The order of the variance is obtained in the same way: n Var L h x, X i B T x X i xy i n n E L hx, XB T x X x Y i= h = L n E h x, XB T x X x σ X + mx = O h nh q As a consequence, ni= n L h x, X i B T x X i xy i = O h + O h P nh. q Proof of iv. The steps of the proof of ii are replicated: E n n L h x, X i B T x X i xx i x T B x i= x T y = c h,q L L Ω q h B T x y xy x T B x fy ω q dy.

269 B.. Technical lemmas 9 = c h,q L L Ω q t h ξξ T f tx + t Bx ξ t q ωq dξ dt h = c h,q Lh q+ L r r q rh q f x + α x,ξ ξξ T ω q dξ dr. Ω q B.7 The second integral of B.7 is obtained by expansion B. and Lemma B.5: f x ξξ T B x ω q dξ = ω q I q fx, Ω q q α T x,ξ f x ξξ T ω q dξ = rh x T ξξ T ω q dξ = O h T. Ω q Ω q As the third addend given by expansion B. has order O h T, it results that: h B.7 = c h,q Lh q+ L r r q rh q = b q L + o { q I qf x + O { ωq q } h T h } I q f x + O h T dr = b ql I q f x h + o h T, q B.8 using the same arguments as in ii. The order of the variance is n Var L h x, X i B T x X i xx i x T B x n i= n E L h x, X B T x X xx x T B x = c h,ql n = c h,ql n L Ω q = c h,ql h q+ n h = O nh q T. x T y L Ω q h h t h B T x y xy x T B x fy ωq dy ξξ T f tx + t Bx ξ t q + ω q dξ dt L r r q + rh q + Ω q f x + α x,ξ ξξ T ωq dξ dt The desired result now holds by B.8 and B.9, as O P h nh q T = o P h T by A. B.9 Proof of v. This is one of the most important results since it determines the dominant term of the bias of the local projected estimator. The expectation is E n n L h x, X i X i x T B x B T x H m xb x B T x X i x i= x T y = c h,q L L Ω q h y x T B x B T x H m xb x B T x y xfy ω q dy

270 5 Appendix B. Supplement to Chapter 7 = c h,q L L Ω q t h B x ξ T H m xb x ξf tx + t Bx ξ t q ω q dξ dt h = c h,q Lh q+ L r r q rh q f x + α x,ξ B x ξ T H m xb x ξ Ω q ω q dξ dr. B. The first addend of the Taylor expansion B. is computed using Lemma B.5 and the following relation of the trace operator: x T Ax = tr x T Ax = tr xx T A, for x a vector and A a matrix. Recall also that by definition of B x = b,..., b q q+ q, x T H m xx = by A. Then: f x B x ξ T H m xb x ξ ω q dξ = fx Ω q = fx ω q q q Ω q i,j= = fx ω q tr q q i= b ib T i = I q+ xx T and ξ i ξ j b T i H m xb j ω q dξ q b T i H m xb i i= q b i b T i H m x i= = fx ω q tr H m x. q The second and third addends have the same orders as in iv, so h B. = c h,q Lh q+ L r r q rh q = b ql tr H m x fxh + o h. q { ωq Using the square notation for vectors, the order of the variance is Var n n L h x, X i X i x T B x B T x H m xb x B T x X i x i= c h,ql x L T y n Ω q h = c h,ql h q+ h L r r q + rh q + n f x + α x,ξ B x ξ T H m xb x ξ ωq dξ dr Ω q = c h,ql h q+ n h = O nh q h q tr H m x fx + O h } dr y x T B x B T x H m xb x B T x y x fy ωq dy L r r q + rh q + O dr

271 B.. Technical lemmas 5 and the square root of this order can be merged into o P h. Proof of vi. Similarly to the previous proofs, the order of the bias is n E L h x, X i B T x X i xx i x T B x B T x H m xb x B T x X i x n i= = c h,q Lh q+ h = O L r r q+ rh q+ f x + α x,ξ ξξ T H m xξ ω q dξ dr Ω q h because by Lemma B.5 the first element in the Taylor expansion of the inner integral is exactly zero. The variance is n Var L h x, X i B T x X i xx i x T B x B T x H m xb x B T x X i x n i= h c h,ql h q+6 L r r q + rh q + n f x + α x,ξ ξξ T H m xξ ωq dξ dr Ω q h 6 = O nh q. Since O h + O P h nh q = o P h the result is proved. Proof of vii. Because of Lemma B.6 and 7.: n E L h x, X i σ X i = c h,q L n Var n i= Ω q L x T y σ yfy ω q dy h = c h,ql c h,q L σ xfx + o = λ ql λ q L h q σ xfx + o h q, n L h x, X i σ X i L n E h x, X σ X i= = c h,ql x L T y n Ω q h σ yfy ω q dy = c h,ql nc h,q L σ xfx + o = O nh q. The remaining order is o P h q because nhq h q = nh q by A.

272 5 Appendix B. Supplement to Chapter 7 Proof of viii. By 7. and Lemma B.6 applied componentwise, since the functions in the integrand are vector valued, it follows that E n Var n n L h x, X i B T x X i xσ X i i= x = c h,q L L T y Ω q h B T x y xσ yfy ω q dy = c h,ql c h,q L + o = o h q, n L h x, X i B T x X i xσ X i i= x c h,q L L T y Ω q h = c h,ql nc h,q L + o = o nh q. B T x y x σ yfy ω q dy Proof of ix. The proof is analogous to viii: using 7. and Lemma B.6 componentwise the statement is proved trivially. B.. Asymptotic results for the goodness-of-fit test Lemma B.. Under assumptions A A and A7, for a random sample {X i, Y i } n i= the following statements hold: i. Ω q ˆm h,p x L h,p mx fxwx ω q dx = O P n. ii. Ω q n i= L h x, X i gx i fxwx ω q dx = Ω q gx fxwx ω q dx + o +O P nh q + n. iii. Ω q ni= nj= L h x, X i L h x, X j σx i ε i gx j fxwx ω q dx = O P nh q. iv. ni= Ω q L h x, X i σx i ε i fxwx ω q dx= λql λ ql nh Ω q q σ xwx ω q dx + o + O P n h q. v. E Wijn = n ν + o, E W ijn W jkn W kln W lin = O n h q, E Wijn = O n h q, E W ijn Wikn W jkn = O n, where ν νθ is given in Theorem 7.. Proof of Lemma B.. The proof is divided for each statement. As in Lemma B., Chebychev s inequality and Lemma B.6 are used repeatedly. Proof of i. By Corollary 7., Ω q ˆm h,p x L h,p mxfxwx ω q dx

273 B.. Technical lemmas 5 = Ω q i= n L hx, X i Y i mx i fxwx ω q dx + o P. Using the properties of the conditional expectation, Fubini, relation 7. and Lemma B.6: E Ω q i= n L hx, X i Y i mx i fxwx ω q dx = E L hx, XE Y mx X fxwx ω q dx Ω q =, n Var L hx, X i Y i mx i fxwx ω q dx Ω q i= y T x = nh q L λ q L Ω q Ω q h σ x fxwxwy ω q dx ω q dy + o = σ y fywy ω q dy + o n Ω q = O n. Then Ω q ˆm h,p x L h,p mxfxwx ω q dx = O P n + op = O P n. Proof of ii. The integral can be split in two addends: Ω q n L h x, X i gx i fxwx ω q dx i= n x = n h q λ q L L T X i gxi wx i= Ω q h ω q dx fx x T X i x T X j gxi gx j wx + n h q λ q L L i j Ω q h L h ω q dx fx = I + I. Now, by applying Fubini, 7. and Lemma B.6, x E I = nh q λ q L E L T X wx Ω q h gx fx ω qdx x = nh q λ q L L T y gy wx Ω q Ω q h fy ω q dy ω q dx fx = λ ql Var I = O nh q λ q L nh q, n h q λ q L E Ω q gx wx ω q dx + o Ω q L x T X gx wx h fx ω q dx

274 5 Appendix B. Supplement to Chapter 7 = λ ql gy n h q λ q L Ωq wy ω q dy + o fy = O n h q and therefore I = O P nh q. On the other hand, by Lemma B.6 and the independence of X i and X j if i j: E I = n x T X wx h q λ q L E L Ω q h gx fx ω qdx = n gx fxwx ω q dx + o Ω q = gx fxwx ω q dx + o, Ω q E I x T X i x T X j y T X k = n h q λ q L E L i j k l Ω q Ω q h L h L h y T X l wxwy L h gx i gx j gx k gx l fxfy ω qdx ω q dy = O n h q Ωq x T X y T X E L Ω q h L h gx wxwy fxfy ω qdx ω q dy + O nh q Ωq x T X y T X L L gx E L x T X h E Ω q + n O n n h q λ q L E Ω q = O n h q Ωq L Ω q + O nh q Ωq = O + h h y T X wxwy gx E L h gx fxfy ω qdx ω q dy x T X wx L h gx fx ω qdx y T x h gx fx wxwy ω q dx ω q dy fy Ω q L O n E I n h q + O n + y T x gx gyfxwxwy ω q dx ω q dy h O n E I. Then Var I = E I E I = O n and I = Ω q gx fxwx ω q dx + o + O P n. Finally, I + I = gx fxwx ω q dx + o + O P Ω q nh q + n.

275 B.. Technical lemmas 55 Proof of iii. By the tower property of the conditional expectation and E ε X =, the expectation is zero. By the independece between ε s and E ε X =, the variance is n n Var L h x, X i L h x, X j ε i gx j fxwx ω q dx Ω q i= j= n x T X i x T X j y T X k = n h q λ q L E L i,j,k,l= Ω q Ω q h L h L h y T X l wxwy L h E ε i ε k X i, X k gx j gx l fxfy ω qdx ω q dy n x T X i x T X j y T X i = n h q λ q L E L i,j,l= Ω q Ω q h L h L h y T X l wxwy L h gx j gx l fxfy ω qdx ω q dy = n h q λ q L {I + I + I + I }, where, by repeated use of Lemma B.6: x I = n L T X h Ω q Ω q E nh q, L y T X h gx wxwy fxfy ω qdx ω q dy = O I = O n Ωq x T X y T X E L Ω q h L h x T X y T X wxwy E L h L h gx fxfy ω qdx ω q dy = O n h q, I = O n Ωq E Ω q L x T X h y T X y T X L gx E L gx wxwy fxfy ω qdx ω q dy = O n h q, I = O n Ωq x T X y T X y T X E L Ω q h L h E L h gx wxwy fxfy ω qdx ω q dy = O n h q. Because O n h q + n + n h q + n = O n h q by A, we have that n n nh L h x, X i L q h x, X j σx i ε i gx j fxwx ω q dx = O P. Ω q i= j= h h

276 56 Appendix B. Supplement to Chapter 7 Proof of iv. Let us denote I = Ω q ni= L h x, X i σx i ε i fxwx ω q dx. By the unit conditional variance of ε and the boundedness of E ε X, n EI = E L h x, X i σx i E ε i X i fxwx ω q dx i= Ω q x = nh q λ q L L T y σ ywx Ω q Ω q h fy ω q dy ω q dx fx = nh q λ q L Ω q c h,q L σ xwx ω q dx + o = λ ql λ q L nh q σ xwx ω q dx + o, Ω q E I n n = E L h x, X i σx i L h y, X j σx j E ε i ε j X i, X j i= j= Ω q Ω q fxfywxwy ω q dx ω q dy x = n h q λ q L L T X h Ω q Ω q E y L T X h σ XE ε X wxwy fxfy ω qdx ω q dy + n x n h q λ q L L T y σ ywx Ω q Ω q h fy ω q dx ω q dy fx = O n h q Ωq σ xwx ω q dx + O n E I fx = O n h q + O n E I. Then Var I = O n h q O n E I = O n h q and as a consequence I = λ ql λ q L n nh q σ xwx ω q dx + o + O P h q. Ω q Proof of v. The computation of E Wijn n h q x T X y T X = n h q λ q L E L Ω q Ω q h L h σ XE ε X wxwy fxfy ω qdx ω q dy x T z y T z = n h q λ q L h L h σ zfz ω q dz Ω q Ω q Ω q L wxwy fxfy ω qdx ω q dy B. is split in the cases where q and q =. For the first one, the usual change of variables given by Lemma B. is applied: y = sx + s Bx ξ, ω q dy = s q ω q dξ ds. B.

277 B.. Technical lemmas 57 Because q, it is possible also to consider an extra change of variables: z = tx + τb x ξ + t τ Bx A ξ η, ω q dz = t τ q ω q dη dt dτ, B. where t, τ,, t +τ <, η Ω q and A ξ = a,..., a q q q is the semi-orthonormal matrix resulting from the completion of ξ to the orthonormal basis {ξ, a,..., a q } of R q. This change of variables is obtained by a recursive use of Lemma B.: Ω q fz ω q dz = = f Ω q tx + t Bx ξ t q ω q dξ dt Ω q f tx + t Bx sξ + s Aξ η s q t q ω q dη ds dt = tx + τb x ξ + t τ Bx A ξ η = t +τ < Ω q f τ t q t +τ < Ω q f t τ q ω q dη dτ dt, t q ωq dη dτ dt tx + τb x ξ + t τ Bx A ξ η where in the third equality a change of variables τ = t s is used. The matrix B x A ξ of dimension q + q can be interpreted as the one formed by the column vectors that complete the orthonormal set {x, B x ξ} to an orthonormal basis in R q+. If the changes of variables B. and B. is applied first, after that the changes r = s h ρ = t, h θ = τ h ρ h ρ, t, τ ρ, θ = h ρ h ρ and are used and, denoting α x,ξ = rh x + rh rh B x ξ, β x,ξ = h ρx + h ρ h ρ θb x ξ + θ Bx A ξ η, then the following result is obtained employing the DCT see Lemma of García-Portugués et al. for technical details in a similar situation: B. = n h q λ q L Ω q Ω q t +τ < Ω q t y T tx + τb x ξ + t τ B x A ξ η L h L h σ tx + τb x ξ + t τ Bx A ξ η

278 58 Appendix B. Supplement to Chapter 7 = f tx + τb x ξ + t τ Bx A ξ η t τ q ω q dη dt dτ wxwy fxfy ω qdx ω q dy n h q λ q L Ω q Ω q t +τ < σ tx + τb x ξ + t τ Bx A ξ η f L Ω q t h st τ s L h tx + τb x ξ + t τ Bx A ξ η t τ q ω q dη dt dτ wxw sx + s B x ξ ω q dx s q fxf sx + s ω q dξ ds B x ξ h h = n λ q L L Ω q r + ρ h rρ θ Ω q Ω q L ρ rρ h r h ρ σ x + β x,ξ,η f x + β x,ξ,η θ q q ρ h ρ q wxw x + αx,ξ ω q dη dt dτ fxf x + α x,ξ ω q dx r q h r q ω q dξ dr + o = n λ q L L ρ L r + ρ θrρ σ x fx Ω q Ω q Ω q θ q q ρ q w x ω q dη dt dτ fx ω qdx r q q ω q dξ dr = + o ω q ωq q n λ q L { r q ρ q L ρ = n ν + o. For q =, define the change of variables: Ω q σ x wx ω q dx θ q L r + ρ θrρ dθ dρ} dr y = sx + s Bx ξ, z = tx + t Bx η, ω dy = s ω dξ ds, ω dz = t ω dη dt, where ξ, η Ω = {, }. Note that as q = and x T B x ξ = x T B x η =, then necessarily B x ξ = B x η or B x ξ = B x η. These changes of variables are applied first, later ρ = t and h finally r = s, using that: h st s t B x ξ T B x η h

279 B.. Technical lemmas 59 Finally, considering = r + ρ h rρ rρ h r h ρ B x ξ T B x η. α x,ξ = rh x + rh rh B x ξ, β x,η = ρh x + ρh ρh B x η, it follows by the use of the DCT: t y T tx + t B x η B. = n h λ q L L Ω Ω Ω h L h σ tx + t Bx η f tx + t Bx η t ω dη dt = = wxwy fxfy ω dx ω dy n h λ q L t L h L Ω Ω Ω st t s B x ξ T B x η σ tx + t Bx ξ f tx + t Bx ξ t ω dη dt wxw sx + s B x ξ ω dx s fxf sx + s ω dξ ds B x ξ h n λ q L L ρ L σ x + β x,η f Ω Ω r + ρ h rρ h Ω h rρ h r h ρ B x ξ T B x η x + β x,η ρ h ρ wxw x + αx,ξ ω dη dρ fxf x + α x,ξ ω dx r h r ω dξ dr = + o n λ q L L ρ L r + ρ rρ Bx ξ T B x η Ω Ω Ω σ x f x ρ wx ω dη dρ fx ω dx r ω dξ dr = ω + o n λ q L σ x wx ω dx Ω { ρ L ρ L r + ρ rρ + L r + ρ + rρ dρ} dr r = n ν + o.

280 6 Appendix B. Supplement to Chapter 7 The rest of the results are provided by the recursive use of Lemma B.6, bearing in mind that the indexes are pairwise different: E Wijn = n h q n 8 h 8q λ q L 8 E Ω q Ω q k= = O n h q Ω q = O n h q, E W ijn W jkn W kln W lin Ω q k= L x T k X h L x T k X h σ XE ε X σ 8 x fx 8 k= k= wx k fx k ω qdx k wx k fx k ω qdx k n h q = n 8 h 8q λ q L 8 x T E L X x T Ω q Ω q h L X h σ X x T E L X x T L X x σ T X E L X x T L X σ X h h x T E L X x T h L X 8 h σ wx k X fx k= k ω qdx k = O n h q x T x x T L x x T L x Ω q L x T x h Ω q L h σ x σ x fx fx fx fx h h wx k ω q dx k = O n h q, E W ijn WiknW jkn n h q = n 8 h 8q λ q L 8 x T E L X x T Ω q Ω q h L X h σ X x T E L X x T h L X x T h L X h σ XE ε wx k X fx k= k ω qdx k = O n h q x T x x Ω q h L T x x h L T x h σ 8 fx x fx fx fx = O n. L Ω q k= wx k ω q dx k k= h h Lemma B.. Under assumptions A A6 and A9, for a random sample {X i, Y i } n i= the following statements hold: i. ni= Ω q Wn p x, X i ˆε i Vi ˆfh xwx ω q dx = λql λ ql nh Ω q q σθ xwx ω q dx + o P + O P n h q.

281 B.. Technical lemmas 6 ii. n h q i j Ω q Wn p x, X i Wn p x, X j ˆε iˆε j ˆfh xwx ω q dx = ν θ + o P, E Wijn W jkn W kln W lin = OP n h q, E Wijn = OP n h q and E Wijn W ikn W jkn = O P n. Proof of Lemma B.. The proof is divided in the evaluation of each statement. Proof of i. Using that the Vi s are iid and independent with respect to the sample, n E Ω q i= W p nx, X i ˆε i V i = = Ω q i= Ω q i= ˆf h xwx ω q dx n Wnx, p X i Y i mˆθx i ˆfh xwx ω q dx n L hx, X i Y i m θ X i fxwx ω q dx + o P B. where the last equality holds because by assumptions A5 and A6, mˆθx m θ x = O P n uniformly in x Ω q. By applying the tower property of the conditional expectation as in iii from Lemma B., it is easy to derive from iv in Lemma B. that B. = λ ql λ q L nh q Ω q σ θ xwx ω q dx + o P. The order of the variance is obtained applying the same idea, i.e., first deriving the variance with respect to the Vi s and then applying the order computation given in the proof of iv in Lemma B. adapted via the conditional expectation: n Var L hx, X i Y i mˆθx i Vi fxwx ω q dx Ω q i= n = L hx, X i Y i mˆθx i fxwx ω q dx Var Vi i= Ω q = O n h q n x L T X i i= Ω q h Y i m θ X i wx fx ω qdx = O P n h q. The statement holds by Chebychev s inequality with respect to the probability law P. Proof of ii. First, by Corollary 7., the expansion for the kernel density estimate and the fact mˆθx m θ x = O P n uniformly in x Ωq, we have I n = n h q i j = I ijn + o P, i j Ω q W p n x, X i W p n x, X j ˆε iˆε j ˆfh xwx ω q dx

282 6 Appendix B. Supplement to Chapter 7 where I ijn = n h q Ω q Ω q L h x, X i L h x, X j L h y, X i L h y, X j Y i m θ X i Y j m θ X j fxfywxwy ω q dx ω q dy. By the tower property of the conditional expectation and iv in Lemma B., E I ijn = E Wijn = n νθ + o considering that the W ijn s are defined with respect to θ instead of θ. To p prove that I n νθ, consider Ĩn = i j I ijn and, by 7.8 and 7.9, Var Ĩn = E I ijn n n E I ijn = i j i j k l E WijnW kln n n E Wijn = W E n Var W n + o = Var W n Var W n E Wn = ν θ + o o + o = o, + o Ĩn because, as it was shown in the proof of Theorem 7., conditions b and d hold. Then, Ĩn E converges to zero in squared mean, which implies that it converges in probability and therefore I n = Ĩn + o P = Ĩn E Ĩn + ν θ + o + o P = νθ + o P, which proofs the first statement. Second, it follows straightforwardly that E Wijn = OP W ijn, E Wijn W jkn W kln W lin = O P W ijn W jkn W kln W lin and E Wijn W ikn W jkn = OP Wijn Wikn W jkn. The idea now is to use that, for a random variable X n and by the Markov s inequality, X n = E X n + O P E X n. The expectations of the variables are given in v from Lemma B.. The orders of the absolute expectations are the same: in the definition of W ijn the only factor with sign is ε i ε j, which is handled by the assumption of boundedness of E ε X. Therefore, Wijn = O P n h q, W ijn W jkn W kln W lin = O P n h q and W ijn Wikn W jkn = O P n, so the statement is proved. B.. General purpose lemmas Lemma B. Tangent-normal change of variables. Let f be a function defined in Ω q and x Ω q. Then Ω q fz ω q dz = Ω q f tx + t B x ξ t q ω q dξ dt, where B x = b,..., b q q+ q is the projection matrix given in Section 7.. Proof of Lemma B.. See Lemma of García-Portugués et al.. Lemma B.5. Set x = x,..., x q+ Ω q. For all i, j, k =,..., q +, Ω q x i ω q dx =, Ω q x i x j ω q dx = δ ij ω q q+ and Ω q x i x j x k ω q dx =.

283 B.. Further simulation results 6 Proof of Lemma B.5. Apply Lemma B. considering x = e i Ω q. Then Ω q x i ω q dx = ω q t t q dt = as the integrand is an odd function. As a consequence, and applying the same change of variables, for i j: Ω q x i x j ω q dx = t q dt e T j B x ξ ω q dξ =. Ω q For i = j, Ω q x i ω qdx = q q+ Ω q j= x j ω qdx = ωq q+. For the trivariate case, Ω q x i ω q dx = ω q Ω q x i x j ω q dx = Ω q x i x j x k ω q dx = t t q dt =, t t q dt using that the integrand is odd and the first statement. Ω q e T j B x ξ ω q dξ =, i j, t t q dt Ω q e T j B x ξe T k B x ξ ω q dξ =, i j k, Lemma B.6 Bai et al Let ϕ : Ω q R be a continuous function and denote L h ϕx = c h,q L Ω q L x T y ϕy ω h q dy. Under assumptions A A, L h ϕx = ϕx + o, where the remaining order is uniform for any x Ω q. Proof of Lemma B.6. This corresponds to Lemma 5 in Bai et al. 988, but with slightly different conditions and notation. Assumptions A and A imply conditions a, b, c and d stated in Theorem of the aforementioned paper. B. Further simulation results Some extra simulation results are given to provide a better understanding of the design of the simulation study presented in the paper and a deeper insight into the empirical performance of the goodness-of-fit tests for different significance levels and sample sizes. Graphical representations of the densities considered for the directional predictor X are shown in Figure B.. These densities aim to capture simple designs like the uniform and more challenging ones with holes in the support. The deviations from the null hypothesis, and, are shown in Figure B., jointly with the conditional standard deviation function used to generate data with heteroskedastic noise. The coefficients δ for obtaining deviations δ and δ in each scenario were chosen such that the density of the response Y = m θ X + δ X + σxε under H δ = and under H δ were similar. Figure B. shows the densities of Y under the null and the alternative for the four scenarios and dimensions considered. This is a graphical way of ensuring that the deviation is not trivial to detect and hence is not straightforward to reject H. Note that, due to the design of the deviations, it may be easier to detect them on a particular dimension. The empirical sizes of the test for significance levels α =.,.5,. are given in Figures B., B.5 and B.6, corresponding to sample sizes n =, 5 and 5. Nominal levels are respected in most scenarios. Finally, the empirical powers for n =, 5 and 5 are given in Figure

As can be seen, the rejection rates increase with $n$. A final remark is that it seems harder to detect the alternative in one of the scenarios for a particular dimension $q$, due to the shape of the parametric model and the design density.

Figure B.1: From left to right: directional densities for scenarios S1 to S4, for circular and spherical cases.

Figure B.2: From left to right: the deviations from the null hypothesis and the conditional standard deviation function $\sigma$, for circular and spherical cases.
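To make the data-generating mechanism described in this section concrete, the sketch below (again illustrative and not part of the original supplement) generates responses under the null ($\delta = 0$) and under a deviation ($\delta \neq 0$) for a directional predictor. The regression function, deviation, conditional standard deviation and predictor density used here are placeholders and do not reproduce the scenarios S1 to S4 of the study.

```python
# Hedged sketch of the data-generating mechanism described above:
# Y = m_theta(X) + delta * Delta(X) + sigma(X) * eps, with a directional
# predictor X on Omega_q. The concrete choices of m_theta, Delta, sigma and
# the predictor density are illustrative placeholders, NOT the ones used in
# the thesis.
import numpy as np

rng = np.random.default_rng(42)

def runif_sphere(n, q):
    # Uniform predictor on Omega_q (stand-in for the scenario densities)
    x = rng.standard_normal((n, q + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def m_theta(x, theta):
    # Placeholder parametric regression function: linear in x
    return x @ theta

def delta_dev(x):
    # Placeholder deviation from the null (quadratic in the first coordinate)
    return x[:, 0] ** 2

def sigma(x):
    # Placeholder heteroskedastic conditional standard deviation
    return 0.5 + 0.25 * np.abs(x[:, 0])

def generate(n, q, theta, delta):
    x = runif_sphere(n, q)
    eps = rng.standard_normal(n)
    y = m_theta(x, theta) + delta * delta_dev(x) + sigma(x) * eps
    return x, y

# Data under the null (delta = 0) and under an alternative (delta != 0)
theta = np.array([1.0, -1.0, 0.5])          # q = 2, so theta lives in R^3
x0, y0 = generate(500, 2, theta, delta=0.0)
x1, y1 = generate(500, 2, theta, delta=0.5)
print(y0.mean(), y1.mean())
```

In the study itself, the empirical sizes and powers reported in Figures B.4 to B.7 are obtained by applying the goodness-of-fit test to many such replicates and recording the rejection frequency at each significance level.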
