Multivariate coefficient of variation for functional data

Similar documents
Curves clustering with approximation of the density of functional random variables

The S-estimator of multivariate location and scatter in Stata

A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI *, R.A. IPINYOMI **

ON HIGHLY D-EFFICIENT DESIGNS WITH NON- NEGATIVELY CORRELATED OBSERVATIONS

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Table of Contents. Multivariate methods. Introduction II. Introduction I

Robust Wilks' Statistic based on RMCD for One-Way Multivariate Analysis of Variance (MANOVA)

Minimum Regularized Covariance Determinant Estimator

Robust Exponential Smoothing of Multivariate Time Series

Functional Data Analysis & Variable Selection

Bayesian Decision Theory

Robust estimation of scale and covariance with P n and its application to precision matrix estimation

FAULT DETECTION AND ISOLATION WITH ROBUST PRINCIPAL COMPONENT ANALYSIS

arxiv: v1 [math.st] 22 Dec 2018

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan

Lecture Notes 1: Vector spaces

Chapter 5 Prediction of Random Variables

Robust Estimation of Cronbach s Alpha

Covariance function estimation in Gaussian process regression

L5: Quadratic classifiers

A New Robust Partial Least Squares Regression Method

ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

Detection of outliers in multivariate data:

Fast and Robust Discriminant Analysis

Lecture 13: Simple Linear Regression in Matrix Format

On the Identifiability of the Functional Convolution. Model

Direct Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 13: Simple Linear Regression in Matrix Format. 1 Expectations and Variances with Vectors and Matrices

Statistics 910, #5 1. Regression Methods

. a m1 a mn. a 1 a 2 a = a n

arxiv: v3 [stat.me] 2 Feb 2018 Abstract

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES

P -spline ANOVA-type interaction models for spatio-temporal smoothing

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Supplementary Material for Wang and Serfling paper

ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY

Spatial Process Estimates as Smoothers: A Review

Robot Image Credit: Viktoriya Sukhanova 123RF.com. Dimensionality Reduction

EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large

Robustness of Principal Components

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE

odhady a jejich (ekonometrické)

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 4: Measures of Robustness, Robust Principal Component Analysis

A method for construction of Lie group invariants

Fractional integration and cointegration: The representation theory suggested by S. Johansen

Introduction to Statistical modeling: handout for Math 489/583

GLR-Entropy Model for ECG Arrhythmia Detection

Development of robust scatter estimators under independent contamination model

Sparse Covariance Selection using Semidefinite Programming

Pattern Recognition 2

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling

Noisy Streaming PCA. Noting g t = x t x t, rearranging and dividing both sides by 2η we get

SIMULTANEOUS CONFIDENCE INTERVALS AMONG k MEAN VECTORS IN REPEATED MEASURES WITH MISSING DATA

Geodesic Convexity and Regularized Scatter Estimation

ELEMENTS OF PROBABILITY THEORY

Small Sample Corrections for LTS and MCD

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising

Accurate and Powerful Multivariate Outlier Detection

S estimators for functional principal component analysis

Fast and robust bootstrap for LTS

Web-based Supplementary Materials for Multilevel Latent Class Models with Dirichlet Mixing Distribution

Various Extensions Based on Munich Chain Ladder Method

More Linear Algebra. Edps/Soc 584, Psych 594. Carolyn J. Anderson

Support Vector Method for Multivariate Density Estimation

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Properties of the least squares estimates

Robust Methods for Multivariate Functional Data Analysis. Pallavi Sawant

Introduction to Robust Statistics. Anthony Atkinson, London School of Economics, UK Marco Riani, Univ. of Parma, Italy

Entropy of Gaussian Random Functions and Consequences in Geostatistics

Robust scale estimation with extensions

Stat 206: Sampling theory, sample moments, mahalanobis

Statistical Data Analysis

Lecture 9 SLR in Matrix Form

Machine Learning for Disease Progression

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

A Vector-Space Approach for Stochastic Finite Element Analysis

A Robust Approach to Regularized Discriminant Analysis

On some interpolation problems

EE731 Lecture Notes: Matrix Computations for Signal Processing

Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression

S estimators for functional principal component analysis

Estimation of Parameters

Lawrence D. Brown* and Daniel McCarthy*

Regularized Discriminant Analysis and Its Application in Microarray

Multivariate Statistical Analysis

ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form

EXTENDING PARTIAL LEAST SQUARES REGRESSION

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Lecture 11. Multivariate Normal theory

INFERENCE FOR MULTIPLE LINEAR REGRESSION MODEL WITH EXTENDED SKEW NORMAL ERRORS

Unconstrained Ordination

INDIAN INSTITUTE OF SCIENCE STOCHASTIC HYDROLOGY. Lecture -33 Course Instructor : Prof. P. P. MUJUMDAR Department of Civil Engg., IISc.

Estimation of large dimensional sparse covariance matrices

F9 F10: Autocorrelation

A robust functional time series forecasting method

Transcription:

Multivariate coefficient of variation for functional data Mirosław Krzyśko 1 and Łukasz Smaga 2 1 Interfaculty Institute of Mathematics and Statistics The President Stanisław Wojciechowski State University of Applied Sciences in Kalisz 2 Faculty of Mathematics and Computer Science Adam Mickiewicz University XLVIII International Biometrical Colloquium September 9-13, 201, Szamotuły Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 1 / 19

Coefficient of variation The coefficient of variation (CV) being the ratio of the standard deviation to the population mean is widely used relative variation measure. The CV is a dimensionless quantity, which can be expressed in percent. This quantity is usually used to compare the variability of several populations, even when they are characterized by variables expressed in different units as well as have really different means. In particular, the CV is often used to assess the performance or reproducibility of measurement techniques or equipments. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 2 / 19

Multivariate coefficient of variation For multivariate data, computing the CV for each variable is a common practice, although this ignores the correlation between them, and this does not summarize the variability of the multivariate data into a single index. The known multivariate extensions of the CV are less considered in the literature. Perhaps it is due to the fact that generalizing the univariate CV to the multivariate setting is not straightforward, and the multivariate CV s do not generally measure the same quantity, when the number of variables is greater than one. Nevertheless, they all reduce to the CV in the univariate case. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 3 / 19

Multivariate coefficient of variation Let X = (X 1,..., X p ) be p-dimensional random vector with mean vector u 0 p and covariance matrix Σ. The multivariate coefficients of variation (MCV) by Reyment (190), Van Valen (19), Voinov and Nikulin (199, p. ) and Albert and Zhang (2010) are: (det Σ) MCV R = 1/p u, MCV VV = u respectively. trσ u u, MCV VN = 1 u Σ 1 u, MCV AZ = u Σu (u u) 2, The MCV R and MCV VV are based on the generalized variance det Σ and the total variance trσ, respectively. In the MCV VN, the Mahalanobis distance u Σ 1 u appears to be a natural extension of the CV. Finally, the MCV AZ is derived based on a matrix generalizing the square of the CV. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium / 19

Functional multivariate coefficient of variation We have: Var(u X) MCV AZ =, (1) u where u = u/ u, i.e., MCV AZ is the univariate coefficient of variation for u X. Let X(t) = (X 1 (t),..., X p (t)), t [a, b], a, b R be p-dimensional random process with mean function µ(t) = (µ 1 (t),..., µ p (t)) 0 p. We also assume that X(t), t [a, b] belongs to the Hilbert space L p 2 [a, b] of p-dimensional vectors of square integrable functions on [a, b]. Let, and denote the inner product and the norm in L p 2 [a, b]. Definition 1 The functional multivariate coefficient of variation for X(t), t [a, b] is defined as follows (µ (t) = µ(t)/ µ, t [a, b]): FMCV = Var( µ, X ). (2) µ Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium / 19

Functional multivariate coefficient of variation Theorem 1 If X i (t), t [a, b], i = 1,..., p, are square integrable, i.e., b E X i 2 = E Xi 2 (t) dt <, a then Var( µ, X ) exists. Furthermore, the FMCV defined in (2) is the CV of the random variable µ, X. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium / 19

Functional multivariate coefficient of variation Let X(t) belong to a finite dimensional subspace L p 2 [a, b] of Lp 2 [a, b], where the components of X(t) can be represented by a finite number of basis functions, i.e., B k X k (t) = α kl ϕ kl (t), (3) l=1 where k = 1,..., p, t [a, b], B k N, α kl are random variables with finite variance and {ϕ kl } l=1, k = 1,..., p are bases in the space L1 2 [a, b]. The equations (3) can be expressed in the following matrix notation: X(t) = Φ(t)α, () where ( ) Φ(t) = diag ϕ 1 (t),..., ϕ p (t) is the block diagonal matrix of ϕ k (t) = (ϕ k1(t),..., ϕ kbk (t)), k = 1,..., p and α = (α 11,..., α 1B1,..., α p1,..., α pbp ). Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium / 19

Functional multivariate coefficient of variation ( a Var J Φ α ) J Φ 1/2 a FMCV = J 1/2 Φ a = a J Φ ΣαJ Φ a (a J Φ a) 2, () where a = E(α), Σα = Cov(α) and J Φ = diag(jϕ 1,..., Jϕ p ) and Jϕ k = b a ϕ k(t)ϕ k (t) dt is the B k B k cross product matrix corresponding to the basis {ϕ kl } l=1, k = 1,..., p. Theorem 2 Under the above assumptions and notation, the functional multivariate coefficient of variation for the random process X(t), t [a, b], is the multivariate coefficient of variation of Albert-Zhang type for random vector J 1/2 Φ α, if the matrix J1/2 Φ exists. Although the FMCV is defined for univariate and multivariate functional data, we note that even when p = 1, the FMCV reduces to the MCV AZ (no to the CV), since B 1 > 1 usually. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium / 19

Estimation In practice, we have to estimate the unknown vector α in () as well as its parameters a and Σα appearing in the FMCV given in (). Let x 1 (t),..., x n (t), t [a, b] be a random sample containing realizations of the process X(t). These observations are represented similarly as in (), i.e., where t [a, b] and i = 1,..., n. x i (t) = Φ(t)α i, Then, the vectors α i, i = 1,..., n can be estimated by the least squares method or the roughness penalty approach. The expansion lengths B k in (3) can be selected deterministically or by using information criteria as the Akaike and Bayesian information criteria. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 9 / 19

Estimation Using the estimators of α i, say ˆα i, i = 1,..., n, we can estimate the mean vector a and the covariance matrix Σα. The classical estimators are the sample mean and the sample covariance matrix, i.e., â = 1 n ˆα i, ˆΣα = 1 n ( ˆα i â)( ˆα i â). () n n i=1 i=1 However, these estimators may break down, when the data contain outliers. Thus, many authors recommend the use of robust estimators of location and scatter in the presence of outlying observations. Similarly to Aerts et al. (201), we will mainly use the two commonly used ones, i.e., the minimum covariance determinant (MCD) estimator and the S-estimator. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 10 / 19

Estimation For a given breakdown point α, the MCD estimator is based on a subset of { ˆα 1,..., ˆα n } of size h = n(1 α) minimizing the generalized variance (i.e., the determinant of covariance matrix) among all possible subsets of size h. Then, the MCD estimators of a and Σα are the sample mean and the sample covariance matrix (multiplied by a consistency factor) computed from this subset. The location and scatter S-estimators are the vector a n and the positive definite symmetric matrix Σ n which minimizes det(σ n ) subject to 1 n n i=1 ( ) ρ ( ˆα i a n ) Σ 1 n ( ˆα i a n ) = b 0, where ρ : R [0, ) is a given non-decreasing and symmetric function (e.g., Tukey s biweight) and b 0 a constant needed to ensure consistency of the estimator. The (classical and robust) estimators of the FMCV are obtained by substituting the parameters a and Σα in () by their estimators â and ˆΣα. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 11 / 19

Simulation studies The functional sample x 1 (t),..., x n (t) of size n = 100 contains realizations of X(t), t [0, 1] with p =. These observations are generated as follows: x i (t j ) = Φ(t j )α i + ɛ ij, where i = 1,..., n, t j, j = 1,..., 0 are equally spaced design time points in [0, 1], the matrix Φ(t) contains basis functions with B k =, k = 1,..., p, α i are p-dimensional random vectors, and ɛ ij = (ɛ ij1,..., ɛ ijp ) are the measurement errors such that ɛ ijk N(0, 0.02r ik ) and r ik is the range of the k-th row of (Φ(t 1 )α i... Φ(t 0 )α i ), k = 1,..., p. We use the Fourier and B-spline bases. The vectors α i, i = 1,..., n were generated from multivariate normal or t - distributions with mean a and covariance matrix Σα. Similarly to Aerts et al. (201), we set a = a 1 := ae 1 or a = a 2 := (a/(p) 1/2 )1 p and Σα = (1 ρ)i p + ρ1 p 1 p, where a is chosen to get a given value of the FMCV, e 1 = (1, 0,..., 0) and ρ = 0, 0., 0.. We set FMCV = 0.1, 0., 0.9. Moreover, to obtain uncontaminated and contaminated functional data, ε% of the observations are generated with 10Σα, where ε = 0, 10, 20, 30, 0, 0. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 12 / 19

Simulation studies Fourier basis B spline basis 30 20 10 0 10 20 2 0 2 0 0.2 0. 0. 0. 1.0 Time 0 0.2 0. 0. 0. 1.0 Time Exemplary realizations of the first functional variables of simulated data with 10% of outlying observations. The uncontaminated (resp. contaminated) data are depicted in gray (resp. black). Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 13 / 19

Simulation studies - mean squared error class. MCD S class. MCD S class. MCD S ε FMCV = 0.1 FMCV = 0. FMCV = 0.9 F 0 0.0001 0.0002 0.0001 0.0209 0.031 0.0212 0.119 0.1 0.121 10 0.0019 0.0002 0.0002 0.12 0.03 0.030 0.23 0.13 0.12 20 0.003 0.0002 0.000 0.2919 0.03 0.019 1.13 0.1909 0.220 30 0.009 0.0003 0.0019 0.902 0.012 0.11 1.9 0.213 0.2 0 0.01 0.0011 0.009 0.30 0.093 0.321 2.39 0.12 1.1 0 0.020 0.00 0.010 0.99 0.33 0.3 2.91 1.331 2.02 B 0 0.0002 0.000 0.0002 0.033 0.01 0.01 0.20 0.392 0.233 10 0.0023 0.000 0.0003 0.213 0.0 0.02 1.00 0.33 0.32 20 0.000 0.000 0.000 0.0 0.0 0.02 2.0930 0.32 0.9 30 0.0109 0.000 0.0022 0.9 0.01 0.12 3.223 0.39 0.922 0 0.01 0.0013 0.00 1.19 0.13 0..33 0. 2.2992 0 0.022 0.00 0.013 1.0 0.0 0.93.3330 2.300.19 MSE s are usually similar for different bases, but greater differences can appear for greater FMCV. Moreover, MSE for the B-spline basis is often greater than for the Fourier basis. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 1 / 19

Application to ECG data We consider the ECG data set originated from Olszewski (2001). During each of 200 heartbeats, two electrodes were used to measure the ECG. For one heartbeat and one electrode, the ECG was measured in 12 design time points, and the resulting values of ECG form a curve, which can be treated as discrete functional observation. We have 200 two-dimensional discrete functional data observed in 12 design time points (n = 200, p = 2, m i = 12, i = 1,..., n). The heartbeats were assigned to normal or abnormal group. Abnormal heartbeats are representative of a cardiac pathology known as supraventricular premature beat. The normal and abnormal groups consist of 133 and functional observations, respectively. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 1 / 19

Application to ECG data First variable normal First variable abnormal 00 200 0 200 00 00 200 0 200 00 0 0 100 10 Second variable normal 0 0 100 10 Second variable abnormal 00 200 0 200 00 00 200 0 200 00 0 0 100 10 0 0 100 10 Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 1 / 19

Application to ECG data The ECG database was used to discriminate between normal and abnormal heartbeats (Olszewski, 2001). For illustrative purposes, we show that this has also sense from variability point of view. To do this, we compute the FMCV for both normal and abnormal heartbeats separately. The basis functions representation of the data was obtained by using the Fourier and B- spline bases and B 1 = B 2 =,, 9, 11, 13, 1, if it was possible. To estimate the FMCV, we used the same estimators as in simulation experiments, i.e., the classical, MCD and S estimators, as well as the the pairwise estimator (OGK). The standard errors (SE) were obtained by the bootstrap method, based on 1000 bootstrap samples. Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 1 / 19

Application to ECG data FMCV SE 0.1 0.2 0.3 0. 0. 0. 0. 3 3 3 3 3 1 3 12 12 1 12 1 2 2 2 2 1 2 2 2 12 2 1 1 1 1 3 3 3 3 3 3 0.02 0.0 0.0 0.0 3 3 1 1 2 2 2 12 1 12 1 2 3 12 2 23 1 12 1 12 2 1 3 3 9 11 13 1 9 11 13 1 Number of basis functions Number of basis functions normal - solid line, abnormal - dashed line, 1 - classical, Fourier, 2 - classical, B-spline, 3 - MCD, Fourier, - MCD, B-spline, - S, Fourier, - S, B-spline, - OGK, Fourier, - OGK, B-spline Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 1 / 19

References 1 Aerts, S., Haesbroeck, G., Ruwet, C. (201). Multivariate coefficients of variation: comparison and influence functions. Journal of Multivariate Analysis 12, 13 19. 2 Albert, A., Zhang, L. (2010). A novel definition of the multivariate coefficient of variation. Biometrical Journal 2,. 3 Olszewski, R.T. (2001). Generalized Feature Extraction for Structural Pattern Recognition in Time-Series Data. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA. Reyment, R.A. (190). Studies on Nigerian Upper Cretaceous and Lower Tertiary Ostracoda: part 1. Senonian and Maastrichtian Ostracoda. Stockholm Contributions in Geology, 1 23. Van Valen, L. (19). Multivariate structural statistics in natural history. Journal of Theoretical Biology, 23 2. Voinov, V.G., Nikulin, M.S. (199). Unbiased Estimators and Their Applications, Vol. 2, Multivariate Case. Kluwer, Dordrecht. Thank you for your attention Mirosław Krzyśko and Łukasz Smaga Multivariate coefficient of variation for functional data International Biometrical Colloquium 19 / 19