Independent Component (IC) Models: New Extensions of the Multinormal Model


Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen)
School of Public Health, ULB, April 2008

My research is in multivariate statistics, where several (p, say) measurements are recorded on each of n individuals. We want to come up with models that are potentially useful for a broad range of setups (p ≪ n, though). In those models, we develop procedures that are
- robust to some possible model misspecification,
- robust to possible outlying observations (crucial in the multivariate case!),
- yet efficient...

Outline
1. Introduction: a (too?) simple multivariate problem; normal and elliptic models
2. ICA: what is it? How does it work? ICA vs PCA
3. IC models: definition and inference

1. Introduction

[Figure: scatter plot of cigarette sales (in packs per capita) against per capita disposable income.]

X_i = (X_{i1}, X_{i2})′, where X_{i1} = sales (after − before) for state i and X_{i2} = income (after − before) for state i, i = 1, …, n.

Assume one wants to find out, on the basis of the sample X_1, X_2, …, X_n, whether the tax reform had an effect (or not) on any of the variables. Typically, in statistical terms, this would translate into testing
H_0: µ_j = 0 for all j    against    H_1: µ_j ≠ 0 for at least one j,
at some fixed level α (5%, say).

More generally, one may want to test whether the tax reform had some fixed specified effect on each variable mean, that is, to test
H_0: µ_j = c_j for all j    against    H_1: µ_j ≠ c_j for at least one j,
at some fixed level α (5%, say).

The most basic idea is to go univariate, i.e., for each j = 1, 2, to test on the basis of X_{1j}, …, X_{nj} whether H_0^{(j)}: µ_j = c_j holds or not (at level 5%), and to reject H_0 as soon as one H_0^{(j)} has been rejected. This is a bad multivariate testing procedure, since it is easy to show that P[reject H_0] > 5% under H_0; a quick simulation (see the sketch below) illustrates this. You cannot properly control the level if you act marginally...

[Figure: the cigarette-sales/income scatter plot again.]
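To make the level inflation concrete, here is a minimal Python sketch; the setting (two independent N(0,1) marginals, each tested at 5%) is an assumption for illustration. With two independent marginal tests, P[reject H_0] = 1 − 0.95² ≈ 9.75%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, alpha, n_sim = 50, 2, 0.05, 20_000

rejections = 0
for _ in range(n_sim):
    X = rng.standard_normal((n, p))          # H0 true: mu_j = 0 for all j
    # marginal two-sided one-sample t-tests, each at level 5%
    pvals = [stats.ttest_1samp(X[:, j], 0.0).pvalue for j in range(p)]
    rejections += any(pv < alpha for pv in pvals)  # reject H0 if any marginal rejects

print(rejections / n_sim)   # about 0.0975 > 0.05: the level is not controlled
```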

Confidence zones also cannot be built marginally...

Hence there is a need for multivariate modelling. The most classical model, the multivariate normal model, specifies that the common density of the X_i's is of the form
f_X(x) ∝ exp(−(x − µ)′ Σ⁻¹ (x − µ)/2).
A necessary condition for this to hold is that each of the p variables is normally distributed. Hence, even for p = 2, 3, it is extremely unlikely that the underlying distribution is multivariate normal... (you would need to win at Euromillions p times in a row!)

[Figure: not quite the same model!]

[Figure: the marginals are far from Gaussian...]

Does it hurt? Oh yes, it does...
- For H_0: µ = µ_0, the Gaussian LR test (i) is efficient at the multinormal only, and (ii) is valid only if variances exist (what about financial series?).
- For H_0: Σ = Σ_0, the Gaussian LR test is valid at the multivariate normal distribution only!
Incidentally, those tests are not robust w.r.t. possible outliers.

An equivalent definition of the multivariate normal distribution specifies that
X = A(RU) + µ,
where
- U is uniformly distributed on the unit sphere in R^p,
- R ≥ 0, with R² ~ χ²_p, is independent of U,
- A is a constant p × p matrix,
- µ is a constant p-vector.
Elliptical distributions are obtained by allowing for an arbitrary distribution for R.
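A minimal Python sketch of this representation (the specific A, µ, and the degrees of freedom are assumptions for illustration); replacing the χ²_p radial law by R²/p ~ F(p, ν) yields an elliptical multivariate t:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 2, 5_000
A = np.array([[1.0, 0.5], [0.0, 1.0]])      # constant p x p matrix
mu = np.array([1.0, -1.0])                  # constant p-vector

U = rng.standard_normal((n, p))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform on the unit sphere

R = np.sqrt(rng.chisquare(p, size=n))       # R^2 ~ chi^2_p  =>  X is N(mu, AA')
X_normal = (R[:, None] * U) @ A.T + mu

# elliptical variant: multivariate t with nu d.f., via R^2 / p ~ F(p, nu)
nu = 3
R_t = np.sqrt(p * rng.f(p, nu, size=n))
X_t = (R_t[:, None] * U) @ A.T + mu          # heavier tails, same elliptical shape
```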

Elliptical distributions add some flexibility (in particular, they allow for heavy tails), but they still give rise to
- marginals with a common (type of) distribution,
- symmetric marginals,
- a deep multivariate symmetry structure...
These stylized facts often are sufficient to rule out the assumption of ellipticity... (no need for a test of ellipticity!) I am burning my old records here!

"And now for something completely different..." (Monty Python's Flying Circus, 1970)

2. ICA: what is it? How does it work? ICA vs PCA

ICA stands for Independent Component Analysis. It is a technique used in Blind Source Separation problems, such as the "cocktail-party problem":
- 3 conversations: Z_{it} (i = 1, 2, 3, t = 1, …, n),
- 3 microphones: X_{it}.
The goal is to recover the original conversations, under the only assumption that the latter are independent.

[Figure: the three source signals Z_{1t}, Z_{2t}, Z_{3t}.]

The basic model is
X_{1t} = a_{11} Z_{1t} + a_{12} Z_{2t} + a_{13} Z_{3t}
X_{2t} = a_{21} Z_{1t} + a_{22} Z_{2t} + a_{23} Z_{3t}
X_{3t} = a_{31} Z_{1t} + a_{32} Z_{2t} + a_{33} Z_{3t},
that is, X_t = A Z_t, where one assumes that
- all Z_{it}'s are mutually independent ("conversations" are independent, and there is no serial dependence),
- the mixing matrix A does not depend on t.
A short simulation sketch follows.
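A tiny Python sketch of this mixing step; the three source distributions and the random mixing matrix are invented for illustration (i.i.d. sources keep the "no serial dependence" assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

# three hypothetical independent sources, standardized but non-Gaussian
Z = np.column_stack([
    rng.uniform(-np.sqrt(3), np.sqrt(3), n),      # light-tailed source
    rng.laplace(scale=1 / np.sqrt(2), size=n),    # heavy-tailed source
    rng.exponential(size=n) - 1.0,                # asymmetric source
])

A = rng.normal(size=(3, 3))    # unknown mixing matrix (does not depend on t)
X = Z @ A.T                    # observed mixtures: X_t = A Z_t
```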

For BW images, Z_{ij} ∈ {0, 1, …, 255} represents the grey intensity of the ith image at the jth pixel (in vectorized form). Here, n = 281 × 281, and Z_{i1} = 61, Z_{i2} = 61, ... The minimal value is 45 (dark grey) and the maximal value is 255 (white).

Can you guess which Z_1, Z_2, Z_3 generated this mixture X_1? [Figure: the mixed image X_1.]

Would you guess who they are from X_1 and X_2? [Figure: the mixed images X_1 and X_2.]

[Figure: the three mixed images X_1, X_2, X_3.]

...magic...

[Figure: the recovered sources Ẑ_1, Ẑ_2, Ẑ_3, next to the original sources Z_1, Z_2, Z_3.]

Engineers typically estimate A (and hence recover the sources Ẑ_t = Â⁻¹ X_t) by choosing the matrix A that makes the marginals of A⁻¹ X_t as independent as possible, or as non-Gaussian as possible. Drawbacks:
- arbitrary objective functions,
- computationally intensive procedures,
- lack of robustness.

We have our own way to do that. A p × p scatter matrix S = S(X_1, …, X_n) is a statistic such that
S(AX_1, …, AX_n) = A S(X_1, …, X_n) A′ for all p × p matrices A.
Examples:
S_1 = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)(X_i − X̄)′,
S_2 = (1/(n−1)) Σ_{i=1}^n [(X_i − X̄)′ S_1⁻¹ (X_i − X̄)] (X_i − X̄)(X_i − X̄)′.
Assume that lim_{n→∞} S(X_1, …, X_n) is diagonal as soon as the common distribution of the X_i's has independent marginals. Then we say that S has the independence property.

Theorem. Let S_1, S_2 be scatter matrices with the independence property. Then the p × p matrix B_n, whose columns are the eigenvectors of S_2⁻¹(X_1, …, X_n) S_1(X_1, …, X_n), is consistent for (A′)⁻¹.

Proof. By using the definition of a scatter matrix and the independence property, we obtain
S_1 = S_1(X_i) = S_1(A Z_i) = A S_1(Z_i) A′ = A D_1 A′ and
S_2 = S_2(X_i) = S_2(A Z_i) = A S_2(Z_i) A′ = A D_2 A′,
for some diagonal matrices D_1, D_2. Hence,
(S_2⁻¹ S_1)(A′)⁻¹ = (A D_2 A′)⁻¹ (A D_1 A′)(A′)⁻¹ = (A′)⁻¹ (D_2⁻¹ D_1),
so that the columns of (A′)⁻¹ are eigenvectors of S_2⁻¹ S_1 (with eigenvalues the diagonal entries of D_2⁻¹ D_1).

Of course, if we choose robust S_1 and S_2, the resulting Â will be robust as well, which guarantees a robust reconstruction of the independent sources... A sketch of the estimator follows.
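Here is a minimal Python sketch of the two-scatter estimator with the pair (S_1, S_2) displayed above (this particular pair gives a FOBI-type estimator; the function names are mine, and X can be the mixtures from the earlier sketch):

```python
import numpy as np

def scatter_pair(X):
    """Sample covariance S1 and fourth-moment scatter S2, as on the slide."""
    Xc = X - X.mean(axis=0)
    n = len(Xc)
    S1 = Xc.T @ Xc / (n - 1)
    d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S1), Xc)  # Mahalanobis^2
    S2 = (Xc * d2[:, None]).T @ Xc / (n - 1)
    return S1, S2

def unmixing_matrix(X):
    """Columns of B are eigenvectors of S2^{-1} S1, consistent for (A')^{-1}."""
    S1, S2 = scatter_pair(X)
    eigvals, B = np.linalg.eig(np.linalg.solve(S2, S1))
    return np.real(B)   # eigenvalues/vectors are real here up to round-off

# recovered sources, up to order, sign and scale: Z_hat_i = B' X_i
# B = unmixing_matrix(X);  Z_hat = (X - X.mean(axis=0)) @ B
```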

With robust S_1, S_2... [Figure: the recovered sources Ẑ_1, Ẑ_2, Ẑ_3.]

With non-robust S_1, S_2 (the ones given above)... [Figure: the recovered sources Ẑ_1, Ẑ_2, Ẑ_3.]

PCA makes the marginals uncorrelated... ICA makes the marginals independent... Actually, ICA goes one step further than PCA: ICA = PCA + a rotation... This explains why PCA is often used as a preliminary step when performing ICA.

[Figure: scatter-plot matrix of the raw data (V1 through V6).]

[Figure: scatter-plot matrix of the principal components.]

[Figure: scatter-plot matrix of the independent components.]

3. IC models: definition and inference

We reject the elliptical model, which states that X_i = A Z_i + µ, where Z_i = (Z_{i1}, …, Z_{ip})′ is spherically symmetric (about 0 ∈ R^p), in favor of the following:

Definition. The independent component (IC) model states that X_i = A Z_i + µ, where Z_i = (Z_{i1}, …, Z_{ip})′ has independent marginals (with median 0 and MAD 1).

IC models provide an extension of the multinormal model, which is obtained when all ICs are Gaussian.

The elliptic and IC extensions are disjoint (apart from the multinormal model itself). The IC extension is also bigger than the elliptic one: in IC models, the parameters are µ, A, and p densities g_1, …, g_p, while in elliptic models they are µ, A, and a single density g (that of ‖Z_i‖).

The g_j's allow for much flexibility. In particular,
- we can play with p different kurtosis values...
- the X_i may very well be asymmetric...

As a summary... [Figure: diagram of the relation between the multinormal, elliptic, and IC models.]


Inference problem: test H_0: µ = 0 for n i.i.d. observations from the IC model X_i = A Z_i + µ, where Z_i = (Z_{i1}, …, Z_{ip})′ has independent marginals. The parameters are the location vector µ, the mixing matrix A, and the p densities (g_1, …, g_p). Of course, we can hardly assume the g_j's to be known, and it is expected that this nuisance will be an important issue.

Quite nicely, our estimators Â (based on a couple of scatter matrices S_1, S_2) do not require estimating µ nor g_1, …, g_p. We then may (a) write
Y_i := Â⁻¹ X_i = Â⁻¹ A Z_i + Â⁻¹ µ ≈ Z_i + Â⁻¹ µ,   (1)
and (b) go univariate, testing componentwise whether the location is 0 (reject H_0^{(j)} for large values of |T_j|, with T_j approximately N(0, 1) under H_0^{(j)}).

Crucial point: we can aggregate those univariate tests easily because the components are independent (reject H_0 for large values of Σ_{j=1}^p T_j², which is asymptotically χ²_p under H_0). A sketch follows.
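A minimal Python sketch of steps (a) and (b), reusing the hypothetical unmixing_matrix from the earlier sketch and the Student statistic of the next slide (the statistic is scale-invariant, so the arbitrary scaling of the eigenvectors does not matter):

```python
import numpy as np
from scipy import stats

def ic_location_test(X, unmix):
    """Test H0: mu = 0 in the IC model by aggregating componentwise statistics."""
    B = unmix(X)                      # columns consistent for (A')^{-1}
    Y = X @ B                         # rows Y_i = B' X_i ~ Z_i + B' mu
    n, p = Y.shape
    T = np.sqrt(n) * Y.mean(axis=0) / Y.std(axis=0, ddof=1)   # Student T_j
    Q = np.sum(T**2)                  # ~ chi^2_p under H0 (independent components)
    return Q, stats.chi2.sf(Q, df=p)  # test statistic and asymptotic p-value
```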

Which T_j should we choose? Student:
T_j = √n Ȳ_{·j}/s_{·j} = (1/√n) Σ_{i=1}^n Y_{ij}/s_{·j} = (1/√n) Σ_{i=1}^n Sign(Y_{ij}) |Y_{ij}|/s_{·j}.
This yields a multivariate Student test (φ_N, say), which unfortunately suffers from the same drawbacks as the classical Gaussian tests:
- it cannot deal with heavy tails,
- it is poorly robust.

This suggests replacing T_j by the signed-rank statistic
T̃_j := (1/√n) Σ_{i=1}^n Sign(Y_{ij}) Φ₊⁻¹(R⁺_{ij}/(n + 1)),
which satisfies T̃_j = T_j + o_P(1) at the multinormal, where
- R⁺_{ij} denotes the rank of |Y_{ij}| among |Y_{1j}|, …, |Y_{nj}|, and
- Φ₊(z) = P[|N(0, 1)| ≤ z].
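A minimal sketch of this Gaussian-score (van der Waerden) statistic for one component, using Φ₊⁻¹(u) = Φ⁻¹((1 + u)/2):

```python
import numpy as np
from scipy import stats

def signed_rank_stat(y):
    """Gaussian-score signed-rank statistic T~_j for one component y = (Y_1j, ..., Y_nj)."""
    n = len(y)
    signs = np.sign(y)
    ranks_abs = stats.rankdata(np.abs(y))                    # R+_ij: ranks of |Y_ij|
    scores = stats.norm.ppf((1 + ranks_abs / (n + 1)) / 2)   # Phi_+^{-1}(R+/(n+1))
    return signs @ scores / np.sqrt(n)
```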

How good is the resulting test (φ̃_N, say), which rejects H_0 for large values of Σ_{j=1}^p T̃_j²?
- It is fairly robust to outliers.
- It can deal with heavy tails.
And... it is, at the multinormal, as powerful as φ_N! (since T̃_j = T_j + o_P(1) at the multinormal)

A natural question: how does it compare with φ_N (in terms of power) away from the multinormal?

The answer is in favor of our rank test:

Theorem. The asymptotic relative efficiency (ARE) of φ̃_N with respect to φ_N under µ = n^{−1/2} τ, A, and (g_1, …, g_p) is of the form
ARE = [Σ_{j=1}^p w_j(A, τ) c(g_j)] / [Σ_{j=1}^p w_j(A, τ)],   with w_j(A, τ) ≥ 0,
that is, a weighted average of the univariate efficiencies c(g_j) (e.g., if all the g_j's are t_3, the ARE is 1.639 whatever the weights).

g_j      t_3    t_6    t_12   N      e_2    e_3    e_5
c(g_j)   1.639  1.093  1.020  1.000  1.129  1.286  1.533

Table: various values of c(g_j).

Actually, c(g_j) ≥ 1 for all g_j, which implies that φ̃_N is always (asymptotically) more powerful than the Student test φ_N! Our tests therefore dominate the Student ones both in terms of robustness and in terms of efficiency!

Remark: rather than "Gaussian scores" as in
T̃_j = (1/√n) Σ_{i=1}^n Sign(Y_{ij}) Φ₊⁻¹(R⁺_{ij}/(n + 1)),
one can use (more robust) Wilcoxon scores,
T̃_j^W := √(3/n) Σ_{i=1}^n Sign(Y_{ij}) R⁺_{ij}/(n + 1),
or (even more robust) sign scores,
T̃_j^S := (1/√n) Σ_{i=1}^n Sign(Y_{ij}).
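The same sketch with these two alternative score functions (same assumptions as before):

```python
import numpy as np
from scipy import stats

def signed_score_stat(y, score="wilcoxon"):
    """Signed-rank statistics with Wilcoxon or sign scores for one component."""
    n = len(y)
    signs = np.sign(y)
    u = stats.rankdata(np.abs(y)) / (n + 1)           # R+_ij / (n+1), in (0, 1)
    if score == "wilcoxon":
        return np.sqrt(3.0) * signs @ u / np.sqrt(n)  # sqrt(3) standardizes the score
    return signs.sum() / np.sqrt(n)                   # sign-test statistic
```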

Efficiency is then not as good, as a price for the better robustness...

g_j    t_3    t_6    t_12   N      e_2    e_3    e_5
φ̃_N   1.639  1.093  1.020  1.000  1.129  1.286  1.533
φ̃_W   1.900  1.164  1.033  0.955  0.873  0.881  0.907
φ̃_S   1.621  0.879  0.733  0.637  0.411  0.370  0.347

Table: various values of c(g_j) for our Gaussian, Wilcoxon, and sign tests.

[Figures: the original data, with 95% confidence zones from the Gaussian method and from our φ̃_N, φ̃_W, and φ̃_S IC methods.]

[Figures: the original data and a contaminated version of it, with the corresponding 95% confidence zones (Gaussian method, and our φ̃_N, φ̃_W, and φ̃_S IC methods).]

Conclusion

IC models provide quite flexible semiparametric models for multivariate statistics. Rank methods are efficient and robust alternatives to Gaussian methods.

Appendix: References

Oja, H., Sirkiä, S., and Eriksson, J. (2006). Scatter matrices and independent component analysis. Austrian Journal of Statistics 35, 175–189.

Oja, H., Nordhausen, K., and Paindaveine, D. (2007). Signed-rank tests for location in the symmetric independent component model. ECORE DP 2007/123. Submitted.

Oja, H., Paindaveine, D., and Taskinen, S. (2008). Parametric and nonparametric tests for multivariate independence in IC models. Manuscript in preparation.