PROPERTIES OF THE EMPIRICAL CHARACTERISTIC FUNCTION AND ITS APPLICATION TO TESTING FOR INDEPENDENCE. Noboru Murata

Waseda University, Department of Electrical, Electronics and Computer Engineering, Ohkubo, Shinjuku, Tokyo, Japan

ABSTRACT

In this article the asymptotic properties of the empirical characteristic function are discussed. The residual of the joint and marginal empirical characteristic functions is studied: the uniform convergence of the residual in the wider sense and the weak convergence of the scaled residual to a Gaussian process are investigated. Taking these results into account, a statistical test for independence is constructed.

1. INTRODUCTION

Independent component analysis (ICA, [1]) deals with the problem of decomposing signals that are generated from an unknown linear mixture of mutually independent sources, and it is applied to many problems in various engineering fields such as source separation, dimension reduction and data mining. ICA is often compared with principal component analysis (PCA), a method of multivariate data analysis based on the covariance structure of signals under the Euclidean metric. In the ICA framework, however, the notion of independence, which is not affected by the metric of the signals, plays the central role. Supported by recent improvements in computer systems, which allow us to handle massive data and to estimate computationally tedious higher-order information within reasonable time, ICA brings to multivariate data analysis a new viewpoint of using mutual information instead of covariance. It has also been pointed out that, in the context of learning, the self-organization of information transformations based on infomax criteria in neural networks is closely related to ICA ([2] and so on).

A simple setting of ICA is as follows. Let s(t) = (s_1(t), ..., s_n(t)) be a source signal consisting of zero-mean independent random variables. We assume that each component is an i.i.d. sequence or a strongly stationary ergodic sequence, and that at most one component is subject to a Gaussian distribution while the others are non-Gaussian. Let x(t) be an observation, produced as a linear transformation of the source signal by an unknown matrix A, that is,

    x(t) = A s(t).

The problem is to recover the source signal and to estimate the matrix A from the observation alone, without any further knowledge: find a matrix W such that the elements of the recovered signal

    y(t) = W x(t)

are mutually independent. In other words, find a decomposition of the observation into mutually independent components. Ideally the matrix W is an inverse (or generalized inverse) of the matrix A; however, as is well known, the order and amplitude of the elements of the recovered signal remain ambiguous.
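As a minimal illustration of this model (an added sketch, not part of the original paper; the variable names and the choice of sources are hypothetical), the following Python fragment generates a two-source linear mixture; an ICA algorithm has to estimate W from x alone, and at best W A equals a rescaled permutation of the identity.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000                                  # number of time samples

    # two zero-mean, independent, non-Gaussian sources (uniform and Laplace)
    s = np.vstack([rng.uniform(-1.0, 1.0, n),
                   rng.laplace(0.0, 1.0, n)])

    A = np.array([[1.0, 0.6],                   # unknown mixing matrix
                  [0.4, 1.0]])
    x = A @ s                                   # observation x(t) = A s(t)

    # an ICA algorithm must estimate W from x alone so that y = W x has
    # mutually independent components; at best W A is a rescaled permutation
    W = np.linalg.inv(A)                        # the ideal answer, unknown in practice
    y = W @ x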
In general, ICA algorithms are implemented as the minimization or maximization of a contrast function which measures the independence among the elements of the recovered signal y. Motivated mainly by the stability of the algorithms and the speed of convergence, many contrast functions have been proposed; typical ones are as follows. Let p(y_1, ..., y_n) and p_i(y_i) be the joint probability density and the marginal probability densities of the recovered signal, respectively. The definition of independence is stated as

    p(y_1, ..., y_n) = p_1(y_1) p_2(y_2) ... p_n(y_n).

Therefore a statistical distance between the joint and the marginal distributions, such as the Kullback-Leibler divergence or the Hellinger distance, can serve as a contrast function. Since these distances are defined in terms of probability distributions, the distributions have to be approximated, by kernel methods or by polynomial expansions based on moments or cumulants, in order to calculate the distances from observations alone, i.e., from empirical distributions.
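As an illustration of the kernel approach (an added sketch with hypothetical function names, not the paper's own procedure), the Kullback-Leibler contrast between the joint density and the product of marginals can be approximated from samples with Gaussian kernel density estimates:

    import numpy as np
    from scipy.stats import gaussian_kde

    def kl_contrast(y1, y2):
        """Monte-Carlo estimate of KL( p(y1, y2) || p(y1) p(y2) ) from samples."""
        joint = gaussian_kde(np.vstack([y1, y2]))    # kernel estimate of the joint density
        m1, m2 = gaussian_kde(y1), gaussian_kde(y2)  # kernel estimates of the marginals
        pts = np.vstack([y1, y2])
        # average log density ratio over the observed points
        return np.mean(np.log(joint(pts)) - np.log(m1(y1)) - np.log(m2(y2)))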

In practice the Kullback-Leibler divergence is often used, and in this case the entropy, or differentials of the entropy, of the recovered signals has to be estimated, for example with a Gram-Charlier expansion or with non-linear function approximations ([3] and so on).

There are also methods using higher-order statistics in which the shape of the probability distributions is not assumed exactly. The blind separation method of Jutten and Herault [4] is of this type: in the simplest case they assume only the symmetry of the distributions and use conditions of the form E[f(y_i) g(y_j)] = 0 with odd non-linear functions f and g. Cumulants, in particular the kurtosis, are also often used in contrast functions ([5] and so on).

In another approach, correlation functions are used in the contrast function in order to exploit the time structure of the signal ([6] and so on). In this case the assumption on the source signals may be relaxed to weak stationarity. From the assumption of independence, the correlation function matrix of the source signal,

    diag( r_1(τ), r_2(τ), ..., r_n(τ) ),

is diagonal at any time delay τ, where r_i denotes the auto-correlation function of the i-th signal. Therefore a desired matrix W is one which simultaneously diagonalizes the correlation function matrices of the recovered signal at all time delays. In practical calculation this strategy reduces to an approximate simultaneous diagonalization of several delayed correlation matrices under the least-mean-square criterion.

So far, many algorithms based on various contrast functions have been proposed. Since each method has its own advantages and disadvantages, in practice it is difficult to know in advance which algorithm should be used. We sometimes need to evaluate the validity of the result of an algorithm, or to compare the results of different algorithms, from the viewpoint of the independence of the reconstructed signals. It is therefore important to consider methods of testing for independence. In this paper we propose to use the empirical characteristic function for this purpose and investigate its basic properties together with numerical experiments.

2. PROPERTIES OF THE EMPIRICAL CHARACTERISTIC FUNCTION

The characteristic function of a probability distribution is defined as its Fourier-Stieltjes transform,

    φ(t) = E[ e^{i t X} ],

where E denotes expectation with respect to the distribution. From the well-known properties of the Fourier-Stieltjes transform, the characteristic function and the distribution correspond one to one. Therefore closeness of two distributions can be restated in terms of closeness of their characteristic functions. Moreover, the characteristic function is bounded everywhere, that is,

    |φ(t)| = |E[ e^{i t X} ]| ≤ 1.

This uniqueness and boundedness are specific properties of the characteristic function. The moment generating function, which is the Laplace transform of the probability distribution, is similar to the characteristic function but is not bounded, and there are pathological examples of distributions which, for instance, have moments of all orders but do not have a moment generating function. The boundedness, in particular, is an advantage for the stability of numerical calculations.

The empirical characteristic function is defined as

    φ_n(t) = (1/n) Σ_{j=1}^{n} e^{i t X_j},

where X_1, ..., X_n is an i.i.d. sequence from the distribution. Obviously the empirical characteristic function is calculated directly from the empirical distribution. We should note, however, that all the calculation is done in the complex domain.
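In code the empirical characteristic function is simply a complex average; the following sketch (added here for illustration, with a hypothetical helper name) evaluates it on a grid of arguments and compares it with the true characteristic function of a standard normal distribution.

    import numpy as np

    def ecf(x, t):
        """Empirical characteristic function (1/n) sum_j exp(i t x_j),
        evaluated at each argument in t."""
        x = np.asarray(x, dtype=float)
        t = np.asarray(t, dtype=float)
        return np.exp(1j * np.outer(t, x)).mean(axis=1)

    # example: compare with the true characteristic function of N(0,1)
    x = np.random.default_rng(1).standard_normal(5000)
    t = np.linspace(-5.0, 5.0, 201)
    phi_n = ecf(x, t)
    phi = np.exp(-t**2 / 2)          # true characteristic function of N(0,1)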
The following two results by Feuerverger and Mureika [7] are fundamental for understanding the convergence of the empirical characteristic function to the true characteristic function.

Theorem 1 (Feuerverger and Mureika [7]). For any fixed T < ∞,

    P( lim_{n→∞} sup_{|t| ≤ T} |φ_n(t) − φ(t)| = 0 ) = 1.

Theorem 2 (Feuerverger and Mureika [7]). Let Y_n(t) be the stochastic process given by the scaled residual of the empirical and the true characteristic functions,

    Y_n(t) = √n ( φ_n(t) − φ(t) ).

As n → ∞, Y_n(t) converges weakly to a zero-mean complex Gaussian process Y(t) satisfying

    Y(t) = conj( Y(−t) )   and   E[ Y(t) conj( Y(s) ) ] = φ(t − s) − φ(t) φ(−s),

where conj denotes the complex conjugate.

As n → ∞, by the strong law of large numbers the empirical characteristic function converges almost surely to the true characteristic function, pointwise in t.
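The scaling in Theorem 2 is easy to probe numerically. The following rough Monte-Carlo check (added here for illustration; sample sizes and the evaluation point are arbitrary) draws repeated samples from N(0,1) and compares the variance of the scaled residual at a fixed t with the value 1 − |φ(t)|² predicted by the covariance formula above.

    import numpy as np

    rng = np.random.default_rng(2)
    n, runs, t = 2000, 500, 1.0
    phi = np.exp(-t**2 / 2)            # true characteristic function of N(0,1) at t

    resid = np.empty(runs, dtype=complex)
    for r in range(runs):
        x = rng.standard_normal(n)
        phi_n = np.exp(1j * t * x).mean()
        resid[r] = np.sqrt(n) * (phi_n - phi)

    # Theorem 2 predicts E|Y(t)|^2 = 1 - |phi(t)|^2 in the limit
    print(np.mean(np.abs(resid) ** 2), 1 - abs(phi) ** 2)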

Furthermore, the above theorems claim, respectively, that the convergence with respect to t is uniform in the wider sense (i.e., uniform on every bounded interval) and that the scaled residual, as a stochastic process, converges weakly to a complex Gaussian process. Feuerverger and Mureika mainly discussed a test for symmetry of the distribution, but they also mentioned tests of fit, tests for independence, applications to parameter estimation, and so on. In practice a studentized version, in which the observations are standardized before the empirical characteristic function is formed, is often used; the properties of the studentized empirical characteristic function were investigated by Murota and Takeuchi [8]. Similarly, its residual converges to a complex Gaussian process.

We extend the above results to the empirical characteristic functions of two independent random variables. Let X_1, ..., X_n and Y_1, ..., Y_n be mutually independent i.i.d. sequences, and let φ_X, φ_Y and φ_XY be the characteristic functions of X, of Y, and of their joint distribution, respectively. Note that in the two-dimensional case the characteristic function is defined as

    φ_XY(u, v) = E[ e^{i (u X + v Y)} ],

and from the assumption of independence between X and Y it can be rewritten as

    φ_XY(u, v) = φ_X(u) φ_Y(v).

The empirical characteristic functions are defined as

    φ_{n,X}(u) = (1/n) Σ_{j=1}^{n} e^{i u X_j},
    φ_{n,Y}(v) = (1/n) Σ_{j=1}^{n} e^{i v Y_j},
    φ_{n,XY}(u, v) = (1/n) Σ_{j=1}^{n} e^{i (u X_j + v Y_j)}.

By the strong law of large numbers these empirical characteristic functions converge almost surely to the corresponding true characteristic functions, pointwise in u and v. We can show the following two theorems.

Theorem 3. For any fixed T < ∞,

    P( lim_{n→∞} sup_{|u| ≤ T, |v| ≤ T} |φ_{n,XY}(u, v) − φ_{n,X}(u) φ_{n,Y}(v)| = 0 ) = 1.

Theorem 4. Let T_n(u, v) be the two-dimensional complex stochastic process

    T_n(u, v) = √n ( φ_{n,XY}(u, v) − φ_{n,X}(u) φ_{n,Y}(v) ).

As n → ∞, T_n(u, v) converges weakly to a zero-mean complex Gaussian process whose covariance structure is determined by φ_X and φ_Y.

We have to be careful about the fact that φ_{n,XY} and the product φ_{n,X} φ_{n,Y} are not independent, which gives the essential difference from Feuerverger and Mureika's theorems. For the proof of Theorem 3, the decomposition

    φ_{n,XY} − φ_{n,X} φ_{n,Y} = (φ_{n,XY} − φ_XY) − φ_Y (φ_{n,X} − φ_X) − φ_X (φ_{n,Y} − φ_Y) − (φ_{n,X} − φ_X)(φ_{n,Y} − φ_Y),

valid under independence, where φ_XY = φ_X φ_Y, and the continuity theorem for characteristic functions are applied. Knowing that the scaled residuals of φ_{n,X}, φ_{n,Y} and φ_{n,XY} converge pointwise to Gaussian distributions whose covariances are expressed in terms of φ_X, φ_Y and φ_XY, the proof of Theorem 4 basically follows that of Feuerverger and Mureika.

Note that T_n consists only of empirical characteristic functions, which can be calculated from observations. That means we can calculate T_n without any knowledge about the underlying distributions of X and Y. The multidimensional version for three or more random variables can be discussed in a straightforward way.
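Since every term in T_n is empirical, it can be computed directly from paired observations; a minimal sketch (added for illustration, with a hypothetical function name) is:

    import numpy as np

    def indep_residual(x, y, u, v):
        """sqrt(n) * ( phi_n,XY(u,v) - phi_n,X(u) * phi_n,Y(v) )
        computed from paired observations (x_j, y_j)."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = x.size
        phi_xy = np.exp(1j * (u * x + v * y)).mean()
        phi_x = np.exp(1j * u * x).mean()
        phi_y = np.exp(1j * v * y).mean()
        return np.sqrt(n) * (phi_xy - phi_x * phi_y)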

3. APPLICATION TO TESTING FOR INDEPENDENCE

According to the results in the previous section, we construct a test statistic from

    T_n(u, v) = √n ( φ_{n,XY}(u, v) − φ_{n,X}(u) φ_{n,Y}(v) ).

Roughly speaking, the test procedure consists of evaluating how close T_n is to zero with a certain distance. The L1-norm, the L2-norm and Sobolev-type norms are candidates for the distance function; however, evaluation over the whole (u, v) plane is not necessarily sensible, because Theorems 3 and 4 claim only that the residual converges uniformly on bounded domains. In fact the residual φ_{n,XY} − φ_{n,X} φ_{n,Y} keeps fluctuating with non-vanishing amplitude arbitrarily far from the origin and does not converge uniformly on the whole plane. Therefore, at least from the practical point of view, the distance should be evaluated as a weighted distance

    ∫ |T_n(u, v)|² w(u, v) du dv

with a weight function w whose support is bounded. As a special case, if the L2-norm is used and the weight can be decomposed as w(u, v) = g(u) g(v) with a function g whose inverse Fourier transform is a kernel, the norm can be rewritten in terms of kernel-smoothed empirical densities; if, in addition, g is a probability density function, the above distance is interpreted as the L2-norm of the difference between the joint and the product of marginal probability density functions estimated by the kernel method. As another special case, a Sobolev-type norm weighted around the origin is almost equivalent to methods using higher-order statistics (moments and cumulants), because the derivatives of the empirical characteristic function at the origin carry the information of the cumulants. In any case these norms are not practical because of the computational cost of the multiple integrals with respect to u and v, and the weight functions would have to be chosen carefully with a quantitative evaluation of the convergence, for instance by cross-validation.

Here we reduce the computational cost by taking the properties of the statistic into account. We should note that the variance of T_n vanishes on the coordinate axes, that is, for u = 0 or v = 0, and is bounded at all other points. Also, from the covariance structure of neighbouring points, we know that T_n, as a stochastic process, changes continuously and smoothly (see Fig. 5 for an example). As discussed in Murota and Takeuchi [8], we therefore evaluate T_n at one point, or at a few points, of a suitable region, relying on this continuity. Since T_n approximately obeys a Gaussian distribution if n is sufficiently large, we can adopt as test statistic the squared norm of the real and imaginary parts of T_n normalized by their covariance,

    S_n = z' Σ^{-1} z,   z = ( Re T_n(u, v), Im T_n(u, v) )',

where the elements of the covariance matrix Σ follow from the asymptotic covariance structure of T_n and can themselves be calculated from the empirical characteristic functions. This test statistic asymptotically obeys a χ² distribution whose number of degrees of freedom equals the dimension of z (two for a single evaluation point), and hence p-values can be evaluated from the χ² tail probabilities under the null hypothesis that the two random variables are mutually independent.
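One possible implementation of this test is sketched below (added for illustration; function names are hypothetical, and a single evaluation point is assumed). The paper derives Σ in closed form from the empirical characteristic functions; for brevity this sketch instead estimates the null covariance of (Re T_n, Im T_n) by permuting one of the samples, which is a substitute for, not a reproduction of, the author's construction.

    import numpy as np
    from scipy.stats import chi2

    def indep_residual(x, y, u, v):
        """sqrt(n) * ( phi_n,XY(u,v) - phi_n,X(u) phi_n,Y(v) ), as in the earlier sketch."""
        n = x.size
        return np.sqrt(n) * (np.exp(1j * (u * x + v * y)).mean()
                             - np.exp(1j * u * x).mean() * np.exp(1j * v * y).mean())

    def indep_test(x, y, u=1.0, v=1.0, n_perm=200, seed=0):
        """Chi-square-type test of independence at a single argument (u, v)."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        rng = np.random.default_rng(seed)
        t_obs = indep_residual(x, y, u, v)
        # null covariance of (Re T, Im T) estimated by permutation (substitute
        # for the closed-form asymptotic covariance used in the paper)
        null = np.array([indep_residual(x, rng.permutation(y), u, v)
                         for _ in range(n_perm)])
        cov = np.cov(np.column_stack([null.real, null.imag]), rowvar=False)
        z = np.array([t_obs.real, t_obs.imag])
        stat = z @ np.linalg.solve(cov, z)    # quadratic form; ~ chi^2 with 2 dof under H0
        return stat, chi2.sf(stat, df=2)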

4. NUMERICAL EXPERIMENTS

4.1. Properties of the test statistic

First we check the properties of the test statistic and its p-values discussed in the previous section. In each run we generate independent random variables X and Y subject to a beta distribution and calculate the test statistic. Figures 1 and 2 show the histograms of the test statistics and of the p-values obtained from repeated runs. Theoretically the test statistic obeys a χ² distribution and the p-value obeys the uniform distribution on [0, 1]. The comparison of the experimental and theoretical cumulative distributions of the test statistic in Figure 3 supports our analysis.

(Fig. 1: Histogram of the test statistics. Fig. 2: Histogram of the p-values. Fig. 3: Experimental and theoretical cumulative distributions of the test statistics.)

4.2. Application to detecting crosstalk

In the following example we use two music instrument sounds, s1 and s2, which are synthesized on the computer. Figure 4 shows their waveforms. Figure 5 shows the real part and the imaginary part of the scaled residual of the joint and marginal empirical characteristic functions,

    T_n(u, v) = √n ( φ_{n,XY}(u, v) − φ_{n,X}(u) φ_{n,Y}(v) ),

where X = s1 and Y = s2, and the empirical characteristic functions are calculated from randomly chosen points of s1 and s2. We can see that both the real and the imaginary parts are almost zero around the origin but fluctuate continuously toward the marginal region.

(Fig. 4: Source signal waveforms. Fig. 5: The real part and the imaginary part of the residual of the joint and marginal characteristic functions.)

Using these signals we generate an artificially mixed signal in which s2 enters with a small amplitude, so that it simulates crosstalk of s2 into s1. To detect the crosstalk we generate mixed signals with a varying mixing intensity and calculate the statistic. In Figures 6 and 7 the test statistics and the p-values are shown, calculated from randomly chosen points; the mixing intensity α is plotted on the horizontal axis and the test statistic and the p-value on the vertical axis, respectively, with the statistic evaluated at fixed arguments (u, v). The p-values of the test statistics are maximized close to the true crosstalk ratio at the evaluation points, hence the test statistic detects the true crosstalk ratio with small errors.

(Fig. 6: Test statistics of the artificially mixed signals. Fig. 7: p-values of the artificially mixed signals.)
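One plausible reading of this experiment can be sketched as follows (a hypothetical reconstruction added for illustration, with synthetic Laplace sources in place of the instrument sounds and the raw residual magnitude in place of the χ² p-value): cancel a candidate amount of s2 from the mixture and look for the intensity at which the dependence on s2 disappears.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 20_000
    s1, s2 = rng.laplace(size=n), rng.laplace(size=n)   # stand-ins for the two sources
    alpha0 = 0.07                                       # true (unknown) crosstalk ratio
    x = s1 + alpha0 * s2                                # observed signal with crosstalk

    def residual_mag(a, b, u=1.0, v=1.0):
        """|phi_n,ab(u,v) - phi_n,a(u) phi_n,b(v)| for paired observations a, b."""
        return abs(np.exp(1j * (u * a + v * b)).mean()
                   - np.exp(1j * u * a).mean() * np.exp(1j * v * b).mean())

    betas = np.linspace(0.0, 0.2, 81)
    dep = [residual_mag(x - beta * s2, s2) for beta in betas]
    beta_hat = betas[int(np.argmin(dep))]               # smallest residual near beta = alpha0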

5. CONCLUDING REMARKS

In this paper we investigated the properties of the joint and marginal empirical characteristic functions and discussed a test for independence based on the characteristic functions of the joint and marginal empirical distributions. A more detailed evaluation of the convergence, together with a quantitative study of the relationship between the sample size and the choice of evaluation points, is needed as future work. Especially for practical use, it is also important to consider the problem in which the observations are smeared by non-negligible noise. In the Gaussian-noise case a straightforward extension can be made by using the information of the noise covariance estimated by conventional factor analysis, as discussed in Kawanabe [9].

6. REFERENCES

[1] Pierre Comon, "Independent component analysis, a new concept?", Signal Processing, vol. 36, no. 3, pp. 287-314, April 1994.

[2] Anthony J. Bell and Terrence J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution", Neural Computation, vol. 7, pp. 1129-1159, 1995.

[3] Shun-ichi Amari, Andrzej Cichocki and Howard Hua Yang, "A new learning algorithm for blind signal separation", in Advances in Neural Information Processing Systems, David S. Touretzky, Michael C. Mozer and Michael E. Hasselmo, Eds., pp. 757-763, MIT Press, Cambridge, MA, 1996.

[4] Christian Jutten and Jeanny Herault, "Separation of sources, Part I", Signal Processing, vol. 24, no. 1, pp. 1-10, July 1991.

[5] Jean-François Cardoso and Antoine Souloumiac, "Blind beamforming for non-Gaussian signals", IEE Proceedings F, vol. 140, no. 6, pp. 362-370, December 1993.

[6] L. Féty and J. P. Van Uffelen, "New methods for signal separation", in Proceedings of the International Conference on HF Radio Systems and Techniques, London, April 1988, IEE.

[7] Andrey Feuerverger and Roman A. Mureika, "The empirical characteristic function and its applications", The Annals of Statistics, vol. 5, no. 1, pp. 88-97, 1977.

[8] Kazuo Murota and Kei Takeuchi, "The studentized empirical characteristic function and its application to test for the shape of distribution", Biometrika, vol. 68, no. 1, pp. 55-65, 1981.

[9] Motoaki Kawanabe and Noboru Murata, "Independent component analysis in the presence of Gaussian noise based on estimating functions", in Proceedings of the Second International Workshop on Independent Component Analysis and Blind Signal Separation (ICA 2000), Helsinki, June 2000.