SENSITIVITY ANALYSIS OF BLIND SEPARATION OF SPEECH MIXTURES

by

Savaskan Bulek

A Dissertation Submitted to the Faculty of The College of Engineering & Computer Science in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Florida Atlantic University
Boca Raton, FL
December 2010


ACKNOWLEDGEMENTS

I gratefully acknowledge my advisor Dr. Nurgun Erdol for her guidance, encouragement, and great patience. She has provided continued motivation and generous advice throughout my Ph.D. study. I wish to express my thanks to Dr. Valentine Aalo, Dr. Christopher Beetle, and Dr. Hanqi Zhuang for their support and valuable suggestions. I would also like to thank the Lifelong Learning Society at FAU, FAU's Center for Ocean Energy Technology, NASA, and the Department of Computer and Electrical Engineering and Computer Science for providing financial support.

ABSTRACT

Author: Savaskan Bulek
Title: Sensitivity Analysis of Blind Separation of Speech Mixtures
Institution: Florida Atlantic University
Dissertation Advisor: Dr. Nurgun Erdol
Degree: Doctor of Philosophy
Year: 2010

Blind source separation (BSS) refers to a class of methods by which multiple sensor signals are combined with the aim of estimating the original source signals. Independent component analysis (ICA) is one such method that effectively resolves static linear combinations of independent non-Gaussian distributions. We propose a method that can track variations in the mixing system by seeking a compromise between adaptive and block methods by using mini-batches. The resulting permutation indeterminacy is resolved based on the correlation continuity principle. Methods employing higher order cumulants in the separation criterion are susceptible to outliers in the finite sample case. We propose a robust method based on low-order non-integer moments by exploiting the Laplacian model of speech signals. We study separation methods for even- or over-determined linear convolutive mixtures in the frequency domain based on joint diagonalization of matrices employing time-varying second order statistics. We investigate the sources affecting the sensitivity of the solution under the finite sample case, such as the set size, overlap amount, and cross-spectrum estimation methods.

To my family.

Contents

List of Figures

1 Introduction
  1.1 Abstract
  1.2 Motivation
  1.3 System Model
    1.3.1 Source signals
    1.3.2 Mixing system
    1.3.3 Noise
    1.3.4 Mixture signals
    1.3.5 Demixing system
    1.3.6 Global system
  1.4 Blind Source Separation (BSS)
    1.4.1 Independent Component Analysis
    1.4.2 Indeterminacies
    1.4.3 Separability and uniqueness
  1.5 Contrast functions
    1.5.1 Higher order statistics
    1.5.2 Fractional order statistics
    1.5.3 Second order statistics
  1.6 Iterative Search Algorithms
    1.6.1 Deflation scheme
    1.6.2 Symmetric scheme
  Joint diagonalization
    Orthogonal Joint Diagonalization
    Non-orthogonal Joint Diagonalization
    Diagonality Measures
    Uniqueness Conditions
    Optimization methods
  Applications
    Hearing aids
    Teleconferencing
    Speech recognition
  Simulation Setup and Performance Metrics
    Source Signals
    Mixing System
    Mixture signals
    Noise
    Performance measures
  Outline of Dissertation
  Appendix

2 BSS using Fractional Order Moments
  2.1 Abstract
  2.2 Introduction
  2.3 Contrast function
  2.4 The Search Surface
  2.5 Optimization on the unit circle
  2.6 Statistical Properties of the Sample Contrast Estimator
  Simulations
    Synthetic data
    Speech data
  Some Generalizations
    Generalization to arbitrary orders
    Generalization to MIMO
  Chapter Summary
  Appendix
    Generalized Gaussian Distribution
    Distribution of linear combinations of independent Laplacian variables
    Statistics of the Sample Estimator

3 Block Adaptive ICA with a Time Varying Mixing Matrix
  Abstract
  Introduction
  Problem Formulation
  Numerical Simulations
    Sinusoidal source signals
    Speech source signals
  Chapter Summary

4 Sensitivity Analysis of Joint Diagonalization in Convolutive BSS
  Abstract
  Introduction
  System Model
    4.3.1 Scaling Correction
    4.3.2 Permutation Correction
  Demixing System Estimation
  Joint Diagonalization of Cross-Spectral Matrices
    Cost Function
    Uniqueness Conditions
  Cross-Spectral Matrix Estimation
    Multitaper estimates
    Lag window estimates
    Frequency-averaged cross-periodogram
  Effects of Imperfect Cross-Spectral Matrix Estimation
  Numerical Simulations
    Database
    Simulations I
    Simulations II
  Chapter Summary

5 Conclusion and Outlook
  Conclusions
  Open Issues

Bibliography

List of Figures

1.1 Simultaneously active speaker scenario
1.2 Conceptual block diagram of the blind source separation problem
1.3 Impulse responses of synthetic mixing channels
1.4 Magnitude responses of synthetic mixing channels
1.5 Prerecorded room impulse responses
1.6 Magnitude responses of prerecorded room channels
2.1 Mutual information and negentropy of two Laplace mixtures
2.2 D kurtosis and fractional order moments surfaces
2.3 D kurtosis surface
2.4 D fractional order moments surface
2.5 Optimization of demixing angle with different initializations
2.6 Bias and variance of sample cost functions
2.7 Performance measures of kurtosis and fractional order moments
2.8 Normalized fractional order moments of a generalized Gaussian pdf
2.9 Performance measure of fractional order moments in a MIMO scenario
2.10 Various members of a generalized Gaussian pdf
3.1 Time varying mixing coefficients of a TITO system
3.2 Effects of the block size on the separation of sinusoids
3.3 Effects of initializations on the separation
3.4 Correlations of the output signals
3.5 Effects of the block size on the separation of speech mixtures
4.1 SIR for various cross-spectrum estimators and overlap amounts
4.2 SIR for various cross-spectrum estimators and number of tapers
4.3 Histograms of performance and uniqueness measures
4.4 Histograms of various statistics of source cross-spectra
4.5 Effects of number of tapers on uniqueness and performance measures
4.6 Effects of set size on uniqueness and performance measures
4.7 Effects of number of tapers & segment size on various measures

Chapter 1

Introduction

1.1 ABSTRACT

This chapter serves as an introduction to the problem considered in this dissertation. All the elements of the problem, along with the key principles that lead to several approaches, are clearly explained. A comprehensive account of existing approaches to the problem is organized into categories according to the nature of the methods. Audio applications for which the techniques may be useful are provided. An overview of the dissertation is given at the end of this chapter.

1.2 MOTIVATION

Speech enhancement is a signal processing task required in many situations in which an improvement in the quality of a degraded speech signal is desired. The source of degradation may be reverberation, multiple interfering speakers, or background noise. Most of today's single-microphone noise reduction systems are based on spectral subtraction and signal subspace decomposition methods [1], [2]. These methods have limited performance in the case of multiple interfering speakers. One way to overcome these limitations is to employ multiple microphones, which is motivated by the human binaural system. Early multi-microphone noise reduction systems relied on fixed or adaptive beamforming, which remains in use today [3]. These systems

Figure 1.1: Two input two output simultaneously active speaker scenario.

require some prior knowledge such as inactive time periods of the target source, source locations, or microphone array geometry. In practice, however, this prior knowledge is rarely available, hence a system that does not depend on this information is highly desirable. Blind source separation (BSS) is aptly named because it aims to recover the source signals from their mixtures when neither the mixing system nor the source signals are observable. The general problem may be described by the example illustrated in Fig. 1.1, where the speech signals of two simultaneously active speakers are recorded by two microphones. The speech signals spoken by the speakers are called source signals, and the microphone recordings are the mixture signals. The acoustic environment is represented by the mixing system. The microphone measurements typically contain components from both sources. The procedure that resolves the individual speaker's speech by operating on the recordings, without information on each source, such as its active time periods and location, or on the mixing system, is called BSS.

Figure 1.2: Conceptual block diagram of the blind source separation problem.

1.3 SYSTEM MODEL

Fig. 1.2 depicts the conceptual block diagram of the BSS problem. The general N_s-input, N_x-output mixing system can be formulated as

X(n) = H(S(n)) + V(n),  (1.1)

where S(n) denotes the (N_s × 1) vector of source signals S_1(n), ..., S_{N_s}(n), X(n) denotes the (N_x × 1) vector of mixture signals X_1(n), ..., X_{N_x}(n), V(n) is the (N_x × 1) vector of noise signals V_1(n), ..., V_{N_x}(n), and n is the discrete time index. Here, H is the (N_x × N_s) multichannel (MIMO) mixing system. Accordingly, the N_x-input, N_y-output demixing system can be formulated as

Y(n) = W(X(n)),  (1.2)

where Y(n) denotes the (N_y × 1) vector of output signals Y_1(n), ..., Y_{N_y}(n), and W is the (N_y × N_x) multichannel demixing system. In the following, each component of the models (1.1) and (1.2) will be explained in detail along with some typical examples and assumptions.

1.3.1 Source signals

In BSS we have a set of physical sources, located at distinct unknown locations, that simultaneously emit the signals S_1(n), ..., S_{N_s}(n). These signals occupy the same frequency range and are referred to as the source signals. Moreover, it is implied that the source signals are measured at the sources. For example, in the case of speech, S_i(n) is the waveform giving the pressure change with time at the lips. In this dissertation we deal with speech signals.

Assumptions on the source signals S(n)

The following assumptions on the source signals are made throughout this dissertation. A BSS system must perform well for all speech signals; thus, from the system point of view, its inputs are random processes whose sample functions are randomly selected by the users.

(S1) Each S_m(n) has zero mean, that is, E[S_m(n)] = 0 for all n, m = 1, ..., N_s.

(S2) S_1(n), ..., S_{N_s}(n) are statistically mutually independent at each time instant n.

In order to solve the BSS problem the source signals should have some distinct characteristics, such as non-Gaussianity, nonstationarity, or nonwhiteness. By exploiting each characteristic we obtain a different BSS method. Further assumptions will be given in later sections and chapters.

1.3.2 Mixing system

The mixing system, H, has multiple sources delivering S_1(n), ..., S_{N_s}(n) at the input end and multiple sensors receiving the observed signals X_1(n), ..., X_{N_x}(n) at the output end; hence it is an (N_x × N_s) MIMO system. The sensors discussed in this dissertation are microphones. They are often designed with an omnidirectional

response. Received signals at the microphones are interchangeably called mixture signals, in the sense that each X_i(n) contains some contribution from all the source signals.

Assumptions on the mixing system H

The following assumptions on the mixing system are made throughout this dissertation.

(M1) H is a linear multichannel system.

(M2) H is stable.

(M3) H is causal.

(M4) H is convolutive (with memory).

(M5) The ratio of the number of sources to the number of mixtures is (N_s / N_x) ≤ 1.

(M6) The frequency response matrix H(f) of the mixing system has full column rank for all frequencies f.

The latter two assumptions are necessary to invert a linear mixing system with a linear demixing system.

Example

In the following an example on room acoustics is given. Consider the speech separation application (see Fig. 1.1) in which the speech (source) signals are recorded in a room with an array of microphones. Here the medium of propagation is air bounded by walls. Depending on the locations of the sources and the microphones, each source signal undergoes changes such as refraction, reflection, diffraction, and attenuation [4]. As a result of these distortions, each microphone will pick up not only an exact copy of each speech signal from the direct path (possibly with some propagation delay) but also delayed and attenuated copies from indirect paths. This phenomenon is usually referred to as reverberation, and its duration varies with the geometry of the environment. Under the assumptions of linearity (M1) and (M4), the channel between the mth source and the lth microphone may be modeled by its time-varying impulse response H_lm(n; n_0). The impulse response provides a model of all the possible paths that the speech signal experiences on its travel from the source to the microphone. Here n_0 denotes the response time of the filter to the unit impulse applied by the source at time n − n_0. The term time-varying generally implies motion of the sources and/or receivers. For stationary (fixed position) sources and microphones, which may be valid over a short observation interval, the channel between the source and the receiver can be assumed time-invariant. The impulse response of a linear time-invariant (LTI) convolutive mixing system takes the form

H(n) = Σ_{k=0}^{L_h − 1} H(k) δ(n − k).

If the mixing system is nonconvolutive, then the impulse response reduces to H(n) = H(0) δ(n), or we simply drop the time index and denote it by H. The instantaneous mixing model is usually used in anechoic environments for experimental purposes. The simulation setup section details the types of mixing systems used in the simulations.

1.3.3 Noise

We assume that the noise is additive and statistically independent of the source signals. Typical examples are thermal (sensor) noise and background noise that cannot be modeled as a point source, e.g., wind or traffic (diffuse noise).

1.3.4 Mixture signals

Consider that the mixing model (1.1) is linear, so that the mixture signals take the following form:

X(n) = H(n) ∗ S(n) + V(n),  (1.3)

where ∗ denotes convolution. (1.3) will be referred to as the convolutive mixing model. Taking the STFT of (1.3) leads to

X(f, i) = H(f) S(f, i) + V(f, i),  (1.4)

where X(f, i), S(f, i) are the STFTs of the mixture and source signals at frequency f and segment i. Here, H(f) denotes the complex-valued (N_x × N_s) frequency response matrix of the LTI mixing system H(n). Note that the convolutive mixing problem reduces to the instantaneous mixing one by moving from the time domain to the frequency domain. Other differences between these two formulations are (i) the variables in (1.3) are real-valued, whereas they are complex-valued in (1.4), and (ii) the amplitude distributions of the signals are different in the two domains. On the other hand, one particular case of (1.3), obtained when the mixing system is nonconvolutive, is the linear instantaneous model

X(n) = H S(n) + V(n).  (1.5)

Since the Fourier transform preserves linear relations, (1.5) can be written in the frequency domain as

X(f, i) = H S(f, i) + V(f, i).  (1.6)

1.3.5 Demixing system

The demixing system, W, has a set of sensors delivering the mixture signals X(n) at the input end and multiple outputs delivering the output signals Y(n) at the output end. W is an (N_y × N_x) MIMO system whose parameters need to be adjusted according to some criterion to achieve separation. The type of W, such as linear/nonlinear or convolutive/nonconvolutive, depends directly on the type of H. Assuming a linear, convolutive demixing model, the choice of its structure, i.e., FIR (tapped-delay-line), lattice, direct-form IIR, etc., is also an important issue.
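The convolutive model (1.3) is easy to exercise numerically. The following is a minimal NumPy sketch, not taken from the dissertation: the 2×2 system, the 3-tap random channels, and the noise level are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

Ns, Nx, Lh, T = 2, 2, 3, 1000            # sources, sensors, channel taps, samples
S = rng.laplace(size=(Ns, T))            # Laplacian sources (a common speech model)
H = rng.standard_normal((Nx, Ns, Lh))    # random FIR mixing channels (illustrative)
V = 0.01 * rng.standard_normal((Nx, T))  # weak additive sensor noise

# Eq. (1.3): X_l(n) = sum_m (H_lm * S_m)(n) + V_l(n)
X = np.zeros((Nx, T))
for l in range(Nx):
    for m in range(Ns):
        X[l] += np.convolve(H[l, m], S[m])[:T]
X += V
```

Each mixture channel X_l contains a filtered contribution from every source, which is exactly what makes the separation problem nontrivial.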

1.3.6 Global system

The cascade of the mixing and demixing systems is usually referred to as the global system G. It is widely used in formulating various performance measures for controlled test simulations (see the section on performance measures). Assuming the linear instantaneous model (1.5), the global system G is an (N_y × N_s) matrix

G = W H.  (1.7)

Under the linear convolutive model (1.3) the global system G(n) is the MIMO filter

G(n) = W(n) ∗ H(n).  (1.8)

1.4 BLIND SOURCE SEPARATION (BSS)

Blind source separation (BSS) is an example of an inverse problem, in the sense that it identifies the inverse of the mixing system, referred to as the demixing system. Moreover, it falls in the realm of unsupervised learning, owing to the fact that identification of the demixing system has to be performed without access to a reference signal or the mixing system. To get around this difficulty there is a need for some strong a priori information on the signals of interest. BSS algorithms incorporate this prior information into the design criteria so as to estimate the demixing system. The prior information has to be statistical to be effective. One of the earliest methods, applied in communications, is the constant modulus algorithm (CMA). This method achieves separation and equalization in a blind fashion by minimizing the deviation of the separated output magnitudes from a fixed gain. The underlying assumption is that the source signals, such as PSK and FSK, have constant magnitudes with non-Gaussian (sub-Gaussian) pdfs.
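As a toy check of (1.7), assuming the linear instantaneous model with a hypothetical 2×2 mixing matrix: a perfect demixer W = H^{-1} yields an identity global system, while a demixer that merely rescales and permutes the outputs yields a global matrix with a single nonzero entry per row and column, which still counts as separation.

```python
import numpy as np

H = np.array([[1.0, 0.5],
              [0.3, 1.0]])               # hypothetical 2x2 instantaneous mixer

# a perfect demixer gives an identity global system, Eq. (1.7)
G_ideal = np.linalg.inv(H) @ H

# a demixer that permutes and rescales the outputs still separates
Pi = np.array([[0.0, 1.0],
               [1.0, 0.0]])              # permutation matrix
Lam = np.diag([2.0, -0.5])               # arbitrary nonzero scaling
G_sep = (Pi @ Lam @ np.linalg.inv(H)) @ H

assert np.allclose(G_ideal, np.eye(2))
assert np.allclose(G_sep, Pi @ Lam)      # one nonzero entry per row and column
```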

Another class of algorithms, collectively termed Independent Component Analysis (ICA), uses statistical independence and non-Gaussian amplitude distributions of the source signals as the prior information to solve the BSS problem. A wide range of ICA algorithms are based on higher order statistics and information theory. ICA can be used for speech signals because the amplitude distribution of speech is super-Gaussian, e.g., Laplacian, for a wide range of segment sizes. Other BSS approaches exploit the non-stationarity and non-whiteness of the source signals as the prior information. These assumptions allow for Gaussian sources, and hence second order statistics (SOS) (see Sect. 1.5.3) are sufficient for separation. Speech signals are considered to be stationary over 30-40 ms long segments. Over these stationary segments, they are temporally correlated. For durations greater than 40 ms, speech signals are non-stationary, in the sense that the temporal correlations, hence the variances, vary from one segment to another.

1.4.1 Independent Component Analysis

ICA is a statistical method that is widely used to solve the BSS problem. The main assumption behind ICA is (S2). The aim of ICA is to estimate a demixing system W, using the N_x available mixture signals X(n), such that the output signals Y(n) are statistically independent. Implicit in the ICA formulation is that each signal S_i(n) is a strict-sense stationary (SSS) random process with a pdf f_{S_i} (note that, if S_i(n) is SSS, then f_{S_i} is independent of n). Similarly, each random vector S(n) is described by the joint pdf f_S. Statistical dependencies between multiple random processes with arbitrary pdfs are quantified by mutual information. This information theoretic measure is the basis of ICA. The mutual information I(Y_1; ...; Y_{N_y}) between

N_y random variables with joint pdf f_Y is defined as [5]

I(Y_1; ...; Y_{N_y}) = I(Y) = ∫ f_Y(u) log [ f_Y(u) / Π_i f_{Y_i}(u_i) ] du.  (1.9)

It becomes zero when the underlying random processes are statistically independent. Mutual information is a particular case of the Kullback-Leibler divergence (KLD). The KLD between two probability distributions f_{S_k} and f_{S_l} is defined as [5]

D_KL(S_k; S_l) = ∫ f_{S_k}(x) log [ f_{S_k}(x) / f_{S_l}(x) ] dx.  (1.10)

D_KL(S_k; S_l) is a nonnegative measure and becomes zero if and only if f_{S_k} = f_{S_l}. From the definitions it is clear that I(Y) = D_KL(f_Y; Π_i f_{Y_i}). I(Y) is nonnegative and becomes zero if and only if the Y_i(n) are statistically independent (the joint pdf factorizes into the product of the marginal pdfs). As in the KLD, the mutual information depends only on the pdf of the random vector, and hence mutual information is sometimes written as I(f_Y) rather than I(Y). In practice, the mutual information is difficult to utilize; therefore, various statistical criteria have been proposed based on its approximations. Contrast functions, as we will see in Sect. 1.5, cast these statistical criteria into optimization problems, the elements of W being the optimization parameters. The contrast function should possess desirable properties, such as a global optimum point that defines a W yielding Y(n) that are statistically independent. Also, any contrast function should be invariant to several factors, such as scaling and permutation, as they are inherent indeterminacies in any BSS algorithm. This will be discussed next.

1.4.2 Indeterminacies

Ideally, we want to achieve Y(n) = S(n). However, such a case requires perfect separation and dereverberation, which is not possible without precise knowledge of the sources or the mixing. In BSS, sources can be estimated only up to several indeterminacies

because neither S(n) nor H is accessible. Depending on the type of H (and W) we have different indeterminacies. For linear models, these are (i) an arbitrary scaling (filtering) of each source and (ii) a permutation of the source indices [6]. Due to the multiplicative form of the linear instantaneous mixing model, (1.5) can be rewritten as X(n) = (H Λ^{-1} Π^{-1})(Π Λ S(n)) + V(n), where Λ is a diagonal matrix with nonzero elements and Π is a permutation matrix obtained by interchanging the columns of the identity matrix. Because of these ambiguities, the goal of BSS is not to recover identical copies of the source signals at the outputs; rather, it is to recover the source signals without any interference from other sources. This is equivalent to finding W such that the global system (1.7) satisfies

G = Π Λ.  (1.11)

If Λ and Π are the only two indeterminacies in finding W, then the solution to the BSS problem is said to be unique. Equivalently, the matrices W and H^{-1} are said to be essentially equal [6]. Under the convolutive mixing model (1.3) the permutation indeterminacy stays the same; however, the scaling indeterminacy Λ becomes a filtering indeterminacy Λ(n), and the goal becomes to estimate a W(n) that satisfies

G(n) = Π Λ(n).  (1.12)

Note that, if the demixing estimation is performed independently for each frequency, then both the scaling and permutation factors become frequency dependent.

1.4.3 Separability and uniqueness

This section presents an overview of the separability and uniqueness issues of the demixing system W in BSS. Separability means recovery of the sources at the

outputs by means of W. By uniqueness we mean that the W achieving separation is unique up to the aforementioned ambiguities (see Sect. 1.4.2). The question to be addressed is under what assumptions statistical independence of Y(n) guarantees a W satisfying (1.11) or (1.12). For the linear instantaneous mixing model (1.5), theoretical results are provided in [7], [8], [9]. In [10] it was shown that the problem has no solution for Gaussian and temporally iid sources. For temporally iid sequences, temporal correlations at nonzero lags vanish, hence only the statistical properties at zero lag may be used. For Gaussian distributed sources this reduces to the use of auto- and cross-correlations at zero lag, as they are the sole parameters that determine their multivariate pdf. However, as we will see in Sect. 1.5, by putting constraints on the cross-correlations and variances of the output signals, W can only be determined up to an orthogonal transformation. In order to uniquely identify W one can assume that the sources are

- possibly temporally iid but non-Gaussian, and use contrasts involving higher order statistics, such as mutual information, negentropy, entropy, or cumulants, to find it [7];

- possibly Gaussian but not temporally iid (i.e., having temporal structure), and use contrasts involving second order statistics, such as cross-correlation matrices at multiple lags [8] or zero-lag cross-correlation matrices at multiple times [11], [9], to find it.

For convolutive mixtures, theoretical results are provided in [12], [13], [14], [15].
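The scaling/permutation ambiguity discussed above can be verified numerically. A small sketch (the 2×2 matrices and Laplacian sources are arbitrary illustrative choices) shows that the observations cannot distinguish the original pair (H, S) from a rescaled and permuted pair:

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((2, 2))        # arbitrary invertible mixing matrix
S = rng.laplace(size=(2, 500))         # two independent Laplacian sources

Pi = np.array([[0.0, 1.0],
               [1.0, 0.0]])            # permutation
Lam = np.diag([3.0, 0.25])             # nonzero scaling

# the same mixtures arise from rescaled/permuted sources and a modified mixer
H2 = H @ np.linalg.inv(Lam) @ np.linalg.inv(Pi)
S2 = Pi @ Lam @ S
assert np.allclose(H @ S, H2 @ S2)     # observations are identical
```

Since X(n) is all that is observed, no blind criterion can prefer (H, S) over (H2, S2); this is exactly why separation is only defined up to Π and Λ.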

Effects of noise

In the noiseless and overdetermined case, there is no advantage in using the additional N_x − N_s mixtures, and any set of N_s out of the N_x mixtures (provided that the associated N_s × N_s mixing matrix is invertible) can be used for separation [16]. This effectively reduces the dimension of the demixing parameter space, turning the problem into an even-determined one. In the noisy case, depending on the noise level on each channel, unreliable channels can degrade the separation performance and must be excluded from the combination [17]. Under the overdetermined model, if the noise is spatially iid with equal variance, then classical subspace-based methods may be employed to find the whitening matrix, one of the two matrices constituting the demixing matrix (see Sect. 1.5). The other part of the demixing matrix is usually found using higher order statistics, which allows (in theory) unbiased estimation of W so that WH = ΠΛ. Even so, the output signals cannot restore the sources, because Y(n) = ΠΛS(n) + WV(n). The additive term implies that noise may be amplified.

1.5 CONTRAST FUNCTIONS

In supervised adaptive filtering, a reference signal is typically employed to determine the optimal demixing parameters. In unsupervised adaptive filtering, e.g., BSS, such reference signals are not available. Therefore, there is a need to construct a (contrast) function of the demixing system parameters W that does not utilize S(n) or H. Moreover, at the global maxima of this function, WH = ΠΛ needs to be satisfied. A formal definition of a contrast functional is given in [18] in the SISO blind deconvolution context; [7] extended it to ICA under a linear instantaneous mixing model. A contrast ψ is a function mapping the pdf f_S of a multidimensional random

process S(n) to a real scalar, satisfying the following properties [7], [19]:

(C1) Invariance to scaling: ψ(S(n)) = ψ(ΛS(n)).

(C2) Invariance to permutation: ψ(S(n)) = ψ(ΠS(n)).

(C3) If S(n) has independent components and G is an invertible matrix, then ψ(GS(n)) ≤ ψ(S(n)).

(C4) Equality holds if and only if G takes the form in (1.11).

Note that, as we will see in the following examples, a contrast ψ is a function of the pdf f_Y of the output signals Y(n). This implies that ψ depends on W. In the following, we briefly review the contrast functions used in ICA.

Mutual information

The negative of the mutual information (1.9), that is, ψ(Y(n)) = −I(Y_1; ...; Y_{N_y}), is a contrast function [7]. This has been recognized as the canonical contrast for ICA. The problem with the mutual information is that its estimation requires the estimation of joint and marginal pdfs. The density estimators could be parametric [20] or nonparametric, the latter usually based on histograms [21] or kernels [22].

Likelihood

When the source distribution f_S is known, the maximum likelihood principle leads to minimizing the KL divergence between f_Y and f_S. In [23] it was shown that ψ(Y(n)) = −D_KL(Y(n); S(n)) is a contrast function.
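Properties (C1) and (C2) can be illustrated with a simple cumulant-based functional. The sketch below (an illustrative choice, not one of the specific contrasts cited above) uses the sum of squared normalized kurtoses of zero-mean data, which is exactly invariant to componentwise scaling and to permutation:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.laplace(size=(2, 100000))      # two zero-mean super-Gaussian components

def contrast(Y):
    """Sum of squared normalized kurtoses of the rows of Y."""
    m2 = np.mean(Y**2, axis=1)
    m4 = np.mean(Y**4, axis=1)
    return np.sum((m4 / m2**2 - 3.0) ** 2)

psi = contrast(Y)
psi_scaled = contrast(np.diag([5.0, -0.2]) @ Y)   # (C1): rescale each component
psi_perm = contrast(Y[::-1])                      # (C2): swap the components
assert np.isclose(psi_scaled, psi)
assert np.isclose(psi, psi_perm)
```

The normalization m4/m2^2 cancels any componentwise gain, and summing over components makes the value blind to their ordering, which is precisely what (C1) and (C2) demand.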

Negentropy

By introducing a reference random vector Y_G having a multivariate Gaussian distribution with the same covariance matrix C_Y as Y, the negentropy N(Y) of the random vector Y can be written as [7]

N(Y) = H(Y_G) − H(Y),  (1.13)

where H(Y) denotes the differential entropy of Y. Negentropy N(Y) is nonnegative and invariant under invertible linear transformations; that is, for any nonsingular matrix A one can readily verify that N(Y) = N(AY) holds. In other words, N(Y) is not a discriminator of W, and hence it cannot be used as a contrast. However, from (1.9) and (1.13) the mutual information may be written as [7]

I(Y) = N(Y) − Σ_i N(Y_i) + (1/2) log [ det diag C_Y / det C_Y ].  (1.14)

The middle term in (1.14) is the sum of the marginal negentropies of the Y_i. As opposed to N(Y), N(Y_i) varies with W, and hence it can be used as a discriminator. The last term in (1.14) contains second-order statistics and vanishes when C_Y is diagonal. This is usually achieved by spatial whitening (sphering) of the mixture signals X(n) (see Sect. 1.5). Under these conditions, the only term that can be used to minimize the mutual information is the middle one, and thus ψ(Y(n)) = Σ_i N(Y_i) can be used as a contrast function. There are various methods utilizing the sum of marginal negentropies as a contrast to be maximized, with the main difference lying in its approximations. As an example, the FastICA algorithm uses several nonlinear functions to approximate the negentropy [24]. These nonlinear functions imply the use of higher order statistics.
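Equation (1.13) can be evaluated numerically for a super-Gaussian pdf. A sketch (grid limits and step are arbitrary choices) that computes the negentropy of a unit-variance Laplacian by discretizing the two differential entropies:

```python
import numpy as np

x = np.linspace(-30, 30, 200001)
dx = x[1] - x[0]

b = 1 / np.sqrt(2)                                 # unit-variance Laplacian scale
f_lap = np.exp(-np.abs(x) / b) / (2 * b)
f_gau = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)     # Gaussian with the same variance

def dentropy(f):
    """Discretized differential entropy -int f log f."""
    mask = f > 0
    return -np.sum(f[mask] * np.log(f[mask])) * dx

neg = dentropy(f_gau) - dentropy(f_lap)            # Eq. (1.13)
assert neg > 0
```

The positive value reflects that, among all pdfs with a given covariance, the Gaussian has maximal entropy, so any non-Gaussian (here super-Gaussian) pdf has strictly positive negentropy.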

Spatial Whitening

Decorrelating the output signals and normalizing their variances to unity is usually referred to as prewhitening or sphering. For linear instantaneous mixtures, prewhitening amounts to a linear transformation of the mixture signals by a whitening matrix Q, which is usually taken as any square root of the inverse of the spatial mixture covariance matrix. The spatial covariance matrix of the mixture signals is defined as

C_x = E[X(n) X^H(n)],  (1.15)

which is an (N_x × N_x) positive definite Hermitian symmetric matrix. The spatial covariance matrix of the output signals is defined similarly and denoted by C_y. In simple terms, the idea is to satisfy C_y = I. Let C_x = E_x D_x E_x^H be the EVD of C_x; then Q = C_x^{-1/2} = D_x^{-1/2} E_x^H is a whitening matrix. In other words, Ỹ(n) = C_x^{-1/2} X(n) has identity covariance matrix under the assumption that the source covariance matrix is also the identity, i.e., C_s = I. Note that the whitening matrix Q is not unique, in the sense that any orthogonal matrix multiplying it from the left gives another whitening matrix. Whitening causes the last term in (1.17) and (1.18) to vanish, confining the demixing matrix to the set of orthogonal matrices (unitary matrices in the complex-valued case). When X(n) is noisy, i.e., V(n) ≠ 0 in (1.5), it can be shown that the additive noise V(n) introduces a bias in the estimated whitening matrix Q [25]. If the noise covariance matrix C_v = E[V(n) V^H(n)] is known or can be estimated, then bias removal may be employed [26]. For the linear convolutive mixing case, prewhitening is achieved through a linear multichannel prewhitening filter Q(n) [27].
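A minimal numerical sketch of the whitening step for real-valued data (the 2×2 mixing matrix and Laplacian sources are hypothetical): form the sample covariance as in (1.15), take its EVD, build Q = D^{-1/2} E^T, and verify that the whitened data have identity covariance.

```python
import numpy as np

rng = np.random.default_rng(3)
H = np.array([[1.0, 0.8],
              [0.4, 1.0]])              # hypothetical mixing matrix
S = rng.laplace(size=(2, 200000))       # independent sources
X = H @ S                               # correlated mixtures

Cx = X @ X.T / X.shape[1]               # sample spatial covariance, Eq. (1.15)
d, E = np.linalg.eigh(Cx)               # EVD: Cx = E diag(d) E^T
Q = np.diag(d ** -0.5) @ E.T            # whitening matrix Q = D^{-1/2} E^T

Z = Q @ X                               # whitened (sphered) mixtures
Cz = Z @ Z.T / Z.shape[1]
assert np.allclose(Cz, np.eye(2), atol=1e-8)
```

Note that Q only decorrelates; the remaining rotation (the orthogonal factor mentioned above) must be found with a contrast such as the higher-order ones discussed next.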

1.5.1 Higher order statistics

The moments and cumulants of integer orders greater than two are usually referred to as higher order statistics. Fourier transforms of higher order cumulants give the polyspectra. For example, the Fourier transform of the third order cumulant sequence is called the bispectrum or bispectral density. In [7] the negentropy N(Y_i) is approximated using a finite number of cumulants. The underlying assumption is that f_{Y_i} is given by a reference Gaussian distribution multiplied by a fourth order polynomial (e.g., an Edgeworth expansion). For zero mean and unit variance Y_i it yields

N(Y_i) ≈ (1/12) K_3^2(Y_i) + (1/48) K_4^2(Y_i) + (7/48) K_3^4(Y_i) − (1/8) K_3^2(Y_i) K_4(Y_i),  (1.16)

where K_3(Y_i) and K_4(Y_i) denote the skewness and kurtosis of Y_i, respectively. Substituting (1.16) for N(Y_i) in (1.14), I(Y) can be approximated as

I(Y) ≈ N(Y) − (1/48) Σ_i { 4 K_3^2(Y_i) + K_4^2(Y_i) + 7 K_3^4(Y_i) − 6 K_3^2(Y_i) K_4(Y_i) } + (1/2) log [ det diag C_Y / det C_Y ].  (1.17)

If f_{Y_i} is symmetric around its mean, then K_3(Y_i) = 0 and (1.17) reduces to

I(Y) ≈ N(Y) − (1/48) Σ_i K_4^2(Y_i) + (1/2) log [ det diag C_Y / det C_Y ].  (1.18)

Even though derived as approximations to the mutual information through polynomial expansions of pdfs, higher order cumulants yield contrast functions. In particular, the following functionals utilizing fourth-order cumulants (kurtosis), Σ_i K_4^2(Y_i), Σ_i K_4(Y_i), Σ_i K_4^2(Y_i)/K_2^4(Y_i), and Σ_i K_4(Y_i)/K_2^2(Y_i), are contrasts, implying that they are free from spurious maxima. As a result, the associated iterative algorithms are globally convergent to a valid separation solution.
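The role of the kurtosis K_4 as a non-Gaussianity measure can be checked by simulation (the sample size is an arbitrary choice): a Laplacian has theoretical excess kurtosis 3, while a Gaussian has 0.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

def kurt(y):
    """Normalized (excess) kurtosis K4 of a zero-mean sample."""
    return np.mean(y**4) / np.mean(y**2) ** 2 - 3.0

k_lap = kurt(rng.laplace(size=n))        # Laplacian: K4 = 3 in theory
k_gau = kurt(rng.standard_normal(n))     # Gaussian: K4 = 0 in theory
assert abs(k_lap - 3.0) < 0.3
assert abs(k_gau) < 0.1
```

The wider tolerance on the Laplacian estimate hints at the sample-size issue raised below: fourth-order sample statistics of heavy-tailed data converge slowly.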

While the first two require the whiteness constraint, the latter two do not employ any constraint as they are already normalized [7], [28], [29], [3]. Another criterion based on fourth-order cumulants, ψ(Y(n)) = Σ_{ijkl, i≠j} κ_Y^2(i, j, k, l) under the whiteness constraint, is the JADE contrast proposed in [31]. Besides fourth-order cumulant based contrasts, the functional ψ(Y(n)) = Σ_i K_m^2(Y_i) under the whiteness constraint has been shown to be a contrast⁴ for any m ≥ 3 in [7]. In practice, cumulants need to be estimated from the received data. This is usually done by sample averaging; however, according to [32], the sample size needed to estimate the mth order statistics of a stochastic process, subject to prescribed values of estimation bias and variance, increases exponentially with the order m [33]. This justifies the use of the fourth order cumulants among the HOS as a contrast function.

1.5.2 Fractional order statistics

The absolute moments of noninteger-valued orders of probability density functions are referred to as fractional order statistics (FOS). Definitions of cumulants and moments of integer-valued orders can be generalized to noninteger values of the order m by means of the techniques of fractional calculus [34], [35]. We should emphasize that we are interested in moments of fractional orders with values less than four. The motivation behind the use of low FOS in a contrast function is that, for a given sample size, their sample estimators have lower variance compared to that of the HOS. Fractional moments have no obvious pictorial interpretation in terms of the pdf, whereas

⁴ The use of odd-valued m is justified if the underlying pdfs are skew.

mean is related to the center for unimodal pdfs, variance is an indicator of spread, skewness is related to symmetry, kurtosis is a measure of peakedness, etc. The mth order (0 < m < ∞) absolute moment of a pdf f_S associated with a RV S(n) is defined by [35]

E|S|^m = ∫ |s|^m f_S(s) ds,  (1.19)

provided that the integral exists. Methods utilizing FOS for BSS can be found in [36], [37], [38].

1.5.3 Second order statistics

Second order statistics (SOS) include second order cumulants, i.e., the cross-correlation and auto-correlation functions in the time domain. Their Fourier transforms give the power spectral density and cross-spectral density functions in the frequency domain. SOS based approaches have the advantage that they do not require any a priori information on the source pdfs. The major limitation of the methods utilizing SOS is that separation is possible only when the source signals are temporally colored and/or nonstationary. For instance, when the source signals have no temporal characteristics that can be exploited, then separation is not possible [7], [9]. Speech signals are considered to be non-stationary for durations greater than 4 ms [39]. Another attribute of speech signals is that they are temporally correlated (colored). These two properties are often exploited to derive contrasts based on SOS in the BSS of speech signals. Methods utilizing SOS for the linear instantaneous mixing case are introduced in [4], [8], [9]. Approaches for the linear convolutive mixing case based on SOS generally exploit the non-stationarity property of the source signals either in the frequency domain [41], [42], [43], [44], [45], or in the time domain [46].
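The lower estimation variance of the low-order fractional moments, which motivates their use here, is easy to check empirically. The following sketch (our own illustration, not code from the cited works) compares the frame-to-frame spread of the sample estimators of E|S|^(3/2) and of the fourth moment on short Laplacian frames:

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, frame_len = 2000, 400                    # many short data frames
frames = rng.laplace(size=(n_frames, frame_len))

m_frac = np.mean(np.abs(frames) ** 1.5, axis=1)    # sample E|S|^(3/2) per frame
m_four = np.mean(frames ** 4, axis=1)              # sample E{S^4} per frame

# relative spread (std/mean) of each estimator across frames
rel_frac = m_frac.std() / m_frac.mean()
rel_four = m_four.std() / m_four.mean()
# rel_frac comes out several times smaller than rel_four
```

The fourth moment is dominated by occasional large samples of the heavy-tailed Laplacian data, while the order-3/2 moment is far less sensitive to them; this is the finite-sample effect exploited in Chapter 2.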

1.6 ITERATIVE SEARCH ALGORITHMS

Since a given contrast function ψ has no closed form solution, its stationary points are determined by an iterative algorithm:

W^(i+1) = W^(i) + µ ∆W |_{W = W^(i)},  (1.20)

where W^(i) is an estimate at iteration i = 0, 1, 2, …, µ denotes the step size, which is usually chosen as a small positive constant, and ∆W is the update used to improve the estimate W^(i+1) for the next iteration. There are a variety of algorithms, differing in the way the update ∆W is constructed. Typically, ∆W is a function of the gradients of the contrast function ψ. However, the choice of the update term introduces a trade-off between convergence speed, in terms of the required number of iterations, and the computational complexity per iteration. On one hand, methods based on the first-order gradient, such as the steepest ascent method, have low computational complexity at the expense of a slow convergence rate. On the other hand, methods utilizing the second-order gradient, such as Newton's method, exhibit faster convergence at increased computational complexity. There exist other methods exploiting the structure of the ICA model, such as the natural gradient [47], the equivariant algorithm [48], and the fixed point algorithm [24]. In particular, [48] proposes to use multiplicative updates as opposed to the additive one in (1.20):

W^(i+1) = (I + ε) W^(i),  (1.21)

where the gradient ε of the contrast function for this multiplicative scheme is called the relative gradient. [47] approaches the problem by considering the underlying space of parameters W as Riemannian. They show that the steepest ascent direction in the Riemannian space of W is not ∂J/∂W as in the Euclidean space but rather

(∂J/∂W) W^T W,  (1.22)

and call this the natural gradient. [24] derives his fixed point algorithm based on an approximation of Newton's method and names it FastICA. Note that the FastICA algorithm operates on batch data. However, both the natural and relative gradient algorithms may be employed in on-line and off-line (batch) modes. All these latter algorithms have superior convergence rates compared to standard steepest ascent adaptation. When the underlying mixing system is time-invariant, batch methods are preferable because of their convergence speed, due to more accurate gradient estimates than sample-based methods.

1.6.1 Deflation scheme

In the deflation scheme, the idea is to extract one output signal, then remove it from the mixture, recursively, that is, one after another [28]. In particular, at the first stage, the first row w_1 of W is estimated using an iterative algorithm as in (1.20)⁵. After convergence, one output signal is extracted and its contribution is removed from the mixtures, leading to an N_x × (N_s − 1) mixing system for the second stage. Typically, Gram-Schmidt orthogonalization is performed at each stage to remove the projections of the previously estimated rows from the current one [49]. This orthogonality constraint is necessary to prevent the algorithm from converging to previously estimated demixing system parameters. It should be emphasized here that the second term in (1.14) allows for the deflation scheme of separation. In other words, we maximize Σ_i N(Y_i) by maximizing its summands N(Y_i) at each stage through w_i. Advantages of deflation type BSS algorithms are the ability to estimate a subset of the source signals and reduced computational load. The major drawback, however, is the propagation of error to the later stages.

⁵ The parameter of the multivariate contrast is a vector instead of a matrix in the deflation scheme.
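The Gram-Schmidt step used at each deflation stage can be sketched as follows (a generic illustration of ours, assuming prewhitened mixtures so that the rows of W may be kept orthonormal):

```python
import numpy as np

def deflate(w, W_prev):
    """Remove from the current row iterate w its projections onto the
    previously extracted rows (the rows of W_prev), then renormalize."""
    for w_k in W_prev:
        w = w - (w @ w_k) * w_k          # subtract the projection onto w_k
    return w / np.linalg.norm(w)

W_prev = np.array([[1.0, 0.0, 0.0]])     # one row already extracted
w = np.array([0.8, 0.5, 0.3])            # current iterate from (1.20)
w_new = deflate(w, W_prev)               # orthogonal to the extracted row
```

This projection removal is what prevents the iteration at a later stage from reconverging to an already extracted component.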

1.6.2 Symmetric scheme

In the symmetric scheme, all the output signals are extracted in a single stage. All of the demixing system parameters W are optimized simultaneously as in (1.20). Joint diagonalization is an example of the symmetric scheme.

1.7 JOINT DIAGONALIZATION

The joint diagonalization (JD) problem may be stated as finding a matrix W that operates, in the congruence sense, on a set C of D symmetric matrices C_i, referred to as target matrices, so that D_i = W C_i W^* are diagonal for i = 1, …, D. It is well known that any two symmetric matrices can be exactly jointly diagonalized under some mild conditions using the generalized eigenvalue decomposition [5]. In the BSS context, the target matrices C_i admit the following structure:

C_i = H Λ_i H^*,  i = 1, …, D,  (1.23)

where H is the nonsingular mixing matrix, and the Λ_i are diagonal matrices for all i. Under these conditions, the demixing matrix defined by W = H^(-1), up to permutation and scaling indeterminacies, is characterized by an exact joint diagonalizer of the set C. In general, the assumptions on the source signals, in a similar way that they turn into various contrast functions, are used to construct the diagonal matrices Λ_i. Some typical examples are the correlation matrices at multiple lags [8], at multiple times [11], [9], and higher order joint cumulant matrices [31]. Usually, the target matrices in the set are estimated from the available data. Because of the estimation errors, the hypothesized structure (1.23) of the target matrices is lost and an exact joint diagonalization is no longer possible. In this case, however, it is possible to estimate a W that will approximately jointly diagonalize the estimated target matrices in the set C. It is beneficial to jointly diagonalize more than two matrices to avoid the sensitivity

to estimation errors in the target matrices [51], [52], [53]. We will elaborate on this in Chapter 4. The JD problem may be broadly categorized as (i) orthogonal joint diagonalization and (ii) non-orthogonal joint diagonalization, according to the restrictions on the form of H, and hence W.

1.7.1 Orthogonal Joint Diagonalization

In the orthogonal joint diagonalization (OJD) problem, the joint diagonalizer is restricted to be orthogonal. In a general BSS context, this is usually done by first finding a whitening matrix Q as any square root of the inverse of the spatial mixture covariance matrix, say one of the matrices, C_1, in the set C, and then transforming the remaining matrices C_i, i = 2, …, D in C into C̃_i = Q C_i Q^*. This prewhitening stage reduces the JD problem to seeking an orthogonal joint diagonalizer matrix W̃ of the transformed set {C̃_2, …, C̃_D}. The non-orthogonal demixing matrix is then found as W = W̃ Q [31], [8].

1.7.2 Non-orthogonal Joint Diagonalization

In the non-orthogonal joint diagonalization (NOJD) problem, the joint diagonalizer W is not restricted to be orthogonal. The prewhitening stage in the OJD approach attains exact joint diagonalization of C_1; however, possible estimation errors in C_1 may have a severe effect on both the transformed set and the resulting orthogonal diagonalizer. Therefore, such an approach is known to limit the attainable separation performance [54], [55]. As a consequence, many authors have proposed NOJD methods to avoid prewhitening [51], [56], [55], [57], [58].
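For D = 2 positive-definite target matrices with the structure (1.23), the exact joint diagonalizer can be read off a generalized eigenvalue decomposition, as noted above. A minimal sketch (our own construction on synthetic target matrices; the numerical values are arbitrary):

```python
import numpy as np

H = np.array([[2.0, 0.5, 0.0],
              [0.3, 1.5, 0.2],
              [0.0, 0.4, 1.8]])                     # nonsingular mixing
L1, L2 = np.diag([1.0, 2.0, 3.0]), np.diag([3.0, 1.0, 2.0])
C1, C2 = H @ L1 @ H.T, H @ L2 @ H.T                 # structure (1.23)

# Eigenvectors of C2^{-1} C1 are the columns of H^{-T} up to scale
# (eigenvalues are the ratios of the Lambda diagonals), so their
# transpose jointly diagonalizes the pair by congruence.
_, V = np.linalg.eig(np.linalg.solve(C2, C1))
W = np.real(V).T                  # pencil eigenvalues are real here
D1, D2 = W @ C1 @ W.T, W @ C2 @ W.T   # both numerically diagonal
```

With more than two, and estimated, target matrices no exact diagonalizer exists in general, which is what the approximate criteria reviewed next address.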

1.7.3 Diagonality Measures

Any JD method aims at minimizing some measure of joint deviation from diagonality. In the following, two common off-diagonality measures are reviewed.

Frobenius norm criterion

The first measure is based on the following least-squares squared Frobenius norm:

J(W) = Σ_{i=1}^{D} || W C_i W^* − diag(W C_i W^*) ||_F^2,  (1.24)

where ||·||_F^2 is the squared Frobenius norm. The minimizer W_opt of J is called the joint diagonalizer of the set C = {C_1, …, C_D}. To avoid trivial or singular minimizers, constraints such as unit determinant [58], rows of unit norm [59], unit diagonal [6], or orthogonality [31], [8] are usually employed. Other variants of (1.24) include a set of positive weights α_i, yielding a weighted least-squares criterion [55].

Log-likelihood function criterion

The second measure is suitable for positive-definite matrices C_i and can be traced back to the likelihood criterion [61], [11], [51]:

J(W) = Σ_{i=1}^{D} log( det diag(W C_i W^*) / det(W C_i W^*) ),  (1.25)

where det is the determinant operator, and diag(A) is a diagonal matrix with the same diagonal as A. It can be shown that J ≥ 0, with equality if and only if W C_i W^* is diagonal for all i. Furthermore, (1.25) is both scale and permutation invariant, that is, J(ΠΛW) = J(W). Note that (1.24) does not have this invariance property. Other variants of (1.25) include a set of positive weights α_i in the criterion [9].
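Both measures are straightforward to evaluate; the helper functions below (our own naming) compute (1.24) and (1.25) for a candidate W and a list of real symmetric target matrices:

```python
import numpy as np

def off_frobenius(W, Cs):
    """Least-squares off-diagonality criterion (1.24)."""
    total = 0.0
    for C in Cs:
        M = W @ C @ W.T
        total += np.sum((M - np.diag(np.diag(M))) ** 2)
    return total

def log_likelihood_criterion(W, Cs):
    """Criterion (1.25); each W C W^T must be positive definite."""
    total = 0.0
    for C in Cs:
        M = W @ C @ W.T
        _, logdet = np.linalg.slogdet(M)
        total += np.sum(np.log(np.diag(M))) - logdet
    return total

Cs = [np.diag([1.0, 2.0]), np.array([[2.0, 0.3], [0.3, 1.0]])]
J_f = off_frobenius(np.eye(2), Cs)              # 2 * 0.3**2 = 0.18
J_ll = log_likelihood_criterion(np.eye(2), Cs)  # positive: one matrix is not diagonal
```

Hadamard's inequality guarantees that the log-likelihood criterion is nonnegative and vanishes only when every transformed matrix is diagonal, matching the property stated for (1.25).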

1.7.4 Uniqueness Conditions

[52] quantified the uniqueness of the solution of the JD problem by introducing a parameter ρ called the modulus of uniqueness. It is defined on the (D × N_s) matrix Ψ that collects the diagonals of the Λ_i, with the diagonal of Λ_i in row i, as shown:

Ψ = [ Λ_11 Λ_12 ⋯ Λ_1N_s ; ⋮ ⋱ ⋮ ; Λ_D1 Λ_D2 ⋯ Λ_DN_s ] = [ λ_1 λ_2 ⋯ λ_N_s ],  (1.26)

where the (D × 1) vector λ_i denotes the ith column of Ψ. Collinearity between the columns may be measured by the cosine of the angle between them,

ρ_ij = |λ_i^* λ_j| / ( ||λ_i|| ||λ_j|| ),  i ≠ j = 1, …, N_s.  (1.27)

It is assumed that ρ_ij = 1 if λ_i = 0 for some i. The modulus of uniqueness for the set of diagonal matrices Λ_i, i = 1, …, D, is defined as ρ = max_{i,j} ρ_ij. The uniqueness of the solution, that is, the essential equivalence of W and H^(-1), is formulated as ρ < 1.

1.7.5 Optimization methods

In general, the solution of the JD problem is found using an iterative algorithm. Many algorithms have been proposed, with the main differences in the type of iterations used to minimize the cost function with constraints and in the parameterizations of the joint diagonalizer. For example, [58] uses a Jacobi-like algorithm to construct a constrained matrix of determinant one with equal column norms by successive multiplications of Givens rotations, hyperbolic rotations, and diagonal matrices. In [51] a computationally efficient iterative algorithm for solving the minimization of (1.25) was proposed. Since we are going to use this method in Chapter 4, it is briefly described next. The algorithm is based on the classic Jacobi approach of making successive transformations on each pair of rows of W as follows: Let w_i^T and w_j^T denote the i-th and j-th

rows of W. They are transformed as

[ w_i^T ; w_j^T ] ← T_ij [ w_i^T ; w_j^T ],  (1.28)

without changing the other rows. Here, T_ij is a 2 × 2 transformation matrix having the following closed form:

T_ij = I − ( 2 / (1 + √(1 − 4 e_ij e_ji)) ) [ 0 e_ij ; e_ji 0 ],  (1.29)

where

[ e_ij ; e_ji ] = [ f_ij 1 ; 1 f_ji ]^(-1) [ g_ij ; g_ji ],  (1.30)

with

f_ij = (1/D) Σ_{l=1}^{D} [W C_l W^*]_jj / [W C_l W^*]_ii,   g_ij = (1/D) Σ_{l=1}^{D} Re[W C_l W^*]_ij / [W C_l W^*]_ii,  (1.31)

[A]_ij denoting the (i, j)-th element of A. The key point is that the transformation T_ij in (1.28) always decreases (1.25) unless g_ij = g_ji = 0. The iterations proceed by applying the procedure to all of the N_y (N_y − 1)/2 pairs of rows until convergence is attained. Since the transformations are not orthogonal, the resulting matrix W is not orthogonal. Note that this procedure requires that all the target matrices in C be positive-definite.

1.8 APPLICATIONS

In the following we briefly review possible application areas where BSS may be beneficial.

1.8.1 Hearing aids

For hearing aid users, enhancement of hearing and understanding of the desired speech are essential. Amplification helps most hearing-impaired people to hear speech.

However, in a noisy place, hearing aids will amplify noise as well as the desired speech signal. Many schemes exist to suppress background noise and interfering sources, and to enhance the desired speech, improving the signal-to-noise ratio [62]. As an example, microphone array systems performing fixed or adaptive beamforming are still in use today [63], [3]. The drawback of beamforming is that it needs a priori information about the source positions and the microphone array geometry. In practice, however, such information is rarely available, so BSS methods may be used instead [64], [65].

1.8.2 Teleconferencing

Audio and video conferencing, collectively termed teleconferencing systems, are widely used to facilitate communication among several people located far away from one another. These systems are often used for meetings during which numerous people using a single teleconferencing device talk to each other in a room. In such situations, the sound captured by the multiple microphones of the teleconferencing device is a mixture of multiple reverberant speech signals, resulting in poor intelligibility for the remote listener. In such applications, BSS can be used to improve the sound quality [66]. In addition, this improvement would lead to better audio compression, enhancing the efficiency of the transmission.

1.8.3 Speech recognition

Speech recognition is one of the key technologies that will enable verbal communication between humans and computers. One of the shortcomings of present speech recognition technology arises when the speech is recorded at a distance from the speaker. In addition, other talkers and noise in the environment can corrupt the speech signal as it is recorded. BSS methods may be used in such scenarios as a preprocessing stage to help improve the recognition rate. The application areas include voice controlled devices used in intelligent home and office environments, humanoid robots, automobiles, speaker identifiers, and speech-to-speech translation [67].

1.9 SIMULATION SETUP AND PERFORMANCE METRICS

In this section we summarize the types of source signals and mixing channels used in the simulations of this dissertation.

1.9.1 Source Signals

In the simulations we use the following types of signals as S(n):

an iid sequence of random samples drawn from the Laplacian distribution;

speech signals from the TIMIT database [68]. The speech signals are constructed from different utterances, half from male and half from female speakers, without intervening pauses. The utterances have been recorded in a quiet environment with a close microphone, so that any reverberation is negligibly small. The speech signals are sampled at 16 kHz.

1.9.2 Mixing System

For the simulation tests the above sources are mixed using a set of different mixing situations, including instantaneous mixing matrices and convolutive mixing matrices. Under the linear instantaneous mixing model (1.5), H is chosen according to the following:

an N_x × N_s matrix with elements drawn from the zero mean, unit variance Gaussian distribution;

an N_s × N_s orthogonal matrix. In particular, when we consider the TITO model, the orthogonal H will be chosen as the Givens rotation matrix

H = [ cos θ −sin θ ; sin θ cos θ ],  (1.32)

where θ is the rotation parameter.

Under the LTI model (1.3), the elements of H(n) are selected according to the following:

synthetic mixing: (i) iid zero mean, unit variance variables drawn from the Gaussian distribution, (ii) minimum phase FIR channels;

real mixing: measured room impulse responses from the R-HINT-E database provided in [69].

In the following we give two examples of linear, convolutive 2 × 2 mixing systems. The impulse responses of the synthetic mixing filters (ii) are plotted in Fig. 1.3. Fig. 1.4 shows the magnitude response functions of the mixing channels in panels (a)-(d), and the condition number of H(f) is plotted as a function of frequency in panel (e), all in dB scale. As the second example we used the premeasured room impulse responses obtained from the R-HINT-E database provided in [69]. These measurements were conducted in a hearing aid design context at McMaster University. Details on the room configurations, measurement setup, and technique are provided in [7]. Here, we briefly describe the measurement environment. The impulse responses were measured at microphones placed in the ears of a human head and torso model (KEMAR) from different locations in a reverberant classroom (reverberation time T_60 around 13 ms). KEMAR was located in the center of the room

Figure 1.3: Impulse response functions of the 8-tap mixing channels.

with a microphone in each ear 55 above the floor. A single loudspeaker was moved to 48 different locations around KEMAR, with angles varying from 0 to 360 degrees in the clockwise direction in front of KEMAR. For each location, room impulse responses were measured and stored in a database called R-HINT-E. In the simulations involving the 2 × 2 model, we chose the position of the first speaker at 0 degrees and the other one at 45 degrees on a circle around the microphones. Both speakers are located 6 high and 6 away from the microphones (circle radius). The original sampling rate was 44.1 kHz; however, to make it consistent with the sampling rate of the source signals (speech), they were resampled to f_s = 16 kHz. Fig. 1.5 plots the four elements of H(n), each 248 samples long. Fig. 1.6 shows the magnitude response functions of the mixing channels in panels (a)-(d) in dB scale. Furthermore, the condition number

Figure 1.4: Magnitude response functions (a)-(d) of the channels associated with Fig. 1.3, condition number of H(f) in (e), and performance index Index(H(f)) in (f).

of H(f) is plotted as a function of frequency in dB scale in panel (e). The condition number of H(f) takes values around 4 dB for all f.

1.9.3 Mixture signals

The mixture signals X(n) are generated according to (1.3).

1.9.4 Noise

The noise V(n) is generated as iid zero mean Gaussian variables with variance σ². The variance σ² is chosen according to the SNR level.
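The variance selection from a target SNR can be made concrete; the sketch below is our own helper (hypothetical names), shown for the instantaneous model:

```python
import numpy as np

def add_noise(X, snr_db, rng):
    """Add iid zero-mean Gaussian noise V(n) whose variance is chosen so
    that the mixtures X(n) are observed at the prescribed SNR (in dB)."""
    p_signal = np.mean(X ** 2)
    sigma2 = p_signal / 10.0 ** (snr_db / 10.0)   # noise variance for target SNR
    return X + rng.normal(scale=np.sqrt(sigma2), size=X.shape)

rng = np.random.default_rng(6)
S = rng.laplace(size=(2, 100000))                 # Laplacian sources
H = np.array([[1.0, 0.5], [0.4, 1.0]])            # instantaneous mixing (1.5)
X_noisy = add_noise(H @ S, snr_db=20, rng=rng)
```

Measuring the empirical ratio of signal power to added-noise power recovers the requested SNR up to sampling fluctuations.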

Figure 1.5: Impulse response functions of the premeasured mixing channels [69].

1.9.5 Performance measures

In BSS, the quality of separation is measured using various metrics, some of them being application specific. [71], [72] provide detailed discussions on the evaluation of BSS methods, the latter one on audio applications. One of the most widely used measures in the instantaneous mixing case is the index Index(G), which measures the cross-talk or interchannel interference [73]:

Index(G) ≜ Σ_i [ Σ_j |G_ij| / max_k |G_ik| − 1 ] + Σ_j [ Σ_i |G_ij| / max_k |G_kj| − 1 ].  (1.33)
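A direct implementation of (1.33) is short (our own code; the example matrices here are ours, not the ones discussed below):

```python
import numpy as np

def perf_index(G):
    """Cross-talk index (1.33); zero iff G is a scaled permutation matrix."""
    A = np.abs(G)
    rows = np.sum(A / A.max(axis=1, keepdims=True), axis=1) - 1.0
    cols = np.sum(A / A.max(axis=0, keepdims=True), axis=0) - 1.0
    return rows.sum() + cols.sum()

G_perfect = np.array([[0.0, 2.0], [-1.0, 0.0]])   # perfect demixing (1.11)
G_poor = np.array([[1.0, 0.5], [0.4, 1.0]])       # residual cross-talk
# perf_index(G_perfect) is 0; perf_index(G_poor) is 1.8
```

Each row (column) is normalized by its largest entry, so every row and column of a scaled permutation contributes exactly zero, and any off-dominant leakage adds to the index.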

Figure 1.6: Magnitude response functions (a)-(d) of the premeasured mixing channels associated with Fig. 1.5, condition number of H(f) in (e), and performance index Index(H(f)) in (f).

Index(G) ≥ 0, with equality if and only if perfect demixing (1.11) is achieved. For example, matrices G_1 and G_2 of the perfect demixing form yield Index(G_1) = Index(G_2) = 0, while the imperfect examples G_3 and G_4 yield Index(G_3) = −21.3 dB and Index(G_4) = −18.2 dB. In practice, an index value around −20 dB indicates successful performance. To illustrate two unsuccessful trials

consider two further example matrices, G_5 and G_6, with Index(G_5) = −2.7 dB and Index(G_6) = −0.8 dB. Panels (f) of Figs. 1.4 and 1.6 show the index applied to the frequency domain mixing matrices, Index(H(f)), as a function of frequency. They fluctuate around 0 dB, and any demixing matrix W(f) tries to pull Index(G(f)) below those values. The second measure to be used in this dissertation, especially for the convolutive mixing scenarios, is the signal-to-interference ratio (SIR). Suppose for a moment that the signal of interest is S_i(n) and we are trying to estimate it at the jth output as Y_j(n); then the SIR at the jth output is defined as

SIR_j = Σ_n [ (G_ji ∗ S_i)(n) ]² / Σ_{k≠i} Σ_n [ (G_jk ∗ S_k)(n) ]²,  (1.34)

where ∗ denotes convolution. Averaging (1.34) over all N_y outputs, assuming that the desired signal is different at each output, we get an average output SIR. Similarly, we can measure the input SIR by substituting H for G in (1.34). The ratio of the average output SIR to the average input SIR is usually called the average SIR improvement. Considering the two example mixing systems given above, the average input SIR for the first system, depicted in Fig. 1.3, is 4.2 dB, while for the second system, given in Fig. 1.5, it is 3.4 dB.

1.10 OUTLINE OF DISSERTATION

The following is a detailed outline of the remaining chapters of this dissertation: Chapter 2 concentrates on the two-input two-output instantaneous mixing system, which is the simplest case of the multi-input multi-output problem. Detailed studies of this subset of the general problem provide some insights into the BSS problem and

thus serve as a good starting point. The objective of this chapter is to demonstrate the gains in source separation that can be obtained by using fractional lower order moments. Given an explicitly defined probabilistic model (the Laplacian distribution) for the sources, we explore the use of fractional lower order moments as a criterion for blind source separation. This method starts with prewhitening of the mixture signals and relies on moment matching by means of an orthogonal transformation. A gradient based iterative search algorithm is used to solve the ICA problem. It is also shown that such criteria enjoy basic properties that preclude the existence of nonseparating solutions. Comparison of the proposed method with normalized kurtosis on both synthetic data and speech data shows that the separation performance is in favor of our approach over a wide range of block sizes. Some extensions to the general MIMO system and to other moment orders are discussed.

Chapter 3 discusses adaptive ICA algorithms under the nonstationary instantaneous mixing scenario. In environments where the rules of source combination change rapidly, adaptive or block adaptive methods must be deployed, and the associated problems of convergence and permutation ambiguity solved. We propose using ICA on overlapping blocks (mini-batches) and resolve the permutation ambiguity based on the principle of correlation continuity. We explore the effect of different initializations, block length, overlap percentage, and the sufficiency and utility of second order statistics to maintain continuity in the resolved signals. We demonstrate results using simulated test signals and real speech recordings.

Chapter 4 examines the separation of convolutive mixtures of speech signals in the frequency domain. We investigate the sensitivity of the joint approximate diagonalization of a set of time-varying cross-spectral matrices.
We study the effect of the number of matrices in this set, and show that the estimation of the demixing system parameters is related both to several statistics of the perturbation term, occurring due to nonvanishing cross-spectra, and to the uniqueness of the joint diagonalizer, measured by the modulus of uniqueness parameter. The second part discusses cross-spectral matrix estimation in the orthogonal multitaper framework. Four different nonparametric cross-spectrum estimators that fall into this framework are compared via numerical simulations, where real speech signals, and both synthetic and real room impulse responses, are used in a two-input, two-output scenario.

Chapter 5 is the concluding chapter; suggestions for further development are included.

1.11 APPENDIX

The (joint) characteristic function ϕ_Y of a multivariate (joint) distribution f_Y is defined as [74]

ϕ_Y(ν) = ∫ f_Y(u) exp(jν^T u) du,  (1.35)

where ν is a vector of deterministic variables ν_i.⁶ It is the inverse Fourier transform of the joint pdf, and under general conditions ϕ_Y and f_Y completely determine each other. Note that ϕ_Y is real and even if and only if f_Y is symmetric around the origin. If the elements of Y(n) are statistically independent, we have

ϕ_Y(ν) = ϕ_{Y_1}(ν_1) ϕ_{Y_2}(ν_2) ⋯ ϕ_{Y_{N_y}}(ν_{N_y}),  (1.36)

where ϕ_{Y_i} is the characteristic function of the marginal pdf f_{Y_i}. Moreover, one can show that ϕ_Y(ν) is continuous at ν = 0, and hence it can be expanded in a Taylor series. Note that this is a polynomial expansion, where the exponent of each term is a positive integer. The coefficients in the expansion yield the joint moments, and hence ϕ_Y(ν) is also referred to as the moment generating function. The logarithm of the characteristic function is called the cumulant generating function, ψ_Y(ν) = log ϕ_Y(ν), because the coefficients of its Taylor series expansion about ν = 0 reveal the cumulants. For example, let 1 ≤ i_1, …, i_k ≤ N_y; then the kth order cumulant of Y(n) is defined as a k dimensional array with (i_1, …, i_k)th element

κ_Y(i_1, …, i_k) = (−j)^k ∂^k ψ_Y(ν) / (∂ν_{i_1} ⋯ ∂ν_{i_k}) |_{ν=0}.  (1.37)

If we let 1 ≤ i_1, i_2, i_3, i_4 ≤ N_y and assume that Y(n)⁷ is zero-mean, then the fourth order cumulant, also called the fourth order cross-cumulant, may be expressed in terms of

⁶ If Y is a real-valued (complex-valued) random vector, then ν is also real-valued (complex-valued) with the same size.
⁷ To simplify the presentation, we assume that Y(n) is real valued.

its joint moments of orders up to four as:

κ_Y(i_1, i_2, i_3, i_4) = E{Y_{i_1}(n) Y_{i_2}(n) Y_{i_3}(n) Y_{i_4}(n)} − E{Y_{i_1}(n) Y_{i_2}(n)} E{Y_{i_3}(n) Y_{i_4}(n)} − E{Y_{i_1}(n) Y_{i_3}(n)} E{Y_{i_2}(n) Y_{i_4}(n)} − E{Y_{i_2}(n) Y_{i_3}(n)} E{Y_{i_1}(n) Y_{i_4}(n)}.  (1.38)

Letting i_1 = i_2 = i_3 = i_4 = i and assuming zero mean Y_i(n), the fourth order cumulant is called the kurtosis of Y_i(n):

K_4(Y_i) = κ_Y(i, i, i, i) = E{Y_i^4(n)} − 3 (E{Y_i^2(n)})².  (1.39)

If the components of Y(n) are statistically independent, then all the cross-cumulants of any order k vanish. Moreover, higher order (k > 2) cumulants of a Gaussian random vector are zero. In general, cumulants may be interpreted as a set of descriptive constants of a pdf. The problem is that there are an infinite number of them; however, only a finite number of them are typically used in a contrast function. The differential entropy of a random vector Y with joint pdf f_Y is defined as follows [75]:

H(Y) = − ∫ f_Y(u) log f_Y(u) du.  (1.40)
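As a quick numerical check of (1.39) (our own sampling sketch): for a unit-variance Laplacian sample the kurtosis should come out near 3, and for a Gaussian sample near 0:

```python
import numpy as np

def kurtosis(y):
    """Fourth order cumulant (1.39) of a zero-mean sample sequence."""
    y = y - y.mean()
    return np.mean(y ** 4) - 3.0 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(3)
lap = rng.laplace(scale=1.0 / np.sqrt(2), size=1_000_000)  # unit variance
gau = rng.standard_normal(1_000_000)
# kurtosis(lap) is close to 3, kurtosis(gau) is close to 0
```

The slow convergence of the Laplacian estimate toward 3, even with a large sample, reflects the large variance of fourth order sample statistics discussed in Sect. 1.5.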

Chapter 2

Blind Separation of Laplacian Sources based on Fractional Order Moments

2.1 ABSTRACT

A new contrast function based on low fractional moments is proposed for blind source separation (BSS) of speech signals. Its study is motivated by the need to perform blind speech separation over short data frames. The new contrast function is numerically more stable; its estimates over short frames have better statistical properties than higher order measures such as kurtosis. The proposed contrast function is enabled by the Laplacian distribution of speech signals. Its theoretical and statistical properties are derived and tested using pseudo-random data as well as speech. Its performance is compared to that of kurtosis, and it is shown that this contrast function consistently outperforms the normalized kurtosis over a wide range of frame lengths chosen between 5 and 5 ms.

2.2 INTRODUCTION

Blind source separation (BSS) resolves mixtures into statistically independent signals by optimizing a contrast function such as kurtosis [76]. The extrema of the cost function correspond to an inversion of the mixing operation, yielding the source

signals. Conditions for a successful numerical operation rely on the goodness of the estimate of the cost function. For finite data sets, large deviations from theoretical values create spurious peaks, and the demixing operation fails [77], [78]. The need to work with short frames is common in real-time speech separation applications, where delays beyond 1 ms are not tolerable. The use of the fourth moment, for example, causes the kurtosis estimate to have a large variance and to be highly susceptible to an occasional large valued sample. Insufficient data similarly corrupts gradient estimates in the adaptive case [28]. Use of lower order moments would reduce the estimation variance; however, third moments are zero for all symmetric distributions, and the variance is insufficient as a discriminator. The next logical choice is the use of fractional moments [36], [37]. It is rather fortunate that the Laplace distribution [79], whose fractional moments have many salient properties, is the widely accepted distribution of speech signals [8]. In this chapter we propose a novel contrast function defined in terms of fractional moments and show that it outperforms normalized kurtosis in its discrimination of speech signals. The proposed fractional-moments contrast function is developed in Sect. 2.3. The theoretical search surfaces of the proposed contrast function are analyzed and compared to those of the kurtosis in Sect. 2.4.

2.3 CONTRAST FUNCTION

The absolute moments of the Laplace distribution are given by

ν_a(X) = E|X − η|^a = Γ(a + 1) (σ/√2)^a,  (2.1)

where η and σ are the location and standard deviation parameters, respectively. For the Gaussian distribution, the absolute moments satisfy

ν_a(X) = (1/√π) (√2 σ)^a Γ((a + 1)/2).  (2.2)
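Formula (2.1) is easy to check by simulation; the following sketch (our own code, with η = 0) compares the sample absolute moment of order a = 3/2 of Laplacian draws against the closed form:

```python
import numpy as np
from math import gamma, sqrt

a, sigma = 1.5, 2.0
theory = gamma(a + 1.0) * (sigma / sqrt(2.0)) ** a        # (2.1) with eta = 0

rng = np.random.default_rng(4)
# NumPy's Laplace "scale" b gives std sqrt(2)*b, so scale = sigma/sqrt(2)
x = rng.laplace(scale=sigma / sqrt(2.0), size=1_000_000)
empirical = np.mean(np.abs(x) ** a)
# empirical agrees with theory to a few parts per thousand
```

The same check with a = 5/2 works identically; only the Γ factor and the exponent change.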

Theorem. The absolute fractional moments at a = 3/2 and a = 5/2 of all distributions characterizing linear combinations of independent Laplace random variables satisfy

( ν_{3/2} / Γ(1 + 3/2) )^{2/3} ≥ ( ν_{5/2} / Γ(1 + 5/2) )^{2/5},  (2.3)

with equality if and only if the distribution is Laplace.

Proof. The equality in (2.3) for the Laplacian distribution is obtained by evaluating (2.1) at a = 3/2 and 5/2 and solving for σ/√2. For the Gaussian distribution, we evaluate (2.2) at a = 3/2 and divide by Γ(1 + 3/2) to get

( ν_{3/2} / Γ(1 + 3/2) )^{2/3} = ( Γ(5/4) / Γ(5/2) )^{2/3} √2 σ π^{−1/3} = 0.748 σ.  (2.4)

The analogous operation for a = 5/2 yields

( ν_{5/2} / Γ(1 + 5/2) )^{2/5} = ( Γ(7/4) / Γ(7/2) )^{2/5} √2 σ π^{−1/5} = 0.6727 σ,  (2.5)

and establishes the inequality. It has been shown in [81] (and references therein) that the non-Gaussianity of the sum of independent random variables (RVs) is monotonically non-increasing. It follows, therefore, that the fractional absolute moments of the sum of independent Laplace RVs will also satisfy the inequality of (2.3). The above theorem suggests the proposed optimization statement to be used for speech separation:

J_fm = ( ν_{3/2} / Γ(1 + 3/2) )^{2/3} − ( ν_{5/2} / Γ(1 + 5/2) )^{2/5}.  (2.6)

In the next section we analyze the search surface, both in two and three dimensional demixing parameter spaces under the TITO model, and compare it to that of the normalized kurtosis function.
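The discriminative behavior of J_fm can be illustrated on samples: it should be near zero for Laplacian data and positive for data that is closer to Gaussian, such as a normalized sum of two independent Laplacian variables (our own sketch; the helper name is ours):

```python
import numpy as np
from math import gamma

def j_fm(y):
    """Contrast (2.6) from sample absolute moments of orders 3/2 and 5/2."""
    t1 = (np.mean(np.abs(y) ** 1.5) / gamma(2.5)) ** (2.0 / 3.0)
    t2 = (np.mean(np.abs(y) ** 2.5) / gamma(3.5)) ** (2.0 / 5.0)
    return t1 - t2

rng = np.random.default_rng(5)
n = 1_000_000
s1 = rng.laplace(scale=1.0 / np.sqrt(2), size=n)   # unit-variance Laplacian
s2 = rng.laplace(scale=1.0 / np.sqrt(2), size=n)
mix = (s1 + s2) / np.sqrt(2)                       # unit variance, more Gaussian
# j_fm(s1) is near 0, while j_fm(mix) is clearly positive (about 0.03)
```

This is exactly the gap that the separation algorithm of this chapter exploits: minimizing J_fm over the orthogonal demixing parameter drives the outputs back toward the Laplacian sources.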

2.4 THE SEARCH SURFACE

Recall that the elements of S(n) = [S_1(n), S_2(n)]^T are statistically independent and identically Laplace distributed with pdf f_S, and that they are mixed by the orthogonal mixing matrix H, chosen as the Givens rotation matrix with rotation parameter \theta given in (1.32), to produce the mixture signals X(n) = [X_1(n), X_2(n)]^T by

X(n) = H S(n). \qquad (2.7)

Orthogonality of H preempts that the mixtures are decorrelated and allows us to focus on the merits of the proposed contrast to resolve the mixtures into independent components. Thus, we may set our aim to finding an orthogonal demixing matrix

W = \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix}, \qquad (2.8)

with rotation angle \alpha \in [-\pi, \pi). Clearly the global matrix (1.7) is another orthogonal matrix with rotation parameter \phi = \alpha - \theta:

G = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix}. \qquad (2.9)

Since (2.9) establishes the link between Y_i(n) and S_i(n) via Y(n) = G S(n), it is fairly easy to determine the marginal pdf^1 f_Y of the output signals Y_i(n) as a parametric family of distributions with parameter \phi,

f_Y(y) = \begin{cases} \dfrac{\cos\phi\, e^{-\sqrt{2}|y|/\cos\phi} - \sin\phi\, e^{-\sqrt{2}|y|/\sin\phi}}{\sqrt{2}\cos 2\phi}, & \phi \in \Phi_1 \\[1ex] \left( 1/2 + |y| \right) e^{-2|y|}, & \phi \in \Phi_2 \\[1ex] \left( 1/\sqrt{2} \right) e^{-\sqrt{2}|y|}, & \phi \in \Phi_3 \end{cases} \qquad (2.10)

^1 Because of the symmetry, f_{Y_i} = f_Y, i = 1, 2.
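The rotation algebra behind (2.7)–(2.9) can be sketched directly. This is an illustrative addition; it assumes H in (1.32) is the standard Givens rotation (a plausible reading, since (1.32) is not reproduced here), and the angle values are arbitrary examples. When α = θ, the global matrix is the identity and the sources are recovered exactly:

```python
import numpy as np

def mixing(theta):
    # H: standard Givens rotation (assumed form of (1.32))
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def demixing(alpha):
    # W as in (2.8)
    return np.array([[np.cos(alpha),  np.sin(alpha)],
                     [-np.sin(alpha), np.cos(alpha)]])

theta, alpha = 0.3, 1.1                     # hypothetical mixing/demixing angles
H, W = mixing(theta), demixing(alpha)
G = W @ H                                   # global matrix, Y(n) = G S(n)

rng = np.random.default_rng(2)
S = rng.laplace(scale=1 / np.sqrt(2), size=(2, 1000))  # unit-variance sources
X = H @ S                                   # mixtures (2.7)
Y = demixing(theta) @ X                     # demixing with alpha = theta
```

With these conventions the product W H is again a rotation, with parameter φ = α − θ, exactly as (2.9) states.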

where \Phi_3 consists of the integer multiples of \pi/2, \Phi_2 = \{\pm\pi/4, \pm 3\pi/4\}, and \Phi_1 is [-\pi, \pi) excluding both sets. The joint pdf f_{Y_1 Y_2} of Y(n) can be determined using the convolution theorem in probability theory [82] as

f_{Y_1 Y_2}(y_1, y_2) = \frac{1}{2} \exp\left( -\sqrt{2}\, |\sin\phi\, y_1 + \cos\phi\, y_2| - \sqrt{2}\, |\cos\phi\, y_1 - \sin\phi\, y_2| \right). \qquad (2.11)

Using (2.10) and (2.11), the mutual information I(Y_1; Y_2) between Y_1(n) and Y_2(n) (1.9), the negentropy N(Y_1) (1.13), and the cumulants and moments of the output signals can easily be found. Fig. 2.1 plots I(Y_1; Y_2) in panel (a) and N(Y_1) in panel (b) as a function of the orthogonal global matrix parameter \phi. Separation points are the ones where the mutual information becomes zero and the negentropy takes its maximum value. In particular:

At \phi = -\pi: Y_1(n) = -S_1(n), Y_2(n) = -S_2(n).
At \phi = -\pi/2: Y_1(n) = -S_2(n), Y_2(n) = S_1(n).
At \phi = 0: Y_1(n) = S_1(n), Y_2(n) = S_2(n).
At \phi = \pi/2: Y_1(n) = S_2(n), Y_2(n) = -S_1(n).

Using (2.10), the normalized kurtosis defined by

J_{kt} = \frac{\nu_4}{\nu_2^2} - 3 \qquad (2.12)

may be written as a function of the orthogonal global matrix parameter \phi as

J_{kt} = \begin{cases} \dfrac{3 \left( \cos 6\phi + 7 \cos 2\phi \right)}{8 \cos 2\phi}, & \phi \in \Phi_1 \\[1ex] 3/2, & \phi \in \Phi_2 \\[1ex] 3, & \phi \in \Phi_3. \end{cases} \qquad (2.13)

Note that, since the Laplacian sources are super-Gaussian, (2.12) is sign definite, and we do not need to take the square or absolute value of (2.12) as a cost function to be
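Using the identity cos 6φ + 7 cos 2φ = 2 cos 2φ (cos 4φ + 3), the Φ1 branch of (2.13) equals 3(cos 4φ + 3)/4, which reduces to 3/2 at odd multiples of π/4 and to 3 at multiples of π/2, so the three branches join continuously. The Monte Carlo check below is an illustrative addition (sample size and seed are arbitrary); it also hints at the large variance of the kurtosis estimator discussed at the start of the chapter:

```python
import numpy as np

def j_kt_theory(phi):
    # Phi_1 branch of (2.13); algebraically equal to 3*(cos(4*phi) + 3)/4
    return 3 * (np.cos(6 * phi) + 7 * np.cos(2 * phi)) / (8 * np.cos(2 * phi))

def j_kt_sample(y):
    # normalized kurtosis (2.12) estimated from data
    y = np.asarray(y, dtype=float)
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0

rng = np.random.default_rng(3)
S = rng.laplace(scale=1 / np.sqrt(2), size=(2, 500_000))  # unit-variance sources
phi = np.pi / 8
Y1 = np.cos(phi) * S[0] + np.sin(phi) * S[1]              # one output of Y = G S
```

Even with half a million samples, the empirical kurtosis still scatters noticeably around its theoretical value, which is the finite-sample sensitivity this chapter sets out to avoid.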

Figure 2.1: Mutual information (a), and negentropy (b) as a function of the orthogonal global matrix parameter φ.

maximized. It is well known that the maximization of (2.12) corresponds to the true demixing angles [76], [28]. To verify the same for the proposed fractional contrast function, we compute (2.6) for the Laplacian mixture distribution f_Y(y) given by (2.10). Using

g(\phi, a) = \left( \frac{\cos^{a+2}\phi - \sin^{a+2}\phi}{\cos 2\phi} \right)^{1/a}, \qquad (2.14)

the result is given as

J_{fm} = \begin{cases} \frac{1}{\sqrt{2}} \left( g(\phi, 3/2) - g(\phi, 5/2) \right), & \phi \in \Phi_1 \\[1ex] \frac{1}{2} \left( (1 + 3/4)^{2/3} - (1 + 5/4)^{2/5} \right), & \phi \in \Phi_2 \\[1ex] 0, & \phi \in \Phi_3. \end{cases} \qquad (2.15)
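The closed form (2.14)–(2.15) can be verified numerically on 0 ≤ φ < π/4, where the trigonometric terms are non-negative. This is an illustrative check (the test angles are arbitrary): the Φ1 branch vanishes at φ = 0, is positive in between, and approaches the Φ2 constant as φ → π/4, so the piecewise definition is continuous.

```python
import numpy as np

def g(phi, a):
    # (2.14); valid for 0 <= phi < pi/4, where cos(phi) and sin(phi) >= 0
    num = np.cos(phi) ** (a + 2) - np.sin(phi) ** (a + 2)
    return (num / np.cos(2 * phi)) ** (1.0 / a)

def j_fm_theory(phi):
    # Phi_1 branch of (2.15)
    return (g(phi, 1.5) - g(phi, 2.5)) / np.sqrt(2.0)

# Phi_2 constant of (2.15)
phi2_value = 0.5 * ((1 + 3 / 4) ** (2 / 3) - (1 + 5 / 4) ** (2 / 5))
```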

Figure 2.2: (a) Kurtosis, and (b) fractional order moments as a function of the orthogonal global matrix parameter φ.

We can readily verify that J_{fm} \geq 0, with equality only at the separation points, where Y_i = \pm S_j, i, j \in \{1, 2\}, meaning that minimization of J_{fm} will result in perfect separation. Note that, due to the symmetry of the Laplace pdf, \pm S_j have the same pdf and the sign ambiguity cannot be resolved. Fig. 2.2 illustrates the cost functions (2.13) in panel (a) and (2.15) in panel (b) as functions of the orthogonal global matrix parameter φ. It can be observed that the minima of J_{fm} and the maxima of J_{kt} occur at the correct demixing angles, corresponding to the separated signals \pm S_1, \pm S_2. We can also plot the 3-D surfaces of the contrast functions in the space of W_{11} and W_{12}. Figs. 2.3 and 2.4 show the surfaces of the kurtosis and the fractional order moment based contrasts, respectively. The separation points of this 2 \times 2 problem are given by the maxima points of the kurtosis surface in Fig. 2.3 and the minima

Figure 2.3: Kurtosis as the contrast function in the space of W_{11} and W_{12}.

points of the fractional contrast in Fig. 2.4. Note that the projection of the surfaces onto the unit circle, that is, where W_{11}^2 + W_{12}^2 = 1, yields the plots shown in panels (a) and (b) of Fig. 2.2. The algorithmic properties of the fractional order moments based contrast are derived in the next section.

2.5 OPTIMIZATION ON THE UNIT CIRCLE

We can utilize one of the gradient-based optimization techniques to find the minima of J_{fm}. For its simplicity we prefer to use the steepest descent algorithm, with the following update:

\alpha(i+1) = \alpha(i) - \lambda \left( \frac{\partial J_{fm}}{\partial \alpha} \right)_{\alpha = \alpha(i)}, \qquad (2.16)

Figure 2.4: Fractional order moments based cost function in the space of W_{11} and W_{12}.

where \lambda denotes the step size of the update, and i is the iteration index. Using the chain rule, the gradient can be found as follows:

\frac{\partial J_{fm}}{\partial \alpha} = \frac{2}{3}\, \frac{\nu_{3/2}^{-1/3}}{\Gamma^{2/3}(1+3/2)}\, \frac{\partial \nu_{3/2}}{\partial \alpha} - \frac{2}{5}\, \frac{\nu_{5/2}^{-3/5}}{\Gamma^{2/5}(1+5/2)}\, \frac{\partial \nu_{5/2}}{\partial \alpha}, \qquad (2.17)

where

\frac{\partial \nu_a}{\partial \alpha} = a\, E\!\left[ |Y_i|^{a-1}\, \frac{\partial |Y_i|}{\partial \alpha} \right], \qquad \frac{\partial |Y_i|}{\partial \alpha} = \mathrm{sgn}(Y_i)\, \frac{\partial Y_i}{\partial \alpha}, \qquad (2.18)

with \partial Y_1 / \partial \alpha = -X_1 \sin\alpha + X_2 \cos\alpha = Y_2 and \partial Y_2 / \partial \alpha = -X_1 \cos\alpha - X_2 \sin\alpha = -Y_1. Note that the absolute moments \nu_a are presumed to possess continuous first-order derivatives excluding the case Y_i = 0. Since the cost function J_{fm}, which is shown in Fig. 2.5 as a function of the orthogonal demixing angle \alpha, is free of spurious minima, \alpha will converge to one of the four separating angles, depending on the initialization.
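A minimal end-to-end sketch of the steepest descent loop (2.16)–(2.18), evaluated with sample moments on the first output. This is an illustrative addition: the mixing angle, initialization, step size, iteration count, and sample size are all arbitrary choices, and H is assumed to be the standard Givens rotation.

```python
import math
import numpy as np

def demixing(alpha):
    # W as in (2.8)
    return np.array([[np.cos(alpha),  np.sin(alpha)],
                     [-np.sin(alpha), np.cos(alpha)]])

def grad_j_fm(alpha, X):
    # sample version of (2.17)-(2.18), applied to the first output Y1
    y1, y2 = demixing(alpha) @ X
    ay = np.abs(y1)
    dy1 = y2                                  # dY1/dalpha = Y2
    nu32, nu52 = np.mean(ay ** 1.5), np.mean(ay ** 2.5)
    dnu32 = 1.5 * np.mean(ay ** 0.5 * np.sign(y1) * dy1)
    dnu52 = 2.5 * np.mean(ay ** 1.5 * np.sign(y1) * dy1)
    return ((2 / 3) * nu32 ** (-1 / 3) / math.gamma(2.5) ** (2 / 3) * dnu32
            - (2 / 5) * nu52 ** (-3 / 5) / math.gamma(3.5) ** (2 / 5) * dnu52)

rng = np.random.default_rng(4)
S = rng.laplace(scale=1 / np.sqrt(2), size=(2, 200_000))  # unit-variance sources
theta = 0.3                                   # hypothetical mixing angle
H = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = H @ S                                     # mixtures (2.7)

alpha, lam = 0.9, 0.5                         # initialization and step size
for _ in range(300):
    alpha -= lam * grad_j_fm(alpha, X)        # update (2.16)

phi = alpha - theta                           # residual global rotation
```

With these settings, φ settles near a multiple of π/2, i.e. one of the four separating angles, up to the sign and permutation indeterminacies noted above.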


More information

FuncICA for time series pattern discovery

FuncICA for time series pattern discovery FuncICA for time series pattern discovery Nishant Mehta and Alexander Gray Georgia Institute of Technology The problem Given a set of inherently continuous time series (e.g. EEG) Find a set of patterns

More information

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis 84 R. LANDQVIST, A. MOHAMMED, COMPARATIVE PERFORMANCE ANALYSIS OF THR ALGORITHMS Comparative Performance Analysis of Three Algorithms for Principal Component Analysis Ronnie LANDQVIST, Abbas MOHAMMED Dept.

More information

Lessons in Estimation Theory for Signal Processing, Communications, and Control

Lessons in Estimation Theory for Signal Processing, Communications, and Control Lessons in Estimation Theory for Signal Processing, Communications, and Control Jerry M. Mendel Department of Electrical Engineering University of Southern California Los Angeles, California PRENTICE HALL

More information

Lecture Notes 5: Multiresolution Analysis

Lecture Notes 5: Multiresolution Analysis Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and

More information

Lecture 19 IIR Filters

Lecture 19 IIR Filters Lecture 19 IIR Filters Fundamentals of Digital Signal Processing Spring, 2012 Wei-Ta Chu 2012/5/10 1 General IIR Difference Equation IIR system: infinite-impulse response system The most general class

More information

Introduction to Independent Component Analysis. Jingmei Lu and Xixi Lu. Abstract

Introduction to Independent Component Analysis. Jingmei Lu and Xixi Lu. Abstract Final Project 2//25 Introduction to Independent Component Analysis Abstract Independent Component Analysis (ICA) can be used to solve blind signal separation problem. In this article, we introduce definition

More information

An Improved Cumulant Based Method for Independent Component Analysis

An Improved Cumulant Based Method for Independent Component Analysis An Improved Cumulant Based Method for Independent Component Analysis Tobias Blaschke and Laurenz Wiskott Institute for Theoretical Biology Humboldt University Berlin Invalidenstraße 43 D - 0 5 Berlin Germany

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Adaptive Systems Homework Assignment 1

Adaptive Systems Homework Assignment 1 Signal Processing and Speech Communication Lab. Graz University of Technology Adaptive Systems Homework Assignment 1 Name(s) Matr.No(s). The analytical part of your homework (your calculation sheets) as

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information

Advanced Digital Signal Processing -Introduction

Advanced Digital Signal Processing -Introduction Advanced Digital Signal Processing -Introduction LECTURE-2 1 AP9211- ADVANCED DIGITAL SIGNAL PROCESSING UNIT I DISCRETE RANDOM SIGNAL PROCESSING Discrete Random Processes- Ensemble Averages, Stationary

More information

Single Channel Signal Separation Using MAP-based Subspace Decomposition

Single Channel Signal Separation Using MAP-based Subspace Decomposition Single Channel Signal Separation Using MAP-based Subspace Decomposition Gil-Jin Jang, Te-Won Lee, and Yung-Hwan Oh 1 Spoken Language Laboratory, Department of Computer Science, KAIST 373-1 Gusong-dong,

More information

BLIND SOURCE SEPARATION TECHNIQUES ANOTHER WAY OF DOING OPERATIONAL MODAL ANALYSIS

BLIND SOURCE SEPARATION TECHNIQUES ANOTHER WAY OF DOING OPERATIONAL MODAL ANALYSIS BLIND SOURCE SEPARATION TECHNIQUES ANOTHER WAY OF DOING OPERATIONAL MODAL ANALYSIS F. Poncelet, Aerospace and Mech. Eng. Dept., University of Liege, Belgium G. Kerschen, Aerospace and Mech. Eng. Dept.,

More information

BLOCK-BASED MULTICHANNEL TRANSFORM-DOMAIN ADAPTIVE FILTERING

BLOCK-BASED MULTICHANNEL TRANSFORM-DOMAIN ADAPTIVE FILTERING BLOCK-BASED MULTICHANNEL TRANSFORM-DOMAIN ADAPTIVE FILTERING Sascha Spors, Herbert Buchner, and Karim Helwani Deutsche Telekom Laboratories, Technische Universität Berlin, Ernst-Reuter-Platz 7, 10587 Berlin,

More information

Dimensionality Reduction. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Dimensionality Reduction. CS57300 Data Mining Fall Instructor: Bruno Ribeiro Dimensionality Reduction CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Visualize high dimensional data (and understand its Geometry) } Project the data into lower dimensional spaces }

More information

Analytical solution of the blind source separation problem using derivatives

Analytical solution of the blind source separation problem using derivatives Analytical solution of the blind source separation problem using derivatives Sebastien Lagrange 1,2, Luc Jaulin 2, Vincent Vigneron 1, and Christian Jutten 1 1 Laboratoire Images et Signaux, Institut National

More information

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the

More information

TWO METHODS FOR ESTIMATING OVERCOMPLETE INDEPENDENT COMPONENT BASES. Mika Inki and Aapo Hyvärinen

TWO METHODS FOR ESTIMATING OVERCOMPLETE INDEPENDENT COMPONENT BASES. Mika Inki and Aapo Hyvärinen TWO METHODS FOR ESTIMATING OVERCOMPLETE INDEPENDENT COMPONENT BASES Mika Inki and Aapo Hyvärinen Neural Networks Research Centre Helsinki University of Technology P.O. Box 54, FIN-215 HUT, Finland ABSTRACT

More information

Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint

Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint Xia Hong 1, Sheng Chen 2, Chris J. Harris 2 1 School of Systems Engineering University of Reading, Reading RG6 6AY, UK E-mail: x.hong@reading.ac.uk

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Independent Component Analysis (ICA) Bhaskar D Rao University of California, San Diego

Independent Component Analysis (ICA) Bhaskar D Rao University of California, San Diego Independent Component Analysis (ICA) Bhaskar D Rao University of California, San Diego Email: brao@ucsdedu References 1 Hyvarinen, A, Karhunen, J, & Oja, E (2004) Independent component analysis (Vol 46)

More information

New Statistical Model for the Enhancement of Noisy Speech

New Statistical Model for the Enhancement of Noisy Speech New Statistical Model for the Enhancement of Noisy Speech Electrical Engineering Department Technion - Israel Institute of Technology February 22, 27 Outline Problem Formulation and Motivation 1 Problem

More information