
Submitted to: Network: Comput. Neural Syst.

Entropy Optimization by the PFANN Network: Application to Blind Source Separation

Simone Fiori
Dept. of Electronics and Automatics, University of Ancona, Italy
simone@eealab.unian.it

PACS numbers: unknown

Abstract. The aim of this paper is to present a study on polynomial functional-link neural units that learn through an information-theoretic criterion. First the neuron's structure is presented and the unsupervised learning theory is explained and discussed, with particular attention paid to its probability density function and cumulative distribution function approximation capability. Then a neural network formed by such neurons (Polynomial Functional-Link Artificial Neural Network, PFANN) is shown to be able to separate out linearly mixed heterokurtic source signals, that is, signals endowed with either positive or negative kurtosis. In order to compare the performance of the proposed blind separation technique with that exhibited by existing methods, the Mixture of Densities (MOD) approach by Xu et al. [25, 26, 27], which is closely related to PFANN, is briefly recalled; then comparative numerical simulations performed on both synthetic and real-world signals, together with a complexity evaluation, are illustrated. These results show that the PFANN approach attains similar performance with a noticeable reduction of computational effort.

1. Introduction

Over recent years, information-theoretic optimization by neural networks has become an important research field. Since the pioneering work of Linsker and Plumbley (see for instance [12, 13, 17, 18] and references therein), several authors have studied unsupervised neural learning driven by entropy-based criteria, with current applications such as blind separation of sources [,, 4, 5, ], linear estimation and time-series prediction [], probability density shaping [, 4,, 4, 9], unsupervised classification [], and blind system deconvolution [, 3, 3]. In particular, the aim of the different techniques for neural density shaping is to find a non-linear transformation of an input random process (with unknown statistics) that maximizes the entropy of the transformed process. The transformation found in this way approximates the cumulative distribution function of the input random process, and the first derivative of the transformation approximates the probability density function of the input [, ], with a degree of accuracy depending on the structure of the employed neural network. Usually, the neural topologies used in the literature involve semi-linear neural units, that is, linear combiners followed by static sigmoidal non-linearities.

In this paper, a study concerning the use of more complex and flexible neural units endowed with functional links [19, 30], trained in an unsupervised way by means of an entropy-based criterion, is presented. As used here, in a functional-link neuron (also known as a Σ-Π neuron) the total excitation is computed as a weighted sum of different powers of the external stimulus, so that the total excitation assumes the expression of a polynomial [6, 7]. The actual response of the neuron is then computed by squashing the total excitation through a sigmoidal function. In order to test the learning and approximation capabilities of the proposed structure, cases of probability density shaping and cumulative distribution function approximation are tackled and discussed through computer simulations.

Among others, blind source separation by independent component analysis is a very interesting and challenging neural information processing task. The problem of separating out mixed statistically independent source signals by neural networks has been widely investigated in recent years [, 6,, 5], and several algorithms and methods have been proposed by different authors. In particular, the information-theoretic approach by Bell and Sejnowski [2] has attracted a lot of interest in the neural network community. However, both analytical studies and computer experiments have shown that this algorithm cannot be guaranteed to work with all kinds of source signals, that is, its effectiveness depends on the source probability distributions. To overcome this problem, Xu et al. [25] recently proposed to use the Bell-Sejnowski algorithm modified by Amari's natural gradient learning technique (see for instance [5, 8] and references therein), which dramatically speeds up its convergence, and to employ adaptive non-linearities instead of fixed ones. These flexible functions may be `learnt' so that they approximate the cumulative distribution functions of the sources, which can be proven to be the optimal non-linearities [6, 8]. In particular, this technique separates both leptokurtic (i.e. positive-kurtosis) and platykurtic (i.e. negative-kurtosis) sources without explicitly estimating their kurtoses, making the algorithm able to separate heterokurtic sources, that is, mixed leptokurtic and platykurtic sources.

Since functional-link units that learn by means of an entropy optimization principle show interesting probability density function and cumulative distribution function approximation abilities, we propose to extend the aforementioned learning theory to a multiple-unit version (PFANN) and to apply it to source separation. We first present the new Polynomial Functional-Link Artificial Neural Network (PFANN) learning equations, then briefly recall the Mixture of Densities (MOD) theory from [25, 26, 27], which is closely related to our approach, and compare the performance and structural features of the two methods.

2. Learning of maximum-entropy connection strengths in a single functional-link unit

2.1. Theory derivation

In this paper the following input-output description for a functional-link neuron is assumed:

    y = ψ(x; a) = sgm[φ(x; a)] ,   (1)

where sgm(·) is a sigmoidal function, bounded above and below, continuous and strictly increasing, and φ(x; a) is a strictly monotonic polynomial in x depending upon the parameters in a = [a_1 a_2 ... a_n]^T.
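As a concrete illustration of the input-output map (1), the following fragment evaluates a functional-link neuron whose total excitation is a polynomial in the stimulus. This is an illustrative Python sketch, not part of the original paper; the logistic squashing function and the coefficient values are arbitrary choices made only for the example.

```python
import numpy as np

def sgm(z):
    """A bounded, continuous, strictly increasing squashing function (logistic)."""
    return 1.0 / (1.0 + np.exp(-z))

def phi(x, a):
    """Polynomial total excitation phi(x; a) = a[0] + a[1]*x + a[2]*x**2 + ..."""
    return np.polyval(a[::-1], x)   # polyval wants highest-degree coefficient first

def neuron(x, a):
    """Functional-link neuron response y = sgm(phi(x; a)) as in equation (1)."""
    return sgm(phi(x, a))

# Example: a cubic excitation; here phi'(x) = 1 + 0.9 x^2 > 0, so phi is
# strictly monotonic as required by the theory.
a = np.array([0.1, 1.0, 0.0, 0.3])
x = np.linspace(-3.0, 3.0, 7)
print(neuron(x, a))
```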

If x is supposed to be a stationary continuous-time random process x = x(t) ∈ X with probability density function (pdf) p_x(x), then y will be a random process y = y(t) ∈ Y too, with a pdf denoted here as p_y(y; a). The differential entropy of the random process y(t), defined as

    H_y(a) ≝ −∫_Y p_y(ξ; a) log p_y(ξ; a) dξ ,   (2)

can be related to the differential entropy H_x of the random process x(t) by means of the fundamental formula p_y = p_x / |ψ′|, where ψ′(x; a) ≝ dψ(x; a)/dx and the prime denotes the derivative with respect to x. Using this substitution in formula (2) yields

    H_y(a) = −∫_X [p_x(ξ)/|ψ′(ξ; a)|] log[p_x(ξ)/|ψ′(ξ; a)|] |ψ′(ξ; a)| dξ = H_x + ∫_X p_x(ξ) log|ψ′(ξ; a)| dξ ,   (3)

where

    H_x = −∫_X p_x(ξ) log p_x(ξ) dξ .

By definition, |ψ′| assumes the expression

    |ψ′(x; a)| = |dψ(x; a)/dx| = sgm′[φ(x; a)] |dφ(x; a)/dx| ,   (4)

due to equation (1). The entropy H_y depends upon the coefficients a_k through ψ′(x; a), thus

    ∂H_y(a)/∂a_k = ∫_X [p_x(ξ)/ψ′(ξ; a)] ∂ψ′(ξ; a)/∂a_k dξ .   (5)

It is straightforward to see that the partial derivatives involved in the integral (5) write

    ∂ψ′/∂a_k = sgm″[φ] (∂φ/∂a_k)(dφ/dx) + sgm′[φ] ∂(dφ/dx)/∂a_k ,   (6)

whereby direct calculations lead to

    (1/ψ′) ∂ψ′/∂a_k = (sgm″[φ]/sgm′[φ]) ∂φ/∂a_k + (dφ/dx)^(−1) ∂(dφ/dx)/∂a_k .   (7)

It is known from statistics that ψ optimizes the entropy when it approaches the cumulative distribution function of x(t), thus when ψ′ approaches the probability density function of the input process. Note that the entropy is only optimized here for fixed upper and lower bounds. Thus we wish to find a vector of parameters a, hence a configuration of the functional-link neuron, that maximizes the entropy of the neuron response y(t). To this aim, a set of continuous-time learning equations derived by the Gradient Steepest Ascent (GSA) method can be used here.‡ Following this way, such equations write in the form da/dt = η ∂H_y/∂a, that is

    da_k/dt = η ∫_X p_x(ξ) { (sgm″[φ(ξ; a)]/sgm′[φ(ξ; a)]) ∂φ(ξ; a)/∂a_k + [φ′(ξ; a)]^(−1) ∂φ′(ξ; a)/∂a_k } dξ .   (8)

‡ It must be underlined that the presence of the bounded (saturating) non-linearity is needed to have fixed bounds, otherwise the optimization problem would be meaningless; using the sigmoidal function sgm(·) is a way to enforce this.
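As a numerical illustration of (8) and (9), the sketch below estimates the gradient of the entropy gap by replacing the integral over p_x with a sample average, and cross-checks it against a finite difference of the sample estimate of G(a). This is an illustrative Python fragment, not from the original paper; the logistic squasher, the generic power-series parameterization of φ, and the Laplacian sample generator are all assumptions made for the example.

```python
import numpy as np

def sgm(z):                                  # logistic squasher (assumed choice)
    return 1.0 / (1.0 + np.exp(-z))

def parts(a, x):
    """phi, phi', dphi/da_k and dphi'/da_k for phi(x; a) = sum_k a_k x^k."""
    k = np.arange(len(a))
    X = x[:, None] ** k[None, :]             # dphi/da_k = x^k
    dX = np.zeros_like(X)
    dX[:, 1:] = k[None, 1:] * X[:, :-1]      # dphi'/da_k = k x^(k-1)
    return X @ a, dX @ a, X, dX              # phi, phi' = sum_k k a_k x^(k-1)

def entropy_gap(a, x):
    """Sample estimate of G(a) in (9); for the logistic, sgm' = sgm (1 - sgm)."""
    phi, dphi, _, _ = parts(a, x)
    return np.mean(np.log(sgm(phi) * (1.0 - sgm(phi)) * np.abs(dphi)))

def entropy_gap_grad(a, x):
    """Sample estimate of dH_y/da in (8); for the logistic, sgm''/sgm' = 1 - 2 sgm."""
    phi, dphi, X, dX = parts(a, x)
    per_sample = (1.0 - 2.0 * sgm(phi))[:, None] * X + dX / dphi[:, None]
    return per_sample.mean(axis=0)

rng = np.random.default_rng(0)
x = rng.laplace(size=20000)                  # samples standing in for the unknown p_x
a = np.array([0.2, 1.0, 0.0, 0.3])           # chosen so that phi'(x) > 0 everywhere

g = entropy_gap_grad(a, x)
eps, k = 1e-5, 2                             # finite-difference check on one coordinate
e = np.zeros_like(a); e[k] = eps
print(g[k], (entropy_gap(a + e, x) - entropy_gap(a - e, x)) / (2 * eps))  # should agree
```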

Also, from equations (3) and (4) the exact expression of the quantity G ≝ H_y − H_x, hereafter referred to as the "entropy gap", is obtained:

    G(a) = ∫_X p_x(ξ) log{sgm′[φ(ξ; a)] |φ′(ξ; a)|} dξ .   (9)

It should be noted that G(a) is not guaranteed to be positive, since it actually represents a difference between entropies. Anyway, as H_y(a) is maximized, G(a) is maximized too, since H_x is constant; thus maximizing the response entropy may be conceived as maximizing the entropy gap between the original and the squashed processes.

2.2. A case study

Here the simple case study concerning the following neural structure is discussed:

    sgm(z) = (1/2)[1 + erf(z)] = (1/√π) ∫_{−∞}^{z} exp(−u²) du ,   (10)
    z ≝ φ(x; a) = a_1 + a_2 x .   (11)

The excitation x is endowed with a Laplacian distribution p_x(x) = (λ/2) e^{−λ|x−μ|}, where λ > 0. Equations (10) and (11) represent the input-output relationship of a sigmoidal neuron with one weight and one bias. This is an interesting case in the theory, since it is possible to find solutions of the differential system (8) in closed form. In fact, the relevant quantities involved in the integrals are found to be sgm″(z)/sgm′(z) = −2z and φ′(x; a) = a_2, whereby the others follow. Thus system (8) rewrites, in this case,

    da_1/dt = −2η ∫_{−∞}^{+∞} (λ/2) e^{−λ|ξ−μ|} (a_1 + a_2 ξ) dξ ,
    da_2/dt = η/a_2 − 2η ∫_{−∞}^{+∞} (λ/2) e^{−λ|ξ−μ|} (a_1 + a_2 ξ) ξ dξ ,

or, after direct calculations,

    da_1/dt = −2η (a_1 + μ a_2) ,   da_2/dt = −2η (μ a_1 + μ² a_2) − 4η a_2/λ² + η/a_2 .   (12)

Setting the time-derivatives to zero allows us to determine the equilibrium points a_eq of the above differential system. They are found to be

    a_eq = [+λμ/2  −λ/2]^T   and   [−λμ/2  +λ/2]^T .   (13)

It is interesting to give, in this simple case, the exact expression of the entropy gap (9), that is

    G([a_1 a_2]^T) = log(|a_2|/√π) − [a_1² + 2μ a_1 a_2 + (μ² + 2/λ²) a_2²] .   (14)

Note that the entropy gap is invariant with respect to a simultaneous sign change of a_1 and a_2, coherently with the result (13).
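The closed-form result (13) can be double-checked by integrating the system (12) numerically. The sketch below is an illustrative Python fragment, not from the original paper; the values of λ, μ, η, the initial condition and the Euler step are arbitrary. It should converge to one of the two equilibria.

```python
import numpy as np

lam, mu, eta = 2.0, 0.5, 0.01           # Laplacian parameters and learning rate (arbitrary)

def rhs(a1, a2):
    """Right-hand side of the differential system (12)."""
    da1 = -2.0 * eta * (a1 + mu * a2)
    da2 = -2.0 * eta * (mu * a1 + (mu**2 + 2.0 / lam**2) * a2) + eta / a2
    return da1, da2

a1, a2 = 0.3, 0.7                        # initial condition (a2 must be nonzero)
dt = 0.1
for _ in range(20000):                   # plain forward-Euler integration of da/dt
    d1, d2 = rhs(a1, a2)
    a1, a2 = a1 + dt * d1, a2 + dt * d2

print(a1, a2)                            # expected: close to (-lam*mu/2, +lam/2) = (-0.5, 1.0)
```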

2.3. Two more complex examples

It is worth considering the more complex neural structure described by the functions

    sgm(z) = tanh(z) ,   (15)
    z = φ(x; a) = a_0 + e^{a_1} x + e^{a_3} x³ + ⋯ + e^{a_{2r+1}} x^{2r+1} ,   (16)

with 2r+1 being the order of the polynomial. Note that, due to the exponential structure of the coefficients and to the monotonicity of the polynomial, the property φ′(x; a) > 0 holds true for any value of x ∈ X (provided that a_1, a_3, …, a_{2r+1} > −∞). With the structural functions as above, the relevant learning quantities are

    sgm″(z)/sgm′(z) = −2 tanh(z) = −2 sgm(z) = −2y ,   (17)
    φ′(x; a) = e^{a_1} + 3 e^{a_3} x² + ⋯ + (2r+1) e^{a_{2r+1}} x^{2r} ,   (18)

and the others follow. In order to simulate the learning equations (8), particularized with the functions (15)-(16), their instantaneous, stochastic, discrete-time approximations can be used. They are expressed by

    a_0(t+1) = a_0(t) − 2η_0 y(t) ,   (19)
    a_k(t+1) = a_k(t) + η_k e^{a_k(t)} [k φ′(t)^{−1} − 2 x(t) y(t)] x^{k−1}(t) ,   (20)
    a_0(0) = a_0^init ,   a_k(0) = a_k^init ,   (21)

for k = 1, 3, 5, …, 2r+1, with η_0, η_1, η_3, …, η_{2r+1} being sufficiently small positive learning rates, and t denoting the discrete-time index. It should be noted that when the η_k differ from one another, the learning equations no longer represent a gradient ascent rule in the whole a-space, but only resemble gradient ascent equations in the individual a_k-spaces. As a learning performance index, an estimate of the entropy gap H_y − H_x may be obtained by averaging over the learning epochs the instantaneous, stochastic approximation of the right-hand side of the expression (9):

    Entropy gap: G ≈ ⟨ log[(1 − y²) φ′] ⟩ .   (22)

Since it is known that the quantity ψ′(x; a) approximates the pdf of the excitation x(t) (with a degree of accuracy related to sgm(·) and φ(·;·), see for instance [, ] and references therein), two exemplary experiments are considered in what follows, in order to test the probability density shaping capability of the proposed neural structure.

First, a two-overlapping-Gaussian excitation, with a probability density function given by a balanced mixture of two Gaussian bumps, was presented to the functional-link neuron (15)-(16) with 2r+1 = 3. Simulation results are shown in Figure 1. Each epoch counts 6 input samples; a total of 5 epochs (corresponding to 3 samples) was used.

Another experiment was performed with a polynomial φ(x; a) of degree 2r+1 = 7 and a real-world excitation (a kHz-sampled musical stream with suitable amplitude range scaling). Results with batches counting 5 samples and 5 epochs are shown in Figure 2.
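A minimal sketch of the single-unit stochastic updates (19)-(21) used in the preceding experiments is reported below. It is an illustrative Python fragment, not from the paper; the polynomial order, learning rates, number of iterations and the Laplacian stand-in for the excitation stream are arbitrary choices.

```python
import numpy as np

r = 1                                        # polynomial order 2r+1 = 3
odd = np.arange(1, 2 * r + 2, 2)             # odd powers 1, 3, ..., 2r+1
a0 = 0.0                                     # bias coefficient a_0
ak = np.zeros(r + 1)                         # exponents a_1, a_3, ..., a_{2r+1}
eta0, etak = 0.01, 0.01                      # learning rates (arbitrary)

def unit(x):
    """phi(x; a) of (16), phi'(x; a) of (18) and the response y = tanh(phi) of (15)."""
    phi  = a0 + np.sum(np.exp(ak) * x ** odd)
    dphi = np.sum(odd * np.exp(ak) * x ** (odd - 1))
    return phi, dphi, np.tanh(phi)

rng = np.random.default_rng(1)
for t in range(20000):
    x = rng.laplace()                        # a stand-in excitation sample
    phi, dphi, y = unit(x)
    a0 -= 2.0 * eta0 * y                                                    # update (19)
    ak += etak * np.exp(ak) * (odd / dphi - 2.0 * x * y) * x ** (odd - 1)   # update (20)

print(a0, ak)                                # learnt coefficients after density shaping
```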

Figure 1. Entropy maximization for a two-overlapping-Gaussian excitation. (pdf-in: pdf of the stimulus; ψ′(x): approximated pdf after the neuron's learning; panels on the right: estimated entropy gap and neuron's coefficients during learning.)

Figure 2. Entropy maximization for a real-world excitation. (pdf-in: histogram approximation of the stimulus pdf; ψ′(x): approximated pdf after the neuron's learning; panels on the right: estimated entropy gap and neuron's coefficients during learning.)

Unfortunately, the values of the coefficients a_i obtained by running the preceding simulations cannot be compared with the `optimal' ones, since closed-form solutions to the equilibrium equation dH_y(a)/da = 0 are not available at present. However, a close examination of the simulations shows that in all cases the neural unit endowed with functional links, with a relatively small number of parameters to be learnt, seems to possess interesting approximation capabilities, which should of course be expected, as biological evidence seems to suggest [11].

3. Extension to the multiple case: the PFANN network applied to blind source separation

The problem of separating out mixed statistically independent source signals by neural networks has been widely investigated in recent years [, 6,, 5], and several algorithms and methods have been proposed by different authors.

In particular, the information-theoretic approach by Bell and Sejnowski [2], with the natural gradient modification developed by Amari [], has attracted a lot of interest in the neural network community. However, both analytical studies and computer experiments have shown that this algorithm cannot work with all kinds of source signals, that is, its effectiveness depends on the source probability distributions. This behavior may be explained by recognizing that the Bell-Sejnowski method relies on non-linearities whose optimal shapes are the cumulative distribution functions (cdfs) of the sources [, 6]; thus using fixed non-linearities such as standard sigmoids is not a good choice, as also mentioned in [, 5, 9, 8]. On the other hand, in a blind problem the cdfs of the sources are unknown, thus the problem cannot be solved directly.

To overcome this problem, Xu et al. [25, 26, 27] recently proposed to use the well-known Bell-Sejnowski algorithm [2] modified by Amari's natural gradient learning technique (readers may refer to [5, 8] and references therein), which dramatically speeds up its convergence and reduces the amount of required computation, and to employ adaptive non-linearities instead of fixed ones. These flexible functions may be `learnt' so that they approximate the required cdfs of the sources, helping the separation algorithm to perform better. In particular, they overcome the problem of separating both leptokurtic and platykurtic sources without the need to explicitly estimate their kurtoses, making the algorithm able to separate heterokurtic sources.

In the previous section we presented a technique for approximating the pdf as well as the cdf of a signal by means of a single neural unit that learns through an information-theoretic learning rule. The aim of this part is to extend this algorithm to a multiple-unit version (PFANN) and to apply it to source separation; we first present the new PFANN learning equations, then briefly recall the MOD theory from [25], which is closely related to our approach, and compare the performance and structural features of the two methods.

3.1. Blind source separation by the entropy maximization approach

For separating out n linearly mixed independent sources, we use a neural network with n inputs and n outputs described by the relationship y = h(x) = h(Wu), where u is the network input vector and W is the weight matrix that adapts through time in order for the entropy of h(x) to be maximized [, 5], with h(x) = [h_1(x_1) h_2(x_2) ⋯ h_n(x_n)]^T. The entropy of the squashed signal y is defined as H_y ≝ −E_x[log p_y(y)], where the operator E_x[·] denotes mathematical expectation with respect to x. As learning algorithm for W we use the natural gradient rule [5]:

    W(t+1) = W(t) + η_W [I + g(x(t)) x^T(t)] W(t) ,   (23)

where g(x) = [g_1(x_1) g_2(x_2) ⋯ g_n(x_n)]^T, g_j ≝ h_j″/h_j′, and h_j should approximate the cumulative distribution function of the j-th source signal [, 5].

The mixing model is u = Ms, where M is a full-rank mixing matrix and s is the vector containing the n statistically independent source signals to be separated. It is worth remarking that in a blind separation problem both the mixing matrix M and the source signal stream s(t) are unknown. Moreover, in some real-world applications the mixtures may vary through time, making the problem non-stationary. In this paper we only deal with instantaneous mixtures, that is, mixtures given by mixing matrices M that do not change through time.
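The demixing loop built around the natural-gradient rule (23) can be sketched as follows. This is an illustrative Python fragment, not from the paper: the toy super-Gaussian sources, the random mixing matrix and the stepsize are invented for the example, and the fixed score g(x) = -tanh(x) (which corresponds to a fixed super-Gaussian-shaped cdf model) only stands in for the adaptive g_j derived in Sections 3.2 and 3.3 — the paper's point being precisely that such fixed choices fail on other source distributions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, eta_W = 2, 20000, 1e-3

S = rng.laplace(size=(n, T))                 # toy super-Gaussian sources (assumption)
M = rng.normal(size=(n, n))                  # unknown full-rank mixing matrix
U = M @ S                                    # observed instantaneous mixtures u = M s

W = np.eye(n)                                # demixing matrix, adapted on-line

def g(x):
    """Fixed score g_j = h_j''/h_j'; suitable only for super-Gaussian sources."""
    return -np.tanh(x)

for t in range(T):
    x = W @ U[:, t]                          # network linear output x = W u
    W += eta_W * (np.eye(n) + np.outer(g(x), x)) @ W   # natural-gradient rule (23)

print(W @ M)                                 # should approach a scaled permutation matrix
```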

3.2. Entropy maximization by the PFANN network

In order to approximate any cdf of the source signals we use the following adaptive transformations:

    y_j = h_j(x_j) ≝ (1/2) {1 + erf[φ_j(x_j; a_j)]} ,   j = 1, …, n ,   (24)

where the index j denotes the j-th neuron, φ_j(x_j; a_j) is a polynomial whose coefficients in a_j are adapted through time, and erf(·) denotes again the mathematical `error function'.† Since h_j(·) should approximate a non-decreasing function, each polynomial φ_j must be monotonic almost everywhere within the range of x_j, formally φ_j′ ≝ dφ_j/dx_j ≥ 0. This condition is surely fulfilled if φ_j(x_j; a_j) has the form

    φ_j(x_j; a_j) = a_{0,j} + Σ_{i=0}^{r_j} e^{a_{2i+1,j}} x_j^{2i+1} ,   (25)

where 2r_j + 1 is the order of the polynomial for each neuron. As mentioned above, the structure of the neurons gives the name Polynomial Functional-Link Artificial Neural Network to the network used for separating out the source signals. A polynomial functional-link network [19, 30], as intended here, is composed of neural units endowed with exponential links. A unit of this kind is depicted in Figure 3 for r = 2: the blocks marked with Π perform multiplication of their inputs, the block Σ performs summation, and erf(·) denotes the aforementioned error function that squashes the polynomial φ(x).

Figure 3. An odd polynomial functional-link neural unit, shown for r = 2.

† The error function in (24) may be replaced by any sigmoid y = σ(x), where σ(·) is continuously differentiable almost everywhere at least twice, non-decreasing and ranging between 0 and 1 [6, 7]. We chose the `erf' function because it simplifies the learning equations.
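The next fragment sketches one PFANN activation, i.e. the erf-squashed odd polynomial of equations (24)-(25), together with its derivative ψ_j = h_j′ used later in (28). It is an illustrative Python sketch, not from the paper; r_j and the coefficient values are arbitrary.

```python
import numpy as np
from math import erf, sqrt, pi

r_j = 2
odd = np.arange(1, 2 * r_j + 2, 2)            # odd powers 1, 3, 5

def phi(x, a):
    """phi_j of (25), with coefficients packed as a = [a0, a1, a3, a5]."""
    return a[0] + np.sum(np.exp(a[1:]) * x ** odd)

def dphi(x, a):
    """phi_j' of (28): always positive, so h_j is non-decreasing."""
    return np.sum(odd * np.exp(a[1:]) * x ** (odd - 1))

def h(x, a):
    """Adaptive cdf model h_j of (24)."""
    return 0.5 * (1.0 + erf(phi(x, a)))

def psi(x, a):
    """pdf model psi_j = dh_j/dx_j of (28)."""
    return np.exp(-phi(x, a) ** 2) * dphi(x, a) / sqrt(pi)

a = np.array([0.0, 0.0, -1.0, -2.0])          # example coefficients [a0, a1, a3, a5]
xs = np.linspace(-2, 2, 9)
print([round(h(x, a), 3) for x in xs])        # increases monotonically from ~0 to ~1
print([round(psi(x, a), 3) for x in xs])      # non-negative and integrates to 1 over the line
```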

The entropy H_y, which has to be maximized [, 5] with respect to the coefficients a_{k,j}, is defined as

    H_y(A) ≝ −∫ p_y(y; A) log p_y(y; A) d^n y ,   (26)

where p_y(y; A) is the joint pdf of the squashed signal vector y ≝ [y_1 y_2 ⋯ y_n]^T and A is the matrix formed by the columns a_j. The entropy H_y may be rewritten [, 5] as H_y = H_x + E_x[log Ψ], where the function Ψ(x; A) is defined as

    Ψ(x; A) ≝ det(∂y/∂x^T) = ∏_{j=1}^{n} ψ_j ,   with   ψ_j ≝ dh_j/dx_j ,

so that log Ψ = Σ_{j=1}^{n} log ψ_j. In order for the entropy to be maximized, the recursive stochastic gradient steepest ascent technique may be used again:

    Δa_{0,j} = η_0 ∂(log ψ_j)/∂a_{0,j} ,   Δa_{2i+1,j} = η_{2i+1} ∂(log ψ_j)/∂a_{2i+1,j} ,   (27)

where i = 0, 1, …, r_j, and η_0, η_1, η_3, …, η_{2r+1} are positive learning stepsizes. In this case the functions ψ_j look like

    ψ_j(x_j) = (1/√π) e^{−φ_j²(x_j)} φ_j′(x_j) ,   where   φ_j′(x_j) = Σ_{i=0}^{r_j} (2i+1) e^{a_{2i+1,j}} x_j^{2i} ,   (28)

and the dependence on the a_j is understood. Thus we have

    ∂ψ_j/∂a_{0,j} = −(2/√π) e^{−φ_j²(x_j)} φ_j(x_j) φ_j′(x_j) ,   (29)
    ∂ψ_j/∂a_{2i+1,j} = (1/√π) e^{−φ_j²(x_j)} [(2i+1) − 2 φ_j(x_j) φ_j′(x_j) x_j] e^{a_{2i+1,j}} x_j^{2i} .   (30)

Therefore the adapting algorithm writes

    Δa_{0,j} = −2 η_0 φ_j(x_j; a_j) ,
    Δa_{2i+1,j} = η_{2i+1} e^{a_{2i+1,j}} [(2i+1) φ_j′(x_j; a_j)^{−1} − 2 φ_j(x_j; a_j) x_j] x_j^{2i} ,   (31)

where i = 0, 1, …, r_j, and the short notation Δa_{k,j} = a_{k,j}(t+1) − a_{k,j}(t) is used. Finally, we need to compute the non-linear functions g_j that are required for adapting the weight matrix W by (23). In our case they take on the expression

    g_j(x_j; a_j) = d(log ψ_j)/dx_j = −2 φ_j(x_j; a_j) φ_j′(x_j; a_j) + φ_j″(x_j; a_j)/φ_j′(x_j; a_j) ,   (32)

where φ_j″(x_j; a_j) = Σ_{i=1}^{r_j} 2i (2i+1) e^{a_{2i+1,j}} x_j^{2i−1}.
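A sketch of the per-neuron coefficient adaptation (31) and of the score function (32) that feeds the weight update (23) is given below. It is an illustrative Python fragment, not from the paper; the stepsizes, polynomial order and the uniform toy input stream are arbitrary choices.

```python
import numpy as np

def pfann_funcs(x, a, odd):
    """phi_j, phi_j' and phi_j'' of (25), (28) and (32), with a = [a0, a1, a3, ...]."""
    e = np.exp(a[1:])
    phi   = a[0] + np.sum(e * x ** odd)
    dphi  = np.sum(odd * e * x ** (odd - 1))
    ddphi = np.sum(odd[1:] * (odd[1:] - 1) * e[1:] * x ** (odd[1:] - 2))
    return phi, dphi, ddphi

def pfann_step(x, a, odd, eta0, etak):
    """One stochastic update (31) of the coefficients of a single PFANN neuron."""
    phi, dphi, _ = pfann_funcs(x, a, odd)
    a_new = a.copy()
    a_new[0]  += -2.0 * eta0 * phi                                              # Delta a_{0,j}
    a_new[1:] += etak * np.exp(a[1:]) * (odd / dphi - 2.0 * phi * x) * x ** (odd - 1)  # Delta a_{2i+1,j}
    return a_new

def pfann_score(x, a, odd):
    """g_j(x_j; a_j) of (32), used in the natural-gradient rule (23)."""
    phi, dphi, ddphi = pfann_funcs(x, a, odd)
    return -2.0 * phi * dphi + ddphi / dphi

r_j = 2
odd = np.arange(1, 2 * r_j + 2, 2)             # powers 1, 3, 5
a = np.zeros(r_j + 2)                          # [a0, a1, a3, a5]
rng = np.random.default_rng(3)
for t in range(5000):                          # density shaping on a toy input stream
    a = pfann_step(rng.uniform(-1, 1), a, odd, eta0=0.01, etak=0.01)
print(a, pfann_score(0.5, a, odd))
```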

3.3. The MOD learning algorithm

The MOD technique, an existing approach drawn from the scientific literature, is now briefly recalled for comparison with the PFANN approach. As functions h_j, in [25, 26] the following expression is proposed:

    h_j(x_j) = Σ_{i=1}^{m_j} α_{i,j} σ[b_{i,j}(x_j − c_{i,j})] ,   (33)

where m_j is the size of the `mixture of densities' for the j-th neuron, (b_{i,j}, c_{i,j}) are coefficients to be adapted, and the α_{i,j} measure the contribution given by each single term to the approximation; thus they should be positive and sum to one with respect to the index i. To this aim, it is useful to replace the α_{i,j} with new unconstrained variables ρ_{i,j} defined by α_{i,j} = exp(ρ_{i,j}) / Σ_i exp(ρ_{i,j}). The basis function is [25] σ(u) = 1/(1 + exp(−u)). Also, the first derivative of the functions h_j has to be computed:

    h_j′(x_j) = dh_j/dx_j = Σ_{i=1}^{m_j} α_{i,j} b_{i,j} σ′[b_{i,j}(x_j − c_{i,j})] .   (34)

Now, the MOD learning equations that maximize the entropy H_y read [25, 26, 27]:

    Δρ_{i,j} = [k_ρ / h_j′(x_j)] Σ_{k=1}^{m_j} α_{k,j} b_{k,j} (v_{k,j} − v_{k,j}²)(δ_{i,k} − α_{i,j}) ,   (35)
    Δc_{i,j} = −[k_c / h_j′(x_j)] α_{i,j} b_{i,j}² (1 − 2 v_{i,j})(v_{i,j} − v_{i,j}²) ,   (36)
    Δb_{i,j} = [k_b α_{i,j} / h_j′(x_j)] [1 + b_{i,j}(x_j − c_{i,j})(1 − 2 v_{i,j})] (v_{i,j} − v_{i,j}²) ,   (37)

where v_{i,j} = v_{i,j}(x_j) ≝ σ[b_{i,j}(x_j − c_{i,j})], δ_{i,j} is the Kronecker delta, and k_ρ, k_c and k_b are positive learning rates. In this case the non-linearities g_j(·) take on the expression

    g_j(x_j) = [1 / h_j′(x_j)] Σ_{i=1}^{m_j} α_{i,j} b_{i,j}² (1 − 2 v_{i,j})(v_{i,j} − v_{i,j}²) .   (38)

These equations have been recast directly from [25, 26, 27], where a very clear derivation of the MOD learning theory was presented, along with a detailed discussion of the reasons for employing this kind of mixture of parametric densities as adaptive activation functions.
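For comparison with the PFANN activation sketched earlier, the following fragment evaluates the MOD adaptive activation (33), its derivative (34) and the corresponding score (38) for one neuron. It is an illustrative Python sketch, not taken from [25, 26, 27]; the mixture size and parameter values are arbitrary.

```python
import numpy as np

def sigma(u):
    """Logistic basis function used by MOD."""
    return 1.0 / (1.0 + np.exp(-u))

def mod_funcs(x, rho, b, c):
    """h_j, h_j' and g_j of (33), (34), (38) for one neuron with m_j mixture terms."""
    alpha = np.exp(rho) / np.sum(np.exp(rho))          # positive weights summing to one
    v = sigma(b * (x - c))                             # v_{i,j} of the text
    h  = np.sum(alpha * v)                                               # (33)
    dh = np.sum(alpha * b * (v - v ** 2))                                # (34), sigma' = v - v^2
    g  = np.sum(alpha * b ** 2 * (1.0 - 2.0 * v) * (v - v ** 2)) / dh    # (38)
    return h, dh, g

m_j = 7                                                # mixture size, as in the experiments
rho = np.zeros(m_j)                                    # unconstrained weight parameters
b   = np.ones(m_j)                                     # slopes b_{i,j}
c   = np.linspace(-1.5, 1.5, m_j)                      # centres c_{i,j}
for x in (-1.0, 0.0, 1.0):
    print(mod_funcs(x, rho, b, c))
```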

4. Computer simulation results and structural comparison

In this part we show computer simulation results that confirm the effectiveness of the proposed approach, and present a complexity comparison of the MOD and PFANN separation methods.

4.1. A blind separation test on the PFANN network

The functional-link network-based PFANN separation algorithm has been tested with an input stream which is a mixture of three signals: s_1(t) = sign[cos(5t + 9 cos(5t))], s_2(t) is a uniformly distributed white noise ranging in [−1, +1], and s_3(t) is an 8 kHz-sampled speech signal. The signal s_1(t) has been chosen in this way because in [9] Gustafsson reported that the original Bell-Sejnowski algorithm may be ineffective in its presence. The 3×3 mixing matrix M has been randomly generated, and the weights in W and the coefficients a_{k,j} have been randomly initialized as well. The learning stepsize η_W was 10^{−4}. The separating neural network has 3 inputs, 3 outputs and thus 3 functional-link neurons structured as in Section 3, where the constants r_j have been chosen equal to r = 2.

As a convergence measure, an interference residual has been defined as the sum of the n² − n smallest squared entries of the product V ≝ WM, as in []. In fact, since V represents the overall source-to-output transfer matrix, perfect separation would imply a quasi-diagonal form of it, i.e., only one entry per row different from zero [4]; in a real-world context, however, some residual interference should be tolerated. Figure 4 shows the interference residual: it tells that the algorithm has been able to separate the three heterokurtic sources.

Figure 4. Interference residual.

Figure 5 depicts instead the Noise-to-Signal Ratios (NSRs) of the three network outputs. The NSR at the output of each neuron measures the power of the residual interference (the `noise') with respect to the power of the separated source signal pertaining to that neuron. Formally, it is defined as [5]:

    (N/S)_j ≝ 10 log₁₀ [ Σ_{i≠k} v_{j,i}² / v_{j,k}² ] ,   k ≝ argmax_{i} v_{j,i}² .

Figure 5. Noise-to-signal ratio (NSR) in dB.
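The interference residual defined above can be computed as in the following sketch (illustrative Python, not from the paper; the example matrices are invented for the demonstration):

```python
import numpy as np

def interference_residual(W, M):
    """Sum of the n^2 - n smallest squared entries of V = W M (near zero at separation)."""
    V = W @ M
    n = V.shape[0]
    sq = np.sort((V ** 2).ravel())       # squared entries in increasing order
    return np.sum(sq[: n * n - n])       # discard the n largest (the signal entries)

# Toy check with an (assumed) exact inverse: the residual vanishes.
M = np.array([[1.0, 0.4, -0.2],
              [0.3, 1.0, 0.5],
              [-0.1, 0.2, 1.0]])
print(interference_residual(np.linalg.inv(M), M))   # ~0: perfect separation
print(interference_residual(np.eye(3), M))          # > 0: unmixed case
```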

Figure 6 illustrates the functions h_j and their derivatives ψ_j = h_j′ as learnt by the functional-link units. Consider, for instance, neurons 1 and 2. Due to the definition of the sources s_1 and s_2, the pdf of the signal s_1 consists of two Dirac deltas centred at −1 and +1, while the pdf of s_2 is constant within [−1, +1] and null elsewhere. The approximation capability of neurons 1 and 2 may be evaluated by observing the first two pictures from the left on the first row of Figure 6, where the solid lines represent the estimated pdfs ψ_j of the signals s_1 and s_2, respectively. They are in good accordance with the true pdfs. The dashed lines represent the derivatives of the static sigmoidal activation functions, as used by Bell and Sejnowski, for comparison: the differences between static and adaptive (flexible) non-linearities are quite evident, especially for neuron 2. The dot-dashed lines in the same pictures represent the ψ_j functions at the beginning of PFANN learning. The pictures in the second row display instead the approximated cdfs h_j, the sigmoidal activation functions, and the coarse approximations at the beginning of the PFANN neurons' learning. The third column relates instead to the pdf-cdf approximation for a speech signal; it is common experience that the true pdf of speech is a leptokurtic (rather peaked) function, and the simulation results seem to show that the corresponding network neuron is again able to retain that shape in its activation function derivative. In conclusion, a close examination of these results shows that the neurons have effectively been able to approximate both the cdf and the pdf of the sources with a high degree of accuracy, compatibly with the number of free parameters to adapt.

Figure 6. Functions h_j and h_j′ = ψ_j for each neuron. (Dot-dashed: at the beginning of learning; solid: after learning; dashed: no adaptation (hyperbolic tangent), for comparison with the static case.)
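Before moving on to the numerical comparison, the NSR measure defined in Section 4.1 can be evaluated from V = WM as in this sketch (illustrative Python, not from the paper; the 10·log10 decibel scaling and the example transfer matrix are assumptions made for the demonstration):

```python
import numpy as np

def nsr_db(V):
    """Noise-to-signal ratio per output row of V = W M, in dB."""
    out = []
    for row in V ** 2:
        k = np.argmax(row)                    # index of the dominant (signal) source
        noise = np.sum(row) - row[k]          # residual interference power
        out.append(10.0 * np.log10(noise / row[k]))
    return np.array(out)

V = np.array([[0.05, 1.1, 0.02],
              [0.9, 0.07, 0.01],
              [0.03, 0.02, 1.3]])             # a nearly separating example transfer matrix
print(nsr_db(V))                              # strongly negative values mean good separation
```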

4.2. Numerical comparison of PFANN and MOD

In order to effectively compare the MOD and PFANN algorithms, we chose the learning rates of the adapting rules so that both algorithms show approximately the same convergence speed and steady-state precision. The aim of these simulations is to show that the new separation algorithm may exhibit the same separation capability as MOD; in Subsection 4.3 below we shall show that this may be obtained with a substantial reduction of computational effort.

In Experiment 4.1, we used two i.i.d. source sequences uniformly distributed within [−1, +1]. The initial conditions relative to the MOD algorithm are the same used in [25], and in particular for each neuron we have m_j = 7. For running the PFANN network we set r_j = 2 and used the same small positive learning stepsize for all the coefficients of each neuron. Figure 7 shows the interference residuals of the PFANN and MOD algorithms, averaged over independent trials. In this case both source streams are platykurtic, and the algorithms were able to separate them out with no external `kurtosis tuning'.

Figure 7. Two noises. Solid: MOD; dot-dashed: PFANN.

In Experiment 4.2, we considered two speech sequences, both sampled at 8 kHz. The testing conditions are similar to those of Experiment 4.1, except for the learning rates, which have been set to a different common value. Note that in order to obtain the same performance for the two algorithms we had to use r_j = 2 and m_j = 7. The pictures displayed in Figure 8 have the same meaning as in the preceding experiment. In this experiment both source signals are leptokurtic. The algorithms were able to separate them out without requiring the user to know the signs of the kurtoses or to perform kurtosis estimation.

In Experiment 4.3, the blind separation algorithms have been tested with three source signals: a pure sinusoid, a uniformly distributed noise and a speech signal. The learning constants are m_j = 5 and r_j = 2, with a common learning stepsize for the PFANN coefficients. The results are shown in Figure 9. Since the flat noise is platykurtic while the voice stream is leptokurtic, the PFANN separating technique has shown its ability to separate out heterokurtic sources, as MOD does, again with no prior knowledge nor kurtosis estimation needed.

Figure 8. Two voices. Solid: MOD; dot-dashed: PFANN.

Figure 9. Sinusoid, noise, voice. Solid: MOD; dot-dashed: PFANN.

Table 1. Complexity comparison of MOD and PFANN.

    Algorithm   Multiplications   Divisions    Non-Linearities   Parameters
    MOD         4mn               (5m + )n     mn                3mn
    PFANN       (r + 8)n          (r + )n      n                 (r + 2)n

4.3. Structure comparison

Suppose (as in the simulations) that r_j = r and m_j = m. A direct comparison between the experimental results obtained with the functional-link network-based PFANN and the MOD technique shows that, to obtain similar performance, it is necessary to choose m > r. In order to perform a fair complexity comparison, Table 1 reports the number of operations (multiplications, divisions and generic non-linearities) involved in, and strictly needed by, the learning equations. The PFANN algorithm appears easier to implement, as it requires a reduced computational effort.

For a numerical example concerning computational complexity, readers may refer to [8], where the times required to run PFANN and MOD on the same blind separation problems are given.

5. Conclusion

In this work, a new learning theory for functional-link neural units, based on an information-theoretic approach, has been presented; the learning and approximation capabilities shown by different units were then investigated by solving density-shaping problems. Simulation results confirm the effectiveness of the proposed learning theory and the good flexibility exhibited by the non-semi-linear structures.

The aim of this paper was also to propose a novel approach for performing blind separation of heterokurtic source signals by the functional-link neural network, based upon the functional approximation ability of (low-order) polynomials. With the aim of providing flexible non-linearities, Taleb and Jutten [22] and Roth and Baram [20] proposed the use of a multilayer perceptron, Gustafsson [9] presented a technique based upon an adjustable linear combination of a series of Gaussian basis functions with fixed mean values and variances, while Xu et al. proposed to employ linear combinations of fully adjustable sigmoids (the MOD technique) [25, 26, 27]. Here we compared the new method to the closely related MOD: both the computer simulations and the structural comparison confirm that the presented approach is effective and interesting, since it attains comparable performance with a noticeable reduction of computational effort. Extensions of the presently proposed theory to other kinds of approximating functions, such as squashed truncated Fourier series, are currently under investigation, along with a numerical and structural comparison to existing techniques, in order to gain quantitative knowledge about how the proposed method relates to other approaches drawn from the scientific literature, especially from applied statistics.

Acknowledgments

This research was supported by the Italian MURST. The author wishes to thank the anonymous reviewers, whose careful and detailed suggestions allowed the quality of the paper to be significantly improved.

References

[1] S.-i. Amari, T.-p. Chen and A. Cichocki, Stability Analysis of Learning Algorithms for Blind Source Separation, Neural Networks, Vol. 10, No. 8, pp. 1345-1351, 1997
[2] A.J. Bell and T.J. Sejnowski, An Information-Maximization Approach to Blind Separation and Blind Deconvolution, Neural Computation, Vol. 7, No. 6, pp. 1129-1159, 1995
[3] S. Bellini, Bussgang Techniques for Blind Equalization, in IEEE Global Telecommunication Conf. Rec., pp. 1634-1640, Dec. 1986
[4] P. Comon, Independent Component Analysis, A New Concept?, Signal Processing, Vol. 36, pp. 287-314, 1994
[5] J. Dehaene and N. Twum Danso, Local Adaptive Algorithms for Information Maximization in Neural Networks, Proc. of the International Conference on Acoustics, Speech and Signal Processing (ICASSP'97), pp. 59-6, 1997
[6] S. Fiori and F. Piazza, A Study on Functional-Link Neural Units with Maximum Entropy Response, Artificial Neural Networks, Vol. II, pp. 493-498, Springer-Verlag, 1998
[7] S. Fiori, P. Bucciarelli and F. Piazza, Blind Signal Flatting Using Warping Neural Modules, Proc. Int. Joint Conf. on Neural Networks (IJCNN'98), pp. 3-37, 1998

[8] S. Fiori, Blind Source Separation by New M-WARP Algorithm, Electronics Letters, Vol. 35, No. 4, pp. 69-7, Feb. 1999
[9] M. Gustafsson, Gaussian Mixture and Kernel Based Approach to Blind Source Separation Using Neural Networks, Artificial Neural Networks, pp. 869-874, Springer-Verlag, 1998
[10] A. Hyvarinen and E. Oja, Independent Component Analysis by General Non-Linear Hebbian-Like Rules, Signal Processing, Vol. 64, No. 3, pp. 301-313, 1998
[11] S. Laughlin, A Simple Coding Procedure Enhances a Neuron's Information Capacity, Z. Naturforsch., Vol. 36, pp. 910-912, 1981
[12] R. Linsker, An Application of the Principle of Maximum Information Preservation to Linear Systems, in Advances in Neural Information Processing Systems (NIPS*88), pp. 186-194, Morgan-Kaufmann, 1989
[13] R. Linsker, Local Synaptic Rules Suffice to Maximize Mutual Information in a Linear Network, Neural Computation, Vol. 4, pp. 691-702, 1992
[14] P. Moreland, Mixture of Experts Estimate A-Posteriori Probabilities, Artificial Neural Networks, pp. 499-55, Springer-Verlag, 1997
[15] J.P. Nadal, N. Brunel, and N. Parga, Nonlinear Feedforward Networks with Stochastic Inputs: Infomax Implies Redundancy Reduction, Network: Computation in Neural Systems, Vol. 9, May 1998
[16] D. Obradovic and G. Deco, Unsupervised Learning for Blind Source Separation: An Information-Theoretic Approach, Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP'97), pp. 7-3, 1997
[17] M.D. Plumbley, Efficient Information Transfer and Anti-Hebbian Neural Networks, Neural Networks, Vol. 6, pp. 823-833, 1993
[18] M.D. Plumbley, Approximating Optimal Information Transmission Using Local Hebbian Algorithm in a Double Feedback Loop, Artificial Neural Networks, pp. 435-44, Springer-Verlag, 1993
[19] Y.-H. Pao, Adaptive Pattern Recognition and Neural Networks, Addison-Wesley Publishing Company, 1989 (Chapter 8)
[20] Z. Roth and Y. Baram, Multidimensional Density Shaping by Sigmoids, IEEE Trans. on Neural Networks, Vol. 7, No. 5, pp. 1291-1298, Sept. 1996
[21] A. Sudjianto and M.H. Hassoun, Nonlinear Hebbian Rule: A Statistical Interpretation, Proc. of International Conference on Neural Networks (ICNN'94), pp. 47-5, 1994
[22] A. Taleb and C. Jutten, Entropy Optimization - Application to Source Separation, Artificial Neural Networks, pp. 529-534, Springer-Verlag, 1997
[23] K. Torkkola, Blind Deconvolution, Information Maximization and Recursive Filters, Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP'97), pp. 33-334, 1997
[24] V. Vapnik, The Support Vector Method, Artificial Neural Networks, pp. 63-7, Springer-Verlag, 1997
[25] L. Xu, C.C. Cheung, H.H. Yang, and S.-i. Amari, Independent Component Analysis by the Information-Theoretic Approach with Mixture of Densities, Proc. of International Joint Conference on Neural Networks (IJCNN'98), pp. 8-86, 1998
[26] L. Xu, C.C. Cheung, J. Ruan, and S.-i. Amari, Nonlinearity and Separation Capability: Further Justifications for the ICA Algorithm with a Learned Mixture of Parametric Densities, Proc. of European Symposium on Artificial Neural Networks (ESANN'97), pp. 9-96, 1997
[27] L. Xu, C.C. Cheung, and S.-i. Amari, Nonlinearity, Separation Capability and Learned Parametric Mixture ICA Algorithms, to appear in a special issue of the International Journal of Neural Systems, 1998
[28] H.H. Yang and S.-i. Amari, Adaptive Online Learning Algorithms for Blind Separation: Maximum Entropy and Minimum Mutual Information, Neural Computation, Vol. 9, pp. 1457-1482, 1997
[29] Y. Yang and A.R. Barron, An Asymptotic Property of Model Selection Criteria, IEEE Trans. on Information Theory, Vol. 44, No. 1, pp. 95-116, Jan. 1998
[30] J.M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, 1992


More information

Carnegie Mellon University Forbes Ave. Pittsburgh, PA 15213, USA. fmunos, leemon, V (x)ln + max. cost functional [3].

Carnegie Mellon University Forbes Ave. Pittsburgh, PA 15213, USA. fmunos, leemon, V (x)ln + max. cost functional [3]. Gradient Descent Approaches to Neural-Net-Based Solutions of the Hamilton-Jacobi-Bellman Equation Remi Munos, Leemon C. Baird and Andrew W. Moore Robotics Institute and Computer Science Department, Carnegie

More information

In: Advances in Intelligent Data Analysis (AIDA), International Computer Science Conventions. Rochester New York, 1999

In: Advances in Intelligent Data Analysis (AIDA), International Computer Science Conventions. Rochester New York, 1999 In: Advances in Intelligent Data Analysis (AIDA), Computational Intelligence Methods and Applications (CIMA), International Computer Science Conventions Rochester New York, 999 Feature Selection Based

More information

Generalized Information Potential Criterion for Adaptive System Training

Generalized Information Potential Criterion for Adaptive System Training IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1035 Generalized Information Potential Criterion for Adaptive System Training Deniz Erdogmus, Student Member, IEEE, and Jose C. Principe,

More information

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT CONOLUTIE NON-NEGATIE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT Paul D. O Grady Barak A. Pearlmutter Hamilton Institute National University of Ireland, Maynooth Co. Kildare, Ireland. ABSTRACT Discovering

More information

Comparative Analysis of ICA Based Features

Comparative Analysis of ICA Based Features International Journal of Emerging Engineering Research and Technology Volume 2, Issue 7, October 2014, PP 267-273 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Comparative Analysis of ICA Based Features

More information

Modifying Voice Activity Detection in Low SNR by correction factors

Modifying Voice Activity Detection in Low SNR by correction factors Modifying Voice Activity Detection in Low SNR by correction factors H. Farsi, M. A. Mozaffarian, H.Rahmani Department of Electrical Engineering University of Birjand P.O. Box: +98-9775-376 IRAN hfarsi@birjand.ac.ir

More information

Blind separation of instantaneous mixtures of dependent sources

Blind separation of instantaneous mixtures of dependent sources Blind separation of instantaneous mixtures of dependent sources Marc Castella and Pierre Comon GET/INT, UMR-CNRS 7, 9 rue Charles Fourier, 9 Évry Cedex, France marc.castella@int-evry.fr, CNRS, I3S, UMR

More information

Support Vector Machines vs Multi-Layer. Perceptron in Particle Identication. DIFI, Universita di Genova (I) INFN Sezione di Genova (I) Cambridge (US)

Support Vector Machines vs Multi-Layer. Perceptron in Particle Identication. DIFI, Universita di Genova (I) INFN Sezione di Genova (I) Cambridge (US) Support Vector Machines vs Multi-Layer Perceptron in Particle Identication N.Barabino 1, M.Pallavicini 2, A.Petrolini 1;2, M.Pontil 3;1, A.Verri 4;3 1 DIFI, Universita di Genova (I) 2 INFN Sezione di Genova

More information

Natural Gradient Learning for Over- and Under-Complete Bases in ICA

Natural Gradient Learning for Over- and Under-Complete Bases in ICA NOTE Communicated by Jean-François Cardoso Natural Gradient Learning for Over- and Under-Complete Bases in ICA Shun-ichi Amari RIKEN Brain Science Institute, Wako-shi, Hirosawa, Saitama 351-01, Japan Independent

More information

1 Introduction Consider the following: given a cost function J (w) for the parameter vector w = [w1 w2 w n ] T, maximize J (w) (1) such that jjwjj = C

1 Introduction Consider the following: given a cost function J (w) for the parameter vector w = [w1 w2 w n ] T, maximize J (w) (1) such that jjwjj = C On Gradient Adaptation With Unit-Norm Constraints Scott C. Douglas 1, Shun-ichi Amari 2, and S.-Y. Kung 3 1 Department of Electrical Engineering, Southern Methodist University Dallas, Texas 75275 USA 2

More information

ICA [6] ICA) [7, 8] ICA ICA ICA [9, 10] J-F. Cardoso. [13] Matlab ICA. Comon[3], Amari & Cardoso[4] ICA ICA

ICA [6] ICA) [7, 8] ICA ICA ICA [9, 10] J-F. Cardoso. [13] Matlab ICA. Comon[3], Amari & Cardoso[4] ICA ICA 16 1 (Independent Component Analysis: ICA) 198 9 ICA ICA ICA 1 ICA 198 Jutten Herault Comon[3], Amari & Cardoso[4] ICA Comon (PCA) projection persuit projection persuit ICA ICA ICA 1 [1] [] ICA ICA EEG

More information

Deep Feedforward Networks

Deep Feedforward Networks Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3

More information

MINIMIZATION-PROJECTION (MP) APPROACH FOR BLIND SOURCE SEPARATION IN DIFFERENT MIXING MODELS

MINIMIZATION-PROJECTION (MP) APPROACH FOR BLIND SOURCE SEPARATION IN DIFFERENT MIXING MODELS MINIMIZATION-PROJECTION (MP) APPROACH FOR BLIND SOURCE SEPARATION IN DIFFERENT MIXING MODELS Massoud Babaie-Zadeh ;2, Christian Jutten, Kambiz Nayebi 2 Institut National Polytechnique de Grenoble (INPG),

More information

Higher Order Statistics

Higher Order Statistics Higher Order Statistics Matthias Hennig Neural Information Processing School of Informatics, University of Edinburgh February 12, 2018 1 0 Based on Mark van Rossum s and Chris Williams s old NIP slides

More information

Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems

Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems Thore Graepel and Nicol N. Schraudolph Institute of Computational Science ETH Zürich, Switzerland {graepel,schraudo}@inf.ethz.ch

More information

NONLINEAR BLIND SOURCE SEPARATION USING KERNEL FEATURE SPACES.

NONLINEAR BLIND SOURCE SEPARATION USING KERNEL FEATURE SPACES. NONLINEAR BLIND SOURCE SEPARATION USING KERNEL FEATURE SPACES Stefan Harmeling 1, Andreas Ziehe 1, Motoaki Kawanabe 1, Benjamin Blankertz 1, Klaus-Robert Müller 1, 1 GMD FIRST.IDA, Kekuléstr. 7, 1489 Berlin,

More information

Independent Component Analysis

Independent Component Analysis A Short Introduction to Independent Component Analysis Aapo Hyvärinen Helsinki Institute for Information Technology and Depts of Computer Science and Psychology University of Helsinki Problem of blind

More information

Velocity distribution in active particles systems, Supplemental Material

Velocity distribution in active particles systems, Supplemental Material Velocity distribution in active particles systems, Supplemental Material Umberto Marini Bettolo Marconi, 1 Claudio Maggi, 2 Nicoletta Gnan, 2 Matteo Paoluzzi, 3 and Roberto Di Leonardo 2 1 Scuola di Scienze

More information

Ecient Higher-order Neural Networks. for Classication and Function Approximation. Joydeep Ghosh and Yoan Shin. The University of Texas at Austin

Ecient Higher-order Neural Networks. for Classication and Function Approximation. Joydeep Ghosh and Yoan Shin. The University of Texas at Austin Ecient Higher-order Neural Networks for Classication and Function Approximation Joydeep Ghosh and Yoan Shin Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX

More information

IE 5531: Engineering Optimization I

IE 5531: Engineering Optimization I IE 5531: Engineering Optimization I Lecture 15: Nonlinear optimization Prof. John Gunnar Carlsson November 1, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I November 1, 2010 1 / 24

More information

Independent Component Analysis on the Basis of Helmholtz Machine

Independent Component Analysis on the Basis of Helmholtz Machine Independent Component Analysis on the Basis of Helmholtz Machine Masashi OHATA *1 ohatama@bmc.riken.go.jp Toshiharu MUKAI *1 tosh@bmc.riken.go.jp Kiyotoshi MATSUOKA *2 matsuoka@brain.kyutech.ac.jp *1 Biologically

More information

SPARSE REPRESENTATION AND BLIND DECONVOLUTION OF DYNAMICAL SYSTEMS. Liqing Zhang and Andrzej Cichocki

SPARSE REPRESENTATION AND BLIND DECONVOLUTION OF DYNAMICAL SYSTEMS. Liqing Zhang and Andrzej Cichocki SPARSE REPRESENTATON AND BLND DECONVOLUTON OF DYNAMCAL SYSTEMS Liqing Zhang and Andrzej Cichocki Lab for Advanced Brain Signal Processing RKEN Brain Science nstitute Wako shi, Saitama, 351-198, apan zha,cia

More information

Analytical solution of the blind source separation problem using derivatives

Analytical solution of the blind source separation problem using derivatives Analytical solution of the blind source separation problem using derivatives Sebastien Lagrange 1,2, Luc Jaulin 2, Vincent Vigneron 1, and Christian Jutten 1 1 Laboratoire Images et Signaux, Institut National

More information

1 Introduction Tasks like voice or face recognition are quite dicult to realize with conventional computer systems, even for the most powerful of them

1 Introduction Tasks like voice or face recognition are quite dicult to realize with conventional computer systems, even for the most powerful of them Information Storage Capacity of Incompletely Connected Associative Memories Holger Bosch Departement de Mathematiques et d'informatique Ecole Normale Superieure de Lyon Lyon, France Franz Kurfess Department

More information

= w 2. w 1. B j. A j. C + j1j2

= w 2. w 1. B j. A j. C + j1j2 Local Minima and Plateaus in Multilayer Neural Networks Kenji Fukumizu and Shun-ichi Amari Brain Science Institute, RIKEN Hirosawa 2-, Wako, Saitama 35-098, Japan E-mail: ffuku, amarig@brain.riken.go.jp

More information

p(z)

p(z) Chapter Statistics. Introduction This lecture is a quick review of basic statistical concepts; probabilities, mean, variance, covariance, correlation, linear regression, probability density functions and

More information

Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks

Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Jan Drchal Czech Technical University in Prague Faculty of Electrical Engineering Department of Computer Science Topics covered

More information

computation of the algorithms it is useful to introduce some sort of mapping that reduces the dimension of the data set before applying signal process

computation of the algorithms it is useful to introduce some sort of mapping that reduces the dimension of the data set before applying signal process Optimal Dimension Reduction for Array Processing { Generalized Soren Anderson y and Arye Nehorai Department of Electrical Engineering Yale University New Haven, CT 06520 EDICS Category: 3.6, 3.8. Abstract

More information

Artificial Neural Networks Examination, March 2004

Artificial Neural Networks Examination, March 2004 Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

Departement Elektrotechniek ESAT-SISTA/TR Dynamical System Prediction: a Lie algebraic approach for a novel. neural architecture 1

Departement Elektrotechniek ESAT-SISTA/TR Dynamical System Prediction: a Lie algebraic approach for a novel. neural architecture 1 Katholieke Universiteit Leuven Departement Elektrotechniek ESAT-SISTA/TR 1995-47 Dynamical System Prediction: a Lie algebraic approach for a novel neural architecture 1 Yves Moreau and Joos Vandewalle

More information

BUMPLESS SWITCHING CONTROLLERS. William A. Wolovich and Alan B. Arehart 1. December 27, Abstract

BUMPLESS SWITCHING CONTROLLERS. William A. Wolovich and Alan B. Arehart 1. December 27, Abstract BUMPLESS SWITCHING CONTROLLERS William A. Wolovich and Alan B. Arehart 1 December 7, 1995 Abstract This paper outlines the design of bumpless switching controllers that can be used to stabilize MIMO plants

More information

Blind Equalization Formulated as a Self-organized Learning Process

Blind Equalization Formulated as a Self-organized Learning Process Blind Equalization Formulated as a Self-organized Learning Process Simon Haykin Communications Research Laboratory McMaster University 1280 Main Street West Hamilton, Ontario, Canada L8S 4K1 Abstract In

More information

Equivalence of Backpropagation and Contrastive Hebbian Learning in a Layered Network

Equivalence of Backpropagation and Contrastive Hebbian Learning in a Layered Network LETTER Communicated by Geoffrey Hinton Equivalence of Backpropagation and Contrastive Hebbian Learning in a Layered Network Xiaohui Xie xhx@ai.mit.edu Department of Brain and Cognitive Sciences, Massachusetts

More information

ICA. Independent Component Analysis. Zakariás Mátyás

ICA. Independent Component Analysis. Zakariás Mátyás ICA Independent Component Analysis Zakariás Mátyás Contents Definitions Introduction History Algorithms Code Uses of ICA Definitions ICA Miture Separation Signals typical signals Multivariate statistics

More information

Simultaneous Diagonalization in the Frequency Domain (SDIF) for Source Separation

Simultaneous Diagonalization in the Frequency Domain (SDIF) for Source Separation Simultaneous Diagonalization in the Frequency Domain (SDIF) for Source Separation Hsiao-Chun Wu and Jose C. Principe Computational Neuro-Engineering Laboratory Department of Electrical and Computer Engineering

More information

Introduction to Neural Networks

Introduction to Neural Networks Introduction to Neural Networks Steve Renals Automatic Speech Recognition ASR Lecture 10 24 February 2014 ASR Lecture 10 Introduction to Neural Networks 1 Neural networks for speech recognition Introduction

More information

An Iterative Blind Source Separation Method for Convolutive Mixtures of Images

An Iterative Blind Source Separation Method for Convolutive Mixtures of Images An Iterative Blind Source Separation Method for Convolutive Mixtures of Images Marc Castella and Jean-Christophe Pesquet Université de Marne-la-Vallée / UMR-CNRS 8049 5 bd Descartes, Champs-sur-Marne 77454

More information

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-ero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In [0, 4], circulant-type preconditioners have been proposed

More information