FEBAM: A Feature-Extracting Bidirectional Associative Memory


Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August 12-17, 2007

FEBAM: A Feature-Extracting Bidirectional Associative Memory

Sylvain Chartier, Gyslain Giguère, Patrice Renaud, Jean-Marc Lina & Robert Proulx

Sylvain Chartier, School of Psychology, University of Ottawa (Ottawa, Canada), and Institut Philippe-Pinel de Montréal (Montréal, Canada) (chartier.sylvain@gmail.com). Gyslain Giguère, Département d'informatique, Université du Québec à Montréal (Montréal, Canada) (giguere.gyslain@uqam.ca). Patrice Renaud, Département de psychologie, Université du Québec en Outaouais (Gatineau, Canada) (patrice.renaud@uqo.ca). Jean-Marc Lina, Département de génie électrique, École de Technologie Supérieure (Montréal, Canada) (jean-marc.lina@etsmtl.ca). Robert Proulx, Département de psychologie, Université du Québec à Montréal (Montréal, Canada) (proulx.robert@uqam.ca).

Abstract — In this paper, a new model that can ultimately create its own set of perceptual features is proposed. Using a bidirectional associative memory (BAM)-inspired architecture, the resulting model inherits properties such as attractor-like behavior and successful processing of noisy inputs, while being able to achieve principal component analysis (PCA) tasks such as feature extraction and dimensionality reduction. The model is tested by simulating image reconstruction and blind source separation tasks. Simulations show that the model fares particularly well compared to current neural PCA and independent component analysis (ICA) algorithms. It is argued that the model possesses more cognitive explanative power than any other nonlinear/linear PCA and ICA algorithm.

I. INTRODUCTION

In everyday life, humans are constantly exposed to situations in which they must discriminate, identify, classify and categorize perceptual patterns (such as visual objects) in order to act upon the identity and properties of the encountered stimuli. To achieve this task, cognitive scientists have historically taken for granted that the human cognitive system uses sets of properties (or dimensions) which crisply or fuzzily define a particular object or category [1]. These properties are thought to strictly contain information that is invariant and relevant to future decision-making situations. These characteristics of human properties tend to accelerate cognitive processes by reducing the amount of information to be taken into account in a given situation.

While most researchers view properties and dimensions as symbolic (or logically based), some have argued for perceptual symbol or feature systems, in which properties would be defined in an abstract iconic fashion and created in major part by associative bottom-up processes [2, 3]. To achieve this perceptual feature creation scheme, humans are sometimes hypothesized to implicitly use an information reduction strategy. Unfortunately, while the idea of a system creating its own atomic perceptual representations has been deemed crucial to understanding many cognitive processes [3], few efforts have been made to model perceptual feature creation processes in humans.

When faced with the task of modeling this strategy, one statistical solution is to use a principal component analysis (PCA) approach [4]. PCA neural networks [5] can deal quite effectively with input variability by creating (or "extracting") statistical low-level features that represent the data's intrinsic information to a certain degree. The features are selected in such a way that the input data's dimensionality is reduced while preserving important information.
While widely used for still and moving picture compression, as well as many other computer science and engineering applications, PCA networks have not made a breakthrough in the world of cognitive modeling, even though they possess many interesting properties: they use psychologically plausible local rules, allow flexible architecture growth, can deal with noisy inputs and can extract multiple components in parallel. This is probably because cognitive science researchers are looking for solutions to apply to multiple perceptual/cognitive tasks without assuming a high number of different modules (needing ad hoc modifications of the rules and architecture) or adding multiple fitting parameters. This can hardly be done with PCA networks, since this class of neural network models is feedforward by definition, and therefore does not lead to attractor-like (or "categorical") behavior. Also, the absence of explicitly depicted connections in the model forces supervised learning to be done using an external teacher. Moreover, some PCA models cannot even perform online or local learning [5].

A class of neural networks known as recurrent autoassociative memory (RAM) models does respond to these last requirements. In psychology, AI and engineering, autoassociative memory models are widely used to store correlated patterns. One characteristic of RAMs is the use of a feedback loop, which allows for generalization to new patterns, noise filtering and pattern completion, among other uses [6]. Feedback enables a given network to shift progressively from an initial pattern towards an invariable state (namely, an attractor). Because the information is distributed among the units, the network's attractors are global. If the model is properly trained, the attractors should then correspond to the learned patterns [6].

A direct generalization of RAM models is the development of bidirectional associative memory (BAM) models [7]. BAMs can associate any two data vectors of equal or different lengths (representing, for example, a visual input and a prototype or category). These networks possess the advantage of being both autoassociative and heteroassociative memories [7], and therefore encompass both unsupervised and supervised learning. A problem with BAM models is that, contrarily to what is found in real-world situations, they store information using noise-free versions of the input patterns. In comparison, to overcome the possibly infinite number of stimuli stemming, for instance, from multiple perceptual events, humans must regroup these unique noisy inputs into a finite number of stimuli or categories. This type of learning in the presence of noise should therefore be taken into account by BAM models.

To sum up, on one hand, PCA models can deal with noise but do not have the properties of attractor networks. On the other hand, BAM models exhibit attractor-like behavior but have difficulty dealing with noise. In this paper, it will be shown that unifying both classes of networks under the same framework leads to a model that possesses a clear advantage in explanatory power for memory and cognition modeling over its predecessors. Precisely, this paper will show that a BAM-type architecture can be made to encompass some of the PCA networks' properties. The resulting model will thus exhibit attractor-like behavior, supervised and unsupervised learning, feature extraction, and other nonlinear PCA properties such as blind source separation (BSS) capacities.

II. THEORETICAL BACKGROUND

A. Principal component analysis neural nets

PCA neural networks are based on a linear output function defined by:

y = Wx    (1)

where x is the original input vector, W is the stabilized weight matrix, and y is the output. When the weight matrix maximizes preservation of the relevant dimensions, then the reverse operation:

x̂ = W^T y    (2)

should produce a result that is close to the original input. Thus, the norm of the difference vector x - x̂ should be minimized. This translates into finding a W which minimizes:

E = Σ_{i=1}^{N} ||x_i - x̂_i||^2 = Σ_{i=1}^{N} ||x_i - W^T W x_i||^2    (3)

where E is the error function to minimize and N is the input's dimensionality. Several solutions exist for Equation 3 (for a review of most of the methods, see [5]). Generalization to nonlinear PCA [8] is straightforward; in this case, a nonlinear output function is used:

y = f(Wx)    (4)

The corresponding minimization function is thus given by:

E = Σ_{i=1}^{N} ||x_i - W^T f(W x_i)||^2    (5)

Depending on the type of nonlinear output function, different types of optimization can be achieved. If the output function uses cubic or higher nonlinearity, then the network directly relates to independent component analysis (ICA) and other BSS algorithms [8, 9].

B. Bidirectional associative memories

In Kosko's standard BAM [7], learning is accomplished using a simple Hebbian rule, according to the following equation:

W = YX^T    (6)

In this expression, X and Y are matrices representing the sets of bipolar pairs to be associated, and W is the weight matrix which, in fact, corresponds to a correlation measure between the input and the output. The model uses a recurrent nonlinear output function in order to allow the network to filter out the different patterns during recall. Kosko's BAM uses a signum output function to recall noisy patterns [7].
The nonlinear output function typically used in BAM networks is expressed by the following equations:

y(t+1) = sgn(W x(t))    (7)
x(t+1) = sgn(W^T y(t))    (8)

where y(t) and x(t) represent the vectors to be associated at time t, and sgn is the signum function defined by:

sgn(z) = 1 if z > 0; 0 if z = 0; -1 if z < 0    (9)

Many enhancements to Kosko's original BAM have since been proposed over the years; these allow, for example, learning convergence, online learning and progressive recall to occur. Moreover, recent works have enabled BAMs to deal with multi-valued stimuli [10-14] in addition to bipolar (binary) stimuli. These developments directly increase the BAM's modeling capacities and range of applications.
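As a concrete illustration of the two building blocks reviewed in this section, the short NumPy sketch below computes the linear PCA reconstruction error of Equations 1-3 for a random data set, and then performs Kosko-style Hebbian storage and signum recall as in Equations 6-9. All sizes, the random data and the number of recall cycles are arbitrary choices made for the example; they are not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # --- Linear PCA reconstruction error (Equations 1-3) ---
    X = rng.standard_normal((200, 10))                  # 200 input vectors of dimension 10 (arbitrary)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)    # principal directions of the data
    W = Vt[:3]                                          # keep 3 components: a 3 x 10 weight matrix
    Y = X @ W.T                                         # y = Wx for every input (Eq. 1)
    X_hat = Y @ W                                       # x_hat = W^T y (Eq. 2)
    E = np.sum(np.linalg.norm(X - X_hat, axis=1) ** 2)  # reconstruction error (Eq. 3)
    print("PCA reconstruction error:", round(E, 2))

    # --- Kosko-style BAM: Hebbian storage and signum recall (Equations 6-9) ---
    Xb = np.where(rng.standard_normal((4, 16)) >= 0, 1.0, -1.0)  # 4 bipolar input patterns (rows)
    Yb = np.where(rng.standard_normal((4, 8)) >= 0, 1.0, -1.0)   # 4 bipolar associated patterns
    Wb = Yb.T @ Xb                                      # W = Y X^T summed over all pairs (Eq. 6)

    x = Xb[0].copy()                                    # start from the first stored pattern...
    flip = rng.choice(16, size=3, replace=False)
    x[flip] *= -1                                       # ...with 3 components flipped (noise)
    for _ in range(5):                                  # a few bidirectional recall cycles
        y = np.sign(Wb @ x)                             # y(t+1) = sgn(W x(t))   (Eqs. 7 and 9)
        x = np.sign(Wb.T @ y)                           # x(t+1) = sgn(W^T y(t)) (Eqs. 8 and 9)
    print("Recalled y equals the stored y pattern:", bool(np.array_equal(y, Yb[0])))

Depending on the random draw, recall can succeed exactly or be degraded by crosstalk between the stored pairs.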

III. MODEL DESCRIPTION

Like any artificial neural network, the Feature-Extracting Bidirectional Associative Memory (FEBAM) can be entirely described by its architecture and by its output and learning functions.

A. Architecture

FEBAM's architecture, which is nearly identical to that of the BAM proposed by [10], is illustrated in Figure 1. It consists of two Hopfield-like neural networks interconnected in head-to-toe fashion [15]. When connected, these networks allow a recurrent flow of information that is processed bidirectionally. As shown in Figure 1, the W layer returns information to the V layer and vice versa, in a kind of top-down/bottom-up process fashion. As in a standard BAM, both layers serve as a teacher for the other layer, and the connections are explicitly depicted in the model (as opposed to multi-layer perceptrons). However, to enable a BAM to perform PCA, one set of those explicit connections must be removed.

Fig. 1. FEBAM network architecture.

In a standard BAM, the external links given by the initial y(0) connections would also be available to the W layer. Here, the external links are provided solely through the initial x(0) connections on the V layer. Thus, in contrast with the standard architecture [10], y(0), the initial output, is not obtained externally, but is instead acquired by iterating once through the network, as illustrated in Figure 2.

Fig. 2. Output iterative process used for learning updates.

In the context of feature extraction, the W layer will be used to extract the features, whose dimensionality will be determined by the number of y units; the more units in that layer, the less compression will occur. If the y layer's dimensionality is greater than or equal to that of the x layer, then no compression occurs, and the network can behave as an autoassociative memory [16].

B. Output function

The output function is expressed by the following equations:

y_i(t+1) = 1, if [Wx(t)]_i > 1
         = -1, if [Wx(t)]_i < -1
         = (δ + 1)[Wx(t)]_i - δ[Wx(t)]_i^3, otherwise,   i = 1, ..., N    (10)

x_i(t+1) = 1, if [Vy(t)]_i > 1
         = -1, if [Vy(t)]_i < -1
         = (δ + 1)[Vy(t)]_i - δ[Vy(t)]_i^3, otherwise,   i = 1, ..., M    (11)

where N and M are the numbers of units in each layer, i is the index of the respective vector element, y(t+1) and x(t+1) represent the network outputs at time t+1, and δ is a general output parameter that should be fixed at a value lower than 0.5 to ensure fixed-point behavior [16]. Figure 3 illustrates the shape of the output function when δ = 0.4. This output function possesses the advantage of exhibiting continuous-valued (gray-level) attractor behavior (for a detailed example, see [10]). Such properties contrast with networks using a standard nonlinear output function, which can only exhibit bipolar attractor behavior (e.g., [7]).

Fig. 3. Output function for δ = 0.4.

C. Learning function

Learning is based on time-difference Hebbian association [10, 16-19], and is formally expressed by the following equations:

W(k+1) = W(k) + η (y(0) - y(t))(x(0) + x(t))^T    (12)
V(k+1) = V(k) + η (x(0) - x(t))(y(0) + y(t))^T    (13)

where η represents a learning parameter; y(0) and x(0), the initial patterns at t = 0; y(t) and x(t), the state vectors after t iterations through the network; and k, the learning trial. The learning rule is thus very simple, and can be shown to constitute a generalization of Hebbian/anti-Hebbian correlation in its autoassociative memory version [10, 16].
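To make the preceding definitions concrete, the following NumPy sketch (an illustration under assumed settings, not code from the paper) runs one FEBAM learning update: a single x(0) → y(0) → x(1) → y(1) cycle through the output function of Equations 10 and 11, followed by the weight updates of Equations 12 and 13 with t = 1. The layer sizes, parameter values and random input are illustrative assumptions only.

    import numpy as np

    def output_function(a, delta):
        # Equations 10-11: saturate at +/-1, cubic transmission in between
        cubic = (delta + 1.0) * a - delta * a ** 3
        return np.where(a > 1.0, 1.0, np.where(a < -1.0, -1.0, cubic))

    rng = np.random.default_rng(0)

    n_x, n_y = 25, 8              # x-layer and y-layer sizes (illustrative choice)
    eta, delta = 0.01, 0.1        # learning and output parameters (illustrative; this eta lies
                                  # below the bound of Eq. 14: 1 / (2 * (1 - 2*0.1) * 25) = 0.025)

    W = rng.uniform(-1.0, 1.0, (n_y, n_x))   # weights feeding the y layer
    V = rng.uniform(-1.0, 1.0, (n_x, n_y))   # weights feeding the x layer

    x0 = rng.uniform(-1.0, 1.0, n_x)         # one input pattern, x(0)

    # One cycle through the network (Figure 2): x(0) -> y(0) -> x(1) -> y(1)
    y0 = output_function(W @ x0, delta)      # y(0), obtained by iterating once (Eq. 10)
    x1 = output_function(V @ y0, delta)      # x(1), the reconstructed input (Eq. 11)
    y1 = output_function(W @ x1, delta)      # y(1), the recalled feature vector (Eq. 10)

    # Time-difference Hebbian updates (Equations 12 and 13)
    W += eta * np.outer(y0 - y1, x0 + x1)
    V += eta * np.outer(x0 - x1, y0 + y1)

In a full simulation, this cycle-and-update step would simply be repeated over randomly selected input vectors, which is the general learning procedure used in the simulations of Section IV.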

For weight convergence to occur, η must be set according to the following condition [16]:

η < 1 / (2(1 - 2δ) Max[N, M])    (14)

Equations 12 and 13 show that the weights can only converge when feedback is identical to the initial inputs (that is, y(t) = y(0) and x(t) = x(0)). The function therefore correlates directly with network outputs, instead of activations. As a result, the learning rule is dynamically linked to the network's output (unlike most BAMs). In this paper, the number of iterations (or network cycles) performed before the weights are updated is set to t = 1. Consequently, FEBAM can be seen as a type of nonlinear PCA which tries to minimize two cost functions, Equations 15 and 16, expressed in terms of the initial and recalled states of the y and x layers at times t = 0 and t = 1 (see [10] for details). Equations 15 and 16 are respectively minimized by learning rules 12 and 13, and these minimizations are done in parallel. For example, since one cycle through the network gives x_i(1) = f[V f[W x_i(0)]], expanding the recalled state in Equation 16 leads to an expression (Equation 17) containing the difference x_i(0) - f[V f[W x_i(0)]], which is closely related to Equation 5, the nonlinear PCA minimization function.

IV. SIMULATIONS

To test FEBAM's ability to perform like a nonlinear PCA, we simulated a classic PCA task (image reconstruction), as well as a classic ICA task (blind source separation of mixed signals).

A. Image reconstruction

The first simulation consisted in extracting a limited amount of statistical features from a natural image, and then reconstructing it to evaluate the amount of information loss. Then, using the same features, the network's generalization capacities are tested by having it reconstruct a different image with other statistical properties.

1) Methodology: The image used for learning is illustrated in Figure 4a (left side). This image has a dimension of 128 x 128 pixels. Each pixel is encoded according to an 8-bit grayscale (0 to 255). First, the image values were rescaled to obtain values in the range [-1, 1]. Using an overlapping 5 x 5 window, the image was convolved into 15,376 input vectors of 25 dimensions. Weights were randomly initialized with values ranging from -1 to 1. The learning parameter η and the output parameter δ were held at fixed values throughout the simulation. Learning followed this general procedure:

1- Random selection of an input vector;
2- Iteration through the network (one cycle, as illustrated in Figure 2) using the output function (Equations 10 and 11);
3- Weight updates (Equations 12 and 13);
4- Repetition of 1 to 3 until the desired number of learning trials k has been completed or the squared error between y(0) and y(t) is sufficiently small.

Fig. 4. (a) Original images used for learning and generalization purposes. (b-e) Simulation results. For each model, the reconstructed images following learning and generalization are shown, as well as the respective peak signal-to-noise ratio (PSNR, in dB) for each simulation. Results for: (b) APEX [5], a linear PCA neural network; (c) a nonlinear PCA network [8]; (d) fastICA [9], an ICA algorithm; (e) FEBAM, a nonlinear PCA-BAM network.

FEBAM's performance was compared to that of several algorithms, namely neural linear PCA [5], neural nonlinear PCA [8], and fastICA [9]. All algorithms were also tested using the generalization image, with other statistical properties, illustrated in Figure 4a (right side). In this situation, the new image's 5 x 5 pixel windows were presented to the network one at a time. Those resulting outputs were then used to produce the compressed image shown in Figure 4. To quantify performance, the peak signal-to-noise ratio (PSNR) was used:

PSNR(z) = 10 log_10 ( Σ_{i,j=1}^{row,col} 255^2 / Σ_{i,j=1}^{row,col} (z_{i,j} - o_{i,j})^2 )    (18)

where z is the reconstructed image, o is the original image, and row and col are the image's numbers of rows and columns.

2) Results: After 4000 learning trials, the squared error is lower than 0.03. Hence, the network's weights constitute a solution which minimizes Equations 15 and 16. The weights, illustrated using contour plots (Figure 5), seem to have converged onto salient traits that correspond to various contours. From those weights, detection feature maps (obtained from products of the W and V weight matrices) were constructed to illustrate what each unit in each layer is tuned for (Figure 6). In Figure 6a, the 5 units are tuned to detect information present at various orientations and locations. On the other hand, in Figure 6b, the 5 units are tuned to respond to four different diagonal bars and a horizontal bar. With various combinations of these features, the network is able to reconstruct both the original and the generalization images without much loss of detail. As Figure 4 shows, FEBAM's reconstruction performance is superior to that of the other algorithms, except for fastICA. In regards to reconstructing the generalization image, FEBAM's performance is comparable to that of fastICA, and superior to that of the other algorithms.

Fig. 5. Final weight connections for: (a) the W layer; (b) the V layer.

Fig. 6. Feature detection performed by each of (a) the 5 units of layer V and (b) the 5 units of layer W.

B. Blind source separation

1) Methodology: Blind source separation (BSS) consists in recovering unobserved source signals from several observed mixed signals [20]. The unobserved source signals are assumed non-Gaussian and mutually independent. BSS techniques are mainly used when one must deal with sets of multivariate data (e.g., mixed speech signals, EEG data). The two source signals used for the simulation are illustrated in Figure 7a. The first source is a sinusoid, and the second one contains uniformly distributed white noise. These source signals, noted s, have been multiplied by a mixing matrix A, following:

x = As    (19)

where A is a 2 x 2 mixing matrix. The resulting mixed signals, x, are illustrated in Figure 7b. The network's task is then to recover the original signals (which, in this case, means separating signal from noise), using only the mixed signals. This is a classic task that can be found in several BSS studies (e.g., [8, 9]).

Fig. 7. (a) Original signals. (b) Observed mixtures of the original signals.

Before estimation of the original components is performed, the observed mixtures (Figure 7b) are whitened (or sphered) by decomposing the observed signals' covariance matrix into its eigenvalues and eigenvectors, where Λ is the eigenvector matrix and λ is a diagonal matrix composed of the eigenvalues. Then, the final whitened signal can be obtained (x̃ = Λ λ^{-1/2} Λ^T x). The new signal vectors will in that case be uncorrelated, and their variances will be equal to unity (x̃ x̃^T = I).

For this simulation, the learning parameter was set to η = 0.05 and the output parameter δ was held fixed; the weights were randomly initialized with values ranging from -1 to 1, and the general learning procedure was identical to that of the image reconstruction simulation. FEBAM's performance was compared to that of the fastICA algorithm [9], which was designed for such tasks.
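The whitening (sphering) step described above can be written in a few lines of NumPy. The sketch below is an illustration only: it builds a sinusoid plus uniform-noise source pair similar to Figure 7a, mixes it through an arbitrarily chosen 2 x 2 matrix A (the paper's exact mixing values are not reproduced here), and whitens the mixtures through the eigendecomposition of their covariance matrix.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two sources: a sinusoid and uniformly distributed white noise (as in Figure 7a)
    t = np.linspace(0.0, 1.0, 1000)
    s = np.vstack([np.sin(2.0 * np.pi * 10.0 * t),       # sinusoidal source (arbitrary frequency)
                   rng.uniform(-1.0, 1.0, t.size)])      # uniform white-noise source

    # Mix the sources: x = A s (Eq. 19); this A is an arbitrary example matrix
    A = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
    x = A @ s

    # Whitening: eigendecompose the covariance of the mixtures, then rescale along the
    # eigenvectors so that the whitened signals are uncorrelated with unit variance
    C = np.cov(x)                                        # 2 x 2 covariance of the observed mixtures
    lam, Lam = np.linalg.eigh(C)                         # lam: eigenvalues, Lam: eigenvector matrix
    x_tilde = Lam @ np.diag(lam ** -0.5) @ Lam.T @ x     # x~ = Lambda lambda^(-1/2) Lambda^T x

    print(np.round(np.cov(x_tilde), 3))                  # approximately the 2 x 2 identity matrix

The whitened mixtures x_tilde would then serve as the input signals for the separation stage itself.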
2) Results: After training, the weights self-stabilize (once again, squared error < 0.03). Figure 8a shows the network's estimates of the original signals, and Figure 8b shows fastICA's estimates of the original signals. These signals closely match the originals up to multiplicative signs, as with standard BSS algorithms (e.g., [9]).

Fig. 8. (a) FEBAM's estimates of the original signals. (b) fastICA's estimates of the original signals.

V. DISCUSSION

In this paper, it was shown that the BAM introduced in [10] uses a learning function that is closely related to that of nonlinear PCA networks. Consequently, by simply removing a set of external connections from the original BAM architecture, FEBAM is able to perform feature extraction and reduction of the input dimensionality, just like PCA networks do. Because the W and V layers are connected, it is possible to observe what the developed attractors in the V layer look like. Leading to attractor-like behavior, the network can thus be used to perform prototype development from exemplar presentations. Seemingly, the model takes more trials to learn and shows a slightly lower generalization performance than fastICA [9]. However, FEBAM, being a special case of BAM [10], can also be used to simulate other applications such as categorization [16], classification [10], many-to-one association and multi-step pattern recognition [21]. Those properties cannot be simultaneously accounted for by any of the other models presented in this paper (such as fastICA). Consequently, in addition to being a candidate for modeling human perceptual feature creation, the model possesses more cognitive explanative power than any other nonlinear/linear PCA or ICA algorithm.

One characteristic of FEBAM resides in the fact that, as the network learns, different weights may converge to the same component. To enhance the network's performance, a decorrelation procedure, such as the ones proposed for ICA [9] and PCA [5], could be used. Note, however, that in cognitive psychology, redundancy is often seen as a sign of representational robustness, instead of a performance weakness.

Further research should establish FEBAM's computational complexity, as well as evaluate its capacity to learn environmental biases [22] and to extract and learn prototypes in noisy environments (a difficult task to achieve for standard RAMs/BAMs). In addition, further studies should address the possibility of FEBAM presenting an evolutive architecture in relation to clustering development.

REFERENCES

[1] G. L. Murphy, The Big Book of Concepts. Cambridge, MA: MIT Press, 2002.
[2] S. Harnad, "The symbol grounding problem," Physica D, vol. 42, pp. 335-346, 1990.
[3] P. G. Schyns, R. L. Goldstone, and J. P. Thibaut, "The development of features in object concepts," Behavioral and Brain Sciences, vol. 21, pp. 1-54, 1998.
[4] H. Abdi, D. Valentin, and B. G. Edelman, "Eigenfeatures as intermediate-level representations: The case for PCA models," Behavioral and Brain Sciences, vol. 21, 1998.
[5] K. I. Diamantaras and S. Y. Kung, Principal Component Neural Networks. New York: Wiley, 1996.
[6] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, pp. 2554-2558, 1982.
[7] B. Kosko, "Bidirectional associative memories," IEEE Transactions on Systems, Man, and Cybernetics, vol. 18, pp. 49-60, 1988.
[8] J. Karhunen, P. Pajunen, and E. Oja, "The nonlinear PCA criterion in blind source separation: Relations with other approaches," Neurocomputing, vol. 22, pp. 5-20, 1998.
[9] A. Hyvärinen and E. Oja, "Independent component analysis: Algorithms and applications," Neural Networks, vol. 13, pp. 411-430, 2000.
[10] S. Chartier and M. Boukadoum, "A bidirectional heteroassociative memory for binary and grey-level patterns," IEEE Transactions on Neural Networks, vol. 17, pp. 385-396, 2006.
[11] G. Costantini, D. Casali, and R. Perfetti, "Neural associative memory storing gray-coded gray-scale images," IEEE Transactions on Neural Networks, vol. 14, 2003.
[12] C.-C. Wang, S.-M. Hwang, and J.-P. Lee, "Capacity analysis of the asymptotically stable multi-valued exponential bidirectional associative memory," IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, vol. 26, 1996.
[13] D. Zhang and S. Chen, "A novel multi-valued BAM model with improved error-correcting capability," Journal of Electronics, vol. 20, 2003.
[14] J. M. Zurada, I. Cloete, and E. van der Poel, "Generalized Hopfield networks for associative memories with multi-valued stable states," Neurocomputing, vol. 13, 1996.
[15] M. H. Hassoun, "Dynamic heteroassociative neural memories," Neural Networks, vol. 2, 1989.
[16] S. Chartier and R. Proulx, "NDRAM: Nonlinear dynamic recurrent associative memory for learning bipolar and nonbipolar correlated patterns," IEEE Transactions on Neural Networks, vol. 16, pp. 1393-1400, 2005.
[17] B. Kosko, "Unsupervised learning in noise," IEEE Transactions on Neural Networks, vol. 1, pp. 44-57, 1990.
[18] E. Oja, "Neural networks, principal components, and subspaces," International Journal of Neural Systems, vol. 1, pp. 61-68, 1989.
[19] R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
[20] J. F. Cardoso, "Blind signal separation: Statistical principles," Proceedings of the IEEE, vol. 86, pp. 2009-2025, 1998.
[21] S. Chartier and M. Boukadoum, "A sequential dynamic heteroassociative memory for multistep pattern recognition and one-to-many association," IEEE Transactions on Neural Networks, vol. 17, pp. 59-68, 2006.
[22] S. Hélie, S. Chartier, and R. Proulx, "Are unsupervised neural networks ignorant? Sizing the effect of environmental distributions on unsupervised learning," Cognitive Systems Research, in press, 2006.
