AS Elementary Cybernetics. Lecture 4: Neocybernetic Basic Models


1 AS Elementary Cybernetics Lecture 4: Neocybernetic Basic Models

2 Starting point when modeling real complex systems: Observation: bottom-up approaches (studying the mechanisms alone) are futile. Another observation: top-down approaches alone are similarly hopeless; there is no grounding. Mission: both views have to be combined. One needs vision from the top; one needs substance from the bottom. Try to apply the ideas to a prototypical example: modeling of neural networks, the best understood complex system (?). Remember that combining the two views is a big challenge: computationalism (numeric) and traditional AI (symbolic) seem to be incompatible; low-level functions and high-level (emergent) functionalities are very different.

3 Neocybernetic starting points, summary: The details (along the time axis) are abstracted away; a holistic view from above is applied. There exist local actions only; there are no structures of centralized control. It is assumed that the underlying interactions and feedbacks are consistent, maintaining the system integrity. This means that one can assume stationarity and dynamic balance in the system in varying environmental conditions. An additional assumption: linearity is pursued as long as it is reasonable. Sounds simple: are there any new intuitions available? Strong guiding principles for modeling.

4 Modeling a neuron: Neural (chemical) signals are pulse coded, asynchronous,... extremely complicated. Simplification: only the relevant information is represented, the activation levels.

5 Abstraction level #1: Triggering of neuronal pulses is stochastic. Assume that in stationary environmental conditions the average number of pulses in some time interval remains constant. Only study statistical phenomena: abstract the time axis away, only model average activity. Perceptron: linear summation of input signals v_j plus activation function:

    x_i = f( (W v)_i )

and the linear version

    x_i = sum_{j=1..m} w_ij v_j.

For the whole linear grid: x = W v. [Figure: a single perceptron with inputs v_1 ... v_m, weights w_i1 ... w_im, summation Σ giving ζ, activation f(ζ), output x_i]
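
As a minimal MATLAB sketch of this level-#1 abstraction (not from the lecture; the sizes, values and the choice of activation function f below are illustrative assumptions):

    % Linear neuron grid x = W*v, and the perceptron version x_i = f((W*v)_i)
    m = 5;                      % number of input signals
    n = 3;                      % number of neurons
    W = randn(n, m);            % synaptic weight matrix (arbitrary values for the sketch)
    v = randn(m, 1);            % input activation levels
    x_linear = W * v;           % the linear grid
    f = @(z) tanh(z);           % one possible activation function f
    x = f(W * v);               % perceptron outputs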

6 Still, remember the reality...

7 The emergence idea is exploited here: deterministic activity variables are employed to describe behaviors. How to exploit the first-level neuron abstraction, how to reach the neuron grid level of abstraction? Neural networks research studies the two opposite ends: 1. Feedforward perceptron networks. Non-intuitive: black-box model, unanalyzable. Mathematically strong: smooth functions can be approximated to arbitrary accuracy. 2. Kohonen's self-organizing maps (SOM). Intuitive: easily interpretable by humans (the visual pattern recognition capability is exploited). Less mathematical: a mapping from m-dimensional real-valued vectors to n integers. Now, again, trust deep structures more than surface patterns!

8 Hebbian learning: Artificial neural networks are mainly seen as computational tools only. To capture the functional essence of neuronal systems, one has to elaborate on the domain area. The Hebbian learning rule (by psychologist Donald O. Hebb) also dates back to the mid-1900s: if the neuron activity correlates with the input signal, the corresponding synaptic weight increases. Are there some goals for neurons included here?! Is there something teleological taking place? Bold assumptions make it possible to reach powerful models.

9 Traditional approach. Assume the perceptron activity x_i is a linear function of the input signals v_j, where the weight vector w_i contains the synaptic weights w_ij:

    x_i = sum_{j=1..m} w_ij v_j,

with the Hebbian law applied in adaptation: the correlation between input and neuronal activity is expressed as x_i v_j, so that

    dw_ij/dt = γ x_i v_j = γ w_ij v_j^2,

assuming here, for simplicity, that m = 1. This learning law is unstable: the synaptic weight grows without bound, and so does x_i!
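
A small simulation sketch of this instability (the discrete Euler stepping and parameter values are assumptions made for illustration):

    % Plain Hebbian adaptation for m = 1: dw/dt = gamma*x*v with x = w*v
    gamma = 0.1; dt = 0.01;
    w = 0.1;                          % initial synaptic weight
    for k = 1:2000
        v = randn;                    % scalar input sample
        x = w * v;                    % neuron activity
        w = w + dt*gamma*x*v;         % Euler step of the Hebbian law
    end
    disp(w)                           % keeps growing as the iteration continues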

10 Enhancements. Stabilization by Oja's rule (by Erkki Oja):

    dw_ij/dt = γ w_ij v_j^2 − γ w_ij x_i^2    (i.e., dw_ij/dt = γ x_i v_j − γ x_i^2 w_ij for m = 1).

Motivation: keeps the weight vector bounded (||W_i|| = 1) and the average signal size E{x_i^2} = 1. Extracts the first principal component of the data. Compare to the logistic formulation of limited growth! Extension: the Generalized Hebbian Algorithm (GHA): structural tailoring makes it possible to deflate the principal components one at a time. However, the new formula is nonlinear: analysis of neuron grids containing such elements is difficult, and extending them is equally difficult. What to do instead?
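
A sketch of Oja's rule in action on two-dimensional data (the covariance, step size and iteration count are assumed for the illustration); the weight vector should settle, up to sign, near the dominant eigenvector of the input covariance:

    % Oja's rule dw/dt = gamma*(x*v - x^2*w) extracts the first principal component
    gamma = 0.01;
    R = [3 1; 1 1];                   % assumed input covariance E{v*v'}
    C = chol(R)';                     % for generating samples with covariance R
    w = randn(2, 1);
    for k = 1:20000
        v = C * randn(2, 1);          % input sample
        x = w' * v;                   % linear neuron activity
        w = w + gamma*(x*v - x^2*w);  % Oja update
    end
    [V, D] = eig(R);
    [~, i] = max(diag(D));
    disp([w/norm(w), V(:, i)])        % compare with the dominant eigenvector (up to sign)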

11 Level of synapses. The neocybernetic guidelines are: search for balance and linearity. Note: nonlinearity was not included in the original Hebbian law; it was only introduced for pragmatic reasons. Are there other ways to reach stability in linear terms? Yes, one can apply negative feedback:

    dw_ij/dt = γ_i x_i v_j − (1/τ_i) w_ij,    or in matrix form    dW/dt = γ x v^T − (1/τ) W.

The steady state is

    W = γτ E{x v^T} = Γ E{x v^T}.

Synaptic weights become coded in a correlation matrix!
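
A simulation sketch of this balance (the input covariance and the fixed mapping x = phi*v used to generate the neuron activity are assumptions made so that the weight adaptation can be watched in isolation):

    % Hebbian adaptation with negative feedback: dw/dt = gamma*x*v' - w/tau
    gamma = 0.5; tau = 4; dt = 0.01;
    Cvv = [2 0.5; 0.5 1];                 % assumed input covariance
    L = chol(Cvv)';
    phi = [1 0.5];                        % assumed fixed mapping x = phi*v (held fixed for clarity)
    w = zeros(1, 2);                      % one neuron, two synaptic weights
    for k = 1:50000
        v = L * randn(2, 1);              % input sample
        x = phi * v;                      % neuron activity
        w = w + dt*(gamma*x*v' - w/tau);  % negative-feedback Hebbian law
    end
    disp([w; gamma*tau*phi*Cvv])          % adapted weights vs. gamma*tau*E{x*v'}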

12 Level of neuron grids. Just the same principles can be applied when studying the neuron grid level: balance and linearity. Distinguish the sources:

    W = ( −A   B ),    operating on    v = ( x ; u ),

so that A = Γ E{x x^T} and B = Γ E{x u^T}. To implement negative feedback, one needs to apply the anti-Hebbian action between the otherwise Hebbian neurons:

    dx/dt = −A x + B u,

so that the steady state becomes

    x = A^{-1} B u = E{x x^T}^{-1} E{x u^T} u = φ^T u.

Model is stable! The eigenvalues of A are always real and non-negative.
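
A quick numerical sanity check sketch (the matrices and input below are arbitrary choices, not from the lecture) of the grid dynamics settling to the claimed steady state:

    % Steady state of dx/dt = -A*x + B*u versus the formula x = inv(A)*B*u
    n = 2; m = 3;
    A = [2 0.3; 0.3 1];               % assumed symmetric positive definite "E{xx^T}-like" matrix
    B = [1 0 -1; 0.5 1 0];            % assumed coupling matrix
    u = [1; -0.5; 2];
    x = zeros(n, 1); dt = 0.01;
    for k = 1:5000
        x = x + dt*(-A*x + B*u);      % simulate the grid dynamics
    end
    disp([x, A\(B*u)])                % simulated balance vs. closed-form steady state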

13 Hebbian/anti-Hebbian system:

    dx/dt = −A x + B u.

Explicit feedback structures. Completely localized operation, even though centralized matrix formulations are applied to reach mathematical compactness. [Figure: the grid as a signal-flow diagram: inputs u_1, u_2, u_3 drive the neurons x_1, x_2 through feedforward weights E{x_i u_j} (matrix B), while the neurons inhibit each other through anti-Hebbian feedback weights E{x_i x_j} (matrix A)]

14 Different time scales: after all, u also varies. [Figure: time series of an input u_j and the corresponding neuron activity x_i shown at successively slower time scales τ]

15 Towards abstraction level #2. Cybernetic model = statistical model of balances x(u). Assume the dynamics of u is essentially slower than that of x, and study the covariance properties:

    E{x x^T} = E{x x^T}^{-1} E{x u^T} E{u u^T} E{u x^T} E{x x^T}^{-1}

or

    E{x x^T}^3 = E{x u^T} E{u u^T} E{u x^T}

or

    ( φ^T E{u u^T} φ )^3 = φ^T ( E{u u^T} )^3 φ,    n < m.

Balance on the statistical level = second-order balance. The operator E connects the levels. Third level of time scales!

16 Solution. The expression is fulfilled for φ = θ_n D, where θ_n is a matrix of n eigenvectors of the covariance matrix E{u u^T}, and D is orthogonal. Loose motivation: this is because the left-hand side is then

    ( φ^T E{u u^T} φ )^3 = ( D^T θ_n^T E{u u^T} θ_n D )^3 = ( D^T Λ_n D )^3 = D^T Λ_n^3 D,

and the right-hand side is

    φ^T ( E{u u^T} )^3 φ = D^T θ_n^T ( E{u u^T} )^3 θ_n D = D^T Λ_n^3 D.

Stable solution when θ_n contains the most significant eigenvectors of the data covariance matrix.
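
A numerical check sketch of this claim (the covariance, subspace dimension and the rotation D below are arbitrary choices made for the test):

    % Check that phi = theta_n*D fulfils the second-order balance condition
    m = 4; n = 2;
    M = randn(m); Cuu = M*M' + eye(m);               % assumed data covariance E{uu^T}
    [Theta, Lambda] = eig(Cuu);
    [~, idx] = sort(diag(Lambda), 'descend');
    theta_n = Theta(:, idx(1:n));                    % n most significant eigenvectors
    a = pi/5; D = [cos(a) -sin(a); sin(a) cos(a)];   % an orthogonal rotation
    phi = theta_n * D;
    lhs = (phi'*Cuu*phi)^3;                          % (phi^T E{uu^T} phi)^3
    rhs = phi'*Cuu^3*phi;                            % phi^T E{uu^T}^3 phi
    disp(norm(lhs - rhs))                            % numerically zero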

17 Accurate proof: adaptation converges to the principal subspace. Study how the mapping matrices become adapted; assume that this process is divided in k steps:

    x(0) = A^{-1}(0) B(0) u.

The first part of this expression does not affect the subspace being spanned, so only B(k) is now of interest. Write it applying the basis axes spanned by the principal components of the data, so that E{u u^T} = Θ Λ Θ^T and B(0) = D Θ^T. Now one has

    B(k) = B(0) E{u u^T}^k = D Θ^T Θ Λ^k Θ^T = D Λ^k Θ^T,

because

    x(1) = A^{-1}(1) B(1) u = A^{-1}(1) E{x(0) u^T} u = A^{-1}(1) E{A^{-1}(0) B(0) u u^T} u = A^{-1}(1) A^{-1}(0) B(0) E{u u^T} u
    x(2) = A^{-1}(2) B(2) u = A^{-1}(2) E{x(1) u^T} u = A^{-1}(2) A^{-1}(1) A^{-1}(0) B(0) E{u u^T}^2 u
    ...
    x(k) = A^{-1}(k) B(k) u = [ prod_{i=0..k-1} A^{-1}(k−i) ] B(0) E{u u^T}^k u,

meaning that in the mapping matrix the relevance of the principal component direction j is weighted by λ_j^k. Because the variables x_i are linearly independent, only the n most significant of those directions will remain visible in the mapped data after adaptation. This completes the proof.

18 Principal subspace analysis. Any subset of the input data principal components can be selected for φ. The subspace spanned by the n most significant principal components gives a stable solution. Conclusion: competitive learning (combined Hebbian and anti-Hebbian learning) without any structural constraints results in self-regulation (balance) and self-organization (in terms of the principal subspace).

19 Emergent patterns. The process (convergence of x) can be substituted with the final pattern: details are lost, but the essence remains (?). The pattern is characterized in terms of a cost criterion

    J(x, u) = (1/2) x^T E{x x^T} x − x^T E{x u^T} u.

The process itself = gradient descent minimization of the criterion: setting the gradient E{x x^T} x − E{x u^T} u to zero gives back the balance x = E{x x^T}^{-1} E{x u^T} u, and dx/dt = −A x + B u points in the negative gradient direction (scaled by Γ). Models of local minima (m = 2, n = 1): [Figure: three example configurations of the mapping φ in the (u_1, u_2) plane]

20 Mathematics vs. reality. 1. Correlations vs. covariances: The matrices being studied are correlation matrices rather than covariance matrices (as is normally the case in PCA). This means that the data u is not assumed to be zero-mean; there is no need for preprocessing, and in practice the variables are always non-negative. From a physical point of view this is beneficial: note that the actual signal carriers (chemical concentrations / pulse frequencies) cannot be negative. 2. Principal subspace vs. principal components: When applying the linear structure, the actual principal components are not distinguished, only the subspace spanned by them. This means that the variables can again all be non-negative, so that the signals x can be physically plausible. Indeed: if all u_j are non-negative, and initially all x_i have non-negative values, the x_i's will always remain non-negative (so that one has a positive system).

21 Unification of layers. Again, study a synapse; it is not a static mapping but a dynamic system (much faster than the grid dynamics). Assume this (trivial) system is also cybernetic:

    dx_ij/dt = −γ E{x_i^2} x_ij + γ E{x_i v_j} v_j,

or in balance

    x_ij = ( E{x_i v_j} / E{x_i^2} ) v_j = w_ij v_j.

Extra term! Comparing this to the grid model, one can see that the two layers are qualitatively identical if one selects Γ = Var{x}^{-1}.

22 Technical experiments. In technical terms, one can define the algorithm

    x = Ê{x x^T}^{-1} Ê{x u^T} u,

where

    dÊ{x x^T}/dt = −λ Ê{x x^T} + λ x x^T
    dÊ{x u^T}/dt = −λ Ê{x u^T} + λ x u^T.

In practice, this is easiest to implement in discrete time. The matrix inversion lemma can be applied. Matrices can be masked to implement hierarchies, etc. Initialization: the matrix A has to always be positive definite.

23 Principal subspaces found OK for different data sets. [Figure: HAH (= Hebbian/anti-Hebbian) algorithm results on four example data sets, Data 1 ... Data 4]

24 PCs cannot always be told apart. [Figure: results on the four example data sets, Data 1 ... Data 4]

25 Summary: Neocybernetic models. First-order cybernetic system: for any stable A, assume that there holds

    dx/dt = −A x + B u    with    x = A^{-1} B u = φ^T u.

Second-order cybernetic system: additionally, assume that the matrices are

    A = Γ E{x x^T}    and    B = Γ E{x u^T}.

Higher-order (optimized) cybernetic system: additionally, assume that

    Γ = Var{x}^{-1}    or    Γ = E{x x^T}^{-1}.

Newton algorithm: second-order convergence.

26 Example: hand-written digits. There was a large body of 32x32-pixel images representing digits from 0 to 9, over 8000 samples. [Figure: examples of typical '9' and examples of less typical '9']

27 Algorithm for Hebbian/anti-Hebbian learning...

    LOOP iterate for data in the k x m matrix U
        % Balance of latent variables
        Xbar = U * (inv(Exx)*Exu)';
        % Model adaptation
        Exu = lambda*Exu + (1-lambda)*Xbar'*U/k;
        Exx = lambda*Exx + (1-lambda)*Xbar'*Xbar/k;
        % PCA rather than PSA through structural constraints
        Exx = tril(ones(n,n)).*Exx;
    END
    % Recursive algorithm can be boosted with the matrix inversion lemma
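
The listing above is pseudocode (LOOP/END); a runnable MATLAB/Octave sketch of the same recursion on synthetic data could look as follows. The data covariance, initialization and iteration counts are assumptions made for the illustration, and the PCA masking line is left out so that only the subspace is sought:

    % Runnable sketch of the Hebbian/anti-Hebbian recursion on synthetic data
    m = 5; n = 2; k = 200; lambda = 0.9;
    Cuu = diag([5 3 1 0.5 0.2]);              % assumed data covariance (distinct eigenvalues)
    Exx = eye(n);                             % must stay positive definite
    Exu = 0.1*randn(n, m);
    for epoch = 1:200
        U = randn(k, m) * sqrt(Cuu);          % k x m batch with E{uu^T} = Cuu
        Xbar = U * (inv(Exx)*Exu)';           % balance of latent variables
        Exu = lambda*Exu + (1-lambda)*Xbar'*U/k;
        Exx = lambda*Exx + (1-lambda)*Xbar'*Xbar/k;
    end
    phi = (inv(Exx)*Exu)';                    % learned mapping, x = phi' * u
    disp(subspace(phi, [eye(n); zeros(m-n, n)]))   % angle to the true principal subspace (should approach 0)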

28 ... resulting in Principal Components. Parameters: m = 1024, n = , λ = 0.5. Rather fast convergence, starting from θ_1 on top left. [Figure: the converged basis images]

29 Summary of Clever Agents. Emergence in terms of self-regulation (stability) and self-organization (principal subspace analysis) is reached. This is reached applying physiologically plausible operations, and the model is linear, scalable beyond toy domains. Learning is local but not completely local: one needs communication among neurons (anti-Hebbian structures). The roles of the signals are different: how to motivate the inversion in the adaptation direction (anti-Hebbian learning)? Solution next: apply non-idealities in an unorthodox way! There exist no unidirectional causal flows in real-life systems. Feedback: exploiting a signal exhausts that signal.

30 Stupid Agents.

    φ = q E{ū x^T}    (q = stiffness, coupling factor)
    ϕ = q E{ū x^T}

Implicit feedback. Exhaustion through exploitation. A generalized Boltzmann machine? [Figure: feedback loop where the correction Δu = −ϕ x is summed with the input u to give the visible signal ū, from which x = φ^T ū is read out]

31 Simply go for resources. Again: balancing is reached by feedback, but now not explicitly but implicitly through the environment:

    x = φ^T ū,    ū = u − ϕ x.

Also the environment finds its balance. Only exploiting locally visible quantities, implement evolutionary adaptation symmetrically as

    φ = ϕ = q E{ū x^T}.

How to characterize this environmental balance?

32 Because x = q E{x ū^T} ū, one can write two covariances:

    E{x ū^T} = q E{x ū^T} E{ū ū^T}

and

    E{x x^T} = q^2 E{x ū^T} E{ū ū^T} E{ū x^T} = q E{x ū^T} E{ū x^T},

so that

    I_n = q E{x x^T}^{-1/2} E{x ū^T} E{ū x^T} E{x x^T}^{-1/2} = D^T θ^T θ D
    I_n = q^2 E{x x^T}^{-1/2} E{x ū^T} E{ū ū^T} E{ū x^T} E{x x^T}^{-1/2} = q D^T θ^T E{ū ū^T} θ D.

Forget the trivial solution where x_i is identically zero.

33 Similarly, if x = Q E{x ū^T} ū for some (diagonal) matrix Q:

    E{x ū^T} = Q E{x ū^T} E{ū ū^T}

and

    E{x x^T} = Q E{x ū^T} E{ū ū^T} E{ū x^T} Q = E{x ū^T} E{ū x^T} Q.

Note: this has to be symmetric, so that

    E{x x^T} = E{x x^T}^T = Q E{x ū^T} E{ū x^T}.

A stronger formulation is reached:

    θ = E{ū x^T} E{x x^T}^{-1/2} Q^{1/2}.

For non-identical q_i, this has to become diagonal also.

34 Equalization of environmental variances. Because θ^T θ = I_n and θ^T E{ū ū^T} θ = Q^{-1}, θ consists of the n (most significant) eigenvectors of E{ū ū^T}, and the corresponding eigenvalues of E{ū ū^T} become 1/q_i. If n = m, the variation structure becomes trivial:

    E{ū ū^T} = (1/q) I_m    or    E{ū ū^T} = Q^{-1}.

The visible data variation becomes whitened by the feedback. Relation to ICA: assume that this whitened data is further processed by neurons (FOBI), but this has to be nonlinear! On the other hand, if the q_i are different, the modes become separated in the PCA style (rather than PSA). Still try to avoid nonlinearity!
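
A simulation sketch of the n = m whitening claim (the input covariance, stiffness q and the batch-wise adaptation scheme below are assumptions made for the illustration):

    % Implicit feedback with n = m: visible variation gets whitened to (1/q)*I
    m = 3; q = 2; k = 500; lambda = 0.95;
    Cuu = diag([4 2 1]);                        % assumed covariance of the free input u
    phi = 0.1*randn(m, m);                      % coupling, adapted towards q*E{ubar*x^T}
    for epoch = 1:500
        U = randn(k, m) * sqrt(Cuu);            % batch of free inputs (rows are samples)
        Ubar = U / (eye(m) + phi*phi');         % environmental balance: ubar = inv(I + phi*phi')*u
        X = Ubar * phi;                         % latent balance: x = phi^T*ubar
        phi = lambda*phi + (1-lambda)*q*Ubar'*X/k;   % symmetric Hebbian adaptation
    end
    disp(q * (Ubar'*Ubar/k))                    % should be close to the identity matrix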

35 [Figure: eigenvalue spectra (λ vs. eigenvalue index) of the open-loop data and of the feedback-controlled data; panels: sub-cybernetic system, marginally cybernetic, hyper-cybernetic, and unequal stiffnesses; dominant eigenvalues are pressed towards the levels 1/q (or 1/q_1 ... 1/q_n), while eigenvalues λ_{n+1}, ... remain untouched] DEMO: equalvar.m

36 Emergence of Intelligence. Solving for the balance signals, one can write

    x = ( E{x x^T} + Q^{-1} )^{-1} E{x ū^T} u.

This can be rewritten in a form where E{x ū^T} is the feedforward matrix and ( E{x x^T} + Q^{-1} )^{-1} is the feedback matrix. Of course, u has changed to ū, as it is the only signal visible. It turns out that if the coupling increases, so that q_i → ∞, this structure equals the prior one with explicit feedback. This means that in evolution (to be studied closer later) a selfish agent finally becomes a clever agent. Without preprogramming, an agent becomes context-aware.

37 Variance inheritance. Further study the relationship between x and the original u:

    x = ( I_n + q^2 E{x ū^T} E{ū x^T} )^{-1} q E{x ū^T} u = ( I_n + q E{x x^T} )^{-1} q E{x ū^T} u.

Multiply from the right by the transpose, and take expectations:

    ( I_n + q E{x x^T} ) E{x x^T} ( I_n + q E{x x^T} ) = E{x x^T}^{1/2} ( I_n + q E{x x^T} )^2 E{x x^T}^{1/2} = q^2 E{x ū^T} E{u u^T} E{ū x^T}.

38 Consequently,

    ( I_n + q E{x x^T} )^2 = q^2 E{x x^T}^{-1/2} E{x ū^T} E{u u^T} E{ū x^T} E{x x^T}^{-1/2} = q D^T θ^T E{u u^T} θ D.

Solving for the latent covariance:

    E{x x^T} = (1/q) ( q D^T θ^T E{u u^T} θ D )^{1/2} − (1/q) I_n.

This means that the external and internal eigenvalues (variances) are related as follows:

    E{x_i^2} = ( sqrt(q_i λ_j) − 1 ) / q_i

for the pairs (i, j) matched by θ D; for these, there must hold q_i λ_j > 1.
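
A scalar sanity check sketch of this relation, using the same implicit-feedback balance as above (q, the input variance lam and the adaptation scheme are assumptions): with q*lam > 1 the simulated internal variance should approach (sqrt(q*lam) − 1)/q.

    % Scalar check of variance inheritance: E{x^2} -> (sqrt(q*lam) - 1)/q
    q = 3; lam = 2; k = 1000; forget = 0.95;
    phi = 0.1;
    for epoch = 1:500
        u = sqrt(lam)*randn(k, 1);               % free input with variance lam
        ubar = u / (1 + phi^2);                  % environmental balance
        x = phi * ubar;                          % internal balance
        phi = forget*phi + (1-forget)*q*mean(ubar.*x);   % symmetric adaptation phi = q*E{ubar*x}
    end
    disp([mean(x.^2), (sqrt(q*lam) - 1)/q])      % simulated vs. predicted internal variance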

39 Effect of feedback = adding "black noise". White noise = a constant increase σ^2 in all directions; black noise = a decrease in all directions (if possible). [Figure: eigenvalue spectra over directions 1 ... n when adding/subtracting white noise versus adding black noise down to the level 1/q]

40 Results of orthogonal basis rotations: the total variance above the zero level remains intact regardless of the rotations; the total variance above the level 1 changes! [Figure: eigenvalue spectra λ_1, ... over indices 1 ... n, contrasting the "mathematical view of data" (variance measured from zero) with the "cybernetic view of data" (variance measured above the level 1)]

41 Factor Analysis: the algorithm rotates the PCA subspace axes.

42 Towards differentiation of features. A simple example of nonlinear extensions: the cut function. If the variable is positive, let it through; otherwise, filter it out:

    f_cut,i(x) = x_i, when x_i > 0
                 0,   when x_i ≤ 0.

This is well in line with modeling of activity in neuronal systems: frequencies cannot become negative (interpretation in terms of pulse trains); concentrations cannot become negative (interpretation in terms of chemicals); sizes of neuron populations cannot become negative; power / information content cannot become negative, etc. Makes the modes separated. Still: the end result is almost linear! Symmetry breaking?!
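
In MATLAB terms the cut function is a one-liner (the same construct appears in the listing on the next slide as Xbar.*(Xbar>0)):

    f_cut = @(x) x .* (x > 0);     % let positive values through, filter the rest out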

43 Algorithm for Hebbian feedback learning...

    LOOP iterate for data in the k x m matrix U
        % Balance of latent variables
        Xbar = U * (inv(eye(n)+q*Exu*Exu')*q*Exu)';
        % Enhance model convergence by nonlinearity
        Xbar = Xbar.*(Xbar>0);
        % Balance of the environmental signals
        Ubar = U - Xbar*Exu;
        % Model adaptation
        Exu = lambda*Exu + (1-lambda)*Xbar'*Ubar/k;
        % Maintaining system activity
        Exx = Xbar'*Xbar/k;
        q = q + P*diag(ref - sqrt(diag(Exx)));
    END

44 ... resulting in Sparse Components! Parameters: m = 1024, n = , λ = 0.97, ref = . DEMO: digitfeat.m [Figure: the resulting sparse basis images]

45 Work load becomes distributed. Correlations between inputs and neuronal activities shown below: [Figure: correlation maps, color scale from min to max]

46 Sparse coding. The traditional modeling goal, mathematically optimal and simple to implement: minimize the overall size of the model. The sparsity-oriented modeling goal, often physically relevant but cumbersome to characterize: minimize the number of simultaneously active representations, while there are no acute constraints on the overall model size. There is a tradeoff: a too sparse model becomes inaccurate. Note the (intuitive) connection to cognitive systems: there are no acute constraints on the long-term memory (LTM), whereas the size of the working memory / short-term memory is limited to 7 ± 2.

47 Studied later... Visual V1 cortex seems to do this kind of decomposing

48 Conclusion: clever vs. stupid agents. Both agent types can exist; the resulting systems are very different.

Intelligent agent: Hebbian/anti-Hebbian learning; x = E{x x^T}^{-1} E{x u^T} u; Principal Component (Subspace) Analysis; complete insulation (or elimination) of the environment.

Stupid/selfish agent: Hebbian environmental learning; x = q E{x ū^T} ū; Sparse Component (Subspace) Analysis; stiffness regulation, equalization of the environment.
