AS Elementary Cybernetics. Lecture 4: Neocybernetic Basic Models
1 AS Elementary Cybernetics Lecture 4: Neocybernetic Basic Models
2 Starting point when modeling real complex systems. Observation: bottom-up approaches (studying the mechanisms alone) are futile. Another observation: top-down approaches alone are similarly hopeless; there is no grounding. Mission: both views have to be combined. One needs vision from the top; one needs substance from the bottom. Try to apply the ideas to a prototypical example: modeling of neural networks, the best understood complex system (?). Remember that combining the two views is a big challenge: computationalism (numeric) and traditional AI (symbolic) seem to be incompatible; low-level functions and high-level (emergent) functionalities are very different.
3 Neocybernetic starting points, summary. The details (along the time axis) are abstracted away; a holistic view from above is applied. There exist local actions only; there are no structures of centralized control. It is assumed that the underlying interactions and feedbacks are consistent, maintaining the system integrity. This means that one can assume stationarity and dynamic balance in the system in varying environmental conditions. An additional assumption: linearity is pursued as long as it is reasonable. Sounds simple; are there any new intuitions available? Strong guiding principles for modeling.
4 Modeling a neuron. Neural (chemical) signals are pulse coded, asynchronous, ... extremely complicated. Simplification: only the relevant information is represented: the activation levels.
5 Abstraction level #1. Triggering of neuronal pulses is stochastic. Assume that in stationary environmental conditions the average number of pulses in some time interval remains constant. Only study statistical phenomena: abstract the time axis away, only model average activity. Perceptron: linear summation of the input signals $v_j$ plus an activation function, $x_i = f\big((Wv)_i\big)$, and the linear version $x_i = \sum_{j=1}^{m} w_{ij} v_j$. For the whole linear grid, $x = Wv$. (Figure: a neuron with inputs $v_1, \dots, v_m$, weights $w_{i1}, \dots, w_{im}$, summation $\Sigma$ giving $\zeta$, activation $f(\zeta)$, and output $x_i$.)
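As a concrete illustration of the linear grid $x = Wv$, here is a minimal sketch in Python/NumPy (the dimensions and weight values are arbitrary assumptions, chosen only for demonstration):

```python
import numpy as np

# Linear perceptron grid: n neurons, m inputs.
# Per neuron: x_i = sum_j w_ij * v_j; for the whole grid: x = W v.
rng = np.random.default_rng(0)
m, n = 5, 3                       # input and neuron counts (arbitrary)
W = rng.standard_normal((n, m))   # synaptic weight matrix
v = rng.standard_normal(m)        # input activation levels

x = W @ v                         # whole-grid linear summation

# Element-wise check against the per-neuron sum formula
x_sum = np.array([sum(W[i, j] * v[j] for j in range(m)) for i in range(n)])
```

The matrix form and the per-neuron sums are the same computation, which is what makes the later matrix-level analysis possible.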
6 Still, remember the reality...
7 The emergence idea is exploited here: deterministic activity variables are employed to describe behaviors. How to exploit the first-level neuron abstraction, how to reach the neuron grid level of abstraction? Neural networks research studies the opposite ends: 1. Feedforward perceptron networks. Non-intuitive: black-box model, unanalyzable. Mathematically strong: smooth functions can be approximated to arbitrary accuracy. 2. Kohonen's self-organizing maps (SOM). Intuitive: easily interpretable by humans (visual pattern recognition capability exploited). Less mathematical: a mapping from m-dimensional real-valued vectors to n integers. Now, again, trust deep structures more than surface patterns!
8 Hebbian learning. Artificial neural networks are mainly seen as computational tools only. To capture the functional essence of neuronal systems, one has to elaborate on the domain area. The Hebbian learning rule (by psychologist Donald O. Hebb) also dates back to the mid-1900s: if the neuron activity correlates with the input signal, the corresponding synaptic weight increases. Are there some goals for neurons included here?! Is there something teleological taking place? Bold assumptions make it possible to reach powerful models.
9 Traditional approach. Assume: perceptron activity $x_i$ is a linear function of the input signals $v_j$, where the $w_{ij}$ are the synaptic weights: $x_i = \sum_{j=1}^{m} w_{ij} v_j$. With the Hebbian law applied in adaptation, the correlation between input and neuronal activity is expressed as $x_i v_j$, so that $\frac{dw_{ij}}{dt} = \gamma\, x_i v_j = \gamma\, w_{ij} v_j^2$, assuming here, for simplicity, that $m = 1$. This learning law is unstable: the synaptic weight grows infinitely, and so does $x_i$!
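The divergence is easy to see numerically; below is a small NumPy sketch (the gain, step size, and signal statistics are arbitrary assumptions):

```python
import numpy as np

# Plain Hebbian adaptation dw/dt = gamma * x * v with x = w * v (m = 1).
# Since dw = gamma * w * v^2 * dt always has the sign of w,
# |w| can only grow: the rule is unstable.
rng = np.random.default_rng(1)
gamma, dt = 0.1, 0.1
w = 0.5
history = [abs(w)]
for _ in range(500):
    v = rng.standard_normal()
    x = w * v                  # linear neuron activity
    w += dt * gamma * x * v    # Hebbian update
    history.append(abs(w))
```

The weight magnitude grows roughly geometrically; what is missing is exactly the stabilizing term introduced on the next slide.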
10 Enhancements. Stabilization by Oja's rule (by Erkki Oja): $\frac{dw_{ij}}{dt} = \gamma\, x_i v_j - \gamma\, x_i^2 w_{ij}$ (for $m = 1$, equivalently $\gamma\, w_{ij} v_j^2 - \gamma\, w_{ij} x_i^2$). Motivation: keeps the weight vector bounded ($\|w_i\| = 1$) and the average signal size $\mathrm{E}\{x_i^2\} = 1$; extracts the first principal component of the data. Compare to the logistic formulation of limited growth! Extension: the Generalized Hebbian Algorithm (GHA): structural tailoring makes it possible to deflate the principal components one at a time. However, the new formula is nonlinear: analysis of neuron grids containing such elements is difficult, and extending them is equally difficult. What to do instead?
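For concreteness, a NumPy sketch of Oja's rule on synthetic two-dimensional data (the data distribution, learning rate, and sample count are arbitrary assumptions):

```python
import numpy as np

# Oja's rule: dw = gamma * (x*v - x^2 * w), with x = w^T v.
# The decay term keeps ||w|| near 1, and w converges to the
# first principal component direction of the data.
rng = np.random.default_rng(2)
N = 20000
s = rng.standard_normal(N)
# Strongly correlated 2-D data: dominant direction ~ (1, 1)/sqrt(2)
V = np.column_stack([s + 0.1 * rng.standard_normal(N),
                     s + 0.1 * rng.standard_normal(N)])
w = np.array([1.0, 0.0])
gamma = 0.01
for v in V:
    x = w @ v
    w += gamma * (x * v - x * x * w)   # Oja's learning rule

pc1 = np.array([1.0, 1.0]) / np.sqrt(2)
alignment = abs(w @ pc1) / np.linalg.norm(w)   # |cosine| to the first PC
```

Note the contrast with the plain Hebbian rule: the same correlation-driven growth, but now self-limiting.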
11 Level of synapses. The neocybernetic guidelines are: search for balance and linearity. Note: nonlinearity was not included in the original Hebbian law; it was only introduced for pragmatic reasons. Are there other ways to reach stability in linear terms? Yes, one can apply negative feedback: $\frac{dw_{ij}}{dt} = \gamma\, x_i v_j - \frac{1}{\tau}\, w_{ij}$, or in matrix form $\frac{dW}{dt} = \gamma\, x v^T - \frac{1}{\tau}\, W$. The steady state is $W = \gamma\tau\, \mathrm{E}\{x v^T\} = \Gamma\, \mathrm{E}\{x v^T\}$. Synaptic weights become coded in a correlation matrix!
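The decay-stabilized law can be simulated directly. In this NumPy sketch (the signal model, gain $\gamma$, and time constant $\tau$ are arbitrary assumptions), the weights settle to $\gamma\tau$ times the cross-correlation:

```python
import numpy as np

# dW/dt = gamma * x v^T - (1/tau) * W: first-order dynamics whose
# steady state is W = gamma * tau * E{x v^T}.
rng = np.random.default_rng(3)
gamma, tau, dt = 1.0, 2.0, 0.0005
n, m = 2, 3
Mix = rng.standard_normal((n, m))   # fixed stationary signal model: x = Mix v
W = np.zeros((n, m))
for _ in range(40000):
    v = rng.standard_normal(m)
    x = Mix @ v
    W += dt * (gamma * np.outer(x, v) - W / tau)   # Hebbian term + decay

# With white v, E{x v^T} = Mix exactly, so W should approach gamma*tau*Mix
target = gamma * tau * Mix
```

Unlike the unstabilized rule, the decay pulls the weights toward a bounded correlation-coded steady state.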
12 Level of neuron grids. Just the same principles can be applied when studying the neuron grid level: balance and linearity. Distinguish the sources: $W = \begin{pmatrix} A & B \end{pmatrix}$ and $v = \begin{pmatrix} x \\ u \end{pmatrix}$, so that $A = \Gamma\, \mathrm{E}\{x x^T\}$ and $B = \Gamma\, \mathrm{E}\{x u^T\}$. To implement negative feedback, one needs to apply the anti-Hebbian action between otherwise Hebbian neurons: $\frac{dx}{dt} = -A x + B u$, so that the steady state becomes $\bar{x} = A^{-1} B u = \mathrm{E}\{x x^T\}^{-1}\, \mathrm{E}\{x u^T\}\, u = \phi^T u$. The model is stable! The eigenvalues of $A$ are always real and non-negative.
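The stability claim can be checked numerically: with $A$ symmetric positive definite, the state relaxes to the balance $A^{-1}Bu$. A NumPy sketch (the matrices and the input are arbitrary assumptions):

```python
import numpy as np

# dx/dt = -A x + B u with A symmetric positive definite (a scaled
# correlation matrix has this property): x converges to xbar = A^{-1} B u.
rng = np.random.default_rng(4)
n, m = 3, 4
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)        # symmetric, eigenvalues real and >= 1
B = rng.standard_normal((n, m))
u = rng.standard_normal(m)     # input held constant (x dynamics are fast)

x = np.zeros(n)
dt = 0.01
for _ in range(20000):
    x += dt * (-A @ x + B @ u)   # Euler integration of the grid dynamics

xbar = np.linalg.solve(A, B @ u)   # the predicted balance
```

Because $A$ is a (scaled) correlation matrix, its positive real eigenvalues guarantee this relaxation for any input.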
13 Hebbian/anti-Hebbian system. $\frac{dx}{dt} = -A x + B u$. Explicit feedback structures. Completely localized operation, even though centralized matrix formulations are applied to reach mathematical compactness. (Figure: a grid of neurons $x_1, x_2$ with inputs $u_1, u_2, u_3$, Hebbian feedforward weights $\mathrm{E}\{x_i u_j\}$ and anti-Hebbian lateral weights $\mathrm{E}\{x_i x_j\}$.)
14 Different time scales: after all, $u$ also varies. (Figure: signals $u_j(t)$ and $x_i$ on nested time scales $\tau$; the latent variables $x_i$ settle much faster than the inputs $u_j$ change.)
15 Towards abstraction level #2. Cybernetic model = statistical model of balances $\bar x(u)$. Assume the dynamics of $u$ is essentially slower than that of $x$, and study the covariance properties:
$\mathrm{E}\{xx^T\} = \mathrm{E}\{xx^T\}^{-1}\, \mathrm{E}\{xu^T\}\, \mathrm{E}\{uu^T\}\, \mathrm{E}\{ux^T\}\, \mathrm{E}\{xx^T\}^{-1}$
or
$\mathrm{E}\{xx^T\}^3 = \mathrm{E}\{xu^T\}\, \mathrm{E}\{uu^T\}\, \mathrm{E}\{ux^T\}$
or
$\big(\phi^T\, \mathrm{E}\{uu^T\}\, \phi\big)^3 = \phi^T\, \mathrm{E}\{uu^T\}^3\, \phi$, with $n < m$.
Balance on the statistical level = second-order balance. The operator $\mathrm{E}$ connects the levels. A third level of time scales!
16 Solution. The expression is fulfilled for $\phi = \theta_n D$, where $\theta_n$ is a matrix of $n$ of the covariance matrix eigenvectors, and $D$ is orthogonal. Loose motivation: this is because the left-hand side is then
$\big(D^T \theta_n^T\, \mathrm{E}\{uu^T\}\, \theta_n D\big)^3 = \big(D^T \Lambda_n D\big)^3 = D^T \Lambda_n^3 D$
and the right-hand side is
$D^T \theta_n^T\, \mathrm{E}\{uu^T\}^3\, \theta_n D = D^T \Lambda_n^3 D.$
A stable solution results when $\theta_n$ contains the most significant data covariance matrix eigenvectors.
17 Accurate proof: adaptation converges to the principal subspace. Study how the mapping matrices become adapted; assume that this process is divided into $k$ steps:
$\bar x(0) = A^{-1}(0)\, B(0)\, u$
$\bar x(1) = A^{-1}(1)\, B(1)\, u = A^{-1}(1)\, \mathrm{E}\{\bar x(0)\, u^T\}\, u = A^{-1}(1)\, A^{-1}(0)\, B(0)\, \mathrm{E}\{uu^T\}\, u$
$\bar x(2) = A^{-1}(2)\, B(2)\, u = A^{-1}(2)\, \mathrm{E}\{\bar x(1)\, u^T\}\, u = A^{-1}(2)\, A^{-1}(1)\, A^{-1}(0)\, B(0)\, \mathrm{E}\{uu^T\}^2\, u$
$\bar x(k) = A^{-1}(k)\, B(k)\, u = \prod_{i=0}^{k} A^{-1}(k-i)\; B(0)\, \mathrm{E}\{uu^T\}^k\, u.$
The first part of this expression does not affect the subspace being spanned, so only $B(k)$ is now of interest. Write it applying the basis axes spanned by the principal components of the data, so that $\mathrm{E}\{uu^T\} = \Theta \Lambda \Theta^T$ and $B(0) = D \Theta^T$. Now one has
$B(k) = B(0)\, \mathrm{E}\{uu^T\}^k = D \Theta^T \Theta \Lambda^k \Theta^T = D \Lambda^k \Theta^T,$
meaning that in the mapping matrix the relevance of the principal component direction $j$ is weighted by $\lambda_j^k$. Because the variables $x_i$ are linearly independent, only the $n$ most significant of those directions will remain visible in the mapped data after adaptation. This completes the proof.
18 Principal subspace analysis. Any subset of the input data principal components can be selected for $\phi$. The subspace spanned by the $n$ most significant principal components gives a stable solution. Conclusion: competitive learning (combined Hebbian and anti-Hebbian learning) without any structural constraints results in self-regulation (balance) and self-organization (in terms of principal subspace analysis).
19 Emergent patterns. The process (convergence of $x$) can be substituted with the final pattern: details are lost, but the essence remains (?). The pattern is characterized in terms of a cost criterion
$J(x, u) = \frac{1}{2}\, x^T\, \mathrm{E}\{xx^T\}\, x - x^T\, \mathrm{E}\{xu^T\}\, u.$
The process itself = gradient descent minimization for the criterion! (Figure: models of the local minima for $m = 2$, $n = 1$: three example data sets in the $(u_1, u_2)$ plane with the corresponding minimizing directions $\phi$.)
20 Mathematics vs. reality. 1. Correlations vs. covariances. The matrices being studied are correlation matrices rather than covariance matrices (as is normally the case in PCA). This means that the data $u$ is not assumed to be zero-mean; there is no need for preprocessing, and in practice the variables are always non-negative. From a physical point of view this is beneficial: note that the actual signal carriers (chemical concentrations / pulse frequencies) cannot be negative. 2. Principal subspace vs. principal components. When applying the linear structure, the actual principal components are not distinguished, only the subspace spanned by them. This means that the variables can again all be non-negative, so that the signals $x$ can be physically plausible. Indeed: if all $u_j$ are non-negative, and initially all $x_i$ have non-negative values, the $x_i$'s will always remain non-negative (so that one has a positive system).
21 Unification of layers. Again, study a synapse; it is not a static mapping but a dynamic system (much faster than the grid dynamics). Assume this (trivial) system is also cybernetic:
$\frac{dx_{ij}}{dt} = -\gamma\, \mathrm{E}\{x_i^2\}\, x_{ij} + \gamma\, \mathrm{E}\{x_i v_j\}\, v_j,$
or in balance
$\bar x_{ij} = \frac{\mathrm{E}\{x_i v_j\}}{\mathrm{E}\{x_i^2\}}\, v_j = w_{ij}\, v_j.$
Extra term! Comparing this to the grid model, one can see that the two layers are qualitatively identical if one selects $\Gamma = \mathrm{Var}\{x\}^{-1}$.
22 Technical experiments. In technical terms, one can define the algorithm
$\bar x = \hat{\mathrm{E}}\{xx^T\}^{-1}\, \hat{\mathrm{E}}\{xu^T\}\, u,$
where
$\frac{d\hat{\mathrm{E}}\{xx^T\}}{dt} = -\lambda\, \hat{\mathrm{E}}\{xx^T\} + \lambda\, \bar x \bar x^T$
$\frac{d\hat{\mathrm{E}}\{xu^T\}}{dt} = -\lambda\, \hat{\mathrm{E}}\{xu^T\} + \lambda\, \bar x u^T.$
In practice this is easiest implemented in discrete time. The matrix inversion lemma can be applied. The matrices can be masked to implement hierarchies, etc. Initialization: the matrix $A$ has to always remain positive definite.
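A discrete-time NumPy sketch of this recursive scheme (the data distribution, block size, and forgetting factor are arbitrary assumptions; the inverse is computed directly rather than via the matrix inversion lemma):

```python
import numpy as np

# HAH iteration in discrete time: xbar = Exx^{-1} Exu u for a data
# block, then exponentially weighted update of the correlation estimates.
rng = np.random.default_rng(5)
k, m, n, lam = 500, 4, 2, 0.9
scales = np.array([3.0, 2.0, 0.3, 0.1])   # two dominant input directions

Exx = np.eye(n)                            # init: positive definite
Exu = 0.5 * rng.standard_normal((n, m))
for _ in range(200):
    U = rng.standard_normal((k, m)) * scales        # k x m data block
    Xbar = U @ np.linalg.solve(Exx, Exu).T          # balance of latent vars
    Exu = lam * Exu + (1 - lam) * Xbar.T @ U / k    # model adaptation
    Exx = lam * Exx + (1 - lam) * Xbar.T @ Xbar / k

# Emergent mapping xbar = phiT u concentrates on the principal subspace:
# the weights on the low-variance inputs 2 and 3 are (nearly) zero.
phiT = np.linalg.solve(Exx, Exu)
weak = np.max(np.abs(phiT[:, 2:]))
strong = np.max(np.abs(phiT[:, :2]))
```

Note that the update structure keeps $\hat{\mathrm{E}}\{xx^T\}$ positive definite as long as it starts that way, since each step is a convex combination with a positive semidefinite term.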
23 Principal subspaces found OK for different data sets. (Figure: HAH = Hebbian/anti-Hebbian algorithm applied to Data 1 through Data 4.)
24 PCs cannot always be told apart. (Figure: Data 1 through Data 4.)
25 Summary: neocybernetic models. First-order cybernetic system: for any stable $A$, assume that there holds $\frac{dx}{dt} = -Ax + Bu$ with $\bar x = A^{-1} B u = \phi^T u$. Second-order cybernetic system: additionally, assume that the matrices are $A = \Gamma\, \mathrm{E}\{xx^T\}$ and $B = \Gamma\, \mathrm{E}\{xu^T\}$. Higher-order (optimized) cybernetic system: additionally, assume that $\Gamma = \mathrm{Var}\{x\}^{-1}$ or $\Gamma = \mathrm{E}\{xx^T\}^{-1}$ (Newton algorithm: second-order convergence).
26 Example: hand-written digits. There was a large body of 32x32-pixel images, representing digits from 0 to 9, over 8000 samples. (Figure: examples of typical 9's, and examples of less typical 9's.)
27 Algorithm for Hebbian/anti-Hebbian learning...

LOOP iterate for data in the kxm matrix U
    % Balance of latent variables
    Xbar = U * (inv(Exx)*Exu)';
    % Model adaptation
    Exu = lambda*Exu + (1-lambda)*Xbar'*U/k;
    Exx = lambda*Exx + (1-lambda)*Xbar'*Xbar/k;
    % PCA rather than PSA through structural constraints
    Exx = tril(ones(n,n)).*Exx;
END
% Recursive algorithm can be boosted with matrix inversion lemma
28 ... resulting in principal components. Parameters: m = 1024, n = , λ = 0.5. Rather fast convergence, starting from $\theta_1$ on top left.
29 Summary of clever agents. Emergence in terms of self-regulation (stability) and self-organization (principal subspace analysis) is reached. This is reached applying physiologically plausible operations, and the model is linear: scalable beyond toy domains. Learning is local but not completely local: there is need for communication among neurons (anti-Hebbian structures). The roles of the signals are different: how to motivate the inversion in adaptation direction (anti-Hebbian learning)? Solution next: apply non-idealities in an unorthodox way! There exist no unidirectional causal flows in real-life systems. Feedback: exploiting a signal exhausts that signal.
30 Stupid agents. Feedforward coupling $\phi = q\, \mathrm{E}\{\bar x \bar u^T\}$ ($q$ = stiffness, coupling factor) and feedback coupling $\varphi = \mathrm{E}\{\bar u \bar x^T\}\, q = \phi^T$. (Figure: block diagram with feedforward $\phi$, feedback $\varphi$, and input deviation $\Delta u$ so that $\bar u = u - \Delta u$.) Implicit feedback. Exhaustion through exploitation. A generalized Boltzmann machine?
31 Simply go for resources. Again: balancing is reached by feedback, but now not explicitly but implicitly through the environment:
$\bar x = \phi\, \bar u$
$\bar u = u - \varphi\, \bar x.$
Also the environment finds its balance. Only exploiting locally visible quantities, implement evolutionary adaptation symmetrically as $\phi = q\, \mathrm{E}\{\bar x \bar u^T\}$ and $\varphi = q\, \mathrm{E}\{\bar u \bar x^T\} = \phi^T$. How to characterize this environmental balance?
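For a fixed coupling the implicit loop can be solved explicitly: eliminating $\bar u$ from $\bar x = \phi\,\bar u$, $\bar u = u - \phi^T\bar x$ gives $\bar x = (I + \phi\phi^T)^{-1}\phi\, u$. A NumPy sketch verifying this (the coupling matrix and the input are arbitrary assumptions):

```python
import numpy as np

# Implicit environmental feedback:
#   xbar = phi @ ubar,  ubar = u - phi.T @ xbar.
# The loop has the closed form xbar = (I + phi phi^T)^{-1} phi u.
rng = np.random.default_rng(6)
n, m = 2, 4
phi = rng.standard_normal((n, m))   # fixed coupling (arbitrary)
u = rng.standard_normal(m)

xbar = np.zeros(n)
for _ in range(300):
    ubar = u - phi.T @ xbar                    # environment reacts to exploitation
    xbar = xbar + 0.1 * (phi @ ubar - xbar)    # damped relaxation to balance

closed_form = np.linalg.solve(np.eye(n) + phi @ phi.T, phi @ u)
```

No explicit anti-Hebbian connections are needed here: the decorrelating feedback comes back through the shared environment.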
32 Because $\bar x = q\, \mathrm{E}\{\bar x \bar u^T\}\, \bar u$, one can write two covariances:
$\mathrm{E}\{\bar x \bar u^T\} = q\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar u^T\}$
and
$\mathrm{E}\{\bar x \bar x^T\} = q^2\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\} = q\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\},$
so that, writing $\theta D = \sqrt{q}\; \mathrm{E}\{\bar u \bar x^T\}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}$ with $D$ orthogonal,
$I_n = q\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2} = D^T \theta^T \theta\, D$
and
$I_n = q^2\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2} = q\, D^T \theta^T\, \mathrm{E}\{\bar u \bar u^T\}\, \theta\, D.$
Forget the trivial solution where $\bar x_i$ is identically zero.
33 Similarly, if $\bar x = Q\, \mathrm{E}\{\bar x \bar u^T\}\, \bar u$ for some (diagonal) matrix $Q$:
$\mathrm{E}\{\bar x \bar u^T\} = Q\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar u^T\}$
and
$\mathrm{E}\{\bar x \bar x^T\} = Q\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\}\, Q = \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\}\, Q.$
Note: this has to be symmetric, so that
$\mathrm{E}\{\bar x \bar x^T\} = \mathrm{E}\{\bar x \bar x^T\}^T = Q\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\}.$
A stronger formulation is reached:
$\theta = \mathrm{E}\{\bar u \bar x^T\}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}\, Q^{1/2}.$
For non-identical $q_i$, this has to become diagonal also.
34 Equalization of environmental variances. Because $\theta^T \theta = I_n$ and $\theta^T\, \mathrm{E}\{\bar u \bar u^T\}\, \theta = Q^{-1}$, $\theta$ consists of the $n$ (most significant) eigenvectors of $\mathrm{E}\{\bar u \bar u^T\}$. If $n = m$, the variation structure becomes trivial: $\mathrm{E}\{\bar u \bar u^T\} = \frac{1}{q} I_m$ or $\mathrm{E}\{\bar u \bar u^T\} = Q^{-1}$. The visible data variation becomes whitened by the feedback. Relation to ICA: assume that this whitened data is further processed by neurons (FOBI), but this has to be nonlinear! On the other hand, if the $q_i$ are different, the modes become separated in the PCA style (rather than PSA). Still try to avoid nonlinearity!
35 Open-loop eigenvalues. (Figure: eigenvalue spectrum $\lambda$ versus eigenvalue index for a sub-cybernetic, a marginally cybernetic, and a hyper-cybernetic system: eigenvalues above the level $1/q$ are pulled down to $1/q$, while $\lambda_{n+1}$ and beyond remain untouched; with unequal stiffnesses the levels $1/q_1, \dots, 1/q_n$ differ.) DEMO equalvar.m
36 Emergence of intelligence. Solving for the balance signals, one can write
$\bar x = \big( \mathrm{E}\{\bar x \bar x^T\} + Q^{-1} \big)^{-1}\, \mathrm{E}\{\bar x \bar u^T\}\, u.$
This can be rewritten in a form where $\mathrm{E}\{\bar x \bar u^T\}$ is the feedforward matrix and $\mathrm{E}\{\bar x \bar x^T\} + Q^{-1}$ is the feedback matrix. Of course, $\bar u$ has changed to $u$, as it is the only signal visible. It turns out that if the coupling increases, so that $q_i \to \infty$, this structure equals the prior one with explicit feedback. This means that in evolution (to be studied closer later) a selfish agent finally becomes a clever agent. Without preprogramming, an agent becomes context-aware.
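The limit claim is easy to check numerically: as $q$ grows, the implicit-feedback mapping $(\mathrm{E}\{\bar x\bar x^T\} + Q^{-1})^{-1}\mathrm{E}\{\bar x\bar u^T\}$ approaches the explicit-feedback mapping $\mathrm{E}\{\bar x\bar x^T\}^{-1}\mathrm{E}\{\bar x\bar u^T\}$. A NumPy sketch with arbitrary example matrices (identical stiffnesses $q_i = q$ assumed):

```python
import numpy as np

# As the stiffness q -> infinity, the implicit-feedback mapping
# (Exx + (1/q) I)^{-1} Exu approaches the explicit-feedback mapping
# Exx^{-1} Exu of the clever agent.
rng = np.random.default_rng(7)
n, m = 2, 3
M = rng.standard_normal((n, n))
Exx = M @ M.T + np.eye(n)        # a positive definite latent covariance
Exu = rng.standard_normal((n, m))

explicit = np.linalg.solve(Exx, Exu)
gaps = [np.linalg.norm(np.linalg.solve(Exx + np.eye(n) / q, Exu) - explicit)
        for q in (1.0, 10.0, 1000.0)]
```

The gap shrinks monotonically with $q$, which is the numerical face of "a selfish agent finally becomes a clever agent".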
37 Variance inheritance. Further study the relationship between $\bar x$ and the original $u$:
$\bar x = \big( I_n + q^2\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\} \big)^{-1}\, q\, \mathrm{E}\{\bar x \bar u^T\}\, u = \big( I_n + q\, \mathrm{E}\{\bar x \bar x^T\} \big)^{-1}\, q\, \mathrm{E}\{\bar x \bar u^T\}\, u.$
Multiply from the right by the transpose, and take expectations:
$\big( I_n + q\, \mathrm{E}\{\bar x \bar x^T\} \big)\, \mathrm{E}\{\bar x \bar x^T\}\, \big( I_n + q\, \mathrm{E}\{\bar x \bar x^T\} \big) = \mathrm{E}\{\bar x \bar x^T\}^{1/2} \big( I_n + q\, \mathrm{E}\{\bar x \bar x^T\} \big)^2\, \mathrm{E}\{\bar x \bar x^T\}^{1/2} = q^2\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{u u^T\}\, \mathrm{E}\{\bar u \bar x^T\}.$
38 Continuing the derivation:
$\big( I_n + q\, \mathrm{E}\{\bar x \bar x^T\} \big)^2 = q\, \sqrt{q}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}\, \mathrm{E}\{\bar x \bar u^T\}\; \mathrm{E}\{u u^T\}\; \mathrm{E}\{\bar u \bar x^T\}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}\, \sqrt{q} = q\, D^T \theta^T\, \mathrm{E}\{u u^T\}\, \theta\, D.$
Solving for the latent covariance:
$\mathrm{E}\{\bar x \bar x^T\} = \frac{1}{\sqrt{q}} \big( D^T \theta^T\, \mathrm{E}\{u u^T\}\, \theta\, D \big)^{1/2} - \frac{1}{q}\, I_n.$
This means that the external and internal eigenvalues (variances) are related as follows:
$\lambda_i = \sqrt{\frac{\lambda_j}{q}} - \frac{1}{q}$
for pairs $(i, j)$; there must hold $q\, \lambda_j > 1$.
39 Effect of feedback = adding "black noise". White noise = a constant increase $\sigma^2$ in all variance directions; black noise = a decrease in all directions (if possible). (Figure: variance spectra over directions $1, \dots, n$ when adding/subtracting white noise, and when adding black noise down to the level $1/q$.)
40 Results of orthogonal basis rotations. Total variance above the zero level stays intact regardless of the rotations; total variance above the level 1 changes! (Figure: eigenvalue spectra $\lambda$ over directions $1, \dots, n$: the mathematical view of data vs. the cybernetic view of data.)
41 Factor Analysis Algorithm rotates the PCA subspace axes
42 Towards differentiation of features. A simple example of nonlinear extensions: the cut function. If the variable is positive, let it through; otherwise, filter it out:
$f_{\mathrm{cut},i}(x_i) = \begin{cases} x_i, & \text{when } x_i > 0 \\ 0, & \text{when } x_i \le 0. \end{cases}$
Well in line with modeling of activity in neuronal systems: frequencies cannot become negative (interpretation in terms of pulse trains); concentrations cannot become negative (interpretation in terms of chemicals); sizes of neuron populations cannot become negative; power / information content cannot become negative, etc. Makes the modes separated. Still: the end result is almost linear! Symmetry breaking?!
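The cut function is the rectifier familiar from neural network practice; a minimal NumPy sketch:

```python
import numpy as np

# The cut function: pass positive activity through, block negative activity.
def f_cut(x):
    return np.where(x > 0, x, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.7, 3.0])
y = f_cut(x)
# Negative (and zero) inputs are cut away; positive ones pass unchanged,
# so on the active half the mapping is exactly linear.
```

This is why the end result can stay "almost linear": within the set of active modes the system still behaves as a linear one.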
43 Algorithm for Hebbian feedback learning...

LOOP iterate for data in kxm matrix U
    % Balance of latent variables
    Xbar = U * (inv(eye(n)+q*Exu*Exu')*q*Exu)';
    % Enhance model convergence by nonlinearity
    Xbar = Xbar.*(Xbar>0);
    % Balance of the environmental signals
    Ubar = U - Xbar*Exu;
    % Model adaptation
    Exu = lambda*Exu + (1-lambda)*Xbar'*Ubar/k;
    % Maintaining system activity
    Exx = Xbar'*Xbar/k;
    q = q + P*diag(ref - sqrt(diag(Exx)));
END
44 ... resulting in sparse components! Parameters: m = 1024, n = , λ = 0.97, ref = . DEMO digitfeat.m
45 Work load becomes distributed. Correlations between inputs and neuronal activities are shown below. (Figure: correlation maps, color scale from min through 0 to max.)
46 Sparse coding. The traditional modeling goal, mathematically optimal and simple to implement: minimize the overall size of the model. The sparsity-oriented modeling goal, often physically relevant but cumbersome to characterize: minimize the number of simultaneously active representations, while there are no acute constraints on the overall model size. There is a tradeoff: a too sparse model becomes inaccurate. Note the (intuitive) connection to the cognitive system: there are no acute constraints on the long-term memory (LTM), whereas the size of the working memory / short-term memory is limited to 7 ± 2.
47 Studied later... Visual V1 cortex seems to do this kind of decomposing
48 Conclusion: clever vs. stupid agents. Both agent types can exist; the resulting systems are very different.
Intelligent agent: Hebbian/anti-Hebbian learning; $\bar x = \mathrm{E}\{xx^T\}^{-1}\, \mathrm{E}\{xu^T\}\, u$; Principal Component (Subspace) Analysis; complete insulation (or elimination) of the environment.
Stupid/selfish agent: Hebbian environmental learning; $\bar x = q\, \mathrm{E}\{\bar x \bar u^T\}\, \bar u$; Sparse Component (Subspace) Analysis; stiffness regulation, equalization of the environment.
More informationOne-unit Learning Rules for Independent Component Analysis
One-unit Learning Rules for Independent Component Analysis Aapo Hyvarinen and Erkki Oja Helsinki University of Technology Laboratory of Computer and Information Science Rakentajanaukio 2 C, FIN-02150 Espoo,
More informationMathematical foundations - linear algebra
Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar
More informationTemporal Backpropagation for FIR Neural Networks
Temporal Backpropagation for FIR Neural Networks Eric A. Wan Stanford University Department of Electrical Engineering, Stanford, CA 94305-4055 Abstract The traditional feedforward neural network is a static
More informationPrincipal Component Analysis
B: Chapter 1 HTF: Chapter 1.5 Principal Component Analysis Barnabás Póczos University of Alberta Nov, 009 Contents Motivation PCA algorithms Applications Face recognition Facial expression recognition
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationEquivalence of Backpropagation and Contrastive Hebbian Learning in a Layered Network
LETTER Communicated by Geoffrey Hinton Equivalence of Backpropagation and Contrastive Hebbian Learning in a Layered Network Xiaohui Xie xhx@ai.mit.edu Department of Brain and Cognitive Sciences, Massachusetts
More informationPrinciple Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA
Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationLecture 7: Con3nuous Latent Variable Models
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/
More informationNeural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Neural Networks CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Perceptrons x 0 = 1 x 1 x 2 z = h w T x Output: z x D A perceptron
More informationLecture 7 Artificial neural networks: Supervised learning
Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in
More informationMODELS OF LEARNING AND THE POLAR DECOMPOSITION OF BOUNDED LINEAR OPERATORS
Eighth Mississippi State - UAB Conference on Differential Equations and Computational Simulations. Electronic Journal of Differential Equations, Conf. 19 (2010), pp. 31 36. ISSN: 1072-6691. URL: http://ejde.math.txstate.edu
More informationCredit Assignment: Beyond Backpropagation
Credit Assignment: Beyond Backpropagation Yoshua Bengio 11 December 2016 AutoDiff NIPS 2016 Workshop oo b s res P IT g, M e n i arn nlin Le ain o p ee em : D will r G PLU ters p cha k t, u o is Deep Learning
More informationA two-layer ICA-like model estimated by Score Matching
A two-layer ICA-like model estimated by Score Matching Urs Köster and Aapo Hyvärinen University of Helsinki and Helsinki Institute for Information Technology Abstract. Capturing regularities in high-dimensional
More informationLecture 1: Systems of linear equations and their solutions
Lecture 1: Systems of linear equations and their solutions Course overview Topics to be covered this semester: Systems of linear equations and Gaussian elimination: Solving linear equations and applications
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Deterministic Component Analysis Goal (Lecture): To present standard and modern Component Analysis (CA) techniques such as Principal
More informationKernel Methods. Machine Learning A W VO
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More information17 Neural Networks NEURAL NETWORKS. x XOR 1. x Jonathan Richard Shewchuk
94 Jonathan Richard Shewchuk 7 Neural Networks NEURAL NETWORKS Can do both classification & regression. [They tie together several ideas from the course: perceptrons, logistic regression, ensembles of
More informationData Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis
Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture
More informationLecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan
Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always
More informationSPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks
Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationNeural Networks Learning the network: Backprop , Fall 2018 Lecture 4
Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:
More informationMatrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =
30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to
More informationQuantitative Understanding in Biology Principal Components Analysis
Quantitative Understanding in Biology Principal Components Analysis Introduction Throughout this course we have seen examples of complex mathematical phenomena being represented as linear combinations
More informationLearning and Memory in Neural Networks
Learning and Memory in Neural Networks Guy Billings, Neuroinformatics Doctoral Training Centre, The School of Informatics, The University of Edinburgh, UK. Neural networks consist of computational units
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the
More informationChapter 18. Remarks on partial differential equations
Chapter 8. Remarks on partial differential equations If we try to analyze heat flow or vibration in a continuous system such as a building or an airplane, we arrive at a kind of infinite system of ordinary
More informationAdvanced Introduction to Machine Learning CMU-10715
Advanced Introduction to Machine Learning CMU-10715 Independent Component Analysis Barnabás Póczos Independent Component Analysis 2 Independent Component Analysis Model original signals Observations (Mixtures)
More informationIterative face image feature extraction with Generalized Hebbian Algorithm and a Sanger-like BCM rule
Iterative face image feature extraction with Generalized Hebbian Algorithm and a Sanger-like BCM rule Clayton Aldern (Clayton_Aldern@brown.edu) Tyler Benster (Tyler_Benster@brown.edu) Carl Olsson (Carl_Olsson@brown.edu)
More informationMatching the dimensionality of maps with that of the data
Matching the dimensionality of maps with that of the data COLIN FYFE Applied Computational Intelligence Research Unit, The University of Paisley, Paisley, PA 2BE SCOTLAND. Abstract Topographic maps are
More informationPrincipal Component Analysis CS498
Principal Component Analysis CS498 Today s lecture Adaptive Feature Extraction Principal Component Analysis How, why, when, which A dual goal Find a good representation The features part Reduce redundancy
More informationADAPTIVE LATERAL INHIBITION FOR NON-NEGATIVE ICA. Mark Plumbley
Submitteed to the International Conference on Independent Component Analysis and Blind Signal Separation (ICA2) ADAPTIVE LATERAL INHIBITION FOR NON-NEGATIVE ICA Mark Plumbley Audio & Music Lab Department
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationSimple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017
Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient
More informationIndependent Component Analysis. Contents
Contents Preface xvii 1 Introduction 1 1.1 Linear representation of multivariate data 1 1.1.1 The general statistical setting 1 1.1.2 Dimension reduction methods 2 1.1.3 Independence as a guiding principle
More informationA. The Hopfield Network. III. Recurrent Neural Networks. Typical Artificial Neuron. Typical Artificial Neuron. Hopfield Network.
Part 3A: Hopfield Network III. Recurrent Neural Networks A. The Hopfield Network 1 2 Typical Artificial Neuron Typical Artificial Neuron connection weights linear combination activation function inputs
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationManifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA
Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/
More informationSections 18.6 and 18.7 Artificial Neural Networks
Sections 18.6 and 18.7 Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline The brain vs. artifical neural
More informationPrincipal Component Analysis (PCA) for Sparse High-Dimensional Data
AB Principal Component Analysis (PCA) for Sparse High-Dimensional Data Tapani Raiko, Alexander Ilin, and Juha Karhunen Helsinki University of Technology, Finland Adaptive Informatics Research Center Principal
More informationFactor Analysis (10/2/13)
STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.
More information