Neural networks: Unsupervised learning

Size: px

Start display at page:

Download "Neural networks: Unsupervised learning"

Ethelbert McGee
6 years ago
Views:

1 Neural networks: Unsupervised learning 1

2 Previously The supervised learning paradigm: given example inputs x and target outputs t learning the mapping between them the trained network is supposed to give correct response for any given input stimulus training is equivalent of learning the appropriate weights to achiece this an objective function (or error function) is defined, which is minimized during training 2

3 Previously Optimization wrt. an objective function where (error function) (regularizer) 3

4 Interpret y(x,w) as a probability: Previously 4

5 Interpret y(x,w) as a probability: Previously the likelihood of the input data can be expressed with the original error function function 4

6 Interpret y(x,w) as a probability: Previously the likelihood of the input data can be expressed with the original error function function the regularizer has the form of a prior! 4

7 Interpret y(x,w) as a probability: Previously the likelihood of the input data can be expressed with the original error function function the regularizer has the form of a prior! what we get in the objective function M(w): the posterior distribution of w: 4

8 Interpret y(x,w) as a probability: Previously the likelihood of the input data can be expressed with the original error function function the regularizer has the form of a prior! what we get in the objective function M(w): the posterior distribution of w: The neuron s behavior is faithfully translated into probabilistic terms! 4

9 When making predictions... Previously 5

10 When making predictions... Original estimate Previously 5

11 Previously When making predictions... Original estimate Bayesian estimate 5

12 Previously When making predictions... Original estimate Bayesian estimate The probabilistic interpretation makes our assumptions explicit: by the regularizer we imposed a soft constraint on the learned parameters, which expresses our prior expecations. An additional plus: beyond getting w MP we get a measure for learned parameter uncertainty 5

13 What s coming? Networks & probabilistic framework: from the Hopfield network to Boltzmann machine What we learn? Density estimation, neural architecture and optimization principles: principal component analysis (PCA) How we learn? Hebb et al: Learning rules Any biology? Simple cells & ICA 6

14 Learning data... 7

15 Learning data... 7

16 Neural networks Unsupervised learning Capacity of a single neuron is limited: certain data can only be learned So far, we used a supervised learning paradigm: a teacher was necessary to teach an input-output relation Hopfield networks try to cure both Unsupervised learning: what is it about? Hebb rule: an enlightening example assuming 2 neurons and a weight modification process: This simple rule realizes an associative memory! 8

17 Neural networks The Hopfield network Architecture: a set of I neurons connected by symmetric synapses of weight w ij no self connections: w ii =0 output of neuron i: x i Activity rule: Learning rule: Synchronous/ asynchronous update ; 9

18 Neural networks The Hopfield network Architecture: a set of I neurons connected by symmetric synapses of weight w ij no self connections: w ii =0 output of neuron i: x i Activity rule: Learning rule: Synchronous/ asynchronous update alternatively, a continuous network can be defined as: ; 9

19 Neural networks Stability of Hopfield network Are the memories stable? the activation and activity rule together define a Lyapunov function Necessary conditions: symmetric weights; asynchronous update 10

20 Neural networks Stability of Hopfield network Are the memories stable? the activation and activity rule together define a Lyapunov function Necessary conditions: symmetric weights; asynchronous update 10

21 Neural networks Stability of Hopfield network Are the memories stable? the activation and activity rule together define a Lyapunov function Necessary conditions: symmetric weights; asynchronous update Robust against perturbation of a subset of weights 10

22 Neural networks Capacity of Hopfield network How many traces can be memorized by a network of I neurons? 11

23 Neural networks Capacity of Hopfield network Failures of the Hopfield networks: Corrupted bits Missing memory traces Spurious states not directly related to training data 12

24 Neural networks Capacity of Hopfield network Activation rule: Trace of the desired memory and additional random memories: 13

25 Neural networks Capacity of Hopfield network Activation rule: Trace of the desired memory and additional random memories: 13

26 Neural networks Capacity of Hopfield network Activation rule: Trace of the desired memory and additional random memories: desired state 13

27 Neural networks Capacity of Hopfield network Activation rule: Trace of the desired memory and additional random memories: desired state random contribution 13

28 Neural networks Capacity of Hopfield network Activation rule: Trace of the desired memory and additional random memories: desired state random contribution 13

29 Neural networks Capacity of Hopfield network Failure in operation: avalanches N/I < 0.138: spin glass states N/I ( ): states close to desired memories N/I (0 0.05): desired states have lower energy than spurious states N/I ( ): spurious states dominate N/I (0 0.03): mixture states The Hebb rule determines how well it performs other learning might do a better job (reiforcement learning) 14

30 Hopfield network for optimization 15

31 The Boltzmann machine The optimization performed by Hopfield network: minimizing 16

32 The Boltzmann machine The optimization performed by Hopfield network: minimizing Again: we can make a correspondence with a probabilistic model: 16

33 The Boltzmann machine The optimization performed by Hopfield network: minimizing Again: we can make a correspondence with a probabilistic model: What do we gain by this: more transparent functioning superior performance than Hebb rule 16

34 The Boltzmann machine The optimization performed by Hopfield network: minimizing Again: we can make a correspondence with a probabilistic model: What do we gain by this: more transparent functioning superior performance than Hebb rule Activity rule: 16

35 The Boltzmann machine The optimization performed by Hopfield network: minimizing Again: we can make a correspondence with a probabilistic model: What do we gain by this: more transparent functioning superior performance than Hebb rule Activity rule: How is learning performed? 16

36 Boltzmann machine -- EM Likelihood function: 17

37 Boltzmann machine -- EM Likelihood function: Estimating the parameters: 17

38 Boltzmann machine -- EM Likelihood function: Estimating the parameters: Minimizing for w: 17

39 Boltzmann machine -- EM Likelihood function: Estimating the parameters: Minimizing for w: Sleeping and waking: 17

40 Boltzmann machine -- EM Likelihood function: Estimating the parameters: Minimizing for w: Sleeping and waking: 17

41 Boltzmann machine -- EM Likelihood function: Estimating the parameters: Minimizing for w: Sleeping and waking: 17

42 Learning data... 18

43 Learning data... 18

44 Summary Boltmann translates the neural network mecanisms into a probablisitic framework Its capabilities are limited We learned that the probabilistic framework clarifies assumptions We learned that within the world constrained by our assumptions the probabilistic approach gives clear answers 19

45 Learning data... 20

46 Learning data... 20

47 Learning data... Hopfield/Boltman 20

48 Learning data... Hopfield/Boltman? 20

49 Principal Component Analysis 21

50 Principal Component Analysis Let s try to find linearly independent filters Set the basis along the eigenvectors of the data 21

51 Principal Component Analysis 22

52 Principal Component Analysis 22

53 Principal Component Analysis 22

54 Principal Component Analysis Olshausen & Field, Nature (1996) 22

55 Relation between PCA and learning rules A single neuron driven by multiple inputs: Basic Hebb rule: Averaged Hebb rule: Correlation based rule: note that 23

56 Relation between PCA and learning rules Making possible both LTP and LTD Postsynaptic threshold Postsynaptic threshold Setting the threshold to average postsynaptic activity:, where 24

57 Relation between PCA and learning rules Making possible both LTP and LTD Postsynaptic threshold homosynaptic depression Postsynaptic threshold heterosynaptic depression Setting the threshold to average postsynaptic activity:, where 24

58 Relation between PCA and learning rules Making possible both LTP and LTD Postsynaptic threshold homosynaptic depression Postsynaptic threshold heterosynaptic depression Setting the threshold to average postsynaptic activity:, where BCM rule: 24

59 Relation between PCA and learning rules Regularization again BCM rule: Hebb rule: (Oja rule) 25

60 Relation between PCA and learning rules 26

61 Relation between PCA and learning rules 26

62 Relation between PCA and learning rules 26

63 The architecture of the network and learning rule hand-in-hand detemine the learned representation 27

64 Density estimation Empirical distribution/ input distribution Latent variables: v Recognition: p[ v u] Generative distribution: Kullback-Leibler divergence: 28

65 Density estimation Empirical distribution/ input distribution Latent variables: v Recognition: p[ v u] generative model Generative distribution: Kullback-Leibler divergence: 28

66 Density estimation Empirical distribution/ input distribution Latent variables: v Recognition: p[ v u] generative model Generative distribution: Recognition distribution: Kullback-Leibler divergence: 28

67 Density estimation Empirical distribution/ input distribution Latent variables: v Recognition: p[ v u] generative model Generative distribution: Recognition distribution: Kullback-Leibler divergence: The match between our model distribution and input distribution 28

68 How to solve density estmation? EM 29

69 Sparse coding PCA: Sparse coding: make the reconstruction faithful keep the units/neurons quiet 30

70 Sparse coding PCA: trying to find linearly independent filters setting the basis along the eigenvectors of the data Sparse coding: make the reconstruction faithful keep the units/neurons quiet 30

71 Sparse coding PCA: trying to find linearly independent filters setting the basis along the eigenvectors of the data Sparse coding: make the reconstruction faithful keep the units/neurons quiet 30

72 Sparse coding Neural dynamics: gradient descent 31

73 Sparse coding 32

Introduction to Neural Networks

Introduction to Neural Networks What are (Artificial) Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning