Cogsci 118B. Virginia de Sa. Self-supervised Learning



2 Self-Supervised Learning
How can we get a system to learn without providing it with a supervisory signal?

3 Newly-sighted adults see but don't see
"Having often forgot which was the Cat, and which the Dog, he was asham'd to ask; but catching the Cat (which he knew by feeling) he was observ'd to look at her steadfastly and then setting her down, said, So Puss! I shall know you another Time." [Cheselden, 1728]
"When... the experiment was made of giving her a silver pencil case and a large key to examine with her hands; she discriminated and knew each distinctly; but when they were placed on the table, side by side, though she distinguished each with her eye, yet she could not tell which was the pencil case and which was the key." [Wardrop, 1827]
"Thus, for patient TG, telling a circle from a square, or either from a triangle was very difficult; he had to stare at the angles, one at a time, engaging in what we have called scanning, to do it." [Valvo, 1971]

4 Visual Cortical Areas
[Figure from Felleman, D.J. and Van Essen, D.C. (1991) Cerebral Cortex 1:1-47.]

5 Multisensory integration and cortical feedback help pattern recognition learning

6 Peterson-Barney Vowel Formant Dataset: Supervised Case (with labels)
[Figure: plot with axes first formant frequency [F1] in Hz and second formant frequency [F2] in Hz.]

7 Peterson-Barney Vowel Formant Dataset: Unsupervised Case (no labels)
[Figure: plot with axes first formant frequency [F1] in Hz and second formant frequency [F2] in Hz.]

8 Motivation for Approach: Self-Supervised Teaching
- Supervised: the input is paired with a target label (e.g. "cow"); such a label is implausible.
- Unsupervised: input only; limited power.
- Self-Supervised: derives the label from a co-occurring input to another modality (Input 1 paired with Input 2, e.g. "moo").

9-12 The Minimizing-Disagreement (M-D) Algorithm
[Figure, built up over four slides. For each modality $i = 1, 2$: the weighted class-conditional densities $P(C_A)\,p(x_i \mid C_A)$ and $P(C_B)\,p(x_i \mid C_B)$ plotted against $x_i$, then the marginal density $p(x_i)$.]
Minimize:
$$\int_{b_2}^{\infty}\int_{-\infty}^{b_1} p(x_1, x_2)\,dx_1\,dx_2 \;+\; \int_{-\infty}^{b_2}\int_{b_1}^{\infty} p(x_1, x_2)\,dx_1\,dx_2$$
Here $b_1$ and $b_2$ denote the class boundaries in Modality 1 and Modality 2; the two terms are the probability mass on which the two modalities disagree.
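The disagreement objective can be estimated from paired samples by counting how often the two modalities fall on opposite sides of their boundaries. The following is a minimal sketch under stated assumptions: the toy data (two classes, one Gaussian per class in each modality), the function names `disagreement` and `make_pairs`, and the threshold values are all hypothetical and are not taken from the slides.

```python
import random

def disagreement(pairs, b1, b2):
    """Fraction of paired samples (x1, x2) on which the two thresholds disagree.

    This is the empirical counterpart of
    P(x1 < b1, x2 > b2) + P(x1 > b1, x2 < b2).
    """
    mismatches = sum((x1 > b1) != (x2 > b2) for x1, x2 in pairs)
    return mismatches / len(pairs)

def make_pairs(n, seed=0):
    """Hypothetical toy data: a hidden class sets the mean in both modalities."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        mean = 2.0 if rng.random() < 0.5 else -2.0  # hidden class B or A
        pairs.append((rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)))
    return pairs

pairs = make_pairs(2000)
good = disagreement(pairs, 0.0, 0.0)   # boundaries between the two classes
bad = disagreement(pairs, 3.0, -3.0)   # boundaries pushed in opposite directions
print(f"disagreement at (0, 0): {good:.3f}, at (3, -3): {bad:.3f}")
```

No class labels are used anywhere in the estimate; only the co-occurrence of `x1` and `x2` matters. Note that the raw count is also zero when both boundaries assign every sample to the same class, so a practical procedure has to avoid that degenerate solution.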

13-15 Self-Supervised Teaching
[Figure, built up over three slides: Modality/Network 1 (Visual) and Modality/Network 2 (Auditory), each with Hidden Units and "Class" Units, joined at a multi-sensory object area.]

16-17 Self-Supervised Teaching
[Figure: the network receiving the visual input gets feedback of the class picked by the auditory input.]
Move the weight vector of the same class towards the input, and that of the other class away from it:
$$w_j(n+1) = w_j(n) \pm \alpha(n)\,\frac{X_n - w_j(n)}{\lVert X_n - w_j(n) \rVert}$$
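A single step of this update can be sketched as follows. The sketch assumes the step is normalized by the distance between the input and the weight vector, which is how the fraction on the slide reads; the function names `md_update` and `dist` and the numeric values are hypothetical.

```python
import math

def md_update(w, x, same_class, alpha):
    """Move w a distance alpha towards x (same class) or away from x (other class)."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return list(w)  # direction undefined when w already equals x
    sign = 1.0 if same_class else -1.0
    return [wi + sign * alpha * d / norm for wi, d in zip(w, diff)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

x = [1.0, 2.0]   # current input pattern
w = [0.0, 0.0]   # weight vector of one class unit
toward = md_update(w, x, same_class=True, alpha=0.1)
away = md_update(w, x, same_class=False, alpha=0.1)
print(dist(toward, x), dist(w, x), dist(away, x))
```

Because the direction is normalized, every step has length `alpha` regardless of how far the weight vector is from the input. In the self-supervised setting, `same_class` would be decided by the class the other modality picked rather than by an external label.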

18 Data Collection
[Figure: processing pipelines. Auditory processing: filter, 24 frequency channels x 9 time windows. Visual processing: normal flow, average, 25 spatial sums x 5 frames in time.]

19 Sample Patterns
[Figure: sample patterns for /WA/ and /BA/, shown as time x space and time x frequency.]

20 Results on Visual-Auditory Dataset
Percentage Correct (Generalization Performance)
[Table: rows Auditory Network and Visual Network; columns Initial Labeling and M-D (Self-Supervised) and LVQ 2.1 (Supervised). The values are missing from this copy.]

21 Simulation Conclusions
- Clustering performance improves when information from other sensory modalities is used.
- The minimizing-disagreement algorithm is a simple, effective way of using the cross-sensory correlations.
- Feedback connections are crucial to feed back the information from the other sensory modalities.

22 Why are sensory modalities separated and connected the way they are?
[Figure: diagram with labels V1, V2, IT, A1, A2, somatosensation, olfaction, and "Multi-sensory integration areas, Hippocampus".]

23 Why not mix up visual and auditory inputs?
[Figure: processing pipelines. Auditory processing: filter, 24 frequency channels x 9 time windows. Visual processing: normal flow, average, 25 spatial sums x 5 frames in time.]

24 Dividing Modalities into Sub-Modalities

25-28 Performance of Sub-Modalities
All numbers give percent correct classifications on independent test sets ± standard deviations.

Pseudo-Modality     Supervised Performance
Ax                  89 ± 2
Ay                  91 ± 2
Vx                  83 ± 2
Vy                  77 ± 3

Trained By          Ax       Ay       Vx       Vy
Performance of Ax   N/A      69 ± 5   69 ± 3   63 ± 3
Performance of Ay   74 ± 5   N/A      80 ± 4   74 ± 5

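29 Results for all Combinations of Pseudo-Modalities
[Figure: results for each split of the four pseudo-modalities into Pseudo-Modality 1 and Pseudo-Modality 2. The splits shown are:]
Ax | Ay,Vx,Vy
Ay | Ax,Vx,Vy
Ax,Ay,Vy | Vx
Ax,Ay,Vx | Vy
Ax,Ay | Vx,Vy
Ax,Vx | Ay,Vy
Ay,Vx | Ax,Vy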
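The seven splits on this slide are exactly the ways of dividing four pseudo-modalities into two non-empty groups. A short sketch that enumerates them is given below; the function name `two_way_splits` is hypothetical, and the order in which the two sides of a split are listed may differ from the slide.

```python
from itertools import combinations

def two_way_splits(items):
    """All ways to split items into two non-empty groups, ignoring group order.

    Fixing the first item on the left side avoids listing each split twice.
    """
    items = list(items)
    first, rest = items[0], items[1:]
    splits = []
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(i for i in items if i not in left)
            if right:  # skip the case where everything is on the left
                splits.append((left, right))
    return splits

splits = two_way_splits(["Ax", "Ay", "Vx", "Vy"])
for left, right in splits:
    print(",".join(left), "|", ",".join(right))
```

For four items this yields 2^3 - 1 = 7 splits: four that isolate a single pseudo-modality and three that pair them two against two.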

30 Removing same-modality inputs from other side
[Figure: two panels over the splits Ax | Ay,Vx,Vy; Ay | Ax,Vx,Vy; Ax,Ay,Vy | Vx; Ax,Ay,Vx | Vy, with the two sides labelled P-M 1 and P-M 2.]

31 Joint Structure for Different Correlational Relationships
[Figure: distributions in one modality, and joint distributions for three values of ρ. The ρ values are missing from this copy.]
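To get a feel for how the joint structure changes with ρ, pairs with a chosen correlation can be generated from two independent standard normal draws. This is a minimal sketch: the ρ values 0.0, 0.6 and 0.9 are assumed for illustration (the values used on the slide are not recoverable), and `correlated_pairs` and `pearson` are hypothetical helper names.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def correlated_pairs(rho, n, seed=0):
    """Draw n pairs (x1, x2) of standard normals with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # independent noise
        pairs.append((x1, rho * x1 + math.sqrt(1.0 - rho * rho) * z))
    return pairs

estimates = {}
for rho in (0.0, 0.6, 0.9):  # assumed values, for illustration only
    pairs = correlated_pairs(rho, 5000)
    estimates[rho] = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    print(f"target rho = {rho:.1f}, sample correlation = {estimates[rho]:.3f}")
```

Both marginals stay standard normal for every ρ, which is the point of the slide title: the distribution within one modality can look identical while the joint structure differs.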

32 Correlations in the Auditory-Visual Speech Dataset
[Figure: two panels, Full Correlations and Within-Class Correlations.]
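The difference between full and within-class correlation can be illustrated on toy data in which the two modalities are related only through a shared class. This is a sketch on synthetic data, not the auditory-visual speech dataset; the class means, sample size, and the helper name `pearson` are assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(0)
data = []  # (label, x1, x2): the label sets the mean, the noise is independent
for _ in range(2000):
    label = rng.choice("AB")
    mean = -2.0 if label == "A" else 2.0
    data.append((label, rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)))

# Correlation over all samples, ignoring class.
full = pearson([d[1] for d in data], [d[2] for d in data])

# Correlation computed separately inside each class.
within = {}
for c in "AB":
    rows = [d for d in data if d[0] == c]
    within[c] = pearson([r[1] for r in rows], [r[2] for r in rows])

print(f"full: {full:.2f}, within A: {within['A']:.2f}, within B: {within['B']:.2f}")
```

In this toy setup the full correlation is high while the within-class correlations are near zero: the two inputs agree about the class but carry independent noise, so each reaches its conclusion independently of the other.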

33 Conclusions
- The best teaching signal is one that independently comes to the same conclusion.
- Different sensory modalities are appropriate for teaching other modalities (and not as appropriate for teaching their own).
- This suggests a different role for lateral and feedback connections.
