Cogsci 118B. Virginia de Sa. Self-supervised Learning
1 Cogsci 118B. Virginia de Sa. Self-Supervised Learning
2 Self-Supervised Learning: How can we get a system to learn without providing it with a supervisory signal?
3 Newly-sighted adults see but don't see
"Having often forgot which was the Cat, and which the Dog, he was asham'd to ask; but catching the Cat (which he knew by feeling) he was observ'd to look at her steadfastly and then setting her down, said, So Puss! I shall know you another Time." [Cheselden, 1728]
"When... the experiment was made of giving her a silver pencil case and a large key to examine with her hands; she discriminated and knew each distinctly; but when they were placed on the table, side by side, though she distinguished each with her eye, yet she could not tell which was the pencil case and which was the key." [Wardrop 1827]
"Thus, for patient TG, telling a circle from a square, or either from a triangle was very difficult; he had to stare at the angles, one at a time, engaging in what we have called scanning, to do it." [Valvo 1971]
4 Visual Cortical Areas. [Figure from Felleman, D.J. and Van Essen, D.C. (1991) Cerebral Cortex 1:1-47.]
5 Multisensory integration and cortical feedback help pattern recognition learning
6 Peterson-Barney Vowel Formant Dataset: Supervised Case (with labels). [Scatter plot; axes: first formant frequency F1 (Hz) and second formant frequency F2 (Hz).]
7 Peterson-Barney Vowel Formant Dataset: Unsupervised Case (no labels). [Scatter plot; axes: first formant frequency F1 (Hz) and second formant frequency F2 (Hz).]
8 Motivation for Approach: Self-Supervised Teaching
- Supervised (Input plus a Target label, e.g. "cow"): implausible label
- Unsupervised (Input only): limited power
- Self-Supervised (Input 1 plus Input 2, e.g. "moo"): derives label from a co-occurring input to another modality
9-12 The Minimizing-Disagreement (M-D) Algorithm. [Figure: for Modality 1, the prior-weighted class densities P(C_A)p(x_1|C_A) and P(C_B)p(x_1|C_B) and the marginal p(x_1); for Modality 2, P(C_A)p(x_2|C_A) and P(C_B)p(x_2|C_B) and the marginal p(x_2).]
Minimize:
$$\int_{x_1=-\infty}^{b_1}\int_{x_2=b_2}^{\infty} p(x_1, x_2)\,dx_1\,dx_2 \;+\; \int_{x_1=b_1}^{\infty}\int_{x_2=-\infty}^{b_2} p(x_1, x_2)\,dx_1\,dx_2$$
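To make the objective concrete, here is a minimal sketch (not code from the lecture) that evaluates the disagreement probability for two classes. It assumes that `b1` and `b2` are the decision boundaries in Modality 1 and Modality 2, that each modality is a one-dimensional unit-variance Gaussian per class, and that the two modalities are independent given the class. The class means, the equal priors, and the names `norm_cdf` and `disagreement` are illustrative assumptions.

```python
import math

def norm_cdf(x, mean, std):
    """Cumulative distribution function of a 1-D Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# Hypothetical two-class setup: (prior, mean in modality 1, mean in modality 2).
# Both modalities are assumed unit-variance and independent given the class.
CLASSES = [(0.5, -2.0, -2.0),   # class A
           (0.5, +2.0, +2.0)]   # class B

def disagreement(b1, b2, classes=CLASSES, std=1.0):
    """P(x1 < b1, x2 > b2) + P(x1 > b1, x2 < b2) under the assumed mixture."""
    total = 0.0
    for prior, m1, m2 in classes:
        below1 = norm_cdf(b1, m1, std)   # P(x1 < b1 | class)
        below2 = norm_cdf(b2, m2, std)   # P(x2 < b2 | class)
        total += prior * (below1 * (1.0 - below2) + (1.0 - below1) * below2)
    return total

if __name__ == "__main__":
    for b1, b2 in ((0.0, 0.0), (2.0, 2.0), (0.0, 2.0)):
        print(f"b1 = {b1:+.1f}, b2 = {b2:+.1f}: "
              f"disagreement = {disagreement(b1, b2):.4f}")
```

Under these assumed parameters, the printed disagreement is lowest when both boundaries sit midway between the class means, and higher when either boundary is shifted onto a class mean.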
13-15 Self-Supervised Teaching. [Figure: network diagram with labels "Class" Units; Multi-sensory object area; Hidden Units; Modality/Network 1 (Visual); Modality/Network 2 (Auditory).]
16-17 Self-Supervised Teaching. [Figure: visual input; feedback of class picked by auditory input.]
Move the weight from the same class towards the input pattern X_n, and that from the other class away:
$$w_j(n+1) = w_j(n) \pm \alpha(n)\,\frac{X_n - w_j(n)}{\lVert X_n - w_j(n)\rVert}$$
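The update rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the lecture's implementation: it reads the rule as a step of fixed length α along the unit vector from the weight to the pattern, with the sign chosen by whether the weight's class matches the class fed back from the other modality. The function names `md_update` and `distance` are hypothetical, and the choice of which weights get updated is not shown.

```python
import math

def md_update(w, x, alpha, same_class):
    """One normalized step: w <- w +/- alpha * (x - w) / ||x - w||."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return list(w)   # weight already sits on the pattern; no direction to move
    sign = 1.0 if same_class else -1.0   # towards for same class, away otherwise
    return [wi + sign * alpha * d / norm for wi, d in zip(w, diff)]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

if __name__ == "__main__":
    x = [1.0, 2.0]                        # current input pattern X_n
    w_same, w_other = [0.0, 0.0], [2.0, 2.0]
    print(md_update(w_same, x, 0.1, same_class=True))    # moves towards x
    print(md_update(w_other, x, 0.1, same_class=False))  # moves away from x
```

Because the step is normalized, each update changes the distance between the weight and the pattern by exactly `alpha`, regardless of how far apart they start.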
18 Data Collection. [Figure labels: Auditory Processing: Filter..., 24 frequency channels x 9 time windows. Visual Processing: Normal Flow, 25 spatial sums (Σ) x 5 frames in time. Also labeled: Average.]
19 Sample Patterns. [Figure: example /WA/ and /BA/ patterns; axes labeled space and time, and frequency and time.]
20 Results on Visual-Auditory Dataset. [Figure: percentage correct (generalization performance) for the Auditory Network and the Visual Network, under Self-Supervised (Initial Labeling, M-D) and Supervised (LVQ 2.1) training.]
21 Simulation Conclusions
- Clustering performance improves when information from other sensory modalities is used.
- The minimizing-disagreement algorithm is a simple, effective way of using the cross-sensory correlations.
- Feedback connections are crucial to feed back the information from the other sensory modalities.
22 Why are sensory modalities separated and connected the way they are? [Figure labels: Multi-sensory integration areas, Hippocampus; olfaction; IT; V2; ?; A2; A1; somatosensation; V1.]
23 Why not mix up visual and auditory inputs? [Figure labels: Auditory Processing: Filter..., 24 frequency channels x 9 time windows. Visual Processing: Normal Flow, 25 spatial sums (Σ) x 5 frames in time. Also labeled: Average.]
24 Dividing Modalities into Sub-Modalities
25-28 Performance of Sub-Modalities. All numbers give percent correct classifications on independent test sets ± standard deviations.

Pseudo-Modality    Supervised Performance
Ax                 89 ± 2
Ay                 91 ± 2
Vx                 83 ± 2
Vy                 77 ± 3

Trained By           Ax        Ay        Vx        Vy
Performance of Ax    N/A       69 ± 5    69 ± 3    63 ± 3
Performance of Ay    74 ± 5    N/A       80 ± 4    74 ± 5
29 Results for All Combinations of Pseudo-Modalities. [Figure: results for each split into Pseudo-Modality 1 / Pseudo-Modality 2: Ax / Ay,Vx,Vy; Ay / Ax,Vx,Vy; Ax,Ay,Vy / Vx; Ax,Ay,Vx / Vy; Ax,Ay / Vx,Vy; Ax,Vx / Ay,Vy; Ay,Vx / Ax,Vy.]
30 Removing same-modality inputs from other side. [Figure: splits into P-M 1 / P-M 2: Ax / Ay,Vx,Vy; Ay / Ax,Vx,Vy; Ax,Ay,Vy / Vx; Ax,Ay,Vx / Vy.]
31 Joint Structure for Different Correlational Relationships. [Figure panels: Distributions in One Modality; three panels titled "Joint Distribution with ρ =", each for a different ρ.]
32 Correlations in the Auditory-Visual Speech Dataset. [Figure panels: Full Correlations; Within-Class Correlations.]
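The distinction between full and within-class correlations can be illustrated with a short sketch on synthetic data. It does not use the auditory-visual speech dataset: the two classes, their means, the unit variances, and the names `pearson`, `make_data`, and `full_and_within` are all assumptions made for illustration. The two "modalities" are drawn independently given the class, so any correlation between them comes only from the shared class.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def make_data(n_per_class=2000, seed=0):
    """Synthetic (label, modality-1 value, modality-2 value) triples."""
    rng = random.Random(seed)
    data = []
    for label, mean in (("A", -2.0), ("B", 2.0)):
        for _ in range(n_per_class):
            # The two values are independent draws given the class.
            data.append((label, rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)))
    return data

def full_and_within(data):
    """Correlation over all samples, and separately within each class."""
    full = pearson([a for _, a, _ in data], [v for _, _, v in data])
    within = {}
    for label in sorted({lab for lab, _, _ in data}):
        rows = [(a, v) for lab, a, v in data if lab == label]
        within[label] = pearson([a for a, _ in rows], [v for _, v in rows])
    return full, within

if __name__ == "__main__":
    full, within = full_and_within(make_data())
    print("full correlation:", round(full, 3))
    for label, r in within.items():
        print(f"within-class correlation ({label}):", round(r, 3))
```

In this toy setup the full correlation is high while the within-class correlations are near zero, which is the situation where one modality carries class information the other cannot predict from its own noise.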
33 Conclusions
- The best teaching signal is one that independently comes to the same conclusion.
- Different sensory modalities are appropriate for teaching other modalities (and not as appropriate for teaching their own).
- This suggests a different role for lateral and feedback connections.