Neural Networks. Neural Network Motivation. Why Neural Networks? Comments on Blue Gene. More Comments on Blue Gene

Motivation for Non-linear Classifiers

Neural Networks
CPS 271
Ron Parr

- Linear methods are weak: they make strong assumptions and can only express relatively simple functions of the inputs
- Coming up with good features can be hard
- Why not make the classifier do more work for us?
  - What does the space of hypotheses look like?
  - How do we navigate in this space?

Neural Network Motivation
- Human brains are the only known example of actual intelligence
- Individual neurons are slow, boring
- Brains succeed by using massive parallelism
- Idea: copy what works
- Raises many issues:
  - Is the computational metaphor suited to the computational hardware?
  - How do we know if we are copying the important part?
  - Are we aiming too low?

Why Neural Networks?
Maybe computers should be more brain-like:

                        Computers                 Brains
  Computational units   10^9 gates/CPU            10^11 neurons
  Storage units         10^10 bits RAM,           10^11 neurons,
                        10^12 bits HD             10^14 synapses
  Cycle time            10^-9 s                   10^-3 s
  Bandwidth             10^10 bits/s              10^14 bits/s
  Compute power         10^10 ops/s               10^14 ops/s

Comments on Blue Gene
- Blue Gene: World's Fastest Supercomputer, 360 teraflops
- Currently at 3,000 processors; 10^13 -> 10^14 ops/s (brain level?)
- 6 TB memory (~10^13 bits)
- 4 megawatts of power ($1M/year in electricity)
- 2,500 sq ft in size (a nice-sized house)
- Pictures and other details: http://domino.research.ibm.com/comm/pr.nsf/pages/rsc.bluegene_2004.html

More Comments on Blue Gene
- What's wrong with this picture? Weight, size, power consumption
- What's missing? It still can't replicate human abilities (though it vastly exceeds human abilities in many areas)
- Are we running the wrong programs?
- Is the architecture well suited to the programs we might need to run?

Artificial Neural Networks
- Develop an abstraction of the function of actual neurons
- Simulate large, massively parallel artificial neural networks on conventional computers
- Some have tried to build the hardware too
- Try to approximate human learning, robustness to noise, robustness to damage, etc.

Uses of neural networks
- Trained to pronounce English
  - Training set: sliding window over text, sounds
  - 95% accuracy on the training set, 78% accuracy on the test set
- Trained to recognize handwritten digits: >99% accuracy
- Trained to drive (Pomerleau's "no hands across America")

Neural Network Lore
- Neural nets have been adopted with an almost religious fervor within the AI community - several times
- Often ascribed near-magical powers by people, usually those who know the least about computation or brains
- For most AI people, the magic is gone, but neural nets remain extremely interesting and useful mathematical objects

Artificial Neurons
A node/neuron i receives inputs a_j over links with weights w_{j,i} and computes
  a_i = f( sum_j w_{j,i} a_j )
f can be any function, but is usually a smoothed step function g (see the sketch below).

Threshold Functions
[Plots of the two threshold functions]
- g(x) = tanh(x) or 1/(1+exp(-x)) (logistic regression)
- g(x) = sgn(x) (perceptron)

Network Architectures
- Cyclic vs. acyclic
- Cyclic is tricky, but more biologically plausible
  - Hard to analyze in general
  - May not be stable
  - Need to assume latches to avoid race conditions
  - Hopfield nets: a special type of cyclic net useful for associative memory
- Single layer (perceptron)
- Multiple layer
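As a concrete illustration of the artificial neuron and threshold functions above, here is a minimal sketch (not from the slides) of a single unit computing a_i = f(sum_j w_{j,i} a_j) with the tanh, logistic, and sgn activations mentioned; the function and variable names are my own.

```python
import numpy as np

def logistic(x):
    # Smoothed step function: 1 / (1 + exp(-x)), as in logistic regression
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(weights, inputs, f=np.tanh):
    # a_i = f(sum_j w_ji * a_j): weighted sum of inputs passed through
    # a (usually smoothed) step function f
    return f(np.dot(weights, inputs))

# Example: one unit with three inputs, evaluated with different threshold functions
w = np.array([0.5, -1.0, 2.0])
a = np.array([1.0, 0.3, 0.2])
print(neuron_output(w, a, np.sign))   # sgn(x): perceptron-style hard threshold
print(neuron_output(w, a, np.tanh))   # tanh(x): smoothed step
print(neuron_output(w, a, logistic))  # 1/(1+exp(-x))
```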

Feedforward Networks
- We consider acyclic networks
- One or more computational layers
- The entire network can be viewed as computing a complex non-linear function
- Typical uses in learning:
  - Classification (usually involving complex patterns)
  - General continuous function approximation

Perceptron
A single node/neuron combines inputs x through weights w and outputs Y = f(w^T x), where f is a simple step function (sgn).

Perceptron Learning Update Rule
- We are given a set of inputs x^(1), ..., x^(n)
- t^(1), ..., t^(n) is a set of target outputs (boolean) in {-1, 1}
- w is our set of weights; the output of the perceptron is w^T x
- perceptron_error(x^(i), w) = -net(x^(i), w) * t^(i)
- Goal: pick w to optimize
    min_w  sum over misclassified i of perceptron_error(x^(i), w)
- Repeat until convergence, for each misclassified i:
    w_k <- w_k + alpha * x_k^(i) * t^(i)
  where i iterates over samples, k iterates over weights, and alpha is the learning rate (can be any constant); see the sketch below
- Demo: http://neuron.eng.wayne.edu/java/perceptron/new38.html

Observations
- Linear separability is fairly weak, but we have other tricks:
  - Functions that are not linearly separable in one space may be linearly separable in another space
  - If we engineer the inputs to our neural network, then we change the space in which we are constructing linear separators
  - Every function has a linear separator (in some space)
  - Perhaps other network architectures will help
- [Figures:] Is red linearly separable from green? Are the circles linearly separable from the squares?
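The update rule above fits in a few lines of code. This is a minimal sketch of my own, assuming inputs x^(i) in R^d with targets t^(i) in {-1, +1} and a fixed learning rate alpha; it is not code from the slides.

```python
import numpy as np

def train_perceptron(X, t, alpha=1.0, max_epochs=100):
    """X: (n, d) array of inputs; t: (n,) array of targets in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_epochs):
        misclassified = 0
        for i in range(n):
            # net(x, w) = w^T x; example i is misclassified if its sign disagrees with t[i]
            if np.sign(np.dot(w, X[i])) != t[i]:
                # w_k <- w_k + alpha * x_k^(i) * t^(i) for every weight k
                w += alpha * X[i] * t[i]
                misclassified += 1
        if misclassified == 0:   # repeat until convergence
            break
    return w

# Usage: a linearly separable AND-style problem, with the bias folded in
# as a constant first input
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1])
w = train_perceptron(X, t)
print(np.sign(X @ w))  # should match t
```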

Multilayer Networks
- Once people realized how simple perceptrons were, they lost interest in neural networks for a while
- Multilayer networks turn out to be much more expressive (with a smoothed step function)
  - Use a sigmoid, e.g., tanh(w^T x)
- With 2 layers, can represent any continuous function
- With 3 layers, can represent many discontinuous functions
- Tricky part: how to adjust the weights

Smoothing Things Out
- Idea: do gradient descent on a smooth error function
- The error function is the sum of squared errors
- Consider a single training example first:
    E = 0.5 * error(X, w)^2
  where each unit j computes a_j = sum_i w_{ij} z_i and z_j = f(a_j)

Propagating Errors
- For output units (assuming no weights on the outputs): delta_j = y_j - t_j
- For hidden units: delta_j = f'(a_j) * sum_k w_{jk} delta_k, where the sum ranges over all nodes k that unit j feeds into
- As before, a_j = sum_i w_{ij} z_i and z_j = f(a_j)

Putting It Together
- Apply input x to the network (sum for multiple inputs)
- Compute all activation levels
- Compute the final output
- Compute delta for the output units: delta_j = y_j - t_j
- Backpropagate the delta's to the hidden units: delta_j = f'(a_j) * sum_k w_{jk} delta_k
- Compute the gradient update: dE/dw_{ij} = delta_j z_i (see the sketch below)

Summary of the Gradient Update
- The gradient calculation and parameter update have a recursive formulation
- Decomposes into local message passing
- No transcendentals: g'(x) = 1 - g(x)^2 for tanh(x)
- Highly parallelizable
- Biologically plausible(?)
- This is the celebrated backpropagation algorithm

Good News
- Can represent any continuous function with two layers (1 hidden)
- Can represent essentially any function with 3 layers (but how many hidden nodes?)
- Multilayer nets are a universal approximation architecture with a highly parallelizable training algorithm
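To make the recursive delta computation concrete, here is a small sketch of backprop for a two-layer (one hidden layer) tanh network with a linear output and squared error, following the formulas above: the output delta is y - t, the hidden delta is f'(a_j) * sum_k w_{jk} delta_k with f'(a) = 1 - f(a)^2 for tanh, and each gradient term is delta_j z_i. The shapes, names, and the plain gradient-descent step are my own choices, not taken from the slides.

```python
import numpy as np

def forward(x, W1, W2):
    a1 = W1 @ x            # hidden pre-activations a_j = sum_i w_ij x_i
    z1 = np.tanh(a1)       # hidden activations z_j = f(a_j)
    y = W2 @ z1            # linear output unit
    return a1, z1, y

def backprop(x, t, W1, W2):
    a1, z1, y = forward(x, W1, W2)
    delta_out = y - t                      # output units: delta = y - t
    # hidden units: delta_j = f'(a_j) * sum_k w_jk delta_k,
    # with f'(a) = 1 - tanh(a)^2 (no transcendentals needed)
    delta_hidden = (1.0 - z1 ** 2) * (W2.T @ delta_out)
    grad_W2 = np.outer(delta_out, z1)      # dE/dw_jk = delta_k * z_j
    grad_W1 = np.outer(delta_hidden, x)    # dE/dw_ij = delta_j * x_i
    return grad_W1, grad_W2

# One gradient-descent step on a single training example
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, t, alpha = np.array([0.5, -1.0]), np.array([1.0]), 0.1
g1, g2 = backprop(x, t, W1, W2)
W1 -= alpha * g1
W2 -= alpha * g2
```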

Back-prop Issues
- Backprop = gradient descent on an error function
- The function is nonlinear (= powerful)
- The function is nonlinear (= local minima)
- Big nets: many parameters
  - Many optima
  - Slow gradient descent
  - Risk of overfitting
- Biological plausibility
- Electronic plausibility
- Many NN experts became experts in numerical analysis (by necessity)

Neural Network Tricks
- Many gradient descent acceleration tricks
- Early stopping (see the sketch below)
- Methods of enforcing transformation invariance:
  - Modify the error function
  - Transform/augment the training data
  - Weight sharing
- Handcrafted network architectures

Neural Nets in Practice
- Many applications for pattern recognition tasks
- Very powerful representation
  - Can overfit
  - Can fail to fit with too many parameters or poor features
- Very widely deployed AI technology, but:
  - Few open research questions (best way to get a machine learning paper rejected: "Neural Network" in the title)
  - Connection to biology still uncertain
  - Results are hard to interpret
- Second best way to solve any problem: can do just about anything with enough twiddling
  - Now third or fourth to SVMs, boosting, and???
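Of the tricks listed above, early stopping is easy to illustrate: hold out a validation set and stop training once validation error stops improving. The sketch below is my own and is deliberately generic; `train_step` and `val_error` stand in for whatever training pass and validation measure the network uses, and the toy usage just feeds in a fixed error sequence.

```python
def early_stopping(train_step, val_error, max_epochs=500, patience=10):
    """Run train_step() each epoch; stop when val_error() has not improved
    for `patience` consecutive epochs. Returns the best validation error seen."""
    best, since_best = float("inf"), 0
    for _ in range(max_epochs):
        train_step()
        err = val_error()
        if err < best:
            best, since_best = err, 0   # new best: reset the patience counter
        else:
            since_best += 1
            if since_best >= patience:  # validation error stopped improving
                break
    return best

# Toy usage: validation error improves, then plateaus
errs = iter([1.0, 0.5, 0.3, 0.31, 0.32, 0.33] + [0.34] * 20)
print(early_stopping(train_step=lambda: None, val_error=lambda: next(errs), patience=3))
```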