1. MACHINE LEARNING AND PATTERN RECOGNITION, Fall 2004, Lecture 1: Introduction and Basic Concepts. Yann LeCun, The Courant Institute, New York University.

2. Before we get started...

Course web site: yann/2004f-g /index.html

Evaluation: assignments (mostly small programming projects) [60%] + a larger final project [40%].

Course mailing list:

Text books: mainly Pattern Classification by Duda, Hart, and Stork, but a number of other books can be used as reference material: Neural Networks for Pattern Recognition by Bishop, and The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman... but we will mostly use research papers and tutorials.

Formal prerequisite: linear algebra. You might want to brush up on probability theory, multivariate calculus (partial derivatives...), optimization (the least-squares method...), and the method of Lagrange multipliers for constrained optimization. We will review those topics in class.

Programming projects: can be done in any language, but I STRONGLY recommend using Lush ( ). Skeleton code in Lush will be provided for most projects.

3. What is Learning?

Learning is acquiring and improving performance through experience. Pretty much all animals with a central nervous system are capable of learning (even the simplest ones).

What does it mean for a computer to learn? Why would we want them to learn? How do we get them to learn?

We want computers to learn when it is too difficult or too expensive to program them directly to perform a task. Get the computer to program itself by showing it examples of inputs and outputs. In reality, we will write a parameterized program, and let the learning algorithm find the set of parameters that best approximates the desired function or behavior.

4. Different Types of Learning

Supervised Learning: given training examples of inputs and corresponding outputs, produce the correct outputs for new inputs. Example: character recognition.

Reinforcement Learning (similar to animal learning): an agent takes inputs from the environment, and takes actions that affect the environment. Occasionally, the agent gets a scalar reward or punishment. The goal is to learn to produce action sequences that maximize the expected reward (e.g. driving a robot without bumping into obstacles). I won't talk much about that in this course.

Unsupervised Learning: given only inputs as training data, find structure in the world: discover clusters, manifolds, characterize the areas of the space to which the observed inputs belong (e.g. clustering, probability density estimation, novelty detection, compression, embedding).

5. Related Fields

Statistical Estimation: statistical estimation attempts to solve the same problem as machine learning. Most learning techniques are statistical in nature.

Pattern Recognition: pattern recognition is when the output of the learning machine is a set of discrete categories.

Neural Networks: neural nets are now one of many techniques for statistical machine learning.

Data Mining: data mining is a large application area for machine learning.

Adaptive Optimal Control: non-linear adaptive control techniques are very similar to machine learning methods.

Machine learning methods are an essential ingredient in many fields: bio-informatics, natural language processing, web search and text classification, speech and handwriting recognition, fraud detection, financial time-series prediction, industrial process control, database marketing...

6. Applications

- handwriting recognition, OCR: reading checks and zip codes, handwriting recognition for tablet PCs.
- speech recognition, speaker recognition/verification.
- security: face detection and recognition, event detection in videos.
- text classification: indexing, web search.
- computer vision: object detection and recognition.
- diagnosis: medical diagnosis (e.g. Pap smear processing).
- adaptive control: locomotion control for legged robots, navigation for mobile robots, minimizing pollutant emissions for chemical plants, predicting consumption for utilities...
- fraud detection: e.g. detection of unusual usage patterns for credit cards or calling cards.
- database marketing: predicting who is more likely to respond to an ad campaign (...and the antidote: spam filtering).
- games (e.g. backgammon).
- financial prediction (many people on Wall Street use machine learning).

7. Demos / Concrete Examples

- Handwritten digit recognition: supervised learning for classification.
- Handwritten word recognition: weakly supervised learning for classification with many classes.
- Face detection: supervised learning for detection (faces against everything else in the world).
- Object recognition: supervised learning for detection and recognition with highly complex variabilities.
- Robot navigation: supervised learning and reinforcement learning for control.

8. Two Kinds of Supervised Learning

Regression: also known as curve fitting or function approximation. Learn a continuous input-output mapping from a limited number of (possibly noisy) examples.

Classification: outputs are discrete variables (category labels). Learn a decision boundary that separates one class from the other. Generally, a confidence is also desired (how sure are we that the input belongs to the chosen category?).

9. Unsupervised Learning

Unsupervised learning comes down to this: if the input looks like the training samples, output a small number; if it doesn't, output a large number. This is a horrendously ill-posed problem in high dimension. To do it right, we must guess/discover the hidden structure of the inputs. Methods differ by their assumptions about the nature of the data.

A special case, density estimation: find a function $f$ such that $f(x)$ approximates the probability density of $X$, $p(x)$, as well as possible.

Clustering: discover clumps of points.
Embedding: discover a low-dimensional manifold or surface near which the data lives.
Compression/Quantization: discover a function that, for each input, computes a compact code from which the input can be reconstructed.

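To make the density-estimation special case concrete, here is a minimal Parzen-window sketch in Python/NumPy. It is not from the course; the Gaussian kernel and the bandwidth h are my assumptions, and h is the knob that encodes the "assumptions about the nature of the data":

    import numpy as np

    def parzen_density(x, samples, h=0.5):
        # f(x) = (1/P) sum_p N(x; X^p, h^2 I): an average of Gaussian bumps
        # centered on the training samples. Large near the data, small far away.
        d = samples.shape[1]
        sq_dists = np.sum(((x - samples) / h) ** 2, axis=1)
        norm = (2 * np.pi) ** (d / 2) * h ** d
        return np.mean(np.exp(-0.5 * sq_dists)) / norm

    X = np.random.randn(100, 1)                  # training samples from N(0, 1)
    print(parzen_density(np.array([0.0]), X))    # relatively large near the data
    print(parzen_density(np.array([5.0]), X))    # close to zero far from the data
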
10. Learning is NOT Memorization

Rote learning is easy: just memorize all the training examples and their corresponding outputs. When a new input comes in, compare it to all the memorized samples, and produce the output associated with the matching sample.

PROBLEM: in general, new inputs are different from training samples. The ability to produce correct outputs or behavior on previously unseen inputs is called GENERALIZATION. Rote learning is memorization without generalization.

The big question of learning theory (and practice): how do we get good generalization with a limited number of examples?

11. A Simple Trick: Nearest Neighbor Matching

Instead of insisting that the input be exactly identical to one of the training samples, let's compute the distances between the input and all the memorized samples (a.k.a. the prototypes).

1-Nearest-Neighbor Rule: pick the class of the nearest prototype.
K-Nearest-Neighbor Rule: pick the class that has the majority among the K nearest prototypes.

PROBLEM: what is the right distance measure?
PROBLEM: this is horrendously expensive if the number of prototypes is large.
PROBLEM: do we have any guarantee that we get the best possible performance as the number of training samples increases?

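A minimal K-nearest-neighbor classifier in Python/NumPy (a sketch, not the course's Lush skeleton code). It simply assumes Euclidean distance, which is exactly the first PROBLEM above, and its brute-force scan over all prototypes illustrates the second:

    import numpy as np

    def knn_classify(x, prototypes, labels, k=1):
        # Distances from x to every memorized prototype (brute force).
        dists = np.linalg.norm(prototypes - x, axis=1)
        # Majority vote among the labels of the k nearest prototypes.
        nearest = labels[np.argsort(dists)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        return values[np.argmax(counts)]

    prototypes = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
    labels = np.array([0, 0, 1, 1])
    print(knn_classify(np.array([0.5, 0.5]), prototypes, labels, k=3))  # -> 0
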
12. How Biology Does It

The first attempts at machine learning in the 50s, and the development of artificial neural networks in the 80s and 90s, were inspired by biology.

Nervous systems are networks of neurons interconnected through synapses. Learning and memory are changes in the efficacy of the synapses.

HUGE SIMPLIFICATION: a neuron computes a weighted sum of its inputs (where the weights are the synaptic efficacies) and fires when that sum exceeds a threshold.

Hebbian learning (from Hebb, 1947): synaptic weights change as a function of the pre- and post-synaptic activities.

Orders of magnitude: each neuron has $10^3$ to $10^5$ synapses. Brain sizes (number of neurons): house fly: $10^5$; mouse: ...; human: ...

13. The Linear Classifier

Historically, the linear classifier was designed as a highly simplified model of the neuron (McCulloch and Pitts 1943, Rosenblatt 1957):

    $y = f\left( \sum_{i=0}^{n} w_i x_i \right)$

where $f$ is the threshold function: $f(z) = +1$ if $z > 0$, $f(z) = -1$ otherwise. $x_0$ is assumed to be constant equal to 1, and $w_0$ is interpreted as a bias.

In vector form, with $W = (w_0, w_1, \ldots, w_n)$ and $X = (1, x_1, \ldots, x_n)$:

    $y = f(W^\top X)$

The hyperplane $W^\top X = 0$ partitions the space into two categories. $W$ is orthogonal to the hyperplane.

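The threshold unit above in a few lines of Python/NumPy (my sketch; the input is augmented with the constant $x_0 = 1$ so that $w_0$ carries the bias, as on the slide):

    import numpy as np

    def linear_classify(W, x):
        # y = f(sum_{i=0}^{n} w_i x_i), with f the +/-1 threshold function.
        X = np.concatenate(([1.0], x))       # x_0 = 1, bias component
        return 1 if W @ X > 0 else -1

    W = np.array([-1.0, 1.0, 1.0])           # hyperplane x_1 + x_2 = 1
    print(linear_classify(W, np.array([2.0, 2.0])))  # -> +1
    print(linear_classify(W, np.array([0.0, 0.0])))  # -> -1
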
14. Vector Inputs

With vector-based classifiers such as the linear classifier, we must represent objects in the world as vectors. Each component is a measurement or a feature of the object to be classified. For example, the grayscale values of all the pixels in an image can be seen as a (very high-dimensional) vector.

15. A Simple Idea for Learning: Error Correction

We have a training set $S$ consisting of $P$ input-output pairs: $S = \{(X^1, y^1), (X^2, y^2), \ldots, (X^P, y^P)\}$.

A very simple algorithm:
- show each sample in sequence, repetitively;
- if the output is correct: do nothing;
- if the output is -1 and the desired output is +1: increase the weights whose inputs are positive, decrease the weights whose inputs are negative;
- if the output is +1 and the desired output is -1: decrease the weights whose inputs are positive, increase the weights whose inputs are negative.

More formally, for sample $p$:

    $w_i(t+1) = w_i(t) + \left(y^p - f(W^\top X^p)\right) x_i^p$

This simple algorithm is called the Perceptron learning procedure (Rosenblatt 1957).

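A sketch of the procedure in Python/NumPy (not the course's code). The update is the vectorized form of the per-weight rule above; inputs are assumed to already carry the constant component $x_0 = 1$:

    import numpy as np

    def perceptron_train(X, y, epochs=100):
        # X: (P, n+1) inputs with leading 1s; y: (P,) labels in {-1, +1}.
        W = np.zeros(X.shape[1])
        for _ in range(epochs):
            errors = 0
            for Xp, yp in zip(X, y):
                pred = 1 if W @ Xp > 0 else -1
                if pred != yp:
                    W += (yp - pred) * Xp    # (y^p - f(W'X^p)) x^p
                    errors += 1
            if errors == 0:                  # every sample classified correctly
                break
        return W

    X = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])
    y = np.array([-1, -1, -1, 1])            # a linearly separable (AND) problem
    print(perceptron_train(X, y))            # a separating weight vector
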
16. The Perceptron Learning Procedure

Theorem: if the classes are linearly separable (i.e. separable by a hyperplane), then the Perceptron procedure will converge to a solution in a finite number of steps.

Proof: let $W^*$ denote a normalized vector in the direction of a solution. Suppose all $X$ are within a ball of radius $R$. Without loss of generality, we replace all $X^p$ whose $y^p$ is $-1$ by $-X^p$, and set all $y^p$ to 1. Let us now define the margin $M = \min_p W^* \cdot X^p$. Each time there is an error, $W \cdot W^*$ increases by at least $X \cdot W^* \geq M$. This means $W_{final} \cdot W^* \geq NM$, where $N$ is the total number of weight updates (the total number of errors). But each update changes the squared magnitude of $W$ by at most the squared magnitude of the current sample $X^p$, which is itself bounded by $R^2$. Therefore $\|W_{final}\|^2 \leq NR^2$. Combining the two inequalities $W_{final} \cdot W^* \geq NM$ and $\|W_{final}\| \leq \sqrt{N}\,R$, we have $W_{final} \cdot W^* / \|W_{final}\| \geq \sqrt{N}\,M/R$. Since the left-hand side is upper bounded by 1, we deduce $N \leq R^2/M^2$.

17. Good News, Bad News

The perceptron learning procedure can learn a linear decision surface, if such a surface exists that separates the two classes. If no perfect solution exists, the perceptron procedure will keep wobbling around.

What class of problems is linearly separable, and learnable by a Perceptron? There are many interesting applications where the data can be represented in a way that makes the classes (nearly) linearly separable: e.g. text classification using bag-of-words representations (e.g. for spam filtering).

Unfortunately, the really interesting applications are generally not linearly separable. This is why most people abandoned the field between the late 60s and the early 80s. We will come back to the linear separability problem later.

18. Regression, Mean Squared Error

Regression or function approximation is finding a function that approximates a set of samples as well as possible. Classic example: linear regression. We are given a training set $S$ of input/output pairs $S = \{(X^1, y^1), (X^2, y^2), \ldots, (X^P, y^P)\}$, and we must find the parameters of a linear function that best predicts the $y$'s from the $X$'s in the least-squares sense. In other words, we must find the parameter $W$ that minimizes the quadratic loss function $L(W, S)$:

    $L(W, S) = \frac{1}{P} \sum_{i=1}^{P} L(W, y^i, X^i)$

where the per-sample loss function $L(W, y^i, X^i)$ is defined as:

    $L(W, y^i, X^i) = \frac{1}{2} \left( y^i - W^\top X^i \right)^2$

19. Regression: Solution

    $L(W) = \frac{1}{P} \sum_{i=1}^{P} \frac{1}{2} \left( y^i - W^\top X^i \right)^2$

    $W = \mathrm{argmin}_W\, L(W) = \mathrm{argmin}_W\, \frac{1}{P} \sum_{i=1}^{P} \frac{1}{2} \left( y^i - W^\top X^i \right)^2$

At the solution, $W$ satisfies the extremality condition:

    $\frac{dL(W)}{dW} = \frac{1}{P} \sum_{i=1}^{P} \frac{d \left[ \frac{1}{2} (y^i - W^\top X^i)^2 \right]}{dW} = 0$

20. A Digression on Multivariate Calculus

$W$: a vector of dimension $N$. $W^\top$ denotes the transpose of $W$, i.e. if $W$ is $N \times 1$ (a column vector), $W^\top$ is $1 \times N$ (a row vector).

$F(W)$: a multivariate scalar-valued function (an $N$-dimensional surface in an $(N+1)$-dimensional space).

    $\frac{dF(W)}{dW} = \left[ \frac{\partial F(W)}{\partial w_1}, \frac{\partial F(W)}{\partial w_2}, \ldots, \frac{\partial F(W)}{\partial w_N} \right]$

is the gradient of $F(W)$ with respect to $W$ (it's a row vector). The gradient of a function that maps $N$-dim vectors to scalars is a $1 \times N$ row vector.

Example 1: linear function $F(W) = W^\top X$, where $X$ is an $N$-dim vector:

    $\frac{d(W^\top X)}{dW} = X^\top$

Example 2: quadratic function $F(W) = (y - W^\top X)^2$, where $y$ is a scalar:

    $\frac{d(y - W^\top X)^2}{dW} = -2 (y - W^\top X) X^\top$

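Both worked examples can be checked numerically with finite differences; a small Python/NumPy sketch (the step size 1e-6 is an arbitrary choice of mine):

    import numpy as np

    def num_grad(F, W, eps=1e-6):
        # Central-difference estimate of the gradient row vector dF/dW.
        g = np.zeros_like(W)
        for k in range(W.size):
            dW = np.zeros_like(W)
            dW[k] = eps
            g[k] = (F(W + dW) - F(W - dW)) / (2 * eps)
        return g

    X, y = np.array([1.0, 2.0]), 3.0
    W = np.array([0.5, -0.5])
    # Example 2: F(W) = (y - W'X)^2, analytic gradient -2(y - W'X)X'.
    print(num_grad(lambda W: (y - W @ X) ** 2, W))   # ~ [-7. -14.]
    print(-2 * (y - W @ X) * X)                      # exactly [-7. -14.]
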
21. Regression: Solution

The gradient of $L(W)$ is:

    $\frac{dL(W)}{dW} = \frac{1}{P} \sum_{i=1}^{P} \frac{d \left[ \frac{1}{2} (y^i - W^\top X^i)^2 \right]}{dW} = -\frac{1}{P} \sum_{i=1}^{P} (y^i - W^\top X^i)\, X^{i\top}$

The extremality condition becomes:

    $\frac{1}{P} \sum_{i=1}^{P} (y^i - W^\top X^i)\, X^i = 0$

which we can rewrite as:

    $\left[ \frac{1}{P} \sum_{i=1}^{P} y^i X^i \right] - \left[ \frac{1}{P} \sum_{i=1}^{P} X^i X^{i\top} \right] W = 0$

22. Regression: Direct Solution

The condition can be written as:

    $\sum_{i=1}^{P} y^i X^i - \left[ \sum_{i=1}^{P} X^i X^{i\top} \right] W = 0$

    $\left[ \sum_{i=1}^{P} X^i X^{i\top} \right] W = \sum_{i=1}^{P} y^i X^i$

This is a linear system that can be solved with a number of traditional numerical methods (although it may be ill-conditioned or singular). If the covariance matrix $A = \sum_i X^i X^{i\top}$ is non-singular, the solution is:

    $W = \left[ \sum_{i=1}^{P} X^i X^{i\top} \right]^{-1} \sum_{i=1}^{P} y^i X^i$

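The direct solution in Python/NumPy (a sketch; np.linalg.solve is used instead of forming the inverse explicitly, which behaves better when the system is nearly ill-conditioned):

    import numpy as np

    def linear_regression_direct(X, y):
        # X: (P, n) inputs stored as rows; y: (P,) targets.
        # Solves [sum_i X^i X^i'] W = sum_i y^i X^i.
        A = X.T @ X                  # sum of outer products X^i X^i'
        b = X.T @ y                  # sum_i y^i X^i
        return np.linalg.solve(A, b)

    X = np.random.randn(50, 2)
    y = X @ np.array([1.0, 2.0])                 # noiseless targets
    print(linear_regression_direct(X, y))        # ~ [1. 2.]
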
23. Regression: Iterative Solution

Gradient-based minimization:

    $W(t+1) = W(t) - \eta\, \frac{dL(W)}{dW}$

where $\eta$ is a well-chosen coefficient (often a scalar, sometimes a diagonal matrix with positive entries, occasionally a full symmetric positive-definite matrix).

The $k$-th component of the gradient of the quadratic loss $L(W)$ is:

    $\frac{\partial L(W)}{\partial w_k} = -\frac{1}{P} \sum_{i=1}^{P} (y^i - W(t)^\top X^i)\, x_k^i$

If $\eta$ is a scalar or a diagonal matrix, we can write the update equation for a single component of $W$:

    $w_k(t+1) = w_k(t) + \frac{\eta}{P} \sum_{i=1}^{P} (y^i - W(t)^\top X^i)\, x_k^i$

This update rule converges for well-chosen, small-enough values of $\eta$ (more on this later).

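The same problem solved iteratively; a batch gradient-descent sketch in Python/NumPy (the scalar $\eta$ and the iteration count are my arbitrary, "well-chosen" values):

    import numpy as np

    def linear_regression_gd(X, y, eta=0.1, steps=500):
        P, n = X.shape
        W = np.zeros(n)
        for _ in range(steps):
            grad = -(1.0 / P) * X.T @ (y - X @ W)   # full-batch gradient of L(W)
            W = W - eta * grad
        return W

    X = np.random.randn(50, 2)
    y = X @ np.array([1.0, 2.0])
    print(linear_regression_gd(X, y))    # ~ [1. 2.], same as the direct solve
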
24. Regression, Online/Stochastic Gradient

Online gradient descent, a.k.a. stochastic gradient:

    $W(t+1) = W(t) - \eta\, \frac{dL(W, y^i, X^i)}{dW}$

    $w_k(t+1) = w_k(t) + \eta(t)\, (y^i - W(t)^\top X^i)\, x_k^i$

No sum! The average gradient is replaced by its instantaneous value. This is called stochastic gradient descent. In many practical situations it is enormously faster than batch gradient. But the convergence analysis of this method is very tricky. One condition for convergence is that $\eta(t)$ must be decreased according to a schedule such that $\sum_t \eta(t)^2$ converges while $\sum_t \eta(t)$ diverges. One possible such sequence is $\eta(t) = \eta_0 / t$.

We can also use second-order methods, but we will keep that for later.

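A stochastic-gradient sketch with the $\eta(t) = \eta_0 / t$ schedule (Python/NumPy; $\eta_0$, the number of passes, and the random visiting order are my choices):

    import numpy as np

    def linear_regression_sgd(X, y, eta0=0.2, epochs=100):
        P, n = X.shape
        W = np.zeros(n)
        t = 1
        for _ in range(epochs):
            for i in np.random.permutation(P):       # one sample at a time
                W = W + (eta0 / t) * (y[i] - W @ X[i]) * X[i]   # no sum!
                t += 1
        return W

    X = np.random.randn(50, 2)
    y = X @ np.array([1.0, 2.0])
    print(linear_regression_sgd(X, y))   # slowly approaches [1. 2.]
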
25. Least Mean Squared Error for Classification

We can use the mean squared error criterion with a linear regressor to perform classification (although this is clearly suboptimal). We compute a linear discriminant function $G(W, X) = W^\top X$ and compare it to a threshold $T$. If $G(W, X)$ is larger than $T$, we classify $X$ in class 1; if it is smaller than $T$, we classify $X$ in class 2.

To compute $W$, we simply minimize the quadratic loss function:

    $L(W) = \frac{1}{P} \sum_{i=1}^{P} \frac{1}{2} \left( y^i - W^\top X^i \right)^2$

where $y^i = +1$ if training sample $X^i$ is of class 1 and $y^i = -1$ if training sample $X^i$ is of class 2. This is called the Adaline algorithm (Widrow-Hoff 1960).

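Adaline, then, is just the regression machinery with $\pm 1$ targets followed by a threshold; a self-contained Python/NumPy sketch (the toy data and the choice $T = 0$ are mine, and the first input component is the constant 1 carrying the bias):

    import numpy as np

    X = np.array([[1., 0., 0.], [1., 1., 0.], [1., 4., 4.], [1., 5., 4.]])
    y = np.array([-1., -1., 1., 1.])           # class 1 -> +1, class 2 -> -1
    W = np.linalg.solve(X.T @ X, X.T @ y)      # minimize the quadratic loss
    print(np.where(X @ W > 0.0, 1, -1))        # -> [-1 -1  1  1]
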
26. Linear Classifiers

In multiple dimensions, the linear discriminant function $G(W, X) = W^\top X$ partitions the space into two half-spaces separated by a hyperplane.

27. A Richer Class of Functions

What if we know that the process that generated our samples is non-linear? We can use a richer family of functions, e.g. polynomials, sums of trigonometric functions...

PROBLEM: if the family of functions is too rich, we run the risk of overfitting the data. If the family is too restrictive, we run the risk of not being able to approximate the training data very well.

QUESTIONS: How can we choose the richness of the family of functions? Can we predict the performance on new data as a function of the training error and the richness of the family of functions? Simply minimizing the training error may not give us a solution that will do well on new data.

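One standard way to get a richer family while keeping the linear machinery: map the input through fixed basis functions, e.g. polynomials. A Python/NumPy sketch (the data, noise level, and degrees are my choices); note that the training error keeps shrinking as the family gets richer, which is precisely why training error alone cannot answer the QUESTIONS above:

    import numpy as np

    def poly_features(x, degree):
        # Map scalar inputs to (1, x, x^2, ..., x^degree).
        return np.vander(x, degree + 1, increasing=True)

    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 20)
    y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(20)   # noisy nonlinear samples

    for degree in (1, 3, 15):
        X = poly_features(x, degree)
        W, *_ = np.linalg.lstsq(X, y, rcond=None)          # least-squares fit
        print(degree, np.mean((y - X @ W) ** 2))           # training error shrinks
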
28. Learning as Function Estimation

Pick a machine $G(W, X)$ parameterized by $W$. It can be complicated and non-linear, but it had better be differentiable with respect to $W$.

Pick a per-sample loss function $L(Y, G(W, X))$.

Pick a training set $S = \{(X^1, Y^1), (X^2, Y^2), \ldots, (X^P, Y^P)\}$.

Find the $W$ that minimizes:

    $L(W, S) = \frac{1}{P} \sum_i L(Y^i, G(W, X^i))$

29. Learning as Function Estimation (continued)

If $L(Y^i, G(W, X^i))$ is differentiable with respect to $W$, use a gradient-based minimization technique:

    $W \leftarrow W - \eta\, \frac{\partial L(W)}{\partial W}$

or use a stochastic gradient minimization technique:

    $W \leftarrow W - \eta\, \frac{\partial L(Y^i, G(W, X^i))}{\partial W}$

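The whole recipe fits in a short generic loop; a Python/NumPy sketch where the machine $G$ and the gradient of the per-sample loss are plugged in by the user (the function names and the fixed $\eta$ are illustrative, not from the course):

    import numpy as np

    def train_sgd(grad_per_sample, W, data, eta=0.1, epochs=200):
        # Generic stochastic-gradient minimization of (1/P) sum_i L(Y^i, G(W, X^i)).
        for _ in range(epochs):
            for X, Y in data:
                W = W - eta * grad_per_sample(W, X, Y)
        return W

    # Plugging in G(W, X) = W'X and L = (1/2)(Y - G)^2 recovers linear regression.
    grad = lambda W, X, Y: -(Y - W @ X) * X
    data = [(np.array([1.0, x]), 2.0 * x - 1.0) for x in np.linspace(-1, 1, 20)]
    print(train_sgd(grad, np.zeros(2), data))    # ~ [-1.  2.]
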
More information