Timo Koski Department of mathematics LiTH
1 Forskarskola i medicinsk bioinformatik (Research School in Medical Bioinformatics). Course for CMI PhD programme in Medical Bioinformatics: Support Vector Machines, Part I. Lecture Notes 1: Linear Learning Machines. 8-9th of September, 2005, LiTH/Linköping University. Timo Koski, Department of Mathematics, LiTH
2 Overview of the lectures // Part I. Support vector machines (SVM), as invented by V.N. Vapnik and his coworkers in the early 1990s, are based on a combination of two ideas from the 1960s: linear learning machines and kernel mappings. These were put into perspective with statistical learning theory, which gives the capacity of a learning machine, as defined by Vapnik. In Part I of the course we are going to discuss the above techniques in SVM for pattern recognition/classification. There are techniques for SVM regression and SVM time series prediction, too, and other applications not covered in these lectures. However, SVM regression is studied in the computer exercises.
3 Overview of the lectures // Part II. In Part II of the lectures we will deal with kernel classifiers from a Bayesian perspective: designing kernels (incl. string kernels and Fisher kernels), kernel PCA (= principal component analysis), the kernel Fisher discriminant. Basically, the reference is chapters of B. Schölkopf & A.J. Smola (2002): Learning with Kernels, The MIT Press, Cambridge, Massachusetts, with added sections (to be announced later) of J.-P. Vert, K. Tsuda, & B. Schölkopf (eds.): A Primer on Kernel Methods, in Kernel Methods in Computational Biology, The MIT Press, Cambridge, Massachusetts / London, England.
4 This first set of lecture notes deals with the main building blocks of the simplest SVMs and exhibits the concepts required for understanding more complex SVMs. We start with the theory of linear learning machines. There are some prerequisites, known as the geometry of learning.
5 TEXTS ON SUPPORT VECTOR MACHINES. The lecture is mainly based on N. Cristianini & J. Shawe-Taylor (2000): An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, and V.N. Vapnik (1995): The Nature of Statistical Learning Theory, Springer Verlag, New York.
6 OUTLINE OF TOPICS: vector spaces; normed vector spaces, inner product spaces; properties of inner products; geometry of linear learning machines; the perceptron; perceptron convergence; dual representation of the perceptron.
7 Vector Spaces, Norms and Inner Products. We need a mathematical groundwork for a geometry of linear learning. We recall a few facts about vector spaces, norms, and inner products, as found in a plethora of books on, e.g., linear algebra or functional analysis. For a treatment that also includes some of the optimization theory needed for SVM we refer to D.G. Luenberger: Optimization by Vector Space Methods, Wiley Professional Paperbacks, John Wiley and Sons.
8 Prerequisites (A) on Vector Spaces. A set X with elements x, y, z, ..., referred to as vectors, is called a vector space if there are two operations, called addition of two vectors and multiplication of a vector by a scalar α ∈ R. These operations satisfy: x + y ∈ X; there is a neutral element 0 such that x + 0 = x; αx ∈ X; 1x = x ∈ X, 0x = 0 ∈ X.
9 Prerequisites (A) on Vector Spaces, and in addition: x + y = y + x; α(x + y) = αx + αy; (α + β)x = αx + βx. We take x − y = x + (−1)y, and then x − x = 0.
10 Prerequisites (A) on Vector Spaces: An Example. The standard example is the set R^n of real column vectors of fixed dimension n. Let T denote transpose: x = (x_1, x_2, ..., x_n)^T (a transposed row vector is a column vector), where x_i ∈ R, i = 1, ..., n. We define 0 = (0, 0, ..., 0)^T, x + y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n)^T, and αx = (αx_1, αx_2, ..., αx_n)^T.
11 Prerequisites (A): Vector Spaces and Norms. A vector space X is called a normed linear space if there is a real-valued function that maps each x ∈ X to a number ‖x‖ with the following properties: ‖x‖ ≥ 0 for all x ∈ X; ‖x‖ = 0 if and only if x = 0; ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality); ‖αx‖ = |α| ‖x‖ (homogeneity).
12 Linear Vector Spaces and Norms: Examples. Consider R^n. Then ‖x‖_2 and ‖x‖_∞ are norms on R^n: ‖x‖_2 = (Σ_{i=1}^n x_i^2)^{1/2} (Euclidean norm); ‖x‖_∞ = max_{1≤i≤n} |x_i|.
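As a quick numerical illustration (not part of the original notes; the helper names `norm2` and `norm_inf` are made up), the two norms above can be computed in a few lines of Python:

```python
import math

def norm2(x):
    # Euclidean norm: ||x||_2 = sqrt(sum_i x_i^2)
    return math.sqrt(sum(xi * xi for xi in x))

def norm_inf(x):
    # max norm: ||x||_inf = max_i |x_i|
    return max(abs(xi) for xi in x)

x = [3.0, -4.0]
# norm2(x) is 5.0 and norm_inf(x) is 4.0
```

Both functions satisfy the norm axioms of the previous slide; the triangle inequality, for instance, can be checked numerically on sample vectors.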
13 Prerequisites (A): Norms and Distances. In a normed linear space X the real-valued function d(x, y) = ‖x − y‖ is called the distance between x and y: d(x, y) ≥ 0 for all x, y ∈ X; d(x, y) = d(y, x); d(x, y) = 0 if and only if x = y; d(x, y) ≤ d(x, z) + d(y, z) (triangle inequality); d(αx, αy) = |α| d(x, y) (homogeneity).
14 Lengths and Balls in Normed Spaces. The length of x is the distance from x to 0, i.e., d(x, 0) = ‖x − 0‖ = ‖x‖. The open ball B_τ(x) ⊆ X of radius τ around x ∈ X is B_τ(x) := {y ∈ X : ‖y − x‖ < τ}.
15 Prerequisites (A): Inner Product Spaces. A vector space X is called an inner product space if there is a function, called inner product, that maps each pair x, y of vectors in X to a number ⟨x, y⟩ with the following properties: ⟨x, y⟩ = ⟨y, x⟩; ⟨x, x⟩ ≥ 0, with ⟨x, x⟩ = 0 if and only if x = 0; ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩; ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩; ⟨x, αy⟩ = α ⟨x, y⟩. Much of the learning theory literature talks about dot products.
16 Prerequisites (A): Inner Product Spaces: Examples. 1. Take X = R^n and ⟨x, y⟩ = Σ_{i=1}^n x_i y_i. 2. Take X = R^n and ⟨x, y⟩_A = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i y_j = ⟨x, Ay⟩, where A = (a_ij)_{i,j=1}^n is a symmetric and non-negative definite matrix, i.e., ⟨x, Ax⟩ = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i x_j ≥ 0 for all x.
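A small Python sketch of the two examples, the standard inner product and the A-weighted one (an illustration, not from the notes; the function names `dot` and `dot_A` are made up):

```python
def dot(x, y):
    # standard inner product: <x, y> = sum_i x_i y_i
    return sum(xi * yi for xi, yi in zip(x, y))

def dot_A(x, y, A):
    # weighted inner product: <x, y>_A = sum_{i,j} a_ij x_i y_j = <x, Ay>
    return sum(A[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))

# a symmetric, non-negative definite matrix
A = [[2.0, 0.0], [0.0, 3.0]]
x, y = [1.0, 2.0], [3.0, 4.0]
# dot(x, y) = 11.0, and dot_A(x, x, A) = <x, Ax> is non-negative
```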
17 Inner Product Spaces are Normed Spaces. An inner product space X is automatically a normed linear space, since we can put ‖x‖ = √⟨x, x⟩, which is well defined since ⟨x, x⟩ ≥ 0, and then ‖x‖^2 = ⟨x, x⟩. Example: X = R^n, ‖x‖ = √⟨x, x⟩ = (Σ_{i=1}^n x_i^2)^{1/2}.
18 Examples: Mahalanobis distance. When we take A = Σ^{−1} in ⟨x, y⟩_A and set ‖x − y‖^2_{Σ^{−1}} = ⟨x − y, Σ^{−1}(x − y)⟩, we obtain a useful distance in pattern recognition known as the (squared) Mahalanobis distance. Regions of constant Mahalanobis distance to a fixed vector y are ellipsoids.
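For concreteness, a Python sketch of the squared Mahalanobis distance (illustrative only; it assumes the inverse covariance matrix Σ^{−1} is given explicitly, and the function name is made up):

```python
def mahalanobis_sq(x, y, sigma_inv):
    # squared Mahalanobis distance: <x - y, Sigma^{-1} (x - y)>
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return sum(sigma_inv[i][j] * d[i] * d[j]
               for i in range(n) for j in range(n))

# inverse of the diagonal covariance matrix diag(1, 4)
sigma_inv = [[1.0, 0.0], [0.0, 0.25]]
```

With this Σ, the points (2, 0) and (0, 4) are equally far from the origin in the Mahalanobis sense, although their Euclidean lengths differ: they lie on the same ellipse.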
19 Properties of Inner Products. Cauchy-Schwarz inequality: |⟨x, y⟩| ≤ ‖x‖ ‖y‖. Also ‖x + y‖^2 = ‖x‖^2 + ‖y‖^2 + 2⟨x, y⟩ and ‖x − y‖^2 = ‖x‖^2 + ‖y‖^2 − 2⟨x, y⟩.
20 Properties of Inner Product Spaces. The angle θ between x and y in an inner product space is given by cos(θ) = ⟨x, y⟩ / (‖x‖ ‖y‖). If ⟨x, y⟩ = 0, we say that x and y are orthogonal, since then cos(θ) = 0, and then θ = π/2 (within period). Also, we then have the Pythagorean relations ‖x + y‖^2 = ‖x‖^2 + ‖y‖^2 and ‖x − y‖^2 = ‖x‖^2 + ‖y‖^2.
21 Orthonormal Vectors in Inner Product Spaces. Let φ_1, ..., φ_n be a sequence of orthonormal vectors of an inner product space X. Orthonormality means that 1. ⟨φ_i, φ_j⟩ = 0 for i ≠ j; 2. ‖φ_j‖ = 1 for all j. We note an example: X = R^n with ⟨x, y⟩ = Σ_{i=1}^n x_i y_i. Then the vectors φ_i = (0, 0, ..., 1, ..., 0)^T, with the 1 in the ith position, i = 1, ..., n, are orthonormal vectors in X = R^n.
22 Basis and Dimension. We say that the orthonormal vectors φ_1, ..., φ_n form an orthonormal basis for X if every x can be written x = ⟨x, φ_1⟩φ_1 + ... + ⟨x, φ_n⟩φ_n, and this representation is unique, also in the sense that φ_1, ..., φ_n cannot be appended or reduced by a single orthonormal vector. Then n is the dimension of the inner product space X. We shall later on need to discuss inner product spaces X of infinite dimension, i.e., where formally speaking x = Σ_{i=1}^∞ ⟨x, φ_i⟩φ_i, so that the basis has infinitely many orthonormal vectors.
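The expansion x = Σ_i ⟨x, φ_i⟩φ_i can be checked numerically. The following Python sketch (illustrative, with made-up names) reconstructs a vector from its coordinates in the standard orthonormal basis of R^3:

```python
def dot(x, y):
    # inner product <x, y> on R^n
    return sum(xi * yi for xi, yi in zip(x, y))

# standard orthonormal basis of R^3
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def expand(x, basis):
    # reconstruct x = sum_i <x, phi_i> phi_i
    coords = [dot(x, phi) for phi in basis]
    return [sum(c * phi[k] for c, phi in zip(coords, basis))
            for k in range(len(x))]

x = [2.0, -1.0, 3.0]
# expand(x, basis) reproduces x exactly
```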
23 Inner Product Spaces. An inner product space thus supports geometric notions. Inner product spaces of finite dimension are often called Euclidean spaces. A Euclidean space X equipped with ⟨x, y⟩_X is isomorphic to R^n with ⟨x, y⟩ = Σ_{i=1}^n x_i y_i, in the sense that there is an invertible map ψ : X → R^n such that ⟨x, y⟩_X = ⟨ψ(x), ψ(y)⟩.
24 LINEAR LEARNING MACHINES: NOTATIONS. Next we start studying linear learning machines, also known as perceptrons. X = INPUT SPACE, which is also an inner product space (of finite dimension). Y = OUTPUT DOMAIN. Examples of output domains are {+1, −1} (output domain for binary classification) or {1, 2, ..., m} (multiple classification). We consider functions f : X → R such that f(x) = ⟨w, x⟩ + b, where w is a vector called the weight vector and b is a real number called the bias.
25 Output domains are {+1, −1}. Most of the theory to be presented deals with the binary output domain represented as {+1, −1}. This is, of course, restrictive, but it is not completely without practical relevance, e.g., in bioinformatics. To quote a problem: the prediction of constitutive and immunoproteasome cleavage sites in antigenic sequences. See also the supporting material on the webpage of N. Cristianini & J. Shawe-Taylor (2000), and M. Bhasin & G.P.S. Raghava (2005): Pcleavage, Nucleic Acids Research, 33, Web server issue, W202-W207.
26 LINEAR LEARNING MACHINES: Some intuition. ⟨w, x⟩ − ⟨w, y⟩ = ⟨w, x − y⟩ by the rules of the inner product, and then |⟨w, x − y⟩| ≤ ‖w‖ ‖x − y‖ by Cauchy-Schwarz. Hence whenever the difference x − y is small, the difference in the real-valued outputs is also small.
27 LINEAR LEARNING MACHINES: Some geometry. A hyperplane is an affine subspace of dimension n − 1, if n is the dimension of X, which divides the space into two half spaces. (Affine = parallel to a linear subspace; a linear subspace contains 0.) We consider the following hyperplane D ⊆ X: D = {x ∈ X | f(x) = 0}, where f(x) = ⟨w, x⟩ + b. Take x_1 ∈ D and x_2 ∈ D. Then ⟨w, x_1⟩ = ⟨w, x_2⟩, so ⟨w, x_1 − x_2⟩ = 0. In other words, w is orthogonal to any vector in D.
28 LINEAR LEARNING MACHINES: Some geometry. f(x) = ⟨w, x⟩ + b, D = {x ∈ X | f(x) = 0}. A hyperplane is an affine subspace of dimension n − 1, which divides the space into two half spaces, here R_1 and R_2, as R_1 = {x ∈ X | f(x) > 0} and R_2 = {x ∈ X | f(x) < 0}. By the preceding, w is orthogonal to any vector in D and points into R_1.
29 LINEAR LEARNING MACHINES: Some geometry. [Figure: the hyperplane f(x) = 0 with normal vector w, the half space f(x) > 0 on one side and f(x) < 0 on the other.]
30 LINEAR LEARNING MACHINES: More geometry. Let x_p = P_D(x) be the orthogonal projection of x onto D, i.e., x − x_p is orthogonal to D. [Figure: a point x, its projection x_p on the hyperplane f(x) = 0, and the normal vector w.]
31 LINEAR LEARNING MACHINES: More geometry. Write x = x_p + r_x · w/‖w‖, where r_x is a number (to be determined) and w/‖w‖ is a unit length vector in the direction of w. We compute the squared length ‖x − x_p‖^2 = ‖r_x · w/‖w‖‖^2 = r_x^2 · ‖w‖^2/‖w‖^2 = r_x^2, so that ‖x − x_p‖ = |r_x|. Hence r_x is the signed distance of x to D. This calculus is due to R.O. Duda, P.E. Hart, & D.G. Stork (2001): Pattern Classification, Second Edition, Wiley Interscience, New York.
32 LINEAR LEARNING MACHINES: More geometry. In addition, with f(x) = ⟨w, x⟩ + b, we get f(x) = ⟨w, x_p + r_x · w/‖w‖⟩ + b = ⟨w, x_p⟩ + b + ⟨w, r_x · w/‖w‖⟩ = f(x_p) + ⟨w, r_x · w/‖w‖⟩ = ⟨w, r_x · w/‖w‖⟩, since f(x_p) = 0.
33 LINEAR LEARNING MACHINES: Signed distance. f(x) = ⟨w, r_x · w/‖w‖⟩ = r_x (1/‖w‖) ⟨w, w⟩ = r_x ‖w‖. In other words we have obtained f(x) = r_x ‖w‖, i.e., r_x = f(x)/‖w‖. Furthermore, f(0) = ⟨w, 0⟩ + b = 0 + b = b, i.e., r_0 = f(0)/‖w‖ = b/‖w‖.
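The formula r_x = f(x)/‖w‖ is easy to put into code. A minimal Python sketch (illustrative; the function name is made up):

```python
import math

def signed_distance(w, b, x):
    # r_x = f(x) / ||w||, where f(x) = <w, x> + b
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f / math.sqrt(sum(wi * wi for wi in w))

w, b = [3.0, 4.0], 10.0
# r_0 = b / ||w|| = 10 / 5 = 2: the signed distance of the origin to D
```

A point satisfying ⟨w, x⟩ + b = 0 lies on the hyperplane and gets signed distance 0.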
34 LINEAR LEARNING MACHINES: Some geometry. Recalling f(x) = r_x ‖w‖: if r_x > 0, then x is on the positive side, and if r_x < 0, then x is on the negative side. Here we take R_1 = {x ∈ X | f(x) > 0} as the positive side of D and R_2 = {x ∈ X | f(x) < 0} as the negative side of D.
35 LINEAR LEARNING MACHINES: More geometry. Since r_0 = b/‖w‖ is the signed distance of 0 to D, we see that varying b moves D parallel to itself. If b > 0, then 0 is in R_1, and if b < 0, then 0 is in R_2.
36 LINEAR LEARNING MACHINES: More geometry. r_0 = b/‖w‖ is the signed distance of 0 to D. [Figure: the hyperplane f(x) = 0, the normal vector w, a point x with projection x_p and signed distance r_x, and the distance b/‖w‖ from the origin to the hyperplane.]
37 LINEAR LEARNING MACHINE a.k.a. PERCEPTRON. Let sign(x) = +1 for 0 ≤ x and −1 for x < 0. (Note that the definition of sign for x = 0 is arbitrary.) Then a linear learning machine for a binary classification task (or for a binary target concept), a.k.a. perceptron, processes an input x by giving the output y = sign(f(x)), y ∈ {+1, −1}. This requires, of course, that we have somehow set the values of w and b. The learning of the weights w and the bias b from examples is our next topic of study.
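The decision rule y = sign(f(x)) can be sketched directly in Python (illustrative names; sign(0) = +1 as in the convention above):

```python
def sign(t):
    # sign with the convention sign(0) = +1
    return 1 if t >= 0 else -1

def perceptron_output(w, b, x):
    # y = sign(<w, x> + b), with y in {+1, -1}
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = [1.0, -1.0], 0.5
# perceptron_output(w, b, [2.0, 1.0]) gives +1; [0.0, 1.0] gives -1
```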
38 LINEAR LEARNING MACHINE a.k.a. PERCEPTRON. Frank Rosenblatt developed the perceptron in 1957, i.e., y = sign(f(x)), y ∈ {+1, −1}, as a model to understand human memory, learning, and cognition. The perceptron was based on biological ideas about networks of neurons in the brain. In 1960 Rosenblatt demonstrated a piece of electromechanical hardware, named Mark I Perceptron, the first machine that could learn to recognize and identify optical patterns. F. Rosenblatt (1958): The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, vol. 65, no. 6.
39 LINEAR LEARNING MACHINE a.k.a. PERCEPTRON. The perceptron theory, as it stood in the 1960s, is very lucidly presented in N.J. Nilsson: The Mathematical Foundations of Learning Machines, originally published in 1965, reprinted in 1990, Morgan Kaufmann Publishers, San Mateo, California. A thorough presentation, and at the same time a devastating criticism of the performance of perceptrons, followed later by a reappraisal, is in M.L. Minsky & S.A. Papert: Perceptrons (Expanded Edition), first in 1969, third (extended) printing in 1988, MIT Press, Cambridge, Massachusetts. When sign(f(x)) was replaced by smooth functions, e.g. tanh(f(x)), several such perceptrons were layered after each other, and the backpropagation algorithm was invented, the theory of neural networks, or feedforward multilayer perceptrons, was launched.
40 PERCEPTRON & NEURAL NETWORKS. A very clearly written survey of learning theory (with an electrical engineering point of view) is found in B. Widrow & M.A. Lehr: 30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation. Proceedings of the IEEE, vol. 78, no. 9.
41 PERCEPTRON TRAINING. Let x_i ∈ X, y_i ∈ Y = {+1, −1}, i = 1, ..., l, be l pairs of examples of a binary target concept. Then S = {(x_1, y_1), ..., (x_l, y_l)} is called a training set. We consider the following learning task. Supervised learning of perceptrons: Find (w, b) using S.
42 PERCEPTRON TRAINING. S = {(x_1, y_1), ..., (x_l, y_l)}. Supervised learning of perceptrons. Task: Find (w, b) using S so that sign(f(x_i)) = y_i for all i, i.e., all inputs are correctly classified. Questions: 1. Does a solution to this task exist? 2. If a solution exists, is there an algorithm that finds it in a finite number of steps?
43 Linear Separability. We assume that there exists a hyperplane that correctly classifies S. More formally this can be stated as follows. Linear Separability of S: there exist (w*, b*) such that, with f(x) = ⟨w*, x⟩ + b*, for all (x_i, y_i) ∈ S we have f(x_i) > 0 if and only if y_i = +1. When this assumption is true, we call S linearly separable.
44 Linear Separability. Linear separability of S is easy to illustrate with hyperplane geometry: the examples (x_i, y_i) are represented by points with two labels, − and +. All points with label + lie in R_1 and all points with label − lie in R_2 with regard to the hyperplane f(x) = 0. [Figure: labelled points separated by the hyperplane f(x) = 0.]
45 Linear Separability. There are theorems giving sufficient conditions for separability. E.g., two disjoint convex sets in R^k are linearly separable. Examples of convex sets are halfplanes and open balls. This and other relevant results are found, e.g., in O.L. Mangasarian (1969): Nonlinear Programming, Tata McGraw-Hill Publishing Co., Bombay, New Delhi.
46 CONVEXITY. [Figure: a convex set and a nonconvex set.]
47 A LINEARLY NONSEPARABLE SITUATION. The examples correspond to points with the labels x and ·. The training set cannot be separated by a hyperplane without a considerable number of errors. [Figure: interleaved point clouds of the two classes that no hyperplane can separate.]
48 Rosenblatt's perceptron algorithm. An algorithm that finds a perceptron that correctly classifies a linearly separable training set can be constructed as follows. One assumes a weight vector and a bias, and then sweeps through the training set with the corresponding perceptron. For every incorrectly classified example the hyperplane direction is changed and the hyperplane is shifted parallel to itself. It turns out that this algorithm halts after a finite number of runs, i.e., it finds a correctly classifying perceptron in finite time, and that there is an explicit upper bound on the number of sweeps through S.
49 Rosenblatt's perceptron algorithm. Every time an example is classified incorrectly, the weight vector is updated by adding a term proportional to the error margin (with sign). We note that if y_i(⟨w_k, x_i⟩ + b_k) > 0, then x_i is correctly classified. Hence it would seem to be a good idea to update, i.e., rotate the hyperplane, by adding a vector proportional to y_i x_i to the weight vector as soon as y_i(⟨w_k, x_i⟩ + b_k) is negative.
50 Rosenblatt's perceptron algorithm. The bias is updated (the hyperplane is shifted parallel to itself) by adding a term proportional to the square of the maximum length of the input vectors in the training set, R = max_{1≤i≤l} ‖x_i‖. The algorithm is run until no examples in the training set are misclassified any longer. Clearly R gives the radius of a ball B_R(0) in which all training inputs lie. In addition, we must have |b| < R, since otherwise all of S will lie on one side of the hyperplane, which is meaningful only for trivial training sets, i.e., those that contain only one label.
51 Algorithm 1: Rosenblatt's perceptron algorithm. Given linearly separable S and a learning rate η > 0:
1: w_0 ← 0, b_0 ← 0, k ← 0, R ← max_{1≤i≤l} ‖x_i‖
2: repeat
3:   for i = 1 to l do
4:     if y_i(⟨w_k, x_i⟩ + b_k) ≤ 0 then
5:       w_{k+1} ← w_k + η y_i x_i, b_{k+1} ← b_k + η y_i R^2, k ← k + 1
6:     end if
7:   end for
8: until no mistakes are made in the for loop
9: return w_k, b_k, where k is the number of mistakes
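Algorithm 1 can be sketched in plain Python as follows (an illustrative implementation, not code from the notes; the `max_sweeps` safeguard is an addition in case S turns out not to be separable):

```python
import math

def train_perceptron(S, eta=1.0, max_sweeps=1000):
    # Rosenblatt's perceptron algorithm for a linearly separable
    # training set S = [(x, y), ...] with labels y in {+1, -1}.
    n = len(S[0][0])
    w, b, k = [0.0] * n, 0.0, 0
    R = max(math.sqrt(sum(xi * xi for xi in x)) for x, _ in S)
    for _ in range(max_sweeps):
        mistakes = False
        for x, y in S:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]  # rotate
                b += eta * y * R * R                             # shift
                k += 1
                mistakes = True
        if not mistakes:
            return w, b, k  # k = total number of mistakes/updates
    raise RuntimeError("no convergence; S may not be linearly separable")

# a trivially separable training set in R^2
S = [([2.0, 1.0], 1), ([1.0, 2.0], 1), ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
w, b, k = train_perceptron(S)
```

After convergence, the returned (w, b) classifies every example in S with strictly positive functional margin.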
52 Functional margin. The properties of the perceptron algorithm, as well as of SVMs, will be clearly understood once we introduce the concept of a functional margin. For some (w, b) we set γ_i := y_i(⟨w, x_i⟩ + b) = y_i f(x_i). We call γ_i the functional margin of (x_i, y_i) with regard to (w, b). We set γ := min(γ_1, ..., γ_l) and call γ the functional margin of S with regard to (w, b).
53 Functional margin & the geometric margin. Recall from the above that r_x = f(x)/‖w‖; then γ_i = y_i f(x_i) = y_i r_{x_i} ‖w‖. In view of the preceding geometric analysis, the distance from x_i to D is ‖x_i − P_D(x_i)‖ = |r_{x_i}|. If the input x_i is correctly classified, then |r_{x_i}| = y_i r_{x_i}. Hence, in case ‖w‖ = 1 and all x_i are correctly classified, γ = min(γ_1, ..., γ_l) = min_i y_i r_{x_i}. Hence, here γ is the geometric margin. The geometric margin of S is the maximum of γ over all hyperplanes.
54 Functional margin & the geometric margin. [Figure: two examples x_i and x_j with their geometric margins γ_i and γ_j to the hyperplane f(x) = 0.]
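The functional and geometric margins of a training set can be sketched as follows (illustrative Python, not from the notes):

```python
import math

def functional_margin(w, b, S):
    # gamma = min_i y_i (<w, x_i> + b)
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) for x, y in S)

def geometric_margin(w, b, S):
    # the functional margin of the rescaled hyperplane (w/||w||, b/||w||)
    return functional_margin(w, b, S) / math.sqrt(sum(wi * wi for wi in w))

S = [([2.0, 0.0], 1), ([-2.0, 0.0], -1)]
w, b = [2.0, 0.0], 0.0
# functional margin 4.0, geometric margin 2.0: both points lie at
# Euclidean distance 2 from the hyperplane x_1 = 0
```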
55 Perceptron Convergence Theorem (Novikoff 1961). Let S be a non-trivial linearly separable training set, and suppose that for all i, γ_i = y_i(⟨w*, x_i⟩ + b*) ≥ γ, where ‖w*‖ = 1. Then the number of updates made by Rosenblatt's perceptron algorithm is at most (2R/γ)^2.
56 Proof: Perceptron Convergence (1). We extend the inner product space to X × R: for x̂ = (x, ξ) and ŷ = (y, υ) we take x̂ + ŷ = (x + y, ξ + υ), αx̂ = (αx, αξ), and we set ⟨x̂, ŷ⟩_ext = ⟨x, y⟩ + ξυ. This is an inner product on X × R. We set x̂_i = (x_i, R) and ŵ = (w, b/R). By our extended inner product above we get y_i ⟨ŵ, x̂_i⟩_ext = y_i(⟨w, x_i⟩ + b). Let now ŵ_{t−1} be the weight vector prior to the t-th mistake. The t-th mistake/update happens when some (x_i, y_i) is classified incorrectly by ŵ_{t−1}, i.e., y_i ⟨ŵ_{t−1}, x̂_i⟩_ext ≤ 0.
57 Proof: Perceptron Convergence (2). In view of the pseudocode above, the updates can be written as (w_t, b_t/R) ← (w_{t−1} + η y_i x_i, b_{t−1}/R + η y_i R), i.e., ŵ_t = ŵ_{t−1} + η y_i x̂_i. Let ŵ* = (w*, b*/R) correspond to a separating hyperplane with w* and b*. Then we get ⟨ŵ_t, ŵ*⟩_ext = ⟨ŵ_{t−1}, ŵ*⟩_ext + η y_i ⟨x̂_i, ŵ*⟩_ext, and ⟨x̂_i, ŵ*⟩_ext = ⟨x_i, w*⟩ + R · (b*/R) = ⟨x_i, w*⟩ + b*.
58 Proof: Perceptron Convergence (3). Hence linear separability of S w.r.t. w*, b* entails η y_i ⟨x̂_i, ŵ*⟩_ext = η y_i(⟨x_i, w*⟩ + b*) ≥ ηγ. In other words we have obtained ⟨ŵ_t, ŵ*⟩_ext ≥ ⟨ŵ_{t−1}, ŵ*⟩_ext + ηγ. By successive application of this inequality we get ⟨ŵ_t, ŵ*⟩_ext ≥ tηγ.
59 Proof: Perceptron Convergence (4). We use ŵ_t = ŵ_{t−1} + η y_i x̂_i and the rules for norms and inner products and get ‖ŵ_t‖²_ext = ‖ŵ_{t−1}‖²_ext + η² ‖x̂_i‖²_ext + 2η y_i ⟨ŵ_{t−1}, x̂_i⟩_ext ≤ ‖ŵ_{t−1}‖²_ext + η² ‖x̂_i‖²_ext, since y_i ⟨ŵ_{t−1}, x̂_i⟩_ext ≤ 0 by the incorrect classification of (x_i, y_i). Moreover, ‖x̂_i‖²_ext = ‖x_i‖² + R² ≤ R² + R² = 2R², so ‖ŵ_t‖²_ext ≤ ‖ŵ_{t−1}‖²_ext + 2η²R². I.e., by successive application, ‖ŵ_t‖²_ext ≤ 2tη²R².
60 Proof: Perceptron Convergence (5). By the results ⟨ŵ_t, ŵ*⟩_ext ≥ tηγ and ‖ŵ_t‖²_ext ≤ 2tη²R², we get by the Cauchy-Schwarz inequality tηγ ≤ ⟨ŵ_t, ŵ*⟩_ext ≤ ‖ŵ_t‖_ext ‖ŵ*‖_ext ≤ √(2t) ηR ‖ŵ*‖_ext. We need to bound ‖ŵ*‖_ext.
61 Proof: Perceptron Convergence (6). We need to bound ‖ŵ*‖_ext, and therefore we check ‖ŵ*‖²_ext = ‖w*‖² + b*²/R². Since S is non-trivial, |b*| < R, so ‖ŵ*‖²_ext ≤ ‖w*‖² + R²/R² = 1 + 1 = 2, using ‖w*‖ = 1. In all we have obtained tηγ ≤ √(2t) ηR ‖ŵ*‖_ext ≤ √(2t) ηR √2 = 2ηR√t, i.e., √t ≤ 2R/γ and t ≤ (2R/γ)², which verifies the assertion as claimed.
62 Comments on Perceptron Convergence. The number of updates t was found to be bounded by t ≤ (2R/γ)². The bound does not depend on the learning rate (counterintuitive). A large geometric margin promises fast convergence. A small radius of the inputs promises fast convergence.
63 The Scaling Freedom. If α ≠ 0, then D = {x ∈ X | f(x) = 0} = {x ∈ X | ⟨w, x⟩ + b = 0} = {x ∈ X | ⟨αw, x⟩ + αb = 0}. Hence (αw, αb) corresponds to the same hyperplane as (w, b). This was used in the proof above by taking α = 1/‖w‖, since then ‖αw‖ = ‖w‖/‖w‖ = 1. Elimination of the scaling freedom is discussed in lecture 2.
64 Dual Form of the Perceptron Algorithm. The perceptron algorithm works by adding misclassified examples with label +1 and by subtracting misclassified examples with label −1. Hence, as w_0 = 0, we have after the algorithm has stopped that w = Σ_{i=1}^l a_i y_i x_i, where a_i is proportional to the number of times misclassification of x_i has caused the weight to be updated. (The algorithm sweeps through S in some fixed order a finite number of times until it converges.) a_i is also called the embedding strength of x_i.
65 Dual Form of the Perceptron Algorithm. Hence we can write ⟨w, x⟩ + b = ⟨Σ_{i=1}^l a_i y_i x_i, x⟩ + b = Σ_{i=1}^l a_i y_i ⟨x_i, x⟩ + b, and hence the perceptron output is h(x) = sign(Σ_{i=1}^l a_i y_i ⟨x_i, x⟩ + b). This gives a dual form of the algorithm. (The concept of duality will reappear and play a big part later.)
66 Algorithm 2: Dual form of the perceptron algorithm. Given linearly separable S:
1: a = (a_1, a_2, ..., a_l) ← 0, b ← 0, R ← max_{1≤i≤l} ‖x_i‖
2: repeat
3:   for i = 1 to l do
4:     if y_i(Σ_{j=1}^l a_j y_j ⟨x_j, x_i⟩ + b) ≤ 0 then
5:       a_i ← a_i + 1, b ← b + y_i R^2
6:     end if
7:   end for
8: until no mistakes are made in the for loop
9: return a, b, which define h(x)
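Algorithm 2 can be sketched in plain Python as follows (an illustrative implementation, not code from the notes; the `max_sweeps` safeguard is an addition). Note that the inputs enter only through the inner products ⟨x_j, x_i⟩:

```python
import math

def dot(u, v):
    # inner product <u, v>
    return sum(ui * vi for ui, vi in zip(u, v))

def train_dual_perceptron(S, max_sweeps=1000):
    # dual form of the perceptron algorithm for linearly separable
    # S = [(x, y), ...] with labels y in {+1, -1}
    l = len(S)
    a, b = [0] * l, 0.0
    R = max(math.sqrt(dot(x, x)) for x, _ in S)
    for _ in range(max_sweeps):
        mistakes = False
        for i, (x_i, y_i) in enumerate(S):
            g = sum(a[j] * S[j][1] * dot(S[j][0], x_i) for j in range(l)) + b
            if y_i * g <= 0:
                a[i] += 1          # embedding strength of x_i
                b += y_i * R * R
                mistakes = True
        if not mistakes:
            return a, b
    raise RuntimeError("no convergence; S may not be linearly separable")

S = [([2.0, 1.0], 1), ([1.0, 2.0], 1), ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
a, b = train_dual_perceptron(S)
```

After convergence, h(x) = sign(Σ_j a_j y_j ⟨x_j, x⟩ + b) classifies all of S correctly; replacing `dot` by another kernel is what later turns this into a kernel classifier.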
67 Dual Form of the Perceptron Algorithm. Then we have, by the perceptron convergence theorem, Σ_{j=1}^l a_j ≤ (2R/γ)². The sum Σ_{j=1}^l a_j thus measures the complexity of the target concept in the dual representation. NOTE: in the dual form of the perceptron algorithm the inputs appear only through their pairwise inner products ⟨x_j, x_i⟩.
1 Statistical learning theory, Support vector machines, and Bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group ENS Paris, november 25, 2003. 2 Overview 1.
More informationLecture 4: Linear predictors and the Perceptron
Lecture 4: Linear predictors and the Perceptron Introduction to Learning and Analysis of Big Data Kontorovich and Sabato (BGU) Lecture 4 1 / 34 Inductive Bias Inductive bias is critical to prevent overfitting.
More information1 Machine Learning Concepts (16 points)
CSCI 567 Fall 2018 Midterm Exam DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO PLEASE TURN OFF ALL CELL PHONES Problem 1 2 3 4 5 6 Total Max 16 10 16 42 24 12 120 Points Please read the following instructions
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationChapter ML:VI. VI. Neural Networks. Perceptron Learning Gradient Descent Multilayer Perceptron Radial Basis Functions
Chapter ML:VI VI. Neural Networks Perceptron Learning Gradient Descent Multilayer Perceptron Radial asis Functions ML:VI-1 Neural Networks STEIN 2005-2018 The iological Model Simplified model of a neuron:
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationSupport Vector Machines. Machine Learning Fall 2017
Support Vector Machines Machine Learning Fall 2017 1 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost 2 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Produce
More informationA short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie
A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab
More informationThe Perceptron. Volker Tresp Summer 2014
The Perceptron Volker Tresp Summer 2014 1 Introduction One of the first serious learning machines Most important elements in learning tasks Collection and preprocessing of training data Definition of a
More informationKernel Methods. Outline
Kernel Methods Quang Nguyen University of Pittsburgh CS 3750, Fall 2011 Outline Motivation Examples Kernels Definitions Kernel trick Basic properties Mercer condition Constructing feature space Hilbert
More informationCh4: Perceptron Learning Rule
Ch4: Perceptron Learning Rule Learning Rule or Training Algorithm: A procedure for modifying weights and biases of a network. Learning Rules Supervised Learning Reinforcement Learning Unsupervised Learning
More informationSample width for multi-category classifiers
R u t c o r Research R e p o r t Sample width for multi-category classifiers Martin Anthony a Joel Ratsaby b RRR 29-2012, November 2012 RUTCOR Rutgers Center for Operations Research Rutgers University
More informationNeural Networks. Prof. Dr. Rudolf Kruse. Computational Intelligence Group Faculty for Computer Science
Neural Networks Prof. Dr. Rudolf Kruse Computational Intelligence Group Faculty for Computer Science kruse@iws.cs.uni-magdeburg.de Rudolf Kruse Neural Networks 1 Supervised Learning / Support Vector Machines
More informationMachine Learning A Geometric Approach
Machine Learning A Geometric Approach Linear Classification: Perceptron Professor Liang Huang some slides from Alex Smola (CMU) Perceptron Frank Rosenblatt deep learning multilayer perceptron linear regression
More informationPerceptron Mistake Bounds
Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce
More informationChapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.
Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space
More informationCOMP 652: Machine Learning. Lecture 12. COMP Lecture 12 1 / 37
COMP 652: Machine Learning Lecture 12 COMP 652 Lecture 12 1 / 37 Today Perceptrons Definition Perceptron learning rule Convergence (Linear) support vector machines Margin & max margin classifier Formulation
More informationSUPPORT VECTOR MACHINE
SUPPORT VECTOR MACHINE Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf 1 Overview SVM is a huge topic Integration of MMDS, IIR, and Andrew Moore s slides here Our foci: Geometric intuition
More informationMATH 304 Linear Algebra Lecture 19: Least squares problems (continued). Norms and inner products.
MATH 304 Linear Algebra Lecture 19: Least squares problems (continued). Norms and inner products. Orthogonal projection Theorem 1 Let V be a subspace of R n. Then any vector x R n is uniquely represented
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationLinear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights
Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University
More informationLinear Classification and SVM. Dr. Xin Zhang
Linear Classification and SVM Dr. Xin Zhang Email: eexinzhang@scut.edu.cn What is linear classification? Classification is intrinsically non-linear It puts non-identical things in the same class, so a
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons
More informationLINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning
LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationSupport Vector Machines
Support Vector Machines Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Overview Motivation
More informationPart 1a: Inner product, Orthogonality, Vector/Matrix norm
Part 1a: Inner product, Orthogonality, Vector/Matrix norm September 19, 2018 Numerical Linear Algebra Part 1a September 19, 2018 1 / 16 1. Inner product on a linear space V over the number field F A map,
More informationCS798: Selected topics in Machine Learning
CS798: Selected topics in Machine Learning Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS798: Selected topics in Machine Learning
More informationFunctions of Several Variables
Jim Lambers MAT 419/519 Summer Session 2011-12 Lecture 2 Notes These notes correspond to Section 1.2 in the text. Functions of Several Variables We now generalize the results from the previous section,
More informationSupport Vector Machines for Classification and Regression
CIS 520: Machine Learning Oct 04, 207 Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may
More informationThe Perceptron Algorithm 1
CS 64: Machine Learning Spring 5 College of Computer and Information Science Northeastern University Lecture 5 March, 6 Instructor: Bilal Ahmed Scribe: Bilal Ahmed & Virgil Pavlu Introduction The Perceptron
More informationLinear Learning Machines
Linear Learning Machines Chapter 2 February 13, 2003 T.P. Runarsson (tpr@hi.is) and S. Sigurdsson (sven@hi.is) Linear learning machines February 13, 2003 In supervised learning, the learning machine is
More informationSupport Vector Machines: Maximum Margin Classifiers
Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind
More informationMachine Learning Lecture 2
Machine Perceptual Learning and Sensory Summer Augmented 15 Computing Many slides adapted from B. Schiele Machine Learning Lecture 2 Probability Density Estimation 16.04.2015 Bastian Leibe RWTH Aachen
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationMulti-class SVMs. Lecture 17: Aykut Erdem April 2016 Hacettepe University
Multi-class SVMs Lecture 17: Aykut Erdem April 2016 Hacettepe University Administrative We will have a make-up lecture on Saturday April 23, 2016. Project progress reports are due April 21, 2016 2 days
More informationAnalytic Geometry. Orthogonal projection. Chapter 4 Matrix decomposition
1541 3 Analytic Geometry 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 In Chapter 2, we studied vectors, vector spaces and linear mappings at a general but abstract level. In this
More informationLogistic regression and linear classifiers COMS 4771
Logistic regression and linear classifiers COMS 4771 1. Prediction functions (again) Learning prediction functions IID model for supervised learning: (X 1, Y 1),..., (X n, Y n), (X, Y ) are iid random
More informationKernel Methods in Machine Learning
Kernel Methods in Machine Learning Autumn 2015 Lecture 1: Introduction Juho Rousu ICS-E4030 Kernel Methods in Machine Learning 9. September, 2015 uho Rousu (ICS-E4030 Kernel Methods in Machine Learning)
More informationLearning Kernel Parameters by using Class Separability Measure
Learning Kernel Parameters by using Class Separability Measure Lei Wang, Kap Luk Chan School of Electrical and Electronic Engineering Nanyang Technological University Singapore, 3979 E-mail: P 3733@ntu.edu.sg,eklchan@ntu.edu.sg
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationSupport Vector Machines
Support Vector Machines Tobias Pohlen Selected Topics in Human Language Technology and Pattern Recognition February 10, 2014 Human Language Technology and Pattern Recognition Lehrstuhl für Informatik 6
More informationThe Perceptron Algorithm, Margins
The Perceptron Algorithm, Margins MariaFlorina Balcan 08/29/2018 The Perceptron Algorithm Simple learning algorithm for supervised classification analyzed via geometric margins in the 50 s [Rosenblatt
More informationRobust Kernel-Based Regression
Robust Kernel-Based Regression Budi Santosa Department of Industrial Engineering Sepuluh Nopember Institute of Technology Kampus ITS Surabaya Surabaya 60111,Indonesia Theodore B. Trafalis School of Industrial
More informationPerceptron Revisited: Linear Separators. Support Vector Machines
Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department
More informationORTHOGONALITY AND LEAST-SQUARES [CHAP. 6]
ORTHOGONALITY AND LEAST-SQUARES [CHAP. 6] Inner products and Norms Inner product or dot product of 2 vectors u and v in R n : u.v = u 1 v 1 + u 2 v 2 + + u n v n Calculate u.v when u = 1 2 2 0 v = 1 0
More informationA New Perspective on an Old Perceptron Algorithm
A New Perspective on an Old Perceptron Algorithm Shai Shalev-Shwartz 1,2 and Yoram Singer 1,2 1 School of Computer Sci & Eng, The Hebrew University, Jerusalem 91904, Israel 2 Google Inc, 1600 Amphitheater
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationNeural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture
Neural Networks for Machine Learning Lecture 2a An overview of the main types of neural network architecture Geoffrey Hinton with Nitish Srivastava Kevin Swersky Feed-forward neural networks These are
More informationApplied inductive learning - Lecture 7
Applied inductive learning - Lecture 7 Louis Wehenkel & Pierre Geurts Department of Electrical Engineering and Computer Science University of Liège Montefiore - Liège - November 5, 2012 Find slides: http://montefiore.ulg.ac.be/
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationLECTURE NOTE #8 PROF. ALAN YUILLE. Can we find a linear classifier that separates the position and negative examples?
LECTURE NOTE #8 PROF. ALAN YUILLE 1. Linear Classifiers and Perceptrons A dataset contains N samples: { (x µ, y µ ) : µ = 1 to N }, y µ {±1} Can we find a linear classifier that separates the position
More informationAn introduction to Support Vector Machines
1 An introduction to Support Vector Machines Giorgio Valentini DSI - Dipartimento di Scienze dell Informazione Università degli Studi di Milano e-mail: valenti@dsi.unimi.it 2 Outline Linear classifiers
More informationClassification and Pattern Recognition
Classification and Pattern Recognition Léon Bottou NEC Labs America COS 424 2/23/2010 The machine learning mix and match Goals Representation Capacity Control Operational Considerations Computational Considerations
More information