Classification. Sandro Cumani. Politecnico di Torino


1 Politecnico di Torino

2 Outline
Generative model: Gaussian classifier
(Linear) discriminative model: logistic regression
(Non-linear) discriminative model: neural networks

3 Gaussian Classifier
We want to model the data distribution P(x|c). This allows computing class posterior probabilities using Bayes rule:
$$P(c|x) = \frac{P(x|c)P(c)}{\sum_{c'} P(x|c')P(c')}$$
How do we model P(x|c)?

4 Gaussian Classifier
We want to model the data distribution P(x|c). This allows computing class posterior probabilities using Bayes rule:
$$P(c|x) = \frac{P(x|c)P(c)}{\sum_{c'} P(x|c')P(c')}$$
How do we model P(x|c)? Simplest distribution: the multivariate Gaussian distribution, with one mean and one covariance matrix per class. In some cases it's useful to tie covariance parameters across classes.

5 Gaussian Distribution
Let X denote a Random Variable (R.V.), and x a sample of X. Let X be distributed according to a univariate Gaussian distribution X ∼ N(m, σ²). The probability density function for X is⁴
$$P_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-m)^2}{2\sigma^2}}$$
m is the distribution mean, σ² is the distribution variance.
⁴ With an abuse of notation we will use the symbol P to denote both probabilities (for discrete R.V.s) and densities (for continuous R.V.s).

6 Gaussian Distribution
[Figure: univariate Gaussian densities for different values of m and σ² (m = 0 with two different variances, and m = 1)]

7 Multivariate Gaussian Distribution
Let X be a random vector X = [X₁, …, X_N]ᵀ where the X_i are independent identically distributed R.V.s following a standard normal distribution X_i ∼ N(0, 1). The distribution of X is given by the joint distribution of {X₁, …, X_N}:
$$P_X(x) = \prod_{i=1}^{N} P_{X_i}(x_i)$$
or, equivalently,
$$P_X(x) = (2\pi)^{-\frac{N}{2}} e^{-\frac{1}{2} x^T x}$$
X is said to follow a standard multivariate normal distribution X ∼ N(0, I).

8 Multivariate Gaussian Distribution
In general, X is said to follow a multivariate normal distribution with mean µ and covariance matrix Σ if X can be rewritten as a linear transformation of a standard multivariate normal distributed random vector Y:
$$X = AY + \mu, \qquad Y \sim N(0, I), \qquad \Sigma = AA^T$$
We will write X ∼ N(µ, Σ). The p.d.f. of X is given by
$$P_X(x) = (2\pi)^{-\frac{N}{2}} |\Sigma|^{-\frac{1}{2}} e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$$
Often, rather than working with a covariance matrix, it's easier to work with its inverse, the precision matrix Λ = Σ⁻¹.

9 Multivariate Gaussian Distribution
It's usually more practical to work with the logarithm of the p.d.f. (to avoid numerical problems). Let's have a look at the log pdf of X ∼ N(µ, Λ⁻¹):
$$\log P_X(x) = \frac{1}{2}\log|\Lambda| - \frac{1}{2}(x-\mu)^T \Lambda (x-\mu) - \frac{N}{2}\log(2\pi)$$

10 Multivariate Gaussian Distribution
It's usually more practical to work with the logarithm of the p.d.f. (to avoid numerical problems). Let's have a look at the log pdf of X ∼ N(µ, Λ⁻¹):
$$\log P_X(x) = \frac{1}{2}\log|\Lambda| - \frac{1}{2}(x-\mu)^T \Lambda (x-\mu) - \frac{N}{2}\log(2\pi)$$
We can notice that it's a negative definite quadratic form in x, thus level sets are ellipses. The axes are aligned with the eigenvectors of the covariance matrix, with lengths determined by its eigenvalues.
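A minimal numpy sketch of this log-density, assuming samples are stored as the columns of X (the function name is just illustrative):

```python
import numpy as np

def logpdf_gau_nd(X, mu, Sigma):
    """Log-density of an N-dimensional Gaussian evaluated at the columns of X.

    X: (N, K) data matrix, mu: (N, 1) mean, Sigma: (N, N) covariance matrix.
    """
    N = X.shape[0]
    P = np.linalg.inv(Sigma)              # precision matrix Lambda
    _, logdet = np.linalg.slogdet(P)      # log |Lambda|
    XC = X - mu                           # centered samples
    quad = (XC * (P @ XC)).sum(0)         # (x - mu)^T Lambda (x - mu) for each column
    return 0.5 * logdet - 0.5 * quad - 0.5 * N * np.log(2 * np.pi)
```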

11 Multivariate Gaussian Distribution
[Figure: level sets of 2D Gaussian densities with µ = 0 and different covariance matrices Σ]

12 Multivariate Gaussian Distribution
Let X ∼ N(m, Λ⁻¹). The covariance matrix can be decomposed as Λ⁻¹ = UDUᵀ. Consider the R.V. Z = Uᵀ(X − m): Z ∼ N(0, D), i.e., Z is a random vector whose components are independent univariate normal distributed R.V.s Z_i ∼ N(0, d_i). The first m components of Z = Uᵀ(X − m) correspond to the directions of highest variance: we recovered PCA.

13 Maximum Likelihood Estimate
We assume that our data are independent samples of a R.V. with multivariate Gaussian distribution. The log pdf of our dataset X = {x₁, …, x_K} given model M is therefore
$$\log P(X|M) = \sum_{i=1}^{K} \log P(x_i|M)$$
P(X|M) is called likelihood function. Maximum Likelihood Estimation (MLE) estimates the parameters of the model M which maximize the (log-)likelihood. MLE finds the parameters under which the dataset is most likely to be generated.

14 Maximum Likelihood Estimate
Let's apply MLE to a multivariate Gaussian distribution. The log-likelihood we want to maximize is given by
$$\log P(X|\mu,\Lambda^{-1}) = \sum_{i=1}^{K} \left[ \frac{1}{2}\log|\Lambda| - \frac{1}{2}(x_i-\mu)^T \Lambda (x_i-\mu) \right] + k$$
The solution is obtained by setting the derivative equal to zero and solving for µ and Λ:
$$\mu_{ML} = \frac{1}{K}\sum_i x_i, \qquad \Lambda_{ML}^{-1} = \frac{1}{K}\sum_i (x_i-\mu_{ML})(x_i-\mu_{ML})^T$$
i.e., the empirical mean and covariance matrix of the data.
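A possible numpy sketch of these ML estimates, again assuming one sample per column:

```python
import numpy as np

def mle_gaussian(X):
    """ML estimates for a multivariate Gaussian; X is an (N, K) data matrix."""
    mu = X.mean(1, keepdims=True)      # empirical mean, shape (N, 1)
    XC = X - mu
    Sigma = XC @ XC.T / X.shape[1]     # empirical covariance (ML estimate)
    return mu, Sigma
```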

15 Gaussian Classifier
For classification purposes, we can assume that the data of each class c is generated by a R.V. X_c ∼ N(µ_c, Λ_c⁻¹). MLE allows us to estimate the set of parameters Π = {µ_c, Λ_c⁻¹}. For each test sample, we compute the class conditional likelihood P(x|c) as
$$P(x|c) = P(x|c, \Pi, M) = N(x; \mu_c, \Lambda_c^{-1})$$

16 Gaussian Classifier
Binary classification: we can compute the log-likelihood ratio
$$\ell = \log\frac{P(x|c_1)}{P(x|c_2)} = \log\frac{N(x;\mu_1,\Lambda_1^{-1})}{N(x;\mu_2,\Lambda_2^{-1})}$$
The decision function is quadratic in x: ℓ(x) = xᵀAx + xᵀb + c, with
$$A = -\frac{1}{2}(\Lambda_1 - \Lambda_2), \qquad b = \Lambda_1\mu_1 - \Lambda_2\mu_2, \qquad c = -\frac{1}{2}\left(\mu_1^T\Lambda_1\mu_1 - \mu_2^T\Lambda_2\mu_2\right) + \frac{1}{2}\left(\log|\Lambda_1| - \log|\Lambda_2|\right)$$
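A hedged numpy sketch of this quadratic log-likelihood ratio, assuming the class means are column vectors and C1, C2 are the estimated class covariance matrices:

```python
import numpy as np

def quadratic_llr(X, mu1, C1, mu2, C2):
    """Log-likelihood ratio l(x) = x^T A x + x^T b + c for each column of X (non-tied covariances)."""
    L1, L2 = np.linalg.inv(C1), np.linalg.inv(C2)          # precision matrices
    A = -0.5 * (L1 - L2)
    b = L1 @ mu1 - L2 @ mu2
    c = -0.5 * (mu1.T @ L1 @ mu1 - mu2.T @ L2 @ mu2)
    c = c + 0.5 * (np.linalg.slogdet(L1)[1] - np.linalg.slogdet(L2)[1])
    return (X * (A @ X)).sum(0) + (b.T @ X).ravel() + c.item()
```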

17 Gaussian Classifier A binary 2D example

18 Gaussian Classifier
For some datasets it's convenient to assume that the covariance matrices of the different classes are tied (large dimensional data, small number of samples). In this case, the ML solution is given by
$$\Sigma = \frac{1}{K}\sum_c \sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^T$$

19 Gaussian Classifier
The binary log-likelihood ratio becomes
$$\ell = \log\frac{P(x|c_1)}{P(x|c_2)} = \log\frac{N(x;\mu_1,\Lambda^{-1})}{N(x;\mu_2,\Lambda^{-1})}$$
The decision function is now linear in x: ℓ(x) = xᵀb + c, with
$$b = \Lambda(\mu_1 - \mu_2), \qquad c = -\frac{1}{2}\left(\mu_1^T\Lambda\mu_1 - \mu_2^T\Lambda\mu_2\right)$$
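A possible numpy sketch for the tied-covariance case (the helper names and the dictionary of class means are illustrative choices, not part of the slides):

```python
import numpy as np

def tied_gaussian_params(X, labels):
    """Class means and tied covariance Sigma = 1/K sum_c sum_{i in c} (x_i - mu_c)(x_i - mu_c)^T."""
    means = {}
    SW = np.zeros((X.shape[0], X.shape[0]))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        means[c] = Xc.mean(1, keepdims=True)
        XC = Xc - means[c]
        SW += XC @ XC.T
    return means, SW / X.shape[1]

def linear_llr(X, mu1, mu2, Sigma):
    """Linear log-likelihood ratio l(x) = x^T b + c for the tied-covariance model."""
    L = np.linalg.inv(Sigma)
    b = L @ (mu1 - mu2)
    c = -0.5 * (mu1.T @ L @ mu1 - mu2.T @ L @ mu2)
    return (b.T @ X).ravel() + c.item()
```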

20 Gaussian Classifier A binary 2D example

21 Gaussian Classifier
Consider a classifier based on the Mahalanobis distance, which assigns the label of the class for which ‖x − µ_c‖_W is minimum. A corresponding scoring function would then be
$$f(x) = \|x - \mu_2\|_W^2 - \|x - \mu_1\|_W^2$$
Observe that
$$f(x) = x^TWx - 2x^TW\mu_2 + \mu_2^TW\mu_2 - x^TWx + 2x^TW\mu_1 - \mu_1^TW\mu_1 = 2x^TW(\mu_1 - \mu_2) + \mu_2^TW\mu_2 - \mu_1^TW\mu_1$$
If we set W = Λ, f(x) provides the same decision boundaries as the log-likelihood ratio of the Gaussian model with tied covariances!

22 Gaussian Classifier
The model is also closely related to LDA. Remember that two-class LDA looks for the direction which maximizes the generalized Rayleigh quotient
$$\frac{w^T S_B w}{w^T S_W w}$$
with S_W = K Λ⁻¹ and S_B = (µ₂ − µ₁)(µ₂ − µ₁)ᵀ. We have seen that we can solve the problem by applying the transformation x′ = Λ^{1/2} x, which gives S′_W ∝ I and
$$S'_B = \Lambda^{\frac{1}{2}}(\mu_2 - \mu_1)(\mu_2 - \mu_1)^T \Lambda^{\frac{1}{2}}$$

23 Gaussian Classifier
Since v = Λ^{1/2}(µ₂ − µ₁) is just a vector, the leading eigenvector of S′_B is ν = v/‖v‖. Thus, projection over the LDA subspace is, up to a scaling factor, given by
$$w^T x = k\, x^T \Lambda(\mu_2 - \mu_1)$$
This corresponds to the classification rule of the Gaussian model with tied covariances! Indeed, a limitation of LDA is that it assumes that all classes have the same within-class covariance matrix.

24 Gaussian Classifier
Multiclass problems: we learn class-specific model parameters. This allows computing class conditional likelihoods P(x|c) = N(x; µ_c, Σ_c). If we are interested in closed-set class posteriors, we can apply Bayes rule to compute posterior probabilities:
$$P(c|x) = \frac{P(x|c)P(c)}{P(x)} = \frac{\pi_c N(x;\mu_c,\Sigma_c)}{\sum_{c'} \pi_{c'} N(x;\mu_{c'},\Sigma_{c'})}$$
We assign to the test sample the label of the class that has highest posterior probability P(c|x).
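A small numpy/scipy sketch of this posterior computation, assuming log P(x|c) has already been collected into a (classes × samples) matrix:

```python
import numpy as np
from scipy.special import logsumexp

def class_posteriors(log_likelihoods, priors):
    """Class posteriors from an (n_classes, K) matrix of log P(x|c) and a vector of priors pi_c."""
    log_joint = log_likelihoods + np.log(priors).reshape(-1, 1)   # log P(x|c) + log pi_c
    log_marginal = logsumexp(log_joint, axis=0)                   # log P(x)
    return np.exp(log_joint - log_marginal)                       # P(c|x), columns sum to 1
```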

25 Gaussian Classifier
MNIST error rates for the Gaussian classifier (M, PCA, PCA+LDA, AEC (MLP)):
Tied covariances: –% / 11.8%, –% / 12.0%, –% / 12.3% / 13.2%
Non-tied covariances: –% / 3.6%, –% / 3.5%, –% / 10.2% / 6.4%

26 Logistic Regression
For a 2-class problem, the Gaussian model with tied covariances provides likelihood ratios that are linear functions of our data⁵. Assuming uniform priors⁶,
$$\ell(x) = \log\frac{P(x|c_1)}{P(x|c_2)} = w^T x$$
It follows that
$$\frac{P(x|c_1)}{P(x|c_2)} = \frac{P(c_1|x)}{P(c_2|x)} = e^{w^Tx}, \qquad P(c_1|x,w) = e^{w^Tx}\,P(c_2|x,w) = e^{w^Tx}\left(1 - P(c_1|x,w)\right)$$
⁵ We omit the bias term here. In general, bias can be accounted for by replacing x with [xᵀ, 1]ᵀ.
⁶ Non-uniform priors can be accounted for using a bias term.

27 Logistic Regression
Therefore
$$P(c_1|x,w) = \frac{e^{w^Tx}}{1+e^{w^Tx}} = \frac{1}{1+e^{-w^Tx}} = \sigma(w^Tx)$$
where
$$\sigma(x) = \frac{1}{1+e^{-x}}$$
is called sigmoid function (a special case of logistic function).

28 Logistic Regression
Sigmoid function: σ(x). Some properties of σ(x) that will come useful later:
$$1 - \sigma(x) = \sigma(-x), \qquad \frac{d\sigma(x)}{dx} = \sigma(x)(1-\sigma(x))$$
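A numerically stable numpy sketch of the sigmoid (this overflow-avoiding split is an implementation choice, assuming array inputs):

```python
import numpy as np

def sigmoid(x):
    """Numerically stable sigmoid: sigma(x) = 1 / (1 + e^{-x}), element-wise on an array."""
    out = np.empty_like(x, dtype=float)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    e = np.exp(x[~pos])                      # avoids overflow for large negative x
    out[~pos] = e / (1.0 + e)
    return out
```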

29 Logistic Regression
We assume that the label for class c₁ is 1, and the label for class c₂ is 0. Let
$$y_i = P(c_1|x_i, w) = \sigma(w^T x_i)$$
It follows that
$$P(c_2|x_i, w) = 1 - y_i = \sigma(-w^T x_i)$$
Let t_i ∈ {0, 1} denote the training label associated to x_i.

30 Logistic Regression
The likelihood of our label set is
$$P(t|x, w) = \prod_i P(t_i|x_i, w), \qquad P(t_i|x_i, w) = \begin{cases} y_i & \text{if } t_i = 1 \\ 1 - y_i & \text{if } t_i = 0 \end{cases}$$
i.e., each t_i is generated according to a Bernoulli distribution with parameter p = y_i.

31 Logistic Regression
In a compact form, the likelihood can be expressed as
$$P(t|x, w) = \prod_i y_i^{t_i} (1 - y_i)^{1 - t_i}$$
In log domain,
$$\log P(t|x, w) = \sum_i \left[t_i \log y_i + (1 - t_i) \log(1 - y_i)\right]$$
The negative expression
$$E(w) = -\sum_i \left[t_i \log y_i + (1 - t_i) \log(1 - y_i)\right]$$
is also called binary cross-entropy.

32 Logistic Regression
The cross-entropy can be interpreted as a type of error function: it measures the distance between the predicted labels for the training set and the actual labels. As for the Gaussian model, we are interested in maximizing the likelihood log P(t|x, w) or, equivalently, minimizing the cross-entropy E(w).

33 Logistic Regression
Note that, if we set z_i = 2t_i − 1, i.e.
$$z_i = \begin{cases} 1 & \text{if } t_i = 1 \\ -1 & \text{if } t_i = 0 \end{cases}$$
then E(w) can be rewritten in compact form as
$$E(w) = -\sum_i \log\sigma(z_i w^T x_i) = \sum_i \log\left(1 + e^{-z_i w^T x_i}\right)$$
Function ℓ(s) = log(1 + e⁻ˢ) is called logistic loss.
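A short numpy sketch of the average logistic loss, using log(1 + e^{-zs}) = logaddexp(0, -zs) for numerical stability (assuming labels z_i ∈ {-1, +1} and one sample per column of X):

```python
import numpy as np

def logistic_loss(w, X, z):
    """Average logistic loss (1/K) sum_i log(1 + e^{-z_i w^T x_i})."""
    s = w @ X                                 # scores w^T x_i
    return np.logaddexp(0, -z * s).mean()     # stable log(1 + e^{-z_i s_i})
```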

34 Logistic Regression Logistic loss l(s)

35 Logistic Regression
Logistic regression can be interpreted as an instance of a broad class of optimization problems which aim at minimizing an empirical risk function over our training data. Generalized risk minimization problem:
$$\min_w \sum_i \ell(w, x_i, z_i)$$
where ℓ is called loss (or cost) function. In general, some regularization is used to avoid overfitting:
$$\min_w \frac{\lambda}{2}\|w\|^2 + \frac{1}{K}\sum_i \ell(w, x_i, z_i)$$

36 Logistic Regression
Regularization can also be applied to Logistic Regression⁷. This is necessary when classes are separable, to avoid the norm of w from growing indefinitely! Regularized logistic regression:
$$\min_w \frac{\lambda}{2}\|w\|^2 + \frac{1}{K}\sum_i \log\left(1 + e^{-z_i w^T x_i}\right)$$
λ is a hyperparameter of the model, and as usual its optimal value should be selected by means of a validation set.
⁷ Once we add regularization, the model is not invariant to linear transformations anymore. It is therefore useful to preprocess our data (e.g. whitening) for the regularizer to be effective.

37 Logistic Regression
The optimal value for w cannot be expressed in closed form. It can be shown that the regularized logistic regression objective function is convex, so we can resort to numerical optimization approaches. In this course we will use the L-BFGS algorithm⁸. L-BFGS libraries are available for a wide range of programming languages (including Python).
⁸ Details of L-BFGS can be found in: J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed. Springer, 2006.

38 Logistic Regression
In order to run the numerical solver, we need to compute the gradient of the objective function with respect to w, ∇_w E(w). The derivative of the cost function ℓ(s, z_i) = log(1 + e^{−z_i s}) with respect to s is
$$\frac{d\ell(s)}{ds} = \frac{-z_i}{1 + e^{z_i s}}$$
Thus,
$$\nabla_w \ell(w^T x_i, z_i) = \frac{-z_i}{1 + e^{z_i w^T x_i}}\, x_i$$
Notice that, if we use t_i in place of z_i, we also have ∇_w ℓ(wᵀx_i, z_i) = (y_i − t_i) x_i.
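A possible sketch of regularized binary logistic regression trained with scipy's L-BFGS wrapper (the bias term is omitted, as in the slides; function and variable names are illustrative):

```python
import numpy as np
import scipy.optimize

def train_logreg(X, z, lam):
    """Regularized binary logistic regression trained with L-BFGS.

    X: (N, K) data matrix, z: labels in {-1, +1}, lam: regularization coefficient lambda.
    """
    N, K = X.shape

    def obj_and_grad(w):
        s = w @ X                                     # scores w^T x_i
        loss = np.logaddexp(0, -z * s).mean()         # average logistic loss
        obj = 0.5 * lam * (w @ w) + loss
        G = -z / (1.0 + np.exp(z * s))                # dl/ds_i for each sample
        grad = lam * w + (X * G).sum(1) / K           # lambda w + (1/K) sum_i (dl/ds_i) x_i
        return obj, grad

    w0 = np.zeros(N)
    w, _, _ = scipy.optimize.fmin_l_bfgs_b(obj_and_grad, w0)
    return w
```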

39 Logistic Regression
[Figure: logistic regression decision regions for λ = 0 and for a larger value of λ]

40 Multiclass Logistic Regression
As in the binary case, we assume that log P(x|c) = w_cᵀx + k. We have one hyperplane w_c per class. Assuming uniform priors, class posteriors are then given by
$$P(c|x) \propto P(x|c) \propto e^{w_c^T x}$$
Since Σ_c P(c|x) = 1, it follows that
$$P(c|x) = \frac{e^{w_c^T x}}{\sum_{c'} e^{w_{c'}^T x}}$$
Function
$$f_i(s) = \frac{e^{s_i}}{\sum_j e^{s_j}}$$
is called softmax.

41 Multiclass Logistic Regression
We assume that class labels are c_i ∈ {0, …, N−1}. We adopt a 1-of-N coding scheme for class labels. For each data point x_i we define the label vector t_i with components
$$t_{ij} = \begin{cases} 1 & \text{if } c_i = j \\ 0 & \text{otherwise} \end{cases}$$
i.e., t_i is a vector whose elements are all zero except the element whose position corresponds to the class label.

42 Multiclass Logistic Regression
Let W denote the set of all hyperplanes W = {w₁, …, w_N}. The likelihood for vectors t_i is given by
$$\log P(\{t_i\}|\{x_i\}, W) = \sum_i \log P(t_i|x_i, W) = \sum_i \sum_j t_{ij} \log P(c_j|x_i, W) = \sum_i \sum_j t_{ij} \log y_{ij}$$
where
$$y_{ij} = \frac{e^{w_j^T x_i}}{\sum_k e^{w_k^T x_i}}$$
This objective function is also known as negative cross-entropy between class labels and predictions.

43 Multiclass Logistic Regression
Multiclass logistic regression corresponds to a multinomial model of the class labels, where the multinomial parameters are given by the softmax of the scores [w₁ᵀx, …, w_Nᵀx]. As for the binary case, we estimate W so as to maximize the likelihood of the training labels. Compared to the binary case, the model is overparametrized (i.e., we can add a constant vector to all terms w_i without changing the model). In particular, for a 2-class problem, if we subtract w₂ from both w₁ and w₂, we recover exactly the binary logistic regression objective.

44 Multiclass Logistic Regression
Finally, as for the binary case, we can cast the problem as a minimization of a loss function. We rewrite the objective in terms of class labels c:
$$-\log P(\{t_i\}|\{x_i\}, W) = -\sum_i \sum_j t_{ij} \log y_{ij} = -\sum_i \log\frac{e^{w_{c_i}^T x_i}}{\sum_c e^{w_c^T x_i}} = \sum_i \left[\log\left(\sum_c e^{w_c^T x_i}\right) - w_{c_i}^T x_i\right]$$
This is also called softmax loss.

45 Multiclass Logistic Regression
Adding a regularizer, the multiclass logistic regression objective function is
$$\min_{w_1,\ldots,w_N} \lambda\,\Omega(w_1,\ldots,w_N) + \frac{1}{K}\sum_i \left[\log\left(\sum_c e^{w_c^T x_i}\right) - w_{c_i}^T x_i\right]$$
Different regularizers can be used, for example
$$\Omega(w_1,\ldots,w_N) = \frac{1}{2}\sum_i \|w_i\|^2$$

46 Multiclass Logistic Regression
The empirical loss and its gradients are
$$\ell(w_1,\ldots,w_N, x_i) = \log\left(\sum_c e^{w_c^T x_i}\right) - w_{c_i}^T x_i$$
$$\nabla_{w_k} \ell(w_1,\ldots,w_N, x_i) = \left(\frac{e^{w_k^T x_i}}{\sum_c e^{w_c^T x_i}} - \delta_{k,c_i}\right) x_i$$
In terms of y_{ij} and t_{ij}:
$$\ell(w_1,\ldots,w_N, x_i) = -\sum_j t_{ij} \log y_{ij}, \qquad \nabla_{w_k} \ell(w_1,\ldots,w_N, x_i) = (y_{ik} - t_{ik})\, x_i$$
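A numpy sketch of the regularized softmax loss and its gradient, assuming the hyperplanes are stored as the columns of a matrix W and labels are integers in 0..C-1:

```python
import numpy as np
from scipy.special import logsumexp

def softmax_loss_and_grad(W, X, labels, lam):
    """Regularized multiclass softmax loss and its gradient.

    W: (N, C) matrix whose columns are w_1, ..., w_C; X: (N, K) data; labels: (K,) int array.
    """
    K = X.shape[1]
    S = W.T @ X                                        # (C, K) scores w_c^T x_i
    logZ = logsumexp(S, axis=0)                        # log sum_c e^{w_c^T x_i}
    loss = (logZ - S[labels, np.arange(K)]).mean()     # average softmax loss
    Y = np.exp(S - logZ)                               # posteriors y_ij
    T = np.zeros_like(Y)
    T[labels, np.arange(K)] = 1.0                      # 1-of-N encoded targets t_ij
    grad = X @ (Y - T).T / K + lam * W                 # gradient of loss plus L2 regularizer
    obj = loss + 0.5 * lam * (W * W).sum()
    return obj, grad
```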

47 Multiclass Logistic Regression
MNIST error rates for Logistic Regression:
DimRed      λ = 0   λ = –    λ = –    λ = 0.1   Tied Gau
RAW [768]   8.0%    7.4%     7.9%     12.9%     –
PCA [50]    8.8%    8.8%     8.9%     13.3%     12.3%
PCA [100]   7.8%    7.8%     8.2%     12.9%     12.6%
AEC [50]    9.1%    9.2%     9.2%     11.9%     12.0%
AEC [100]   7.8%    7.8%     8.2%     12.1%     11.8%

48 Multiclass Logistic Regression
Linear logistic regression on MNIST performs better than our Tied Covariance Gaussian classifier, however it's far worse than our non-linear Gaussian classifier. Remember that, for LR, we assumed that log P(x|c) = w_cᵀx + k, which has the same form as the Gaussian classifier with tied covariances. For the Gaussian classifier with non-tied covariances we have
$$\log P(x|c) = -\frac{1}{2}x^T\Lambda_c x + x^T\Lambda_c\mu_c - \frac{1}{2}\mu_c^T\Lambda_c\mu_c + k$$
which we can rewrite as
$$\log P(x|c) = \langle xx^T, A\rangle + x^T v + k$$

49 Multiclass Logistic Regression
The log pdf log P(x|c) can be expressed as a linear function in an expanded feature space:
$$\phi(x) = \begin{bmatrix} \mathrm{vec}(xx^T) \\ x \end{bmatrix}, \qquad w = \begin{bmatrix} \mathrm{vec}\!\left(-\frac{1}{2}\Lambda_c\right) \\ \Lambda_c \mu_c \end{bmatrix}, \qquad \log P(x|c) = w^T \phi(x) + k$$
where the constant k absorbs the term −½ µ_cᵀΛ_c µ_c. We can use LR with data points φ(x) to directly estimate w.
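A simple sketch of this quadratic feature expansion (a loop-based version for clarity; the layout of φ(x) is an illustrative choice):

```python
import numpy as np

def quadratic_expansion(X):
    """Map each column x of X to phi(x) = [vec(x x^T); x], giving quadratic separation surfaces."""
    N, K = X.shape
    PHI = np.empty((N * N + N, K))
    for i in range(K):
        x = X[:, i:i+1]
        PHI[:N * N, i] = (x @ x.T).ravel()   # vec(x x^T)
        PHI[N * N:, i] = x.ravel()           # original features
    return PHI
```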

50 Multiclass Logistic Regression
In general, we can consider a transformation φ(x) of our feature space such that our classes are linearly separable in the expanded feature space. The simple expansion we have just seen produces quadratic separation surfaces (cf. the Gaussian model). The dimensionality of the expanded feature space can grow very quickly.
MNIST error rates for LR with quadratic feature expansion:
DimRed     λ = 0   λ = 1e-5   λ = 1e-3   λ = 1e-1   Gaussian
PCA [50]   2.3%    1.9%       1.7%       3.1%       3.6%
AEC [50]   2.3%    2.0%       2.0%       2.2%       3.5%

51 Neural Networks for classification
Neural Networks (NN) provide a method to estimate the function φ. A Neural Network can be interpreted as a non-linear parametric function φ(x, π), whose parameters are learned from the data. The function is represented by means of a directed graph: each node is associated with a function that operates on the node inputs and provides the node output.

52 Neural Networks
[Diagram: a neural network node computing y = f(x, π) from inputs x₁, x₂, x₃, x₄]

53 Neural Networks
Function f is usually represented as a composition of an affine projection and a scalar non-linearity: f(x, π) = h(wᵀx + b). Several possible functions have been proposed for the non-linearity:
Sigmoid function: h(x) = 1/(1 + e⁻ˣ)
Hyperbolic tangent: h(x) = tanh(x)
Rectified linear: h(x) = max(0, x)

54 Neural Networks
[Figure: plots of the sigmoid h(x) = σ(x), hyperbolic tangent h(x) = tanh(x), and ReLU h(x) = max(0, x) activations]

55 Feed-forward Neural Networks
Units are organized in layers. Connections are defined between layers (no loops).
[Diagram: a feed-forward network mapping inputs x₁, …, x_N to outputs y₁, …, y_M through hidden layers]

56 Feed-forward network (revisited)
Training a neural network requires optimizing its parameters (in our case, the parameters of the affine transformations) with respect to our objective function. This requires computing the gradients of our objective function with respect to the network parameters. We can exploit the representation of the network in terms of composition of functions and apply the chain rule to compute our gradients. An effective approach to compute these gradients is the back-propagation algorithm.

57 Neural Networks for classification
Training is usually performed using Stochastic Gradient Descent over batches. Given a randomly sampled batch, we compute the gradient of the objective function and update the weights using a gradient descent step
$$w \leftarrow w - \alpha_t \nabla_w \ell(\{x\}_{BATCH})$$
α_t is called learning rate. Convergence is guaranteed if
$$\sum_t \alpha_t = \infty, \qquad \sum_t \alpha_t^2 < \infty$$
More sophisticated approaches have been recently introduced.
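A bare-bones sketch of mini-batch SGD with a simple decaying learning rate (grad_fn is a placeholder for the gradient of the chosen objective over a batch; all names are illustrative):

```python
import numpy as np

def sgd(w, X, labels, grad_fn, alpha=0.1, batch_size=128, n_epochs=10, seed=0):
    """Plain mini-batch SGD: w <- w - alpha_t * gradient over randomly sampled batches."""
    rng = np.random.default_rng(seed)
    K = X.shape[1]
    for epoch in range(n_epochs):
        idx = rng.permutation(K)
        alpha_t = alpha / (1 + epoch)                    # a simple decaying learning-rate schedule
        for start in range(0, K, batch_size):
            batch = idx[start:start + batch_size]
            w = w - alpha_t * grad_fn(w, X[:, batch], labels[batch])
    return w
```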

58 Neural Networks for classification
We can combine the neural network with the logistic regression model: the neural network preprocesses our data, which are then classified by means of LR. Recall that, for binary classification, we can write our objective function as
$$\ell(x) = -\left[t \log y + (1 - t)\log(1 - y)\right]$$
where y = σ(wᵀx). Adding the NN component we have y = σ(wᵀ f_n(x)).

59 Neural Networks for classification
The model is equivalent to a NN n₁ with an additional sigmoid activation layer, and loss function
$$\ell(x) = -\left[t \log f_{n_1}(x) + (1 - t)\log(1 - f_{n_1}(x))\right]$$
For binary classification we can thus build a network with an extra, single-node, sigmoid layer and train the network to minimize the objective function
$$-\sum_i \left[t_i \log f(x_i) + (1 - t_i)\log(1 - f(x_i))\right]$$
This objective is called binary cross-entropy.
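A small numpy sketch of such a network with one hidden layer (choosing ReLU for the hidden non-linearity is an assumption) and of the binary cross-entropy:

```python
import numpy as np

def mlp_binary_forward(X, W1, b1, w2, b2):
    """One hidden ReLU layer followed by a single sigmoid output node; X has one sample per column."""
    H = np.maximum(0, W1 @ X + b1)          # hidden activations f_n(x)
    s = w2 @ H + b2                         # scalar score per sample
    return 1.0 / (1.0 + np.exp(-s))         # sigmoid output = P(c1 | x)

def binary_cross_entropy(y, t):
    """Binary cross-entropy -sum_i [t_i log y_i + (1 - t_i) log(1 - y_i)]."""
    eps = 1e-12                              # clip to avoid log(0)
    y = np.clip(y, eps, 1 - eps)
    return -(t * np.log(y) + (1 - t) * np.log(1 - y)).sum()
```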

60 Neural Networks for classification A binary 2D example

61 Neural Networks for classification Overfitting can be much more dramatic than for linear logistic regression

62 Neural Networks for classification
Different regularization strategies can be adopted:
L2 weights regularization
Dropout
Early stopping (computing error on validation set)

63 Neural Networks for classification L2 weights regularization

64 Neural Networks for classification
For multiclass problems we consider the multiclass logistic regression objective
$$\ell(x) = -\sum_j t_j \log y_j, \qquad y_j = \frac{e^{w_j^T f_{NET}(x)}}{\sum_k e^{w_k^T f_{NET}(x)}}$$
As for the binary case, we can interpret this model as a NN NET₁ with an additional softmax layer. The network has an output node for each class. Training targets are represented using a 1-of-N code.

65 Neural Networks for classification
MNIST error rates for Neural Nets⁹:
                 No Reg.        L2 (λ = 1e-5)   Dropout (p = 0.5)
MLP (Tanh)       –% [1.8%]      2.0% [1.7%]     1.5% [1.5%]
MLP (ReLU)       1.6% [1.5%]    1.7% [1.6%]     1.7% [1.4%]
MLP (ReLU)       1.6% [1.5%]    1.6% [1.4%]     1.4% [1.4%]
ConvNet (ReLU)   1.1% [1.0%]    1.1% [1.0%]     0.9% [0.8%]
⁹ Training set was split into development (90% of the data) and validation (10% of the data) sets to select the best performing model. The performance of the model with lowest error rate on the test set is shown in brackets.
