Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Linear Classifiers. Blaine Nelson, Tobias Scheffer


1 Universität Potsdam, Institut für Informatik, Lehrstuhl Maschinelles Lernen. Linear Classifiers. Blaine Nelson, Tobias Scheffer

2 Contents: Classification Problem; Bayesian Classifier, Decision, Linear Classifiers, MAP Models; Logistic Regression; Regularized Empirical Risk Minimization; Kernel Perceptron, Support Vector Machine; Ridge Regression, LASSO; Representer Theorem; Dualized Perceptron, Dual SVM; Mercer Map; Learning with Structured Input & Output: Taxonomy, Sequences, Ranking, Decoder, Cutting Plane Algorithm

3 Prerequisites: Statistics (random variables, distributions, Bayes' formula); Linear Algebra (vectors & matrices, transpose, inverse matrices, eigenvalues & eigenvectors); Calculus/Analysis (derivatives, partial derivatives, gradients)

4 Classification. Input: an instance x ∈ X. E.g., X can be a vector space over attributes; the instance is then an assignment of attributes, and x = (x_1, ..., x_m) is a feature vector. Output: class y ∈ Y, where Y is a finite set. The class is also referred to as the target attribute; y is also referred to as the (class) label. (Diagram: x → classifier → y)

5 Classification: Example. Input: instance x ∈ X, where X is the set of all possible combinations of a medication regimen. Attributes: Medication #1 included?, ..., Medication #6 included?; the instance x is the vector of attribute values (feature vector) describing the medication combination. Output: y ∈ Y = {toxic, ok}. (Diagram: medication combination → classifier → toxic / ok)

6 Classification: Example. Input: instance x ∈ X, where X is the set of all 16×16-pixel bitmaps. Attributes: gray value of pixel 1, ..., gray value of pixel 256; the instance x is the vector of 256 pixel values. Output: y ∈ Y = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}: the recognized digit. (Diagram: bitmap → classifier → "6")

7 Classification: Example. Input: instance x ∈ X, where X is the bag-of-words representation of all possible texts. Attributes: Word #1 occurs?, ..., Word #m occurs?, with m ≈ 1,000,000 (e.g., aardvark, beneficiary, friend, sterling, science, ...). Output: y ∈ Y = {spam, ok}. Example: "Dear Beneficiary, we are pleased to notify you that your address has been picked online in this second quarter's MICROSOFT CONSUMER AWARD (MCA) as a Winner of One Hundred and Fifty Five Thousand Pounds Sterling" → classifier → spam
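
As an aside, a minimal Python sketch of how such a binary bag-of-words feature vector could be computed; the five-word vocabulary and the message are illustrative placeholders, not part of the lecture material.

vocabulary = ["aardvark", "beneficiary", "friend", "sterling", "science"]

def bag_of_words(text, vocabulary):
    """Binary bag-of-words features: does word #f occur in the text?"""
    tokens = {word.strip(".,!?") for word in text.lower().split()}
    return [1 if word in tokens else 0 for word in vocabulary]

message = "Dear Beneficiary, you have been picked as a winner of Pounds Sterling"
print(bag_of_words(message, vocabulary))  # [0, 1, 0, 1, 0]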

8 Classifier Learning. Input to the learner: training data T_n, consisting of the instance matrix X = (x_11 ... x_1m; ...; x_n1 ... x_nm) and the label vector y = (y_1, ..., y_n); that is, T_n = ((x_1, y_1), ..., (x_n, y_n)).

9 Classifier Learning. Input to the learner: training data T_n = ((x_1, y_1), ..., (x_n, y_n)) with instance matrix X and label vector y as above. Output: a model y_θ: X → Y, for example the linear classifier with parameter vector θ: y_θ(x) = 1 if φ(x)^T θ ≥ 0, and -1 otherwise.
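
A minimal sketch of this thresholded linear decision rule, assuming the identity feature map and the {+1, -1} label convention used on the following slides; the parameter values are arbitrary toy numbers.

import numpy as np

def linear_classify(x, theta, phi=lambda x: x):
    """Predict +1 if phi(x)^T theta >= 0, otherwise -1."""
    return 1 if phi(x) @ theta >= 0 else -1

theta = np.array([1.0, -2.0])                          # toy parameter vector
print(linear_classify(np.array([3.0, 1.0]), theta))    # 3 - 2 = 1 >= 0 -> +1
print(linear_classify(np.array([0.0, 1.0]), theta))    # 0 - 2 = -2 < 0 -> -1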

10 BAYESIAN CLASSIFICATION

11 Empirical Inference. Inference of the probability of y given instance x and training data T_n: p(y | x, T_n). Inference of the most likely class: y* = argmax_y p(y | x, T_n). We must make assumptions about the process by which the data are generated in order to calculate the most probable class; we assume all data are independent given the model parameters θ.

12 Empirical Inference. Inference of the probability of y given instance x and training data T_n: p(y | x, T_n) = ∫ p(y, θ | x, T_n) dθ = ∫ p(y | x, θ) p(θ | T_n) dθ (integration over the space of model parameters: Bayesian model averaging). Inference of the most likely class (using the independence assumption): y* = argmax_y p(y | x, T_n) = argmax_y ∫ p(y | x, θ) p(θ | T_n) dθ.

13 Empirical Inference. p(y | x, T_n) = ∫ p(y, θ | x, T_n) dθ = ∫ p(y | x, θ) p(θ | T_n) dθ, where p(y | x, θ) is the class probability at instance x given θ, and p(θ | T_n) is the a-posteriori probability (posterior) of the model θ given the training data. Most likely class: y* = argmax_y p(y | x, T_n) = argmax_y ∫ p(y | x, θ) p(θ | T_n) dθ.

14 Empirical Inference. p(y | x, T_n) = ∫ p(y | x, θ) p(θ | T_n) dθ. Generally, there is no closed-form solution for classification, and the integral is difficult to approximate since the space of all parameter vectors is too large.

15 Empirical Inference. p(y | x, T_n) = ∫ p(y | x, θ) p(θ | T_n) dθ ≈ p(y | x, θ_MAP), where θ_MAP = argmax_θ p(θ | T_n). Approximation of the weighted sum by its maximum: classification with the single most probable model instead of a sum over all models.

16 Inference Example. Clinical study: medication combination x and outcome y. Inference of the probability of y given instance x and training data: p(y | x, T_n) = ∫ p(y | x, θ) p(θ | T_n) dθ (integral over all models) ≈ p(y | x, θ_MAP), where θ_MAP = argmax_θ p(θ | T_n) is the most probable model given the training data (maximum a-posteriori model). Approximation of the weighted sum by its maximum: classification with the single most probable model instead of a sum over all models.

17 Graphical Model for Classification. A graphical model defines a stochastic process; it constitutes our modeling assumptions about the data generation process. First, a model parameter θ is selected (sampled); this parameterizes the distribution of the training labels, p(y_i | x_i, θ). The distribution of the inputs, p(x_i), is not modeled further. (Plate diagram: θ → y_i ← x_i for i = 1, ..., n; new instance x → y.)

18 Example. Evolution determines the physiological parameters θ of humans. Given these parameters and a combination of medications, nature rolls dice to decide whether we survive this combination of drugs. Every time this combination of medications is administered, the dice are re-rolled according to p(y_i | x_i, θ) to determine the outcome. (Plate diagram as on the previous slide.)
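
A small simulation sketch of this assumed data-generating process: θ is drawn once, and the outcome is re-rolled for every administration of the same combination. The logistic form of p(y_i | x_i, θ) is used here only for concreteness; the slides introduce it later.

import numpy as np

rng = np.random.default_rng(0)

m = 3                                    # number of medications (toy value)
theta = rng.normal(size=m)               # one draw of the "physiological" parameters

def sample_outcome(x, theta):
    """Roll the dice: y ~ p(y | x, theta) with P(y = +1) = 1/(1 + exp(-x^T theta))."""
    p_plus = 1.0 / (1.0 + np.exp(-x @ theta))
    return 1 if rng.random() < p_plus else -1

x = np.array([1.0, 0.0, 1.0])            # one fixed medication combination
print([sample_outcome(x, theta) for _ in range(5)])   # re-rolled on every administration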

19 Empirical Inference. Computation of θ_MAP: θ_MAP = argmax_θ p(θ | T_n) = argmax_θ p(θ, T_n) / p(T_n).

20 Empirical Inference. Computation of θ_MAP: θ_MAP = argmax_θ p(θ | T_n) = argmax_θ p(θ, T_n) / p(T_n) = argmax_θ p(θ) p(X) p(y | X, θ) / p(T_n) (data model).

21 Empirical Inference. Computation of θ_MAP: θ_MAP = argmax_θ p(θ | T_n) = argmax_θ p(θ, T_n) / p(T_n) = argmax_θ p(θ) p(X) p(y | X, θ) / p(T_n) = argmax_θ p(y | X, θ) p(θ) (dropping factors constant w.r.t. θ).

22 Empirical Inference. Computation of p(y | X, θ). Independence of the training data (from the graphical model): p(y | X, θ) = ∏_{i=1}^n p(y_i | x_i, θ). The discriminative class probabilities p(y_i | x_i, θ) are directly specified by the model.
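
A minimal sketch of this factorization as a log-likelihood, assuming the binary logistic form of p(y_i | x_i, θ) that the following slides introduce; the data are toy values.

import numpy as np

def log_likelihood(theta, b, X, y):
    """log p(y | X, theta) = sum_i log p(y_i | x_i, theta),
    with p(y_i | x_i, theta) the binary logistic model and y_i in {-1, +1}."""
    margins = y * (X @ theta + b)
    return -np.sum(np.log1p(np.exp(-margins)))   # sum_i -log(1 + exp(-y_i (x_i^T theta + b)))

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # toy instances
y = np.array([1, -1, 1])                              # toy labels
print(log_likelihood(np.array([1.0, -1.0]), 0.0, X, y))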

23 Empirical Inference: Discriminative Models. Summary of empirical inference to this point: p(y | x, T_n) = ∫ p(y | x, θ) p(θ | T_n) dθ ≈ p(y | x, θ_MAP); θ_MAP = argmax_θ p(y | X, θ) p(θ); p(y | X, θ) = ∏_{i=1}^n p(y_i | x_i, θ); p(y_i | x_i, θ) is directly specified by the model.

24 Empirical Inference: Discriminative Models. Summary with annotations: p(y | x, T_n) = ∫ p(y | x, θ) p(θ | T_n) dθ (integral over all models: Bayesian model averaging) ≈ p(y | x, θ_MAP) (MAP: approximation by the most probable model); θ_MAP = argmax_θ p(y | X, θ) p(θ), where p(y | X, θ) is the likelihood of the classes and p(θ) is the prior over model parameters; p(y | X, θ) = ∏_{i=1}^n p(y_i | x_i, θ) because the training data are independent; p(y_i | x_i, θ) is directly specified by the model.

25 DISCRIMINATIVE APPROACH

26 Class Probabilities: Discriminative Models. How should we model p(y | x, θ)? Simple approach: assume p depends on x only through x^T θ, i.e. p(y | x, θ) = q(y | x^T θ); this is a linear model. E.g., binary logistic regression: p(y = +1 | x, θ) = 1 / (1 + exp(-(x^T θ + b))), p(y = -1 | x, θ) = 1 - p(y = +1 | x, θ) = 1 / (1 + exp(x^T θ + b)). Later, we look at other frameworks for linear models.

27 Binary Logistic Regression. Binary classification with classes +1 and -1: p(y = +1 | x, θ) = 1 / (1 + exp(-(x^T θ + b))). Decision point: p(y = +1 | x, θ) = p(y = -1 | x, θ) ⟺ 1 / (1 + exp(-(x^T θ + b))) = 1/2 ⟺ x^T θ + b = 0. The set of points {x | x^T θ + b = 0} forms a separating hyperplane between the classes -1 and +1.
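
A small numeric sketch of the decision point (the parameter values are arbitrary): points on the hyperplane x^T θ + b = 0 receive probability exactly 1/2, points on either side are pushed towards +1 or -1.

import numpy as np

def p_plus(x, theta, b):
    """P(y = +1 | x, theta) = 1 / (1 + exp(-(x^T theta + b)))."""
    return 1.0 / (1.0 + np.exp(-(x @ theta + b)))

theta, b = np.array([2.0, -1.0]), 0.5                 # toy parameters
print(p_plus(np.array([0.0, 0.5]), theta, b))         # on the hyperplane: exactly 0.5
print(p_plus(np.array([1.0, 0.0]), theta, b))         # positive side: > 0.5, predict +1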

28 Linear Models. Hyperplane given by normal vector θ and displacement b: H_θ = {x | f_θ(x) = x^T θ + b = 0}. Decision function: f_θ(x) = x^T θ + b. Classifier: y_θ(x) = sign(f_θ(x)). Discriminative class probability: P(y = +1 | x, θ) = 1 / (1 + exp(-(x^T θ + b))). (Figure: separating hyperplane in the (x_1, x_2)-plane with normal vector θ and offset b; regions f_θ(x) > 0 and f_θ(x) < 0, boundary f_θ(x) = 0.)

29 Linear Models. Hyperplane given by normal vector θ and displacement b: H_θ = {x | f_θ(x) = x^T θ + b = 0}. Decision function: f_θ(x) = x^T θ + b. Classifier: y_θ(x) = sign(f_θ(x)). Discriminative class probability: P(y = +1 | x, θ) = 1 / (1 + exp(-(x^T θ + b))). (Figure: class-conditional densities p(x | y = +1, θ) and p(x | y = -1, θ) on either side of the hyperplane.)

30 Linear Models. (Same as the previous slide; the figure additionally marks the decision boundary f_θ(x) = 0 between the two class-conditional densities.)

31 Logistic Regression: Learning Problem. Inference of θ_MAP = argmax_θ p(θ | T_n). Additional assumption: the prior is normally distributed with mean 0, p(θ) = N(θ; 0, Σ).

32 Logistic Regression: Learning Problem. Inference of the MAP parameter:
θ_MAP = argmax_θ p(θ | T_n)
= argmax_θ p(y | X, θ) p(θ)
= argmax_θ [ log p(y | X, θ) + log p(θ) ]
= argmax_θ [ Σ_{i=1}^n log p(y_i | x_i, θ) + log N(θ; 0, Σ) ]
= argmax_θ [ Σ_{i: y_i=+1} log 1/(1 + exp(-(x_i^T θ + b))) + Σ_{i: y_i=-1} log 1/(1 + exp(+(x_i^T θ + b))) + log N(θ; 0, Σ) ]
= argmax_θ [ -Σ_{i=1}^n log(1 + exp(-y_i (x_i^T θ + b))) + log ( exp(-(1/2) θ^T Σ^{-1} θ) / √((2π)^m |Σ|) ) ]
= argmin_θ [ Σ_{i=1}^n log(1 + exp(-y_i (x_i^T θ + b))) + (1/2) θ^T Σ^{-1} θ ]


36 Logistic Regression: Learning Problem. Inference of the MAP parameter for binary logistic regression with classes y_i ∈ {-1, +1}: θ_MAP = argmin_θ [ Σ_{i=1}^n log(1 + exp(-y_i (x_i^T θ + b))) + (1/2) θ^T Σ^{-1} θ ]. How can θ_MAP be computed? To be continued.
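
The optimization itself is deferred, but the objective can already be evaluated. A sketch assuming an isotropic prior Σ = σ²I; the helper name map_objective and the toy data are illustrative; θ_MAP is the minimizer of this function.

import numpy as np

def map_objective(theta, b, X, y, sigma2=1.0):
    """sum_i log(1 + exp(-y_i (x_i^T theta + b))) + (1/2) theta^T Sigma^{-1} theta,
    assuming the isotropic prior Sigma = sigma2 * I."""
    margins = y * (X @ theta + b)
    loss = np.sum(np.log1p(np.exp(-margins)))
    return loss + 0.5 * (theta @ theta) / sigma2

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -1.0]])    # toy data
y = np.array([1, -1, -1])
print(map_objective(np.array([0.5, -0.5]), 0.0, X, y))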

37 FEATURE MAPPINGS

38 Linear Classification. Reformulation by adding a constant input feature (affine transformation): f_θ(x) = φ(x)_{1..m}^T θ_{1..m} + b = Σ_{f=1}^m φ(x)_f θ_f + b = Σ_{f=1}^{m+1} φ(x)_f θ_f = φ(x)_{1..m+1}^T θ_{1..m+1}, where φ(x)_{m+1} = 1 and θ_{m+1} = b. Thus the affine decision function f_θ(x) = x^T θ + b with classifier y_θ(x) = sign(f_θ(x)) can be written without an explicit offset.

39 Linear Classification. With the constant feature absorbed into φ and the offset absorbed into θ (φ(x)_{m+1} = 1, θ_{m+1} = b), the decision function and classifier simplify to f_θ(x) = φ(x)^T θ and y_θ(x) = sign(f_θ(x)).
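
A one-line numerical check of this reformulation (toy numbers): appending the constant feature 1 to φ(x) and the offset b to θ reproduces the affine decision value exactly.

import numpy as np

theta, b = np.array([2.0, -1.0]), 0.5     # toy parameters
x = np.array([1.0, 3.0])

f_affine = x @ theta + b                  # f(x) = x^T theta + b

phi_x = np.append(x, 1.0)                 # phi(x)_{m+1} = 1
theta_ext = np.append(theta, b)           # theta_{m+1} = b
f_homogeneous = phi_x @ theta_ext         # f(x) = phi(x)^T theta

print(f_affine, f_homogeneous)            # identical values: the offset is absorbed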

40 Additional Feature Maps. The abstraction φ(x) allows us to learn in more general feature spaces: we can replace x by φ(x) and use the same learning problem, θ_MAP = argmin_θ [ Σ_{i=1}^n log(1 + exp(-y_i (φ(x_i)^T θ + b))) + (1/2) θ^T Σ^{-1} θ ]. Aside: the tensor product of an n-dimensional and an m-dimensional vector is the nm-dimensional vector of all products of their elements: x ⊗ y = (x_1, ..., x_n) ⊗ (y_1, ..., y_m) = (x_1 y_1, ..., x_1 y_m, ..., x_n y_1, ..., x_n y_m).

41 Feature Mappings. Linear mapping: φ(x_i) = x_i. Quadratic mapping: φ(x_i) = (x_i, x_i ⊗ x_i) (tensor product). Polynomial mapping: φ(x_i) = (x_i, x_i ⊗ x_i, ..., x_i ⊗ ... ⊗ x_i) with up to p factors. Frequently, feature mappings do not have a closed-form expression but can be specified indirectly via their inner products, e.g., RBF kernels, hash kernels.
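
A sketch of the quadratic mapping via the tensor (Kronecker) product; higher-order polynomial mappings would repeat the product up to p factors.

import numpy as np

def quadratic_map(x):
    """Quadratic feature map: x concatenated with the tensor product x (x) x."""
    return np.concatenate([x, np.kron(x, x)])

x = np.array([2.0, 3.0])
print(quadratic_map(x))   # [2. 3. 4. 6. 6. 9.]: x followed by all pairwise products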

42 Sufficient Statistics, Feature Mappings. Linear mappings: the linear mapping φ(x_i) = x_i is the sufficient statistic when p(x | y, θ) = N(x; μ_y, Σ) and the covariance matrix is the same for all classes; a linear mapping is then sufficient to calculate the class probabilities. Quadratic mappings: more generally, a quadratic mapping is the sufficient statistic when the classes have different covariance matrices.

43 Linear Models with Feature Mappings. Hyperplane given by normal vector θ and displacement b: H_θ = {x | f_θ(x) = φ(x)^T θ + b = 0}. Decision function: f_θ(x) = φ(x)^T θ + b. Classifier: y_θ(x) = sign(f_θ(x)). Discriminative class probability: P(y = +1 | x, θ) = 1 / (1 + exp(-(φ(x)^T θ + b))). (Figure: with the quadratic mapping φ(x_i) = (x_i, x_i ⊗ x_i), the boundary between the class-conditional densities p(x | y = +1, θ) and p(x | y = -1, θ) is nonlinear in the original input space.)

44 Linear Models with Feature Mappings. (Same as the previous slide; the figure additionally marks the nonlinear decision boundary f_θ(x) = 0.)

45 MULTI-CLASS CLASSIFICATION

46 Multi-class Classification. Motivation: we would like to extend classification to problems with more than 2 classes, Y = {1, ..., k}. Problem: we cannot separate k classes with a single hyperplane. Idea: each class y has a separate function f_θ(x, y) that is used to predict how likely y is given x; each function is modeled as linear; we predict the class y with the highest-scoring function for x.

47 Multi-class Logistic Regression. Probability for class y: p(y | x, θ) = exp(φ(x)^T θ_y + b_y) / Σ_{z∈Y} exp(φ(x)^T θ_z + b_z). The exponent is affine in φ(x) (linear + offset); the denominator is constant w.r.t. y. Class y is the most likely class if it satisfies y ∈ argmax_{z∈Y} (φ(x)^T θ_z + b_z); this is a linear (+ offset) decision function.
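
A minimal sketch of these class probabilities and the argmax decision; the matrix Theta (one column θ_y per class), the offsets b, and the numbers are illustrative. Subtracting the maximum score before exponentiating is a standard numerical safeguard, not something the slide prescribes.

import numpy as np

def class_probabilities(phi_x, Theta, b):
    """p(y | x, theta) = exp(phi(x)^T theta_y + b_y) / sum_z exp(phi(x)^T theta_z + b_z).
    Theta holds one column theta_y per class; b holds the offsets b_y."""
    scores = Theta.T @ phi_x + b
    scores = scores - scores.max()        # numerical safeguard, does not change the result
    probs = np.exp(scores)
    return probs / probs.sum()

Theta = np.array([[1.0, 0.0, -1.0],       # 2 features, 3 classes (toy values)
                  [0.0, 1.0,  1.0]])
b = np.array([0.0, 0.1, -0.2])
phi_x = np.array([0.5, 1.5])
p = class_probabilities(phi_x, Theta, b)
print(p, p.argmax())                      # predicted class = argmax of the scores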

48 Linear Models: Multi-class Case. Hyperplane for class y given by normal vector θ_y and displacement b_y: H_{θ,y} = {x | f_θ(x, y) = φ(x)^T θ_y + b_y = 0}. Decision functions: f_θ(x, y) = φ(x)^T θ_y + b_y. Classifier: y_θ(x) = argmax_{z∈Y} f_θ(x, z). Discriminative class probability: P(y | x, θ) = exp(φ(x)^T θ_y + b_y) / Σ_{z∈Y} exp(φ(x)^T θ_z + b_z). (Figure: three hyperplanes with normal vectors θ_{y_1}, θ_{y_2}, θ_{y_3} and regions f_θ(x, y_1) > 0, f_θ(x, y_2) > 0, f_θ(x, y_3) > 0.)

49 Logistic Regression: Learning Problem. Inference of the MAP parameter θ = (θ_1, ..., θ_k)^T:
θ_MAP = argmax_θ p(θ | T_n)
= argmax_θ p(y | X, θ) p(θ)
= argmax_θ [ log p(y | X, θ) + log p(θ) ]
= argmax_θ [ Σ_{i=1}^n log p(y_i | x_i, θ) + log N(θ; 0, Σ) ]
= argmax_θ [ Σ_{i=1}^n log ( exp(φ(x_i)^T θ_{y_i} + b_{y_i}) / Σ_{z∈Y} exp(φ(x_i)^T θ_z + b_z) ) + log ( exp(-(1/2) θ^T Σ^{-1} θ) / √((2π)^m |Σ|) ) ]
= argmin_θ [ Σ_{i=1}^n ( log Σ_{z∈Y} exp(φ(x_i)^T θ_z + b_z) - (φ(x_i)^T θ_{y_i} + b_{y_i}) ) + (1/2) θ^T Σ^{-1} θ ]

50 Summary: Learning Logistic Regression. If the modelling assumptions are fulfilled (data generation model from slide 17; p(θ) = N(θ; 0, Σ), i.e., the prior is normally distributed), then we use P(y | x, θ) = exp(φ(x)^T θ_y + b_y) / Σ_{z∈Y} exp(φ(x)^T θ_z + b_z), and the maximum-a-posteriori parameter is θ_MAP = argmin_θ [ Σ_{i=1}^n ( log Σ_{z∈Y} exp(φ(x_i)^T θ_z + b_z) - (φ(x_i)^T θ_{y_i} + b_{y_i}) ) + (1/2) θ^T Σ^{-1} θ ]. How can θ_MAP be computed? To be continued.
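
A sketch that only evaluates this objective (the minimization is deferred, as the slide says), assuming an isotropic prior Σ = σ²I and encoding the classes as indices 0, ..., k-1; names and data are illustrative.

import numpy as np

def multiclass_map_objective(Theta, b, Phi, y, sigma2=1.0):
    """sum_i [ log sum_z exp(phi(x_i)^T theta_z + b_z) - (phi(x_i)^T theta_{y_i} + b_{y_i}) ]
    + (1/2) theta^T Sigma^{-1} theta, assuming the isotropic prior Sigma = sigma2 * I."""
    scores = Phi @ Theta + b                              # n x k matrix of class scores
    n = Phi.shape[0]
    log_norm = np.log(np.exp(scores).sum(axis=1))         # log sum_z exp(...)
    nll = np.sum(log_norm - scores[np.arange(n), y])
    return nll + 0.5 * np.sum(Theta ** 2) / sigma2

Phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])     # n = 3 examples, 2 features
y = np.array([0, 1, 2])                                   # class indices 0..k-1
print(multiclass_map_objective(np.zeros((2, 3)), np.zeros(3), Phi, y))   # 3 * log(3) at theta = 0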

51 GENERATIVE APPROACH

52 Empirical Inference: Generative Models. Computation of p(y | X, θ). Independence of the training data (from the graphical model): p(y | X, θ) = ∏_{i=1}^n p(y_i | x_i, θ). Generative model: apply Bayes' rule, p(y_i | x_i, θ) = p(x_i | y_i, θ) p(y_i) / Σ_{z∈Y} p(x_i | z, θ) p(z), where p(x_i | y_i, θ) and p(y_i) are model-specific. (Plate diagram: θ → x_i ← y_i for i = 1, ..., n; new instance x, y.)

53 Exponential Family. The probability of a class label is part of the parameter vector: p(y) = π_y, and p(y_i | x_i, θ) = p(x_i | y_i, θ) p(y_i) / Σ_{z∈Y} p(x_i | z, θ) p(z). The conditional probability of x is given by p(x | y, θ) = h(x) exp(φ(x)^T θ_y - ln g(θ_y)). For the k classes, we partition the parameter vector: θ = (θ_1, ..., θ_k, π_1, ..., π_k)^T.

54 Exponential Family. The conditional probability of x is given by p(x | y, θ) = h(x) exp(φ(x)^T θ_y - ln g(θ_y)). The representation φ(x) is the sufficient statistic: φ(x) conveys all useful information about x for the probability distribution. h(x) is the base measure. The partition function g(θ_y) normalizes the distribution. The distribution is specified by h(x), φ(x), θ, and g. Many common distributions belong to the exponential family.

55 Exponential Family: Normal Distribution. The conditional probability of x is given by p(x | y, θ) = h(x) exp(φ(x)^T θ_y - ln g(θ_y)). Example: normal distribution N(x; μ, Σ) = 1/√((2π)^m |Σ|) exp(-(1/2) (x - μ)^T Σ^{-1} (x - μ)). Can it be represented in exponential-family form? (Figure: density of a two-dimensional normal distribution N((x_1, x_2); (0, 0), Σ).)

56 Exponential Family: Normal Distribution. The conditional probability of x is given by p(x | y, θ) = h(x) exp(φ(x)^T θ_y - ln g(θ_y)). Example: normal distribution N(x; μ, Σ) = 1/√((2π)^m |Σ|) exp(-(1/2) (x - μ)^T Σ^{-1} (x - μ)). Exponential family form: φ(x) = (x, x ⊗ x), θ = (Σ^{-1} μ, -(1/2) vec(Σ^{-1})), h(x) = (2π)^{-m/2}, g(θ) = √|Σ| exp((1/2) μ^T Σ^{-1} μ).

57 Exponential Family: Normal Distribution. The conditional probability of x is given by p(x | y, θ) = h(x) exp(φ(x)^T θ_y - ln g(θ_y)). Example: one-dimensional normal distribution N(x; μ, σ) = 1/(σ√(2π)) exp(-(x - μ)²/(2σ²)). Exponential family form: φ(x) = (x, x²), θ = (μ/σ², -1/(2σ²)), h(x) = (2π)^{-1/2}, g(θ) = σ exp(μ²/(2σ²)). (Figure: density of N(x; 0, 1).)
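
Expanding the square confirms the stated parameters; a short derivation for the one-dimensional case (LaTeX):

% Expanding -(x - mu)^2 / (2 sigma^2) and pulling 1/sigma into the exponent:
\begin{align*}
\mathcal{N}(x;\mu,\sigma)
  &= \frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big)\\
  &= (2\pi)^{-1/2}\exp\Big(x\cdot\frac{\mu}{\sigma^2} + x^2\cdot\Big(-\frac{1}{2\sigma^2}\Big)
     - \frac{\mu^2}{2\sigma^2} - \ln\sigma\Big)\\
  &= h(x)\,\exp\big(\varphi(x)^{\top}\theta_y - \ln g(\theta_y)\big),
\end{align*}
% with varphi(x) = (x, x^2)^T, theta_y = (mu/sigma^2, -1/(2 sigma^2))^T,
% h(x) = (2 pi)^{-1/2}, and g(theta_y) = sigma * exp(mu^2 / (2 sigma^2)).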

58 Exponential Family in Classification. The conditional probability of x is given by p(x | y, θ) = h(x) exp(φ(x)^T θ_y - ln g(θ_y)). Substituting into Bayes' rule (recall p(y) = π_y): p(y_i | x_i, θ) = p(x_i | y_i, θ) p(y_i) / Σ_{z∈Y} p(x_i | z, θ) p(z) = h(x_i) exp(φ(x_i)^T θ_{y_i} - ln g(θ_{y_i})) π_{y_i} / Σ_{z∈Y} h(x_i) exp(φ(x_i)^T θ_z - ln g(θ_z)) π_z.

59 Exponential Family in Classification. Continuing (with θ = (θ_1, ..., θ_k, π_1, ..., π_k)): p(y_i | x_i, θ) = h(x_i) exp(φ(x_i)^T θ_{y_i} - ln g(θ_{y_i})) π_{y_i} / Σ_{z∈Y} h(x_i) exp(φ(x_i)^T θ_z - ln g(θ_z)) π_z = exp(φ(x_i)^T θ_{y_i} + b_{y_i}) / Σ_{z∈Y} exp(φ(x_i)^T θ_z + b_z), where b_{y_i} = ln π_{y_i} - ln g(θ_{y_i}); the base measure h(x_i) cancels.

60 Exponential Family in Classification. The offsets can be absorbed into the parameter vector, θ = (θ_1, ..., θ_k, b_1, ..., b_k): p(y_i | x_i, θ) = exp(φ(x_i)^T θ_{y_i} + b_{y_i}) / Σ_{z∈Y} exp(φ(x_i)^T θ_z + b_z), with b_{y_i} = ln π_{y_i} - ln g(θ_{y_i}).

61 Exponential Family in Classification. Absorbing the offsets b_y into the parameter vectors (via a constant feature), the class probability becomes p(y_i | x_i, θ) = exp(φ(x_i)^T θ_{y_i}) / Σ_{z∈Y} exp(φ(x_i)^T θ_z), with decision function f_θ(x, y) = φ(x)^T θ_y and classifier y_θ(x) = argmax_{z∈Y} f_θ(x, z).

62 Generative Logistic Regression. Using the generative approach and assumptions (data generation model from slide 52; p(x | y, θ) is an exponential-family distribution), we arrived at the conditional distribution p(y | x, θ) = exp(φ(x)^T θ_y) / Σ_{z∈Y} exp(φ(x)^T θ_z). We do not know the parameters θ_y; we will soon show how to infer the MAP (maximum a-posteriori) parameter.

63 Linear Classification: Summary. In the 2-class case, the linear classifier has decision function f_θ(x) = φ(x)^T θ + b and classifier y_θ(x) = sign(f_θ(x)). In the multi-class case, it has decision functions f_θ(x, y) = φ(x)^T θ_y + b_y and classifier y_θ(x) = argmax_{z∈Y} f_θ(x, z). The data are mapped by φ(x) to feature space. The offsets b_y can be appended to the end of the vectors θ_y if a constant 1 is added to the end of each φ(x_i). Each parameter vector θ_y is a normal vector of a separating hyperplane.
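
Putting the summary together, a minimal prediction sketch with the offsets absorbed via a constant feature, as described above; the parameter matrix and input are toy values.

import numpy as np

def predict(x, Theta, phi=lambda x: np.append(x, 1.0)):
    """Multi-class linear classifier y(x) = argmax_z phi(x)^T theta_z, with the offsets
    b_z absorbed as the last component of each theta_z (constant 1 appended to phi(x))."""
    return int(np.argmax(Theta.T @ phi(x)))

Theta = np.array([[ 1.0, 0.0, -1.0],      # 2 input features + 1 absorbed offset, 3 classes
                  [ 0.0, 1.0,  1.0],
                  [ 0.1, 0.0, -0.1]])
print(predict(np.array([2.0, -1.0]), Theta))   # class with the highest score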
