Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Linear Classifiers. Blaine Nelson, Tobias Scheffer
2 Contents
- Classification Problem
- Bayesian Classifier Decision
- Linear Classifiers, MAP Models
- Logistic Regression
- Regularized Empirical Risk Minimization
- Kernel Perceptron, Support Vector Machine
- Ridge Regression, LASSO
- Representer Theorem
- Dualized Perceptron, Dual SVM
- Mercer Map
- Learning with Structured Input & Output
- Taxonomy, Sequences, Ranking
- Decoder, Cutting Plane Algorithm
3 Prerequisites
- Statistics: random variables, distributions; Bayes' formula
- Linear algebra: vectors & matrices; transpose, inverse matrices; eigenvalues & eigenvectors
- Calculus (analysis): derivatives, partial derivatives; gradients
4 Classification
Input: an instance $x \in X$. E.g., $X$ can be a vector space over attributes; the instance is then an assignment of values to the attributes, and $x = (x_1, \dots, x_m)^T$ is a feature vector.
Output: a class $y \in Y$, where $Y$ is a finite set. The class is also referred to as the target attribute, and $y$ as the (class) label.
Diagram: $x$ enters the classifier, which outputs $y$.
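5 Classification: Example
Input: instance $x \in X$, where $X$ is the set of all possible combinations of medications in a regimen.
Attributes: Medication #1 included? ... Medication #6 included?
Instance $x$: the attribute values form a feature vector that encodes one medication combination.
Output: $y \in Y = \{\text{toxic}, \text{ok}\}$.
Diagram: the medication combination enters the classifier, which outputs toxic or ok.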
6 Classification: Example
Input: instance $x \in X$, where $X$ is the set of all 16 × 16 pixel bitmaps.
Attributes: gray value of pixel 1, ..., gray value of pixel 256.
Instance $x$: the 256 pixel values.
Output: $y \in Y = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$, the recognized digit.
Diagram: a bitmap enters the classifier, which outputs "6".
7 Classification: Example
Input: instance $x \in X$, where $X$ is the bag-of-words representation of all possible texts.
Attributes: Word #1 occurs? ... Word #m occurs? ($m \approx 1{,}000{,}000$), e.g. for the words Aardvark, Beneficiary, Friend, Sterling, Science.
Instance $x$ (example text): "Dear Beneficiary, We are pleased to notify you that your address has been picked online in this second quarter's MICROSOFT CONSUMER AWARD (MCA) as a Winner of One Hundred and Fifty Five Thousand Pounds Sterling"
Output: $y \in Y = \{\text{spam}, \text{ok}\}$; for this text the classifier outputs spam.
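8 Classifier Learning
Input to the learner: training data $T_n$.
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$
Training data: $T_n = \{(x_1, y_1), \dots, (x_n, y_n)\}$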
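9 Classifier Learning
Input to the learner: training data $T_n$.
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$
Training data: $T_n = \{(x_1, y_1), \dots, (x_n, y_n)\}$
Output: a model $y_\theta : X \to Y$. For example, $y_\theta(x)$ assigns one class if $\phi(x)^T \theta \ge 0$ and the other class otherwise; this is a linear classifier with parameter vector $\theta$.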
10 BAYESIAN CLASSIFICATION
11 Empirical Inference
Inference of the probability of $y$ given instance $x$ and training data $T_n$: $p(y \mid x, T_n)$.
Inference of the most likely class: $y^* = \arg\max_y p(y \mid x, T_n)$.
We must make assumptions about the process by which the data is generated to be able to calculate the most probable class.
We assume all data are independent given the model $\theta$.
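12 Empirical Inference
Inference of the probability of $y$ given instance $x$ and training data $T_n$:
$$p(y \mid x, T_n) = \int p(y, \theta \mid x, T_n)\, d\theta = \int p(y \mid x, \theta)\, p(\theta \mid T_n)\, d\theta$$
The first step integrates over the space of model parameters (Bayesian model averaging); the second uses the independence assumption.
Inference of the most likely class:
$$y^* = \arg\max_y p(y \mid x, T_n) = \arg\max_y \int p(y \mid x, \theta)\, p(\theta \mid T_n)\, d\theta$$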
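13 Empirical Inference
Inference of the probability of $y$ given instance $x$ and training data $T_n$:
$$p(y \mid x, T_n) = \int p(y, \theta \mid x, T_n)\, d\theta = \int p(y \mid x, \theta)\, p(\theta \mid T_n)\, d\theta$$
Here $p(y \mid x, \theta)$ is the class probability at instance $x$ given $\theta$, and $p(\theta \mid T_n)$ is the a posteriori probability (posterior) of the model given the training data.
Inference of the most likely class:
$$y^* = \arg\max_y p(y \mid x, T_n) = \arg\max_y \int p(y \mid x, \theta)\, p(\theta \mid T_n)\, d\theta$$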
14 Empirical Inference
Inference of the probability of $y$ given instance $x$ and training data $T_n$:
$$p(y \mid x, T_n) = \int p(y \mid x, \theta)\, p(\theta \mid T_n)\, d\theta$$
Generally, there is no closed-form solution for classification. The integral is difficult to approximate because the space of all parameter vectors is too large.
15 Empirical Inference
Inference of the probability of $y$ given instance $x$ and training data $T_n$:
$$p(y \mid x, T_n) = \int p(y \mid x, \theta)\, p(\theta \mid T_n)\, d\theta \approx p(y \mid x, \theta_{MAP}), \qquad \text{where } \theta_{MAP} = \arg\max_\theta p(\theta \mid T_n)$$
Approximation of the weighted sum through its maximum: classification through the most probable single model instead of a sum over all models.
16 Inference Example
Clinical study: medication combination $x$ and outcome $y$.
Inference of the probability of $y$ given instance $x$ and training data $T_n$:
$$p(y \mid x, T_n) = \int p(y \mid x, \theta)\, p(\theta \mid T_n)\, d\theta \approx p(y \mid x, \theta_{MAP}), \qquad \text{where } \theta_{MAP} = \arg\max_\theta p(\theta \mid T_n)$$
The integral runs over all models; $\theta_{MAP}$ is the most probable model given the training data (the maximum a posteriori model).
Approximation of the weighted sum through its maximum: classification through the most probable single model instead of a sum over all models.
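17 Graphical Model for Classification
A graphical model defines a stochastic process. It constitutes our modeling assumptions about the data generation process.
First, a model parameter $\theta$ is selected (or sampled).
This parameterizes the distribution $p(y_i \mid x_i, \theta)$ that generates the training data.
The distribution of the data $p(x_i)$ is not further modeled.
Figure: graphical model with the parameter $\theta$, the training pairs $(x_i, y_i)$ in a plate of size $n$, and the new pair $(x, y)$.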
18 Example
Evolution determines the physiological parameters $\theta$ of humans.
Given these parameters and a combination of medication, nature rolls dice to decide whether we survive this combination of drugs.
Every time this combination of medicine is administered, the dice are re-rolled according to $p(y_i \mid x_i, \theta)$ to determine the result.
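19 Empirical Inference
Computation of $\theta_{MAP}$:
$$\theta_{MAP} = \arg\max_\theta p(\theta \mid T_n) = \arg\max_\theta \frac{p(\theta, T_n)}{p(T_n)}$$
20 Empirical Inference
Computation of $\theta_{MAP}$, applying the data model:
$$\theta_{MAP} = \arg\max_\theta \frac{p(\theta, T_n)}{p(T_n)} = \arg\max_\theta \frac{p(\theta)\, p(X)\, p(y \mid X, \theta)}{p(T_n)}$$
21 Empirical Inference
Computation of $\theta_{MAP}$, dropping $p(X)$ and $p(T_n)$, which are constants w.r.t. $\theta$:
$$\theta_{MAP} = \arg\max_\theta \frac{p(\theta)\, p(X)\, p(y \mid X, \theta)}{p(T_n)} = \arg\max_\theta p(y \mid X, \theta)\, p(\theta)$$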
22 Empirical Inference
Computation of $p(y \mid X, \theta)$, using the independence of the training data (from the graphical model):
$$p(y \mid X, \theta) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta)$$
The discriminative class probabilities $p(y_i \mid x_i, \theta)$ are directly specified by the model.
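23 Empirical Inference: Discriminative Models
Summary of empirical inference to this point:
$$P(y \mid x, T_n) = \int p(y \mid x, \theta)\, p(\theta \mid T_n)\, d\theta \approx p(y \mid x, \theta_{MAP})$$
$$\theta_{MAP} = \arg\max_\theta p(y \mid X, \theta)\, p(\theta)$$
$$p(y \mid X, \theta) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta)$$
$p(y_i \mid x_i, \theta)$ is directly specified by the model.
24 Empirical Inference: Discriminative Models
The same summary, annotated:
- The integral over all models is Bayesian model averaging; MAP approximates it by the most probable model.
- In $\theta_{MAP} = \arg\max_\theta p(y \mid X, \theta)\, p(\theta)$, the factor $p(y \mid X, \theta)$ is the likelihood of the class labels and $p(\theta)$ is the prior over model parameters.
- The likelihood factorizes as $\prod_{i=1}^{n} p(y_i \mid x_i, \theta)$ because the training data are independent.
- $p(y_i \mid x_i, \theta)$ is directly specified by the model.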
25 DISCRIMINATIVE APPROACH
26 Class Probabilities: Discriminative Models
How should we model $p(y \mid x, \theta)$?
Simple approach: assume $p$ depends on $x^T\theta$; i.e. $p(y \mid x, \theta) = q(y \mid x^T\theta)$.
Linear model, e.g. binary logistic regression:
$$p(y = +1 \mid x, \theta) = \frac{1}{1 + \exp(-(x^T\theta + b))}$$
$$p(y = -1 \mid x, \theta) = 1 - p(y = +1 \mid x, \theta) = \frac{1}{1 + \exp(x^T\theta + b)}$$
Later, we look at other frameworks for linear models.
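27 Binary Logistic Regression
Binary classification: classes $+1$ and $-1$.
$$p(y = +1 \mid x, \theta) = \frac{1}{1 + \exp(-(x^T\theta + b))}$$
Decision point:
$$p(y = +1 \mid x, \theta) = p(y = -1 \mid x, \theta) = \frac{1}{2} = \frac{1}{1 + \exp(-(x^T\theta + b))} \iff x^T\theta + b = 0$$
The set of points $\{x \mid x^T\theta + b = 0\}$ forms a separating hyperplane between the classes $-1$ and $+1$.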
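The decision point can be checked numerically. The following is a minimal sketch, not taken from the slides: the parameter values and the three test points are hypothetical, and ties at $f(x) = 0$ are assigned to class $+1$ by assumption.

```python
import math

def f(x, theta, b):
    """Decision function f(x) = x^T theta + b."""
    return sum(xj * tj for xj, tj in zip(x, theta)) + b

def p_plus(x, theta, b):
    """p(y = +1 | x, theta) = 1 / (1 + exp(-(x^T theta + b)))."""
    return 1.0 / (1.0 + math.exp(-f(x, theta, b)))

def classify(x, theta, b):
    """sign(f(x)); a point exactly on the hyperplane goes to +1 (assumption)."""
    return 1 if f(x, theta, b) >= 0 else -1

# Hypothetical parameters and points, chosen only for illustration.
theta, b = [2.0, -1.0], 0.5
on_plane = [1.0, 2.5]        # 2*1 - 2.5 + 0.5 = 0, so p(+1) = 1/2
positive_side = [3.0, 1.0]   # f = 5.5 > 0
negative_side = [-1.0, 2.0]  # f = -3.5 < 0

for point in (on_plane, positive_side, negative_side):
    print(point, round(p_plus(point, theta, b), 4), classify(point, theta, b))
```

Points with $f(x) > 0$ get $p(y = +1 \mid x, \theta) > 1/2$ and points with $f(x) < 0$ get less, so thresholding the probability at $1/2$ and taking the sign of $f$ give the same classifier.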
28 Linear Models
Hyperplane given by normal vector & displacement: $H_\theta = \{x \mid f_\theta(x) = x^T\theta + b = 0\}$
Decision function: $f_\theta(x) = x^T\theta + b$
Classifier: $y_\theta(x) = \operatorname{sign}(f_\theta(x))$
Discriminative class probability: $P(y = +1 \mid x, \theta) = \dfrac{1}{1 + \exp(-(x^T\theta + b))}$
Figure: hyperplane $f_\theta(x) = 0$ in the $(x_1, x_2)$ plane with normal vector $\theta$ and displacement $b$, separating the regions $f_\theta(x) > 0$ and $f_\theta(x) < 0$.
29 Linear Models
Same definitions as slide 28. Figure: class-conditional densities $p(x \mid y = +1, \theta)$ and $p(x \mid y = -1, \theta)$ in the $(x_1, x_2)$ plane.
30 Linear Models
Same definitions as slide 28. Figure: the class-conditional densities together with the decision boundary $f_\theta(x) = 0$.
31 Logistic Regression: Learning Problem
Inference of $\theta_{MAP} = \arg\max_\theta p(\theta \mid T_n)$.
Another assumption: the prior is normally distributed with mean 0: $p(\theta) = N(\theta; 0, \Sigma)$.
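32 Logistic Regression: Learning Problem
Inference of the MAP parameter:
$$
\begin{aligned}
\theta_{MAP} &= \arg\max_\theta p(\theta \mid T_n) \\
&= \arg\max_\theta p(y \mid X, \theta)\, p(\theta) \\
&= \arg\max_\theta \log p(y \mid X, \theta) + \log p(\theta) \\
&= \arg\max_\theta \sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) + \log N(\theta; 0, \Sigma) \\
&= \arg\max_\theta \sum_{i:\, y_i = +1} \log \frac{1}{1 + \exp(-(x_i^T\theta + b))} + \sum_{i:\, y_i = -1} \log \frac{1}{1 + \exp(+(x_i^T\theta + b))} + \log N(\theta; 0, \Sigma) \\
&= \arg\max_\theta \sum_{i=1}^{n} \log \frac{1}{1 + \exp(-y_i (x_i^T\theta + b))} + \log \frac{e^{-\frac{1}{2}\theta^T \Sigma^{-1} \theta}}{\sqrt{(2\pi)^m |\Sigma|}} \\
&= \arg\min_\theta \sum_{i=1}^{n} \log\left(1 + \exp(-y_i (x_i^T\theta + b))\right) + \frac{1}{2}\theta^T \Sigma^{-1} \theta
\end{aligned}
$$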
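36 Logistic Regression: Learning Problem
Inference of the MAP parameter. Binary logistic regression: classes $+1$ and $-1$, i.e. $y_i \in \{-1, +1\}$.
$$\theta_{MAP} = \arg\min_\theta \sum_{i=1}^{n} \log\left(1 + \exp(-y_i (x_i^T\theta + b))\right) + \frac{1}{2}\theta^T \Sigma^{-1} \theta$$
How can $\theta_{MAP}$ be computed? To be continued.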
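As an illustration only, the objective above can be minimized numerically. The sketch below uses plain gradient descent, which is a swapped-in technique and not the method the lecture goes on to present. It assumes an isotropic prior $\Sigma = \sigma^2 I$, leaves the offset $b$ unregularized, and uses a hypothetical toy data set; `fit_map`, `sigma2`, `step`, and `iters` are illustrative names.

```python
import math

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def sigmoid(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def objective(theta, b, X, y, sigma2):
    """sum_i log(1 + exp(-y_i (x_i^T theta + b))) + theta^T theta / (2 sigma^2)."""
    loss = sum(log1pexp(-yi * (dot(xi, theta) + b)) for xi, yi in zip(X, y))
    return loss + 0.5 * dot(theta, theta) / sigma2

def fit_map(X, y, sigma2=10.0, step=0.1, iters=500):
    """Gradient descent on the MAP objective (assumes Sigma = sigma2 * I)."""
    m = len(X[0])
    theta, b = [0.0] * m, 0.0
    for _ in range(iters):
        grad_theta = [t / sigma2 for t in theta]  # gradient of the prior term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # d/df log(1 + exp(-y f)) = -y * sigmoid(-y f)
            w = -yi * sigmoid(-yi * (dot(xi, theta) + b))
            for j in range(m):
                grad_theta[j] += w * xi[j]
            grad_b += w
        theta = [t - step * g for t, g in zip(theta, grad_theta)]
        b -= step * grad_b
    return theta, b

# Hypothetical, linearly separable toy data with labels in {-1, +1}.
X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
theta_map, b_map = fit_map(X, y)
print(theta_map, b_map, objective(theta_map, b_map, X, y, 10.0))
```

The prior term keeps the parameters finite even on separable data, where the unregularized likelihood alone would push $\|\theta\|$ to infinity.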
37 FEATURE MAPPINGS
38 Linear Classification
Reformulation by adding a constant input feature (affine transformation):
$$
\begin{aligned}
f_\theta(x) &= \phi(x)_{1..m}^T\, \theta_{1..m} + b \\
&= \sum_{f=1}^{m} \phi(x)_f\, \theta_f + b \\
&= \sum_{f=1}^{m+1} \phi(x)_f\, \theta_f \qquad \text{where } \phi(x)_{m+1} = 1 \text{ and } \theta_{m+1} = b \\
&= \phi(x)_{1..m+1}^T\, \theta_{1..m+1}
\end{aligned}
$$
Before the reformulation: $f_\theta(x) = x^T\theta + b$, $y_\theta(x) = \operatorname{sign}(f_\theta(x))$.
39 Linear Classification
After the reformulation, the decision function and classifier simplify to $f_\theta(x) = \phi(x)^T\theta$ and $y_\theta(x) = \operatorname{sign}(f_\theta(x))$.
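The reformulation is a one-line change in code. This is a small sketch with hypothetical numbers; `augment_features` and `augment_params` are illustrative helper names, not names from the lecture.

```python
def affine_score(phi, theta, b):
    """f(x) = phi(x)^T theta + b with an explicit offset."""
    return sum(p * t for p, t in zip(phi, theta)) + b

def augment_features(phi):
    """Append the constant feature phi(x)_{m+1} = 1."""
    return list(phi) + [1.0]

def augment_params(theta, b):
    """Append the offset as theta_{m+1} = b."""
    return list(theta) + [b]

def linear_score(phi_aug, theta_aug):
    """f(x) = phi(x)^T theta, no separate offset."""
    return sum(p * t for p, t in zip(phi_aug, theta_aug))

# Hypothetical values for illustration.
phi, theta, b = [0.5, -2.0, 3.0], [1.0, 0.25, -1.5], 0.75
with_offset = affine_score(phi, theta, b)
absorbed = linear_score(augment_features(phi), augment_params(theta, b))
print(with_offset, absorbed)
```

Both calls compute the same score, so any learning procedure written for $f_\theta(x) = \phi(x)^T\theta$ also covers the affine case.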
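40 Additional Feature Maps
The abstraction $\phi(x)$ allows us to learn in more general feature spaces. We can replace $x$ by $\phi(x)$ and use the same learning procedure:
$$\theta_{MAP} = \arg\min_\theta \sum_{i=1}^{n} \log\left(1 + \exp(-y_i (\phi(x_i)^T\theta + b))\right) + \frac{1}{2}\theta^T \Sigma^{-1} \theta$$
Aside: the tensor product between an $n$-dimensional and an $m$-dimensional vector is an $nm$-dimensional vector of all products of elements:
$$x \otimes y = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \otimes \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = (x_1 y_1, \dots, x_1 y_m, \dots, x_n y_1, \dots, x_n y_m)^T$$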
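41 Feature Mappings
Linear mapping: $\phi(x_i) = x_i$
Quadratic mapping: $\phi(x_i) = \begin{pmatrix} x_i \\ x_i \otimes x_i \end{pmatrix}$, where $\otimes$ is the tensor product
Polynomial mapping: $\phi(x_i) = \begin{pmatrix} x_i \\ x_i \otimes x_i \\ \vdots \\ x_i \otimes \cdots \otimes x_i \end{pmatrix}$, where the last block has $p$ factors
Frequently, feature mappings do not have a closed-form expression but can be specified indirectly via their inner products, e.g. RBF kernels and hash kernel functions.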
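A minimal sketch of these mappings on plain Python lists follows; `tensor`, `quadratic_map`, and `polynomial_map` are illustrative names, and the input vectors are hypothetical.

```python
def tensor(x, y):
    """Tensor product: all products x_i * y_j, with the index of x varying slowest."""
    return [xi * yj for xi in x for yj in y]

def quadratic_map(x):
    """phi(x) = (x, x (x) x)."""
    return list(x) + tensor(x, x)

def polynomial_map(x, p):
    """phi(x) = (x, x (x) x, ..., x (x) ... (x) x) with up to p factors."""
    out = list(x)
    power = list(x)
    for _ in range(p - 1):
        power = tensor(power, x)  # one more factor of x
        out += power
    return out

print(tensor([1, 2], [3, 4, 5]))       # all 2 * 3 products
print(quadratic_map([1, 2]))           # 2 linear + 4 quadratic features
print(len(polynomial_map([1, 2], 3)))  # 2 + 4 + 8 features
```

The feature dimension grows as $m + m^2 + \dots + m^p$, which is one reason to specify mappings indirectly through inner products instead of computing them explicitly.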
42 Sufficient Statistics, Feature Mappings
Linear mappings: the linear mapping $\phi(x_i) = x_i$ is the sufficient statistic when $p(x \mid y, \theta) = N(x; \mu_y, \Sigma)$ and the covariance matrix is the same for all classes. A linear mapping $\phi(x_i) = x_i$ is then sufficient to calculate the class probabilities.
Quadratic mappings: more generally, a quadratic mapping is the sufficient statistic when classes have different covariance matrices.
43 Linear Models: Feature Mappings
Hyperplane given by normal vector & displacement: $H_\theta = \{x \mid f_\theta(x) = \phi(x)^T\theta + b = 0\}$
Decision function: $f_\theta(x) = \phi(x)^T\theta + b$
Classifier: $y_\theta(x) = \operatorname{sign}(f_\theta(x))$
Discriminative class probability: $P(y = +1 \mid x, \theta) = \dfrac{1}{1 + \exp(-(\phi(x)^T\theta + b))}$
Figure: class-conditional densities $p(x \mid y = +1, \theta)$ and $p(x \mid y = -1, \theta)$ in the $(x_1, x_2)$ plane, with the quadratic mapping $\phi(x_i) = (x_i, x_i \otimes x_i)^T$.
44 Linear Models: Feature Mappings
Same definitions as slide 43. Figure: the class-conditional densities together with the decision boundary $f_\theta(x) = 0$ obtained with the quadratic mapping.
45 MULTI-CLASS CLASSIFICATION
46 Multi-class Classification
Motivation: we would like to extend classification to problems with more than 2 classes, $Y = \{1, \dots, k\}$.
Problem: we cannot separate $k$ classes with a single hyperplane.
Idea: each class $y$ has a separate function $f_\theta(x, y)$ that is used to predict how likely $y$ is given $x$. Each function is modeled as linear. We predict the class $y$ whose function scores highest for $x$.
47 Multi-class Logistic Regression
Probability for class $y$:
$$p(y \mid x, \theta) = \frac{\exp(\phi(x)^T\theta_y + b_y)}{\sum_{z \in Y} \exp(\phi(x)^T\theta_z + b_z)}$$
The exponent is affine in $\phi(x)$ (linear + offset). The denominator is constant w.r.t. $y$.
Class $y$ is the most likely class if it satisfies
$$y \in \arg\max_{z \in Y} \phi(x)^T\theta_z + b_z$$
This is a linear (+ offset) decision function.
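48 Linear Models: Multi-class Case
Hyperplanes given by normal vector & displacement: $H_{\theta,y} = \{x \mid f_\theta(x, y) = \phi(x)^T\theta_y + b_y = 0\}$
Decision functions: $f_\theta(x, y) = \phi(x)^T\theta_y + b_y$
Classifier: $y_\theta(x) = \arg\max_{z \in Y} f_\theta(x, z)$
Discriminative class probability: $P(y \mid x, \theta) = \dfrac{\exp(\phi(x)^T\theta_y + b_y)}{\sum_{z \in Y} \exp(\phi(x)^T\theta_z + b_z)}$
Figure: three classes $y_1, y_2, y_3$ in the $(x_1, x_2)$ plane with the half-spaces $f_\theta(x, y_1) > 0$, $f_\theta(x, y_2) > 0$, and $f_\theta(x, y_3) > 0$.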
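The multi-class probabilities and the argmax classifier can be sketched as follows. Classes are indexed from 0 here (the slides use $1, \dots, k$), the parameters are hypothetical, and subtracting the maximum score before exponentiating is a standard numerical safeguard that does not change the result.

```python
import math

def scores(phi, thetas, bs):
    """f(x, z) = phi(x)^T theta_z + b_z for every class z."""
    return [sum(p * t for p, t in zip(phi, th)) + b for th, b in zip(thetas, bs)]

def class_probabilities(phi, thetas, bs):
    """p(y | x, theta) = exp(f(x, y)) / sum_z exp(f(x, z))."""
    s = scores(phi, thetas, bs)
    top = max(s)  # shift for numerical stability; cancels in the ratio
    exps = [math.exp(v - top) for v in s]
    total = sum(exps)
    return [e / total for e in exps]

def predict(phi, thetas, bs):
    """y(x) = argmax_z f(x, z); the first maximizer wins ties (assumption)."""
    s = scores(phi, thetas, bs)
    return max(range(len(s)), key=s.__getitem__)

# Hypothetical parameters for k = 3 classes and m = 2 features.
thetas = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bs = [0.0, 0.0, 0.5]
phi = [2.0, 1.0]  # scores are 2.0, 1.0 and -2.5

probs = class_probabilities(phi, thetas, bs)
print(probs, predict(phi, thetas, bs))
```

Because the denominator is the same for every class, the class with the highest score is also the class with the highest probability.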
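49 Logistic Regression: Learning Problem
Inference of the MAP parameter, with $\theta = (\theta_1, \dots, \theta_k)^T$:
$$
\begin{aligned}
\theta_{MAP} &= \arg\max_\theta p(\theta \mid T_n) \\
&= \arg\max_\theta p(y \mid X, \theta)\, p(\theta) \\
&= \arg\max_\theta \log p(y \mid X, \theta) + \log p(\theta) \\
&= \arg\max_\theta \sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) + \log N(\theta; 0, \Sigma) \\
&= \arg\min_\theta -\sum_{i=1}^{n} \log \frac{\exp(\phi(x_i)^T\theta_{y_i} + b_{y_i})}{\sum_{z \in Y} \exp(\phi(x_i)^T\theta_z + b_z)} - \log \frac{e^{-\frac{1}{2}\theta^T \Sigma^{-1} \theta}}{\sqrt{(2\pi)^m |\Sigma|}} \\
&= \arg\min_\theta \sum_{i=1}^{n} \left[ \log \sum_{z \in Y} \exp(\phi(x_i)^T\theta_z + b_z) - \left(\phi(x_i)^T\theta_{y_i} + b_{y_i}\right) \right] + \frac{\theta^T \Sigma^{-1} \theta}{2}
\end{aligned}
$$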
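The final line of this derivation can be evaluated directly. The sketch below assumes an isotropic prior $\Sigma = \sigma^2 I$, class labels indexed from 0, and hypothetical data; `map_objective` is an illustrative name. It only evaluates the objective and does not minimize it.

```python
import math

def class_scores(phi, thetas, bs):
    """phi(x)^T theta_z + b_z for every class z."""
    return [sum(p * t for p, t in zip(phi, th)) + b for th, b in zip(thetas, bs)]

def map_objective(Phi, y, thetas, bs, sigma2):
    """sum_i [log sum_z exp(s_iz) - s_{i,y_i}] + theta^T theta / (2 sigma^2)."""
    total = 0.0
    for phi, yi in zip(Phi, y):
        s = class_scores(phi, thetas, bs)
        top = max(s)  # log-sum-exp with a shift for numerical stability
        log_norm = top + math.log(sum(math.exp(v - top) for v in s))
        total += log_norm - s[yi]
    penalty = 0.5 * sum(t * t for th in thetas for t in th) / sigma2
    return total + penalty

# Hypothetical data: n = 3 instances, m = 2 features, k = 3 classes.
Phi = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
y = [0, 1, 2]
zero_thetas = [[0.0, 0.0] for _ in range(3)]
zero_bs = [0.0, 0.0, 0.0]

# With all parameters at zero every class has probability 1/k,
# so the objective equals n * log(k).
at_zero = map_objective(Phi, y, zero_thetas, zero_bs, sigma2=2.0)
print(at_zero, 3 * math.log(3))
```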
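50 Summary: Learning Logistic Regression
If the modelling assumptions are fulfilled, namely the data generation model from slide 17 and a normally distributed prior $p(\theta) = N(\theta; 0, \Sigma)$, then we use
$$P(y \mid x, \theta) = \frac{\exp(\phi(x)^T\theta_y + b_y)}{\sum_{z \in Y} \exp(\phi(x)^T\theta_z + b_z)}$$
and the maximum a posteriori parameter is
$$\theta_{MAP} = \arg\min_\theta \sum_{i=1}^{n} \left[ \log \sum_{z \in Y} \exp(\phi(x_i)^T\theta_z + b_z) - \left(\phi(x_i)^T\theta_{y_i} + b_{y_i}\right) \right] + \frac{\theta^T \Sigma^{-1} \theta}{2}$$
How can $\theta_{MAP}$ be computed? To be continued.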
51 GENERATIVE APPROACH
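52 Empirical Inference: Generative Models
Computation of $p(y \mid X, \theta)$, using the independence of the training data (from the graphical model):
$$p(y \mid X, \theta) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta)$$
Generative model: apply Bayes' rule,
$$p(y_i \mid x_i, \theta) = \frac{p(x_i \mid y_i, \theta)\, p(y_i \mid \theta)}{\sum_{z \in Y} p(x_i \mid z, \theta)\, p(z \mid \theta)}$$
where $p(x_i \mid y_i, \theta)$ and $p(y_i \mid \theta)$ are model specific.
Figure: graphical model with the parameter $\theta$, the training pairs $(x_i, y_i)$ in a plate of size $n$, and the new pair $(x, y)$.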
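53 Exponential Family
The probability of a class label is part of the parameter vector: $p(y \mid \theta) = \pi_y$.
$$p(y_i \mid x_i, \theta) = \frac{p(x_i \mid y_i, \theta)\, p(y_i \mid \theta)}{\sum_{z \in Y} p(x_i \mid z, \theta)\, p(z \mid \theta)}$$
The conditional probability of $x$ is given by:
$$p(x \mid y, \theta) = h(x) \exp\left(\phi(x)^T\theta_y - \ln g(\theta_y)\right)$$
For $k$ classes, we partition the parameter vector: $\theta = (\theta_1, \dots, \theta_k, \pi_1, \dots, \pi_k)^T$.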
54 Exponential Family
The conditional probability of $x$ is given by:
$$p(x \mid y, \theta) = h(x) \exp\left(\phi(x)^T\theta_y - \ln g(\theta_y)\right)$$
- The representation $\phi(x)$ is the sufficient statistic: $\phi(x)$ conveys all useful information about $x$ for the probability distribution.
- $h(x)$ is the base measure.
- The partition function $g(\theta_y)$ normalizes the distribution.
The distribution is specified by $h(x)$, $\phi(x)$, $\theta$, and $g$. Many common distributions are in the exponential family.
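55 Exponential Family: Normal Distribution
The conditional probability of $x$ is given by:
$$p(x \mid y, \theta) = h(x) \exp\left(\phi(x)^T\theta_y - \ln g(\theta_y)\right)$$
Example: normal distribution
$$N(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^m |\Sigma|}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$$
Can it be represented in the exponential family form?
Figure: density of a two-dimensional normal distribution over $(x_1, x_2)$ with mean $(0, 0)^T$.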
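56 Exponential Family: Normal Distribution
The conditional probability of $x$ is given by:
$$p(x \mid y, \theta) = h(x) \exp\left(\phi(x)^T\theta_y - \ln g(\theta_y)\right)$$
Example: normal distribution
$$N(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^m |\Sigma|}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$$
Exponential family form:
$$\phi(x) = \begin{pmatrix} x \\ x \otimes x \end{pmatrix}, \qquad \theta = \begin{pmatrix} \Sigma^{-1}\mu \\ -\frac{1}{2}\operatorname{vec}(\Sigma^{-1}) \end{pmatrix}, \qquad h(x) = (2\pi)^{-m/2}, \qquad g(\theta) = |\Sigma|^{1/2} \exp\left(\tfrac{1}{2}\mu^T \Sigma^{-1} \mu\right)$$
Figure: density of a two-dimensional normal distribution over $(x_1, x_2)$ with mean $(0, 0)^T$.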
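57 Exponential Family: Normal Distribution
The conditional probability of $x$ is given by:
$$p(x \mid y, \theta) = h(x) \exp\left(\phi(x)^T\theta_y - \ln g(\theta_y)\right)$$
Example: one-dimensional normal distribution
$$N(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}$$
Exponential family form:
$$\phi(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}, \qquad \theta = \begin{pmatrix} \mu / \sigma^2 \\ -\frac{1}{2\sigma^2} \end{pmatrix}, \qquad h(x) = (2\pi)^{-1/2}, \qquad g(\theta) = \sigma \exp\left(\frac{\mu^2}{2\sigma^2}\right)$$
Figure: density of $N(x; 0, 1)$.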
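The one-dimensional exponential family form can be verified numerically against the usual density formula. This is a small sanity-check sketch; the values of $\mu$, $\sigma$, and the evaluation points are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Standard form: 1 / (sigma sqrt(2 pi)) * exp(-(x - mu)^2 / (2 sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def expfam_normal_pdf(x, mu, sigma):
    """Exponential family form: h(x) * exp(phi(x)^T theta - ln g(theta))."""
    phi = (x, x * x)
    theta = (mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2))
    h = (2.0 * math.pi) ** -0.5
    g = sigma * math.exp(mu ** 2 / (2.0 * sigma ** 2))
    return h * math.exp(phi[0] * theta[0] + phi[1] * theta[1] - math.log(g))

mu, sigma = 1.5, 0.7  # hypothetical parameters
points = [-1.0, 0.0, 1.5, 2.2, 4.0]
max_gap = max(abs(normal_pdf(x, mu, sigma) - expfam_normal_pdf(x, mu, sigma)) for x in points)
print(max_gap)
```

Expanding $(x - \mu)^2 = x^2 - 2\mu x + \mu^2$ shows why the two forms agree: the $x$ and $x^2$ terms go into $\phi(x)^T\theta$, and the remaining $\mu^2 / (2\sigma^2)$ and $\ln \sigma$ terms go into $\ln g(\theta)$.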
58 Exponential Family in Classification
The conditional probability of $x$ is given by:
$$p(x \mid y, \theta) = h(x) \exp\left(\phi(x)^T\theta_y - \ln g(\theta_y)\right)$$
Substitute into Bayes' rule (recall $p(y \mid \theta) = \pi_y$):
$$
\begin{aligned}
p(y_i \mid x_i, \theta) &= \frac{p(x_i \mid y_i, \theta)\, p(y_i \mid \theta)}{\sum_{z \in Y} p(x_i \mid z, \theta)\, p(z \mid \theta)} \\
&= \frac{h(x_i) \exp\left(\phi(x_i)^T\theta_{y_i} - \ln g(\theta_{y_i})\right) \pi_{y_i}}{\sum_{z \in Y} h(x_i) \exp\left(\phi(x_i)^T\theta_z - \ln g(\theta_z)\right) \pi_z}
\end{aligned}
$$
59 Exponential Family in Classification
With the parameter vector $\theta = (\theta_1, \dots, \theta_k, \pi_1, \dots, \pi_k)^T$, the base measure $h(x_i)$ cancels and the prior and partition function combine into an offset:
$$p(y_i \mid x_i, \theta) = \frac{\exp(\phi(x_i)^T\theta_{y_i} + b_{y_i})}{\sum_{z \in Y} \exp(\phi(x_i)^T\theta_z + b_z)}, \qquad b_{y_i} = \ln \pi_{y_i} - \ln g(\theta_{y_i})$$
60 Exponential Family in Classification
The model can therefore be parameterized by the offsets instead of the priors, $\theta = (\theta_1, \dots, \theta_k, b_1, \dots, b_k)^T$:
$$p(y_i \mid x_i, \theta) = \frac{\exp(\phi(x_i)^T\theta_{y_i} + b_{y_i})}{\sum_{z \in Y} \exp(\phi(x_i)^T\theta_z + b_z)}, \qquad b_{y_i} = \ln \pi_{y_i} - \ln g(\theta_{y_i})$$
61 Exponential Family in Classification
Absorbing each offset $b_y$ into $\theta_y$ gives
$$p(y_i \mid x_i, \theta) = \frac{\exp(\phi(x_i)^T\theta_{y_i})}{\sum_{z \in Y} \exp(\phi(x_i)^T\theta_z)}$$
with decision function $f_\theta(x, y) = \phi(x)^T\theta_y$ and classifier $y_\theta(x) = \arg\max_{z \in Y} f_\theta(x, z)$.
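The equivalence between Bayes' rule and the softmax form can be checked numerically for one-dimensional normal class-conditional distributions, using $\phi(x) = (x, x^2)^T$ and $b_y = \ln \pi_y - \ln g(\theta_y)$. This is a sketch under assumptions: the class means, standard deviations, and priors are hypothetical, and classes are indexed from 0.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_bayes(x, mus, sigmas, priors):
    """p(y | x) = p(x | y) pi_y / sum_z p(x | z) pi_z."""
    joint = [normal_pdf(x, m, s) * p for m, s, p in zip(mus, sigmas, priors)]
    total = sum(joint)
    return [j / total for j in joint]

def posterior_softmax(x, mus, sigmas, priors):
    """p(y | x) = exp(phi^T theta_y + b_y) / sum_z exp(phi^T theta_z + b_z)."""
    phi = (x, x * x)
    scores = []
    for m, s, p in zip(mus, sigmas, priors):
        theta = (m / s ** 2, -1.0 / (2.0 * s ** 2))
        log_g = math.log(s) + m ** 2 / (2.0 * s ** 2)  # ln g(theta_y)
        b = math.log(p) - log_g                        # b_y = ln pi_y - ln g(theta_y)
        scores.append(phi[0] * theta[0] + phi[1] * theta[1] + b)
    top = max(scores)  # shift for numerical stability
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical three-class model.
mus, sigmas, priors = [-1.0, 0.5, 2.0], [1.0, 0.5, 1.5], [0.2, 0.5, 0.3]
worst = 0.0
for x in [-2.0, 0.0, 0.3, 4.0]:
    pb = posterior_bayes(x, mus, sigmas, priors)
    ps = posterior_softmax(x, mus, sigmas, priors)
    worst = max(worst, max(abs(u - v) for u, v in zip(pb, ps)))
print(worst)
```

The base measure $h(x)$ never appears in `posterior_softmax` because it is the same for every class and cancels in the ratio.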
62 Generative Logistic Regression
Using the generative approach and these assumptions:
- the data generation model from slide 52,
- $p(x \mid y, \theta)$ is an exponential family distribution,
we arrived at this conditional distribution for $y$:
$$p(y \mid x, \theta) = \frac{\exp(\phi(x)^T\theta_y)}{\sum_{z \in Y} \exp(\phi(x)^T\theta_z)}$$
We do not know the parameters $\theta_y$. We will soon show how to infer the MAP (maximum a posteriori) parameter.
63 Linear Classification: Summary
In the 2-class case, the linear classifier has a decision function $f_\theta(x) = \phi(x)^T\theta + b$ and a classifier $y_\theta(x) = \operatorname{sign}(f_\theta(x))$.
In the multi-class case, the linear classifier has a decision function $f_\theta(x, y) = \phi(x)^T\theta_y + b_y$ and a classifier $y_\theta(x) = \arg\max_{z \in Y} f_\theta(x, z)$.
The data is mapped by $\phi(x)$ to feature space.
The offsets $b_y$ can be appended to the end of the vector $\theta_y$, and a 1 is added to the end of each $\phi(x_i)$.
The parameter vector $\theta_y$ is a normal vector of a separating hyperplane.
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA Tobias Scheffer Overview Principal Component Analysis (PCA) Kernel-PCA Fisher Linear Discriminant Analysis t-sne 2 PCA: Motivation
More informationCS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang
CS 231A Section 1: Linear Algebra & Probability Review Kevin Tang Kevin Tang Section 1-1 9/30/2011 Topics Support Vector Machines Boosting Viola Jones face detector Linear Algebra Review Notation Operations
More informationDay 4: Classification, support vector machines
Day 4: Classification, support vector machines Introduction to Machine Learning Summer School June 18, 2018 - June 29, 2018, Chicago Instructor: Suriya Gunasekar, TTI Chicago 21 June 2018 Topics so far
More informationMachine Learning Basics
Security and Fairness of Deep Learning Machine Learning Basics Anupam Datta CMU Spring 2019 Image Classification Image Classification Image classification pipeline Input: A training set of N images, each
More informationClick Prediction and Preference Ranking of RSS Feeds
Click Prediction and Preference Ranking of RSS Feeds 1 Introduction December 11, 2009 Steven Wu RSS (Really Simple Syndication) is a family of data formats used to publish frequently updated works. RSS
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Antti Ukkonen TAs: Saska Dönges and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer,
More information10-701/ Machine Learning - Midterm Exam, Fall 2010
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam
More informationLINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning
LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary
More informationLecture 3: Multiclass Classification
Lecture 3: Multiclass Classification Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Some slides are adapted from Vivek Skirmar and Dan Roth CS6501 Lecture 3 1 Announcement v Please enroll in
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationIntroduction to Logistic Regression
Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the
More informationLearning theory. Ensemble methods. Boosting. Boosting: history
Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over
More informationMLE/MAP + Naïve Bayes
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes Matt Gormley Lecture 19 March 20, 2018 1 Midterm Exam Reminders
More informationMLE/MAP + Naïve Bayes
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes MLE / MAP Readings: Estimating Probabilities (Mitchell, 2016)
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Ladislav Rampasek slides by Mengye Ren and others February 22, 2016 Naive Bayes and Gaussian Bayes Classifier February 22, 2016 1 / 21 Naive Bayes Bayes Rule:
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationFrom Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018
From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction
More information18.9 SUPPORT VECTOR MACHINES
744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the
More informationMidterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.
CS 189 Spring 2013 Introduction to Machine Learning Midterm You have 1 hour 20 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators
More informationBayes Rule. CS789: Machine Learning and Neural Network Bayesian learning. A Side Note on Probability. What will we learn in this lecture?
Bayes Rule CS789: Machine Learning and Neural Network Bayesian learning P (Y X) = P (X Y )P (Y ) P (X) Jakramate Bootkrajang Department of Computer Science Chiang Mai University P (Y ): prior belief, prior
More informationMachine Learning, Fall 2009: Midterm
10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/xilnmn Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationLoss Functions, Decision Theory, and Linear Models
Loss Functions, Decision Theory, and Linear Models CMSC 678 UMBC January 31 st, 2018 Some slides adapted from Hamed Pirsiavash Logistics Recap Piazza (ask & answer questions): https://piazza.com/umbc/spring2018/cmsc678
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationThe Naïve Bayes Classifier. Machine Learning Fall 2017
The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning
More informationManaging Uncertainty
Managing Uncertainty Bayesian Linear Regression and Kalman Filter December 4, 2017 Objectives The goal of this lab is multiple: 1. First it is a reminder of some central elementary notions of Bayesian
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework Due Oct 15, 10.30 am Rules Please follow these guidelines. Failure to do so, will result in loss of credit. 1. Homework is due on the due date
More informationCIS 520: Machine Learning Oct 09, Kernel Methods
CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationLogistic regression and linear classifiers COMS 4771
Logistic regression and linear classifiers COMS 4771 1. Prediction functions (again) Learning prediction functions IID model for supervised learning: (X 1, Y 1),..., (X n, Y n), (X, Y ) are iid random
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationMachine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber
Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to
More information5. Discriminant analysis
5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationPattern Recognition and Machine Learning. Perceptrons and Support Vector machines
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3
More information