Linear Classification
1 Linear Classification Lili MOU moull12 23 April 2015
2 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative Models Bayesian Logistic Regression
3 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative Models Bayesian Logistic Regression
4 Prologue Linear classification is easy: my good old friend, logistic regression, always serves as a baseline method in various applications. Through a systematic study, however, we can grasp the main ideas behind a range of machine learning techniques. This seminar also precedes our future discussion on GP classification. References: The materials basically follow Chapter 4 in Pattern Recognition and Machine Learning. Figures are taken (w/ or w/o modification) without further acknowledgment. See also A. Webb, et al., Statistical Pattern Recognition.
5 The Linear Classification Problem The Road Map Discriminant functions Probabilistic generative models Probabilistic discriminative models Bayesian logistic regression
6 On Non-Linearity Generalized linearity: non-linear transformation by basis functions. Feature engineering/selection. Kernels: transforming to a Hilbert space, fully characterized by a predefined inner-product operation. SVM (a discriminant function). Gaussian processes (non-parametric Bayes). k-NN (slacked similarity measure, non-parametric discriminant). Composition of non-linear activation functions: neural networks (finite feature learning, equivalent to learning a sophisticated kernel mapping input data to another Euclidean space). My questions: Can neural networks compose infinite-dimensional features (with kernels)? A sparse kernel network? Or is it necessary? Or indeed, neural networks ARE non-parametric!
7 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative Models Bayesian Logistic Regression
8 The Decision Boundary Let x = (x_1, …, x_n)^T be a data sample in R^n. Let y ∈ {0, 1} be the label of x (a two-class problem). A linear classifier always takes the form y = 1 if w^T x + w_0 > 0, and y = 0 otherwise. The decision boundary w^T x + w_0 = 0 is an (n − 1)-dimensional hyperplane in R^n.
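As a minimal sketch of the rule above (the weight vector and bias below are made up for illustration, not learned from data):

```python
import numpy as np

# Two-class linear classifier: predict 1 when w^T x + w_0 > 0, else 0.
def linear_classify(x, w, w0):
    return 1 if w @ x + w0 > 0 else 0

w = np.array([1.0, -2.0])   # hypothetical weight vector
w0 = 0.5                    # hypothetical bias
print(linear_classify(np.array([3.0, 1.0]), w, w0))   # positive side of the hyperplane
print(linear_classify(np.array([0.0, 1.0]), w, w0))   # negative side
```

Points with w^T x + w_0 = 0 lie exactly on the decision hyperplane.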
9 Multiclass Problem Let K be the number of classes. One-versus-the-rest: K − 1 classifiers. One-versus-one + voting: K(K − 1)/2 classifiers. Scoring: y_k(x) = w_k^T x + w_{k0}.
10 Linear Regression as Classification Let the target value be 1-of-K coded (K: number of classes). The scoring function is y_k(x) = w_k^T x + w_{k0}. The least-squares objective function: J = Σ_{i=1}^m Σ_{k=1}^K (y_k(x^(i)) − t_k^(i))^2. The problem of being "too correct": the target value is too far away from a Gaussian distribution.
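A minimal sketch of least-squares classification with 1-of-K targets, on made-up toy data (the points and labels below are illustrative only):

```python
import numpy as np

# Toy two-class data with 1-of-K (here 1-of-2) coded targets.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]])
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

# Prepend a bias feature and minimize the sum-of-squares objective J.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
W, *_ = np.linalg.lstsq(Xb, T, rcond=None)

def predict(x):
    # Classify by the largest score y_k(x).
    scores = np.array([1.0, *x]) @ W
    return int(np.argmax(scores))

print(predict(np.array([0.0, 0.0])))   # falls in class 0
print(predict(np.array([3.0, 3.0])))   # falls in class 1
```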
11 Perceptron Learning Algorithm Non-linear hard squashing function: y = 1 if w^T x + w_0 > 0, and y = −1 otherwise. Perceptron learning algorithm: for each misclassified sample x^(i), w^(τ+1) = w^(τ) + η x^(i) t^(i). For its convergence theorem, please refer to Neural Networks and Learning Machines. Pros and cons: + Cares about misclassified samples only. − Does not work with linearly inseparable data. − Does not generalize to multi-class problems. + Introduces a non-linear activation function.
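The update rule above can be sketched directly; the toy data, step size η, and epoch cap below are all made-up illustrations (the convergence theorem guarantees the loop terminates for separable data):

```python
import numpy as np

# Toy linearly separable data with targets coded as +1/-1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]])
t = np.array([-1, -1, 1, 1])
Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias feature so w_0 is learned too

w = np.zeros(3)
eta = 1.0
for _ in range(100):                        # plenty of epochs for this toy set
    misclassified = False
    for x_i, t_i in zip(Xb, t):
        if t_i * (w @ x_i) <= 0:            # wrong side of (or on) the boundary
            w += eta * x_i * t_i            # the perceptron update
            misclassified = True
    if not misclassified:
        break

print(np.sign(Xb @ w))                      # matches t on all training points
```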
12 Fisher's Linear Discriminant Heuristics: Attempt #1: maximize the class separation. Attempt #2 (the Fisher criterion): maximize the ratio of the between-class variance to the within-class variance (after projecting to a one-dimensional space, perpendicular to the decision boundary).
13 Formalization of Fisher's Linear Discriminant Class means: m_1 = (1/N_1) Σ_{i∈C_1} x^(i), m_2 = (1/N_2) Σ_{i∈C_2} x^(i). Within-class variance of the projected data: s_k^2 = Σ_{i∈C_k} (y^(i) − m_k)^2, where y^(i) = w^T x^(i) and m_k = w^T m_k. The cost function: J(w) = (m_2 − m_1)^2 / (s_1^2 + s_2^2).
14 Solution to Fisher's Linear Discriminant J(w) = (m_2 − m_1)^2 / (s_1^2 + s_2^2) = (w^T S_B w) / (w^T S_W w), where S_B = (m_2 − m_1)(m_2 − m_1)^T and S_W = Σ_{i∈C_1} (x^(i) − m_1)(x^(i) − m_1)^T + Σ_{i∈C_2} (x^(i) − m_2)(x^(i) − m_2)^T. Differentiating the cost function and setting it to 0, we obtain (w^T S_B w) S_W w = (w^T S_W w) S_B w. Thus w ∝ S_W^{-1}(m_2 − m_1). See Pattern Recognition and Machine Learning for multi-class scenarios.
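The closed-form solution w ∝ S_W^{-1}(m_2 − m_1) is a few lines of linear algebra; the two point clouds below are made up for illustration:

```python
import numpy as np

# Two toy classes in R^2.
C1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
C2 = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])

m1, m2 = C1.mean(axis=0), C2.mean(axis=0)
# Within-class scatter matrix S_W.
S_W = (C1 - m1).T @ (C1 - m1) + (C2 - m2).T @ (C2 - m2)
# Fisher direction: w proportional to S_W^{-1} (m2 - m1).
w = np.linalg.solve(S_W, m2 - m1)
w /= np.linalg.norm(w)

# Along w, the projected classes should be cleanly separated.
print(C1 @ w, C2 @ w)
```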
15 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative Models Bayesian Logistic Regression
16 A Probabilistic Model A data point (x, t) is generated according to the following story. Step 1: choose a box C_k, i.e., t = k, with probability p(C_k). Step 2: draw a ticket x from box k with probability p(x|C_k). The Bayesian network: t → x. Posterior probability: p(C_1|x) = p(x|C_1)p(C_1) / (p(x|C_1)p(C_1) + p(x|C_2)p(C_2)) = 1/(1 + exp(−a)) = σ(a), where a = ln [p(x|C_1)p(C_1) / (p(x|C_2)p(C_2))] = ln [p(C_1|x)/p(C_2|x)].
17 Multi-class Scenarios Softmax: p(C_k|x) = p(x|C_k)p(C_k) / Σ_j p(x|C_j)p(C_j). Predicting the class label: t̂ = argmax_k p(C_k|x).
18 Generative Models vs. Discriminative Models Training objectives: maximize_Θ p(x, t; Θ), i.e., maximize_Θ p(x|t; Θ) p(t; Θ), vs. maximize_Θ p(t|x; Θ).
19 Gaussian Inputs Assumption: p(x|C_k) = 1/((2π)^{n/2} |Σ|^{1/2}) exp{−(1/2)(x − µ_k)^T Σ^{-1} (x − µ_k)}. Consider a binary classification problem: p(C_1|x) = σ(w^T x + w_0), where w = Σ^{-1}(µ_1 − µ_2) and w_0 = −(1/2) µ_1^T Σ^{-1} µ_1 + (1/2) µ_2^T Σ^{-1} µ_2 + ln [p(C_1)/p(C_2)]. Parameter estimation: maximum likelihood estimation. See also Ch. 2.3, 2.4 in Statistical Pattern Recognition.
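A sketch of this generative classifier end to end: fit the MLE parameters, plug them into the closed form for w and w_0, and read off the posterior. The synthetic data, seed, and equal priors below are assumptions for illustration:

```python
import numpy as np

# Synthetic data: two Gaussian classes with a shared (unit) covariance.
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(200, 2))
X2 = rng.normal([3.0, 3.0], 1.0, size=(200, 2))

# MLE: class means, pooled covariance, class priors.
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
Sigma = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (len(X1) + len(X2))
p1 = p2 = 0.5

# Closed form: p(C1|x) = sigma(w^T x + w0).
Si = np.linalg.inv(Sigma)
w = Si @ (mu1 - mu2)
w0 = -0.5 * mu1 @ Si @ mu1 + 0.5 * mu2 @ Si @ mu2 + np.log(p1 / p2)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
p_at_mu1 = sigmoid(w @ np.array([0.0, 0.0]) + w0)   # near 1: clearly class 1
p_at_mu2 = sigmoid(w @ np.array([3.0, 3.0]) + w0)   # near 0: clearly class 2
print(p_at_mu1, p_at_mu2)
```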
20 Gaussian Classifiers Shared covariance matrix Σ: linear decision boundary. Different Σ_k: quadratic decision boundary. Warning: don't use Gaussian classifiers.
21 Discrete Features Consider binary features x = (x_1, …, x_n)^T, where x_i ∈ {0, 1}. p(x|C_k) has 2^n − 1 independent free parameters. Naïve Bayes assumption (the x_i are independent given the class): p(x|C_k) = Π_{i=1}^n p(x_i|C_k) = Π_{i=1}^n µ_ki^{x_i} (1 − µ_ki)^{1 − x_i}. Posterior class distributions: p(C_k|x) = σ(a_k(x)), where a_k(x) = Σ_{i=1}^n {x_i ln µ_ki + (1 − x_i) ln(1 − µ_ki)} + ln p(C_k). Linear in features x!
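The score a_k(x) above is a one-liner; the Bernoulli parameters µ_ki and priors below are made-up toy values:

```python
import numpy as np

# mu[k, i] = p(x_i = 1 | C_k) for two classes, three binary features (toy values).
mu = np.array([[0.9, 0.2, 0.7],
               [0.1, 0.8, 0.5]])
prior = np.array([0.5, 0.5])

def a(x):
    # a_k(x) = sum_i [x_i ln mu_ki + (1 - x_i) ln(1 - mu_ki)] + ln p(C_k)
    return (x * np.log(mu) + (1 - x) * np.log(1 - mu)).sum(axis=1) + np.log(prior)

x = np.array([1, 0, 1])
scores = a(x)
print(scores.argmax())   # the class whose feature probabilities best match x
```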
22 Exponential Family Let class-conditional distributions take the form p(x|λ_k) = h(x) g(λ_k) exp{λ_k^T u(x)}. Binary classification (with u(x) = x): a(x) = (λ_1 − λ_2)^T x + ln g(λ_1) − ln g(λ_2) + ln p(C_1) − ln p(C_2). Multi-class problem: a_k(x) = λ_k^T x + ln g(λ_k) + ln p(C_k). Examples: normal, exponential, log-normal, gamma, chi-squared, beta, Dirichlet, Bernoulli, categorical, Poisson, geometric, inverse Gaussian, von Mises, and von Mises-Fisher. Provided some parameters are fixed, Pareto, binomial, multinomial, and negative binomial are also in the exponential family.
23 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative Models Bayesian Logistic Regression
24 Logistic Regression Under rather general assumptions, the posterior class distribution takes the form y(x) = p(C_1|x) = σ(w^T x + w_0). The idea is to optimize directly the above posterior distribution. Fewer parameters, better performance. Consider a binary classification problem with an n-dimensional feature space. Logistic regression: n parameters. A Gaussian classifier: means: 2n; covariance matrix (shared): n(n + 1)/2; class prior: 1.
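A minimal sketch of fitting logistic regression by gradient descent on the negative log-likelihood, whose gradient is Φ^T(σ(Φw) − t). The toy data, learning rate, and iteration count are assumptions (PRML uses IRLS instead; plain gradient descent is shown here for brevity):

```python
import numpy as np

# Toy separable data; append a bias feature.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]])
t = np.array([0.0, 0.0, 1.0, 1.0])
Xb = np.hstack([X, np.ones((len(X), 1))])

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
w = np.zeros(3)
for _ in range(5000):
    grad = Xb.T @ (sigmoid(Xb @ w) - t)   # gradient of the negative log-likelihood
    w -= 0.1 * grad

preds = (sigmoid(Xb @ w) > 0.5).astype(float)
print(preds)                              # matches the training labels
```

Note that on separable data the unregularized weights grow without bound; a Gaussian prior on w (next slides) fixes this.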
25 Probit Regression What if we substitute σ with other squashing functions? Probit function: Φ(a) = ∫_{−∞}^a N(θ|0, 1) dθ. Similar performance compared with logistic regression. More prone to outliers (why? consider the decay rate of the tails). Useful later.
26 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative Models Bayesian Logistic Regression
27 Bayesian Learning Let t ∈ {0, 1}^m denote the labels of the training data φ^(1), …, φ^(m). Prior: p(w), which is a $64,000,000 question. Likelihood: p(t|w) = Π_{i=1}^m p(t^(i)|w) = Π_{i=1}^m σ(w^T φ^(i))^{t^(i)} (1 − σ(w^T φ^(i)))^{1 − t^(i)}. Posterior: p(w|t) = p(w)p(t|w)/p(t) ∝ p(w)p(t|w) = p(w) Π_{i=1}^m σ(w^T φ^(i))^{t^(i)} (1 − σ(w^T φ^(i)))^{1 − t^(i)}. Predictive density: p(t*|t) = ∫ p(t*|w) p(w|t) dw ∝ ∫ σ(w^T φ*) p(w) Π_{i=1}^m σ(w^T φ^(i))^{t^(i)} (1 − σ(w^T φ^(i)))^{1 − t^(i)} dw.
28 Intractability The posterior is intractable due to the normalizing factor. The predictive density is intractable due to the integral. We have to resort to approximations. Sampling methods: stochastic, usually asymptotically correct, hard to scale. Deterministic methods: do things wrongly and hope they work.
29 Laplace Approximation Fit a Gaussian at a mode. The standard deviation is chosen such that the second-order derivative of the log probability matches (the first-order derivative is always 0 at a mode). Scale-free in representing the unnormalized measure. Real variables only. Only local properties are captured; what about multi-mode distributions?
30 Fitting a Gaussian Let p(z) = (1/Z) f(z) be a true distribution, where Z = ∫ f(z) dz. Step 1: find a mode z_0 of p(z), by gradient methods, say, satisfying df(z)/dz |_{z=z_0} = 0. Step 2: consider a Taylor expansion of ln f(z) at z_0: ln f(z) ≈ ln f(z_0) − (1/2) A (z − z_0)^2, where A is given by A = −d^2 ln f(z)/dz^2 |_{z=z_0}. Taking the exponential, f(z) ≈ f(z_0) exp{−(A/2)(z − z_0)^2}. Step 3: normalize to a Gaussian distribution: q(z) = (A/(2π))^{1/2} exp{−(A/2)(z − z_0)^2}.
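The three steps can be sketched on a toy unnormalized density f(z) = z^3 exp(−z), z > 0, whose log-derivatives are available in closed form (the example and the Newton iteration are illustrative, not from the slides):

```python
# ln f(z) = 3 ln z - z for f(z) = z^3 exp(-z), z > 0.
dlogf = lambda z: 3.0 / z - 1.0        # (ln f)'(z)
d2logf = lambda z: -3.0 / z ** 2       # (ln f)''(z)

# Step 1: find the mode z0 by Newton's method on (ln f)'(z) = 0.
z0 = 1.0
for _ in range(50):
    z0 -= dlogf(z0) / d2logf(z0)

# Step 2: A = -(ln f)''(z0) is the precision of the Gaussian fit.
A = -d2logf(z0)

# Step 3: q(z) = N(z | z0, 1/A); here the mode is 3 and the variance is 3.
print(z0, 1.0 / A)
```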
31 Laplace Approximation for Multivariate Distributions To approximate p(z) = (1/Z) f(z), where z ∈ R^m, we expand at a mode z_0: ln f(z) ≈ ln f(z_0) − (1/2)(z − z_0)^T A (z − z_0), where A = −∇∇ ln f(z) |_{z=z_0}, the negative Hessian of ln f(z), serving as the precision matrix in a Gaussian distribution. Taking the exponential, we obtain f(z) ≈ f(z_0) exp{−(1/2)(z − z_0)^T A (z − z_0)}. Normalizing it as a distribution, we then have q(z) = |A|^{1/2}/(2π)^{m/2} exp{−(1/2)(z − z_0)^T A (z − z_0)} = N(z|z_0, A^{-1}).
32 Bayesian Logistic Regression: A Revisit in Earnest Prior: Gaussian, which is natural: p(w) = N(w|m_0, S_0). (Mathematicians always choose priors for the sake of convenience rather than approaching God.) Posterior: p(w|t) ∝ p(w)p(t|w), so ln p(w|t) = −(1/2)(w − m_0)^T S_0^{-1} (w − m_0) [prior] + Σ_{i=1}^m {t^(i) ln y^(i) + (1 − t^(i)) ln(1 − y^(i))} [likelihood] + const, where y^(i) = σ(w^T φ^(i)). The precision of the fit is S_N^{-1} = −∇∇ ln p(w|t) = S_0^{-1} + Σ_{i=1}^m y^(i)(1 − y^(i)) φ^(i) (φ^(i))^T. Hence, the Laplace approximation to the posterior is q(w) = N(w|w_MAP, S_N).
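A sketch of the whole recipe on toy data: find w_MAP by gradient descent on the log-posterior (assuming a zero-mean prior N(0, α^{-1} I); the data, α, learning rate, and iteration count are all illustrative), then form S_N^{-1} from the curvature at w_MAP:

```python
import numpy as np

# Design matrix with a bias feature, toy labels, and an assumed prior precision.
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
t = np.array([0.0, 0.0, 1.0, 1.0])
alpha = 1.0

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Find w_MAP: gradient of -ln p(w|t) is Phi^T (y - t) + alpha * w.
w = np.zeros(2)
for _ in range(2000):
    y = sigmoid(Phi @ w)
    w -= 0.1 * (Phi.T @ (y - t) + alpha * w)
w_map = w

# Laplace precision: S_N^{-1} = alpha*I + sum_i y_i (1 - y_i) phi_i phi_i^T.
y = sigmoid(Phi @ w_map)
S_N_inv = alpha * np.eye(2) + Phi.T @ (Phi * (y * (1 - y))[:, None])
S_N = np.linalg.inv(S_N_inv)
print(w_map, S_N)   # q(w) = N(w | w_map, S_N)
```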
33 Predictive Density p(C_1|φ*, t) = ∫ p(C_1|φ*, w) p(w|t) dw ≈ ∫ σ(w^T φ*) q(w) dw, and p(C_2|φ*, t) = 1 − p(C_1|φ*, t). Plan: change it to a univariate integral; substitute the sigmoid with a probit function, which is then convolved with a normal: ∫ σ(·) N(·) da ≈ ∫ Φ(·) N(·) da = Φ(·) ≈ σ(·).
34 The Dirac Delta Function Let δ be the Dirac delta function, loosely thought of as a function such that δ(x) = +∞ if x = 0, and 0 otherwise (a Gaussian peaked at 0 with standard deviation tending to 0). The δ function satisfies ∫ δ(a − x) f(a) da = f(x), and specifically ∫ δ(a − x) da = 1.
35 Deriving the Predictive Density p(C_1|φ*, t) ≈ ∫ σ(w^T φ*) q(w) dw [Laplace approx.], and σ(w^T φ*) = ∫ δ(a − w^T φ*) σ(a) da [def. of δ]. Hence p(C_1|φ*, t) ≈ ∫ (∫ δ(a − w^T φ*) σ(a) da) q(w) dw = ∫ σ(a) (∫ δ(a − w^T φ*) q(w) dw) da = ∫ σ(a) p(a) da, where p(a) = ∫ δ(a − w^T φ*) q(w) dw. We now argue that p(a) is Gaussian.
36 Deriving the Predictive Density (2) p(a) = ∫ δ(a − w^T φ*) q(w) dw = ∫ q(w̃) dw_2 … dw_n, where w̃ is such that w̃^T φ* = a (integrating out the directions orthogonal to φ*). As a marginal of a Gaussian, p(a) is Gaussian. We can also verify that ∫ p(a) da = ∫∫ δ(a − w^T φ*) q(w) dw da = ∫ q(w) (∫ δ(a − w^T φ*) da) dw = ∫ q(w) dw = 1.
37 Deriving the Predictive Density (3) µ_a = E[a] = ∫ p(a) a da = ∫ q(w) w^T φ* dw = w_MAP^T φ*. σ_a^2 = var[a] = ∫ p(a) a^2 da − E[a]^2 = ∫ q(w) {(w^T φ*)^2 − (w_MAP^T φ*)^2} dw = φ*^T S_N φ*. Thus p(C_1|t) ≈ ∫ σ(a) p(a) da = ∫ σ(a) N(a|µ_a, σ_a^2) da ≈ ∫ Φ(λa) N(a|µ_a, σ_a^2) da = Φ(µ_a / (λ^{-2} + σ_a^2)^{1/2}) ≈ σ(κ(σ_a^2) µ_a), where λ^2 = π/8 and κ(σ_a^2) = (1 + πσ_a^2/8)^{-1/2}, chosen such that the rescaled probit function has the same slope as the sigmoid at the origin.
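The final approximation σ(κ(σ_a^2) µ_a) can be checked numerically against a Monte Carlo estimate of ∫ σ(a) N(a|µ_a, σ_a^2) da (the particular µ_a, σ_a^2, and seed below are arbitrary):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

mu, s2 = 1.5, 4.0                              # arbitrary mean and variance of p(a)
kappa = (1.0 + np.pi * s2 / 8.0) ** -0.5       # kappa(sigma_a^2)
approx = sigmoid(kappa * mu)                   # the probit-based approximation

# Monte Carlo estimate of E[sigma(a)] with a ~ N(mu, s2).
rng = np.random.default_rng(0)
mc = sigmoid(rng.normal(mu, np.sqrt(s2), size=1_000_000)).mean()
print(approx, mc)                              # the two should be close
```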
38 The Identity ∫ Φ(·) φ(·) = Φ(·) Let X ~ N(a, b^2) and Y ~ N(c, d^2) be independent. Pr{X ≤ Y} = ∫ Pr{X ≤ Y | Y = w} (1/d) φ((w − c)/d) dw = ∫ Pr{X ≤ w} (1/d) φ((w − c)/d) dw = ∫ Φ((w − a)/b) (1/d) φ((w − c)/d) dw. By noticing that X − Y ~ N(a − c, b^2 + d^2), we have Pr{X ≤ Y} = Pr{X − Y ≤ 0} = Φ((c − a)/(b^2 + d^2)^{1/2}).
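The identity is easy to sanity-check by Monte Carlo (the parameter values and seed below are arbitrary):

```python
import numpy as np
from math import erf, sqrt

# Standard normal CDF via erf.
Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))

# X ~ N(a, b^2), Y ~ N(c, d^2), independent.
a, b, c, d = 0.3, 1.0, 1.0, 2.0
rng = np.random.default_rng(1)
X = rng.normal(a, b, size=1_000_000)
Y = rng.normal(c, d, size=1_000_000)

mc = np.mean(X <= Y)                          # Monte Carlo estimate of Pr{X <= Y}
exact = Phi((c - a) / sqrt(b ** 2 + d ** 2))  # the closed form from the slide
print(mc, exact)
```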
39 Take-Home Messages Discriminant functions. Probabilistic generative models (don't use them). Probabilistic discriminative models. Bayesian logistic regression: sampling methods; deterministic approximations: the Laplace approximation (fitting a Gaussian at a mode), substituting the sigmoid function with a probit function, analytical solutions. The decision boundary is the same with equal prior probabilities.
40 Thanks for Listening!
More informationData Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction
More informationMulticlass Logistic Regression
Multiclass Logistic Regression Sargur. Srihari University at Buffalo, State University of ew York USA Machine Learning Srihari Topics in Linear Classification using Probabilistic Discriminative Models
More informationComputer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression
Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationCSC 411: Lecture 04: Logistic Regression
CSC 411: Lecture 04: Logistic Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 23, 2015 Urtasun & Zemel (UofT) CSC 411: 04-Prob Classif Sep 23, 2015 1 / 16 Today Key Concepts: Logistic
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationGaussian processes and bayesian optimization Stanisław Jastrzębski. kudkudak.github.io kudkudak
Gaussian processes and bayesian optimization Stanisław Jastrzębski kudkudak.github.io kudkudak Plan Goal: talk about modern hyperparameter optimization algorithms Bayes reminder: equivalent linear regression
More informationGaussian discriminant analysis Naive Bayes
DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationOverview c 1 What is? 2 Definition Outlines 3 Examples of 4 Related Fields Overview Linear Regression Linear Classification Neural Networks Kernel Met
c Outlines Statistical Group and College of Engineering and Computer Science Overview Linear Regression Linear Classification Neural Networks Kernel Methods and SVM Mixture Models and EM Resources More
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationIterative Reweighted Least Squares
Iterative Reweighted Least Squares Sargur. University at Buffalo, State University of ew York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative
More informationIntroduction to Machine Learning
Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the
More informationKernel Methods. Machine Learning A W VO
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More informationLINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning
LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary
More information