Today: Calculus, Lagrange Multipliers, Linear Regression
1 Today: Calculus, Lagrange Multipliers, Linear Regression
2 Optimization with constraints. What if we want to constrain the parameters of the model, e.g. "the mean is less than 10"? Find the best likelihood, subject to a constraint. Two functions: an objective function to maximize, and an inequality that must be satisfied.
3 Lagrange Multipliers. Find maxima of $f(x, y)$ subject to a constraint: maximize $f(x, y) = x + 2y$ subject to $x^2 + y^2 = 1$.
4 General form. Maximize $f(x, y)$ subject to $g(x, y) = c$. Introduce a new variable $\lambda$ and find the maxima of $\Lambda(x, y, \lambda) = f(x, y) + \lambda(g(x, y) - c)$.
5 Example. Maximize $f(x, y) = x + 2y$ subject to $x^2 + y^2 = 1$. Introduce a new variable and find the maxima of $\Lambda(x, y, \lambda) = x + 2y + \lambda(x^2 + y^2 - 1)$.
6 Example. $\partial\Lambda/\partial x = 1 + 2\lambda x = 0$; $\partial\Lambda/\partial y = 2 + 2\lambda y = 0$; $\partial\Lambda/\partial\lambda = x^2 + y^2 - 1 = 0$. We now have 3 equations in 3 unknowns.
7 Example. Eliminate lambda: from $1 = -2\lambda x$ and $2 = -2\lambda y$ we get $-2\lambda = 1/x = 2/y$, so $y = 2x$. Substitute and solve: $x^2 + y^2 = 1$ gives $x^2 + (2x)^2 = 1$, so $5x^2 = 1$ and $x = \pm 1/\sqrt{5}$, $y = \pm 2/\sqrt{5}$.
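A symbolic check of this worked example (an added illustration, assuming SymPy is available; the variable names are ours):

```python
# Solve the three stationarity conditions of the Lagrangian symbolically.
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
L = x + 2*y + lam*(x**2 + y**2 - 1)  # the Lagrangian Lambda(x, y, lambda)

# Set all three partial derivatives to zero and solve the system.
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in sols:
    print(s, "f =", sp.simplify(s[x] + 2*s[y]))
# f attains sqrt(5) at (1/sqrt(5), 2/sqrt(5)) and -sqrt(5) at the negated point.
```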
8 Basics of Linear Regression. Regression algorithm; a supervised technique. In one dimension: identify $y: \mathbb{R} \to \mathbb{R}$. In $D$ dimensions: identify $y: \mathbb{R}^D \to \mathbb{R}$. Given training data $\{x_0, x_1, \ldots, x_N\}$ and targets $\{t_0, t_1, \ldots, t_N\}$.
9 Graphical Example of Regression. [figure: targets $t$ vs. inputs $x$; what is $t$ at a new $x$?]
10 Graphical Example of Regression. [figure: a fit to the data]
11 Graphical Example of Regression. [figure: an alternative fit to the data]
12 Definition. In linear regression, we assume that the model that generates the data involves only a linear combination of the input variables: $y(x, w) = w_0 + w_1 x_1 + \ldots + w_D x_D$, i.e. $y(x, w) = w_0 + \sum_{j=1}^{D} w_j x_j$, where $w$ is a vector of weights which define the $D$ parameters of the model.
13 Evaluation. How can we evaluate the performance of a regression solution? Error functions (or loss functions). Squared error: $E(t_i, y(x_i, w)) = \frac{1}{2}(t_i - y(x_i, w))^2$. Linear error: $E(t_i, y(x_i, w)) = |t_i - y(x_i, w)|$.
14 Regression Error. [figure]
15 Empirical Risk. Empirical risk is the measure of the loss from data: $R_{emp} = \frac{1}{N}\sum_{i=1}^{N} E(t_i, y(x_i, w)) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{2}(t_i - y(x_i, w))^2$. By minimizing risk on the training data, we optimize the fit with respect to the loss function: $\nabla_w R = 0$.
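A minimal sketch of empirical risk under the squared loss for a 1-D linear model (an added illustration assuming NumPy; the toy inputs and targets are made up):

```python
import numpy as np

def empirical_risk(w, x, t):
    y = w[0] + w[1] * x                 # y(x, w) = w_0 + w_1 * x
    return np.mean(0.5 * (t - y) ** 2)  # (1/N) * sum of (1/2)(t_i - y_i)^2

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([0.1, 1.1, 1.9, 3.2])
print(empirical_risk(np.array([0.0, 1.0]), x, t))
```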
16 Model Likelihood and Empirical Risk. Two related but distinct ways to look at a model. 1. Model likelihood: what is the likelihood that a model generated the observed data? 2. Empirical risk: how much error does the model have on the training data?
17 Model Likelihood. $p(t|x, w, \beta) = \mathcal{N}(t; y(x, w), \beta^{-1})$, where $\beta = 1/\sigma^2$. For the whole dataset, $p(\mathbf{t}|\mathbf{x}, w, \beta) = \prod_i \mathcal{N}(t_i; y(x_i, w), \beta^{-1})$, assuming independent and identically distributed (iid) data.
18 Understanding Model Likelihood. Substituting the equation of a Gaussian: $p(\mathbf{t}|\mathbf{x}, w, \beta) = \prod_i \sqrt{\frac{\beta}{2\pi}} \exp\left(-\frac{\beta}{2}(y(x_i, w) - t_i)^2\right)$. Apply a log function, letting the log dissolve products into sums: $\ln p(\mathbf{t}|\mathbf{x}, w, \beta) = -\frac{\beta}{2}\sum_i (y(x_i, w) - t_i)^2 + \frac{N}{2}\ln\beta - \frac{N}{2}\ln 2\pi$.
19 Understanding Model Likelihood. Log likelihood: $\ln p(\mathbf{t}|\mathbf{x}, w, \beta) = -\frac{\beta}{2}\sum_i (y(x_i, w) - t_i)^2 + \frac{N}{2}\ln\beta - \frac{N}{2}\ln 2\pi$. Optimize the weights (Maximum Likelihood Estimation): $\nabla_w \ln p(\mathbf{t}|\mathbf{x}, w, \beta) = -\frac{\beta}{2}\nabla_w \sum_i (y(x_i, w) - t_i)^2 = 0$. Compare the empirical risk with squared loss, $R_{emp} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{2}(t_i - y(x_i, w))^2$: both are optimized by the same weights.
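A numerical check of this equivalence (an added sketch with made-up data): the log likelihood equals $-N\beta R_{emp}(w)$ plus terms that do not depend on $w$.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([0.1, 1.1, 1.9, 3.2])
w, beta, N = np.array([0.1, 0.9]), 4.0, 4

y = w[0] + w[1] * x
log_lik = (-0.5 * beta * np.sum((y - t) ** 2)
           + 0.5 * N * np.log(beta) - 0.5 * N * np.log(2 * np.pi))
r_emp = np.mean(0.5 * (t - y) ** 2)
const = 0.5 * N * (np.log(beta) - np.log(2 * np.pi))  # w-independent terms
print(np.isclose(log_lik, -N * beta * r_emp + const))  # True
```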
20 Maximizing Log Likelihood (1-D). Find the optimal settings of $w = [w_0, w_1]^T$: set $\nabla_w R = \left[\frac{\partial R}{\partial w_0}, \frac{\partial R}{\partial w_1}\right]^T = [0, 0]^T$, where $R(w) = \frac{1}{2N}\sum_i (t_i - w_1 x_i - w_0)^2$.
21 Maximizing Log Likelihood. $R(w) = \frac{1}{2N}\sum_i (t_i - w_1 x_i - w_0)^2$. Partial derivative: $\frac{\partial R}{\partial w_0} = \frac{1}{N}\sum_i (t_i - w_1 x_i - w_0)(-1)$. Set to zero: $\frac{1}{N}\sum_i (t_i - w_1 x_i - w_0)(-1) = 0$. Separate the sum to isolate $w_0$: $w_0 = \frac{1}{N}\sum_i (t_i - w_1 x_i) = \frac{1}{N}\sum_i t_i - w_1 \frac{1}{N}\sum_i x_i$.
22 Maximizing Log Likelihood. Partial derivative: $\frac{\partial R}{\partial w_1} = \frac{1}{N}\sum_i (t_i - w_1 x_i - w_0)(-x_i)$. Set to zero: $\frac{1}{N}\sum_i (t_i x_i - w_1 x_i^2 - w_0 x_i) = 0$. Separate the sum to isolate $w_1$: $w_1 \frac{1}{N}\sum_i x_i^2 = \frac{1}{N}\sum_i t_i x_i - w_0 \frac{1}{N}\sum_i x_i$, i.e. $w_1 \sum_i x_i^2 = \sum_i t_i x_i - w_0 \sum_i x_i$.
23 Maximizing Log Likelihood. From the previous partial: $w_1 \sum_i x_i^2 = \sum_i t_i x_i - w_0 \sum_i x_i$. From the previous slide: $w_0 = \frac{1}{N}\sum_i t_i - w_1 \frac{1}{N}\sum_i x_i$. Substitute: $w_1 \sum_i x_i^2 = \sum_i t_i x_i - \left(\frac{1}{N}\sum_i t_i - w_1 \frac{1}{N}\sum_i x_i\right)\sum_i x_i$. Isolate $w_1$: $w_1 = \dfrac{\sum_i t_i x_i - \frac{1}{N}\sum_i t_i \sum_i x_i}{\sum_i x_i^2 - \frac{1}{N}\sum_i x_i \sum_i x_i}$.
24 Maximizing Log Likelihood. Clean and easy: $w_0 = \frac{1}{N}\sum_i t_i - w_1 \frac{1}{N}\sum_i x_i$, $w_1 = \dfrac{\sum_i t_i x_i - \frac{1}{N}\sum_i t_i \sum_i x_i}{\sum_i x_i^2 - \frac{1}{N}\sum_i x_i \sum_i x_i}$. Or not. Apply some linear algebra.
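A sketch of the 1-D closed form just derived (added for illustration; the toy data is made up):

```python
import numpy as np

def fit_1d(x, t):
    N = len(x)
    # w1 from the derivation above, then w0 = mean(t) - w1 * mean(x)
    w1 = ((np.sum(t * x) - np.sum(t) * np.sum(x) / N)
          / (np.sum(x ** 2) - np.sum(x) ** 2 / N))
    w0 = np.mean(t) - w1 * np.mean(x)
    return w0, w1

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([0.1, 1.1, 1.9, 3.2])
print(fit_1d(x, t))  # approximately (0.06, 1.01)
```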
25 Likelihood using linear algebra. Representing the linear regression function in terms of vectors: $y = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_D x_D$. With $\mathbf{x} = [1, x_1, x_2, \ldots, x_D]^T$ and $\mathbf{w} = [w_0, w_1, w_2, \ldots, w_D]^T$, we have $y = \mathbf{x}^T \mathbf{w}$.
26 Likelihood using linear algebra. Stack the $\mathbf{x}^T$ into a matrix of data points, $X$. Representation as vectors: $R_{emp}(w) = \frac{1}{2N}\sum_i (t_i - w_1 x_i - w_0)^2 = \frac{1}{2N}\left\| \begin{bmatrix} t_0 \\ t_1 \\ \vdots \end{bmatrix} - \begin{bmatrix} 1 & x_0 \\ 1 & x_1 \\ \vdots & \vdots \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} \right\|^2 = \frac{1}{2N}\|\mathbf{t} - X\mathbf{w}\|^2$. Stack the data into a matrix and use the norm operation to handle the sum.
27 Likelihood in multiple dimensions. This representation of risk has no inherent dimensionality: $R_{emp}(w) = \frac{1}{2N}\|\mathbf{t} - X\mathbf{w}\|^2$, and $\nabla_w R_{emp}(w) = 0$ becomes $\nabla_w \frac{1}{2N}\|\mathbf{t} - X\mathbf{w}\|^2 = 0$.
28 Maximum Likelihood Estimation redux.
$\nabla_w R_{emp}(w) = 0$
$\nabla_w \frac{1}{2N}\|\mathbf{t} - X\mathbf{w}\|^2 = 0$
Decompose the norm: $\frac{1}{2N}\nabla_w (\mathbf{t} - X\mathbf{w})^T(\mathbf{t} - X\mathbf{w}) = 0$
FOIL, linear algebra style: $\frac{1}{2N}\nabla_w (\mathbf{t}^T\mathbf{t} - \mathbf{t}^T X\mathbf{w} - \mathbf{w}^T X^T\mathbf{t} + \mathbf{w}^T X^T X\mathbf{w}) = 0$
Differentiate: $\frac{1}{2N}(-X^T\mathbf{t} - X^T\mathbf{t} + 2X^T X\mathbf{w}) = 0$
Combine terms: $\frac{1}{2N}(-2X^T\mathbf{t} + 2X^T X\mathbf{w}) = 0$
Isolate $w$: $X^T X\mathbf{w} = X^T\mathbf{t}$, so $\mathbf{w} = (X^T X)^{-1} X^T\mathbf{t}$.
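A minimal sketch of the normal equations in NumPy (added; the toy data is illustrative). Solving the linear system is preferred to forming the inverse explicitly, and np.linalg.lstsq is more stable still for ill-conditioned X:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([0.1, 1.1, 1.9, 3.2])

X = np.column_stack([np.ones_like(x), x])  # each row is [1, x_i]
w = np.linalg.solve(X.T @ X, X.T @ t)      # solves (X^T X) w = X^T t
print(w)  # matches the 1-D closed form above, approximately [0.06, 1.01]
```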
29 Extension to polynomial regression.
30 Extension to polynomial regression. Compare $y = c_0 + c_1 x_1 + c_2 x_2$ with $y = c_0 + c_1 x + c_2 x^2$: polynomial regression is the same as linear regression in $D$ dimensions.
31 Generate new features. Standard polynomial with coefficients $w$: $y(x, w) = \sum_{d=1}^{D} w_d x^d + w_0$. Risk: $\frac{1}{2}\left\| \begin{bmatrix} t_0 \\ t_1 \\ \vdots \\ t_n \end{bmatrix} - \begin{bmatrix} 1 & x_0 & \ldots & x_0^p \\ 1 & x_1 & \ldots & x_1^p \\ \vdots & & & \vdots \\ 1 & x_n & \ldots & x_n^p \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_p \end{bmatrix} \right\|^2$.
32 Generate new features. Feature trick: to fit a $D$-dimensional polynomial, create a $D$-element vector from $x_i$: $\mathbf{x}_i = [x_i^0, x_i^1, \ldots, x_i^P]^T$. Then apply standard linear regression in $D$ dimensions.
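A sketch of the feature trick (added; the sinusoidal toy data is an assumption for demonstration): expand scalar $x$ into powers $x^0, \ldots, x^P$, then reuse ordinary linear regression.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

P = 3
X = np.vander(x, P + 1, increasing=True)  # columns: x^0, x^1, ..., x^P
w = np.linalg.solve(X.T @ X, X.T @ t)     # same normal equations as before
print(w)
```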
33 How is this still linear regression? The regression is linear in the parameters, despite projecting $x_i$ from one dimension to $D$ dimensions. We now fit a plane (or hyperplane) to a representation of $x_i$ in a higher-dimensional feature space. This generalizes to any set of functions $\phi_i: \mathbb{R} \to \mathbb{R}$: $\mathbf{x}_i = [\phi_0(x_i), \phi_1(x_i), \ldots, \phi_P(x_i)]^T$.
34 Basis functions as feature extraction. These functions $\phi_i: \mathbb{R} \to \mathbb{R}$ are called basis functions; they define the bases of the feature space and allow a linear decomposition of any type of function to fit the data points. Common choices: polynomial, Gaussian, sigmoids, wave functions (sine, etc.).
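As one concrete choice (added illustration; the centers and width are arbitrary values, not from the slides), Gaussian basis functions plug straight into the same machinery:

```python
import numpy as np

def gaussian_features(x, centers, s=0.2):
    # phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a constant bias column
    phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * s ** 2))
    return np.column_stack([np.ones_like(x), phi])

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 25)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(25)

Phi = gaussian_features(x, centers=np.linspace(0, 1, 9))
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)  # same linear regression machinery
```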
35 Training data vs. Testing Data. Evaluating the performance of a classifier on training data is meaningless: with enough parameters, a model can simply memorize (encode) every training point. To evaluate performance, data is divided into training and testing (or evaluation) sets. Training data is used to learn model parameters; testing data is used to evaluate performance.
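A minimal sketch of such a split (added; the 80/20 ratio and toy data are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x_all = np.linspace(0, 1, 50)
t_all = np.sin(2 * np.pi * x_all) + 0.1 * rng.standard_normal(50)

idx = rng.permutation(len(x_all))  # shuffle before splitting
n_train = int(0.8 * len(x_all))
x_train, t_train = x_all[idx[:n_train]], t_all[idx[:n_train]]
x_test, t_test = x_all[idx[n_train:]], t_all[idx[n_train:]]
# Fit on (x_train, t_train); report error only on (x_test, t_test).
```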
36 Overfitting. [figure]
37 Overfitting. [figure]
38 Overfitting performance. [figure]
39 Definition of overfitting. Overfitting is when the model describes the noise rather than the signal. How can you tell the difference between overfitting and a bad model?
40 Possible detection of overfitting. Stability: an appropriately fit model is stable under different samples of the training data; an overfit model generates inconsistent performance. Performance: a good model has low test error; a bad model has high test error.
41 What is the optimal model size? The best model size is the one that generalizes best to unseen data. Approximate this by testing error: one way to optimize parameters is to minimize testing error. This uses testing data as tuning or development data, sacrificing training data in favor of parameter optimization. Can we do this without explicit evaluation data?
42 Context for linear regression. A simple approach; efficient learning; extensible; regularization provides robust models.
43 Break. Coffee. Stretch.
44 Linear Regression. Identify the best parameters $w$ for a regression function $y = w_0 + \sum_{i=1}^{D} w_i x_i$, with $\mathbf{w} = (X^T X)^{-1} X^T \mathbf{t}$.
45 Overfitting. Recall: overfitting happens when a model captures idiosyncrasies of the data rather than generalities. It is often caused by too many parameters relative to the amount of training data; e.g. an order-N polynomial can intersect any N+1 data points.
46 Dealing with Overfitting. Use more data; use a tuning set; regularization; be a Bayesian.
47 Regularization. In a linear regression model, overfitting is characterized by large weights. [table: fitted coefficient values $w$ for polynomial orders M = 0, 1, 3, 9]
48 Penalize large weights. Introduce a penalty term in the loss function. Unregularized: $E(w) = \frac{1}{2}\sum_{n}(t_n - y(x_n, w))^2$. Regularized regression (L2-regularization or ridge regression): $E(w) = \frac{1}{2}\sum_{n}(t_n - y(x_n, w))^2 + \frac{\lambda}{2}\|w\|^2$.
49 Regularization Derivation. $\nabla_w E(w) = 0$; $\nabla_w \left[\frac{1}{2}\sum_i (y(x_i, w) - t_i)^2 + \frac{\lambda}{2}\|w\|^2\right] = 0$; $\nabla_w \left[\frac{1}{2}\|\mathbf{t} - X\mathbf{w}\|^2 + \frac{\lambda}{2}\|w\|^2\right] = 0$; $\nabla_w \left[\frac{1}{2}(\mathbf{t} - X\mathbf{w})^T(\mathbf{t} - X\mathbf{w}) + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w}\right] = 0$.
50 Continuing: from $\nabla_w \left[\frac{1}{2}(\mathbf{t} - X\mathbf{w})^T(\mathbf{t} - X\mathbf{w}) + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w}\right] = 0$ we get $-X^T\mathbf{t} + X^T X\mathbf{w} + \lambda\mathbf{w} = 0$, i.e. $-X^T\mathbf{t} + (X^T X + \lambda I)\mathbf{w} = 0$, so $(X^T X + \lambda I)\mathbf{w} = X^T\mathbf{t}$ and $\mathbf{w} = (X^T X + \lambda I)^{-1} X^T\mathbf{t}$.
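A sketch of this ridge closed form (added; the deliberately overparameterized toy polynomial is illustrative):

```python
import numpy as np

def ridge_fit(X, t, lam):
    # w = (X^T X + lambda*I)^(-1) X^T t, via a linear solve
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ t)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)
X = np.vander(x, 10, increasing=True)  # 10 parameters for 10 points

print(ridge_fit(X, t, lam=1e-3))  # weights stay far smaller than the lam=0 fit
```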
51 Regularization in Practice. [figure]
52 Regularization Results. [figure]
53 More regularization. The penalty term defines the style of regularization. L2-regularization: $E(w) = \frac{1}{2}\sum_n (t_n - y(x_n, w))^2 + \frac{\lambda}{2}\|w\|^2$. L1-regularization: $E(w) = \frac{1}{2}\sum_n (t_n - y(x_n, w))^2 + \lambda\|w\|_1$. L0-regularization: $E(w) = \frac{1}{2}\sum_n (t_n - y(x_n, w))^2 + \lambda\sum_n \delta(w_n \neq 0)$. The L0-norm selects the optimal subset of features.
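For concreteness (an added illustration, not from the slides), the three penalty terms for a given weight vector:

```python
import numpy as np

w = np.array([0.0, 2.5, -0.3, 0.0, 1.2])
l2 = 0.5 * np.sum(w ** 2)    # (1/2)||w||^2: shrinks weights smoothly
l1 = np.sum(np.abs(w))       # ||w||_1: tends to drive weights exactly to zero
l0 = np.count_nonzero(w)     # #{w_n != 0}: explicit subset selection
print(l2, l1, l0)            # 3.89 4.0 3
```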
54 Curse of dimensionality. Increasing the dimensionality of the features increases the data requirements exponentially. For example, if a single feature can be accurately approximated with 100 data points, optimizing the joint over two features requires 100 * 100 data points. Models should be small relative to the amount of available data. Dimensionality reduction techniques (feature selection) can help: L0-regularization is explicit feature selection; L1- and L2-regularization approximate feature selection.
55 Bayesians vs. Frequentists. What is a probability?
Frequentists: a probability is the likelihood that an event will happen; it is approximated by the ratio of the number of observed events to the number of total events. Assessment is vital to selecting a model. Point estimates are absolutely fine.
Bayesians: a probability is a degree of believability of a proposition. Bayesians require that probabilities be prior beliefs conditioned on data. The Bayesian approach is optimal, given a good model, a good prior, and a good loss function. Don't worry so much about assessment; if you are ever making a point estimate, you've made a mistake. The only valid probabilities are posteriors based on evidence given some prior.
56 Bayesian Linear Regression. The previous MLE derivation of linear regression uses point estimates for the weight vector $w$. Bayesians say: hold it right there. Use a prior distribution over $w$ to estimate parameters: $p(w|\alpha) = \mathcal{N}(w; 0, \alpha^{-1}I) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left(-\frac{\alpha}{2}w^T w\right)$. Alpha is a hyperparameter over $w$: the precision, or inverse variance, of the distribution. Now optimize: $p(w|\mathbf{x}, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t}|\mathbf{x}, w, \beta)\, p(w|\alpha)$.
57 Optimize the Bayesian posterior. $p(w|\mathbf{x}, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t}|\mathbf{x}, w, \beta)\, p(w|\alpha)$. As usual, it's easier to optimize after a log transform: $\ln p(\mathbf{t}|\mathbf{x}, w, \beta) + \ln p(w|\alpha)$. With $p(\mathbf{t}|\mathbf{x}, w, \beta) = \prod_n \sqrt{\frac{\beta}{2\pi}}\exp\left(-\frac{\beta}{2}(t_n - y(x_n, w))^2\right)$, we get $\ln p(\mathbf{t}|\mathbf{x}, w, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln 2\pi - \frac{\beta}{2}\sum_n (t_n - y(x_n, w))^2$.
58 Optimize the Bayesian posterior. Likewise for the prior: with $p(w|\alpha) = \mathcal{N}(w; 0, \alpha^{-1}I) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left(-\frac{\alpha}{2}w^T w\right)$, we get $\ln p(w|\alpha) = \frac{M+1}{2}\ln\alpha - \frac{M+1}{2}\ln 2\pi - \frac{\alpha}{2}w^T w$.
59 Optimize the Bayesian posterior. Combining $\ln p(\mathbf{t}|\mathbf{x}, w, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln 2\pi - \frac{\beta}{2}\sum_n (t_n - y(x_n, w))^2$ and $\ln p(w|\alpha) = \frac{M+1}{2}\ln\alpha - \frac{M+1}{2}\ln 2\pi - \frac{\alpha}{2}w^T w$, and ignoring terms that do not depend on $w$: $\ln p(\mathbf{t}|\mathbf{x}, w, \beta) + \ln p(w|\alpha) \propto -\frac{\beta}{2}\sum_n (t_n - y(x_n, w))^2 - \frac{\alpha}{2}w^T w$. Maximizing this is an IDENTICAL formulation to L2-regularization.
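A numerical check of this identity (added sketch with illustrative data): the MAP weights under the Gaussian prior coincide with ridge regression when $\lambda = \alpha/\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)
X = np.vander(x, 4, increasing=True)

alpha, beta = 0.5, 25.0
# Minimizing (beta/2)||t - Xw||^2 + (alpha/2) w^T w gives the MAP estimate:
w_map = np.linalg.solve(beta * X.T @ X + alpha * np.eye(4), beta * X.T @ t)
w_ridge = np.linalg.solve(X.T @ X + (alpha / beta) * np.eye(4), X.T @ t)
print(np.allclose(w_map, w_ridge))  # True
```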
60 Context. Overfitting is bad. Bayesians vs. Frequentists: is one better? Machine learning uses techniques from both camps.
61 Next Time. Logistic Regression. Read Chapter 4.1,