Machine Learning Linear Models
|
|
- Lynette Norton
- 6 years ago
- Views:
Transcription
1 Machine Learning Linear Models
2 Outline II - Linear Models 1. Linear Regression (a) Linear regression: History (b) Linear regression with Least Squares (c) Matrix representation and Normal Equation Method (d) Iterative Method: Gradient descent (e) Pros and Cons of both methods 2. Linear Classification: Perceptron (a) Definition and history (b) Example (c) Algorithm
3 Supervised Learning Training data: examples x with labels y. (x 1,y 1 ),...,(x n,y n ) /x i 2 R d Regression: y is a real value, y 2 R f : R d! R f is called a regressor. Classification: y is discrete. To simplify, y 2{ 1, +1} f : R d! { 1, +1} f is called a binary classifier.
4 : History A very popular technique. Rooted in Statistics. Method of Least Squares used as early as 1795 by Gauss. Re-invented in 1805 by Legendre. Frequently applied in astronomy to study the large scale of the universe. Still a very useful tool today. Carl Friedrich Gauss
5 Given: Training data: (x 1,y 1 ),...,(x n,y n ) /x i 2 R d and y i 2 R
6 Given: Training data: (x 1,y 1 ),...,(x n,y n ) /x i 2 R d and y i 2 R example x 1! x 11 x x 1d y 1 label example x i! x i1 x i2... x id y i label example x n! x n1 x n2... x nd y n label
7 Given: Training data: (x 1,y 1 ),...,(x n,y n ) /x i 2 R d and y i 2 R example x 1! x 11 x x 1d y 1 label example x i! x i1 x i2... x id y i label example x n! x n1 x n2... x nd y n label Task: Learn a regression function: f : R d! R f(x) =y
8 Given: Training data: (x 1,y 1 ),...,(x n,y n ) /x i 2 R d and y i 2 R example x 1! x 11 x x 1d y 1 label example x i! x i1 x i2... x id y i label example x n! x n1 x n2... x nd y n label Task: Learn a regression function: f : R d! R f(x) =y Linear Regression: A regression model is said to be linear if it is represented by a linear function.
9 Y #" X 2!" X 1 d = 1, line in R 2 d = 2, hyperplane is R 3 Credit: Introduction to Statistical Learning.
10 Linear Regression Model: f(x) = 0 + dx j=1 jx j with j 2 R, j 2{1,...,d} s are called parameters or coe cients or weights.
11 Linear Regression Model: f(x) = 0 + dx j=1 jx j with j 2 R, j 2{1,...,d} s are called parameters or coe cients or weights. Learning the linear model! learning the 0 s
12 Linear Regression Model: f(x) = 0 + dx j=1 jx j with j 2 R, j 2{1,...,d} s are called parameters or coe cients or weights. Learning the linear model! learning the 0 s Estimation with Least squares: Use least square loss: `oss(y i,f(x i )) = (y i f(x i )) 2
13 Linear Regression Model: f(x) = 0 + dx j=1 jx j with j 2 R, j 2{1,...,d} s are called parameters or coe cients or weights. Learning the linear model! learning the 0 s Estimation with Least squares: Use least square loss: `oss(y i,f(x i )) = (y i f(x i )) 2 We want to minimize the loss over all examples, that is minimize the risk or cost function R: R = 1 2n (y i f(x i )) 2
14 Asimplecasewithonefeature(d =1): f(x) = x
15 Asimplecasewithonefeature(d =1): We want to minimize: f(x) = x R = 1 2n (y i f(x i )) 2
16 Asimplecasewithonefeature(d =1): We want to minimize: f(x) = x R = 1 2n R( )= 1 2n (y i f(x i )) 2 (y i 0 1 x i ) 2
17 Asimplecasewithonefeature(d =1): We want to minimize: f(x) = x R = 1 2n R( )= 1 2n Find 0 and 1 that minimize: R( )= 1 2n (y i f(x i )) 2 (y i 0 1 x i ) 2 (y i 0 1 x i ) 2
18 β RSS β 0 β β 0 Credit: Introduction to Statistical Learning.
19 Find 0 and 1 so that: argmin ( 1 2n (y i 0 1 x i ) 2 )
20 Find 0 and 1 so that: Minimize: R( 0, argmin ( 1 2n 1), that is: (y i 0 1 x i ) =0
21 Find 0 and 1 so that: Minimize: R( 0, =2 0 2n argmin ( 1 2n 1), that is: (y i 0 1 x i ) =0 (y i 0 1 x i (y i 0 1 x i 0
22 Find 0 and 1 so that: Minimize: R( 0, =2 0 2n argmin ( 1 2n 1), that is: (y i 0 1 x i ) =0 (y i 0 1 x i (y i 0 1 x i 0 = 1 n (y i 0 1 x i ) ( 1) = 0
23 Find 0 and 1 so that: Minimize: R( 0, =2 0 2n argmin ( 1 2n 1), that is: (y i 0 1 x i ) =0 (y i 0 1 x i (y i 0 1 x i 0 = 1 n (y i 0 1 x i ) ( 1) = 0 0 = 1 n y i 1 1 n x i
24 =2 1 2n (y i 0 1 x i (y i 0 1 x i 1
25 =2 1 2n (y i 0 1 x i (y i 0 1 x i 1 = 1 n (y i 0 1 x i ) ( x i )=0
26 =2 1 2n (y i 0 1 x i (y i 0 1 x i 1 = 1 n (y i 0 1 x i ) ( x i )=0 1 x i 2 = x i y i n X 0x i
27 =2 1 2n (y i 0 1 x i (y i 0 1 x i 1 = 1 n (y i 0 1 x i ) ( x i )=0 Plugging 0 in 1 : 1 x i 2 = x i y i n X 0x i 1 = P n y i x i P n x 2 i 1 n P n y i P n x i 1 n P n x i P xi
28 With more than one feature: Find the f(x) = 0 + j that minimize: dx j=1 jx j R = 1 2n (y i 0 d X j=1 jx ij )) 2 Let s write it more elegantly with matrices!
29 Matrix representation Let X be an n (d + 1) matrix where each row starts with a 1 followed by a feature vector. Let y be the label vector of the training set. Let be the vector of weights (that we want to estimate!). X := 0 B B B B B 1 x 11 x 1j x 1d x i1 x ij x id x n1 x nj x nd 1 C C C C C C A y := 0 B B B B B y 1. y i. y n 1 C C C C C C A := 0 B B B B B 0. j. d 1 C C C C C C A
30 Normal Equation We want to find (d + 1) s that minimize R. We write R: We have that: R( )= 1 (y X ) 2 2n R( )= 1 2n (y X )T (y X = 1 n XT (y X 2 = 1 n XT X is positive definite which ensures that X T (y X )=0 is a minimum. We solve: The unique solution is: =(X T X) 1 X T y
31 Gradient descent!"#$%&'() ' 4.5' +,-./0$'12,3$' *$%&'( '
32 Gradient descent Gradient Descent is an optimization method. Repeat until convergence: Update simultaneously all j for (j = 0 and j = 1) 0 0 R( 0, 1) 1 1 R( 0, 1)
33 Gradient descent Gradient Descent is an optimization method. Repeat until convergence: Update simultaneously all j for (j = 0 and j = 1) 0 0 R( 0, 1) 1 1 R( 0, 1) is a learning rate.
34 Gradient descent In the linear 0 = 1 n (y i 0 1 x i ) ( 1) = 0 Let s generalize 1 = 1 n (y i 0 1 x i ) ( x i )
35 Gradient descent In the linear 0 = 1 n (y i 0 1 x i ) ( 1) = 1 = 1 n (y i 0 1 x i ) ( x i ) Repeat until convergence: Update simultaneously all j for (j = 0 and j = 1) 0 := 0 1 n ( x i y i ) 1 := 1 1 n ( x i y i )(x i )
36 Pros and Cons Analytical approach: Normal Equation + No need to specify a convergence rate or iterate. - Works only if X T X is invertible - Very slow if d is large O(d 3 ) to compute (X T X) 1 Iterative approach: Gradient Descent + E ective and e cient even in high dimensions. - Iterative (sometimes need many iterations to converge). - Needs to choose the rate.
37 Practical considerations 1. Scaling: Bring your features to a similar scale.
38 Practical considerations 1. Scaling: Bring your features to a similar scale. x i := x i µ i stdev(x i )
39 Practical considerations 1. Scaling: Bring your features to a similar scale. x i := x i µ i stdev(x i ) 2. Learning rate: Don t use a rate that is too small or too large.
40 Practical considerations 1. Scaling: Bring your features to a similar scale. x i := x i µ i stdev(x i ) 2. Learning rate: Don t use a rate that is too small or too large. 3. Rshoulddecreaseafter each iteration.
41 Practical considerations 1. Scaling: Bring your features to a similar scale. x i := x i µ i stdev(x i ) 2. Learning rate: Don t use a rate that is too small or too large. 3. Rshoulddecreaseafter each iteration. 4. Declare convergence if it start decreasing by less
42 Practical considerations 1. Scaling: Bring your features to a similar scale. x i := x i µ i stdev(x i ) 2. Learning rate: Don t use a rate that is too small or too large. 3. Rshoulddecreaseafter each iteration. 4. Declare convergence if it start decreasing by less 5. If X T X is not invertible?
43 Practical considerations 1. Scaling: Bring your features to a similar scale. x i := x i µ i stdev(x i ) 2. Learning rate: Don t use a rate that is too small or too large. 3. Rshoulddecreaseafter each iteration. 4. Declare convergence if it start decreasing by less 5. If X T X is not invertible? (a) Too many features as compared to the number of examples (e.g., 50 examples and 500 features) (b) Features linearly dependent: e.g., weight in pounds and in kilo.
44 Credit The elements of statistical learning. Data mining, inference, and prediction. 10th Edition T. Hastie, R. Tibshirani, J. Friedman. Machine Learning Tom Mitchell.
Recap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationWe choose parameter values that will minimize the difference between the model outputs & the true function values.
CSE 4502/5717 Big Data Analytics Lecture #16, 4/2/2018 with Dr Sanguthevar Rajasekaran Notes from Yenhsiang Lai Machine learning is the task of inferring a function, eg, f : R " R This inference has to
More informationKernelized Perceptron Support Vector Machines
Kernelized Perceptron Support Vector Machines Emily Fox University of Washington February 13, 2017 What is the perceptron optimizing? 1 The perceptron algorithm [Rosenblatt 58, 62] Classification setting:
More informationLinear Classifiers. Michael Collins. January 18, 2012
Linear Classifiers Michael Collins January 18, 2012 Today s Lecture Binary classification problems Linear classifiers The perceptron algorithm Classification Problems: An Example Goal: build a system that
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 3: Regression I slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 48 In a nutshell
More informationClassification with Perceptrons. Reading:
Classification with Perceptrons Reading: Chapters 1-3 of Michael Nielsen's online book on neural networks covers the basics of perceptrons and multilayer neural networks We will cover material in Chapters
More informationBinary Classification / Perceptron
Binary Classification / Perceptron Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Supervised Learning Input: x 1, y 1,, (x n, y n ) x i is the i th data
More informationGradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Gradient descent Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Gradient descent First consider unconstrained minimization of f : R n R, convex and differentiable. We want to solve
More informationRegression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011!
Regression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011! 1 Todayʼs topics" Learning from Examples: brief review! Univariate Linear Regression! Batch gradient descent! Stochastic gradient
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationMLCC 2017 Regularization Networks I: Linear Models
MLCC 2017 Regularization Networks I: Linear Models Lorenzo Rosasco UNIGE-MIT-IIT June 27, 2017 About this class We introduce a class of learning algorithms based on Tikhonov regularization We study computational
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationLinear Discrimination Functions
Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach
More informationNeural Networks DWML, /25
DWML, 2007 /25 Neural networks: Biological and artificial Consider humans: Neuron switching time 0.00 second Number of neurons 0 0 Connections per neuron 0 4-0 5 Scene recognition time 0. sec 00 inference
More informationGradient Descent. Ryan Tibshirani Convex Optimization /36-725
Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like
More informationIntroduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,
1 Introduction to Machine Learning Regression Computer Science, Tel-Aviv University, 2013-14 Classification Input: X Real valued, vectors over real. Discrete values (0,1,2,...) Other structures (e.g.,
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationLecture 4 Logistic Regression
Lecture 4 Logistic Regression Dr.Ammar Mohammed Normal Equation Hypothesis hθ(x)=θ0 x0+ θ x+ θ2 x2 +... + θd xd Normal Equation is a method to find the values of θ operations x0 x x2.. xd y x x2... xd
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationLinear Regression 1 / 25. Karl Stratos. June 18, 2018
Linear Regression Karl Stratos June 18, 2018 1 / 25 The Regression Problem Problem. Find a desired input-output mapping f : X R where the output is a real value. x = = y = 0.1 How much should I turn my
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More informationClassification. The goal: map from input X to a label Y. Y has a discrete set of possible values. We focused on binary Y (values 0 or 1).
Regression and PCA Classification The goal: map from input X to a label Y. Y has a discrete set of possible values We focused on binary Y (values 0 or 1). But we also discussed larger number of classes
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationSupervised Learning. George Konidaris
Supervised Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom Mitchell,
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationMore about the Perceptron
More about the Perceptron CMSC 422 MARINE CARPUAT marine@cs.umd.edu Credit: figures by Piyush Rai and Hal Daume III Recap: Perceptron for binary classification Classifier = hyperplane that separates positive
More informationLinear Regression. S. Sumitra
Linear Regression S Sumitra Notations: x i : ith data point; x T : transpose of x; x ij : ith data point s jth attribute Let {(x 1, y 1 ), (x, y )(x N, y N )} be the given data, x i D and y i Y Here D
More informationMachine Learning (CS 567) Lecture 3
Machine Learning (CS 567) Lecture 3 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationNeural Networks. Advanced data-mining. Yongdai Kim. Department of Statistics, Seoul National University, South Korea
Neural Networks Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea What is Neural Networks? One of supervised learning method using one or more hidden layer.
More informationGRADIENT DESCENT AND LOCAL MINIMA
GRADIENT DESCENT AND LOCAL MINIMA 25 20 5 15 10 3 2 1 1 2 5 2 2 4 5 5 10 Suppose for both functions above, gradient descent is started at the point marked red. It will walk downhill as far as possible,
More informationSingle layer NN. Neuron Model
Single layer NN We consider the simple architecture consisting of just one neuron. Generalization to a single layer with more neurons as illustrated below is easy because: M M The output units are independent
More informationFA Homework 2 Recitation 2
FA17 10-701 Homework 2 Recitation 2 Logan Brooks,Matthew Oresky,Guoquan Zhao October 2, 2017 Logan Brooks,Matthew Oresky,Guoquan Zhao FA17 10-701 Homework 2 Recitation 2 October 2, 2017 1 / 15 Outline
More informationLinear Regression. Aarti Singh. Machine Learning / Sept 27, 2010
Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X
More informationNeural Networks: Backpropagation
Neural Networks: Backpropagation Seung-Hoon Na 1 1 Department of Computer Science Chonbuk National University 2018.10.25 eung-hoon Na (Chonbuk National University) Neural Networks: Backpropagation 2018.10.25
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationLINEAR REGRESSION, RIDGE, LASSO, SVR
LINEAR REGRESSION, RIDGE, LASSO, SVR Supervised Learning Katerina Tzompanaki Linear regression one feature* Price (y) What is the estimated price of a new house of area 30 m 2? 30 Area (x) *Also called
More informationCSE 151 Machine Learning. Instructor: Kamalika Chaudhuri
CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Linear Classification Given labeled data: (xi, feature vector yi) label i=1,..,n where y is 1 or 1, find a hyperplane to separate from Linear Classification
More informationLogistic Regression. William Cohen
Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x))
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard and Mitch Marcus (and lots original slides by
More informationSupport Vector Machines for Classification: A Statistical Portrait
Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationLogistic Regression. Some slides adapted from Dan Jurfasky and Brendan O Connor
Logistic Regression Some slides adapted from Dan Jurfasky and Brendan O Connor Naïve Bayes Recap Bag of words (order independent) Features are assumed independent given class P (x 1,...,x n c) =P (x 1
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationLinear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.
Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 24) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu October 2, 24 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 24) October 2, 24 / 24 Outline Review
More informationNeural Networks. Haiming Zhou. Division of Statistics Northern Illinois University.
Neural Networks Haiming Zhou Division of Statistics Northern Illinois University zhouh@niu.edu Neural Networks The term neural network has evolved to encompass a large class of models and learning methods.
More informationCS260: Machine Learning Algorithms
CS260: Machine Learning Algorithms Lecture 4: Stochastic Gradient Descent Cho-Jui Hsieh UCLA Jan 16, 2019 Large-scale Problems Machine learning: usually minimizing the training loss min w { 1 N min w {
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationCE213 Artificial Intelligence Lecture 14
CE213 Artificial Intelligence Lecture 14 Neural Networks: Part 2 Learning Rules -Hebb Rule - Perceptron Rule -Delta Rule Neural Networks Using Linear Units [ Difficulty warning: equations! ] 1 Learning
More informationThe Perceptron Algorithm 1
CS 64: Machine Learning Spring 5 College of Computer and Information Science Northeastern University Lecture 5 March, 6 Instructor: Bilal Ahmed Scribe: Bilal Ahmed & Virgil Pavlu Introduction The Perceptron
More informationLecture 6. Notes on Linear Algebra. Perceptron
Lecture 6. Notes on Linear Algebra. Perceptron COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Notes on linear algebra Vectors
More informationOptimization and Gradient Descent
Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function
More information18.6 Regression and Classification with Linear Models
18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationAN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009
AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is
More informationTutorial on Machine Learning for Advanced Electronics
Tutorial on Machine Learning for Advanced Electronics Maxim Raginsky March 2017 Part I (Some) Theory and Principles Machine Learning: estimation of dependencies from empirical data (V. Vapnik) enabling
More informationLearning Decision Trees
Learning Decision Trees Machine Learning Fall 2018 Some slides from Tom Mitchell, Dan Roth and others 1 Key issues in machine learning Modeling How to formulate your problem as a machine learning problem?
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationError Functions & Linear Regression (2)
Error Functions & Linear Regression (2) John Kelleher & Brian Mac Namee Machine Learning @ DIT Overview 1 Introduction Overview 2 Linear Classifiers Threshold Function Perceptron Learning Rule Training/Learning
More information18.9 SUPPORT VECTOR MACHINES
744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the
More informationPracticals 5 : Perceptron
Université Paul Sabatier M2 SE Data Mining Practicals 5 : Perceptron Framework The aim of this last session is to introduce the basics of neural networks theory through the special case of the perceptron.
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More informationCS 4700: Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Fall 2017 Instructor: Prof. Haym Hirsh Lecture 18 Prelim Grade Distribution Homework 3: Out Today Extra Credit Opportunity: 4:15pm Today, Gates G01 Relaxing
More informationMachine Learning. A. Supervised Learning A.1. Linear Regression. Lars Schmidt-Thieme
Machine Learning A. Supervised Learning A.1. Linear Regression Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany
More informationNeural Networks (Part 1) Goals for the lecture
Neural Networks (Part ) Mark Craven and David Page Computer Sciences 760 Spring 208 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed
More informationLogic and machine learning review. CS 540 Yingyu Liang
Logic and machine learning review CS 540 Yingyu Liang Propositional logic Logic If the rules of the world are presented formally, then a decision maker can use logical reasoning to make rational decisions.
More informationConvex Optimization / Homework 1, due September 19
Convex Optimization 1-725/36-725 Homework 1, due September 19 Instructions: You must complete Problems 1 3 and either Problem 4 or Problem 5 (your choice between the two). When you submit the homework,
More informationThe Perceptron Algorithm
The Perceptron Algorithm Greg Grudic Greg Grudic Machine Learning Questions? Greg Grudic Machine Learning 2 Binary Classification A binary classifier is a mapping from a set of d inputs to a single output
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.
More informationLinear Regression with mul2ple variables. Mul2ple features. Machine Learning
Linear Regression with mul2ple variables Mul2ple features Machine Learning Mul4ple features (variables). Size (feet 2 ) Price ($1000) 2104 460 1416 232 1534 315 852 178 Mul4ple features (variables). Size
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationMulticlass Classification-1
CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass
More informationSecurity Analytics. Topic 6: Perceptron and Support Vector Machine
Security Analytics Topic 6: Perceptron and Support Vector Machine Purdue University Prof. Ninghui Li Based on slides by Prof. Jenifer Neville and Chris Clifton Readings Principle of Data Mining Chapter
More informationPolyhedral Computation. Linear Classifiers & the SVM
Polyhedral Computation Linear Classifiers & the SVM mcuturi@i.kyoto-u.ac.jp Nov 26 2010 1 Statistical Inference Statistical: useful to study random systems... Mutations, environmental changes etc. life
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining Linear Classifiers: predictions Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due Friday of next
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationThe Perceptron. Volker Tresp Summer 2016
The Perceptron Volker Tresp Summer 2016 1 Elements in Learning Tasks Collection, cleaning and preprocessing of training data Definition of a class of learning models. Often defined by the free model parameters
More informationNeural networks and support vector machines
Neural netorks and support vector machines Perceptron Input x 1 Weights 1 x 2 x 3... x D 2 3 D Output: sgn( x + b) Can incorporate bias as component of the eight vector by alays including a feature ith
More informationMachine Learning and Data Mining. Linear classification. Kalev Kask
Machine Learning and Data Mining Linear classification Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ = f(x ; q) Parameters q Program ( Learner ) Learning algorithm Change q
More informationIs the test error unbiased for these programs? 2017 Kevin Jamieson
Is the test error unbiased for these programs? 2017 Kevin Jamieson 1 Is the test error unbiased for this program? 2017 Kevin Jamieson 2 Simple Variable Selection LASSO: Sparse Regression Machine Learning
More informationFeedforward Neural Nets and Backpropagation
Feedforward Neural Nets and Backpropagation Julie Nutini University of British Columbia MLRG September 28 th, 2016 1 / 23 Supervised Learning Roadmap Supervised Learning: Assume that we are given the features
More information