Machine Learning: Linear Regression. Mustafa Jarrar, Birzeit University, 2018.


1 Mustafa Jarrar: Lecture Notes on Linear Regression, Birzeit University, 2018 (Version 1). Machine Learning: Linear Regression. Mustafa Jarrar, Birzeit University.

2 Watch this lecture and download the slides. Course Page: More online courses at: Acknowledgement: This lecture is based on (but not limited to) Andrew Ng's course about Machine Learning [1].

3 Machine Learning: Linear Regression. In this lecture:
Part 1: Motivation (Regression Problems)
Part 2: Linear Regression
Part 3: The Cost Function
Part 4: The Gradient Descent Algorithm
Part 5: The Normal Equation
Part 6: Linear Algebra Overview
Part 7: Using Octave
Part 8: Using R

4 Motivation. Given the following housing prices: [Figure: scatter plot; x-axis: Size (feet²), y-axis: Price (in 1000s of dollars)]. How much does a house of 1250 ft² cost? We may assume a linear curve, and so conclude that a house of 1250 ft² costs about $230K.

5 Motivation. Given the following housing prices: [Figure: scatter plot with a fitted line; x-axis: Size (feet²), y-axis: Price (in 1000s of dollars)]. Supervised Learning: we are given the "right answer" for each example in the data (the training data). Regression Problem: predict a real-valued output. Remember that classification (not regression) refers to predicting discrete-valued output.

6 Motivation. Given the following training set of housing prices, our job is to learn from this data how to predict prices:

Size in feet² (x) | Price ($) in 1000s (y)
2104 | 460
1416 | 232
1534 | 315
852 | 178
... | ...

Notation:
m = number of training examples
x's = input variable/features
y's = output variable/target variable
(x, y): a training example
(x^(i), y^(i)): the i-th training example
For example: x^(1) = 2104, y^(1) = 460
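
A minimal Octave sketch of this notation, assuming the four training examples listed above (variable names are illustrative); note how Octave's 1-based indexing x(i) mirrors the x^(i) notation:

x = [2104; 1416; 1534; 852]; % input variable: size in feet^2
y = [460; 232; 315; 178]; % output/target variable: price in $1000s
m = length(x); % m = 4 training examples
printf("x(1) = %d, y(1) = %d\n", x(1), y(1)); % the first training example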

7 In this lecture: Part 2: Linear Regression

8 Linear Regression. [Diagram: Training Set → Learning Algorithm → hypothesis h; Size of house (x) → h → Estimated Price (h(x))]. How do we represent h? h(x) = θ₀ + θ₁x. This is a linear function; h maps from x's to y's. This model is also called linear regression with one variable.
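
A sketch of this hypothesis as an Octave function; the parameter values below are hypothetical, chosen so that a 1250 ft² house is estimated at about $230K, as in the motivation slide:

theta0 = 105; theta1 = 0.1; % hypothetical parameter values
h = @(x) theta0 + theta1 * x; % h maps a size x to an estimated price
printf("h(1250) = %.0f\n", h(1250)); % prints 230 (in $1000s)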

9 Linear Regression with Multiple Features. Linear regression with multiple features is also called multiple linear regression. Suppose we have the following features:

Size ft² (x₁) | bedrooms (x₂) | floors (x₃) | Age (x₄) | Price in $1000s (y)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178

A hypothesis function h(x) might be:
h(x) = θ₀ + θ₁x₁ + θ₂x₂ + ... + θₙxₙ
or h(x) = θ₀x₀ + θ₁x₁ + θ₂x₂ + ... + θₙxₙ (with x₀ = 1)
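
With x₀ = 1, the hypothesis is just an inner product of the parameter vector and the feature vector. A minimal Octave sketch, with hypothetical θ values:

theta = [80; 0.1; 1; 3; -2]; % hypothetical [theta0; theta1; ...; theta4]
x = [1; 2104; 5; 1; 45]; % x0 = 1 followed by the four feature values
h = theta' * x; % the prediction is the inner product theta' * x
printf("h(x) = %.1f\n", h); % estimated price in $1000s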

10 Polynomial Regression. Suppose we have the same features as above. Another hypothesis function h(x) might be polynomial:
h(x) = θ₀x₀ + θ₁x + θ₂x² + ... + θₙxⁿ (with x₀ = 1)

11 Polynomial Regression. [Figure: three fits of Price (y) against Size (x):]
h(x) = θ₀ + θ₁x (linear)
h(x) = θ₀ + θ₁x + θ₂x² (quadratic)
h(x) = θ₀ + θ₁x + θ₂x² + θ₃x³ (cubic)
We may also combine features (e.g., size = width * depth). We have the option of which features and which models (quadratic, cubic, ...) to use. Deciding which features and models best fit our data and application is beyond the scope of this course, but there are several algorithms for this.
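
Polynomial regression can be sketched the same way as the multi-feature case: treat powers of the size as extra features, so the model stays linear in the θ's. All values below are hypothetical:

s = 1250; % house size in ft^2
x = [1; s; s^2; s^3]; % x0 = 1 plus the polynomial terms of the size
theta = [60; 0.2; -6e-5; 8e-9]; % hypothetical parameters of a cubic fit
h = theta' * x; % evaluating the cubic hypothesis
printf("h(%d) = %.1f\n", s, h); % estimated price in $1000s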

12 In this lecture: Part 3: The Cost Function

13 Understanding the θ Values. h(x) = θ₀ + θ₁x. The θᵢ's are the parameters. How do we choose the θᵢ's? [Figure: training data, x vs. y]

14 Understanding the θ Values. [Figure: three example hypotheses for different parameter values, e.g. (following Andrew Ng's course):]
θ₀ = 1.5, θ₁ = 0: h(x) = 1.5 (a horizontal line)
θ₀ = 0, θ₁ = 0.5: h(x) = 0.5x (a line through the origin)
θ₀ = 1, θ₁ = 0.5: h(x) = 1 + 0.5x

15 The Cost Function. Idea: choose θ₀, θ₁ so that h(x) is close to y for our training examples (x, y). y is the actual value; h(x) is the estimated value. The cost function J aims to find the θ₀, θ₁ that minimize the error:

J(θ₀, θ₁) = (1/(2m)) · Σ_{i=1}^{m} (h(x^(i)) − y^(i))²

This cost function is also called a Squared Error function.
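
A minimal Octave sketch of this cost function (e.g., saved as computeCost.m; x and y are column vectors, and the function name is illustrative):

function J = computeCost(x, y, theta0, theta1)
  m = length(y); % number of training examples
  h = theta0 + theta1 * x; % hypothesis value for every example
  J = (1 / (2*m)) * sum((h - y) .^ 2); % half the mean of the squared errors
end

For example, computeCost([1;2;3], [1;2;3], 0, 1) returns 0, since the line h(x) = x passes through all three points.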

16 Understanding the Cost Function. Let's try different values of the θ's (here θ₀ = 0, on the training set (1,1), (2,2), (3,3)) and see how the cost function J(θ₀, θ₁) behaves. [Figure: left, the hypotheses for (0,1), (0,0.5), (0,0); right, the curve of J(θ₁).]

J(1) = 1/(2·3) · [(0)² + (0)² + (0)²] = 0. Given our training data, this is the best value of θ₁.
J(0.5) = 1/(2·3) · [(0.5−1)² + (1−2)² + (1.5−3)²] ≈ 0.58
J(0) = 1/(2·3) · [(1)² + (2)² + (3)²] ≈ 2.33

But this figure plots only θ₁; the problem becomes more complicated if we also have θ₀.
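
These numbers can be reproduced in a few lines of Octave, assuming the same toy training set and θ₀ = 0:

x = [1; 2; 3]; y = [1; 2; 3]; m = 3; % toy training set
for theta1 = [1, 0.5, 0]
  J = (1/(2*m)) * sum((theta1 * x - y) .^ 2); % cost with theta0 fixed at 0
  printf("J(%.1f) = %.2f\n", theta1, J); % prints 0.00, 0.58, 2.33
end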

17 Cost Function Intuition II. Remember that our goal is to find the values of θ₀ and θ₁ that minimize the cost.
Hypothesis: h(x) = θ₀ + θ₁x
Parameters: θ₀, θ₁
Cost Function: J(θ₀, θ₁) = (1/(2m)) · Σ_{i=1}^{m} (h(x^(i)) − y^(i))²
Our Goal: minimize J(θ₀, θ₁)

18 Cost Function Intuition II. Given any dataset, when we draw the cost function J(θ₀, θ₁), we may get this 3D bowl-shaped surface: [Figure: 3D surface plot of J(θ₀, θ₁)]. Given a dataset, our goal is to find the (θ₀, θ₁) at which J(θ₀, θ₁) is minimal.

19 Cost Function Intuition II. We may also draw the cost function using contour figures: [Figure: contour plot of J(θ₀, θ₁)]

20 Cost Function Intuition II. [Figure: a hypothesis h(x) = θ₀ + θ₁x (left) and the corresponding point (θ₀, θ₁) on the contour plot of J (right)]

21 Cost Function Intuition II. [Figure: another hypothesis h(x) = θ₀ + θ₁x and its point (θ₀, θ₁) on the contour plot; the closer the point is to the center of the contours, the better the hypothesis fits the data]

22 Cost Function Intuition II. [Figure: a (θ₀, θ₁) near the center of the contours.] Is there any way/algorithm to find the θ's automatically? Yes, e.g., the Gradient Descent Algorithm.

23 In this lecture: Part 4: The Gradient Descent Algorithm

24 The Gradient Descent Algorithm. Start with some initial values of θ₀ and θ₁, and keep changing θ₀ and θ₁ to reduce J(θ₀, θ₁), until hopefully we end up at a minimum:

Repeat until convergence {
  temp0 := θ₀ − α · ∂/∂θ₀ J(θ₀, θ₁)
  temp1 := θ₁ − α · ∂/∂θ₁ J(θ₀, θ₁)
  θ₀ := temp0
  θ₁ := temp1
}

Here α is the learning rate and ∂/∂θⱼ J(θ₀, θ₁) is the partial derivative of the cost, for j = 0 and j = 1. The temp variables ensure that both parameters are updated simultaneously.
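
The update pattern can be sketched in Octave with a toy cost function. The finite-difference gradient below is only for illustration; for linear regression the partial derivatives are computed analytically, as on the next slides:

J = @(t0, t1) (t0 - 3)^2 + (t1 + 1)^2; % toy cost with minimum at (3, -1)
alpha = 0.1; d = 1e-6; t0 = 0; t1 = 0;
for iter = 1:200
  g0 = (J(t0+d, t1) - J(t0-d, t1)) / (2*d); % numeric dJ/dtheta0
  g1 = (J(t0, t1+d) - J(t0, t1-d)) / (2*d); % numeric dJ/dtheta1
  temp0 = t0 - alpha * g0; % compute both new values first...
  temp1 = t1 - alpha * g1;
  t0 = temp0; t1 = temp1; % ...then assign simultaneously
end
printf("theta0 = %.3f, theta1 = %.3f\n", t0, t1); % approaches (3, -1)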

25 The Problem of the Gradient Descent Algorithm.
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
Depending on where it starts, it may converge to the global minimum or only to a local minimum.

26 The Gradient Descent for Linear Regression. Simplified version for linear regression:

Repeat until convergence {
  θ₀ := θ₀ − α · (1/m) · Σ_{i=1}^{m} (h(x^(i)) − y^(i))
  θ₁ := θ₁ − α · (1/m) · Σ_{i=1}^{m} (h(x^(i)) − y^(i)) · x^(i)
}
(updating θ₀ and θ₁ simultaneously, i.e., for j = 0 and j = 1)

Next, we will use linear algebra to minimize J numerically (the so-called Normal Equation) without needing an iterative algorithm like gradient descent. However, gradient descent scales better to bigger datasets.
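
A runnable Octave sketch of these two update rules on the toy dataset (1,1), (2,2), (3,3); the learning rate and iteration count are illustrative choices:

x = [1; 2; 3]; y = [1; 2; 3]; m = length(y); % toy training set
theta0 = 0; theta1 = 0; alpha = 0.1;
for iter = 1:1000
  h = theta0 + theta1 * x; % predictions for all m examples at once
  temp0 = theta0 - alpha * (1/m) * sum(h - y);
  temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x);
  theta0 = temp0; theta1 = temp1; % simultaneous update
end
printf("h(x) = %.3f + %.3f * x\n", theta0, theta1); % converges near 0 + 1x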

27 In this lecture: Part 5: The Normal Equation

28 Minimizing!" numerically Another way to numerically (using linear algebra) estimate the optimal values of!". Given the following features: y Size ft 2 bedrooms floors Age Price Convert the features and the target value into matries: X = y=

29 Minimizing J(θ) Numerically. Add x₀ = 1 to each example, so as to represent θ₀:

X = [1 2104 5 1 45; 1 1416 3 2 40; 1 1534 3 2 30; 1 852 2 1 36]
y = [460; 232; 315; 178]

30 The Normal Equation. To obtain the optimal (cost-minimizing) values of the θ's, use the following Normal Equation:

θ = (XᵀX)⁻¹ Xᵀ y

Where:
θ: the vector of θ's we want to estimate
X: the matrix of features in the training set
y: the output values we try to predict
Xᵀ: the transpose of X
(XᵀX)⁻¹: the inverse of the matrix XᵀX

In Octave: pinv(X'*X)*X'*y
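
A quick Octave check of the equation on the toy dataset (1,1), (2,2), (3,3), where the best fit is known to be h(x) = 0 + 1·x:

X = [1 1; 1 2; 1 3]; % first column is x0 = 1, second is the feature x
y = [1; 2; 3];
theta = pinv(X' * X) * X' * y; % the Normal Equation
disp(theta') % prints approximately 0 1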

31 Gradient Descent vs. Normal Equation (m training examples, n features)
Gradient Descent:
- Needs to choose the learning rate α
- Needs many iterations
- Works well even when n is large
Normal Equation:
- No need to choose α
- No need to iterate
- Needs to compute (XᵀX)⁻¹, which takes about O(n³) time, so it is slow if n is very large
- Some matrices (e.g., singular ones) are non-invertible
Recommendation: use the Normal Equation if the number of features is less than about 1000; otherwise use Gradient Descent.

32 In this lecture: Part 6: Linear Algebra Overview

33 Matrix-Vector Multiplication. Example:

[1 3; 4 0; 2 1] × [1; 5] = [1·1 + 3·5; 4·1 + 0·5; 2·1 + 1·5] = [16; 4; 7]

34 Matrix-Matrix Multiplication. Example: [1 3 2; 4 0 1] × [1 3; 0 1; 5 2].
Multiply with the first column: [1 3 2; 4 0 1] × [1; 0; 5] = [11; 9]
Multiply with the second column: [1 3 2; 4 0 1] × [3; 1; 2] = [10; 14]
So the product is [11 10; 9 14].
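
Both products can be checked directly in Octave:

A = [1 3; 4 0; 2 1]; v = [1; 5];
disp(A * v) % matrix-vector product: 16, 4, 7
B = [1 3 2; 4 0 1]; C = [1 3; 0 1; 5 2];
disp(B * C) % matrix-matrix product: [11 10; 9 14]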

35 Matrix Inverse. If A is an m × m matrix, and if it has an inverse, then A·A⁻¹ = A⁻¹·A = I. The multiplication of a matrix with its inverse produces an identity matrix:

[3 4; 2 16] × [0.4 −0.1; −0.05 0.075] = [1 0; 0 1]

Some matrices do not have inverses (e.g., a matrix whose cells are all zeros). For more, please review Linear Algebra.
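
In Octave, inv computes the inverse, while pinv computes the pseudo-inverse and also accepts singular (non-invertible) matrices; a small sketch:

A = [3 4; 2 16];
disp(A * inv(A)) % the 2x2 identity matrix (up to rounding)
disp(pinv(zeros(2))) % singular: no inverse, but the pseudo-inverse is all zeros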

36 Matrix Transpose. Example:

A = [1 2 0; 3 5 9], Aᵀ = [1 3; 2 5; 0 9]

Let A be an m × n matrix, and let B = Aᵀ. Then B is an n × m matrix, with B_ij = A_ji.

37 In this lecture: Part 7: Using Octave

38 Octave. A high-level scientific programming language and a free alternative to (and compatible with) Matlab; it helps in solving linear and nonlinear problems numerically. Download: [link in slides]. Online tutorial: [link in slides].

39 Compute the Normal Equation using Octave. Given the X and y matrices from Part 5, in Octave:

load X.txt
load y.txt
C = pinv(X'*X)*X'*y
save thetas.txt C

The result C contains the values of the θ's we need to use in our hypothesis function:
h(x) = θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₃ + θ₄x₄
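
The same computation as a self-contained sketch, with the matrices typed in directly instead of loaded from X.txt and y.txt (values as in the table from Part 2):

X = [2104 5 1 45; 1416 3 2 40; 1534 3 2 30; 852 2 1 36]; % the four features
y = [460; 232; 315; 178]; % prices in $1000s
X = [ones(size(X,1), 1), X]; % prepend the x0 = 1 column
theta = pinv(X' * X) * X' * y; % the Normal Equation
disp(theta') % theta0 ... theta4 of the hypothesis h(x)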

40 In this lecture: Part 8: Using R

41 R (programming language). A free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. R comes with many functions that can do sophisticated analyses, and with the ability to install additional packages that do much more.
Download R: [link in slides]
A very good IDE for R is RStudio: [link in slides]
R basics tutorial: UZiDpam7ygg

42 Compute the Normal Equation using R. Given the same X and y matrices, in R:

X = as.matrix(read.table("~/x.txt", header=F, sep=","))
Y = as.matrix(read.table("~/y.txt", header=F, sep=","))
thetas = solve( t(X) %*% X ) %*% t(X) %*% Y
write.table(thetas, file="~/thetas.txt", row.names=F, col.names=F)

t(X) is the transpose of matrix X; solve(M) is the inverse of matrix M.

43 References
[1] Andrew Ng's course about Machine Learning
[2] Sami Ghawi, Mustafa Jarrar: Lecture Notes on Introduction to Machine Learning, Birzeit University, 2018
[3] Mustafa Jarrar: Lecture Notes on Decision Tree Machine Learning, Birzeit University, 2018
[4] Mustafa Jarrar: Lecture Notes on Linear Regression Machine Learning, Birzeit University, 2018
