Lecture 11 Linear regression
Advanced Algorithms, Floriano Zini, Free University of Bozen-Bolzano, Faculty of Computer Science. Lecture 11: Linear regression. These slides are taken from Andrew Ng, Machine Learning on Coursera.
Machine Learning definition

Arthur Samuel (1959). Machine Learning: field of study that gives computers the ability to learn without being explicitly programmed. The Samuel Checkers-playing Program appears to be the world's first self-learning program.

Tom Mitchell (1998). Well-posed learning problem: a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Quiz. A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. Suppose your email program watches which emails you do or do not mark as spam and, based on that, learns how to better filter spam. What is the task T in this setting?

- Classifying emails as spam or not spam (this is the task T)
- Watching you label emails as spam or not spam (this is the experience E)
- The number (or fraction) of emails correctly classified as spam/not spam (this is the performance measure P)
- None of the above: this is not a machine learning problem
Examples of ML applications (1)

Email spam filtering
T: to predict which emails are spam for a given user
P: % of emails correctly predicted
E: a set of emails identified as spam/non-spam by the user

Web page categorization
T: to categorize Web pages into predefined categories
P: % of Web pages correctly categorized
E: a set of Web pages with specified categories

Examples of ML applications (2)

Handwriting recognition
T: to recognize and classify handwritten words within images
P: % of words correctly classified
E: a database of handwritten words with given classifications (i.e., labels)

Robot driving
T: to drive on public highways using vision sensors
P: average distance traveled before an error (as judged by a human observer)
E: a sequence of images and steering commands (go straight, move left, move right, slow down, speed up) recorded while observing a human driver
Examples of ML applications (3)

Medical diagnosis
T: to suggest treatments to doctors
P: % of suggestions accepted by the doctors
E: a database of electronic health records including <observation, treatment> pairs

Movie recommendation
T: to recommend to people movies they like
P: overall satisfaction of the users
E: movie ratings given by the users of the system

Machine learning algorithms: supervised learning and unsupervised learning (e.g., clustering). In this lecture: one algorithm for supervised learning, linear regression. These topics are expanded in the courses Data Mining by Dr. Mouna Kacimi (second semester) and Information Search and Retrieval by Prof. Francesco Ricci (second semester).
Supervised learning

Housing price prediction: given a dataset of house sizes (in feet²) and prices ($ in 1000s), predict the price of a house from its size. Supervised learning means the "right answers" are given. This is a regression problem: predict a continuous-valued output (the price).

Breast cancer diagnosis: given tumor sizes labeled malignant (1/Y) or benign (0/N), predict whether a tumor of a given size is malignant. This is a classification problem: predict a discrete-valued output (0 or 1). In this learning problem we consider just one feature, i.e., the tumor size.
Supervised learning

With two features, e.g., the tumor size and the patient's age, the examples can be plotted in the feature plane and a boundary separates malignant from benign. In general, in real cases there are many more features, e.g., clump thickness, uniformity of cell size, uniformity of cell shape.

Quiz. You are running a company, and you want to develop learning algorithms to address each of two problems.
Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.
Problem 2: You would like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised (0 = not hacked, 1 = hacked).
Should you treat these as classification or as regression problems?

- Treat both as classification problems
- Treat problem 1 as a classification problem, problem 2 as a regression problem
- Treat problem 1 as a regression problem, problem 2 as a classification problem (correct: problem 1 predicts a continuous quantity, problem 2 a discrete label)
- Treat both as regression problems

Linear regression with one variable
Model representation

Housing prices (Portland, OR). Supervised learning: the "right answer" is given for each example in the data. Regression problem: predict a real-valued output.

Training set of housing prices (Portland, OR), m = 47; the first rows are:

Size in feet² (x) | Price ($) in 1000's (y)
2104 | 460
1416 | 232
1534 | 315
852  | 178

Notation:
m = number of training examples
x's = input variable / features
y's = output variable / target variable
(x, y) = one training example
(x^(i), y^(i)) = i-th training example
E.g., x^(1) = 2104, x^(2) = 1416, y^(1) = 460, (x^(3), y^(3)) = (1534, 315)
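As a minimal sketch, the training-set notation can be mirrored in plain Python (the four rows are the ones visible on the slide; the full Portland set has m = 47, and the helper name `example` is mine):

```python
# Training set (size in feet^2, price in $1000s); a subset of the m = 47 rows.
x = [2104, 1416, 1534, 852]   # input variable / feature
y = [460, 232, 315, 178]      # output variable / target variable

m = len(x)  # number of training examples in this subset

def example(i):
    """Return the i-th training example (x^(i), y^(i)), 1-based as on the slide."""
    return (x[i - 1], y[i - 1])

print(example(1))  # (2104, 460)
print(example(3))  # (1534, 315)
```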
Model representation

The learning algorithm takes the training set and produces a function h, the hypothesis. h maps from x's to y's: given the size of a house x, h(x) is the estimated price (the estimated value of y). How do we represent h? For linear regression with one variable (univariate linear regression):

h_θ(x) = θ_0 + θ_1 x    (shorthand: h(x))

Cost function

Given the training set (size in feet² (x), price ($) in 1000's (y), m = 47), the hypothesis is h_θ(x) = θ_0 + θ_1 x, where θ_0 and θ_1 are the parameters. How do we choose the θ's?
Cost function

Different choices of the parameters give different hypotheses, e.g.:
h(x) = 1.5 + 0·x (θ_0 = 1.5, θ_1 = 0)
h(x) = 0.5·x (θ_0 = 0, θ_1 = 0.5)
h(x) = 1 + 0.5·x (θ_0 = 1, θ_1 = 0.5)
Cost function

Idea: choose θ_0, θ_1 so that h_θ(x) is close to y for our training examples (x, y).

Cost function: J(θ_0, θ_1) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Goal: minimize J(θ_0, θ_1) over θ_0, θ_1. J is also called the squared error function.
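The cost function translates directly into code; a sketch in plain Python (the function names are mine):

```python
def h(theta0, theta1, x):
    """Univariate hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def J(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(xs)
    return sum((h(theta0, theta1, xi) - yi) ** 2
               for xi, yi in zip(xs, ys)) / (2 * m)

# A line that passes through every training point has zero cost.
print(J(0.0, 1.0, [1, 2, 3], [1, 2, 3]))  # 0.0
```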
Cost function intuition 1

Hypothesis: h_θ(x) = θ_0 + θ_1 x. Simplified: set θ_0 = 0, so h_θ(x) = θ_1 x and the only parameter is θ_1.

Cost function: J(θ_1) = (1/2m) · Σ_{i=1}^{m} (θ_1 x^(i) − y^(i))². Goal: minimize J(θ_1).

For fixed θ_1, h_θ(x) is a function of x; J is a function of the parameter θ_1. With the three training examples (1, 1), (2, 2), (3, 3): for θ_1 = 1 the line passes through all the points, so J(1) = 0.
Cost function intuition 1

For θ_1 = 0.5: J(0.5) = 1/(2·3) · [(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²] ≈ 0.58

For θ_1 = 0: J(0) = 1/(2·3) · [1² + 2² + 3²] = 14/6 ≈ 2.3
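The slide's numbers can be checked in a few lines of Python, using the dataset (1, 1), (2, 2), (3, 3) and the simplified hypothesis h(x) = θ_1·x:

```python
xs, ys = [1, 2, 3], [1, 2, 3]

def J(theta1):
    """Cost of the simplified hypothesis h(x) = theta1 * x (theta0 = 0)."""
    m = len(xs)
    return sum((theta1 * xi - yi) ** 2 for xi, yi in zip(xs, ys)) / (2 * m)

print(J(1.0))            # 0.0  -- the line fits all three points exactly
print(round(J(0.5), 2))  # 0.58
print(round(J(0.0), 2))  # 2.33 (= 14/6)
```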
Plotting J(θ_1) for a range of θ_1 values gives a bowl-shaped curve with its minimum at θ_1 = 1.
Cost function intuition 2

Hypothesis: h_θ(x) = θ_0 + θ_1 x. Parameters: θ_0, θ_1. Cost function: J(θ_0, θ_1) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))². Goal: minimize J(θ_0, θ_1).

With both parameters free, J(θ_0, θ_1) is a bowl-shaped surface over the (θ_0, θ_1) plane. For fixed θ_0, θ_1, h_θ(x) is a function of x (a line through the housing data, price ($) in 1000s vs. size in feet²); J is a function of the parameters.
Cost function intuition 2

The surface J(θ_0, θ_1) can also be drawn as a contour plot: each ellipse is the set of parameter pairs (θ_0, θ_1) with the same cost J(θ_0, θ_1) = K. A point far from the center of the ellipses corresponds to a line h(x) = θ_0 + θ_1·x that fits the data poorly; points closer to the center correspond to better fits.
Cost function intuition 2 showed a hypothesis close to the minimum: the corresponding point sits near the center of the contour plot.

Gradient descent

Have some function J(θ_0, θ_1); want min over θ_0, θ_1 of J(θ_0, θ_1). More in general: J(θ_0, θ_1, ..., θ_n); want min_θ J(θ_0, θ_1, ..., θ_n).

Outline:
Start with some θ_0, θ_1 (e.g., θ_0 = 0, θ_1 = 0)
Keep changing θ_0, θ_1 to reduce J(θ_0, θ_1) until we hopefully end up at a minimum
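The outline, applied to a generic one-parameter function, is only a few lines. A sketch; the quadratic J and its derivative are illustrative, not from the slides:

```python
def gradient_descent(dJ, theta, alpha=0.1, iters=100):
    """Keep changing theta to reduce J: theta := theta - alpha * dJ(theta)."""
    for _ in range(iters):
        theta = theta - alpha * dJ(theta)
    return theta

# Example: J(theta) = (theta - 3)^2, whose derivative is 2 * (theta - 3).
theta_min = gradient_descent(lambda t: 2 * (t - 3), theta=0.0)
print(round(theta_min, 6))  # 3.0
```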
Two surface plots of J(θ_0, θ_1) illustrate the idea: starting from different initial points, gradient descent walks downhill and can end up in different local minima.
Gradient descent algorithm

Repeat until convergence:
θ_j := θ_j − α · ∂/∂θ_j J(θ_0, θ_1)    (for j = 0 and j = 1)

α is the learning rate.

Correct (simultaneous update):
temp0 := θ_0 − α · ∂/∂θ_0 J(θ_0, θ_1)
temp1 := θ_1 − α · ∂/∂θ_1 J(θ_0, θ_1)
θ_0 := temp0
θ_1 := temp1

Incorrect:
temp0 := θ_0 − α · ∂/∂θ_0 J(θ_0, θ_1)
θ_0 := temp0
temp1 := θ_1 − α · ∂/∂θ_1 J(θ_0, θ_1)    (uses the already-updated θ_0)
θ_1 := temp1
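The difference between the simultaneous and the incorrect update is easy to miss in code. A sketch, with made-up partial derivatives just to have something runnable (the functions dJ0 and dJ1 are illustrative, not from the slides):

```python
# Illustrative partial derivatives of some J(theta0, theta1);
# only the update pattern matters here.
def dJ0(t0, t1):
    return 2 * t0 + t1

def dJ1(t0, t1):
    return t0 + 2 * t1

alpha = 0.1
theta0, theta1 = 1.0, 2.0

# Correct: evaluate BOTH derivatives at the old point, then assign.
temp0 = theta0 - alpha * dJ0(theta0, theta1)
temp1 = theta1 - alpha * dJ1(theta0, theta1)
theta0, theta1 = temp0, temp1

# The incorrect version would assign theta0 before computing temp1,
# so dJ1 would be evaluated at a mixed point (new theta0, old theta1).
print(theta0, theta1)
```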
Gradient descent intuition

Simplification: minimize J(θ_1) with θ_1 ∈ R. The update becomes θ_1 := θ_1 − α · d/dθ_1 J(θ_1), where α is the learning rate and d/dθ_1 J(θ_1) is the derivative.
Gradient descent intuition

θ_1 := θ_1 − α · d/dθ_1 J(θ_1)

If d/dθ_1 J(θ_1) > 0 (the curve slopes up), then θ_1 := θ_1 − α · (positive number), so θ_1 decreases.
If d/dθ_1 J(θ_1) < 0 (the curve slopes down), then θ_1 := θ_1 − α · (negative number), so θ_1 increases.
In both cases θ_1 moves toward the minimum of J(θ_1).

If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum: it may fail to converge, or even diverge.
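The effect of α can be demonstrated on a hypothetical cost J(θ) = θ² (derivative 2θ, minimum at 0), chosen here only for illustration:

```python
def run(alpha, iters=20, theta=1.0):
    """Run gradient descent on J(theta) = theta^2 (minimum at theta = 0)."""
    for _ in range(iters):
        theta = theta - alpha * 2 * theta
    return theta

print(abs(run(0.01)))  # small alpha: slow, still far from the minimum
print(abs(run(0.4)))   # reasonable alpha: essentially at the minimum
print(abs(run(1.1)))   # too large: each step overshoots and |theta| grows
```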
What if θ_1 is already at a local minimum? At a local optimum the derivative is 0, so θ_1 := θ_1 − α · 0 leaves the current value of θ_1 unchanged.

Gradient descent can converge to a local minimum even with the learning rate α fixed: as we approach a local minimum the derivative, and hence the step α · d/dθ_1 J(θ_1), automatically becomes smaller. So there is no need to decrease α over time.
Gradient descent for linear regression

We apply the gradient descent algorithm to the linear regression model to minimize the cost: min over θ_0, θ_1 ∈ R of J(θ_0, θ_1). The key term is the partial derivative:

∂/∂θ_j J(θ_0, θ_1) = ∂/∂θ_j (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² = ∂/∂θ_j (1/2m) Σ_{i=1}^{m} (θ_0 + θ_1 x^(i) − y^(i))²

j = 0: ∂/∂θ_0 J(θ_0, θ_1) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
j = 1: ∂/∂θ_1 J(θ_0, θ_1) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
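The two partial derivatives can be coded directly (a sketch; the function name is mine):

```python
def gradients(theta0, theta1, xs, ys):
    """Partial derivatives of J for univariate linear regression:
    dJ/dtheta0 = (1/m) sum (h(x_i) - y_i)
    dJ/dtheta1 = (1/m) sum (h(x_i) - y_i) * x_i
    """
    m = len(xs)
    errors = [theta0 + theta1 * xi - yi for xi, yi in zip(xs, ys)]
    d0 = sum(errors) / m
    d1 = sum(e * xi for e, xi in zip(errors, xs)) / m
    return d0, d1

# At the optimum of a perfectly fittable dataset both derivatives vanish.
print(gradients(0.0, 1.0, [1, 2, 3], [1, 2, 3]))  # (0.0, 0.0)
```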
Gradient descent for linear regression

Repeat until convergence:
θ_0 := θ_0 − α · (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
θ_1 := θ_1 − α · (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
(update θ_0 and θ_1 simultaneously)

In general, the local optimum of the cost function J found by the gradient descent algorithm depends on the initialization of the parameters.
However, the cost function J for linear regression is always a convex ("bowl-shaped") function, so it has only one optimum, the global one. A sequence of slides shows gradient descent on the housing data: each step moves (θ_0, θ_1) toward the center of the contour plot of J, while the corresponding line h(x) = θ_0 + θ_1·x fits the data better and better.
Gradient descent for linear regression

Batch gradient descent: each step of gradient descent uses all training examples in calculating the derivatives. There are variants:
Stochastic: uses 1 example in each iteration. A faster algorithm, but it may not reach the global minimum, even though it gets close to it.
Mini-batch: uses b examples in each iteration, with 1 < b < m.
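A sketch of the stochastic variant, where each update uses the gradient of a single example instead of all m (the data, hyperparameters, and function name are illustrative):

```python
import random

def sgd(xs, ys, alpha=0.05, epochs=1000, seed=0):
    """Stochastic gradient descent for h(x) = theta0 + theta1 * x:
    each update uses the gradient of a SINGLE example."""
    rng = random.Random(seed)
    theta0, theta1 = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # visit the examples in random order
        for i in idx:
            error = theta0 + theta1 * xs[i] - ys[i]
            theta0 -= alpha * error
            theta1 -= alpha * error * xs[i]
    return theta0, theta1

t0, t1 = sgd([1, 2, 3], [1, 2, 3])
print(round(t0, 3), round(t1, 3))  # close to 0.0 and 1.0
```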
Linear regression with multiple variables
Multiple features

Previously, a single feature (variable): size (feet²) → price ($1000). Now multiple features (variables): size (feet²), number of bedrooms, number of floors, age of home (years) → price ($1000); call them x_1, x_2, x_3, x_4 and the target y, with m = 47 and n = 4.

Notation:
n = number of features
x^(i) = input (features) of the i-th training example, a column vector of n values
x_j^(i) = value of feature j in the i-th training example
E.g., x^(2) is the feature vector of the second training example, and x_3^(2) = 2 means that its number of floors is 2.
Multiple features

Hypothesis. Previously: h_θ(x) = θ_0 + θ_1 x. Now:
h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_3 + θ_4 x_4
E.g.: h_θ(x) = 80 + 0.1·x_1 + 0.01·x_2 + 3·x_3 − 2·x_4, where x_1 is the size, x_2 the number of bedrooms, x_3 the number of floors, and x_4 the age of the home.

For convenience of notation, define x_0 = 1 (i.e., x_0^(i) = 1 for every example i). Then

x = [x_0, x_1, x_2, ..., x_n]^T ∈ R^(n+1)    θ = [θ_0, θ_1, θ_2, ..., θ_n]^T ∈ R^(n+1)

h_θ(x) = θ_0 x_0 + θ_1 x_1 + ... + θ_n x_n = θ^T x

This is multivariate linear regression.
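With the x_0 = 1 convention, h_θ(x) = θ^T x is just a dot product. A plain-Python sketch; the parameter values and the house below are illustrative, not from the data:

```python
def predict(theta, features):
    """h_theta(x) = theta^T x, with x0 = 1 prepended to the feature vector."""
    x = [1.0] + list(features)
    assert len(theta) == len(x)
    return sum(tj * xj for tj, xj in zip(theta, x))

# Illustrative parameters and house (size, bedrooms, floors, age):
theta = [80.0, 0.1, 0.01, 3.0, -2.0]
house = [2104, 5, 1, 45]
print(predict(theta, house))  # 80 + 210.4 + 0.05 + 3 - 90, i.e. about 203.45
```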
Gradient descent for multiple variables

Hypothesis: h_θ(x) = θ^T x = θ_0 x_0 + θ_1 x_1 + ... + θ_n x_n, with x_0 = 1.
Parameters: θ ∈ R^(n+1), an (n+1)-dimensional vector.
Cost function: J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Gradient descent: repeat { θ_j := θ_j − α · ∂/∂θ_j J(θ) } (simultaneously update for every j = 0, ..., n).

Previously (n = 1): repeat
θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
θ_1 := θ_1 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
(simultaneously update θ_0, θ_1)

New algorithm (n ≥ 1): repeat
θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i)
(simultaneously update θ_j for j = 0, ..., n)
Since x_0^(i) = 1, for n = 1 this reduces to the previous rule; e.g., for n = 2 it updates θ_0, θ_1, θ_2 simultaneously.
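Putting it together, a sketch of batch gradient descent for multivariate linear regression in plain Python (the dataset and hyperparameters below are illustrative):

```python
def fit_multi(X, y, alpha, iters):
    """Batch gradient descent for h(x) = theta^T x.
    X: rows of features WITHOUT x0; y: targets. Returns theta (incl. theta0)."""
    Xb = [[1.0] + list(row) for row in X]  # prepend x0 = 1 to every example
    m, n1 = len(Xb), len(Xb[0])
    theta = [0.0] * n1
    for _ in range(iters):
        errors = [sum(tj * xj for tj, xj in zip(theta, xi)) - yi
                  for xi, yi in zip(Xb, y)]
        # theta_j := theta_j - alpha * (1/m) * sum_i errors_i * x_j^(i)
        grad = [sum(e * xi[j] for e, xi in zip(errors, Xb)) / m
                for j in range(n1)]
        theta = [tj - alpha * g for tj, g in zip(theta, grad)]
    return theta

# Targets generated by y = 1 + 2*x1 + 3*x2, so gradient descent should
# recover theta close to [1, 2, 3].
theta = fit_multi([[0, 0], [1, 0], [0, 1], [1, 1]], [1, 3, 4, 6],
                  alpha=0.3, iters=5000)
print([round(t, 3) for t in theta])  # [1.0, 2.0, 3.0]
```

Since J is convex, the same θ is reached from any initialization; only the learning rate α and the number of iterations affect how quickly.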
More informationCPSC 340: Machine Learning and Data Mining. Stochastic Gradient Fall 2017
CPSC 340: Machine Learning and Data Mining Stochastic Gradient Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation code
More informationMachine Learning Basics III
Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient
More informationArtificial Neural Networks
0 Artificial Neural Networks Based on Machine Learning, T Mitchell, McGRAW Hill, 1997, ch 4 Acknowledgement: The present slides are an adaptation of slides drawn by T Mitchell PLAN 1 Introduction Connectionist
More informationEssence of Machine Learning (and Deep Learning) Hoa M. Le Data Science Lab, HUST hoamle.github.io
Essence of Machine Learning (and Deep Learning) Hoa M. Le Data Science Lab, HUST hoamle.github.io 1 Examples https://www.youtube.com/watch?v=bmka1zsg2 P4 http://www.r2d3.us/visual-intro-to-machinelearning-part-1/
More informationMachine Learning and Data Mining. Linear regression. Kalev Kask
Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance
More informationTraining the linear classifier
215, Training the linear classifier A natural way to train the classifier is to minimize the number of classification errors on the training data, i.e. choosing w so that the training error is minimized.
More informationGradient Descent. Sargur Srihari
Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors
More informationMathematical Formulation of Our Example
Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot
More informationLECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION
15-382 COLLECTIVE INTELLIGENCE - S19 LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION TEACHER: GIANNI A. DI CARO WHAT IF WE HAVE ONE SINGLE AGENT PSO leverages the presence of a swarm: the outcome
More informationThe Perceptron Algorithm, Margins
The Perceptron Algorithm, Margins MariaFlorina Balcan 08/29/2018 The Perceptron Algorithm Simple learning algorithm for supervised classification analyzed via geometric margins in the 50 s [Rosenblatt
More informationMachine Learning: Logistic Regression. Lecture 04
Machine Learning: Logistic Regression Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Supervised Learning Task = learn an (unkon function t : X T that maps input
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationThe perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt.
1 The perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt. The algorithm applies only to single layer models
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationLecture 6. Notes on Linear Algebra. Perceptron
Lecture 6. Notes on Linear Algebra. Perceptron COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Notes on linear algebra Vectors
More informationLogic and machine learning review. CS 540 Yingyu Liang
Logic and machine learning review CS 540 Yingyu Liang Propositional logic Logic If the rules of the world are presented formally, then a decision maker can use logical reasoning to make rational decisions.
More informationCS221 / Autumn 2017 / Liang & Ermon. Lecture 2: Machine learning I
CS221 / Autumn 2017 / Liang & Ermon Lecture 2: Machine learning I cs221.stanford.edu/q Question How many parameters (real numbers) can be learned by machine learning algorithms using today s computers?
More informationExpectation Maximization, and Learning from Partly Unobserved Data (part 2)
Expectation Maximization, and Learning from Partly Unobserved Data (part 2) Machine Learning 10-701 April 2005 Tom M. Mitchell Carnegie Mellon University Clustering Outline K means EM: Mixture of Gaussians
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More information100 inference steps doesn't seem like enough. Many neuron-like threshold switching units. Many weighted interconnections among units
Connectionist Models Consider humans: Neuron switching time ~ :001 second Number of neurons ~ 10 10 Connections per neuron ~ 10 4 5 Scene recognition time ~ :1 second 100 inference steps doesn't seem like
More informationPattern Recognition and Machine Learning. Learning and Evaluation of Pattern Recognition Processes
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lesson 1 5 October 2016 Learning and Evaluation of Pattern Recognition Processes Outline Notation...2 1. The
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationRelationship between Least Squares Approximation and Maximum Likelihood Hypotheses
Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses Steven Bergner, Chris Demwell Lecture notes for Cmpt 882 Machine Learning February 19, 2004 Abstract In these notes, a
More information