FA17 10-701 Homework 2 Recitation 2


FA17 10-701 Homework 2 Recitation 2
Logan Brooks, Matthew Oresky, Guoquan Zhao
October 2, 2017

Outline
1. Code Vectorization
   Why
   Linear Regression Example
2. Regularization

How to Vectorize Code

The goal of vectorization is to write code that avoids explicit loops and uses whole-array operations instead. Why? Efficiency: whole-array operations are handled by optimized routines for matrix and vector computation, rather than executed one interpreted statement at a time.
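
To make the efficiency claim concrete, here is a minimal Octave sketch (not from the original slides; the sizes and variable names are arbitrary) that times an explicit element-by-element loop against the equivalent whole-array operation:

n = 1e6;
a = rand(n, 1);
b = rand(n, 1);

% Explicit loop over elements.
tic;
s_loop = 0;
for i = 1:n
  s_loop = s_loop + a(i) * b(i);
end
t_loop = toc;

% Whole-array equivalent: a single optimized inner product.
tic;
s_vec = a' * b;
t_vec = toc;

printf("loop: %.4f s, vectorized: %.4f s, difference: %.2e\n", t_loop, t_vec, abs(s_loop - s_vec));

The vectorized inner product is typically orders of magnitude faster, because it dispatches to an optimized library routine instead of running the loop in the interpreter.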

Linear Regression Example

Objective function: L = \sum_{i=1}^{n} (X^{(i)T} w - y_i)^2

X^{(i)T} is the i-th row of the data matrix (1 x p), w is a p x 1 column vector, and y is an n x 1 column vector.

dL/dw = ?

dL/dw = \sum_{i=1}^{n} 2 X^{(i)} (X^{(i)T} w - y_i)

Linear Regression Example

dL/dw = \sum_{i=1}^{n} 2 X^{(i)} (X^{(i)T} w - y_i)

Octave code: see the loop-based sketch below. How can we vectorize it?

One way: rewrite the formula directly and remove the summation. One of the for loops is easy to eliminate: the inner product X^{(i)T} w can be computed as X(i, :) * w, so no loop over features is needed.
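
The Octave code shown on the original slide is not reproduced in this transcript. Below is a minimal sketch of what a loop-based version of the gradient might look like; the toy data and variable names (X, w, y, grad) are assumptions for illustration. Only the outer loop over data points remains, since X(i, :) * w already removes the loop over features.

% Toy data, assumed only for illustration: n = 5 examples, p = 3 features.
n = 5;  p = 3;
X = rand(n, p);
w = rand(p, 1);
y = rand(n, 1);

% Loop-based gradient: dL/dw = sum_i 2 * X(i,:)' * (X(i,:)*w - y(i)).
grad = zeros(p, 1);
for i = 1:n
  residual = X(i, :) * w - y(i);      % X(i,:) * w replaces an inner loop over features
  grad = grad + 2 * X(i, :)' * residual;
end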

Linear Regression Example

Sometimes this direct rewriting is not very intuitive.

Linear Regression Example

dL/dw = \sum_{i=1}^{n} 2 X^{(i)} (X^{(i)T} w - y_i)

Another way: build the matrix form from the formula. We already know that if the formula can be rewritten in matrix form, each entry of the result is a sum of products between a row of one factor and a column of the other.

Pretend you already know the result is given by a matrix multiplication, and construct the matrices by figuring out which element should go into which slot: draw two empty matrices and fill in the blanks.

Linear Regression Example

dL/dw = \sum_{i=1}^{n} 2 X^{(i)} (X^{(i)T} w - y_i)

According to the formula, the j-th entry of dL/dw can be written as a sum of products between a row and a column. The row could be [X^{(1)}_j, X^{(2)}_j, ..., X^{(n)}_j], and the column could be [(X^{(1)T} w - y_1), (X^{(2)T} w - y_2), ..., (X^{(n)T} w - y_n)]^T.

If we stack the data points row by row,

X = \begin{bmatrix} X^{(1)T} \\ X^{(2)T} \\ \vdots \\ X^{(n)T} \end{bmatrix}
  = \begin{bmatrix} X^{(1)}_1 & X^{(1)}_2 & \cdots & X^{(1)}_p \\ X^{(2)}_1 & X^{(2)}_2 & \cdots & X^{(2)}_p \\ \vdots & & & \vdots \\ X^{(n)}_1 & X^{(n)}_2 & \cdots & X^{(n)}_p \end{bmatrix}

then the j-th row of X^T (the j-th column of X) is exactly the row constructed above, so the first factor should be X^T, and the column vector is given by Xw - y. Therefore

dL/dw = \sum_{i=1}^{n} 2 X^{(i)} (X^{(i)T} w - y_i) = 2 X^T (Xw - y)
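
Continuing the hypothetical sketch above (same assumed X, w, y, and loop-based grad), the vectorized gradient is a single line and can be checked against the loop result:

% Vectorized gradient: dL/dw = 2 * X' * (X*w - y), no loops at all.
grad_vec = 2 * X' * (X * w - y);

% Should match the loop-based gradient up to floating-point error.
printf("max difference: %.2e\n", max(abs(grad - grad_vec)));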

Linear Regression Example

dL/dw = \sum_{i=1}^{n} 2 X^{(i)} (X^{(i)T} w - y_i) = 2 X^T (Xw - y)

Use the same method for logistic regression if you cannot pass Autolab's autograder.
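
The same row-stacking trick applies to logistic regression. As a hedged sketch only: assuming the standard binary cross-entropy objective with labels in {0, 1} and predictions sigmoid(X w) (the homework's exact formulation, sign, and scaling may differ), the gradient also collapses to one matrix expression. Variables reuse the assumed X and w from the sketch above.

% Assumed objective: L = -sum_i [ y_i*log(s_i) + (1 - y_i)*log(1 - s_i) ],
% with s = sigmoid(X*w) and labels y_i in {0, 1}; then dL/dw = X' * (s - y).
y01 = double(rand(n, 1) > 0.5);     % binary labels, re-drawn here for illustration
s = 1 ./ (1 + exp(-X * w));         % elementwise sigmoid: predicted probabilities
grad_logistic = X' * (s - y01);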

Question 1

Which of the following is true about regularization?
a) One of the goals of regularization is to combat underfitting.
b) The L0 norm is rarely used in practice because it is non-differentiable and non-convex.
c) A model with regularization fits the training data better than a model without regularization.
d) One way to understand regularization is that it makes the learning algorithm prefer more complex solutions.

Answer: b. The goal of regularization is to combat overfitting, so a model without regularization will fit the training data better. Regularization makes the solution simpler by penalizing large or non-zero weights.

Question 2

For a typical minimization problem with L1 regularization:

min_β f(β) + λ‖β‖_1

where f is the objective function, which statement is true about λ?
a) Larger λ leads to a sparser solution of β and a possible underfit of the data.
b) Larger λ leads to a sparser solution of β and a possible overfit of the data.
c) Larger λ leads to a denser solution of β and a possible underfit of the data.
d) Larger λ leads to a denser solution of β and a possible overfit of the data.

Answer: a) Larger λ leads to a sparser solution of β and a possible underfit of the data.

Question 3

Consider a binary classification problem. Suppose I have trained a model on a linearly separable training set, and then I get a new labeled data point which is correctly classified by the model and far away from the decision boundary. If I now append this new point to my training set and re-train, would the learned decision boundary change for the following models?
a) Perceptron
b) Logistic Regression (using gradient descent)

Answer:
a) No. If the perceptron has already converged, a correctly classified point has no effect on its updates.
b) Yes. Logistic regression includes this data point's gradient in its gradient computation, so it will affect the learned weight vector (though the weight vector likely won't change by a meaningful amount).

For Further Reading

Octave documentation on basic vectorization:
https://www.gnu.org/software/octave/doc/v4.2.1/basic-Vectorization.html