
HW3
Inbal Joffe, Aran Carmon

Theory Questions

1. Let us look at the input set $S = \{x_1, \ldots, x_d\} \subseteq \mathbb{R}^d$ such that $(x_i)_j = \delta_{ij}$ (that is, 1 if $i = j$, and 0 otherwise). The size of the set is $d$, and for every subset $A$ of $S$ we can build a network that gives 1 only on inputs in $A$. All layers except the first will simply pass the previous layer through: $W^{(t+1)}_{ij} = \delta_{ij}$, $b^{(t+1)}_i = \frac{1}{2}$ (with activations in $\{-1, +1\}$, $\mathrm{sign}(h + \frac{1}{2})$ preserves $h$; the last layer passes the first neuron). The first layer will only accept inputs from $A$:

$$W^{(1)}_{ij} = \begin{cases} 1 & x_j \in A \\ 0 & \text{otherwise} \end{cases}, \qquad b^{(1)}_i = -\frac{1}{2},$$

so each first-layer neuron, and in particular the first one, fires exactly on the inputs from $A$. This shows that a set of size $d$ is shattered. Notice that we can get a better lower bound by looking at the network that connects all the inputs to the first neuron, and that only passes the output of the first neuron all the way to the final output. This network is essentially a single neuron, and we know that its VC dimension equals $d + 1$. We will use this result in the next question.
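To make the construction concrete, here is a small numeric check (a sketch, assuming sign activations with outputs in $\{-1, +1\}$, where $+1$ means "accept"):

```python
import numpy as np

def first_layer_weights(d, A):
    """Weights of the first neuron: 1 on coordinates j with e_j in A, else 0."""
    return np.array([1.0 if j in A else 0.0 for j in range(d)])

def forward(w1, x, L=3):
    h = np.sign(w1 @ x - 0.5)      # first layer: fires exactly on inputs from A
    for _ in range(L - 1):         # remaining layers pass the value through:
        h = np.sign(h + 0.5)       # sign(h + 1/2) maps +1 -> +1 and -1 -> -1
    return h

d, A = 5, {0, 2}
w1 = first_layer_weights(d, A)
for i in range(d):
    assert (forward(w1, np.eye(d)[i]) > 0) == (i in A)
```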

2. a. The hypothesis class is of the form

$$B = \{(h_1, \ldots, h_d) : h_1, \ldots, h_d \in H\}.$$

We have seen in the recitation that for $m \ge \mathrm{VCdim}(H) = d + 1$,

$$\pi_H(m) \le \left(\frac{em}{d+1}\right)^{d+1}.$$

Also, we saw in class that $\pi_{F_1 \times F_2}(m) \le \pi_{F_1}(m)\,\pi_{F_2}(m)$ (and inductively we can deduce, for all $n \in \mathbb{N}$, that $\pi_{F_1 \times \cdots \times F_n}(m) \le \pi_{F_1}(m) \cdots \pi_{F_n}(m)$). Therefore,

$$\pi_B(m) \le \left(\frac{em}{d+1}\right)^{d(d+1)}.$$

b. The hypothesis class $C$ is of the form $\{b_1 \circ \cdots \circ b_l : b_1, \ldots, b_l \in B\}$, where $B$ is as defined above. We saw in class that $\pi_{F_1 \circ F_2}(m) \le \pi_{F_1}(m)\,\pi_{F_2}(m)$ (and inductively we can deduce, for all $n \in \mathbb{N}$, that $\pi_{F_1 \circ \cdots \circ F_n}(m) \le \pi_{F_1}(m) \cdots \pi_{F_n}(m)$). Therefore,

$$\pi_C(m) \le \left(\frac{em}{d+1}\right)^{l\,d(d+1)}.$$

c. Each neuron has $d + 1$ parameters ($w^{(t)}_{i,:}$ and $b^{(t)}_i$). Each of the first $L - 1$ layers has $d$ neurons, and there is an additional single neuron in the last layer; all in all,

$$N = (d+1)d(L-1) + (d+1).$$

d. Assume $2^m \le (em)^N$; then

$$m \le N\log_2(em) \;\Longrightarrow\; em \le eN\log_2(em) \;\overset{(*)}{\Longrightarrow}\; em \le 2eN\log_2(eN) \;\Longrightarrow\; m \le 2N\log_2(eN),$$

as required (the lemma is applied with $x = em$ and $a = eN$, and indeed $eN \ge 4$).

(*) Lemma: for every $a \ge 4$, if $x \le a\log_2(x)$ then $x \le 2a\log_2(a)$.

Proof: Let $a \ge 4$, and assume $x > 2a\log_2(a)$; we need to show that $x > a\log_2(x)$. First, since $\log_2(a) \ge 2$, we get $x > 2a\log_2(a) > \frac{a}{\ln 2}$ (#). Let us look at the function $f(x) = x - a\log_2(x)$; its derivative is $f'(x) = 1 - \frac{a}{x\ln 2}$, which by (#) is positive on the range we consider, so $f$ is increasing there. At last, since $a \ge 2\log_2(a)$ for $a \ge 4$, it follows that

$$f(x) > f(2a\log_2(a)) = 2a\log_2(a) - a\log_2(2a\log_2(a)) = 2a\log_2(a) - a\log_2(a) - a\log_2(2\log_2(a)) = a\log_2(a) - a\log_2(2\log_2(a)) \ge 0,$$

hence $x > a\log_2(x)$, as required.
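A quick numeric sanity check of the lemma (an illustration only; it scans integer values of $x$):

```python
import math

for a in [4, 10, 100, 1000]:
    bound = 2 * a * math.log2(a)
    # Largest integer x (scanning well past the bound) with x <= a*log2(x):
    worst = max((x for x in range(2, int(2 * bound)) if x <= a * math.log2(x)),
                default=None)
    assert worst is None or worst <= bound, (a, worst, bound)
```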

e. For every $m \in \mathbb{N}$ we have

$$\pi_C(m) = \max_{|S|=m} |\Pi_C(S)| = \max_{|S|=m} \left|\{(h(s_1), \ldots, h(s_m)) : h \in C\}\right| \le 2^m,$$

where the last inequality holds since $\{(h(s_1), \ldots, h(s_m)) : h \in C\} \subseteq \{-1, 1\}^m$. We need to show, for $m \ge d + 1$:

$$\pi_C(m) \le \left(\frac{em}{d+1}\right)^{Ld(d+1)} \le (em)^{(d+1)d(L-1)+d+1} = (em)^N.$$

Using $d \ge 1$ and $1 \le L$, and noting that $Ld(d+1) = Ld^2 + Ld$ while $N = Ld^2 + Ld - d^2 + 1$:

$$(em)^{Ld^2+Ld} \le (d+1)^{Ld^2+Ld}\,(em)^{Ld^2+Ld-d^2+1} \;\Longrightarrow\; \left(\frac{em}{d+1}\right)^{Ld^2+Ld} \le (em)^{Ld^2+Ld-d^2+1} = (em)^N,$$

and we get $\pi_C(m) \le (em)^N$ (for $m \ge d+1$). Therefore, when $m = \mathrm{VCdim}(C)$, we have $2^m = \pi_C(m) \le (em)^N$, and we can apply the previous subquestion to get

$$\mathrm{VCdim}(C) = m \le 2N\log_2(eN).$$

Notice that we may take $m = \mathrm{VCdim}(C)$ here, since $\mathrm{VCdim}(C) \ge d + 1$, as we showed in the previous question.

3. a. In order to project, we first test whether $\|x\| \le R$. If it is, we return $x$; otherwise we return $R\,\frac{x}{\|x\|}$.
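In code, this projection is a one-liner (a sketch; the assignment's own implementation is not shown in the transcription):

```python
import numpy as np

def project_to_ball(x, R):
    """Project x onto the Euclidean ball of radius R centered at the origin."""
    norm = np.linalg.norm(x)
    return x if norm <= R else (R / norm) * x

# e.g., project_to_ball(np.array([3.0, 4.0]), 1.0) -> array([0.6, 0.8])
```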

b. Let $z \in K$, and denote by $a, b, c$ the edges of the triangle between $x$, $y$, $z$ as in the picture ($a$ between $x$ and $y$, $b$ between $y$ and $z$, $c$ between $x$ and $z$). From the definition of $x$ (the closest point to $y$ in $K$) we know that $a \le b$, and we need to show that $c \le b$. For that matter, it is sufficient to show that $\beta \ge 90°$, where $\beta$ is the angle of the triangle at $x$. Notice that every $z' = \epsilon x + (1-\epsilon)z$ (for $0 < \epsilon < 1$) is in $K$ by convexity, so the corresponding triangle $xyz'$ also upholds $a \le b'$; in addition $\beta' = \beta$, so $\alpha' \le \beta$ for every such triangle $xyz'$ (the angle opposite the shorter edge is the smaller one). Assume by way of contradiction that $\beta < 90°$; then $\alpha' \le \beta < 90°$ for every triangle $xyz'$, and by taking $z'$ close enough to $x$, the angle $\gamma'$ at $y$ becomes small enough that $\alpha' + \beta + \gamma' < 180°$; a contradiction. We then deduce $\beta \ge 90°$, so $b$ is the largest edge in $xyz$, and therefore $c \le b$.

c. The proof is exactly the same as the proof mentioned in the question, up to equation (12). In our case, we use $w_{t+1} = \Pi_K(w_t - \eta v_t)$ instead of $w_{t+1} = w_t - \eta v_t$, and it changes the equation after equation (12) from

$$\|w_{t+1} - w^\star\|_2^2 = \|w_t - \eta v_t - w^\star\|_2^2 = \|w_t - w^\star\|_2^2 + \eta^2\|v_t\|_2^2 - 2\eta\,(w_t - w^\star)^\top v_t$$

to

$$\|w_{t+1} - w^\star\|_2^2 = \|\Pi_K(w_t - \eta v_t) - w^\star\|_2^2 \le \|w_t - \eta v_t - w^\star\|_2^2 = \|w_t - w^\star\|_2^2 + \eta^2\|v_t\|_2^2 - 2\eta\,(w_t - w^\star)^\top v_t,$$

where the inequality is the inequality from the previous subquestion (applied with $w^\star \in K$).

We continue to rearrange equation (13) into

$$v_t^\top(w_t - w^\star) \le \frac{\|w_t - w^\star\|_2^2 - \|w_{t+1} - w^\star\|_2^2}{2\eta} + \frac{\eta}{2}\|v_t\|_2^2.$$

The rest of the proof continues without other modifications.
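The resulting algorithm differs from plain gradient descent only in the projection step; a minimal sketch (using, for the example, projection onto the unit ball):

```python
import numpy as np

def project_to_unit_ball(w):
    n = np.linalg.norm(w)
    return w if n <= 1.0 else w / n

def projected_gd(w0, grad, project, eta, T):
    w = w0
    for _ in range(T):
        v = grad(w)                # (sub)gradient v_t at w_t
        w = project(w - eta * v)   # w_{t+1} = Pi_K(w_t - eta * v_t)
    return w

# Example: minimize ||w - c||^2 over the unit ball; the minimizer is c/||c||.
c = np.array([3.0, 4.0])
w = projected_gd(np.zeros(2), grad=lambda w: 2.0 * (w - c),
                 project=project_to_unit_ball, eta=0.1, T=100)
# w is approximately [0.6, 0.8]
```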

4. a. Let $w_1$ and $w_2$ be the weights of a multiclass classifier with $k = 2$. We classify a new point $x$ as 1 iff $w_1 \cdot x > w_2 \cdot x$, that is, $(w_1 - w_2)\cdot x > 0$, which is the same as using a binary classifier with weights $w = w_1 - w_2$. On the other hand, if we have a binary classifier with weights $w$, we can build a multiclass classifier with $w_1 = w$ and $w_2 = -w$. The new classifier will classify a new point as 1 iff

$$w_1 \cdot x > w_2 \cdot x \iff w\cdot x > -w\cdot x \iff 2\,w\cdot x > 0 \iff w\cdot x > 0,$$

which is the same as the original binary classifier. Furthermore, let us consider the optimization problem of the multiclass classifier,

$$f = \frac{1}{2}\|w_1\|^2 + \frac{1}{2}\|w_2\|^2 + C\sum_{i=1}^{m} \max\left(0,\ (w_{3-y_i} - w_{y_i})\cdot x_i + 1\right)$$

(for $k = 2$, the label other than $y_i$ is $3 - y_i$). Since we classify to either label 1 or label 2, it is reasonable to expect $w_1 = -w_2$. In that case the above turns into

$$f = \|w_1\|^2 + C\sum_{i=1}^{m} \max\left(0,\ 1 - 2\,w_{y_i}\cdot x_i\right).$$

Define $y'_i = \begin{cases} 1 & y_i = 1 \\ -1 & y_i = 2 \end{cases}$ and $w' = 2w_1$, so that $2\,w_{y_i}\cdot x_i = y'_i\,w'\cdot x_i$ and

$$f = \frac{1}{4}\|w'\|^2 + C\sum_{i=1}^{m}\max\left(0,\ 1 - y'_i\,w'\cdot x_i\right).$$

We can define $C' = 2C$ and multiply $f$ by 2 (which does not change the minimizer) to get

$$\frac{1}{2}\|w'\|^2 + C'\sum_{i=1}^{m}\max\left(0,\ 1 - y'_i\,w'\cdot x_i\right),$$

which is the same optimization problem as in the SVM we learned.

b. Deriving with respect to $w_j$: we define

$$j'(w, x_i, y_i) = \arg\max_p \left(w_p\cdot x_i - w_{y_i}\cdot x_i + \mathbb{1}(p \ne y_i)\right),$$

$$\frac{\partial \ell_i}{\partial w_j} = x_i\left(\mathbb{1}(j = j') - \mathbb{1}(j = y_i)\right), \qquad \frac{\partial f}{\partial w_j} = w_j + C\sum_{i=1}^{m}\frac{\partial \ell_i}{\partial w_j}.$$

So an SGD version would be to sample a random point at each step, and to update all the $w_j$ according to the following rule, with $j' = \arg\max_p\left(w_{p,t}\cdot x_i - w_{y_i,t}\cdot x_i + \mathbb{1}(p \ne y_i)\right)$:

If $j \ne y_i$ and $j = j'$: $\quad w_{j,t+1} = (1-\eta)w_{j,t} - \eta C x_i$.
If $j = y_i$ and $j \ne j'$: $\quad w_{j,t+1} = (1-\eta)w_{j,t} + \eta C x_i$.
In any other case: $\quad w_{j,t+1} = (1-\eta)w_{j,t}$.
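These update rules translate directly into code; a minimal sketch (illustrative names, not the assignment's q.py):

```python
import numpy as np

def multiclass_sgd_step(W, x, y, eta, C):
    """One SGD step on sample (x, y); W is a (k, d) matrix of class weights."""
    k = W.shape[0]
    # scores_p = w_p . x - w_y . x + 1[p != y]
    scores = W @ x - W[y] @ x + (np.arange(k) != y)
    j_star = int(np.argmax(scores))
    W = (1.0 - eta) * W            # the (1 - eta) shrinkage applies to every w_j
    if j_star != y:                # otherwise the hinge term contributes nothing
        W[j_star] -= eta * C * x
        W[y] += eta * C * x
    return W
```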

c. We notice that each $w_j$ is a linear combination of the $x_i$. Instead of keeping $w_j$ explicitly, we can keep track of the coefficients of the $x_i$: define $w_j = \sum_{i=1}^{m} M_{j,i}\,x_i$. Classifying a new point $x$ is then $y = \arg\max_j\left(\sum_{i=1}^{m} M_{j,i}\,K(x_i, x)\right)$, where $K$ is the kernel function used. Pseudo-code for training (a Python rendering follows):

    Input:  kernel function K
            list (x_i, y_i) of m training samples
            T, the number of iterations
            eta, the step size
            C, the penalty coefficient
    Output: a matrix M in Mat(k x m) to be used for classifying new points

    initialize M in Mat(k x m) to zeroes
    for T iterations:
        choose a random point (x_i, y_i), i in [m], from the training set
        find j' = arg max_j sum_{t=1}^{m} M_{j,t} K(x_t, x_i)
        M <- (1 - eta) M
        for each j in [k]:
            if j != y_i and j = j':  M_{j,i} <- M_{j,i} - eta*C
            if j = y_i and j != j':  M_{j,i} <- M_{j,i} + eta*C
    return M
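A direct Python rendering of the pseudo-code above (a sketch; the names follow the pseudo-code rather than the assignment's actual q.py):

```python
import numpy as np

def train_kernel_multiclass(K, X, y, k, T, eta, C, seed=0):
    """X: (m, d) array of samples; y: integer labels in [0, k); returns M (k x m)."""
    m = len(X)
    # Precompute the Gram matrix (a deviation from the pseudo-code, purely for speed).
    gram = np.array([[K(X[s], X[t]) for t in range(m)] for s in range(m)])
    M = np.zeros((k, m))
    rng = np.random.default_rng(seed)
    for _ in range(T):
        i = rng.integers(m)
        j_star = int(np.argmax(M @ gram[:, i]))  # arg max_j sum_t M[j,t] K(x_t, x_i)
        M *= (1.0 - eta)
        if j_star != y[i]:
            M[j_star, i] -= eta * C
            M[y[i], i] += eta * C
    return M

def classify(M, K, X_train, x):
    return int(np.argmax(M @ np.array([K(xt, x) for xt in X_train])))
```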

5. If at each level $i$ of the tree we ask "$x_i = 0$?", then after $d$ questions each leaf will contain only one member; that is, there is a one-to-one correspondence between leaves and vectors in $\{0,1\}^d$. To implement an arbitrary classifier using this tree, classify every leaf the same way the arbitrary classifier does. Let us show that the VC dimension is $2^d$: let $S \subseteq \{0,1\}^d$ with $|S| = 2^d$ (that is, $S = \{0,1\}^d$), and let $y_1, \ldots, y_{2^d}$ be arbitrary labels. Since we can classify any subset we wish as 1, we can choose a binary decision tree in which the leaf corresponding to each input $s_i$ is classified as $y_i$. We showed that a set of size $2^d$ can be shattered, which means $\mathrm{VCdim} \ge 2^d$; and since the domain $\{0,1\}^d$ contains only $2^d$ points, $\mathrm{VCdim} = 2^d$.

HW3: Programming Assignment
Aran Carmon, Inbal Joffe

6. a. We created two functions to plot the training and validation errors for various $\eta$ and $C$. How to run:

    python q.py 6 find_eta <from> <to> <step> <C> <T> <filename>
    python q.py 6 find_c <from> <to> <step> <eta> <T> <filename>

We first start with scanning for $\eta$ along a logarithmic scale, with $T = 1000$ and $C = 1$:

[plot: training and validation error vs. $\eta$]

Both the training error and the validation error are shown in the plot, and we see they are almost the same. We continue to scan for $C$, using $T = 1000$ and $\eta$ = …:

[plot: error vs. $C$]

We zoom in, and use $T = 10000$:

[plot: error vs. $C$, zoomed in]

A sketch of the kind of scan these functions perform appears below.
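The scanning code itself lives in q.py and is not reproduced in the transcription; a minimal sketch of such a scan, with the training routine and error measure passed in by the caller (both hypothetical placeholders, since the assignment's own functions are not shown):

```python
import numpy as np

def scan_parameter(values, train_fn, error_fn, train_set, valid_set):
    """For each candidate value, train a model and record its train/validation errors."""
    results = []
    for v in values:
        model = train_fn(v, train_set)
        results.append((v, error_fn(model, train_set), error_fn(model, valid_set)))
    return results

# Candidate etas on a logarithmic scale, as described in the text:
etas = 10.0 ** np.arange(-10.0, 3.0)
```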

We choose the parameters $\eta$ = … and $C$ = … .

b. Weights for the digits, shown as images:

[images: learned weight vectors for the digits, rendered as images]

We see that some of the weights resemble the digits they classify, e.g. 2, 3, and 9. Other weights look more like a mix of other digits. How to run:

    python q.py 6 show_digit <C> <eta> <T> <digit> <filename>

c. Using $T = 4\cdot\mathrm{len(train\_data)}$ = …, we measured an accuracy of … . How to run:

    python q.py 6 calc_accuracy <C> <eta> <T>

7. a. We created two functions to plot the training and validation errors for various $\eta$ and $C$. How to run:

    python q.py 7 find_eta <kernel> <training size> <from> <to> <step> <C> <T> <filename>
    python q.py 7 find_c <kernel> <training size> <from> <to> <step> <eta> <T> <filename>

For quicker tuning of the parameters, we used a training set of only 1000 points, sampled randomly each time from the full training set. We started by scanning for an $\eta$ value, with $T = 1000$ and $C = 1$:

[plot: accuracy vs. $\eta$]

The accuracy is mostly uniform at the lower part of the plot, so we continue to scan for a $C$ value with $\eta = 10^{-6}$:

[plot: accuracy vs. $C$]

The accuracy across $C$ values seems uniform, so we choose $C = 1$.

b. With $C = 1$, $\eta = 10^{-6}$, and $T = 10000$, we measured an accuracy of … on the test set. How to run:

    python q.py 7 calc_accuracy <kernel> <C> <eta> <T>

c. We measured an accuracy of … with an RBF kernel ($\sigma = 1000$), $T = 10000$, $C = 1$, $\eta = 10^{-6}$, which is comparable to the quadratic kernel. Due to time constraints, we did not investigate it further. How to run:

    python q.py 7 calc_accuracy r…
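For reference, the two kernels question 7 refers to, in their standard forms (a sketch; the exact normalization and constants used by q.py are not shown in the transcription):

```python
import numpy as np

def quadratic_kernel(x, z):
    # A common form of the quadratic kernel; q.py may use a different constant term.
    return (1.0 + x @ z) ** 2

def rbf_kernel(x, z, sigma=1000.0):   # sigma = 1000 as in 7(c)
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))
```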
