CS 189 Spring 2013
Introduction to Machine Learning
Midterm

You have 1 hour 20 minutes for the exam.
The exam is closed book, closed notes except your one-page crib sheet.
Please use non-programmable calculators only.
Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
All short answer sections can be successfully answered in a few sentences AT MOST.
For true/false questions, fill in the True/False bubble.
For multiple-choice questions, fill in the bubbles for ALL CORRECT CHOICES (in some cases, there may be more than one). For a question with p points and k choices, every false positive will incur a penalty of p/(k - 1) points.

First name                    Last name                    SID

For staff use only:
Q1. True/False                    /14
Q2. Multiple Choice Questions     /21
Q3. Short Answers                 /15
Total                             /50

Q1. [14 pts] True/False

(a) [1 pt] In Support Vector Machines, we maximize $\|w\|_2^2$ subject to the margin constraints.
(b) [1 pt] In kernelized SVMs, the kernel matrix $K$ has to be positive definite.
(c) [1 pt] If two random variables are independent, then they have to be uncorrelated.
(d) [1 pt] Isocontours of Gaussian distributions have axes whose lengths are proportional to the eigenvalues of the covariance matrix.
(e) [1 pt] The RBF kernel ($K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$) corresponds to an infinite-dimensional mapping of the feature vectors.
(f) [1 pt] If $(X, Y)$ are jointly Gaussian, then $X$ and $Y$ are also Gaussian distributed.
(g) [1 pt] A function $f(x, y, z)$ is convex if the Hessian of $f$ is positive semi-definite.
(h) [1 pt] In a least-squares linear regression problem, adding an $L_2$ regularization penalty cannot decrease the $L_2$ error of the solution $w$ on the training data.
(i) [1 pt] In linear SVMs, the optimal weight vector $w$ is a linear combination of training data points.
(j) [1 pt] In stochastic gradient descent, we take steps in the exact direction of the gradient vector.
(k) [1 pt] In a two-class problem, when the class conditionals $P(x \mid y = 0)$ and $P(x \mid y = 1)$ are modelled as Gaussians with different covariance matrices, the posterior probabilities turn out to be logistic functions.
(l) [1 pt] The perceptron training procedure is guaranteed to converge if the two classes are linearly separable.
(m) [1 pt] The maximum likelihood estimate for the variance of a univariate Gaussian is unbiased.
(n) [1 pt] In linear regression, using an $L_1$ regularization penalty term results in sparser solutions than using an $L_2$ regularization penalty term.

Q2. [21 pts] Multiple Choice Questions

(a) [2 pts] If $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$, then the variance of $Y$ is:
    $a\sigma^2 + b$
    $a^2\sigma^2 + b$
    $a\sigma^2$
    $a^2\sigma^2$

(b) [2 pts] In soft margin SVMs, the slack variables $\xi_i$ defined in the constraints $y_i(w^T x_i + b) \geq 1 - \xi_i$ have to be
    $< 0$
    $\leq 0$
    $> 0$
    $\geq 0$

(c) [4 pts] Which of the following transformations, when applied on $X \sim N(\mu, \Sigma)$, transforms it into an axis-aligned Gaussian? ($\Sigma = UDU^T$ is the spectral decomposition of $\Sigma$)
    $U^{-1}(X - \mu)$
    $(UD^{1/2})^{-1}(X - \mu)$
    $(UD)^{-1}(X - \mu)$
    $U(X - \mu)$
    $UD(X - \mu)$
    $\Sigma^{-1}(X - \mu)$

(d) [2 pts] Consider the sigmoid function $f(x) = 1/(1 + e^{-x})$. The derivative $f'(x)$ is
    $f(x)\ln f(x) + (1 - f(x))\ln(1 - f(x))$
    $f(x)\ln(1 - f(x))$
    $f(x)(1 - f(x))$
    $f(x)(1 + f(x))$

(e) [2 pts] In regression, using an $L_2$ regularizer is equivalent to using a ______ prior.
    Laplace, $\frac{1}{2\beta}\exp(-|x|/\beta)$
    Gaussian with $\Sigma = cI$, $c \in \mathbb{R}$
    Exponential, $\frac{1}{\beta}\exp(-x/\beta)$, for $x > 0$
    Gaussian with diagonal covariance ($\Sigma \neq cI$, $c \in \mathbb{R}$)

(f) [2 pts] Consider a two-class classification problem with the loss matrix given as $\begin{pmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \end{pmatrix}$. Note that $\lambda_{ij}$ is the loss for classifying an instance from class $j$ as class $i$. At the decision boundary, the ratio $\frac{P(\omega_2 \mid x)}{P(\omega_1 \mid x)}$ is equal to:
    $\frac{\lambda_{11} - \lambda_{22}}{\lambda_{21} - \lambda_{12}}$
    $\frac{\lambda_{11} - \lambda_{21}}{\lambda_{22} - \lambda_{12}}$
    $\frac{\lambda_{11} + \lambda_{22}}{\lambda_{21} + \lambda_{12}}$
    $\frac{\lambda_{11} - \lambda_{12}}{\lambda_{22} - \lambda_{21}}$

(g) [2 pts] Consider the $L_2$ regularized loss function for linear regression $L(w) = \frac{1}{2}\|Y - Xw\|^2 + \lambda\|w\|^2$, where $\lambda$ is the regularization parameter. The Hessian matrix $\nabla_w^2 L(w)$ is
    $X^T X$
    $2\lambda X^T X$
    $X^T X + 2\lambda I$
    $(X^T X)^{-1}$

(h) [2 pts] The geometric margin in a hard margin Support Vector Machine is
    $\|w\|_2^2$
    $\frac{1}{\|w\|_2}$
    $\frac{2}{\|w\|_2}$
    $\|w\|_2$

(i) [3 pts] Which of the following functions are convex?
    $\sin(x)$
    $|x|$
    $\min(f_1(x), f_2(x))$, where $f_1$ and $f_2$ are convex
    $\max(f_1(x), f_2(x))$, where $f_1$ and $f_2$ are convex

Q3. [15 pts] Short Answers

(a) [4 pts] For a hard margin SVM, give an expression to calculate $b$ given the solutions for $w$ and the Lagrange multipliers $\{\alpha_i\}_{i=1}^N$.

Solution: Using the KKT conditions $\alpha_i(y_i(w^T x_i + b) - 1) = 0$, we know that for support vectors $\alpha_i \neq 0$. Thus for any $i$ with $\alpha_i \neq 0$, $y_i(w^T x_i + b) = 1$ and thus $b = y_i - w^T x_i$. For numerical stability, we can take an average over all the support vectors $S_v$:
$b = \frac{\sum_{x_i \in S_v} (y_i - w^T x_i)}{|S_v|}$

(b) Consider a Bernoulli random variable $X$ with parameter $p$ ($P(X = 1) = p$). We observe the following samples of $X$: $(1, 1, 0, 1)$.

(i) [2 pts] Give an expression for the likelihood as a function of $p$.

Solution: $L(p) = p^3(1 - p)$

(ii) [2 pts] Give an expression for the derivative of the negative log likelihood.

Solution: $\frac{d\,NLL(p)}{dp} = \frac{1}{1 - p} - \frac{3}{p}$

(iii) [1 pt] What is the maximum likelihood estimate of $p$?

Solution: $p = \frac{3}{4}$

(c) [6 pts] Consider the weighted least squares problem in which you are given a dataset $\{(x_i, y_i, w_i)\}_{i=1}^N$, where $w_i$ is an importance weight attached to the $i$-th data point. The loss is defined as $L(\beta) = \sum_{i=1}^N w_i (y_i - \beta^T x_i)^2$. Give an expression to calculate the coefficients $\beta$ in closed form. Hint: You might need to use a matrix $W$ such that $\mathrm{diag}(W) = [w_1\ w_2\ \ldots\ w_N]^T$.

Solution: Define $Y = [y_1\ y_2\ \ldots\ y_N]^T$ and let $X$ be the matrix whose rows are $x_1^T, x_2^T, \ldots, x_N^T$. Then $L(\beta) = (Y - X\beta)^T W (Y - X\beta)$. Setting $\frac{dL(\beta)}{d\beta} = 0$, we get $\beta = (X^T W X)^{-1} X^T W Y$.
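The closed-form answers above are easy to sanity-check numerically. The following minimal NumPy sketch (not part of the original exam; the synthetic data and variable names such as N, d, and beta_true are illustrative assumptions) verifies the weighted least-squares solution $\beta = (X^T W X)^{-1} X^T W Y$ against an equivalent ordinary least-squares fit on rescaled data, and confirms that $p = 3/4$ zeroes the Bernoulli negative log-likelihood derivative from (b)(ii).

import numpy as np

# Hypothetical synthetic weighted-regression data (names are illustrative, not from the exam).
rng = np.random.default_rng(0)
N, d = 50, 3
X = rng.normal(size=(N, d))                  # rows are x_i^T
beta_true = np.array([1.5, -2.0, 0.5])
Y = X @ beta_true + 0.1 * rng.normal(size=N)
w = rng.uniform(0.5, 2.0, size=N)            # importance weights w_i
W = np.diag(w)

# Q3(c): closed form beta = (X^T W X)^{-1} X^T W Y
beta_closed = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)

# Cross-check: weighted least squares equals ordinary least squares on rows scaled by sqrt(w_i).
sw = np.sqrt(w)
beta_lstsq, *_ = np.linalg.lstsq(X * sw[:, None], Y * sw, rcond=None)
print(np.allclose(beta_closed, beta_lstsq))  # expected: True

# Q3(b): p = 3/4 makes the derivative of the negative log-likelihood, 1/(1-p) - 3/p, vanish.
p = 0.75
print(np.isclose(1.0 / (1.0 - p) - 3.0 / p, 0.0))  # expected: True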

Scratch paper

Scratch paper