Machine Learning: 10701 and 15781, 2003 Assignment 4

1. VC Dimension (30 points)

Consider the space of instances X corresponding to all points in the 2D (x1, x2) plane. Give the VC dimension of the following hypothesis spaces. No explanation required.

(a) H_r = the set of all axis-parallel rectangles in the (x1, x2) plane. That is, H_r = { (a < x1 < b) AND (c < x2 < d) : a, b, c, d ∈ R }. Points inside the rectangle are classified as positive.

Answer: 4 (see the brute-force check after part (b) below)
[diagram of four example points omitted]

(b) H_a = like (a), but including all rectangles, not just the ones parallel to the axes of the coordinate system.

Answer: at least 5
[diagram of five example points omitted]
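For illustration only (not part of the original handout), here is a small Matlab/Octave brute-force check of the part (a) answer: it enumerates all 16 labelings of a hypothetical four-point diamond configuration and verifies that each labeling's positive subset can be enclosed by an axis-parallel rectangle that excludes the negatives. The point set and all variable names are our own choices.

% Brute-force check that the four points below are shattered by
% axis-parallel rectangles (points inside the rectangle are positive).
P = [0 1; 1 0; 0 -1; -1 0];                 % hypothetical diamond arrangement, one point per row
n = size(P, 1);
shattered = true;
for labeling = 0:2^n - 1
    pos = logical(bitget(labeling, 1:n));   % which points must be labeled positive
    if ~any(pos), continue; end             % all-negative labeling: use an empty rectangle
    lo = min(P(pos, :), [], 1) - 0.1;       % enlarge the bounding box a little so the
    hi = max(P(pos, :), [], 1) + 0.1;       % positive points lie strictly inside it
    for j = find(~pos)                      % no negative point may fall inside the box
        if all(P(j, :) > lo & P(j, :) < hi)
            shattered = false;
        end
    end
end
fprintf('Diamond point set shattered by axis-parallel rectangles: %d\n', shattered);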

(c) H_d = real-valued, depth-2 decision trees. For example, a tree that first splits on x1 (e.g. x1 > x10), then splits each branch on a second attribute with its own threshold, and assigns + or - at each of the four leaves, is in H_d.

Answer 1: at least 5 (assuming the attributes used in the 1st and 2nd splits must be different)
Answer 2: at least 6 otherwise

(e) (Zero points) H = hypotheses of the form f_θ(sin(α x)), with x, α ∈ R, where f_θ(z) = 1 iff z > 0 and 0 otherwise. You can consider this question in 1D space. Zero points: Only Do This For Fun If You Would Like To And If You Have Time.

Answer: infinite (read Burges' tutorial for details).

2. Support Vector Machines (45 points)

The following question requires you to use Matlab, but it has been designed to be just as easy for a Matlab novice as for a Matlab expert. Please read the appendix for how to use Matlab. We will investigate Support Vector Machines with two toy datasets. The file d.clean contains examples, each of which has a real-valued input attribute x_i and a class label y_i. The data set is generated in the following way: x_i ~ p(x), where p(x) = 0.5 iff -1 ≤ x ≤ 1 and p(x) = 0 otherwise; the class label is y_i = +1 if x_i ≥ 0 and y_i = -1 if x_i < 0. In addition, we add noise to this dataset by negating the class of some examples, which produces the noisy dataset d.noise.

Training an SVM involves setting up a convex quadratic optimization problem and solving it. Matlab is a common mathematical programming language that facilitates this by providing quadratic programming functions, qp or quadprog. A Matlab program, svm.m, has been prepared for you that trains the SVM on each of the datasets and outputs results including margin width, training error, number of support vectors and number of misclassifications. In this assignment, you are asked to investigate the impact of the trade-off weight C on margin width and training error. Consider the objective function for the non-separable case:

    L = (1/2) ||w||^2 + C Σ_i ξ_i

The margin width is defined as Margin = 2 / ||w||, and the training set error is defined as Error = (1/n) Σ_i ξ_i.

(a) Read the program and complete the setting up of the Hessian matrix H, and the lower and upper bounds for the Lagrange multipliers α (Alpha). Note that the Hessian matrix H is exactly the Q matrix in our slides.

Answer:
    H(i,j) = X(i,:) * X(j,:)' * Y(i) * Y(j)
    LB = zeros(nsample, 1)
    UB = C * ones(nsample, 1)
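The handout program svm.m is not reproduced here, so the following is only a minimal sketch (Matlab/Octave, using the Optimization Toolbox function quadprog) of how the completed H, LB and UB feed into the dual problem. The names X, Y, C, nsample and Alpha follow the answer above; everything else is our own choice and is not taken from svm.m.

% Minimal sketch of the soft-margin SVM dual solved with quadprog.
% Assumes X is nsample-by-d, Y is nsample-by-1 with entries +1/-1, and C is given.
nsample = size(X, 1);
H   = (Y * Y') .* (X * X');                    % H(i,j) = Y(i)*Y(j)*X(i,:)*X(j,:)' (the Q matrix)
f   = -ones(nsample, 1);                       % minimize 0.5*a'*H*a - sum(a)
LB  = zeros(nsample, 1);                       % 0 <= alpha_i
UB  = C * ones(nsample, 1);                    % alpha_i <= C
Aeq = Y'; beq = 0;                             % sum_i alpha_i * y_i = 0
Alpha = quadprog(H, f, [], [], Aeq, beq, LB, UB);
w  = X' * (Alpha .* Y);                        % w = sum_i alpha_i y_i x_i
sv = find(Alpha > 1e-6 & Alpha < C - 1e-6);    % margin support vectors
if isempty(sv), sv = find(Alpha > 1e-6); end   % approximate fallback for extreme C
b  = mean(Y(sv) - X(sv, :) * w);               % bias from y_i*(w'*x_i + b) = 1 on margin SVs
margin = 2 / norm(w);                          % margin width as defined above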

(b) Measure the impact of the trade-off weight C on the margin width and training error with the clean dataset d.clean. Turn in plots showing how margin width and training error vary with C, including the values 0.01, 0.1, 1, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000 and inf.

(c) Measure the impact of the trade-off weight C on the margin width and training error with the noisy dataset d.noise. Turn in plots showing how margin width and training error vary with C, including the values 0.01, 0.1, 1, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000 and inf.

(d) Briefly explain your findings. Some hints:
- When C is small (C = 0.01, 0.1, 1, 5): because w = Σ_i α_i y_i x_i and 0 ≤ α_i ≤ C => w is small => large margin 2/||w||. Please note that the margin value calculated from the formula 2/||w|| may be wrong when C is small.
- When C gets too large (C = inf): the increase of margin and training error for the noisy data when C = inf is probably due to numerical instability. Moreover, setting C to inf is equivalent to assuming the dataset is separable, which isn't true for the noisy data.
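A minimal sketch of such a C sweep is shown below. This is our own code, not the handout's svm.m; the dataset generation follows our reading of the description above, and names such as Cs, margins and trainerr are arbitrary.

% Sketch of the C sweep for parts (b)-(d): for each C, train the SVM and
% record the margin width 2/||w|| and the training misclassification rate.
Cs = [0.01 0.1 1 5 10 20 50 100 200 500 1000 2000 5000 10000];   % C = inf omitted in this sketch
n  = 100;                                 % sample size (our choice)
x  = 2 * rand(n, 1) - 1;                  % x uniform on [-1, 1], density 0.5
y  = sign(x); y(y == 0) = 1;              % clean labels
% flip = rand(n, 1) < 0.1; y(flip) = -y(flip);   % uncomment to make a noisy version
margins  = zeros(size(Cs));
trainerr = zeros(size(Cs));
for k = 1:numel(Cs)
    C     = Cs(k);
    H     = (y * y') .* (x * x');
    Alpha = quadprog(H, -ones(n, 1), [], [], y', 0, zeros(n, 1), C * ones(n, 1));
    w     = x' * (Alpha .* y);
    sv    = find(Alpha > 1e-6 & Alpha < C - 1e-6);
    if isempty(sv), sv = find(Alpha > 1e-6); end
    b     = mean(y(sv) - x(sv) * w);
    margins(k)  = 2 / abs(w);                       % 1-D input, so ||w|| = |w|
    trainerr(k) = mean(sign(x * w + b) ~= y);       % training misclassification rate
end
semilogx(Cs, margins, '-o');  xlabel('C'); ylabel('margin width 2/||w||');
figure;
semilogx(Cs, trainerr, '-o'); xlabel('C'); ylabel('training error');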

3. K-Nearest Neighbor in Regression (25 points)

Suppose we have a real-valued training dataset {(x_1, y_1), ..., (x_N, y_N)}, x_i ∈ R, y_i ∈ R, which is generated using the following distribution:

    y_i ~ N(μ, σ²), where μ is unknown to us. Note that we assume the variance σ² is known.
    x_i ~ p(x), where p(x) = 1 for 0 ≤ x ≤ 1, otherwise p(x) = 0.

Our task is to compare the performance of the following two regression algorithms.

Alg 1: Use Maximum Likelihood Estimation (MLE) to learn μ from the dataset. The MLE results in μ̂ = (1/N) Σ_i y_i. For any input x, the output is simply μ̂.

Alg 2: Use 1-Nearest Neighbor to predict. Namely, ŷ(x) = y_j, where y_j is the output value associated with the training set datapoint x_j that is the nearest neighbor of x. If there is a tie among multiple training datapoints for being the nearest neighbor of x, then we just randomly select one of them.

(a) Assume N → ∞. What is the expected squared error of Alg 1 and Alg 2 on the training set?

For Alg 1, lim_{N→∞} (1/N) Σ_i (ŷ(x_i) - y_i)² = ?
For Alg 2, lim_{N→∞} (1/N) Σ_i (ŷ(x_i) - y_i)² = ?

For Alg 1:
Answer: lim_{N→∞} (1/N) Σ_i (ŷ(x_i) - y_i)² = lim_{N→∞} (1/N) Σ_i (μ̂ - y_i)² = σ²
(as N → ∞, μ̂ → μ, and the average squared deviation of the y_i around μ converges to σ²)

For Alg 2:
Answer 1: Because x is a real-valued continuous variable, theoretically |x_i - x_j| > 0 for any two training examples x_i and x_j, i ≠ j. Namely, no two training examples have the same x value, so each training point is its own nearest neighbor. Therefore,
    lim_{N→∞} (1/N) Σ_i (ŷ(x_i) - y_i)² = 0

Answer 2: When we use a computer to generate x, we will find that x becomes discrete due to the accuracy loss of the computer. Moreover, there will be an infinite number of training examples with the same x value when N → ∞. In this case the tie is broken at random, the chance of selecting x_i itself vanishes, and a tied neighbor j ≠ i contributes E[(y_j - y_i)²] = 2σ², so
    lim_{N→∞} (1/N) Σ_i (ŷ(x_i) - y_i)² = 2σ²

(b) Assume N → ∞. What is the expected squared error of Alg 1 and Alg 2 for predicting the output of a future data point (x, y), generated in the same way as the training data?

For Alg 1, E[(ŷ(x) - y)²] = ?
For Alg 2, E[(ŷ(x) - y)²] = ?

For Alg 1:
Answer: E[(ŷ(x) - y)²] = E[(μ̂ - y)²] = σ²   (as N → ∞, μ̂ → μ)

For Alg 2:
Answer:

E[(ŷ(x) - y)²] = E[(y_NN - y)²]
               = E[(y_NN - μ)²] + E[(y - μ)²]   (y_NN and y are independent draws from N(μ, σ²), so the cross term vanishes)
               = σ² + σ²
               = 2σ²
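The 2σ² result can also be checked empirically. Below is a small simulation sketch (our own, Matlab/Octave; μ, σ, N and the number of trials are arbitrary choices) comparing the two algorithms' squared error on a future data point.

% Simulation sketch for part (b): with y ~ N(mu, sigma^2) drawn independently
% of x ~ Uniform(0, 1), the mean predictor's expected squared error on a new
% point approaches sigma^2 while the 1-NN predictor's approaches 2*sigma^2.
mu = 3; sigma = 1; N = 5000; trials = 2000;
err_mle = zeros(trials, 1);
err_nn  = zeros(trials, 1);
for t = 1:trials
    x  = rand(N, 1);  y  = mu + sigma * randn(N, 1);   % training set
    xq = rand;        yq = mu + sigma * randn;          % future data point
    yhat_mle = mean(y);                                  % Alg 1: MLE of the mean
    [~, j]   = min(abs(x - xq));                         % Alg 2: nearest neighbor in x
    yhat_nn  = y(j);
    err_mle(t) = (yhat_mle - yq)^2;
    err_nn(t)  = (yhat_nn  - yq)^2;
end
fprintf('Alg 1 (MLE):  %.3f   (theory: sigma^2   = %.3f)\n', mean(err_mle), sigma^2);
fprintf('Alg 2 (1-NN): %.3f   (theory: 2*sigma^2 = %.3f)\n', mean(err_nn),  2 * sigma^2);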