Geometric View of Machine Learning Nearest Neighbor Classification Slides adapted from Prof. Carpuat

What we know so far
Decision Trees
- What a decision tree is, and how to induce it from data
Fundamental Machine Learning Concepts
- The difference between memorization and generalization
- What inductive bias is, and what its role in learning is
- What underfitting and overfitting mean
- How to take a task and cast it as a learning problem
- Why you should never, ever touch your test data!!

Linear Algebra
- Provides a compact representation of data: for a given example, all its features can be represented as a single vector, and an entire dataset as a single matrix.
- Provides ways of manipulating these objects: dot products, vector/matrix operations, etc.
- Provides formal ways of describing and discovering patterns in data: examples are points in a vector space, and we can use norms and distances to compare them.
- Some of these tools apply to our feature data types as-is; others can be made to apply with some generalization.
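As a concrete illustration (mine, not from the slides), the sketch below represents a hypothetical 3-example, 2-feature dataset with NumPy, one row per example:

```python
import numpy as np

# One example = one feature vector (made-up 2-feature example).
x = np.array([5.1, 3.5])

# An entire dataset = one matrix: each row is an example, each column a feature.
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 2.9]])

print(X.shape)   # (3, 2): 3 examples, 2 features
print(X[1])      # feature vector of the second example
```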

Mathematical view of vectors
- A vector is an ordered set of numbers, e.g. (1, 2, 3, 4).
- Examples: the (x, y, z) coordinates of a point in space; the 16384 pixels in a 128 x 128 image of a face; the list of choices in the tennis example.
- Vectors are usually indicated with bold lower-case letters, scalars with lower-case letters.
- Usual mathematical operations with vectors:
  - Addition u + v, with identity 0 (v + 0 = v) and inverse -v (v + (-v) = 0)
  - Scalar multiplication, with distributive rules a(u + v) = au + av and (a + b)u = au + bu
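As a quick illustration (my own, not part of the slides), the sketch below checks these vector-space properties numerically with NumPy:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
a, b = 2.0, -1.5

# Addition, identity, and inverse
assert np.allclose(v + np.zeros(3), v)         # v + 0 = v
assert np.allclose(v + (-v), np.zeros(3))      # v + (-v) = 0

# Distributive rules for scalar multiplication
assert np.allclose(a * (u + v), a * u + a * v)  # a(u + v) = au + av
assert np.allclose((a + b) * u, a * u + b * u)  # (a + b)u = au + bu
```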

Dot Product
The dot product (or, more generally, inner product) of two vectors is a scalar:
v_1 · v_2 = x_1 x_2 + y_1 y_2 + z_1 z_2 (in 3D)
It is useful for many purposes:
- Computing the Euclidean length of a vector: length(v) = sqrt(v · v)
- Normalizing a vector, making it unit-length
- Computing the angle θ between two vectors: u · v = ||u|| ||v|| cos(θ)
- Checking two vectors for orthogonality (u · v = 0)
- Projecting one vector onto another
Other ways of measuring length and distance are possible.
[Figure: two vectors u and v with the angle θ between them]
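A minimal NumPy sketch of these uses of the dot product (my own illustration, not code from the course):

```python
import numpy as np

u = np.array([3.0, 0.0])
v = np.array([1.0, 1.0])

# Euclidean length: length(v) = sqrt(v . v)
length = np.sqrt(v @ v)

# Normalizing: a unit-length vector in the same direction
v_unit = v / length

# Angle between u and v: u . v = ||u|| ||v|| cos(theta)
cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(cos_theta)          # pi/4 here, i.e. 45 degrees

# Orthogonality check: dot product is (close to) zero
orthogonal = np.isclose(u @ np.array([0.0, 2.0]), 0.0)

# Projection of v onto u
proj_v_on_u = (u @ v) / (u @ u) * u   # = [1, 0]

print(length, v_unit, theta, orthogonal, proj_v_on_u)
```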

Vector norms
For v = (x_1, x_2, ..., x_n):
- Two-norm (Euclidean norm): ||v||_2 = sqrt( x_1^2 + x_2^2 + ... + x_n^2 ). If ||v||_2 = 1, v is a unit vector.
- Infinity norm: ||v||_inf = max( |x_1|, |x_2|, ..., |x_n| )
- One-norm ("Manhattan distance"): ||v||_1 = |x_1| + |x_2| + ... + |x_n|
Exercise: for a 2-dimensional vector, write down the set of vectors with two-, one-, and infinity-norm equal to unity.
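The sketch below (an illustration of mine, not in the slides) computes these three norms directly and cross-checks them against NumPy's built-in norm:

```python
import numpy as np

v = np.array([3.0, -4.0])

two_norm = np.sqrt(np.sum(v ** 2))        # 5.0
one_norm = np.sum(np.abs(v))              # 7.0
inf_norm = np.max(np.abs(v))              # 4.0

# Cross-check against np.linalg.norm
assert np.isclose(two_norm, np.linalg.norm(v, 2))
assert np.isclose(one_norm, np.linalg.norm(v, 1))
assert np.isclose(inf_norm, np.linalg.norm(v, np.inf))

print(two_norm, one_norm, inf_norm)
```

As a note on the exercise above: in 2D, the vectors with unit two-norm form a circle, those with unit one-norm a diamond, and those with unit infinity-norm a square.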

Nearest Neighbor
- Intuition: points that are close in feature space are likely to belong to the same class, so choosing the right features is very important.
- Nearest Neighbor (NN) algorithms for classification: K-NN, epsilon-ball NN
- Fundamental Machine Learning Concepts: decision boundary

Intuition for Nearest Neighbor Classification
Simple idea:
- Store all training examples.
- Classify a new example based on the labels of its K closest training examples.
- "Training" may just involve building data structures that make finding the closest examples cheaper.
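A minimal sketch of this idea (my own illustration, assuming Euclidean distance and binary labels in {-1, +1}; not code from the course):

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    """Predict the label of x_test by majority vote among its k nearest
    training examples, using Euclidean (L2) distance."""
    dists = np.linalg.norm(X_train - x_test, axis=1)   # distance to every training example
    nearest = np.argsort(dists)[:k]                    # indices of the k closest examples
    return 1 if np.sum(y_train[nearest]) > 0 else -1   # majority vote for labels in {-1, +1}

# Tiny made-up dataset: two features, labels in {-1, +1}
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.0], [4.2, 3.9]])
y_train = np.array([-1, -1, +1, +1])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))   # -> -1
print(knn_predict(X_train, y_train, np.array([4.1, 4.0]), k=3))   # -> +1
```

Note that "training" here is nothing more than keeping X_train and y_train around; all the work happens at prediction time.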

K Nearest Neighbor classification
Classification is based on:
- The training data
- A test instance with unknown class in {-1, +1}
- K: the number of neighbors to consider

Two approaches to learning
Eager learning (e.g. decision trees):
- Learn/Train: induce an abstract model from data
- Test/Predict/Classify: apply the learned model to new data
Lazy learning (e.g. nearest neighbors):
- Learn: just store the data in memory
- Test/Predict/Classify: compare new data to the stored data
- Properties: retains all information seen in training, has a complex hypothesis space, and classification can be very slow

Components of a k-NN Classifier
- Distance metric: how do we measure distance between instances? Determines the layout of the example space.
- The k hyperparameter: how large a neighborhood should we consider? Determines the complexity of the hypothesis space.

Distance metrics
We can use any distance function to select nearest neighbors; different distances yield different neighborhoods.
- L2 distance (= Euclidean distance)
- L1 distance ("Manhattan distance")
- Max norm (L-infinity distance)
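To make the three options concrete, here is a small sketch (my own, not from the slides) of each distance as a function of two feature vectors:

```python
import numpy as np

def l2_distance(a, b):
    """Euclidean distance: sqrt(sum_i (a_i - b_i)^2)."""
    return np.sqrt(np.sum((a - b) ** 2))

def l1_distance(a, b):
    """Manhattan distance: sum_i |a_i - b_i|."""
    return np.sum(np.abs(a - b))

def max_distance(a, b):
    """Max-norm (L-infinity) distance: max_i |a_i - b_i|."""
    return np.max(np.abs(a - b))

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
print(l2_distance(a, b), l1_distance(a, b), max_distance(a, b))   # 5.0 7.0 4.0
```

Swapping one of these functions into the neighbor search changes which training points count as "nearest", and hence the shape of the neighborhoods and of the decision boundary.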

K=1 and Voronoi Diagrams
- Imagine we are given a bunch of training examples.
- Find the region of feature space that is closest to each training example (its Voronoi cell).
- Algorithm: if our test point falls in the region corresponding to a given training point, return that point's label.

Decision Boundary of a Classifier
It is simply the line (or surface) that separates the positive and negative regions of the feature space.
Why is it useful?
- It helps us visualize how examples will be classified across the entire feature space.
- It helps us visualize the complexity of the learned model.

Decision Boundaries for 1-NN

Decision Boundaries change with the distance function

Decision Boundaries change with K

The k hyperparameter
- Tunes the complexity of the hypothesis space:
  - If k = 1, every training example has its own neighborhood.
  - If k = N, the entire feature space is one neighborhood!
- Higher k yields smoother decision boundaries.
- How would you set k in practice?
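One common answer (a sketch of mine, not the course's prescribed recipe) is to hold out part of the training data as a validation set and pick the k with the best validation accuracy, reusing a k-NN predictor like the one sketched earlier:

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k):
    """Majority-vote k-NN with Euclidean distance, labels in {-1, +1}."""
    nearest = np.argsort(np.linalg.norm(X_train - x_test, axis=1))[:k]
    return 1 if np.sum(y_train[nearest]) > 0 else -1

def choose_k(X_train, y_train, X_val, y_val, candidate_ks=(1, 3, 5, 7, 9)):
    """Return the k with the highest accuracy on the held-out validation set."""
    def accuracy(k):
        preds = [knn_predict(X_train, y_train, x, k) for x in X_val]
        return np.mean(np.array(preds) == y_val)
    return max(candidate_ks, key=accuracy)

# Usage on a made-up dataset split into training and validation parts:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) + np.array([2, 0]) * rng.integers(0, 2, size=(100, 1))
y = np.where(X[:, 0] > 1, 1, -1)
X_train, y_train, X_val, y_val = X[:80], y[:80], X[80:], y[80:]
print(choose_k(X_train, y_train, X_val, y_val))
```

The validation set here is carved out of the training data; the test set stays untouched, consistent with the "never touch your test data" rule from the recap slide.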