CMSC 422 Introduction to Machine Learning, Lecture 4: Geometry and Nearest Neighbors. Furong Huang / furongh@cs.umd.edu

What we know so far
Decision Trees
- What a decision tree is, and how to induce one from data
Fundamental Machine Learning Concepts
- The difference between memorization and generalization
- What inductive bias is, and what its role in learning is
- What underfitting and overfitting mean
- How to take a task and cast it as a learning problem
- Why you should never, ever touch your test data!!

Today's Topics
- Nearest Neighbors (NN) algorithms for classification: K-NN, epsilon-ball NN
- Fundamental Machine Learning Concepts: the decision boundary

Linear Algebra
- Provides a compact representation of data: for a given example, all its features can be represented as a single vector, and an entire dataset can be represented as a single matrix.
- Provides ways of manipulating these objects: dot products, vector/matrix operations, etc.
- Provides formal ways of describing and discovering patterns in data: examples are points in a vector space, and we can use norms and distances to compare them. Some of these tools apply directly to our feature data types; others can be made to apply, with some generalization...
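As a concrete illustration (the feature values and labels below are made up), a small dataset can be stored as a NumPy matrix with one row per example and one column per feature:

```python
import numpy as np

# Hypothetical dataset: 4 examples, 3 features each
# (rows = examples, columns = features)
X = np.array([
    [1.0, 0.5, 2.0],
    [0.9, 0.4, 1.8],
    [3.0, 2.5, 0.1],
    [2.8, 2.7, 0.0],
])
y = np.array([+1, +1, -1, -1])   # one label per example

x0 = X[0]           # a single example is a vector (one row of the matrix)
print(X.shape)      # (4, 3): the entire dataset is a single matrix
```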

Mathematical view of vectors
- An ordered set of numbers: (1, 2, 3, 4). Examples: the (x, y, z) coordinates of a point in space; the 16384 pixels in a 128 x 128 image of a face; the list of choices in the tennis example.
- Vectors are usually indicated with bold lower-case letters, scalars with lower-case letters.
- Usual mathematical operations with vectors:
  - Addition u + v, with a zero vector 0 (v + 0 = v) and an inverse -v (v + (-v) = 0)
  - Scalar multiplication, satisfying the distributive rules a(u + v) = au + av and (a + b)v = av + bv
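These rules can be checked numerically; a quick sketch with arbitrary example vectors and scalars:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])    # arbitrary example vectors
v = np.array([4.0, 5.0, 6.0])
a, b = 2.0, -0.5                 # arbitrary scalars

print(np.allclose(v + 0, v))                      # zero vector: v + 0 = v
print(np.allclose(v + (-v), np.zeros(3)))         # inverse: v + (-v) = 0
print(np.allclose(a * (u + v), a * u + a * v))    # a(u + v) = au + av
print(np.allclose((a + b) * v, a * v + b * v))    # (a + b)v = av + bv
```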

Dot Product
The dot product (or, more generally, inner product) of two vectors is a scalar: u · v = u_x v_x + u_y v_y + u_z v_z (in 3D).
Useful for many purposes:
- Computing the Euclidean length of a vector: length(v) = sqrt(v · v)
- Normalizing a vector, making it unit-length
- Computing the angle between two vectors: u · v = |u| |v| cos(θ)
- Checking two vectors for orthogonality
- Projecting one vector onto another
Other ways of measuring length and distance are possible.
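A short sketch of these uses with NumPy (the vectors are arbitrary examples, not data from the lecture):

```python
import numpy as np

u = np.array([1.0, 0.0, 2.0])    # arbitrary example vectors
v = np.array([2.0, 1.0, 0.0])

dot = np.dot(u, v)                                 # scalar u · v
length_u = np.sqrt(np.dot(u, u))                   # Euclidean length of u
u_unit = u / length_u                              # normalized, unit-length u
cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(cos_theta)                       # angle between u and v, in radians
orthogonal = np.isclose(dot, 0.0)                  # True if u and v are orthogonal
print(dot, length_u, theta, orthogonal)
```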

Vector Norms
For x = (x_1, x_2, ..., x_d):
- Two norm (Euclidean norm): ||x||_2 = sqrt(sum_{i=1}^d x_i^2). If ||x||_2 = 1, x is a unit vector.
- Infinity norm: ||x||_inf = max(|x_1|, |x_2|, ..., |x_d|)
- One norm (Manhattan distance): ||x||_1 = sum_{i=1}^d |x_i|
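In code, all three norms are available through np.linalg.norm; a minimal sketch with an arbitrary example vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])              # arbitrary example vector

two_norm = np.linalg.norm(x, ord=2)         # sqrt(3^2 + 4^2 + 0^2) = 5.0
one_norm = np.linalg.norm(x, ord=1)         # |3| + |-4| + |0| = 7.0
inf_norm = np.linalg.norm(x, ord=np.inf)    # max(|3|, |-4|, |0|) = 4.0
x_unit = x / two_norm                       # unit vector: its two norm is 1
print(two_norm, one_norm, inf_norm)
```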

Law of Large Numbers
Suppose that X_1, X_2, ..., X_n are independent and identically distributed random variables. The empirical sample average approaches the population average as the number of samples goes to infinity:
Pr( lim_{n -> infinity} (1/n) sum_{i=1}^n X_i = E[X_1] ) = 1
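A quick simulation illustrates the statement; the Bernoulli distribution and its mean below are made-up choices:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                   # hypothetical population mean (Bernoulli parameter)
samples = rng.binomial(1, p, size=100_000)

for n in [10, 100, 1_000, 100_000]:
    print(n, samples[:n].mean())          # the sample average drifts toward p as n grows
```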

Nearest Neighbor Intuition
- Points that are close in feature space are likely to belong to the same class.
- Choosing the right features is very important.
Plan: Nearest Neighbors (NN) algorithms for classification (K-NN, epsilon-ball NN), and a fundamental machine learning concept, the decision boundary.

Intuition for Nearest Neighbor Classification
"This rule of nearest neighbor has considerable elementary intuitive appeal and probably corresponds to practice in many situations. For example, it is possible that much medical diagnosis is influenced by the doctor's recollection of the subsequent history of an earlier patient whose symptoms resemble in some way those of the current patient." (Fix and Hodges, 1952)

Intuition for Nearest Neighbor Classification
Simple idea:
- Store all training examples.
- Classify new examples based on the labels of the K closest training examples.
- "Training" may just involve building data structures that make finding the closest examples cheaper.

K Nearest Neighbor Classification
- Training data: labeled examples
- K: the number of neighbors that classification is based on
- Test instance: an example with unknown class in {-1, +1}
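A minimal sketch of this prediction rule (not the exact KNN-Predict pseudocode from CIML; the Euclidean distance, NumPy data layout, and tie-breaking toward +1 are my own assumptions):

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    """Predict the label of x_test from the labels of its k nearest training examples.

    X_train: (n, d) array of training feature vectors
    y_train: length-n array of labels in {-1, +1}
    x_test:  length-d feature vector
    """
    # Euclidean (L2) distance from the test point to every training point
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels; a tie is broken toward +1 here
    vote = np.sum(y_train[nearest])
    return +1 if vote >= 0 else -1
```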

Two approaches to learning
Eager learning (e.g., decision trees):
- Learn/Train: induce an abstract model from data
- Test/Predict/Classify: apply the learned model to new data
Lazy learning (e.g., nearest neighbors):
- Learn: just store the data in memory
- Test/Predict/Classify: compare new data to the stored data
- Properties: retains all information seen in training; complex hypothesis space; classification can be very slow

Components of a k-NN Classifier
- Distance metric: how do we measure distance between instances? Determines the layout of the example space.
- The k hyperparameter: how large a neighborhood should we consider? Determines the complexity of the hypothesis space.

Distance metrics
We can use any distance function to select nearest neighbors; different distances yield different neighborhoods.
- L2 distance (= Euclidean distance)
- L1 distance (= Manhattan distance)
- Max norm (L-infinity distance)
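A small sketch of the three distance functions (the points are arbitrary; the purpose is to show that which neighbor is "nearest" can change with the metric):

```python
import numpy as np

def l2(a, b):
    return np.sqrt(np.sum((a - b) ** 2))   # Euclidean distance

def l1(a, b):
    return np.sum(np.abs(a - b))           # Manhattan distance

def linf(a, b):
    return np.max(np.abs(a - b))           # max-norm distance

x = np.array([0.0, 0.0])                   # query point
p = np.array([3.0, 3.0])                   # two arbitrary candidate neighbors
q = np.array([0.0, 4.5])

for name, dist in [("L2", l2), ("L1", l1), ("Linf", linf)]:
    print(name, dist(x, p), dist(x, q))    # under L2, p is closer; under L1, q is closer
```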

K=1 and Voronoi Diagrams
- Imagine we are given a bunch of training examples.
- Find the regions in feature space that are closest to each training example.
- Algorithm: if our test point is in the region corresponding to a given training point, return that point's label.

Decision Boundary of a Classifier
It is simply the line (more generally, the surface) that separates the positive and negative regions of the feature space.
Why is it useful?
- It helps us visualize how examples will be classified across the entire feature space.
- It helps us visualize the complexity of the learned model.
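One common way to draw such a boundary in 2D is to evaluate the classifier on a dense grid and look at where the predicted label flips; a sketch, assuming the knn_predict function from the earlier example and 2-dimensional features:

```python
import numpy as np

# Assumes knn_predict, X_train, y_train from the earlier sketch, with 2-D features.
xs = np.linspace(-5, 5, 200)
ys = np.linspace(-5, 5, 200)
grid_labels = np.array([[knn_predict(X_train, y_train, np.array([x, y]), k=3)
                         for x in xs] for y in ys])
# The decision boundary is wherever grid_labels switches between -1 and +1;
# e.g., matplotlib's plt.contourf(xs, ys, grid_labels) would shade the two regions.
```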

Decision Boundaries for 1-NN

Decision Boundaries change with the distance function

Decision Boundaries change with K

The k hyperparameter
- Tunes the complexity of the hypothesis space.
- If k = 1, every training example has its own neighborhood.
- If k = N, the entire feature space is one neighborhood!
- Higher k yields smoother decision boundaries.
How would you set k in practice?
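One standard answer (not spelled out on the slide, so treat it as a suggestion) is to try several values of k and keep the one that performs best on held-out validation data, never the test data; a sketch reusing the hypothetical knn_predict from above:

```python
import numpy as np

def validation_accuracy(X_tr, y_tr, X_val, y_val, k):
    """Fraction of validation examples that k-NN labels correctly."""
    preds = np.array([knn_predict(X_tr, y_tr, x, k=k) for x in X_val])
    return np.mean(preds == y_val)

# X_tr, y_tr, X_val, y_val are assumed to already exist (e.g., from a train/validation split).
candidate_ks = [1, 3, 5, 7, 9, 15]
best_k = max(candidate_ks, key=lambda k: validation_accuracy(X_tr, y_tr, X_val, y_val, k))
```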

What is the inductive bias of k-NN?
- Nearby instances should have the same label.
- All features are equally important.
- Complexity is tuned by the k parameter.

Variations on k-nn: Weighted voting Weighted voting Default: all neighbors have equal weight Extension: weight neighbors by (inverse) distance

Variations on k-nn: Epsilon Ball Nearest Neighbors Same general principle as K-NN, but change the method for selecting which training examples vote Instead of using K nearest neighbors, use all examples x such that!"#$%&'( )*, ).

Exercise: How would you modify KNN-Predict to perform Epsilon Ball NN?
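One possible answer, as a sketch (the fallback label when no training example lies inside the ball is my own choice, not something specified in the lecture):

```python
import numpy as np

def epsilon_ball_predict(X_train, y_train, x_test, epsilon=1.0):
    """Vote over all training examples within distance epsilon of x_test."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    in_ball = dists <= epsilon            # select voters by distance, not by count
    if not np.any(in_ball):
        return +1                         # empty ball: fall back to a default label
    vote = np.sum(y_train[in_ball])
    return +1 if vote >= 0 else -1
```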

Furong Huang 3251 A.V. Williams, College Park, MD 20740 301.405.8010 / furongh@cs.umd.edu