MIMA Group. Chapter 4 Non-Parameter Estimation. School of Computer Science and Technology, Shandong University. Xin-Shun Xu @ SDU

Transcription:

MIMA Group. Chapter 4: Non-Parameter Estimation. Xin-Shun Xu @ SDU, School of Computer Science and Technology, Shandong University

Contents. Introduction; Parzen Windows; K-Nearest-Neighbor Estimation; Classification Techniques: the Nearest-Neighbor rule (1-NN) and the Nearest-Neighbor rule (k-NN); Distance Metrics.

Bayes Rule for Classification. $P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\,P(\omega_i)}{p(x)}$. To compute the posterior probability $P(\omega_i \mid x)$, we need to know the prior probability $P(\omega_i)$ and the likelihood $p(x \mid \omega_i)$. Case I: $p(x \mid \omega_i)$ has a certain parametric form, which is handled by Maximum-Likelihood Estimation or Bayesian Parameter Estimation. Problem: the assumed parametric form may not fit the ground-truth density encountered in practice, e.g., the assumed parametric form is unimodal while the ground truth is multimodal.

Non-Parameter Estimation. Case II: $p(x \mid \omega_i)$ does not have a parametric form. How? Let the data speak for themselves! Two techniques: Parzen Windows and $k_n$-Nearest-Neighbor.

Goals. Estimate the class-conditional densities $p(x \mid \omega_i)$. Estimate the posterior probabilities $P(\omega_i \mid x)$.

Density Estimation. Assume $p(x)$ is continuous and the region $R$ is small. Fundamental fact: the probability of a vector $x$ falling into a region $R$ is $P_R = P(X \in R) = \int_R p(x')\,dx' \approx p(x)\,V_R$, where $V_R = \int_R dx'$ is the volume of $R$. Given $n$ i.i.d. examples $\{x_1, x_2, \ldots, x_n\}$, let $K$ denote the random variable representing the number of samples falling into $R$. $K$ follows a binomial distribution: $K \sim B(n, P_R)$, $P(K = k) = \binom{n}{k} P_R^{k} (1 - P_R)^{n-k}$.

Density Estimation. Since $K \sim B(n, P_R)$, we have $E[K] = n P_R$, so $P_R = E[K]/n$. Combined with $P_R \approx p(x)\,V_R$, this gives $p(x) \approx \frac{E[K]/n}{V_R}$. Let $k_R$ denote the actual number of samples in $R$; then $p(x) \approx \frac{k_R/n}{V_R}$.

Density Estimation. Use the subscript $n$ to take the sample size into account: $p_n(x) = \frac{k_n/n}{V_n}$. We hope that $\lim_{n\to\infty} p_n(x) = p(x)$. To achieve this, we should have $\lim_{n\to\infty} V_n = 0$, $\lim_{n\to\infty} k_n = \infty$, and $\lim_{n\to\infty} k_n/n = 0$.

Density Estimation. $p_n(x) = \frac{k_n/n}{V_n}$. What items can be controlled, and how? Fix $V_n$ and determine $k_n$: Parzen Windows. Fix $k_n$ and determine $V_n$: $k_n$-Nearest-Neighbor.

Parzen Windows. $p_n(x) = \frac{k_n/n}{V_n}$: fix $V_n$ and determine $k_n$. Assume $R_n$ is a $d$-dimensional hypercube whose edges have length $h_n$, so $V_n = h_n^d$. Determine $k_n$ with a window function, a.k.a. kernel function or potential function. Emanuel Parzen (1929-).

Window function. It defines a unit hypercube centered at the origin: $\varphi(u) = 1$ if $|u_j| \le 1/2$ for $j = 1, \ldots, d$, and $\varphi(u) = 0$ otherwise.

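Window function. $\varphi\left(\frac{x - x_i}{h_n}\right) = 1$ means that $x_i$ falls within the hypercube of volume $V_n$ centered at $x$: $\varphi\left(\frac{x - x_i}{h_n}\right) = 1$ if $|x_j - x_{ij}| \le h_n/2$ for $j = 1, \ldots, d$, and $0$ otherwise. $k_n$ is the number of samples inside the hypercube centered at $x$: $k_n = \sum_{i=1}^{n} \varphi\left(\frac{x - x_i}{h_n}\right)$.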
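
The counting rule on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than code from the lecture: the function names `hypercube_window` and `parzen_hypercube` and the sample points are assumptions chosen for the example.

```python
def hypercube_window(u):
    """Window function phi(u): 1 inside the unit hypercube centered at the origin, else 0."""
    return 1 if all(abs(u_j) <= 0.5 for u_j in u) else 0


def parzen_hypercube(x, samples, h):
    """Estimate p_n(x) = (k_n / n) / V_n with a hypercube of edge length h centered at x."""
    d = len(x)
    volume = h ** d  # V_n = h_n^d
    # k_n = sum over samples of phi((x - x_i) / h_n)
    k = sum(
        hypercube_window([(x_j - s_j) / h for x_j, s_j in zip(x, s)])
        for s in samples
    )
    return (k / len(samples)) / volume


# Hypothetical 2-d samples; two of the four fall inside the unit square around the origin.
samples = [(0.0, 0.0), (0.4, 0.1), (2.0, 2.0), (0.0, 0.9)]
estimate = parzen_hypercube((0.0, 0.0), samples, h=1.0)
print(estimate)  # k_n = 2, n = 4, V_n = 1, so the estimate is 0.5
```

Counting with a hard-edged window makes the estimate piecewise constant, which is one motivation for replacing the hypercube with a smooth pdf as the window function.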

Parzen Window Estimation. $p_n(x) = \frac{k_n/n}{V_n} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{V_n}\varphi\left(\frac{x - x_i}{h_n}\right)$ (the Parzen pdf). $\varphi(\cdot)$ is not limited to the hypercube window function defined previously. It could be any pdf: $\varphi(u) \ge 0$ and $\int \varphi(u)\,du = 1$.

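Parzen Window Estimation. Is $p_n(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{V_n}\varphi\left(\frac{x - x_i}{h_n}\right)$ a pdf? Set $u = (x - x_i)/h_n$, so that $dx = h_n^d\,du = V_n\,du$. Then $\int p_n(x)\,dx = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{V_n}\int \varphi\left(\frac{x - x_i}{h_n}\right)dx = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{V_n}\int V_n\,\varphi(u)\,du = \frac{1}{n}\sum_{i=1}^{n}\int \varphi(u)\,du = 1$. The Parzen pdf is thus determined by the window function (itself a pdf), the window width $h_n$, and the training data $x_1, \ldots, x_n$.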

Parzen Window Estimation. Define $\delta_n(x) = \frac{1}{V_n}\varphi\left(\frac{x}{h_n}\right)$; then $p_n(x) = \frac{1}{n}\sum_{i=1}^{n}\delta_n(x - x_i)$. $p_n(x)$ is a superposition of $n$ interpolations: each $x_i$ contributes to $p_n(x)$ based on its distance from $x$. What is the effect of $h_n$ (the window width) on the Parzen pdf?

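Parzen Window Estimation: the effect of $h_n$. $\delta_n(x) = \frac{1}{V_n}\varphi\left(\frac{x}{h_n}\right) = \frac{1}{h_n^d}\varphi\left(\frac{x}{h_n}\right)$. Inside the argument of $\varphi$, $h_n$ affects the width (horizontal scale); through the factor $1/h_n^d$, it affects the amplitude (vertical scale).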

Parzen Window Estimation. $\delta_n(x) = \frac{1}{V_n}\varphi\left(\frac{x}{h_n}\right)$. Suppose $\varphi(\cdot)$ is a 2-d Gaussian pdf. [Figure: the shape of $\delta_n(x)$ with decreasing values of $h_n$.]

Parzen Window Estimation. $p_n(x) = \frac{1}{n}\sum_{i=1}^{n}\delta_n(x - x_i)$ with $\delta_n(x) = \frac{1}{V_n}\varphi\left(\frac{x}{h_n}\right)$. When $h_n$ is very large, $\delta_n(x)$ is broad with small amplitude, and $p_n(x)$ is the superposition of $n$ broad, slowly changing functions, i.e., smooth with low resolution. When $h_n$ is very small, $\delta_n(x)$ is sharp with large amplitude, and $p_n(x)$ is the superposition of $n$ sharp pulses, i.e., variable/unstable with high resolution.
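
The effect of the window width can be seen numerically with a small sketch. It assumes a one-dimensional Gaussian window; the names `gaussian_window` and `parzen_gaussian_1d`, the three sample points, and the two widths are hypothetical choices for illustration only.

```python
import math


def gaussian_window(u):
    """Standard normal pdf, used as the window function phi(u) in one dimension."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_gaussian_1d(x, samples, h):
    """p_n(x) = (1/n) * sum_i (1/h) * phi((x - x_i) / h) for one-dimensional data."""
    return sum(gaussian_window((x - s) / h) / h for s in samples) / len(samples)


samples = [0.0, 1.0, 4.0]  # hypothetical training points

for h in (0.1, 2.0):
    at_sample = parzen_gaussian_1d(0.0, samples, h)  # on top of a training point
    in_gap = parzen_gaussian_1d(2.5, samples, h)     # between training points
    print(f"h={h}: p(0.0)={at_sample:.4f}, p(2.5)={in_gap:.4f}")
```

With the small width the estimate is a tall spike at each training point and nearly zero in the gap; with the large width the two values are of similar size, which is the smooth, low-resolution behavior described on the slide.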

Parzen Window Estimation. [Figure: Parzen window estimates for five samples, supposing that $\varphi(\cdot)$ is a 2-d Gaussian pdf, with $p_n(x) = \frac{1}{n}\sum_{i=1}^{n}\delta_n(x - x_i)$ and $\delta_n(x) = \frac{1}{V_n}\varphi\left(\frac{x}{h_n}\right)$.]

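Parzen Window Estimation: convergence conditions. To ensure convergence, i.e., $\lim_{n\to\infty} E[p_n(x)] = p(x)$ and $\lim_{n\to\infty} \mathrm{Var}[p_n(x)] = 0$, we have the following additional constraints: $\sup_u \varphi(u) < \infty$; $\lim_{\|u\|\to\infty} \varphi(u)\prod_{i=1}^{d} u_i = 0$; $\lim_{n\to\infty} V_n = 0$; $\lim_{n\to\infty} n V_n = \infty$.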

Illustrations. One-dimension case: $X \sim N(0, 1)$, $\varphi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$, $p_n(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}\varphi\left(\frac{x - x_i}{h_n}\right)$, $h_n = h_1/\sqrt{n}$.

Illustrations. One-dimension case (second figure): $\varphi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$, $p_n(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n}\varphi\left(\frac{x - x_i}{h_n}\right)$, $h_n = h_1/\sqrt{n}$.

Illustrations. Two-dimension case: $p_n(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_n^2}\varphi\left(\frac{x - x_i}{h_n}\right)$, $h_n = h_1/\sqrt{n}$.

Classification Example. [Figure: smaller window versus larger window.]

Choosing Window Function. $V_n$ must approach zero when $n \to \infty$, but at a rate slower than $1/n$, e.g., $V_n = V_1/\sqrt{n}$. The value of the initial volume $V_1$ is important. In some cases, a cell volume is proper for one region but unsuitable in a different region.
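
As a quick check, the example schedule $V_n = V_1/\sqrt{n}$ does satisfy both volume constraints among the convergence conditions (the cells shrink, yet slowly enough that $n V_n$ still grows without bound):

```latex
\lim_{n\to\infty} V_n = \lim_{n\to\infty} \frac{V_1}{\sqrt{n}} = 0,
\qquad
\lim_{n\to\infty} n V_n = \lim_{n\to\infty} V_1 \sqrt{n} = \infty .
```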

$k_n$-Nearest Neighbor. $p_n(x) = \frac{k_n/n}{V_n}$: fix $k_n$ and then determine $V_n$. To estimate $p(x)$, we can center a cell about $x$ and let it grow until it captures $k_n$ samples, where $k_n$ is some specified function of $n$, e.g., $k_n = \sqrt{n}$. Principled rule to choose $k_n$: $\lim_{n\to\infty} k_n = \infty$ and $\lim_{n\to\infty} V_n = 0$.
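
A minimal one-dimensional sketch of this estimate follows. It assumes the cell is the smallest interval centered at $x$ that reaches the $k$-th nearest sample; the function name `knn_density_1d` and the sample points are illustrative, not taken from the slides.

```python
def knn_density_1d(x, samples, k):
    """Estimate p_n(x) = (k / n) / V_n, where V_n is the length of the smallest
    interval centered at x that contains the k nearest samples."""
    distances = sorted(abs(x - s) for s in samples)
    radius = distances[k - 1]  # distance to the k-th nearest sample
    volume = 2.0 * radius      # interval [x - radius, x + radius]
    if volume == 0.0:
        return float("inf")    # x coincides with at least k samples
    return (k / len(samples)) / volume


samples = [1.0, 2.0, 4.0, 7.0]  # hypothetical training points
print(knn_density_1d(3.0, samples, k=2))  # radius 1, V_n = 2, estimate (2/4)/2 = 0.25
```

Unlike the Parzen approach, the cell size here adapts to the data: it is small where samples are dense and large where they are sparse.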

$k_n$-Nearest Neighbor. [Figures: eight points in one dimension ($n = 8$, $d = 1$), red curve $k_n = 3$, black curve $k_n = 5$; thirty-one points in two dimensions ($n = 31$, $d = 2$), black surface $k_n = 5$.]

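Estimation of A Posteriori Probability. $P_n(\omega_i \mid x) = ?$ Place a cell of volume $V_n$ around $x$ that captures $k$ samples, $k_i$ of which are labeled $\omega_i$. Then $p_n(x, \omega_i) = \frac{k_i/n}{V_n}$ and $P_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{\sum_{j=1}^{c} p_n(x, \omega_j)} = \frac{k_i}{k}$, where $k = \sum_{j=1}^{c} k_j$. The value of $V_n$ or $k_n$ can be determined based on the Parzen window or the $k_n$-nearest-neighbor technique.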
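
The ratio $k_i/k$ is easy to compute directly. The sketch below assumes the $k$-nearest-neighbor variant with Euclidean distance; `knn_posterior` and the toy data are hypothetical names and values used only for illustration.

```python
import math
from collections import Counter


def knn_posterior(x, samples, labels, k):
    """Return {label: k_i / k}, where k_i counts the label among the k nearest samples."""
    order = sorted(range(len(samples)), key=lambda i: math.dist(x, samples[i]))
    counts = Counter(labels[i] for i in order[:k])
    return {label: count / k for label, count in counts.items()}


# Hypothetical labeled 2-d samples.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b", "b"]

posterior = knn_posterior((0.2, 0.2), samples, labels, k=3)
print(posterior)  # the three nearest samples carry labels a, a, b
```

Classes that do not appear among the $k$ nearest samples are simply absent from the returned dictionary, which corresponds to an estimated posterior of zero.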

Nearest Neighbor Classifier. Store all training examples. Given a new example $x$ to be classified, search for the training example $(x_i, y_i)$ whose $x_i$ is most similar (or closest) to $x$, and predict $y_i$ (lazy learning). $P_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{\sum_{j=1}^{c} p_n(x, \omega_j)} = \frac{k_i}{k}$.
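
A minimal sketch of the rule, assuming Euclidean distance as the similarity measure; the function name `nearest_neighbor_predict` and the three training examples are made up for the example.

```python
import math


def nearest_neighbor_predict(x, training_set):
    """Return the label y_i of the stored example (x_i, y_i) whose x_i is closest to x."""
    _, label = min(training_set, key=lambda example: math.dist(x, example[0]))
    return label


# "Training" is just storing the examples (lazy learning).
training_set = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((5.0, 5.0), "b")]
print(nearest_neighbor_predict((4.0, 4.5), training_set))  # closest stored point is (5, 5)
```

All the work happens at prediction time: every query scans the full training set, which is why both memory and computation grow with the number of stored examples.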

Decision Boundaries. The Voronoi diagram: given a set of points, a Voronoi diagram describes the areas that are nearest to any given point. These areas can be viewed as zones of control.

Decision Boundaries. The decision boundary is formed by retaining only those line segments that separate different classes. The more training examples we have stored, the more complex the decision boundaries can become.

Decision Boundaries. With a large number of examples and noise in the labels, the decision boundary can become nasty! It can be bad sometimes: note the islands in this figure, which are formed because of noisy examples. If the nearest neighbor happens to be a noisy point, the prediction will be incorrect. How to deal with this?

Effect of k. Different k values give different results. A large k produces smoother boundaries, because the impacts of class-label noise are canceled out by one another. What will happen when k is too large? Oversimplified boundaries: e.g., with k = N, we always predict the majority class.
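
Both effects show up on a tiny one-dimensional example. The sketch below is illustrative only: `knn_predict` is a hypothetical helper using majority vote, and the data set contains one deliberately mislabeled point.

```python
from collections import Counter


def knn_predict(x, training_set, k):
    """Majority vote among the k training examples closest to x (one-dimensional inputs)."""
    nearest = sorted(training_set, key=lambda example: abs(x - example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data: class "a" near 0..3, class "b" near 10..11,
# plus one noisy "b" label at 2.5 inside the "a" region.
training_set = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (2.5, "b"),
                (3.0, "a"), (10.0, "b"), (11.0, "b")]

for k in (1, 3, len(training_set)):
    print(k, knn_predict(2.6, training_set, k), knn_predict(10.5, training_set, k))
```

With k = 1 the query at 2.6 inherits the noisy label; with k = 3 the noise is outvoted; with k equal to the training-set size every query, including 10.5, receives the majority class "a".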

How to Choose k? Can we choose k to minimize the mistakes that we make on training examples (training error)? What is the training error of nearest-neighbor? Can we choose k to minimize the mistakes that we make on test examples (test error)?

How to Choose k? How do training error and test error change as we change the value of k?

Model Selection. Choosing k for k-NN is just one of the many model selection problems we face in machine learning. Model selection is about choosing among different models, e.g., linear regression vs. quadratic regression, or k-NN vs. decision tree. It is heavily studied in machine learning and of crucial importance in practice. If we use training error to select models, we will always choose more complex ones.

Model Selection. We can keep part of the labeled data apart as validation data. Evaluate different k values based on the prediction accuracy on the validation data, and choose the k that minimizes validation error. Validation can be viewed as another name for testing, but the name testing is typically reserved for final evaluation purposes, whereas validation is mostly used for model selection purposes.
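
The selection procedure can be sketched as follows. Everything here is an illustrative assumption: the helper names `knn_predict`, `validation_error`, and `choose_k`, the one-dimensional data, and the candidate values of k.

```python
from collections import Counter


def knn_predict(x, training_set, k):
    """Majority vote among the k training examples closest to x (one-dimensional inputs)."""
    nearest = sorted(training_set, key=lambda example: abs(x - example[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def validation_error(training_set, validation_set, k):
    """Fraction of held-out examples that k-NN misclassifies."""
    mistakes = sum(knn_predict(x, training_set, k) != y for x, y in validation_set)
    return mistakes / len(validation_set)


def choose_k(training_set, validation_set, candidates):
    """Pick the candidate k with the smallest validation error."""
    return min(candidates, key=lambda k: validation_error(training_set, validation_set, k))


# Hypothetical split of the labeled data; the training part has one noisy label at 2.5.
training_set = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (2.5, "b"),
                (3.0, "a"), (10.0, "b"), (11.0, "b")]
validation_set = [(2.6, "a"), (0.5, "a"), (10.4, "b"), (2.4, "a")]

for k in (1, 3, 7):
    print(k, validation_error(training_set, validation_set, k))
print("chosen k:", choose_k(training_set, validation_set, [1, 3, 7]))
```

On this toy split, k = 1 is hurt by the noisy label and k = 7 collapses to the majority class, so the intermediate value wins. With only four validation points the estimate is of course very coarse.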

Model Selection: the impact of validation set size. If we only reserve one point in our validation set, should we trust the validation error as a reliable estimate of our classifier's performance? The larger the validation set, the more reliable our model selection choices are. When the total labeled set is small, we might not be able to get a big enough validation set, leading to unreliable model selection decisions.

Model Selection: K-fold Cross Validation. Perform learning/testing K times. Each time, reserve one subset as the validation set and train on the rest. Special case: leave-one-out cross validation.
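
A compact sketch of the procedure for k-NN, under stated assumptions: folds are formed by index modulo the number of folds (a real experiment would typically shuffle the data first), and `knn_predict` and `cross_validation_error` are hypothetical helper names.

```python
from collections import Counter


def knn_predict(x, training_set, k):
    """Majority vote among the k training examples closest to x (one-dimensional inputs)."""
    nearest = sorted(training_set, key=lambda example: abs(x - example[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validation_error(data, k, folds):
    """Error rate over all rounds; round f validates on the items with index % folds == f."""
    mistakes = 0
    for f in range(folds):
        validation = [ex for i, ex in enumerate(data) if i % folds == f]
        training = [ex for i, ex in enumerate(data) if i % folds != f]
        mistakes += sum(knn_predict(x, training, k) != y for x, y in validation)
    return mistakes / len(data)  # each example is validated exactly once


# Hypothetical, well-separated data: three "a" points and three "b" points.
data = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (10.0, "b"), (11.0, "b"), (12.0, "b")]

print(cross_validation_error(data, k=1, folds=3))          # 3-fold
print(cross_validation_error(data, k=1, folds=len(data)))  # leave-one-out
print(cross_validation_error(data, k=5, folds=len(data)))  # leave-one-out, k too large
```

Leave-one-out is the special case where the number of folds equals the number of examples. In the last call, removing any point leaves its own class in the minority among the remaining five, so every prediction is wrong.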

Other issues of kNN. It can be computationally expensive to find the nearest neighbors! Speed up the computation by using smart data structures to quickly search for approximate solutions. For a large data set, it requires a lot of memory; remove unimportant examples.

Final words on KNN. KNN is what we call lazy learning (vs. eager learning). Lazy: learning only occurs when you see the test example. Eager: learn a model before you see the test example; training examples can be thrown away after learning. Advantages: conceptually simple, easy to understand and explain; very flexible decision boundaries; not much learning at all! Disadvantages: it can be hard to find a good distance measure; irrelevant features and noise can be very detrimental; typically cannot handle more than 30 attributes; computational cost: requires a lot of computation and memory.

Distance Metrics. Distance measurement is an important factor for the nearest-neighbor classifier, e.g., to achieve invariant pattern recognition and data mining results. Example: the effect of changing units.
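
The effect of changing units can be reproduced with two features measured on very different scales. The sketch below is a made-up illustration (heights and weights of two hypothetical people, Euclidean distance, helper name `nearest_index`); it is not the example shown on the slide.

```python
import math


def nearest_index(x, points):
    """Index of the point closest to x under Euclidean distance."""
    return min(range(len(points)), key=lambda i: math.dist(x, points[i]))


# Hypothetical people described by (height, weight in kg).
query_m = (1.80, 70.0)
points_m = [(1.60, 71.0), (1.79, 75.0)]      # height in meters

# Same data with height expressed in centimeters.
query_cm = (180.0, 70.0)
points_cm = [(160.0, 71.0), (179.0, 75.0)]

print(nearest_index(query_m, points_m))    # weight dominates the distance
print(nearest_index(query_cm, points_cm))  # height dominates the distance
```

The data are identical, yet the nearest neighbor switches from the first point to the second when height is rescaled, so the classifier's output depends on the choice of units unless the features are normalized or a suitable metric is chosen.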

Properties of a Distance Metric. Nonnegativity: $D(a, b) \ge 0$. Reflexivity: $D(a, b) = 0$ iff $a = b$. Symmetry: $D(a, b) = D(b, a)$. Triangle inequality: $D(a, b) + D(b, c) \ge D(a, c)$.

Minkowski Metric ($L_p$ Norm). $L_p(a, b) = \left(\sum_{i=1}^{d} |a_i - b_i|^p\right)^{1/p}$. 1. $L_1$ norm, Manhattan or city block distance: $L_1(a, b) = \sum_{i=1}^{d} |a_i - b_i|$. 2. $L_2$ norm, Euclidean distance: $L_2(a, b) = \left(\sum_{i=1}^{d} |a_i - b_i|^2\right)^{1/2}$. 3. $L_\infty$ norm, chessboard distance: $L_\infty(a, b) = \max_{i} |a_i - b_i|$.
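
The three special cases can be computed with one small function. This is a sketch with an assumed name, `minkowski_distance`, and it treats an infinite p as the chessboard case.

```python
import math


def minkowski_distance(a, b, p):
    """L_p distance between equal-length vectors a and b; p = math.inf gives L_infinity."""
    differences = [abs(a_i - b_i) for a_i, b_i in zip(a, b)]
    if math.isinf(p):
        return max(differences)  # chessboard distance
    return sum(diff ** p for diff in differences) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski_distance(a, b, 1))         # Manhattan: 3 + 4 = 7
print(minkowski_distance(a, b, 2))         # Euclidean: sqrt(9 + 16) = 5
print(minkowski_distance(a, b, math.inf))  # chessboard: max(3, 4) = 4
```

For the same pair of points the distance shrinks as p grows, because larger p puts more weight on the largest coordinate difference.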

Summary. Basic setting for non-parametric techniques: let the data speak for themselves. A parametric form is not assumed for the class-conditional pdf. Estimate the class-conditional pdf from training examples, then make predictions based on Bayes' theorem. Fundamental result in density estimation: $p_n(x) = \frac{k_n/n}{V_n}$.

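Summary. $p_n(x) = \frac{k_n/n}{V_n}$. Parzen Windows: fix $V_n$ and then determine $k_n$, giving the Parzen pdf $p_n(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{V_n}\varphi\left(\frac{x - x_i}{h_n}\right)$, which is determined by the window function (itself a pdf), the window width $h_n$, and the training data.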

Summary. $k_n$-Nearest-Neighbor: fix $k_n$ and then determine $V_n$. To estimate $p(x)$, we can center a cell about $x$ and let it grow until it captures $k_n$ samples, where $k_n$ is some specified function of $n$, e.g., $k_n = \sqrt{n}$. Principled rule to choose $k_n$: $\lim_{n\to\infty} k_n = \infty$ and $\lim_{n\to\infty} V_n = 0$.

Any Question?