Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important

Non-parametric techniques / Instance Based Learning
AKA: nearest neighbor methods, non-parametric, lazy, memory-based, or case-based learning. Copyright 2005 by David Helmbold.

Do not fit a model (as LTUs, decision trees, etc. do). Includes Nearest Neighbor and density estimation methods. Variable-sized hypothesis space, like decision trees. Lazy hypothesis: no gradient descent, optimization, or search (filed under "lazy" in Weka).

Nearest Neighbor Algorithm
Instances (the x's) are vectors of reals. Store the n training examples (x_1, y_1), ..., (x_n, y_n). To predict on a new x, find the x_i closest to x and predict with y_i. Comments: this is not just simple table lookup, and the square root can be avoided by minimizing squared distance instead. (A minimal code sketch follows after these slides.)

NN Decision Boundaries
The boundaries form a Voronoi diagram: very flexible, and they get more complicated with additional points.

Nearest Neighbor Applications
Astronomy (classifying objects), medicine (diagnosis), object detection, character recognition (shape matching), and many others (the basic theory dates from the 1950's and 60's).

Distance metric important
Consider expensive houses with features: number of bedrooms (1 to 5), lot size in acres (1/6 to 1/2, plus a tail), and house square feet (1200 to 3000). The difference in square feet dominates the distance. Irrelevant attributes (e.g. how far away was the owner born?) add variability. Correlated attributes are also bad.
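The following is a minimal sketch of the 1-NN predictor described above, written in Python with NumPy. It is not the lecture's code; the function names are illustrative, and the optional standardization step is one way to address the scale problem from the house example.

```python
import numpy as np

def standardize(X):
    """Rescale each attribute to mean 0, variance 1 (one of the 'tricks' below)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def nn_predict(X_train, y_train, x_new):
    """1-NN: return the label of the stored example closest to x_new.
    Squared Euclidean distance suffices -- dropping the square root never changes the argmin."""
    d2 = np.sum((X_train - x_new) ** 2, axis=1)
    return y_train[np.argmin(d2)]

# Toy usage: three stored houses (bedrooms, acres, square feet) with 0/1 labels.
X = np.array([[3, 0.25, 1500.0], [4, 0.30, 2800.0], [2, 0.17, 1300.0]])
y = np.array([0, 1, 0])
Xs = standardize(X)
query = (np.array([3, 0.20, 1400.0]) - X.mean(axis=0)) / X.std(axis=0)
print(nn_predict(Xs, y, query))
```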

Irrelevant attribute example
Let x_1 ∈ [0, 1] determine the class: y = 1 iff x_1 > 0.3. Consider predicting on (0, 0) given the data (0.1, x_2) labeled 0 and (0.5, x_2') labeled 1, where x_2 and x_2' are random draws from [0, 1]. The chance of error is roughly 15%!

Some tricks
Rescale attributes to mean 0, variance 1. Use a weight w_j on the j-th component: $\mathrm{Dist}(x, x') = \sum_j w_j (x_j - x'_j)^2$ with $w_j = I(x_j, y)$ (mutual information). Mahalanobis distance (covariance $\Sigma$, like LDA): $\mathrm{Dist}(x, x') = (x - x')^{\mathsf{T}} \Sigma^{-1} (x - x')$. (These weighted distances are sketched in code after these slides.)

Curse of Dimensionality
As the number of attributes d goes up, so does the volume. Consider 1000 training points in [0, 1]^d: how much space does each point cover? When d = 1, the interval per point is ~0.001. When d = 2, the area per point is ~0.001, so the side length is about 0.032. When d = 10, the volume per point is ~0.001, and the side length is ~0.5. We need exponentially many points (in d) to get good coverage.

K-d trees
Greatly speed up finding the nearest neighbor. Like a binary search tree, but organized around dimensions: each node tests a single dimension against a threshold (the median). Can use the highest-variance dimension or cycle through the dimensions. Growing a good K-d tree can be expensive. (A library-based lookup example appears below.)

Noise can cause problems
Noise example: assume the true label is always 1, but noise randomly corrupts labels 10% of the time (making them 0). Bayes optimal: always predict 1, so the test error is 10%. Nearest Neighbor: use the closest training point; 90% of the time it predicts 1 and 10% of those predictions are wrong, while 10% of the time it predicts 0 and 90% of those predictions are wrong. Overall it is wrong 0.9(0.1) + 0.1(0.9) = 18% of the time.
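As a small illustration of the distance "tricks" above, here is a NumPy sketch of the weighted squared distance and the Mahalanobis distance. The helper names and the toy covariance matrix are assumptions for the example, not from the slides.

```python
import numpy as np

def weighted_sq_dist(x, xp, w):
    """Dist(x, x') = sum_j w_j (x_j - x'_j)^2, e.g. with w_j set from mutual information."""
    d = x - xp
    return float(np.sum(w * d * d))

def mahalanobis_sq_dist(x, xp, cov):
    """Dist(x, x') = (x - x')^T Sigma^{-1} (x - x'), with Sigma the feature covariance."""
    d = x - xp
    return float(d @ np.linalg.inv(cov) @ d)

# Toy usage with a made-up 2x2 covariance matrix.
cov = np.array([[4.0, 1.0], [1.0, 1.0]])
x, xp = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(weighted_sq_dist(x, xp, w=np.array([0.2, 0.8])))
print(mahalanobis_sq_dist(x, xp, cov))
```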

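To illustrate the K-d tree speedup on the previous slide, here is a short sketch that uses SciPy's KD-tree implementation instead of building one by hand; it assumes SciPy is installed and is not code from the lecture.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.random((100_000, 3))            # 100k stored points in [0, 1]^3
y = (X[:, 0] > 0.5).astype(int)         # arbitrary labels, just for the demo

tree = cKDTree(X)                       # one-time (potentially expensive) construction
dist, idx = tree.query([0.2, 0.9, 0.4], k=1)   # fast nearest-neighbor lookup
print(dist, y[idx])
```

A brute-force scan touches all 100,000 stored points per query; the tree prunes most of them, which is the point of the data structure.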
K-nearest neighbor
Algorithm: find the closest k points and predict with their majority vote. k-NN is Bayes optimal in the limit as k and the training set size go to infinity (known since the 1960's). (A small sketch follows after these slides.)

Edited NN
Key idea: reduce memory and computation by storing only the important points. Heuristic: discard the points that are correctly predicted by the others (or, equivalently, keep the incorrectly predicted points). The remaining points are concentrated on the decision boundary. Finding a smallest subset of points that correctly labels the others is NP-complete.

Instance Based Density Estimation
Histogram method: break the instance space X into bins and use the samples falling into each bin to estimate probabilities. The histogram method is parametric, not instance based, and it has edge effects.

Smoother method: add a slice of probability to an area centered at each example rather than to a predetermined bin. In general a kernel function tells how the probability is added (see Duda and Hart); Gaussians are common. Also called Parzen windows. Often a width parameter controls the smoothing (like σ for Gaussians). (A Parzen-window sketch appears below.)

[Figures: a true distribution and a sample from it, with density estimates from histogram bins and from disks centered on the sample points, for widths σ = 2, 5, and 30.]
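A minimal sketch of the k-NN majority-vote rule described above, in Python with NumPy; the names are illustrative and ties in the vote are broken arbitrarily.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Find the k closest stored points and predict with their majority vote."""
    d2 = np.sum((X_train - x_new) ** 2, axis=1)   # squared distances are enough
    nearest = np.argsort(d2)[:k]                  # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

X = np.array([[0.1, 0.2], [0.4, 0.4], [0.9, 0.8], [0.8, 0.9], [0.2, 0.1]])
y = np.array([0, 0, 1, 1, 0])
print(knn_predict(X, y, np.array([0.7, 0.7]), k=3))
```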

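The Parzen-window idea above, sketched in one dimension with a Gaussian kernel; the bandwidth sigma plays the role of the width parameter, and the function name is an assumption for the example.

```python
import numpy as np

def parzen_density(x, sample, sigma=1.0):
    """Estimate p(x) by averaging a Gaussian bump centered at each stored example."""
    z = (x - sample) / sigma
    bumps = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2 * np.pi))
    return bumps.mean()

sample = np.array([1.0, 1.2, 0.8, 3.0, 3.1])   # 1-D training sample
for sigma in (0.2, 1.0):                       # small sigma undersmooths, large oversmooths
    print(sigma, parzen_density(1.0, sample, sigma))
```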
Final points
Smoothed nearest neighbor can also be used (the whole sample votes on each prediction, with weights depending on distance to the new point), but it may be computationally expensive. One can fix k and use the distance to the k-th nearest neighbor to estimate the density. Cross validation can be used to estimate the smoothing parameter. Density estimation can be used for P(x | class), with class labels then predicted via Bayes rule.

Nonparametric Regression
Also known as smoothing models. Regressogram:

$$\hat{g}(x) = \frac{\sum_{t=1}^{N} b(x, x^t)\, r^t}{\sum_{t=1}^{N} b(x, x^t)}, \qquad b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin as } x \\ 0 & \text{otherwise.} \end{cases}$$

[Figures: regressogram fits; h is the bin size.]

Running Mean/Kernel Smoother
Running mean smoother:

$$\hat{g}(x) = \frac{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)}, \qquad w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Running line smoother (locally linear). Kernel smoother:

$$\hat{g}(x) = \frac{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)}$$

where K(·) is a Gaussian kernel. See also additive models (Hastie and Tibshirani, 1990). (Both smoothers are sketched in code below.)
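A compact sketch of the running-mean and Gaussian-kernel smoothers defined above, in Python; the function names and the toy data are illustrative assumptions.

```python
import numpy as np

def running_mean(x, X, r, h):
    """Average the r^t whose x^t fall within h of x (boxcar weight w)."""
    w = (np.abs((x - X) / h) < 1).astype(float)
    return np.sum(w * r) / np.sum(w) if w.sum() > 0 else float("nan")

def kernel_smoother(x, X, r, h):
    """Nadaraya-Watson estimate with a Gaussian kernel K."""
    K = np.exp(-0.5 * ((x - X) / h) ** 2)
    return np.sum(K * r) / np.sum(K)

X = np.array([0.0, 0.5, 1.0, 1.5, 2.0])   # inputs x^t
r = np.array([0.1, 0.4, 1.1, 0.9, 2.1])   # responses r^t
print(running_mean(1.0, X, r, h=0.6), kernel_smoother(1.0, X, r, h=0.6))
```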

How to Choose k or h?
When k or h is small, single instances matter: bias is small and variance is large (undersmoothing), i.e. high complexity. As k or h increases, we average over more instances, so variance decreases but bias increases (oversmoothing), i.e. low complexity. Cross-validation is used to fine-tune k or h (a small sketch follows below).

Tree and NN comparison

                    Decision Trees         Nearest Neighbor
    Model           Trees - flexible       Instance based, flexible
    Data            Mixed                  Usually numeric
    Interpretable   If the tree is small   Only in 1 or 2 dimensions
    Missing values  Tricks                 Not in the training set, but OK for test points
    Noise/outliers  Good with pruning      Good with kNN

Tree and kNN Robustness

                              Decision tree    Nearest neighbor
    Monotone transformation   Great            Very bad
    Irrelevant features       Fair             Very bad
    Computation time          OK               Lazy - expensive
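A hedged sketch of choosing k by leave-one-out cross-validation, as the slide above suggests; all names and the synthetic data are illustrative, and the same loop works for the bandwidth h of a smoother.

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x_new, k):
    nearest = np.argsort(np.sum((X - x_new) ** 2, axis=1))[:k]
    return Counter(y[nearest]).most_common(1)[0][0]

def loo_error(X, y, k):
    """Leave-one-out cross-validation error of k-NN on (X, y)."""
    wrong = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i                        # hold out example i
        wrong += knn_predict(X[mask], y[mask], X[i], k) != y[i]
    return wrong / len(X)

rng = np.random.default_rng(1)
X = rng.random((60, 2))
y = ((X[:, 0] + 0.1 * rng.standard_normal(60)) > 0.5).astype(int)   # noisy labels
for k in (1, 3, 5, 9):
    print(k, loo_error(X, y, k))                             # pick the k with lowest error
```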