Elements of Machine Intelligence - I


ECE-175A: Elements of Machine Intelligence - I. Ken Kreutz-Delgado, Nuno Vasconcelos. ECE Department, UCSD. Winter 2011.

The course. The course will cover basic, but important, aspects of machine learning and pattern recognition. We will cover a lot of ground; at the end of the quarter you'll know how to implement a lot of things that may seem very complicated today. Homework/Computer Assignments will count for 30% of the overall grade. The homework problems will be graded "A for effort". Exams: 1 mid-term, date TBA - 30%; 1 final - 40% (covers everything).

Resources. The course web page is accessible from http://dsp.ucsd.edu/~kreutz. All materials, except homework and exam solutions, will be available there. Solutions will be available in my office pod. Course Instructor: Ken Kreutz-Delgado, kreutz@ece.ucsd.edu, EBU1-5605. Office hours: Wednesday, Noon-1pm. Administrative Assistant: Travis Spackman (tspackman@ece.ucsd.edu), EBU1-5600, may sometimes be involved in administrative issues. Tutor/Grader: Omar Nadeem, nadeem@ucsd.edu. Office hours: Mon 4-6pm, Jacobs Hall (EBU-1) 4506; Wed 2:30-4:30pm, Jacobs Hall (EBU-1) 5706.

Texts. Required: Introduction to Machine Learning, 2e, Ethem Alpaydin, MIT Press, 2010. Suggested reference texts: Pattern Recognition and Machine Learning, C.M. Bishop, Springer, 2007; Pattern Classification, Duda, Hart, Stork, Wiley, 2001. Prerequisites you must know well: linear algebra, as in Linear Algebra, Strang, 1988; probability and conditional probability, as in Fundamentals of Applied Probability, Drake, McGraw-Hill, 1967.

The course. Why Machine Learning? There are many processes in the world that are ruled by deterministic equations, e.g. f = ma, V = IR, Maxwell's equations, and other physical laws, with acceptable levels of noise, error, and other variability. In such domains we don't need statistical learning. Learning is needed when we must predict, or classify, random variables Y: that represent events, situations, or objects in the world; that may (or may not) depend on other factors (variables) X; for which it is impossible or too difficult to derive an exact, deterministic behavioral equation; or when we must adapt to a constantly changing world.

Examples and Perspectives. Data-Mining viewpoint: large amounts of data that do not follow deterministic rules. E.g., given a history of thousands of customer records and some questions that I can ask you, how do I predict that you will pay on time? It is impossible to derive a theory for this; it must be learned. While many associate learning with data-mining, it is by no means the only important application or viewpoint. Signal Processing viewpoint: signals combine in ways that depend on hidden structure (e.g. speech waveforms depend on language, grammar, etc.), and signals are usually subject to significant amounts of noise (which sometimes means "things we do not know how to model").

Examples (cont'd). Signal Processing viewpoint: e.g. the Cocktail Party Problem: although there are all these people talking loudly at once, you can still understand what your friend is saying. How could you build a chip to separate the speakers (as well as your ear and brain can do)? Model the hidden dependence as a linear combination of independent sources plus noise. There are many other similar examples in the areas of wireless, communications, signal restoration, etc.
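
The slide's separation diagram is not reproduced; as a minimal sketch of "linear combination of independent sources + noise", here is a toy blind source separation using FastICA from scikit-learn. FastICA and the synthetic signals are my choices for illustration, not methods named in the lecture.

    # Toy blind source separation: two independent sources are mixed linearly
    # (plus noise) and recovered, up to permutation and scale, with FastICA.
    # Illustrative sketch; FastICA is an assumed choice, not named on the slide.
    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    t = np.linspace(0, 8, 2000)
    s1 = np.sin(2 * t)                                  # "speaker" 1
    s2 = np.sign(np.sin(3 * t))                         # "speaker" 2
    S = np.c_[s1, s2] + 0.05 * rng.standard_normal((2000, 2))   # sources + noise

    A = np.array([[1.0, 0.5],                           # unknown mixing ("the room")
                  [0.5, 1.0]])
    X = S @ A.T                                         # observed "microphone" signals

    S_hat = FastICA(n_components=2, random_state=0).fit_transform(X)
    # Each recovered component should correlate strongly with one true source.
    print(np.abs(np.corrcoef(S_hat.T, S.T)[:2, 2:]).round(2))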

Examples (cont'd). Perception/AI viewpoint: it is a complex world; one cannot model everything in detail. Rely on probabilistic models that explicitly account for the variability, and use the laws of probability to make inferences. E.g., P(burglar | alarm, no earthquake) is high, while P(burglar | alarm, earthquake) is low. There is a whole field that studies perception as Bayesian inference. In a sense, perception really is confirming what you already know: priors + observations = robust inference.
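
To make the alarm example concrete, here is a minimal sketch of the inference in code. All of the numbers are illustrative assumptions, not values from the lecture; the point is only that the same alarm observation yields very different posteriors once an earthquake explains it away.

    # Bayes rule for the alarm example. All probabilities are assumed
    # values for illustration only.
    def posterior_burglar(p_burglar, p_alarm_given_b, p_alarm_given_not_b):
        """P(burglar | alarm) = P(alarm | burglar) P(burglar) / P(alarm)."""
        joint_b = p_alarm_given_b * p_burglar
        joint_not_b = p_alarm_given_not_b * (1.0 - p_burglar)
        return joint_b / (joint_b + joint_not_b)

    # No earthquake: little else explains the alarm, so the posterior is high.
    print(posterior_burglar(0.001, p_alarm_given_b=0.95, p_alarm_given_not_b=0.0001))  # ~0.90
    # Earthquake: the alarm is likely even without a burglar, so it stays low.
    print(posterior_burglar(0.001, p_alarm_given_b=0.95, p_alarm_given_not_b=0.4))     # ~0.002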

Examples (cont'd). Communications Engineering viewpoint: detection problems: X → channel → Y. You observe Y and know something about the statistics of the channel. What was X? This is the canonical detection problem. For example, face detection in computer vision: I see pixel array Y. Is it a face?

What is Statistical Learning? Goal: given a function y = f(x) and a collection of example data-points, learn what the function f(.) is. This is called training. Two major types of learning: Unsupervised: only X is known; usually referred to as clustering. Supervised: both X and Y are known during training, and only X is known at test time; usually referred to as classification or regression.

Supervised Learning. X can be anything, but the type of the known data Y dictates the type of supervised learning problem: Y in {0,1} is referred to as Detection or Binary Classification; Y in {0,..., M-1} is referred to as (M-ary) Classification; Y continuous is referred to as Regression. The theories are quite similar, and the algorithms are similar most of the time. We will emphasize classification, but will talk about regression when it is particularly insightful.

Example. Classification of Fish: fish roll down a conveyor belt, a camera takes a picture, and we must decide: is this a salmon or a sea-bass? Q1: What is X? I.e., what features do I use to distinguish between the two fish? This is somewhat of an art form. Frequently, the best approach is to ask domain experts. E.g., the expert says to use overall length and width of scales.

Q2: How to do Classification/Detection? There are two major types of classifiers. Discriminant: determine the decision boundary in feature space that best separates the classes. Generative: fit a probability model to each class and then compare the probabilities to find a decision rule. A lot more on the intimate relationship between these two approaches later!

Caution. How do we know learning has worked? We care about generalization, i.e. accuracy outside the training set. Models that are too powerful on the training set can lead to over-fitting: e.g. in regression one can always exactly fit n points with a polynomial of order n-1. Is this good? How likely is the error to be small outside the training set? There is a similar problem for classification. Fundamental Rule: only held-out test-set performance results matter!!!
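
As a quick numerical illustration of this point (a toy example of my own, not from the lecture): a degree n-1 polynomial through n noisy points drives the training error to zero while the held-out error explodes, whereas a low-order fit generalizes far better.

    # Over-fitting demo: exactly fit n noisy points with a degree n-1 polynomial
    # and compare against a low-order fit on held-out data. Toy illustration;
    # the true function, noise level, and degrees are assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 10
    f = lambda x: np.sin(2 * np.pi * x)                # "true" function
    x_train = np.linspace(0.0, 1.0, n)
    y_train = f(x_train) + 0.2 * rng.standard_normal(n)
    x_test = rng.uniform(0.0, 1.0, 200)
    y_test = f(x_test) + 0.2 * rng.standard_normal(200)

    for degree in (2, n - 1):                          # modest fit vs. exact interpolation
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")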

Generalization. Good generalization requires controlling the trade-off between training and test error: when the training error is large, the test error is large; as the training error gets smaller, the test error gets smaller; but when the training error is smallest, the test error is largest. This trade-off is known by many names. In the generative classification world it is usually attributed to the bias-variance trade-off of the class models.

Generative Model Learning. Each class is characterized by a probability density function (class-conditional density), the so-called probabilistic generative model, e.g. a Gaussian. Training data is used to estimate the class pdf's. Overall, the process is referred to as density estimation. A nonparametric approach would be to estimate the pdf's using histograms.
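
The histogram figure from the original slide is not reproduced here; in its place, a minimal sketch of histogram-based density estimation (the Gaussian toy data is an assumption for illustration):

    # Nonparametric class-conditional density estimate via a histogram.
    # Minimal sketch; the Gaussian toy data is an assumed stand-in for
    # training samples from one class.
    import numpy as np

    rng = np.random.default_rng(2)
    samples = rng.normal(loc=1.0, scale=0.5, size=1000)     # one class's feature values

    counts, edges = np.histogram(samples, bins=20)
    density = counts / (counts.sum() * np.diff(edges))      # normalize: total area is 1

    def pdf_hat(x):
        """Piecewise-constant estimate of p(x); zero outside the observed range."""
        i = np.searchsorted(edges, x, side="right") - 1
        return density[i] if 0 <= i < len(density) else 0.0

    print(pdf_hat(1.0), pdf_hat(5.0))                       # high near the mean, 0 far away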

Decision rules. Given the class pdf's, Bayesian Decision Theory (BDT) provides us with optimal rules for classification. Optimal here might mean minimum probability of error, for example. We will: study BDT in detail; establish connections to other decision principles (e.g. linear discriminants); show that Bayesian decisions are usually intuitive; and derive optimal rules for a range of classifiers.
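
As a preview, here is a minimal sketch of the minimum-probability-of-error rule for the fish example: classify x to the class with the largest posterior, i.e. the largest prior times class-conditional density. The Gaussian class models, priors, and numbers are all assumptions for illustration; the derivation comes later in the course.

    # Minimum-probability-of-error Bayes rule for two classes: pick the class
    # maximizing prior * class-conditional density. Models and numbers are
    # illustrative assumptions, not values from the lecture.
    import numpy as np

    def gaussian_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    priors = {"salmon": 0.6, "sea bass": 0.4}                    # assumed class priors
    params = {"salmon": (60.0, 8.0), "sea bass": (75.0, 10.0)}   # assumed (mean, std) of length

    def classify(length):
        posteriors = {c: priors[c] * gaussian_pdf(length, *params[c]) for c in priors}
        return max(posteriors, key=posteriors.get)

    print(classify(62.0))   # salmon
    print(classify(80.0))   # sea bass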

Features and dimensionality. For most of what we have seen so far, the theory is well understood, algorithms are available, and the limitations are characterized. Usually, good features are an art form. We will survey traditional techniques: Bayesian Decision Theory (BDT), Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA); and some more recent methods: Independent Components Analysis (ICA), Support Vector Machines (SVM).

Discriminant Learning. Instead of learning models (pdf's) and deriving a decision boundary from the model, learn the boundary directly. There are many such methods. The simplest case is the so-called hyperplane classifier: simply find the hyperplane that best separates the classes, assuming linear separability of the features.
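
A minimal sketch of "find a separating hyperplane directly", using the classic perceptron update; the perceptron and the toy data are my choices for illustration, not methods specified on the slide.

    # Perceptron: the simplest hyperplane classifier for linearly separable data.
    # Sketch only; the toy data and the perceptron algorithm are illustrative choices.
    import numpy as np

    rng = np.random.default_rng(3)
    X = np.r_[rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))]
    y = np.r_[-np.ones(50), np.ones(50)]

    w, b = np.zeros(2), 0.0
    for _ in range(100):                       # passes over the data
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:         # misclassified point -> update
                w, b = w + yi * xi, b + yi
                mistakes += 1
        if mistakes == 0:                      # all points correct: a separating hyperplane
            break

    print("hyperplane: w =", w.round(2), "b =", round(b, 2))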

Support Vector Machines. How do we do this? The most recently developed classifiers are based on the use of support vectors. One transforms the data into linearly separable features using kernel functions. The best performance is obtained by maximizing the margin: the distance between the decision hyperplane and the closest point on each side.

Support vector machines. For separable classes, the training error can be made zero by classifying each point correctly. This can be implemented by solving the optimization problem

w* = argmax_w margin(w), subject to: x_l correctly classified, for all l.

This is an optimization problem with n constraints; it is not trivial, but it is solvable. The solution is the support-vector machine (the points on the margin are the "support vectors").
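
In practice this program is solved by standard software rather than by hand. As a hedged sketch, scikit-learn's SVC with a linear kernel solves the (soft-margin) version of this problem and exposes the resulting support vectors; the library choice and toy data are mine, not the lecture's.

    # Max-margin hyperplane via scikit-learn's SVC. The points in
    # support_vectors_ are the "support vectors" on the margin.
    # Sketch only: library choice and toy data are illustrative assumptions.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(4)
    X = np.r_[rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))]
    y = np.r_[np.zeros(50), np.ones(50)]

    clf = SVC(kernel="linear", C=1e3).fit(X, y)     # large C approximates a hard margin
    print("support vectors:", len(clf.support_vectors_))
    print("w* =", clf.coef_.ravel().round(2), "b* =", clf.intercept_.round(2))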

Kernels and Linear Separability. The trick is to map the problem to a higher dimensional space, where a non-linear boundary in the original space becomes a hyperplane in the transformed space. This can be done efficiently by the introduction of a kernel function. With a kernel-based feature transformation, the classification problem is mapped into a reproducing kernel Hilbert space. Kernels are at the core of the success of SVM classification. Most classical linear techniques (e.g. PCA, LDA, ICA, etc.) can be kernelized, with significant improvement.
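
A toy version of the mapping idea (my own example, standing in for the slide's lost figure): 1-D data whose classes are separated by |x| is not linearly separable, but becomes separable after the quadratic feature map x -> (x, x^2), the feature space implicitly used by a degree-2 polynomial kernel.

    # Kernel idea in miniature: a non-linear 1-D boundary (inner vs. outer
    # points) becomes a hyperplane after the map x -> (x, x^2).
    # Toy illustration; the explicit quadratic map stands in for a kernel.
    import numpy as np

    x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
    y = np.array([1, 1, -1, -1, -1, 1, 1])    # label depends on |x|: no 1-D threshold works

    Phi = np.c_[x, x ** 2]                    # transformed (higher dimensional) features
    # In (x, x^2) space the rule "x^2 > 2" is a hyperplane: w = (0, 1), b = -2.
    w, b = np.array([0.0, 1.0]), -2.0
    pred = np.sign(Phi @ w + b)
    print("all points correctly classified:", bool(np.all(pred == y)))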

Unsupervised learning. So far, we have talked about supervised learning, where we know the class of each point. In many problems this is not feasible (e.g. image segmentation).

Unsupervised learning. In these problems we are given X, but not Y. The standard algorithms for this are iterative: start from a best guess; given the Y-estimates, fit the class models; given the class models, re-estimate the Y-estimates. The procedure usually converges to an optimal solution, although not necessarily the global optimum. Performance is worse than that of a supervised classifier, but this is the best we can do.
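
The alternation just described (guess the labels, fit class models, re-label) is exactly the k-means loop; here is a minimal sketch, with k-means chosen as the simplest instance and the toy data assumed for illustration.

    # k-means as the simplest instance of the alternation on the slide:
    # given label estimates, fit class models (means); given means, re-label.
    # Minimal sketch; the toy data and k=2 are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(5)
    X = np.r_[rng.normal(-2, 0.6, (100, 2)), rng.normal(2, 0.6, (100, 2))]

    means = X[rng.choice(len(X), size=2, replace=False)]    # start from a best guess
    for _ in range(50):
        # re-estimate labels given the current class models (nearest mean)
        labels = np.argmin(((X[:, None, :] - means[None]) ** 2).sum(axis=-1), axis=1)
        # refit the class models (cluster means) given the label estimates
        new_means = np.array([X[labels == k].mean(axis=0) for k in range(2)])
        if np.allclose(new_means, means):                   # converged, maybe only locally
            break
        means = new_means

    print("cluster means:\n", means.round(2))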

Reasons to take the course. To learn about Classification and Statistical Learning: there is a tremendous amount of theory, but things invariably go wrong: too little data, noise, too many dimensions, training sets that do not reflect all possible variability, etc. To learn that good learning solutions require: knowledge of the domain (e.g. "these are the features to use"); and knowledge of the available techniques, their limitations, etc. In the absence of either of these, you will fail! To learn skills that are highly valued in the marketplace!