Visual Object Detection


1 Visual Object Detection
Ying Wu
Electrical Engineering and Computer Science
Northwestern University, Evanston, IL 60208

2 Visual Object Detection
- Detecting an object in an image
  - output: the location and size of all instances of the object class
- Challenges
  - what is an object?
  - how to describe the object?
  - how likely is an image region an object of interest?
  - how to handle scale changes?
  - how to handle the orientation of the target?
  - how to handle all sorts of visual variabilities?

3 Outline
- Basics in Detection Theory
- Boosting-based Detection
- Feature Template-based Detection
- Deformable Parts Model (DPM) based Detection
- Deep Network based Detection

4 Action and Risk
- Classes: {ω_1, ω_2, ..., ω_c}
- Actions: {α_1, α_2, ..., α_a}
- Loss: λ(α_k | ω_i)
- Conditional risk:
  R(α_k | x) = Σ_{i=1}^{c} λ(α_k | ω_i) p(ω_i | x)
- A decision function α(x) specifies a decision rule.
- Overall risk:
  R = ∫_x R(α(x) | x) p(x) dx
  It is the expected loss associated with a given decision rule.

5 Bayesian Decision and Bayesian Risk
- Bayesian decision:
  α* = argmin_k R(α_k | x)
  This leads to the minimum overall risk. (why?)
- Bayesian risk: the minimum overall risk
  R* = ∫_x R(α* | x) p(x) dx
- Bayesian risk is the best one can achieve.
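
To make the decision rule concrete, here is a minimal sketch (illustrative Python, not from the original slides; all names are assumptions) that computes the conditional risk of each action and picks the Bayes-optimal one:

```python
import numpy as np

def bayes_decision(posteriors, loss):
    """Pick the action minimizing conditional risk R(alpha_k | x).

    posteriors: shape (c,), P(w_i | x) for each class
    loss: shape (a, c), loss[k, i] = lambda(alpha_k | w_i)
    Returns the index of the risk-minimizing action.
    """
    risks = loss @ posteriors  # R(alpha_k|x) = sum_i lambda(k|i) P(w_i|x)
    return int(np.argmin(risks))

# Example: two classes under zero-one loss -> picking the max posterior
posteriors = np.array([0.3, 0.7])
zero_one = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
print(bayes_decision(posteriors, zero_one))  # -> 1, the higher-posterior class
```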

6 Example: Minimum-error-rate classification
- Let's have a specific example of Bayesian decision
- In classification problems, action α_k corresponds to deciding ω_k
- Let's define a zero-one loss function:
  λ(α_k | ω_i) = 0 if i = k, and 1 if i ≠ k   (i, k = 1, ..., c)
  This means: no loss for correct decisions, and all errors are equally costly
- It is easy to see that the conditional risk equals the error rate:
  R(α_k | x) = Σ_{i≠k} P(ω_i | x) = 1 − P(ω_k | x)
- Bayesian decision rule ⇒ minimum-error-rate classification:
  Decide ω_k if P(ω_k | x) > P(ω_i | x) for all i ≠ k

7 Classifier and Discriminant Functions
- Discriminant functions: g_i(x), i = 1, ..., C; assign ω_i to x
- Classifier:
  x → ω_i if g_i(x) > g_j(x) for all j ≠ i
- Examples:
  - g_i(x) = P(ω_i | x)
  - g_i(x) = P(x | ω_i) P(ω_i)
  - g_i(x) = ln P(x | ω_i) + ln P(ω_i)
- Note: the choice of discriminant function is not unique, but different choices may give equivalent classification results.
- Decision region: the partition of the feature space
  x ∈ R_i if g_i(x) > g_j(x) for all j ≠ i
- Decision boundary: the surface in feature space where the largest discriminant functions tie

8 Miss Detections vs. False Positives
- Two types of errors: miss detections AND false positives
- Reducing one type of error generally increases the other: no free lunch!

9 Visual Detection: Three Key Issues
- Target representation
  - rule-based models
  - shape template-based models
  - image appearance-based models
  - visual feature-based models
- Pattern classification
  - various choices of classifiers
  - training
- Effective search
  - determining the location: scanning all pixel locations
  - determining the scale: scanning the scale space
  - how to make the search faster?

10 Outline
- Basics in Detection Theory
- Boosting-based Detection
- Feature Template-based Detection
- Deformable Parts Model (DPM) based Detection
- Deep Network based Detection

11 Example: front-view face detection
- Locate the faces in an image
- Challenges: large variations in visual appearance due to:
  - scale and/or rotation
  - illumination
  - facial expression
  - partial occlusion

12 Viola-Jones Detector
- Feature: simple Haar wavelet features
- Classifier: AdaBoost feature selection
- Smart ideas to speed things up:
  - integral image
  - cascading classifiers

13 Feature: Haar-like wavelets
- A bank of Haar-like wavelet filters
- Applying a filter at a pixel location produces a feature
- How many such features does a detection window generate?
- How can such features be computed rapidly?

14 A Smart Idea: Integral Image
- The value of the integral image at (x, y) is the sum of all the pixels above and to the left:
  II(x, y) = Σ_{u ≤ x, v ≤ y} I(u, v)
- This is computed only once per image
- Then the sum of all pixels within any rectangular region costs constant time
- Ex: the sum within region D is obtained from the integral-image values at its four corner points (two subtractions and one addition)
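
A minimal sketch of the integral image and the four-corner rectangle sum (illustrative Python, not the original implementation):

```python
import numpy as np

def integral_image(img):
    """II(x, y) = sum of all pixels above and to the left, inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] via four lookups: O(1) per rectangle."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 3, 2) == img[1:4, 1:3].sum()
```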

15 Weak Classifier
- Weak features and weak classifiers
- A weak classifier uses only one simple feature for classification:
  h_j(x) = 1 if p_j f_j(x) < p_j θ_j, and 0 otherwise
- A weak classifier is a triple (f_j, θ_j, p_j): a feature, a threshold, and a polarity
- Why not combine multiple weak classifiers?

16 AdaBoost for Feature Selection
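
The slide shows the algorithm as a figure; below is a minimal sketch of the boosting loop used for feature selection (decision stumps over scalar feature responses, following the Viola-Jones weight update; illustrative and heavily simplified — the real training searches thresholds far more efficiently):

```python
import numpy as np

def adaboost_select(F, y, T):
    """F: (n_samples, n_features) feature responses; y in {0, 1}; T rounds.

    Each round picks the single feature/threshold/polarity (a weak
    classifier) with the lowest weighted error, then reweights samples
    so later rounds focus on the mistakes. Returns (j, theta, p, alpha)
    tuples defining the selected weak classifiers and their votes.
    """
    n, m = F.shape
    w = np.full(n, 1.0 / n)
    strong = []
    for _ in range(T):
        w = w / w.sum()
        best = None
        for j in range(m):                      # brute-force feature search
            for theta in np.unique(F[:, j]):    # candidate thresholds
                for p in (+1, -1):              # polarity
                    h = (p * F[:, j] < p * theta).astype(float)
                    err = np.sum(w * np.abs(h - y))
                    if best is None or err < best[0]:
                        best = (err, j, theta, p, h)
        err, j, theta, p, h = best
        beta = err / (1.0 - err + 1e-12)
        alpha = np.log(1.0 / (beta + 1e-12))
        w = w * beta ** (1.0 - np.abs(h - y))   # shrink weights of correct samples
        strong.append((j, theta, p, alpha))
    return strong
```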

17 Feature Selection and Combination
- Strong classifier:
  h(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and 0 otherwise
- Does the selection make sense?
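
Continuing the sketch above, applying the strong classifier is a weighted vote of the selected weak classifiers against half the total vote weight (illustrative; `strong` is the list returned by `adaboost_select`):

```python
def strong_classify(x_features, strong):
    """h(x): 1 if the weighted weak votes reach half the total weight."""
    s = sum(alpha * (p * x_features[j] < p * theta)
            for j, theta, p, alpha in strong)
    return int(s >= 0.5 * sum(alpha for *_, alpha in strong))
```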

18 Speeding Up: Attentional Cascade
- Motivation
  - most detection windows contain non-faces
  - thus most computation is wasted
- Idea?
  - can we save computation on non-faces?
  - early rejection? use simple classifiers for screening

19 Designing Cascade
- Design parameters:
  - # of cascade stages
  - # of features for each stage
  - parameters of each stage
- Example: a 32-stage classifier
  - S1: 2 features, detects 100% of faces and rejects 60% of non-faces
  - S2: 5 features, detects 100% of faces and rejects 80% of non-faces
  - S3-5: 20 features each
  - S6-7: 50 features each
  - S8-12: 100 features each
  - S13-32: 200 features each
- Designing a good cascade needs tremendous engineering effort
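
A minimal sketch of how such a cascade evaluates one window (illustrative; each stage is assumed to be a callable built from its own boosted classifier, e.g. `strong_classify` above with that stage's threshold):

```python
def cascade_detect(window, stages):
    """Return True only if the window passes every stage.

    stages: list of callables, each tuned so that almost all true faces
    pass while a large fraction of non-faces is rejected. Most windows
    are non-faces, so most exit after the cheap early stages.
    """
    for stage in stages:
        if not stage(window):
            return False  # early rejection: no further computation spent
    return True
```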

20 Cascade Performance
- A single 200-feature classifier vs. a 10-stage cascade with 20 features per stage
- Similar accuracy, but the cascade is 10 times faster

21 Training Images
- Data collection: positive data and negative data
- Validation set

22 Results

23 Summary
- Advantages
  - simple: easy to implement
  - fast: real-time performance
- Limitations and open problems
  - the cascade is difficult to design
  - cannot handle out-of-plane rotation
  - difficult to handle partial occlusion

24 Outline
- Basics in Detection Theory
- Boosting-based Detection
- Feature Template-based Detection
- Deformable Parts Model (DPM) based Detection
- Deep Network based Detection

25 Viola-Jones Detector for Pedestrian Detection
- Uses 45,000 possible features
- OK results, but still far from satisfactory

26 From Face to Pedestrian Detection
- articulated poses
- various views
- unpredictable clothing

27 Histogram of Gradient Orientations
- Binning of the gradient orientations within a cell
  - quantization of the orientations of the image gradient
  - weighted by the magnitude of the gradient (so not a plain histogram)
- Spatial combination (R-HoG and C-HoG) to form a block
  - the purpose is to normalize the local histograms within the block
  - leads to a normalized descriptor
- A HoG descriptor represents a block

28 HoG Feature
- HoG descriptor dimension (a 36-D vector)
  - 9 bins for the orientation quantization ([0, π))
  - cell size: 8×8 pixels; block size: 2×2 cells
- HoG-based human representation (an array of HoG vectors)
  - detection window size: 64×128 pixels
  - stride (i.e., block overlap): half of the block size (8 pixels)
  - a 7×15 array of blocks, i.e., a 3,780-D vector
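
A simplified HoG sketch with these settings (illustrative Python; the Dalal-Triggs descriptor additionally uses trilinear interpolation and L2-hys normalization, omitted here for brevity):

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9, eps=1e-6):
    """Simplified HoG for a grayscale window (e.g. 128x64 rows x cols).

    Gradient orientations in [0, pi) are quantized into `bins` bins per
    cell, weighted by gradient magnitude; 2x2-cell blocks with a one-cell
    stride are L2-normalized and concatenated -> 7 x 15 x 36 = 3780-D
    for a 64x128 window.
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                 # unsigned orientation
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    b = (ang / np.pi * bins).astype(int) % bins      # hard binning
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            sl = (slice(i * cell, (i + 1) * cell),
                  slice(j * cell, (j + 1) * cell))
            hist[i, j] = np.bincount(b[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=bins)
    blocks = []
    for i in range(ch - 1):                          # stride = 1 cell = 8 px
        for j in range(cw - 1):
            v = hist[i:i + 2, j:j + 2].ravel()       # 2x2 cells -> 36-D
            blocks.append(v / np.sqrt((v ** 2).sum() + eps ** 2))
    return np.concatenate(blocks)

print(hog_descriptor(np.random.rand(128, 64)).shape)  # (3780,)
```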

29 HoG + Linear SVM
- (a) the average gradient image over the training samples
- (b) each pixel shows the maximum positive SVM weight among the blocks containing it
- (c) each pixel shows the maximum negative SVM weight among the blocks containing it

30 HoG + Linear SVM
- (d) a test image
- (e) its R-HoG descriptor (7×15×36)
- (f) the descriptor weighted by the positive SVM weights
- (g) the descriptor weighted by the negative SVM weights

31 Examples: before clustering
- Search over scale (scaling factor 1.05)
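
A sketch of this multi-scale sliding-window search (illustrative; `score` is assumed to be the trained classifier, e.g. a linear SVM on the HoG descriptor, and nearest-neighbor downscaling stands in for proper image resampling):

```python
import numpy as np

def multiscale_detect(img, score, win=(128, 64), stride=8,
                      factor=1.05, thresh=0.0):
    """Scan a scale pyramid with a fixed-size window.

    Detections are reported as (score, row, col, height, width) in
    original-image coordinates; overlapping hits still need clustering.
    """
    detections, scale = [], 1.0
    while img.shape[0] >= win[0] and img.shape[1] >= win[1]:
        for r in range(0, img.shape[0] - win[0] + 1, stride):
            for c in range(0, img.shape[1] - win[1] + 1, stride):
                s = score(img[r:r + win[0], c:c + win[1]])
                if s > thresh:
                    detections.append((s, r * scale, c * scale,
                                       win[0] * scale, win[1] * scale))
        # downscale by the scale factor (nearest-neighbor for brevity)
        h, w = int(img.shape[0] / factor), int(img.shape[1] / factor)
        rows = (np.arange(h) * factor).astype(int)
        cols = (np.arange(w) * factor).astype(int)
        img = img[rows][:, cols]
        scale *= factor
    return detections
```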

32 Examples
- Clustering needs to be performed to (1) group multiple detections, and (2) reduce false positives
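
Greedy non-maximum suppression is one common way to do this grouping; a minimal sketch (illustrative; boxes are assumed given as (score, y0, x0, y1, x1)):

```python
def nms(detections, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it too much,
    repeat. detections: list of (score, y0, x0, y1, x1)."""
    def iou(a, b):
        ay0, ax0, ay1, ax1 = a
        by0, bx0, by1, bx1 = b
        iy = max(0.0, min(ay1, by1) - max(ay0, by0))
        ix = max(0.0, min(ax1, bx1) - max(ax0, bx0))
        inter = iy * ix
        union = (ay1 - ay0) * (ax1 - ax0) + (by1 - by0) * (bx1 - bx0) - inter
        return inter / union if union > 0 else 0.0
    boxes = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while boxes:
        best = boxes.pop(0)
        kept.append(best)
        boxes = [b for b in boxes if iou(best[1:], b[1:]) < iou_thresh]
    return kept
```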

33 Outline
- Basics in Detection Theory
- Boosting-based Detection
- Feature Template-based Detection
- Deformable Parts Model (DPM) based Detection
- Deep Network based Detection

34 Deformable Parts Model
- Large variations in visual appearance challenge object detection
- Such variations are induced by:
  - deformation of the target's shape
  - structural composition
  - large appearance changes
  - view changes
  - etc.
- These variations may be tremendous
- Rigid templates and single deformable models are not able to capture such huge variations
- Part-based deformable models
  - model the structural composition and its variations
  - strong expressive power
  - share computation
  - a rich model

35 Model: features and filters
- Scale-space image representation (an image pyramid)
  - p = (x, y, l) is a position (x, y) in the l-th level of the pyramid
  - H(p) is the raw visual feature pyramid (a tensor)
- Visual features: φ(H, p, w, h)
  - located at p and supported by the w×h subwindow whose top-left corner is p
  - built from the raw features H
  - represented as a vector by stacking the features in the subwindow
  - denoted by φ(p) for short; w and h are predefined parameters
  - φ(p) are the visual observations
- Filters: F
  - a 2D filter of size w×h
  - represented as a 1D vector by stacking its elements
  - concept: weighting the features in the w×h subwindow
  - filter response: ⟨F, φ(p)⟩
  - F is to be learned
- Existence of an object
  - an object is encoded by a filter
  - the response of this filter at p indicates how likely the object exists at p

36 Model: configuration and springs
- The DPM model of an object with n parts is defined by (F_0, P_1, ..., P_n, b)
  - a root filter F_0: covers the entire object at a lower resolution
  - n fine part filters P_i: cover smaller parts at a higher resolution
  - b is a real-valued bias term
- A part filter: P_i = (F_i, v_i, d_i)
  - F_i is the filter for the i-th part
  - v_i is a 2D vector: the anchor position of this part w.r.t. the root
  - d_i is a 4D vector: coefficients of the deformation cost
  - displacement: (dx_i, dy_i) = (x_i, y_i) − (2(x_0, y_0) + v_i)
  - define φ_d(p_i, p_0) = φ_d(dx_i, dy_i) = (dx_i, dy_i, dx_i², dy_i²)
- Configuration and springs
  - configuration: (p_0, p_1, p_2, ..., p_n)
  - "spring": the strength of the spring for part i is d_i
- A star topology
  - every part is connected to the root
  - no connections among parts
- The configuration is to be inferred (estimated) for each image
- The springs are to be learned from all training images

37 Model: parameters and the linear model
- Model parameters: Λ = (F_0, F_1, ..., F_n, d_1, ..., d_n, b)
- Model observations (evidence): Y = φ(p)
- Model target variable: X = p_0, the location of the root
- Model latent variable: Z = (p_1, ..., p_n), the part locations
- To evaluate a complete hypothesis (X, Z):
  s(X, Z | Y) = s(p_0, p_1, ..., p_n)
              = Σ_{i=0}^{n} ⟨F_i, φ(p_i)⟩ − Σ_{i=1}^{n} ⟨d_i, φ_d(p_i, p_0)⟩ + b
- Another way: a linear form
  s(X, Z | Y) = ⟨β, ψ(X, Z)⟩
  where
  β = [F_0, ..., F_n, d_1, ..., d_n, b]
  ψ(X, Z) = [φ(p_0), ..., φ(p_n), −φ_d(p_1, p_0), ..., −φ_d(p_n, p_0), 1]
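
A minimal sketch of this linear form (illustrative; `phi_parts` and `disp` are assumed to come from the feature pyramid and a chosen configuration):

```python
import numpy as np

def build_psi(phi_parts, disp):
    """psi(X, Z) = [phi(p_0),...,phi(p_n), -phi_d(p_1,p_0),...,-phi_d(p_n,p_0), 1].

    phi_parts: list of feature vectors, one per filter (root first)
    disp: list of (dx, dy) displacements of each part from its anchor
    The minus sign folds the deformation *cost* into one dot product.
    """
    phi_d = [np.array([dx, dy, dx * dx, dy * dy]) for dx, dy in disp]
    return np.concatenate(phi_parts + [-v for v in phi_d] + [np.ones(1)])

def dpm_score(beta, psi):
    """Score of a complete hypothesis (X, Z) in the linear form <beta, psi>."""
    return float(np.dot(beta, psi))
```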

38 Inference
- The objective of inference is to find
  s(p_0 | Y) = max_{(p_1, ..., p_n)} s(p_0, p_1, ..., p_n | Y)
- For each part, define
  D_i(p_0) = max_{p_i} { F_i^T φ(p_i) − d_i^T φ_d(p_i, p_0) }
- Then it is easy to see:
  s(p_0) = F_0^T φ(p_0) + Σ_{i=1}^{n} D_i(p_0) + b
- D_i(p_0) is the maximum contribution of the i-th part to the score of the root at p_0 (i.e., the optimal subpath property)
- This is a very simple dynamic programming problem
- Part localization is done via back-tracking in the DP:
  p_i* = argmax_{p_i} { F_i^T φ(p_i) − d_i^T φ_d(p_i, p_0) }

39 Inference: Computing D_i(p_0)
- The key in DPM inference is to compute
  D_i(p_0) = max_{p_i} { F_i^T φ(p_i) − d_i^T φ_d(p_i, p_0) }
- The first term, F_i^T φ(p_i):
  - the response map of the part filter F_i
  - independent of the root location p_0
  - easy to compute
- The second term, d_i^T φ_d(p_i, p_0):
  - the penalty for placing p_i given a root position p_0
  - easy to compute as well
- The major issue is the maximization over p_i
  - considering all possible choices of p_i is linear per root location, but wastes a lot of computation
- This is implemented via a generalized distance transform
  - it yields a transformed response map, turning the part filter response map F_i^T φ(p_i) into D_i(p_0)
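
To make the definition concrete, here is a naive 1-D sketch of the transformed response map (illustrative; the actual DPM code computes the same quantity in linear time with the generalized distance transform — a lower envelope of parabolas — and handles 2-D as two 1-D passes):

```python
import numpy as np

def transformed_response(R, d):
    """Naive 1-D version of D_i: D[p0] = max_p R[p] - d[0]*dx - d[1]*dx^2.

    R: part-filter response map along one axis; d = (d1, d2) are the
    deformation coefficients. This O(n^2) loop just spells out the
    definition; `best` keeps the argmax for DP back-tracking.
    """
    n = len(R)
    D = np.full(n, -np.inf)
    best = np.zeros(n, dtype=int)
    for p0 in range(n):
        for p in range(n):
            dx = p - p0
            s = R[p] - d[0] * dx - d[1] * dx * dx
            if s > D[p0]:
                D[p0], best[p0] = s, p
    return D, best
```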

40 Inference: process

41 Learning: Latent SVM
- Consider a classifier (discriminative function) of the following form:
  f_β(x) = max_{z ∈ Z(x)} β^T Φ(x, z)
  where β is the vector of model parameters and z are the latent values
- The set Z(x) defines the domain of z for a given x
- Classification is based on the sign of f_β(x)
- Given training data D = ((x_1, y_1), ..., (x_n, y_n)), where y_i ∈ {−1, 1}, minimize the objective function
  L_D(β) = (1/2) ||β||² + C Σ_{i=1}^{n} max(0, 1 − y_i f_β(x_i))
  where max(0, 1 − y_i f_β(x_i)) is the standard hinge loss and C controls the regularization.
- Note: if |Z(x_i)| = 1 for all i, this degenerates to a linear SVM.

42 Learning: Solving Latent SVM
- Denote by Z_p the latent values specified for the positive training samples
  - for a positive example, set Z(x_i) = {z_i}, where z_i is the latent value specified for x_i by Z_p
- Define an auxiliary objective function L_D(β, Z_p) = L_{D(Z_p)}(β):
  L_D(β, Z_p) = (1/2) ||β||² + C Σ_i max(0, 1 − y_i f_β(x_i)),
  with the positives' latent values fixed by Z_p
- Property: L_D(β) = min_{Z_p} L_D(β, Z_p), i.e., L_D(β, Z_p) bounds the LSVM objective from above
- Now, minimize L_D(β, Z_p) instead, by alternating two steps:
  1. Relabeling positive examples: optimize L_D(β, Z_p) over Z_p by selecting the highest-scoring latent value for each positive example:
     z_i = argmax_{z ∈ Z(x_i)} β^T Φ(x_i, z)
  2. Estimating β: optimize L_D(β, Z_p) over β by solving the convex optimization problem defined by L_{D(Z_p)}(β)
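
A schematic sketch of this coordinate descent (illustrative only; `infer_z` and `fit_beta` are assumed callables: the former computes argmax_z β^T Φ(x, z), the latter solves the standard convex SVM with the positives' latent values fixed):

```python
def train_latent_svm(data, infer_z, fit_beta, beta, rounds=10):
    """Alternate relabeling and convex SVM training for latent SVM.

    data: list of (x, y) with y in {-1, +1}
    infer_z(beta, x): best-scoring latent value for x under beta
    fit_beta(data, z_pos): minimizer of L_{D(Zp)}(beta)
    """
    for _ in range(rounds):
        # Step 1: relabel positives with their best-scoring latent values
        z_pos = {i: infer_z(beta, x)
                 for i, (x, y) in enumerate(data) if y == +1}
        # Step 2: hold Z_p fixed and solve the (now convex) problem for beta
        beta = fit_beta(data, z_pos)
    return beta
```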

43 Some DPM models

44 DPM on PASCAL 2007 Dataset

45 Outline
- Basics in Detection Theory
- Boosting-based Detection
- Feature Template-based Detection
- Deformable Parts Model (DPM) based Detection
- Deep Network based Detection

46 Rowley-Baluja-Kanade's Detector
- Trains a multilayer neural network (1998)
- Receptive fields
- An early attempt at using neural networks for face detection
- Nowadays, deep networks for face detection are tremendous

47 Some Results
