Pose Estimation in SAR using an Information Theoretic Criterion
Pose Estimation in SAR using an Information Theoretic Criterion

Jose C. Principe, Dongxin Xu, John W. Fisher III
Computational NeuroEngineering Laboratory, U. of Florida

Abstract. This paper describes a pose estimation algorithm based on an information theoretic formulation. We formulate pose estimation statistically and show that pose can be estimated from a low dimensional feature space obtained by maximizing the mutual information between the aspect angle and the output of the mapper. We use the Havrda-Charvat definition of entropy to implement a nonparametric estimator based on the Parzen window method. Results on the MSTAR data set are presented and show the good performance of the methodology.

1.0 Introduction

Knowing the relative position of a vehicle with respect to the sensor (normally called the aspect angle of the observation, or the pose) is an important piece of information for vehicle recognition. Since pattern classifiers are statistical machines, without the pose information the classifier has to be trained with all possible poses to become invariant to aspect angle during operation. This is the principle of classifiers based on the synthetic discriminant function (SDF) so widely used in optical correlators [1], and of template based classifiers [2]. Even if the classifier is built around Bayesian principles or neural networks, all possible aspect angles have to be included during training to describe the object reliably. In SAR this is not a simple task due to the enormous variability of the scattering phenomenology created by man-made objects. This argument suggests that, alternatively, one could divide the classification into two stages: first find the pose of the object, and then decide the class by selecting a classifier trained exclusively for that pose. Notice that this approach drastically reduces the complexity of classifier training. This is in fact the principle used in the MSTAR architecture [3], where classification is divided into an indexing module followed by search and match. However, the approach utilized in MSTAR is based on the traditional method of a priori selecting landmarks on the vehicle and then comparing them for the best match against a database of features taken at different angles. This solution has several drawbacks. First, it is computationally expensive (the search has to be done on-line). Second, it is highly dependent on the quality of the landmarks. Edges have proven useful in optical images, but in SAR point scatterers are normally preferred due to the different image formation characteristics. The issue is that point scatterers vary abruptly with the depression angle and pose, so the stability of the method is still under investigation. Third, the size of the problem space increases drastically with the number of objects and the precision required when local features are utilized. Instead of thinking that the system complexity is intrinsic to the problem [4], we submit that the problem formulation also affects the complexity of the solution. If the landmarks are local, then it is obvious that the problem does not scale up well. Our approach is to extract optimal features directly from the data by training an adaptive system. The advantages are the following. First, the method is very fast: once the system is trained, during testing the image is presented and the output of the system is the estimate of pose, i.e. we have created a content addressable memory (CAM). Any microprocessor can do this in real time. Second, the system is not sensitive to the detection of landmarks, which is a big advantage primarily when we do not know how much information is carried in the landmarks. Until the information theoretic formulation proposed here, this optimal feature extraction could only be done using principal component analysis (PCA) or linear discriminant analysis.
PCA provides only global (rough) information about the objects (second order statistics), and the information provided may not be directly related to pose, which is just one aspect of the input image; so the results may be disappointing. Our method of mutual information maximization, however, uses the full information contained in the probability density function (pdf), so it can exploit local information if needed to solve the problem, and the model parameters are directed solely at the pose, which is our only interest here. This paper starts with a statistical formulation of the problem of pose estimation, describes a method of computing entropy from samples and how to construct a mutual information estimator, and presents preliminary results on the MSTAR data set.
2.0 A statistical formulation of pose estimation

Suppose that we have collected data in pairs $(x_i, a_i)$, $i = 1, \ldots, N$, where the image $x_i$ can be regarded as a vector in a high dimensional space, $x_i \in R^m$ ($m$ is usually in the thousands), and $a_i$ is a vector of ground truth information relative to the image contents. For the general case of pose estimation, $a_i$ is a six dimensional vector containing the translational and rotational information [5]. Here we will treat the one degree of freedom (1DOF) pose estimation problem, where $x_i$ is a SAR image of a land vehicle obtained at a given depression angle and $a_i$ is the azimuth (aspect) angle of the vehicle. The MSTAR data set [6] can be readily utilized to test the accuracy of 1DOF pose estimation algorithms. In general, the estimation of the aspect angle (here called pose) given a particular image $x$ can be formulated as a MAP (maximum a posteriori probability) problem:

$$\hat{a} = \arg\max_a f_{A|X}(a \mid x) \qquad (1)$$

where $f_{A|X}(a \mid x)$ is the a posteriori probability density function (pdf) of the aspect angle $A$ given $x$. This formulation implies that the best estimate of the aspect angle given $x$ is the one which maximizes the a posteriori probability. Although the aspect angle $A$ is a continuous variable, we can discretize it for convenience, where the possible values are $a_i$, $i = 1, \ldots, N$, i.e. all the angles in the training set. Since we have no a priori knowledge about the aspect angle, the uniform distribution is the most reasonable assumption about the probability density function of $A$, in the sense that it is the direct result of the MaxEnt principle [7]. Under these conditions, the above MAP problem is equivalent to ML (maximum likelihood):

$$\hat{a} = \arg\max_i P(a_i \mid x) = \arg\max_i \frac{P(a_i)\, f_{X|A}(x \mid a_i)}{f_X(x)} = \arg\max_i f_{X|A}(x \mid a_i) \qquad (2)$$

where $P(a_i \mid x)$, $i = 1, \ldots, N$, is the a posteriori probability of the discrete variable $A$ given $x$,
$P(a_i)$ is the a priori probability of $A$, which here is the uniform distribution, i.e. $P(a_i) = $ constant for $i = 1, \ldots, N$; $f_{X|A}(x \mid a_i)$ is the conditional pdf of the image $x$ for a particular aspect angle $a_i$; and $f_X(x)$ is the marginal pdf of $x$. Therefore, from (2) the problem becomes the estimation of the conditional pdf of $x$ for all the possible angles $a_i$, $i = 1, \ldots, N$. Since $x$ is a very high dimensional vector and any assumption about the form of the pdf is not appropriate for realistic pose estimation in SAR, a non-parametric method should be used. However, non-parametric pdf estimation of $x$ becomes very unreliable since $x$ lies in a very high dimensional space and training data is limited. So dimensionality reduction, or feature extraction, is necessary in this case, which means that instead of estimating the angle directly from the image $x$, we estimate it from a feature space of the image.

Generally, a feature is the output of a mapping. Let $y = h(x, w)$ be a feature set for $x$, where $h: R^m \to R^k$ is a mapping, also called the feature extractor, $y \in R^k$, $k \ll m$, and $w$ is the parameter set of the feature extractor. Now, the problem according to (2) becomes:

$$\hat{a} = \arg\max_i f_{Y|A}(y \mid a_i), \qquad y = h(x, w) \qquad (3)$$

In this framework, the key issue is how to choose the parameter set $w$. We propose to apply Information Theory [8]. From the information theoretic point of view, a mapping or feature extractor is an information transmission channel. The parameters of the mapping should be chosen so that it transmits as much information as possible. Here, the problem requires that the mapping transmit the most information about the aspect angle, i.e. the feature $y$ should best represent the aspect angle. According to Information Theory, the quantitative measure for this purpose is the mutual information between the feature $y$ and the aspect angle $a$. So, parameter selection can be formulated as:

$$w_{opt} = \arg\max_w I(y, a), \qquad y = h(x, w) \qquad (4)$$
where $I(y, a)$ is the mutual information between $y$ and $a$; that is, the optimal parameter set should be the one which maximizes the mutual information between the feature and the angle. The mutual information measure relies directly on pdfs. As mentioned above, non-parametric pdf estimation should be used, so the Parzen window method [9] is selected here. Unfortunately, Shannon's mutual information measure becomes too complex to be implementable with the Parzen window pdf estimate. In the next section, we introduce our method of mutual information estimation based on the Havrda-Charvat entropy.

Pose estimation using the Havrda-Charvat entropy

Figure 1 shows the proposed block diagram for pose estimation.

Figure 1. Pose estimation with the MLP: the image $x$ is mapped through $y = f(x, w)$ to the outputs $(y_1, y_2)$; the mutual information $I(Y, A)$ between the output and the angles $A$ is estimated and used to adapt the parameters $w$.

From Information Theory, the mutual information can be computed as the difference between the entropy and the conditional entropy:

$$I(Y, A) = H_2(Y) - H_2(Y \mid A) \qquad (5)$$

where $Y$ is the feature and $A$ is the aspect angle. For reasons connected to the estimation of entropy from samples, here we utilize the Havrda-Charvat definition of entropy [10]
$$H_\alpha(Y) = \frac{1}{1 - \alpha}\left(\int f_Y(y)^\alpha \, dy - 1\right) \qquad (6)$$

with $\alpha = 2$, which will also be called the quadratic entropy. For a more in-depth discussion of several definitions of entropy see [10]. So $H_2(Y)$ is the quadratic entropy of the output and $H_2(Y \mid A)$ is the conditional quadratic entropy. Since the MLP is a universal mapper [11], it is used in this application as the mapping function (here we use, e.g., the configuration 6,400x3x2). Now, the problem can be described as finding the parameters $w$ of the MLP so that the mutual information between the output of the MLP and the aspect angle is maximized, i.e. we let the output convey the most information about the aspect angle. We can think of this scheme as information filtering, as opposed to the more traditional image filtering so commonly utilized in image processing.

Suppose the training data set consists of pairs $\{x_i, a_i\}$, where $x_i$ is a SAR image of a vehicle and $a_i$ is its true azimuth (aspect) angle. The feature $y_i = h(x_i, w)$ is a 2 dimensional vector $(y_{1i}, y_{2i})$, where the aspect can be easily measured as the angle of the vector. We can discretize uniformly the angles around the curve described by the output vector, as shown in Figure 2, where a circumference is assumed for simplicity.

Figure 2. Structure for the angle information: target angles $a_0, a_1, \ldots, a_N$ discretized uniformly on the circumference in the $(y_1, y_2)$ plane.

In our problem formulation, the pose is a random variable which must be described statistically.
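As a concrete illustration of the readout described above, a minimal sketch of reading the pose as the angle of the 2-D output vector, and of placing the discretized target angles on the unit circumference of Figure 2 (function names and the degree convention are our own, not from the paper):

```python
import numpy as np

def pose_from_feature(y1, y2):
    # Read the aspect estimate as the angle of the 2-D output vector, in degrees.
    # Only the direction matters, so radial amplitude changes do not affect it.
    return np.degrees(np.arctan2(y2, y1)) % 360.0

def circle_targets(n):
    # n aspect angles discretized uniformly on the unit circumference (Figure 2),
    # returned as (n, 2) points in the (y1, y2) plane.
    a = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    return np.stack([np.cos(a), np.sin(a)], axis=1)
```

Because only the direction of the feature vector is used, the readout is invariant to the radial shrinkage seen later in the test and occlusion experiments.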
We create a local structure by weighting the samples of adjacent angles:

samples:  $a_{i-l}, \ldots, a_{i-1}, \; a_i, \; a_{i+1}, \ldots, a_{i+l}$
weights:  $w_{-l}, \ldots, w_{-1}, \; w_0, \; w_1, \ldots, w_l$

The neighborhood size was experimentally set at $l = 2$ nearest neighbors, and the weighting was selected as a Gaussian decay. Effectively this arrangement says that there is a fuzzy correspondence between several possible angles and each one of the sampled points on the unit circumference.

The reason we selected the HC quadratic entropy is related to the Parzen window estimator presented in [12]. Let $y_i \in R^k$, $i = 1, \ldots, N$, be a set of samples from a random variable $Y$ in the $k$-dimensional feature space. One interesting question is what the entropy associated with this set of data points will be. One answer lies in the estimation of the data pdf by the Parzen window method using a Gaussian kernel:

$$f_Y(y) = \frac{1}{N} \sum_{i=1}^{N} G(y - y_i, \sigma^2) \qquad (7)$$

where $G(y, \sigma^2) = (2\pi)^{-k/2} \sigma^{-k} \exp\left(-\frac{y^T y}{2\sigma^2}\right)$ is the Gaussian kernel in $k$ dimensional space and $\sigma^2$ is the variance. When Shannon's entropy is used along with this pdf estimate, the measure becomes very complex. Fortunately, the HC quadratic entropy of (6) leads to a simpler form, and we obtain the following entropy measure for a set of discrete data points $\{y_i\}$:

$$H_2(Y \mid \{y_i\}) = 1 - \int f_Y(y)^2 \, dy = 1 - V(\{y_i\})$$

$$V(\{y_i\}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \int G(y - y_i, \sigma^2)\, G(y - y_j, \sigma^2)\, dy = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} G(y_i - y_j, 2\sigma^2) \qquad (8)$$
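The estimator in (7)-(8) reduces to pairwise kernel evaluations, so it is straightforward to write down. The following is a minimal sketch (the kernel size $\sigma^2$ and function names are illustrative choices, not values from the paper):

```python
import numpy as np

def gauss(d, sigma2):
    # k-dimensional Gaussian kernel G(d, sigma^2) evaluated at difference
    # vectors d; the last axis of d is the feature dimension k.
    k = d.shape[-1]
    return (2 * np.pi * sigma2) ** (-k / 2) * np.exp(-np.sum(d * d, axis=-1) / (2 * sigma2))

def parzen_pdf(y, samples, sigma2):
    # Eq. (7): Parzen window estimate of f_Y(y) from the sample set.
    return gauss(y - samples, sigma2).mean()

def quadratic_entropy(samples, sigma2=0.01):
    # Eq. (8): H2 = 1 - V, where the integral of the squared Parzen estimate
    # collapses to pairwise Gaussians of twice the kernel variance.
    diffs = samples[:, None, :] - samples[None, :, :]   # all pairwise differences
    V = gauss(diffs, 2 * sigma2).mean()                 # information potential
    return 1.0 - V
```

Note that tightly clustered samples give a large information potential $V$ and hence a small quadratic entropy, while spread-out samples give a larger entropy, as expected.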
With this estimator, the mutual information related to the quadratic HC entropy becomes

$$I(Y, A) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{l=-k}^{k} w_l\, G(y_i - y_{i+l}, 2\sigma^2) \;-\; \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} G(y_i - y_j, 2\sigma^2) \qquad (9)$$

The second term estimates the entropy due to all the input images, while the first term estimates the conditional entropy. In order to train the MLP, we take the derivative of (9) with respect to the parameters and interpret it as an injected error for the back-propagation algorithm [12]. In this way, the feature extraction mapping for pose estimation can be obtained. After training, a testing image $x$ is presented to the MLP, and its output $y$ estimates the discrete conditional pdf $f_{Y|A}(y \mid a_i)$ in the output feature space. Then the pose can be estimated by using (3).

3.0 Experimental Results

This algorithm was validated on the MSTAR public release database [6]. We trained the pose estimator with the class BMP-2 vehicle, type sn-c21, at a depression angle of 15 degrees. We simply clipped the chips (128x128) from pixel 20 to 99 both vertically and horizontally (obtaining an image chip size of 80x80) to preserve the image of the vehicle and its shadow. No fine centering of the vehicle was attempted. The training set was constructed from 53 chips taken approximately 3.5 degrees apart to cover angles from 0 to 180 degrees. The algorithm takes about 100 batch iterations to converge (very repeatable performance). In Figure 3, the circle at left (diamonds) represents the training results in the feature space. Notice that the MLP trained with our criterion created an output that is almost a perfect circle. The circle can be interpreted as the best output distribution to maximize the mutual information between the input and the pose. This result is intuitively appealing, but notice that it was discovered automatically by our algorithm (i.e. we did not enforce the circle as a desired response). The triangles at the left show the typical results on a test set (chips of the same vehicle not used for training).
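The criterion of (9) used to train the mapper can be sketched directly from the two information potentials. The sketch below assumes the output features are ordered by increasing aspect angle and wraps the neighborhood around the circle; the kernel size and the Gaussian-decay rate of the neighbor weights are illustrative choices, not values from the paper:

```python
import numpy as np

def gauss(d, sigma2):
    # k-dimensional Gaussian kernel at difference vectors d (last axis = k)
    k = d.shape[-1]
    return (2 * np.pi * sigma2) ** (-k / 2) * np.exp(-np.sum(d * d, axis=-1) / (2 * sigma2))

def neighbor_weights(l=2, decay=1.0):
    # Gaussian-decay weights w_{-l}..w_l over the angular neighborhood,
    # normalized to sum to one (decay rate is an assumption).
    offsets = np.arange(-l, l + 1)
    w = np.exp(-(offsets / decay) ** 2 / 2)
    return w / w.sum()

def mutual_information(y, sigma2=0.01, l=2):
    # Eq. (9): weighted-neighbor potential (conditional term) minus the
    # all-pairs information potential (marginal term). Rows of y must be
    # ordered by aspect angle; neighbors wrap around the circumference.
    N = len(y)
    w = neighbor_weights(l)
    cond = 0.0
    for off, wl in zip(range(-l, l + 1), w):
        j = (np.arange(N) + off) % N
        cond += wl * gauss(y - y[j], 2 * sigma2).mean()
    diffs = y[:, None, :] - y[None, :, :]
    marg = gauss(diffs, 2 * sigma2).mean()
    return cond - marg
```

The marginal term is the same for any ordering of the samples, so the criterion rewards mappings whose outputs for adjacent angles land close together, which is exactly the circular arrangement found in the experiments.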
It is interesting that the amplitude for the test set fluctuates a lot, but the outputs tend to move inwards along the radial direction, preserving the quality of the pose estimation. This means that
the algorithm created an output metric that preserves angle relationships, as we expected. The figure at the right shows the true and estimated pose; the vertical axis is the angle and the horizontal axis is the exemplar index.

Figure 3. BMP-2, sn-c21 (180 degree training): left, the output feature space $(y_1, y_2)$; right, true and estimated angle versus image number.

The testing was conducted on the rest of the chips from the same vehicle and on two other vehicle types (sn-9563 and sn-9566) which represent different configurations (all at the same depression angle). We also tested the pose estimator on a different class, the T-72, using the type sn-s7. Table 1 quantifies the results.

Table 1: Testing with training

class/type   | error mean (degrees) | error std. dev. (degrees)
BMP2/sn-c21  |                      |
BMP2/sn-9563 |                      |
BMP2/sn-9566 |                      |
T72/sn-s7    |                      |

Notice that the pose estimation error in testing on the same vehicle type is basically the same as the resolution in training (3.5 degrees), which means that the accuracy of the estimator is very good. Therefore we expect that more precise pose estimates are achievable by creating training sets with more images at finer resolution in pose. Table 1 also shows that the algorithm generalizes very well to both other vehicle types and even
other vehicle classes. We notice a degradation in performance on the T72, but it is a smooth roll-off. If we want to obviate this degradation of performance with vehicle type, we should utilize more than one vehicle in the training set, which at the same time would address the resolution problem discussed above. However, we have to state that the algorithm for mutual information estimation is O(N^2), which in practice places a limit on the number of exemplars N utilized in training.

In order to quantify the robustness of the algorithm to vehicle occlusion, we progressively replaced one vehicle image with the background (an image of the BMP2 not used in training). We observed that although the amplitude of the output feature decreased appreciably when the bright return of the vehicle was substituted by the darker background (the triangles in the left portion of Figure 4), the pose estimation held up remarkably well (right portion of Figure 4). In this case the pose was within +/- 5 degrees up to 50% occlusion and within +/- 10 degrees up to 95% occlusion (which occurs at increment 36 in the plot). In our opinion this smooth degradation is one of the advantages of using a distributed system as a mapper, and the same behavior has been extensively reported in the associative memory literature [13]. However, different occlusion directions may provide different performance (it all depends upon which portions of the image are occluded).

Figure 4. Results of pose estimation with vehicle occlusion. Vehicle pose is 58 degrees.
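The occlusion experiment above can be reproduced mechanically by substituting a growing portion of the chip with background pixels and re-running the estimator at each step. A minimal sketch, assuming column-wise occlusion from one side (the paper does not specify the occlusion direction, and `pose_estimator` stands in for the trained MLP plus angle readout):

```python
import numpy as np

def occlude(chip, background, frac):
    # Replace the leading fraction of image columns with background pixels --
    # a simple stand-in for the progressive occlusion test described above.
    out = chip.copy()
    ncols = int(round(frac * chip.shape[1]))
    out[:, :ncols] = background[:, :ncols]
    return out

def occlusion_sweep(chip, background, pose_estimator, steps=20):
    # Record the pose estimate as occlusion grows from 0% to 100%.
    return [pose_estimator(occlude(chip, background, s / steps))
            for s in range(steps + 1)]
```

Sweeping other occlusion directions (rows, or the opposite side) only requires changing the slice in `occlude`, which makes the direction-dependence noted above easy to probe.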
4.0 Conclusions

This paper reports on our present efforts to create a robust and easy-to-implement pose estimator for SAR imagery. The need for such an algorithm stems from our goal of creating accurate and robust classifiers. Knowing the pose of the vehicle will streamline the size and training of the classifier module, which should translate into better performance. Our pose estimation framework is statistical in nature and utilizes information directly, through the manipulation of entropy estimated from examples. We address the enormous complexity of the input space by creating an adaptive system with optimal parameters. This is probably the best way to deal with and conquer complexity. We project the input data to a subspace such that some property of the input relevant to our problem is optimally preserved. This can be thought of as information filtering, as opposed to the more conventional signal filtering. The issue is the choice of the criterion for optimization. We were fully aware of the limitations of the second order methods utilized traditionally in pattern recognition, so we sought a method that would utilize the full information about the pdf of the input class. The mutual information between the feature and the pose becomes the criterion of choice. This criterion measures the uncertainty remaining in the feature (the output of the mapper) about the pose. By maximizing mutual information we are decreasing the uncertainty about pose in the feature, i.e. we are transferring as much information as possible between the feature and the pose. There are also other reasons to use mutual information for classification, such as the decrease of the lower bound of the classification error according to Fano's inequality [14]. The big issue is the estimation of entropy from examples. In [12] we proposed a Parzen window estimate of the pdf, along with the mean squared difference between the uniform distribution and the estimated one, to manipulate the output entropy.
The derivative of the criterion can be used as an injected error to adapt the parameters of our mapper (linear or nonlinear) using the back-propagation algorithm. In this paper we couple the entropy estimator with the quadratic entropy of Havrda-Charvat to come up with an estimator of mutual information.

The preliminary results of our method are very promising. We successfully trained our pose estimator with MSTAR vehicles. The accuracy on the test set is similar to that on the training set for the same vehicle, and the performance degrades gracefully for other vehicle types. Even with severe occlusion of the training vehicle (up to 95% occlusion) we obtain estimates of pose within +/- 10 degrees. Further testing of the algorithm is required, as well as further refinements to the theory. The image set is realistic, but still simple (1DOF). Extension to more degrees of freedom will be pursued next, as well as more vehicles. Our pose estimator is based on the fact that the angle is discrete. It is important for accuracy to utilize the angle as a continuous variable; this will require a new estimator for the conditional entropy. It is also important to understand the algorithm better and to compare its performance with alternative approaches. One of the bottlenecks of the method is that the computation is O(N^2), which imposes a practical limit on the size of training sets.

Acknowledgments: This work was partially supported by DARPA-Air Force grant F

References

[1] Kumar B., Minimum variance synthetic discriminant functions, J. Opt. Soc. Am. A 3(1).
[2] Duda R. and Hart P., Pattern Classification and Scene Analysis, Wiley.
[3] MSTAR Kickoff Meeting Proceedings, Washington.
[4] Minardi M., Moving & stationary target acquisition and recognition, WL talk, September.
[5] Lowe D., Solving parameters of object models from image descriptions, in Proc. ARPA IU Workshop.
[6] MSTAR (Public) Targets, CDROM, Veda Inc., Ohio.
[7] Jaynes E., Information theory and statistical mechanics, Phys. Rev., vol. 106.
[8] Shannon C.E., A mathematical theory of communication, Bell Sys. Tech. J. 27, 1948.
[9] Parzen E., On the estimation of a probability density function and the mode, Ann. Math. Stat. 33, 1962, p. 1065.
[10] Kapur J.N., Measures of Information and Their Applications, John Wiley & Sons.
[11] Haykin S., Neural Networks: A Comprehensive Foundation, Macmillan, 1994.
[12] Fisher J., Principe J., Entropy manipulation of arbitrary nonlinear mappings, Proc. IEEE Workshop on Neural Networks for Signal Processing VII, 14-23.
[13] Kohonen T., Self-Organization and Associative Memory, Springer Verlag.
[14] Fisher J.W. III, Nonlinear Extensions to the Minimum Average Correlation Energy Filter, Ph.D. dissertation, Dept. of ECE, University of Florida.
1780 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 7, JULY 2002 An Error-Entropy Minimization Algorithm for Supervised Training of Nonlinear Adaptive Systems Deniz Erdogmus, Member, IEEE, and Jose
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationAruna Bhat Research Scholar, Department of Electrical Engineering, IIT Delhi, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 6 ISSN : 2456-3307 Robust Face Recognition System using Non Additive
More informationIntro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation
Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor
More informationDensity Estimation: ML, MAP, Bayesian estimation
Density Estimation: ML, MAP, Bayesian estimation CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Maximum-Likelihood Estimation Maximum
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationMachine Learning Lecture 2
Machine Perceptual Learning and Sensory Summer Augmented 6 Computing Announcements Machine Learning Lecture 2 Course webpage http://www.vision.rwth-aachen.de/teaching/ Slides will be made available on
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationFunctional Preprocessing for Multilayer Perceptrons
Functional Preprocessing for Multilayer Perceptrons Fabrice Rossi and Brieuc Conan-Guez Projet AxIS, INRIA, Domaine de Voluceau, Rocquencourt, B.P. 105 78153 Le Chesnay Cedex, France CEREMADE, UMR CNRS
More informationNo. of dimensions 1. No. of centers
Contents 8.6 Course of dimensionality............................ 15 8.7 Computational aspects of linear estimators.................. 15 8.7.1 Diagonalization of circulant andblock-circulant matrices......
More informationClassifier s Complexity Control while Training Multilayer Perceptrons
Classifier s Complexity Control while Training Multilayer Perceptrons âdu QDVRaudys Institute of Mathematics and Informatics Akademijos 4, Vilnius 2600, Lithuania e-mail: raudys@das.mii.lt Abstract. We
More informationLearning features by contrasting natural images with noise
Learning features by contrasting natural images with noise Michael Gutmann 1 and Aapo Hyvärinen 12 1 Dept. of Computer Science and HIIT, University of Helsinki, P.O. Box 68, FIN-00014 University of Helsinki,
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationGlobal Scene Representations. Tilke Judd
Global Scene Representations Tilke Judd Papers Oliva and Torralba [2001] Fei Fei and Perona [2005] Labzebnik, Schmid and Ponce [2006] Commonalities Goal: Recognize natural scene categories Extract features
More informationLinks between Perceptrons, MLPs and SVMs
Links between Perceptrons, MLPs and SVMs Ronan Collobert Samy Bengio IDIAP, Rue du Simplon, 19 Martigny, Switzerland Abstract We propose to study links between three important classification algorithms:
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationInformation-Theoretic Learning
Information-Theoretic Learning I- Introduction Tutorial Jose C. Principe, Ph.D. One of the fundamental problems of our technology driven society is the huge amounts of data that are being generated by
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationA Method to Improve the Accuracy of Remote Sensing Data Classification by Exploiting the Multi-Scale Properties in the Scene
Proceedings of the 8th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Shanghai, P. R. China, June 25-27, 2008, pp. 183-188 A Method to Improve the
More informationOld painting digital color restoration
Old painting digital color restoration Michail Pappas Ioannis Pitas Dept. of Informatics, Aristotle University of Thessaloniki GR-54643 Thessaloniki, Greece Abstract Many old paintings suffer from the
More informationNon-parametric Classification of Facial Features
Non-parametric Classification of Facial Features Hyun Sung Chang Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Problem statement In this project, I attempted
More informationAdvanced statistical methods for data analysis Lecture 2
Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline
More informationFAST METHODS FOR EVALUATING THE ELECTRIC FIELD LEVEL IN 2D-INDOOR ENVIRONMENTS
Progress In Electromagnetics Research, PIER 69, 247 255, 2007 FAST METHODS FOR EVALUATING THE ELECTRIC FIELD LEVEL IN 2D-INDOOR ENVIRONMENTS D. Martinez, F. Las-Heras, and R. G. Ayestaran Department of
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More informationNeutron inverse kinetics via Gaussian Processes
Neutron inverse kinetics via Gaussian Processes P. Picca Politecnico di Torino, Torino, Italy R. Furfaro University of Arizona, Tucson, Arizona Outline Introduction Review of inverse kinetics techniques
More informationEM-algorithm for Training of State-space Models with Application to Time Series Prediction
EM-algorithm for Training of State-space Models with Application to Time Series Prediction Elia Liitiäinen, Nima Reyhani and Amaury Lendasse Helsinki University of Technology - Neural Networks Research
More informationChange Detection in Optical Aerial Images by a Multi-Layer Conditional Mixed Markov Model
Change Detection in Optical Aerial Images by a Multi-Layer Conditional Mixed Markov Model Csaba Benedek 12 Tamás Szirányi 1 1 Distributed Events Analysis Research Group Computer and Automation Research
More informationStatistical Independence and Novelty Detection with Information Preserving Nonlinear Maps
Statistical Independence and Novelty Detection with Information Preserving Nonlinear Maps Lucas Parra, Gustavo Deco, Stefan Miesbach Siemens AG, Corporate Research and Development, ZFE ST SN 4 Otto-Hahn-Ring
More informationDrift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares
Drift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares R Gutierrez-Osuna Computer Science Department, Wright State University, Dayton, OH 45435,
More informationGoodness of Fit Test and Test of Independence by Entropy
Journal of Mathematical Extension Vol. 3, No. 2 (2009), 43-59 Goodness of Fit Test and Test of Independence by Entropy M. Sharifdoost Islamic Azad University Science & Research Branch, Tehran N. Nematollahi
More informationLearning Kernel Parameters by using Class Separability Measure
Learning Kernel Parameters by using Class Separability Measure Lei Wang, Kap Luk Chan School of Electrical and Electronic Engineering Nanyang Technological University Singapore, 3979 E-mail: P 3733@ntu.edu.sg,eklchan@ntu.edu.sg
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationDiscrete Mathematics and Probability Theory Fall 2015 Lecture 21
CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationArtificial Neural Networks
Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks
More informationProbability Models for Bayesian Recognition
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIAG / osig Second Semester 06/07 Lesson 9 0 arch 07 Probability odels for Bayesian Recognition Notation... Supervised Learning for Bayesian
More informationIterative Laplacian Score for Feature Selection
Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,
More informationFeature selection and extraction Spectral domain quality estimation Alternatives
Feature selection and extraction Error estimation Maa-57.3210 Data Classification and Modelling in Remote Sensing Markus Törmä markus.torma@tkk.fi Measurements Preprocessing: Remove random and systematic
More informationSensor Tasking and Control
Sensor Tasking and Control Sensing Networking Leonidas Guibas Stanford University Computation CS428 Sensor systems are about sensing, after all... System State Continuous and Discrete Variables The quantities
More informationFace Detection and Recognition
Face Detection and Recognition Face Recognition Problem Reading: Chapter 18.10 and, optionally, Face Recognition using Eigenfaces by M. Turk and A. Pentland Queryimage face query database Face Verification
More informationFace Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi
Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Overview Introduction Linear Methods for Dimensionality Reduction Nonlinear Methods and Manifold
More informationHuman Pose Tracking I: Basics. David Fleet University of Toronto
Human Pose Tracking I: Basics David Fleet University of Toronto CIFAR Summer School, 2009 Looking at People Challenges: Complex pose / motion People have many degrees of freedom, comprising an articulated
More informationECE662: Pattern Recognition and Decision Making Processes: HW TWO
ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are
More informationNonparametric Methods Lecture 5
Nonparametric Methods Lecture 5 Jason Corso SUNY at Buffalo 17 Feb. 29 J. Corso (SUNY at Buffalo) Nonparametric Methods Lecture 5 17 Feb. 29 1 / 49 Nonparametric Methods Lecture 5 Overview Previously,
More informationW vs. QCD Jet Tagging at the Large Hadron Collider
W vs. QCD Jet Tagging at the Large Hadron Collider Bryan Anenberg: anenberg@stanford.edu; CS229 December 13, 2013 Problem Statement High energy collisions of protons at the Large Hadron Collider (LHC)
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationVECTOR-QUANTIZATION BY DENSITY MATCHING IN THE MINIMUM KULLBACK-LEIBLER DIVERGENCE SENSE
VECTOR-QUATIZATIO BY DESITY ATCHIG I THE IIU KULLBACK-LEIBLER DIVERGECE SESE Anant Hegde, Deniz Erdogmus, Tue Lehn-Schioler 2, Yadunandana. Rao, Jose C. Principe CEL, Electrical & Computer Engineering
More informationComparison of Modern Stochastic Optimization Algorithms
Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,
More informationGeneralized Laplacian as Focus Measure
Generalized Laplacian as Focus Measure Muhammad Riaz 1, Seungjin Park, Muhammad Bilal Ahmad 1, Waqas Rasheed 1, and Jongan Park 1 1 School of Information & Communications Engineering, Chosun University,
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationLinear Classifiers as Pattern Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2014/2015 Lesson 16 8 April 2015 Contents Linear Classifiers as Pattern Detectors Notation...2 Linear
More informationLearning Gaussian Process Models from Uncertain Data
Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada
More informationInformation Theory in Computer Vision and Pattern Recognition
Francisco Escolano Pablo Suau Boyan Bonev Information Theory in Computer Vision and Pattern Recognition Foreword by Alan Yuille ~ Springer Contents 1 Introduction...............................................
More informationTowards a Ptolemaic Model for OCR
Towards a Ptolemaic Model for OCR Sriharsha Veeramachaneni and George Nagy Rensselaer Polytechnic Institute, Troy, NY, USA E-mail: nagy@ecse.rpi.edu Abstract In style-constrained classification often there
More informationThe memory centre IMUJ PREPRINT 2012/03. P. Spurek
The memory centre IMUJ PREPRINT 202/03 P. Spurek Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland J. Tabor Faculty of Mathematics and Computer
More informationHeeyoul (Henry) Choi. Dept. of Computer Science Texas A&M University
Heeyoul (Henry) Choi Dept. of Computer Science Texas A&M University hchoi@cs.tamu.edu Introduction Speaker Adaptation Eigenvoice Comparison with others MAP, MLLR, EMAP, RMP, CAT, RSW Experiments Future
More informationSupplementary Figure 1: Scheme of the RFT. (a) At first, we separate two quadratures of the field (denoted by and ); (b) then, each quadrature
Supplementary Figure 1: Scheme of the RFT. (a At first, we separate two quadratures of the field (denoted by and ; (b then, each quadrature undergoes a nonlinear transformation, which results in the sine
More information