Perception: objects in the environment

1 Zsolt Vizi, Ph.D. 2018

2 Self-driving cars

3 Sensor fusion: one categorization
Type 1: low-level/raw-data fusion: combining several sources of raw data to produce new data that is expected to be more informative than the inputs.
Type 2: intermediate-level/feature-level fusion: combining various features (e.g. positions) into a feature map, which can be used for higher-level decisions.
Type 3: high-level/decision fusion: combining decisions from several experts (e.g. voting, fuzzy logic, statistical methods).

4 Perception: topics of this lecture
Object tracking
Object type classification

5 Tracking problems
The basic tracking problem is to estimate the position and velocity of the target(s) using the available sensor data (from a sequence of scans).
The multi-target tracking problem is not simply a tracking problem with more than one target: it also involves the problem of associating measurements with targets.

6 Tracking problems vs. typical estimation problems
A strong temporal component is involved.
We estimate quantities that are expected to change over time.
We are interested in the current state.
The current state is computed from previous states.

7 Bayesian inference
Inference methods estimate the current values of a set of parameters based on a set of observations or measurements.
Bayesian estimation: the parameters are random variables that have a prior probability, and the observations are noisy as well.

8 Recursive Bayesian estimation [figure]

9 Bayes' theorem
$$p(x \mid z) = \frac{p(z \mid x)\, p(x)}{p(z)}$$
$X$: target state (random vector variable); $Z$: observation (random vector variable); $p(x)$ (probability density function of $X$): prior density; $p(x \mid z)$: posterior density; $p(z \mid x)$: likelihood function; $p(z)$: normalization constant,
$$p(z) = \int_{\mathbb{R}^{n_x}} p(z \mid x)\, p(x)\, dx$$
Notation: $p(x, z)$ = joint density.
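To make the notation concrete, here is a minimal numeric sketch (a hypothetical one-dimensional grid example, not from the slides) of combining a prior and a likelihood into a posterior:

```python
import numpy as np

# Hypothetical 1-D example: target position x on a coarse grid.
x = np.linspace(-5.0, 5.0, 101)
dx = x[1] - x[0]

prior = np.exp(-0.5 * x**2)                      # p(x): standard normal shape
prior /= prior.sum() * dx                        # normalize to a density

z = 1.2                                          # one noisy position measurement
likelihood = np.exp(-0.5 * (z - x)**2 / 0.5**2)  # p(z|x): Gaussian sensor, sigma = 0.5

posterior = likelihood * prior                   # numerator of Bayes' theorem
posterior /= posterior.sum() * dx                # divide by p(z), the normalization constant

print(x[np.argmax(posterior)])                   # posterior mode, pulled toward z
```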

10 Point estimators
In summary: we estimate densities at each step... How do we use this in a specific tracking problem? By mapping the density into the real world: the point estimator $\hat{x}$.
Procedure:
1. Define a cost function $L(x, \hat{x})$, which assigns a penalty to an erroneous estimate $\hat{x} \neq x$. Typical choice: $L(x, \hat{x}) = (x - \hat{x})^T M (x - \hat{x})$.
2. Bayesian risk $R$:
$$R = E(L(x, \hat{x})) = \int_{\mathbb{R}^{n_x}} L(x, \hat{x})\, p(x)\, dx$$
Note: $\hat{x} = \hat{x}(z)$. Optimal choice:
$$\hat{x}(z) = \operatorname*{argmin}_{x^*(z)} \int_{\mathbb{R}^{n_x + n_z}} L(x^*(z), x)\, p(x, z)\, dx\, dz$$

11 Point estimators (continued)
3. Using the first derivative, we get [homework]: $\hat{x}(z) = E(x \mid z)$.
4. Uncertainty of this estimate: $P^{xx} = E\big((x - \hat{x})(x - \hat{x})^T \mid z\big)$.
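For the quadratic cost with $M = I$, the omitted derivation (the [homework]) goes roughly as follows, minimizing the risk conditionally for each $z$:

```latex
\begin{aligned}
R(\hat{x} \mid z) &= \int (x - \hat{x})^T (x - \hat{x})\, p(x \mid z)\, dx \\
\frac{\partial R}{\partial \hat{x}} &= -2 \int (x - \hat{x})\, p(x \mid z)\, dx = 0
\quad \Rightarrow \quad
\hat{x} = \int x\, p(x \mid z)\, dx = E(x \mid z)
\end{aligned}
```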

12 Dynamic and sensor model
We focus on first-order Markov processes (the current state depends only on the previous state).
1. System dynamic model: $x_n = f_{n-1}(x_{n-1}) + u_n + v_{n-1}$
$x_n$: state vector at time $t_n$; $f_{n-1}$: deterministic transition function; $u_n$: known deterministic control; $v_{n-1}$: additive noise.

13 Dynamic and sensor model (continued)
2. Sensor model: $z_n = h_n(x_n) + w_n$
$z_n$: current observation vector; $h_n$: deterministic observation function; $w_n$: additive noise.
Simplifying assumption: $f_n$ and $h_n$ are adiabatic (they change very slowly in time), so $f_n = f$, $h_n = h$.

14 Recursive Bayesian filtering
Notation: $z_{1:n} = \{z_1, z_2, \ldots, z_n\}$ (observations up to the $n$th step).
Goal: estimating $p(x_n \mid z_{1:n})$. Applying Bayes' theorem:
$$p(x_n \mid z_{1:n}) = \frac{p(z_{1:n} \mid x_n)\, p(x_n)}{p(z_{1:n})}$$
After some calculation, we obtain
$$p(x_n \mid z_{1:n}) = \frac{p(z_n \mid x_n)\, p(x_n \mid z_{1:n-1})}{p(z_n \mid z_{1:n-1})}$$
Using the Chapman-Kolmogorov equation for $p(x_n \mid z_{1:n-1})$, we derive
$$p(x_n \mid z_{1:n-1}) = \int_{\mathbb{R}^{n_x}} p(x_n \mid x_{n-1})\, p(x_{n-1} \mid z_{1:n-1})\, dx_{n-1}$$

15 Recursive Bayesian filtering

16 Back to the point estimators: a notation
State estimate: $\hat{x}_{n|p} = E\{x_n \mid z_{1:p}\}$
Uncertainty estimate: $P^{xx}_{n|p} = E\{(x_n - \hat{x}_{n|p})(x_n - \hat{x}_{n|p})^T \mid z_{1:p}\}$

17 Back to the point estimators: prediction
More algebraic manipulation (combining the previous formulae with the system dynamic model) gives:
$$\hat{x}_{n|n-1} = \int_{\mathbb{R}^{n_x}} \{f_{n-1}(x_{n-1}) + u_n + v_{n-1}\}\, p(x_{n-1} \mid z_{1:n-1})\, dx_{n-1}$$
and
$$P^{xx}_{n|n-1} = \int_{\mathbb{R}^{n_x}} \{f_{n-1}(x_{n-1}) + u_n - \hat{x}_{n|n-1}\}\{f_{n-1}(x_{n-1}) + u_n - \hat{x}_{n|n-1}\}^T\, p(x_{n-1} \mid z_{1:n-1})\, dx_{n-1} + Q,$$
where $Q$ is the covariance matrix of the system noise:
$$Q = \int_{\mathbb{R}^{n_x}} v_{n-1} v_{n-1}^T\, p(x_{n-1} \mid z_{1:n-1})\, dx_{n-1}$$

18 Back to the point estimators: update [by Kálmán]
Ansatz: $\hat{x}_{n|n} = A z_n^{\mathrm{obs}} + b$
Assumptions: $E(x_n - \hat{x}_{n|n} \mid z_{1:n-1}) = 0$ and $E\big((x_n - \hat{x}_{n|n})(z_n^{\mathrm{obs}})^T \mid z_{1:n-1}\big) = 0$.
Goal #1: determine $A$ and $b$. Some calculation gives
$$b = \hat{x}_{n|n-1} - A \hat{z}_{n|n-1}, \qquad A = K_n = P^{xz}_{n|n-1} \big(P^{zz}_{n|n-1}\big)^{-1},$$
which implies
$$\hat{x}_{n|n} = \hat{x}_{n|n-1} + K_n \big(z_n^{\mathrm{obs}} - \hat{z}_{n|n-1}\big), \qquad P^{xx}_{n|n} = P^{xx}_{n|n-1} - K_n P^{zz}_{n|n-1} K_n^T$$

19 Back to the point estimators: update [by Kálmán]
Goal #2: determine $\hat{z}_{n|n-1}$, $P^{xz}_{n|n-1}$, $P^{zz}_{n|n-1}$:
$$\hat{z}_{n|n-1} = \int_{\mathbb{R}^{n_x}} \{h_n(x_n) + w_n\}\, p(x_n \mid z_{1:n-1})\, dx_n,$$
$$P^{xz}_{n|n-1} = \int_{\mathbb{R}^{n_x}} \{x_n - \hat{x}_{n|n-1}\}\{h_n(x_n) - \hat{z}_{n|n-1}\}^T\, p(x_n \mid z_{1:n-1})\, dx_n,$$
$$P^{zz}_{n|n-1} = \int_{\mathbb{R}^{n_x}} \{h_n(x_n) - \hat{z}_{n|n-1}\}\{h_n(x_n) - \hat{z}_{n|n-1}\}^T\, p(x_n \mid z_{1:n-1})\, dx_n + R,$$
where the covariance matrix of the observation noise is
$$R = \int_{\mathbb{R}^{n_x}} w_n w_n^T\, p(x_n \mid z_{1:n-1})\, dx_n$$

20 Recursive point estimation process

21 Gaussian densities: Kálmán filters
Linear KF: f, h linear
Extended KF: f, h nonlinear + Taylor approximation
Finite Difference KF: f, h nonlinear + Stirling approximation
Unscented KF: f, h nonlinear + sigma points: on a hypersphere, in all directions
Spherical Simplex KF: f, h nonlinear + sigma points: on the intersection of a simplex and a hypersphere
Gauss-Hermite KF: f, h nonlinear + sigma points: vertices of a hypercube
Monte Carlo KF: f, h nonlinear + MC sampling

22 Kálmán Filters [figure]

23 LKF
System dynamic model: $x_n = F x_{n-1} + v_{n-1}$
Sensor model: $z_n = H x_n + w_n$
All densities are Gaussian:
$$N(x; m, C) = \frac{1}{\sqrt{(2\pi)^n \det(C)}} \exp\Big\{-\frac{1}{2}(x - m)^T C^{-1} (x - m)\Big\}$$
$$p(x_n \mid z_{1:n}) = N\big(x_n; \hat{x}_{n|n}, P^{xx}_{n|n}\big), \quad p(x_n \mid z_{1:n-1}) = N\big(x_n; \hat{x}_{n|n-1}, P^{xx}_{n|n-1}\big), \quad p(z_n \mid z_{1:n-1}) = N\big(z_n; \hat{z}_{n|n-1}, P^{zz}_{n|n-1}\big)$$

24 LKF
Using these assumptions, we obtain:
1. Prediction: $\hat{x}_{n|n-1} = F \hat{x}_{n-1|n-1}$, $\quad P^{xx}_{n|n-1} = F P^{xx}_{n-1|n-1} F^T + Q$
2. Observation prediction/likelihood: $\hat{z}_{n|n-1} = H \hat{x}_{n|n-1}$, $\quad P^{zz}_{n|n-1} = H P^{xx}_{n|n-1} H^T + R$, $\quad P^{xz}_{n|n-1} = P^{xx}_{n|n-1} H^T$

25 LKF
Using these assumptions, we obtain:
3. Update: $K_n = P^{xz}_{n|n-1} \big(P^{zz}_{n|n-1}\big)^{-1}$, $\quad \hat{x}_{n|n} = \hat{x}_{n|n-1} + K_n \big(z_n^{\mathrm{obs}} - \hat{z}_{n|n-1}\big)$, $\quad P^{xx}_{n|n} = P^{xx}_{n|n-1} - K_n P^{zz}_{n|n-1} K_n^T$

26 An example [finally...]
State vector: $x_n = \begin{bmatrix} p_n \\ v_n \end{bmatrix} = \begin{bmatrix} \text{position} \\ \text{velocity} \end{bmatrix}$
Constant velocity model: $F = \begin{bmatrix} 1 & dt \\ 0 & 1 \end{bmatrix}$
System noise: $v_{n-1} \sim N(0, Q)$
Observation model (sensor output: a noisy measurement of the position): $H = \begin{bmatrix} 1 & 0 \end{bmatrix}$
Measurement noise: $w_n \sim N(0, \sigma^2)$
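A minimal Python sketch of this constant-velocity filter, assembling the prediction and update formulas from slides 24-25 (the values of dt, q and sigma below are illustrative assumptions, not from the slides):

```python
import numpy as np

dt, q, sigma = 0.1, 0.01, 0.5          # illustrative step size and noise levels
F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity transition
H = np.array([[1.0, 0.0]])             # we only measure position
Q = q * np.eye(2)                      # system noise covariance (assumed)
R = np.array([[sigma**2]])             # measurement noise covariance

def kf_step(x, P, z):
    # Prediction: x_{n|n-1} = F x_{n-1|n-1}, P_{n|n-1} = F P F^T + Q
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Observation prediction: z_hat = H x_{n|n-1}, S = H P H^T + R
    z_hat = H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    # Update: x_{n|n} = x_pred + K (z - z_hat), P_{n|n} = P_pred - K S K^T
    return x_pred + K @ (z - z_hat), P_pred - K @ S @ K.T

x, P = np.zeros(2), np.eye(2)                    # initial guess
for z in ([0.11], [0.19], [0.32]):               # a few fake position measurements
    x, P = kf_step(x, P, np.array(z))
print(x)                                         # estimated [position, velocity]
```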

27 An example [finally...] [figure]

28 Another example
Simo Särkkä, Bayesian Filtering and Smoothing, Cambridge University Press, 2013. Figure: Chapter 4.3, Example 4.3.

29 Some words about the multi-target tracking problem
Nearest Neighbors Filter
Probabilistic Data Association Filter
Multihypothesis Filter
More details: freiburg.de/teaching/ws10/robotics2/pdfs/rob2-15-dataassociation.pdf
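To give a flavor of the association step behind these filters, here is a sketch of greedy nearest-neighbor association between predicted track positions and measurements (a toy illustration only; the actual Nearest Neighbors Filter also involves gating and the track state update):

```python
import numpy as np

def greedy_nn_associate(tracks, measurements):
    """Greedily pair each predicted track position with its nearest
    unused measurement. Returns a list of (track_idx, meas_idx)."""
    d = np.linalg.norm(tracks[:, None, :] - measurements[None, :, :], axis=2)
    pairs, used = [], set()
    for ti in np.argsort(d.min(axis=1)):         # most confident tracks first
        for mi in np.argsort(d[ti]):             # closest measurement first
            if int(mi) not in used:
                pairs.append((int(ti), int(mi)))
                used.add(int(mi))
                break
    return pairs

tracks = np.array([[0.0, 0.0], [5.0, 5.0]])      # predicted positions of 2 tracks
meas = np.array([[5.1, 4.8], [0.2, -0.1]])       # 2 incoming measurements
print(greedy_nn_associate(tracks, meas))         # [(0, 1), (1, 0)]
```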

30 Non-Gaussian densities: Particle Filters
Explanation without equations:
Tutorial with a bunch of equations: Doucet, A., Johansen, A. M., A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later
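Without reproducing the equations, a minimal bootstrap particle filter sketch (assuming 1-D random-walk dynamics and a Gaussian position sensor) conveys the idea: propagate particles through the dynamic model, weight them by the likelihood, and resample:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                                    # number of particles
particles = rng.normal(0.0, 1.0, N)         # samples from the prior p(x_0)

def pf_step(particles, z, q=0.1, r=0.5):
    # Predict: push particles through the (assumed) random-walk dynamics.
    particles = particles + rng.normal(0.0, q, N)
    # Weight: likelihood p(z | x) for a Gaussian position sensor.
    w = np.exp(-0.5 * (z - particles)**2 / r**2)
    w /= w.sum()
    # Resample: multinomial resampling to fight weight degeneracy.
    return particles[rng.choice(N, size=N, p=w)]

for z in (0.1, 0.25, 0.4):                  # fake measurements
    particles = pf_step(particles, z)
print(particles.mean())                     # point estimate of E(x_n | z_{1:n})
```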

31 References
1. Anton J. Haug, Bayesian Estimation and Tracking: A Practical Guide, Wiley, 2012
2. Subhash Challa et al., Fundamentals of Object Tracking, Cambridge University Press, 2011

32 Classification problems The problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

33 Classification problems
Supervised learning: learning where a training set of correctly identified observations is available (Machine Learning).
Feature / explanatory variable / independent variable: a quantifiable property used in the representation of the observation.
Category / outcome / dependent variable / target variable / class / response variable: e.g. spam/not-spam.
Classifier: the classification algorithm.
Binary and multiclass classification: two or more than two classes are involved.

34 Classification vs. estimation
Many classifiers simply output the best class as the answer.
Probabilistic classifiers/estimators return the probability of the instance being a member of each of the possible classes; the best class is naturally the one with the highest probability.
Advantages of probabilistic methods: the probability serves as a confidence value, and they are efficient in large-scale problems because error propagation can be avoided.

35 Generalized Linear Models (GLM)
A large class of models in which the response variable is assumed to follow an exponential family distribution and its expected value is a (typically nonlinear) function of a linear combination of the feature values.

36 Generalized Linear Models
Exponential family: $p(x) = h(x) \exp\{\eta^T T(x) - A(\eta)\}$
More details: jordan/courses/260-spring10/other-readings/chapter8.pdf
Examples of family members:
Bernoulli distribution: $X \in \{0, 1\}$; $P(X = 1) = \pi$; $p(x) = \pi^x (1 - \pi)^{1-x}$, $x \in \{0, 1\}$
Normal distribution
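As a check that the Bernoulli distribution really belongs to this family, it can be rewritten in the canonical form above:

```latex
p(x) = \pi^x (1 - \pi)^{1-x}
     = \exp\!\Big\{ x \log\frac{\pi}{1-\pi} + \log(1-\pi) \Big\},
\qquad
\eta = \log\frac{\pi}{1-\pi}, \quad T(x) = x, \quad
A(\eta) = \log\!\big(1 + e^{\eta}\big), \quad h(x) = 1
```

The natural parameter $\eta$ is exactly the log-odds, which foreshadows the logit link used in logistic regression below.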

37 Generalized Linear Models
General equation: $E(Y) = g^{-1}(\beta^T X)$, where $X$: explanatory variables; $Y$: response variable from an exponential family distribution; $\beta$: parameter vector; $g$: link function.

38 Generalized Linear Models
An example of a GLM: linear regression.
Model: $E(Y) = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k$
$Y$ is normally distributed. Link function: $g^{-1}(E(Y)) = E(Y)$ [identity].

39 Logistic regression
The response variable $Y$ is Bernoulli distributed with parameter $\pi$, i.e. the observations can belong to two classes (the binary case); if $Y \sim \mathrm{Bernoulli}(\pi)$, then $E(Y) = \pi$.
$X = \{X_1, X_2, \ldots, X_k\}$ is the feature vector.
Odds: the ratio of $\pi$ and $1 - \pi$, which gives a measure for comparing class memberships.
Link function: the logarithm of the odds,
$$g(\pi) = \log\Big(\frac{\pi}{1 - \pi}\Big)$$
Model:
$$\log\Big(\frac{\pi}{1 - \pi}\Big) = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k$$

40 Logistic regression
The mean function $g^{-1}(\beta^T X)$ is probably more familiar:
$$\pi = \frac{1}{1 + \exp(-\beta^T X)}$$
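In code the mean function is a one-liner; a minimal sketch (the feature vector and $\beta$ below are arbitrary illustrations, not fitted values):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))     # g^{-1}: inverse of the logit link

beta = np.array([-1.0, 2.0, 0.5])       # [beta_0, beta_1, beta_2], arbitrary
x = np.array([1.0, 0.3, -1.2])          # leading 1.0 carries the intercept
pi = sigmoid(beta @ x)                  # P(Y = 1 | x)
print(pi, int(pi > 0.5))                # probability and the induced class
```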

41 Logistic regression
We use a training set $\{X^{(i)}, Y^{(i)}\}_{i=1}^{N}$.
Goodness of a parameter vector = how correctly it works on the training set, i.e. the gap/error is small.
Cost function / error function / loss function: measures the error of the estimate; the goal is to minimize it. Classical choice: the mean squared error,
$$\mathrm{MSE}(\beta) = \frac{1}{N} \sum_{i=1}^{N} \big(Y^{(i)} - \hat{Y}^{(i)}\big)^2$$

42 Logistic regression
A more convenient cost function here is the cross-entropy:
$$\mathrm{CE}(\beta) = -\frac{1}{N} \sum_{i=1}^{N} Y^{(i)} \log\Big(g^{-1}\big(\beta^T X^{(i)}\big)\Big) - \frac{1}{N} \sum_{i=1}^{N} \big(1 - Y^{(i)}\big) \log\Big(1 - g^{-1}\big(\beta^T X^{(i)}\big)\Big)$$
Finding the optimal parameter vector = minimizing the cost function.

43 Some notes on optimization: gradient descent [figure]

44 Some notes on optimization: stochastic gradient descent. Source: Andrew Ng, Machine Learning (online course, Coursera)
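Tying slides 39-44 together, a minimal sketch of stochastic gradient descent on the cross-entropy cost (synthetic data; the learning rate and epoch count are assumptions). The per-example gradient of the cross-entropy works out to $(\pi - y)\,x$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 200, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, k))])   # intercept column
true_beta = np.array([0.5, 2.0, -1.0])
Y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta = np.zeros(k + 1)
lr, epochs = 0.1, 50
for _ in range(epochs):
    for i in rng.permutation(N):                 # one example at a time: SGD
        pi = 1.0 / (1.0 + np.exp(-X[i] @ beta))
        beta -= lr * (pi - Y[i]) * X[i]          # gradient step: (pi - y) * x
print(beta)                                      # should approach true_beta
```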

45 Multiclass problem
Possible modifications:
1 vs. rest [K classifiers]
1 vs. 1 [K(K-1)/2 classifiers]
multinomial extension: for logistic regression, introduce the softmax function
$$\pi_j = \frac{\exp\big(\beta_j^T X\big)}{\sum_{i=1}^{K} \exp\big(\beta_i^T X\big)}, \qquad j = 1, 2, \ldots, K$$
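A short sketch of the softmax function in code (with the usual max-shift for numerical stability; the parameter matrix is an arbitrary illustration):

```python
import numpy as np

def softmax_probs(B, x):
    """Class probabilities pi_j = exp(B[j] @ x) / sum_i exp(B[i] @ x)."""
    scores = B @ x
    scores -= scores.max()              # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()

B = np.array([[0.2, 1.0], [0.0, -0.5], [0.1, 0.3]])   # K=3 classes, 2 features
x = np.array([1.0, 2.0])
pi = softmax_probs(B, x)
print(pi, pi.argmax())                  # probabilities sum to 1; best class
```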

46 Evaluation: confusion matrix
Source: Liu et al., Learning accurate and interpretable models based on regularized random forests regression
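The usual scores follow directly from the four cells of a binary confusion matrix; a small sketch with made-up counts:

```python
# Hypothetical counts: true positives, false positives, false negatives, true negatives
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)              # of predicted positives, how many are real
recall    = tp / (tp + fn)              # of real positives, how many we found (TPR)
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)
```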

47 Evaluation: ROC curve [figure]

48 Naive Bayes classifier
Probabilistic model: the goal is to calculate the probability $p(C_j \mid x)$, where $C_j$ denotes membership in class $j$ and $x = (x_1, x_2, \ldots, x_k)$ is an instance of the feature vector.
Approach: Bayes' theorem!
$$p(C_j \mid x) = \frac{p(x \mid C_j)\, p(C_j)}{p(x)}$$
The naivety condition: the features are conditionally independent,
$$p(x_i \mid x_{i+1}, \ldots, x_k, C_j) = p(x_i \mid C_j)$$

49 Naive Bayes classifier
Using the definition of conditional PDFs, the law of total probability and the chain rule, we can derive
$$p(C_j \mid x) = \frac{p(C_j) \prod_{i=1}^{k} p(x_i \mid C_j)}{\sum_{i=1}^{K} p(C_i)\, p(x \mid C_i)}$$
How to classify:
$$\hat{y} = \operatorname*{argmax}_{j \in \{1, 2, \ldots, K\}} \; p(C_j) \prod_{i=1}^{k} p(x_i \mid C_j)$$

50 Naive Bayes-Gauss classifier
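A minimal sketch of the Gauss variant: per class, fit an independent Gaussian to each feature and apply the argmax rule from slide 49, computed in the log domain for numerical safety (the toy data below is an illustration):

```python
import numpy as np

def fit_gnb(X, y):
    """Per class: prior p(C_j) plus per-feature mean and variance."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def predict_gnb(params, x):
    """argmax_j of log p(C_j) + sum_i log N(x_i; mu_ij, var_ij)."""
    best, best_score = None, -np.inf
    for c, (prior, mu, var) in params.items():
        score = np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var)
                                             + (x - mu) ** 2 / var)
        if score > best_score:
            best, best_score = c, score
    return best

X = np.array([[1.0, 2.0], [1.2, 1.9], [5.0, 7.0], [5.2, 6.8]])
y = np.array([0, 0, 1, 1])
print(predict_gnb(fit_gnb(X, y), np.array([4.9, 7.1])))   # -> 1
```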

51 More general topics (not covered now)
Bias, variance
Problem of underfitting/overfitting
Regularization
Curse of dimensionality
Feature engineering
Model selection (e.g. elimination techniques)

52 More basic classification algorithms
Decision Tree, Random Forest
K-nearest neighbors
Support Vector Machine

53 Deep Learning: Artificial Neural Networks [figure]

54 Deep Learning: Artificial Neural Networks [figure]

55 Deep Learning: Artificial Neural Networks [figure]
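A feed-forward network is, at its core, alternating affine maps and nonlinearities; a minimal sketch of a forward pass (random placeholder weights, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # input dim 3 -> hidden dim 4
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # hidden dim 4 -> one output

def forward(x):
    h = np.tanh(W1 @ x + b1)                     # hidden layer activation
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # sigmoid output: P(class 1)

print(forward(np.array([0.5, -1.0, 2.0])))
```

Training then amounts to minimizing a cost such as the cross-entropy from slide 42 with (stochastic) gradient descent, with the gradients computed by backpropagation.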

56 References
Online courses:
Kirill Eremenko, Machine Learning (udemy.com)
Kirill Eremenko, Deep Learning (udemy.com)
Andrew Ng, Machine Learning (coursera.org)
Pennsylvania State University, Statistics online
Books:
Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2011
Goodfellow et al., Deep Learning, MIT Press, 2016
Useful links

57 Thank You For Your Attention!
