Introduction to Estimation and Data Fusion Part I: Probability, State and Information Models


1 Introduction to Estimation and Data Fusion Part I: Probability, State and Information Models Hugh Durrant-Whyte ARC Centre of Excellence for Autonomous Systems Australian Centre for Field Robotics The University of Sydney Introduction to Estimation and Data Fusion Slide 1

2 Introduction Estimation is the problem of determining the value of an unknown quantity from one or more observations. Data fusion is the process of combining information from a number of different sources to provide a robust and complete description of an environment or process of interest. This course provides a practical introduction to estimation and data fusion methods. The focus is on mathematical, probabilistic and decision-theoretic methods. The course is a cut-down version of a full five-day course and includes computer-based laboratories which provide the opportunity to implement and evaluate algorithms.

3 Rules of Engagement This course is developed to enable you to get up to speed with basic methods as quickly as possible. It is your course: ask questions, make suggestions, etc. If you do not understand something, please ask; if something is already well known to you, ask me to move on. Use the labs effectively: these are the best means of understanding the mathematics. Introduction to Estimation and Data Fusion Slide 3

4 Course Content Probabilistic Models Probabilistic Methods Data Fusion with Bayes Theorem Information Measures and Information Fusion State models and noise Estimation The Linear Kalman Filter The Extended Kalman Filter Localisation and Map Building Probabilistic (Monte Carlo) Filters Data Fusion The Multi-Sensor Kalman Filter The Inverse Covariance Filter Decentralised Data Fusion Methods Introduction to Estimation and Data Fusion Slide 4

5 Laboratory Sessions Laboratory 1: Probabilistic and Information Data Fusion Methods Laboratory 2: The Linear Kalman Filter (tracking) Laboratory 3: The extended Kalman Filter (localisation) Laboratory 4: The SLAM algorithm Laboratory 5: Particle Filters Laboratory 6: Multi-sensor multi-target tracking Laboratory 7: Decentralised tracking and Sensor Networks Introduction to Estimation and Data Fusion Slide 5

6 Recommended Reference Material Maybeck: the best practical book in the field. Brown and Hwang: good introductory text. Bar-Shalom: issues of data association and tracking. Gelb: useful components. Grewal and Andrews: advanced elements. Data fusion: Blackman; Waltz and Llinas; Bar-Shalom. DDF methods: books by Manyika and Mutambara, papers in the open literature. These course notes. Papers included as part of this course. Introduction to Estimation and Data Fusion Slide 6

7 Probabilistic Models Introduction to Estimation and Data Fusion Slide 7

8 Probabilistic Models Uncertainty lies at the heart of all descriptions of the sensing and data fusion process. Probabilistic models provide a powerful and consistent means of describing uncertainty and lead naturally into ideas of information fusion and decision making. Introduction to Estimation and Data Fusion Slide 8

9 Probabilistic Models Familiarity with essential probability theory is assumed. A probability density function (pdf) $P_y(\cdot)$ is defined on a random variable $y$, generally written as $P_y(y)$ or simply $P(y)$. The random variable may be a scalar or vector quantity, and may be either discrete or continuous in measure. The pdf is a (probabilistic) model of the quantity $y$; observation or state. The pdf $P(y)$ is considered valid if: 1. it is positive, $P(y) \geq 0$ for all $y$, and 2. it sums (integrates) to a total probability of 1, $\int_y P(y)\,dy = 1$. The joint distribution $P_{xy}(x,y)$ is defined in a similar manner. Introduction to Estimation and Data Fusion Slide 9

10 Joint Probabilistic Models Integrating the pdf $P_{xy}(x,y)$ over the variable $x$ gives the marginal pdf $P_y(y)$ as $P_y(y) = \int P_{xy}(x,y)\,dx$, and similarly integrating over $y$ gives the marginal pdf $P_x(x)$. The joint pdf over $n$ variables, $P(x_1, \dots, x_n)$, may also be defined with analogous properties to the joint pdf of two variables. The conditional pdf $P(x|y)$ is defined by $P(x|y) = \frac{P(x,y)}{P(y)}$ and is a pdf on $x$ for each value of $y$; $P(x|y)$ is not a pdf on $y$. Introduction to Estimation and Data Fusion Slide 10
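For discrete random variables these operations reduce to sums over a joint table. A minimal MATLAB sketch, using an arbitrary example joint distribution (not one from the course):

% Marginalisation and conditioning on a discrete joint distribution.
% Rows index x, columns index y; the entries sum to one.
Pxy = [0.10 0.20;
       0.30 0.15;
       0.05 0.20];

Px = sum(Pxy, 2);         % marginal P(x): sum over y
Py = sum(Pxy, 1);         % marginal P(y): sum over x

% conditional P(x|y) = P(x,y)/P(y); each column then sums to one
Px_given_y = Pxy ./ Py;   % uses implicit expansion (MATLAB R2016b or later)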

11 The Total Probability Theorem The chain-rule can be used to expand a joint pdf in terms of conditional and marginal distributions: $P(x,y) = P(x|y)P(y)$. The chain-rule can be extended to any number of variables: $P(x_1,\dots,x_n) = P(x_1|x_2,\dots,x_n)\cdots P(x_{n-1}|x_n)P(x_n)$. The expansion may be taken in any convenient order. The Total Probability Theorem: $P_y(y) = \int P_{y|x}(y|x)P_x(x)\,dx$. The total probability in a state $y$ can be obtained by considering the ways in which $y$ can occur given that the state $x$ takes a specific value (this is encoded in $P_{y|x}(y|x)$), weighted by the probability that each of these values of $x$ is true (encoded in $P_x(x)$). Introduction to Estimation and Data Fusion Slide 11

12 Independence and Conditional Independence If knowledge of $y$ provides no information about $x$, then $x$ and $y$ are independent: $P(x|y) = P(x)$, or equivalently $P(x,y) = P(x)P(y)$. Conditional independence: given three random variables $x$, $y$ and $z$, if knowledge of the value of $z$ makes the value of $x$ independent of the value of $y$, then $P(x|y,z) = P(x|z)$. This holds if $z$ indirectly contains all the information contributed by $y$ to the value of $x$ (for example). It implies the intuitive result $P(x,y|z) = P(x|z)P(y|z)$. Introduction to Estimation and Data Fusion Slide 12

13 Independence and Conditional Independence Conditional independence underlies many data fusion algorithms. Consider the state of a system $x$ and two observations of this state, $z_1$ and $z_2$. It should be clear that the two observations are not independent, $P(z_1,z_2) \neq P(z_1)P(z_2)$, as they must both depend on the common state $x$. However, the observations usually are conditionally independent given the state: $P(z_1,z_2|x) = P(z_1|x)P(z_2|x)$. For data fusion purposes this is a good definition of state. Introduction to Estimation and Data Fusion Slide 13

14 Bayes Theorem Consider two random variables $x$ and $z$ on which is defined a joint probability density function $P(x,z)$. The chain-rule of conditional probabilities can be used to expand this density function in two ways: $P(x,z) = P(x|z)P(z) = P(z|x)P(x)$. Bayes theorem is obtained as $P(x|z) = \frac{P(z|x)P(x)}{P(z)}$. This computes the posterior $P(x|z)$ given the prior $P(x)$ and an observation model $P(z|x)$. $P(z|x)$ takes the role of a sensor model. First, building a sensor model: fix $x = x$ and ask what pdf on $z$ results. Then, using a sensor model: observe $z = z$ and ask what the pdf on $x$ is. Practically, $P(z|x)$ is constructed as a function of both variables (or a matrix in discrete form). For each fixed value of $x$, a distribution in $z$ is defined; therefore as $x$ varies, a family of distributions in $z$ is created. Introduction to Estimation and Data Fusion Slide 14

15 Bayes Theorem Example I A continuous-valued state $x$ (the range to a target, for example), and an observation $z$ of this state. A Gaussian observation model, a function of both $z$ and $x$: $P(z|x) = \frac{1}{\sqrt{2\pi\sigma_z^2}}\exp\left(-\frac{1}{2}\frac{(z-x)^2}{\sigma_z^2}\right)$. Building the model: the state is fixed, $x = x$, and the distribution is a function of $z$. Using the model: an observation is made, $z = z$, and the distribution is a function of $x$. Prior: $P(x) = \frac{1}{\sqrt{2\pi\sigma_x^2}}\exp\left(-\frac{1}{2}\frac{(x-x_p)^2}{\sigma_x^2}\right)$. Introduction to Estimation and Data Fusion Slide 15

16 Bayes Theorem Example I Posterior after taking an observation: $P(x|z) = C\,\frac{1}{\sqrt{2\pi\sigma_z^2}}\exp\left(-\frac{1}{2}\frac{(x-z)^2}{\sigma_z^2}\right)\frac{1}{\sqrt{2\pi\sigma_x^2}}\exp\left(-\frac{1}{2}\frac{(x-x_p)^2}{\sigma_x^2}\right) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\frac{(x-\bar{x})^2}{\sigma^2}\right)$, where $\bar{x} = \frac{\sigma_x^2}{\sigma_x^2+\sigma_z^2}z + \frac{\sigma_z^2}{\sigma_x^2+\sigma_z^2}x_p$ and $\sigma^2 = \frac{\sigma_z^2\sigma_x^2}{\sigma_z^2+\sigma_x^2} = \left(\frac{1}{\sigma_z^2}+\frac{1}{\sigma_x^2}\right)^{-1}$. Introduction to Estimation and Data Fusion Slide 16
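The fused mean and variance can be checked numerically by multiplying the two densities on a grid and normalising. A minimal MATLAB sketch; the numerical values of the means and variances below are arbitrary examples:

% Numerical check of the Gaussian product (Bayes update) formulas.
sz = 2.0;  sx = 3.0;           % observation and prior standard deviations
z  = 4.0;  xp = 1.0;           % observed value and prior mean
x  = linspace(-20, 20, 4001);  dx = x(2) - x(1);

likelihood = exp(-0.5*(z - x).^2/sz^2) / sqrt(2*pi*sz^2);
prior      = exp(-0.5*(x - xp).^2/sx^2) / sqrt(2*pi*sx^2);

posterior = likelihood .* prior;
posterior = posterior / (sum(posterior)*dx);             % normalise

xbar_num = sum(x .* posterior) * dx;                     % numerical mean
var_num  = sum((x - xbar_num).^2 .* posterior) * dx;     % numerical variance

xbar = (sx^2*z + sz^2*xp) / (sx^2 + sz^2);               % analytic mean
s2   = (sz^2*sx^2) / (sz^2 + sx^2);                      % analytic variance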

17 Bayes Theorem Example IIa A single state $x$ which can take on one of three values: $x_1$: $x$ is a type 1 target; $x_2$: $x$ is a type 2 target; $x_3$: no visible target. A single sensor observes $x$ and returns three possible values: $z_1$: observation of a type 1 target; $z_2$: observation of a type 2 target; $z_3$: no target observed. Introduction to Estimation and Data Fusion Slide 17

18 The sensor model is described by the likelihood matrix $P_1(z|x)$ (rows indexed by the true state $x$, columns by the observation $z$):

         z_1    z_2    z_3
x_1     0.45   0.45   0.10
x_2     0.45   0.45   0.10
x_3     0.15   0.15   0.70

The likelihood matrix is a function of both $x$ and $z$. For a fixed state it describes the probability of a particular observation being made (the rows of the matrix). For a given observation it describes a probability distribution over the values of the true state (the columns) and is then the likelihood function $\Lambda(x)$. Introduction to Estimation and Data Fusion Slide 18

19 Bayes Theorem Example IIb The posterior distribution of the true state $x$ after making an observation $z = z_i$ is given by $P(x|z_i) = \alpha P_1(z_i|x)P(x)$, where $\alpha$ is a normalizing constant so that the sum, over $x$, of the posteriors is 1. Assume a non-informative prior: $P(x) = (0.333, 0.333, 0.333)$. Observe $z = z_1$; then the likelihood is $P_1(z_1|x) = (0.45, 0.45, 0.15)$ and the posterior is $P(x|z_1) = (0.4286, 0.4286, 0.1429)$. Make this posterior the new prior and again observe $z = z_1$; then $P(x|z_1) = \alpha P_1(z_1|x)P(x) = \alpha\,(0.45, 0.45, 0.15)\cdot(0.4286, 0.4286, 0.1429) = (0.4737, 0.4737, 0.0526)$. Note the result is to increase the probability in both the type 1 and type 2 targets at the expense of the no-target hypothesis. Introduction to Estimation and Data Fusion Slide 19
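This update can be implemented in a few lines. A minimal MATLAB sketch using the sensor 1 likelihood matrix given above:

% Recursive Bayes update with the sensor 1 likelihood matrix.
% Rows of P1 are states x1..x3, columns are observations z1..z3.
P1 = [0.45 0.45 0.10;
      0.45 0.45 0.10;
      0.15 0.15 0.70];

prior = [1/3; 1/3; 1/3];                    % non-informative prior on x

z = 1;                                      % observe z = z1
posterior = P1(:, z) .* prior;              % likelihood column times prior
posterior = posterior / sum(posterior)      % (0.4286, 0.4286, 0.1429)

% make this posterior the new prior and observe z1 again
posterior2 = P1(:, z) .* posterior;
posterior2 = posterior2 / sum(posterior2)   % (0.4737, 0.4737, 0.0526)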

20 Data Fusion using Bayes Theorem Consider the set of observations $Z^n = \{z_1 \in \mathcal{Z}_1, \dots, z_n \in \mathcal{Z}_n\}$. The posterior distribution given the observation set is, naively, $P(x|Z^n) = \frac{P(Z^n|x)P(x)}{P(Z^n)} = \frac{P(z_1,\dots,z_n|x)P(x)}{P(z_1,\dots,z_n)}$. This is not easy to evaluate directly as the joint distribution $P(z_1,\dots,z_n|x)$ must be known completely. Assume conditional independence: $P(z_1,\dots,z_n|x) = P(z_1|x)\cdots P(z_n|x) = \prod_{i=1}^n P(z_i|x)$. Introduction to Estimation and Data Fusion Slide 20

21 Data Fusion using Bayes Theorem So the update becomes $P(x|Z^n) = [P(Z^n)]^{-1}P(x)\prod_{i=1}^n P(z_i|x)$. This is the independent likelihood pool. In practice, the conditional probabilities $P(z_i|x)$ are stored a priori as functions of both $z_i$ and $x$. When an observation sequence $Z^n = \{z_1, z_2, \dots, z_n\}$ is made, the observed values are instantiated in this probability distribution and the likelihood functions $\Lambda_i(x)$ are constructed. Introduction to Estimation and Data Fusion Slide 21

22 The Independent Likelihood Pool [Figure: the independent likelihood pool. Each sensor $i$ instantiates its model $P(z_i|x)$ with the observation $z_i$ to form a likelihood $\Lambda_i(x)$; a central processor combines these with the prior as $P(x|Z^n) = C\,P(x)\prod_{i=1}^n \Lambda_i(x)$.] Introduction to Estimation and Data Fusion Slide 22

23 Data Fusion using Bayes Theorem The effectiveness of fusion relies on the assumption that the information obtained from different information sources is independent when conditioned on the true underlying state of the world. Clearly $P(z_1,\dots,z_n) \neq P(z_1)\cdots P(z_n)$, as each piece of information depends on a common underlying state $x \in \mathcal{X}$. Conversely, it is generally quite reasonable to assume that the underlying state is the only thing in common between information sources, and so once the state has been specified it is correspondingly reasonable to assume that the information gathered is conditionally independent given this state. Introduction to Estimation and Data Fusion Slide 23

24 Data Fusion using Bayes Theorem Example Ia A second sensor makes the same three observations as the first sensor, but its likelihood matrix $P_2(z_2|x)$ is described by

         z_1    z_2    z_3
x_1     0.45   0.10   0.45
x_2     0.10   0.45   0.45
x_3     0.45   0.45   0.10

Whereas the first sensor was good at detecting targets but not at distinguishing between different target types, this second sensor has poor overall detection probabilities but good target discrimination capabilities. With a uniform prior, observe $z = z_1$; then the posterior is (the first column of the likelihood matrix) $P(x|z_1) = (0.45, 0.1, 0.45)$. Introduction to Estimation and Data Fusion Slide 24

25 Data Fusion using Bayes Theorem Example Ib It makes sense to combine the information from both sensors to provide both good detection and good discrimination capabilities. The overall likelihood function for the combined system is $P_{12}(z_1, z_2|x) = P_1(z_1|x)P_2(z_2|x)$. For $x = x_1$ (rows indexed by $z_1$, columns by $z_2$):

x = x_1        z_2 = z_1   z_2 = z_2   z_2 = z_3
z_1 = z_1       0.2025      0.0450      0.2025
z_1 = z_2       0.2025      0.0450      0.2025
z_1 = z_3       0.0450      0.0100      0.0450

Introduction to Estimation and Data Fusion Slide 25

26 Continuing for the remaining states:

x = x_2        z_2 = z_1   z_2 = z_2   z_2 = z_3
z_1 = z_1       0.0450      0.2025      0.2025
z_1 = z_2       0.0450      0.2025      0.2025
z_1 = z_3       0.0100      0.0450      0.0450

x = x_3        z_2 = z_1   z_2 = z_2   z_2 = z_3
z_1 = z_1       0.0675      0.0675      0.0150
z_1 = z_2       0.0675      0.0675      0.0150
z_1 = z_3       0.3150      0.3150      0.0700

Introduction to Estimation and Data Fusion Slide 26

27 Data Fusion using Bayes Theorem Example Ic For each state $x = \{x_1, x_2, x_3\}$, each sub-matrix represents the joint probability of the pair of observations $\{z_i \in \mathcal{Z}_1, z_j \in \mathcal{Z}_2\}$ being made. Note that each sub-matrix sums to one and so is indeed a valid pdf. Observe $z_1 = z_1$ and $z_2 = z_1$ and assume a uniform prior; then the posterior is $P(x|z_1, z_1) = \alpha P_{12}(z_1, z_1|x) = \alpha P_1(z_1|x)P_2(z_1|x) = \alpha\,(0.45, 0.45, 0.15)\cdot(0.45, 0.1, 0.45) = (0.6429, 0.1429, 0.2143)$. Sensor 2 adds substantial target discrimination power at the cost of a slight loss of detection performance for the same number of observations. Introduction to Estimation and Data Fusion Slide 27
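The same fusion can be written directly as a product of likelihood columns. A minimal MATLAB sketch using the two likelihood matrices assumed above:

% Fusing one observation from each sensor, with a uniform prior.
P1 = [0.45 0.45 0.10; 0.45 0.45 0.10; 0.15 0.15 0.70];   % sensor 1
P2 = [0.45 0.10 0.45; 0.10 0.45 0.45; 0.45 0.45 0.10];   % sensor 2

prior = [1/3; 1/3; 1/3];
z1 = 1;  z2 = 1;                          % both sensors report z1

posterior = P1(:, z1) .* P2(:, z2) .* prior;
posterior = posterior / sum(posterior)    % (0.6429, 0.1429, 0.2143)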

28 Data Fusion using Bayes Theorem Example Id Repeating this calculation for each $(z_1, z_2)$ observation pair gives the fused posteriors (rows indexed by the state $x$, columns by the sensor 2 observation):

z_1 = z_1      z_2 = z_1   z_2 = z_2   z_2 = z_3
x_1             0.6429      0.1429      0.4821
x_2             0.1429      0.6429      0.4821
x_3             0.2143      0.2143      0.0357

z_1 = z_2      z_2 = z_1   z_2 = z_2   z_2 = z_3
x_1             0.6429      0.1429      0.4821
x_2             0.1429      0.6429      0.4821
x_3             0.2143      0.2143      0.0357

Introduction to Estimation and Data Fusion Slide 28

29 Continuing:

z_1 = z_3      z_2 = z_1   z_2 = z_2   z_2 = z_3
x_1             0.1216      0.0270      0.2813
x_2             0.0270      0.1216      0.2813
x_3             0.8514      0.8514      0.4375

Introduction to Estimation and Data Fusion Slide 29

30 Data Fusion using Bayes Theorem Example Ie The combined sensor provides substantial improvements in overall system performance. For example, observe $z_1 = z_1$ and $z_2 = z_1$: $P(x|z_1, z_2) = (0.6429, 0.1429, 0.2143)$; target 1 is most likely, as expected. However, observe $z_1 = z_1$ and $z_2 = z_2$: $P(x|z_1, z_2) = (0.1429, 0.6429, 0.2143)$; target type 2 has high probability because sensor 1 does detection while sensor 2 does discrimination. If we now observe no target with sensor 2, having detected target type 1 (or 2) with the first sensor, the posterior is $(0.4821, 0.4821, 0.0357)$. That is, there is a target (because we know sensor 1 is much better at target detection than sensor 2), but we still have no idea whether it is of type 1 or type 2 as sensor 2 did not make a valid detection. Introduction to Estimation and Data Fusion Slide 30

31 Data Fusion using Bayes Theorem Example If Finally, if sensor 1 gets no detection but sensor 2 detects target type 1, then the posterior is $(0.1216, 0.0270, 0.8514)$. That is, we still believe there is no target (sensor 1 is better at providing this information) and, perversely, sensor 2 confirms this. Practically, the joint likelihood matrix is never constructed (it is easy to see why). Rather, a likelihood matrix is constructed for each sensor and these are only combined when instantiated with an observation. Introduction to Estimation and Data Fusion Slide 31

32 Recursive Bayes Updating Bayes Theorem allows incremental or recursive addition of new information. With $Z^k = \{z_k, Z^{k-1}\}$, expand two ways: $P(x, Z^k) = P(x|Z^k)P(Z^k) = P(z_k, Z^{k-1}|x)P(x) = P(z_k|x)P(Z^{k-1}|x)P(x)$, where conditional independence of the observation sequence has been assumed. Equating both sides gives $P(x|Z^k)P(Z^k) = P(z_k|x)P(Z^{k-1}|x)P(x) = P(z_k|x)P(x|Z^{k-1})P(Z^{k-1})$. Introduction to Estimation and Data Fusion Slide 32

33 Recursive Bayes Updating Noting that $P(Z^k)/P(Z^{k-1}) = P(z_k|Z^{k-1})$ and rearranging gives $P(x|Z^k) = \frac{P(z_k|x)P(x|Z^{k-1})}{P(z_k|Z^{k-1})}$. Only $P(x|Z^{k-1})$ need be computed and stored; it contains a complete summary of all past information. On arrival of a new likelihood $P(z_k|x)$, the old posterior takes on the role of the current prior and the product of the two becomes the new posterior. Introduction to Estimation and Data Fusion Slide 33

34 Recursive Bayes Updating: An Example Ia An example of observations in independent Gaussian noise. Scalar $x$, zero-mean noise with variance $\sigma^2$: $P(z_k|x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\frac{(z_k-x)^2}{\sigma^2}\right)$. Assume the posterior distribution after the first $k-1$ observations is Gaussian with mean $\bar{x}_{k-1}$ and variance $\sigma_{k-1}^2$: $P(x|Z^{k-1}) = \frac{1}{\sqrt{2\pi\sigma_{k-1}^2}}\exp\left(-\frac{1}{2}\frac{(\bar{x}_{k-1}-x)^2}{\sigma_{k-1}^2}\right)$. Introduction to Estimation and Data Fusion Slide 34

35 Recursive Bayes Updating: An Example Ib Then the posterior distribution after $k$ observations is $P(x|Z^k) = K\,\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\frac{(z_k-x)^2}{\sigma^2}\right)\frac{1}{\sqrt{2\pi\sigma_{k-1}^2}}\exp\left(-\frac{1}{2}\frac{(\bar{x}_{k-1}-x)^2}{\sigma_{k-1}^2}\right) = \frac{1}{\sqrt{2\pi\sigma_k^2}}\exp\left(-\frac{1}{2}\frac{(\bar{x}_k-x)^2}{\sigma_k^2}\right)$, where $K$ is a constant and $\bar{x}_k = \frac{\sigma_{k-1}^2}{\sigma_{k-1}^2+\sigma^2}z_k + \frac{\sigma^2}{\sigma_{k-1}^2+\sigma^2}\bar{x}_{k-1}$, $\sigma_k^2 = \frac{\sigma^2\sigma_{k-1}^2}{\sigma^2+\sigma_{k-1}^2}$. Gaussian distributions are conjugate. Introduction to Estimation and Data Fusion Slide 35
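A minimal MATLAB sketch of this scalar recursive update; the true state, noise variance and prior below are arbitrary example values:

% Scalar recursive Gaussian (Bayes) update with independent observation noise.
sigma2 = 0.5^2;                 % observation noise variance
xbar = 0.0;  s2 = 4.0;          % prior mean and variance
x_true = 2.0;                   % constant true state

for k = 1:20
    zk = x_true + sqrt(sigma2)*randn;               % simulated observation
    xbar = (s2*zk + sigma2*xbar) / (s2 + sigma2);   % updated mean
    s2   = (sigma2*s2) / (sigma2 + s2);             % updated variance
end
fprintf('estimate %.3f, variance %.4f\n', xbar, s2);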

36 Recursive Bayes Updating: An Example IIa Quite general prior and likelihood distributions can be handled by direct application of Bayes Theorem defined on a spatial grid. Consider the problem in which we are required to determine the location $(x,y)$ of a target in a defined area. The distribution $P(x,y)$ is simply defined as a set of probability values $P(x_i, y_j)$ defined at grid points $x_i, y_j$ of area $\delta x_i\,\delta y_j$. The only constraints placed on this distribution are that $P(x_i, y_j) > 0$ for all $x_i, y_j$, and that $\sum_i\sum_j P(x_i, y_j)\,\delta x_i\,\delta y_j = 1$. Introduction to Estimation and Data Fusion Slide 36

37 Recursive Bayes Updating: An Example IIb A passive sensor (sensor 1) located at $x_{s1} = 15$ km, $y_{s1} = 0$ km measures bearings to the target. The sensor is modeled by a conditional probability distribution $P_1(z_1|x,y)$ which describes, for each possible true target location $(x = x_i, y = y_j)$, a probability distribution on observed bearings. In general this conditional density requires a function on $z_1$ to be defined for each possible target location. In the discrete case this requires a three-dimensional matrix defining a probability $P_1(z_1 = z_1|x = x_i, y = y_j)$ for each combination of $(z_1, x_i, y_j)$. Introduction to Estimation and Data Fusion Slide 37

38 Recursive Bayes Updating: Example IIc In practice, however, these functions normally take $x$ and $y$ as parametric inputs. For example, defining the true bearing $\Theta = \arctan\left(\frac{y - y_{s1}}{x - x_{s1}}\right)$, a possible sensor model might be $P(z_1|x,y) = \frac{\alpha_a}{\sqrt{2\pi\sigma_a^2}}\exp\left(-\frac{1}{2}\frac{(z_1-\Theta)^2}{\sigma_a^2}\right) + \frac{\alpha_b}{\sqrt{2\pi\sigma_b^2}}\exp\left(-\frac{1}{2}\frac{(z_1-\Theta-B)^2}{\sigma_b^2}\right)$. This density consists of a weighted sum of two Gaussians, each with a different variance, one with a fixed bias $B$. The model is parametrised by the true target bearing $\Theta$. Introduction to Estimation and Data Fusion Slide 38

39 Recursive Bayes Updating: An Example IId As in earlier examples, when a specific observation $z_1 = z_1$ is made, it is substituted into the sensor model, which then becomes the likelihood function on $\Theta$, or on $x$ and $y$, only. Practically, in this example the likelihood function is obtained by simply substituting in the value of the observation, then computing and assigning the probability value $P_1(z_1|x_i, y_j)$ for each possible $(x_i, y_j)$ combination by direct substitution into the sensor model. The Figure shows a likelihood function computed in this manner. The likelihood shows that the bearing resolution of the sensor is high, whereas it has almost no range accuracy (the likelihood is long and thin, with probability mass concentrated on a line running from sensor to target). Introduction to Estimation and Data Fusion Slide 39

40 Recursive Bayes Updating: An Example IIe The posterior distribution can now be computed by simply taking the product of the prior probability $P(x,y)$ with the likelihood $P_1(z_1|x,y)$ at each of the discrete locations $(x = x_i, y = y_j)$ and normalising. The result shows that the distribution defining target location is now approximately constrained to a line along the detected bearing. The posterior $P(x,y|z_1,z_2)$ following a second observation $z_2$ by the same sensor provides little improvement in the location density; this is to be expected as there is no range data available. Introduction to Estimation and Data Fusion Slide 40
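A minimal MATLAB sketch of this grid-based update. The grid extent, sensor position and bearing noise are illustrative assumptions, and the sensor model is reduced to a single Gaussian in bearing for brevity:

% Grid-based Bayes update for a bearing-only sensor.
[X, Y] = meshgrid(0:0.5:100, 0:0.5:100);    % location grid (km)
xs = 15;  ys = 0;                           % sensor 1 position (km)
sigma_b = 2*pi/180;                         % bearing noise, radians

prior = ones(size(X));                      % uniform prior over the grid
prior = prior / sum(prior(:));

z = atan2(40 - ys, 60 - xs);                % an observed bearing (radians)

Theta = atan2(Y - ys, X - xs);              % true bearing at each grid point
likelihood = exp(-0.5*(z - Theta).^2 / sigma_b^2);

posterior = likelihood .* prior;            % point-wise product
posterior = posterior / sum(posterior(:));  % normalise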

41 Recursive Bayes Updating: Example IIf [Figure panels: (a) Prior Location Density; (b) Location likelihood from Sensor 1. Axes: X Range (km), Y Range (km).] Figure 1: Generalised Bayes Theorem. The figures show plots of two-dimensional distribution functions defined on a grid of x and y points: (a) prior distribution; (b) likelihood function for the first sensor. Introduction to Estimation and Data Fusion Slide 41

42 Recursive Bayes Updating: Example IIg [Figure panels: (c) Posterior location density after one observation from sensor 1; (d) Posterior location density after two observations from sensor 1. Axes: X Range (km), Y Range (km).] Figure 2: Generalised Bayes Theorem. The figures show plots of two-dimensional distribution functions defined on a grid of x and y points: (c) posterior after one application of Bayes Theorem; (d) posterior after two applications. Introduction to Estimation and Data Fusion Slide 42

43 Recursive Bayes Updating: An Example IIh A second sensor (sensor 2) now takes observations of the target from a location $x_{s2} = 50$ km, $y_{s2} = 20$ km. The Figure shows the target likelihood $P_2(z_3|x,y)$ following an observation $z_3$ by this sensor. It can be seen that this sensor (like sensor 1) has high bearing resolution but almost no range resolution. However, because the sensor is located at a different site, we would expect the combination of bearing information from the two sensors to provide accurate location data. Indeed, following point-wise multiplication of the second sensor likelihood with the new prior (the posterior $P(x,y|z_1,z_2)$ from the previous two observations of sensor 1), we obtain the posterior $P(x,y|z_1,z_2,z_3)$ shown in the Figure, which shows all probability mass highly concentrated around a single target location. Introduction to Estimation and Data Fusion Slide 43

44 Recursive Bayes Updating: Example IIi [Figure panels: (e) Location likelihood from Sensor 2; (f) Posterior location density following update from sensor 2. Axes: X Range (km), Y Range (km).] Figure 3: Generalised Bayes Theorem. The figures show plots of two-dimensional distribution functions defined on a grid of x and y points: (e) likelihood function for the second sensor; (f) final posterior. Introduction to Estimation and Data Fusion Slide 44

45 Recursive Bayes Updating: An Example IIj The general approach demonstrated in this example has broad appeal in situations where case-specific prior knowledge can be obtained. For example, if the problem of interest is tracking in an underwater environment with passive acoustics, we could add knowledge about no-go areas (such as land forms) by simply setting the prior to zero in these areas: $P(x = x_{land}, y = y_{land}) = 0$. In other examples, constraints such as road-ways could be used. Introduction to Estimation and Data Fusion Slide 45

46 Generalised Bayesian Filtering: Problem Statement $x_k$: the state vector to be estimated at time $k$. $u_k$: a control vector, assumed known, applied at time $k-1$ to drive the state from $x_{k-1}$ to $x_k$ at time $k$. $z_k$: an observation taken of the state $x_k$ at time $k$. In addition, the following sets are also defined: the history of states $X^k = \{x_0, x_1, \dots, x_k\} = \{X^{k-1}, x_k\}$; the history of control inputs $U^k = \{u_1, u_2, \dots, u_k\} = \{U^{k-1}, u_k\}$; the history of state observations $Z^k = \{z_1, z_2, \dots, z_k\} = \{Z^{k-1}, z_k\}$. The aim is to recursively estimate the posterior $P(x_k|Z^k, U^k, x_0)$. Introduction to Estimation and Data Fusion Slide 46

47 Sensor and Motion Models The observation model describes the probability of making an observation $z_k$ when the true state $x_k$ is known: $P(z_k|x_k)$. Assume conditional independence: $P(Z^k|X^k) = \prod_{i=1}^k P(z_i|X^k) = \prod_{i=1}^k P(z_i|x_i)$. Assume the vehicle model is Markov: $P(x_k|x_{k-1}, u_k)$. Introduction to Estimation and Data Fusion Slide 47

48 Observation Update Step Expand the joint distribution in terms of the state, $P(x_k, z_k|Z^{k-1}, U^k, x_0) = P(x_k|z_k, Z^{k-1}, U^k, x_0)P(z_k|Z^{k-1}, U^k, x_0) = P(x_k|Z^k, U^k, x_0)P(z_k|Z^{k-1}, U^k)$, and in terms of the observation, $P(x_k, z_k|Z^{k-1}, U^k, x_0) = P(z_k|x_k, Z^{k-1}, U^k, x_0)P(x_k|Z^{k-1}, U^k, x_0) = P(z_k|x_k)P(x_k|Z^{k-1}, U^k, x_0)$. Rearranging: $P(x_k|Z^k, U^k, x_0) = \frac{P(z_k|x_k)P(x_k|Z^{k-1}, U^k, x_0)}{P(z_k|Z^{k-1}, U^k)}$. Introduction to Estimation and Data Fusion Slide 48

49 Observation Update Step [Figure: the observation model $P(z_k|x_k)$ as a surface over $z$ and $x$, with slices $P(z_k|x_k = x_1)$ and $P(z_k|x_k = x_2)$, the likelihood $P(z_k = z_1|x_k)$, and the prior $P(x_k)$.] Introduction to Estimation and Data Fusion Slide 49

50 Time Update Step Using the Total Probability Theorem: $P(x_k|Z^{k-1}, U^k, x_0) = \int P(x_k, x_{k-1}|Z^{k-1}, U^k, x_0)\,dx_{k-1} = \int P(x_k|x_{k-1}, Z^{k-1}, U^k, x_0)P(x_{k-1}|Z^{k-1}, U^k, x_0)\,dx_{k-1} = \int P(x_k|x_{k-1}, u_k)P(x_{k-1}|Z^{k-1}, U^{k-1}, x_0)\,dx_{k-1}$. Introduction to Estimation and Data Fusion Slide 50

51 Time Update Step [Figure: the joint density $P(x_{k-1}, x_k)$, the prior $P(x_{k-1})$, the transition density $P(x_k|x_{k-1})$ along $x_k = f(x_{k-1}, u_k)$, and the prediction $P(x_k) = \int P(x_k, x_{k-1})\,dx_{k-1}$.] Introduction to Estimation and Data Fusion Slide 51

52 Recursive Solution Prediction: $P(x_k|Z^{k-1}, U^k, x_0) = \int P(x_k|x_{k-1}, u_k)P(x_{k-1}|Z^{k-1}, U^{k-1}, x_0)\,dx_{k-1}$. Update: $P(x_k|Z^k, U^k, x_0) = K\,P(z_k|x_k)P(x_k|Z^{k-1}, U^k, x_0)$. Introduction to Estimation and Data Fusion Slide 52
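For a scalar state the recursion can be implemented directly on a grid. A minimal MATLAB sketch; the random-walk motion model, noise levels and measurement below are illustrative assumptions (uses implicit array expansion, MATLAB R2016b or later):

% Direct grid implementation of the Bayes filter for a scalar state.
x = linspace(-10, 10, 401);  dx = x(2) - x(1);
q = 0.5;                     % process (motion) noise standard deviation
r = 1.0;                     % observation noise standard deviation

% Transition matrix T(i,j) = P(x_k = x(i) | x_{k-1} = x(j)), random-walk model
T = exp(-0.5*(x' - x).^2 / q^2);
T = T ./ (sum(T, 1) * dx);   % each column integrates to one over x_k

p = exp(-0.5*(x - 0).^2 / 2^2);   % Gaussian prior on the state
p = p / (sum(p) * dx);

z = 3.2;                     % a measurement of the state

p_pred = (T * p' * dx)';                 % prediction: integrate over x_{k-1}
lik    = exp(-0.5*(z - x).^2 / r^2);     % likelihood P(z_k | x_k)
p_post = lik .* p_pred;                  % observation update (point-wise product)
p_post = p_post / (sum(p_post) * dx);    % normalise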

53 Generalised Bayesian Filtering: Example Ia For low-dimensional problems it is possible, and instructive, to implement the general Bayesian filter in a direct form. Consider a scalar-valued state $x_k$ indexed by time $k$. Assume that the state transition is Markovian with a defined state-transition probability $P(x_k|x_{k-1}, u_k)$, where $u_k$ is a known control applied to drive $x_{k-1}$ to $x_k$. Assume also a prior probability $P(x_{k-1})$. The prior information can be predicted forward to time $k$ as $P(x_k|u_k) = \int P(x_k|x_{k-1}, u_k)P(x_{k-1})\,dx_{k-1}$. Introduction to Estimation and Data Fusion Slide 53

54 Generalised Bayesian Filtering: Example Ib This time-prediction step is essentially a convolution of two probability densities, $P(x_{k-1})$ and $P(x_k|x_{k-1}, u_k)$. The process of convolution acts to blur or spread the prior density with the uncertainty arising from the state transition. The Figure is an example of this process. The time-prediction step clearly represents a loss of information as the prediction is more widely spread than the prior. More generally, convolution results in information loss. Introduction to Estimation and Data Fusion Slide 54

55 Generalised Bayesian Filtering: Example Ic [Figure: prediction step for the Bayes filter, showing the prior, the motion model and the resulting prediction.] Introduction to Estimation and Data Fusion Slide 55

56 Generalised Bayesian Filtering: Example Id For low-dimensional problems, the convolution for the time-prediction can be implemented directly. The direct approach scales exponentially with state dimension. An efficient implementation of convolution is to use multiplication in the frequency domain:

% input: two N-length vectors fx, fy holding the two densities
N = length(fx);
% time-reverse one to get fxy(tau - t)
fy = fliplr(fy);
% convolution in time is multiplication in frequency
X = fft(fx); Y = fft(fy);
Fxy = X .* conj(Y) / N;
% take the inverse; only use the real part, and centre
fxy = ifftshift(real(ifft(Fxy)));
% normalise so the result integrates to one (gnorm is a course helper function)
fxy = gnorm(fxy, x);

Introduction to Estimation and Data Fusion Slide 56

57 Generalised Bayesian Filtering: Example Ie For the observation update step an observation model $P(z_k|x_k)$ is required. This is generally a two-dimensional function of both $z_k$ and $x_k$. When an observation $z_k = z$ is made, a likelihood function $P(z_k = z|x_k)$, defined on the state only, is generated. The observation update is then simply the normalised product of the prediction with this likelihood: $P(x_k|u_k, z_k = z) = C\,P(z_k = z|x_k)P(x_k|u_k)$. This computation can be implemented as a point-wise product of the arrays $P(x_k|u_k)$ and $P(z_k = z|x_k)$, both defined only on $x_k$. The Figure shows that the peak of the posterior distribution is a weighted sum of the peaks of the prediction and likelihood, and that the spread of the posterior distribution is less than the spread of either the prediction or the likelihood. The observation-update step clearly represents a gain of information as the updated density is more compact than the prediction. More generally, multiplication of probability densities results in information gain. Introduction to Estimation and Data Fusion Slide 57

58 Generalised Bayesian Filtering: Example If [Figure: update step for the Bayes filter, showing the prediction, the observation model and the resulting update.] Introduction to Estimation and Data Fusion Slide 58

59 Distributed Data Fusion with Bayes Theorem Provided the basic rules of conditional probability are followed, it is not difficult to construct data fusion architectures. Three possible approaches: communicate observations; communicate likelihoods; communicate local posteriors. Introduction to Estimation and Data Fusion Slide 59

60 The Independent Likelihood Pool Figure 4: The distributed implementation of the independent likelihood pool. Each sensor maintains its own model in the form of a conditional probability distribution $P_i(z_i|x)$. On arrival of a measurement $z_i$, the sensor model is instantiated with the associated observation to form a likelihood $\Lambda_i(x)$. This is transmitted to a central fusion centre where the normalised product of likelihoods and prior, $P(x|Z^n) = C\,P(x)\prod_{i=1}^n \Lambda_i(x)$, yields the posterior distribution $P(x|Z^n)$. Introduction to Estimation and Data Fusion Slide 60

61 The Independent Likelihood Pool I The independent likelihood pool: $P(x|Z^n) = C\,P(x)\prod_i \Lambda_i(x)$, with, for example, local models $P_1(z_1|x)$:

         z_1    z_2    z_3
x_1     0.45   0.45   0.10
x_2     0.45   0.45   0.10
x_3     0.15   0.15   0.70

and $P_2(z_2|x)$:

         z_1    z_2    z_3
x_1     0.45   0.10   0.45
x_2     0.10   0.45   0.45
x_3     0.45   0.45   0.10

Introduction to Estimation and Data Fusion Slide 61

62 The Independent Likelihood Pool II If $z_1 = z_1$, communicate $\Lambda_1(x) = (0.45, 0.45, 0.15)$; if $z_2 = z_1$, communicate $\Lambda_2(x) = (0.45, 0.1, 0.45)$. At the fusion processor, the information is combined by multiplication: $P(x|z_1 = z_1, z_2 = z_1) = C\,\Lambda_1(x)\Lambda_2(x)P(x) = C\,(0.45, 0.45, 0.15)\cdot(0.45, 0.1, 0.45)\cdot(1/3, 1/3, 1/3) = (0.6429, 0.1429, 0.2143)$. The sensors have become anonymous; they are simply devices that communicate probability distributions on the common state. Introduction to Estimation and Data Fusion Slide 62

63 Distributed Data Fusion with Bayes Theorem Figure 5: A distributed implementation of the independent opinion pool in which each sensor maintains both its own model and also computes a local posterior. The complete posterior is made available to all sensors and so they become, in some sense, autonomous. The figure shows Bayes Theorem in a recursive form. Introduction to Estimation and Data Fusion Slide 63

64 Pool with Local Posteriors Assume the prior $P(x) = (1/3, 1/3, 1/3)$ is communicated to the two sensors. Sensor 1 observes, say, $z_1 = z_1$, so the local posterior is $P_1(x|z_1) = (0.4286, 0.4286, 0.1429)$. Sensor 2 observes, say, $z_2 = z_1$, so the local posterior is $P_2(x|z_2) = (0.45, 0.1, 0.45)$. Posterior fusion: $P_{12}(x|z_1, z_2) = P(x)\,\frac{P_1(x|z_1)}{P(x)}\,\frac{P_2(x|z_2)}{P(x)} = (0.6429, 0.1429, 0.2143)$. A further local observation $z_2 = z_1$ gives $P_{12}(x|z_1 = z_1, z_2 = z_1, z_2 = z_1) = C\,P_2(z_2 = z_1|x)P_{12}(x|z_1 = z_1, z_2 = z_1) = C\,(0.45, 0.1, 0.45)\cdot(0.6429, 0.1429, 0.2143) = (0.7232, 0.0357, 0.2411)$. Introduction to Estimation and Data Fusion Slide 64

65 A Note on The Expectation Operator Expected value of a function of a random variable: $E\{G(x)\} = \int G(x)f(x)\,dx$ in the continuous case, or $E\{G(x)\} = \sum_{x\in\mathcal{X}} G(x)f(x)$ in the discrete case. Of note are $E\{x^n\}$, the $n$th moment, and $E\{(x - E\{x\})^n\}$, the $n$th central moment. For example, the second central moment $\sigma^2 = E\{(x-\bar{x})^2\}$ is the variance. When $x$ is a vector, the variance is defined as $\Sigma = E\{(x-\bar{x})(x-\bar{x})^T\}$. Expectation is a linear operator: $E\{AG(x) + BH(x)\} = A\,E\{G(x)\} + B\,E\{H(x)\}$. Introduction to Estimation and Data Fusion Slide 65

66 Log-Likelihoods and Information Methods Introduction to Estimation and Data Fusion Slide 66

67 Data Fusion with Log-Likelihoods Log-likelihoods have the advantage of computational efficiency and are also more closely related to formal definitions of information. The log-likelihood and conditional log-likelihood are defined as $l(x) = \log P(x)$ and $l(x|y) = \log P(x|y)$. The log-likelihood is always less than or equal to zero: $l(x) \leq 0$. The log-likelihood is a useful and efficient means of implementing probability calculations; for example, Bayes theorem becomes $l(x|z) = l(z|x) + l(x) - l(z)$. Introduction to Estimation and Data Fusion Slide 67

68 Log-Likelihood Example I Two-sensor discrete target identification example. The log-likelihood matrix for the first sensor (using natural logs) is

         z_1      z_2      z_3
x_1    -0.799   -0.799   -2.303
x_2    -0.799   -0.799   -2.303
x_3    -1.897   -1.897   -0.357

and for the second

         z_1      z_2      z_3
x_1    -0.799   -2.303   -0.799
x_2    -2.303   -0.799   -0.799
x_3    -0.799   -0.799   -2.303

Introduction to Estimation and Data Fusion Slide 68

69 The posterior log-likelihood (given a uniform prior) following observation of target 1 by sensor 1 and target 1 by sensor 2 is the sum of the first columns of each of the log-likelihood matrices: $l(x|z_1, z_1) = l_1(z_1|x) + l_2(z_1|x) + C = (-0.799, -0.799, -1.897) + (-0.799, -2.303, -0.799) + C = (-1.597, -3.101, -2.696) + C = (-0.442, -1.946, -1.540)$, where the constant $C = 1.155$ is found through normalisation (which in this case requires that the anti-logs sum to one). Note the ease of computation: this is obviously simpler than working in probability, and is sufficient to indicate relative likelihoods. Introduction to Estimation and Data Fusion Slide 69
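A minimal MATLAB sketch of this log-domain fusion, using the log-likelihood matrices assumed above:

% Fusion by summing log-likelihoods, then normalising.
L1 = log([0.45 0.45 0.10; 0.45 0.45 0.10; 0.15 0.15 0.70]);   % sensor 1
L2 = log([0.45 0.10 0.45; 0.10 0.45 0.45; 0.45 0.45 0.10]);   % sensor 2

l = L1(:,1) + L2(:,1);             % both sensors observe z1
C = -log(sum(exp(l)));             % normalising constant (uniform prior)
l_post = l + C                     % (-0.442, -1.946, -1.540)
p_post = exp(l_post)               % back to probability: (0.6429, 0.1429, 0.2143)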

70 Log-Likelihood Example II In the case of a Gaussian, completing squares gives $l(x|Z^k) = -\frac{1}{2}\frac{(\bar{x}_k-x)^2}{\sigma_k^2} = -\frac{1}{2}\left[\frac{(z_k-x)^2}{\sigma^2} + \frac{(\bar{x}_{k-1}-x)^2}{\sigma_{k-1}^2}\right] + C$, where $\bar{x}_k = \frac{\sigma_{k-1}^2}{\sigma_{k-1}^2+\sigma^2}z_k + \frac{\sigma^2}{\sigma_{k-1}^2+\sigma^2}\bar{x}_{k-1}$ and $\sigma_k^2 = \frac{\sigma^2\sigma_{k-1}^2}{\sigma^2+\sigma_{k-1}^2}$. Thus the log-likelihood is quadratic in $x$: for each value of $x$, a log-likelihood is specified as $-\frac{1}{2}\frac{(\bar{x}_k-x)^2}{\sigma_k^2}$, modulo addition of a constant $C$. Introduction to Estimation and Data Fusion Slide 70

71 Data Fusion with Log-Likelihoods Log-likelihoods are a convenient way of implementing distributed data fusion architectures. Fusion of information is simply a matter of summing log-likelihoods. Examples: fully centralised fusion; the independent likelihood pool; the independent opinion pool with local posterior log-likelihoods. Introduction to Estimation and Data Fusion Slide 71

72 Data Fusion with Log-Likelihoods Figure 6: A log-likelihood implementation of a fully centralised data fusion architecture. Introduction to Estimation and Data Fusion Slide 72

73 Data Fusion with Log-Likelihoods Figure 7: A log-likelihood implementation of the independent likelihood pool architecture. Introduction to Estimation and Data Fusion Slide 73

74 Data Fusion with Log-Likelihoods Figure 8: A log-likelihood implementation of the independent opinion pool architecture. Introduction to Estimation and Data Fusion Slide 74

75 Information Measures Probabilities and log-likelihoods are defined on states or observations. It is often valuable to also measure the amount of information contained in a given probability distribution. Formally, information is a measure of the compactness of a distribution; logically, if a probability distribution is spread evenly across many states, then its information content is low, and conversely, if a probability distribution is highly peaked on a few states, then its information content is high. Information is thus a function of the distribution, rather than the underlying state. Information measures play an important role in designing and managing data fusion systems. Two probabilistic measures of information are of particular value in data fusion problems: the Shannon information (or entropy) and the Fisher information. Introduction to Estimation and Data Fusion Slide 75

76 Entropy Consider a discrete random variable $x$ which has $N$ possible outcomes $\{x_1, \dots, x_N\}$. Define $p_i = P(x = x_i)$ as the probability that a realisation of $x$ is $x_i$, $i = 1, \dots, N$. The Shannon information content of an outcome $x_i$ is defined (usually measured in bits, but here we use nats) as $h(x_i) = \log\frac{1}{p_i} = -\log p_i$. As the $p_i$ are always less than one, the $h(x_i)$ are always positive. As the probability $p_i$ becomes smaller, $h(x_i)$, the information content, becomes larger. Essentially, $h(x_i)$ measures surprise: the more unlikely an event, the more surprising and informative it is when it occurs. Introduction to Estimation and Data Fusion Slide 76

77 Entropy of a Discrete Distribution The entropy or Shannon information $H_P(x)$ associated with a probability distribution $P(x)$, defined on a random variable $x$, is the ensemble average of the Shannon information content of the outcomes, or equivalently the expected value of minus the log-likelihood: $H_P(x) = E\left\{\log\frac{1}{P(x)}\right\} = -E\{\log P(x)\} = \sum_{x\in\mathcal{X}} P(x)\log\frac{1}{P(x)} = -\sum_{x\in\mathcal{X}} P(x)\log P(x) = \sum_i p_i\log\frac{1}{p_i} = -\sum_i p_i\log p_i$ for discrete-valued random variables. Note that, following convention, we use $x$ as an argument for $H_P(\cdot)$ even though the sum is taken over values of $x$, so $H_P(\cdot)$ is not strictly a function of $x$ but is rather a function of the distribution $P(\cdot)$. Introduction to Estimation and Data Fusion Slide 77
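A one-line MATLAB check of the discrete entropy in nats, using an arbitrary example distribution and guarding against zero-probability outcomes:

% Entropy (nats) of a discrete distribution p (a vector summing to one).
p = [0.45 0.45 0.10];
H = -sum(p(p > 0) .* log(p(p > 0)))    % approximately 0.949 nats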

78 Entropy of a Continuous Distribution I For continuous-valued random variables $x$, an entropy (the Boltzmann-Shannon entropy) may also be defined: $H_P(x) = -E\{\log P(x)\} = -\int P(x)\log P(x)\,dx$. The relationship between continuous and discrete entropy is not immediate. Consider a scalar random variable $x$ and let $\int_{x_{i-1}}^{x_i} P(x)\,dx = p_i$, so $P(x) \approx \frac{p_i}{x_i - x_{i-1}}$ for $x_{i-1} \le x < x_i$. Then the Boltzmann-Shannon entropy may be written $H_P(x) = -\sum_{i=1}^n \int_{x_{i-1}}^{x_i} P(x)\log P(x)\,dx \approx -\sum_{i=1}^n p_i\log\frac{p_i}{x_i - x_{i-1}}$, the discrete analogue of the continuous entropy. Introduction to Estimation and Data Fusion Slide 78

79 Entropy of a Continuous Distribution II A distinction between continuous and discrete entropy is that in the discrete case the variables are unequivocal, but in the continuous case they may be chosen with some freedom. In particular, a transformation of continuous variables $x$ to $y$ such as $y = g(x)$ may be effected. In this case the entropy on $y$ may be found in terms of the entropy on $x$ as $H(y) = H(x) + \int_x P(x)\log|g_x(x)|\,dx$, where $|\cdot|$ is the determinant operation and $g_x(x) = \left.\frac{\partial g}{\partial x}\right|_{x=x_i}$ is the Jacobian of $g$ with respect to $x$ evaluated at the roots $x_i = g^{-1}(y)$. Of particular interest are changes in coordinate systems. In this case, with $y = Ax$, the entropy relation is $H(y) = H(x) + \log|A|$; for rotations and other measure-preserving (orthonormal) transformations $|A| = 1$ and so $H(y) = H(x)$. Introduction to Estimation and Data Fusion Slide 79

80 The Meaning of Entropic Information I The entropic or Shannon information is both subtle and powerful in its meaning and application. Fundamentally, the entropy H P ( ) measures the compactness of a density P ( ) on a state space. It achieves a minimum of zero when all probability mass is assigned to a single value of x. It achieves a maximum when probability mass is uniformly distributed over all states. In an estimation-theoretic context, it is most natural to think of the most informative probability distribution as that which assigns all probability to a single state; logically the most compact of probabilities. Conversely the least informative distribution is one in which probability is spread uniformly over all states and so entropy is a maximum. Introduction to Estimation and Data Fusion Slide 80

81 The Meaning of Entropic Information II Maximum entropy distributions are often used as prior distributions when no useful prior information is available. For example, if the random variable $x$ can take on at most $n$ discrete values in the set $\mathcal{X}$, then the least informative (maximum entropy) distribution on $x$ is one which assigns a uniform probability $1/n$ to each value. This distribution clearly has an entropy of $\log n$. When $x$ is continuous-valued, the least informative distribution is also uniform, although strictly improper as $P(x) = 1$ does not integrate to 1. Introduction to Estimation and Data Fusion Slide 81

82 The Meaning of Entropic Information III However, this reasoning is reversed in the context of communication and experimental design. Imagine a simple experiment in which a finite number of outcomes are possible, or equally imagine the receiving channel of a communications link with a finite alphabet of transmission symbols. If the result of the experiment is known with high probability a priori, then the actual occurrence of the event is not very informative. Conversely, if the possibility of each outcome is uniformly distributed, then the outcome itself is most informative. Succinctly: the outcome of a random experiment is guaranteed to be most informative if the probability distribution over outcomes is uniform. Mathematically, maximising the Shannon information corresponds to this second interpretation of information. However, in this course we will tend to think of information maximisation as the process of compacting a density, which is thus strictly equivalent to maximising the negative of the Shannon information. Introduction to Estimation and Data Fusion Slide 82

83 The Meaning of Entropic Information IV Up to a constant factor, entropy turns out to be the only reasonable definition of informativeness. Informally, three conditions on an information measure lead to this conclusion: 1. Continuity: the measure should be continuous in the $p_i$; thus the measure must be a continuous function. 2. Choice: the measure should be a monotonically increasing function of the number of possible outcomes of a random event. In particular, imagine a storage device consisting of $N$ binary switches. The number of possible states is clearly $2^N$ and the logarithm of the number of states is proportional to $N$. As the number of switches increases we wish the information measure to also increase monotonically; a function proportional to the logarithm of the number of states clearly achieves this. 3. Composition: if a choice of outcome is broken down into two successive stages, the resulting information measure should be a weighted sum of the measures from each stage separately. As we have seen, this is true for log-likelihoods and will be true for linear operators, such as expectation, on log-likelihoods. Introduction to Estimation and Data Fusion Slide 83

84 The Meaning of Entropic Information V The implications of this in data fusion problems are manifold. The fact that information is, by definition, linearly additive makes computation particularly simple and is fundamental in developing efficient decentralised data fusion algorithms. Introduction to Estimation and Data Fusion Slide 84

85 The Entropy of English (Part I) A classic and intuitive example is measuring the information content of the English language. Written language is not random. Different letters have different probabilities of occurrence; we are not so surprised to see an 'a' or 'b', but are relatively more surprised when we see a 'q' or 'z'. This is readily captured by the Shannon information measure. In this example, the text of Flatland by A. Square (Edwin Abbott) is employed as a sample of the English language. The sample comprises approximately 200,000 letters and spaces. To make plotting easier, numerical assignments are made for letters with a = 1, b = 2, ..., z = 26 and space = 27. All other characters are ignored. The probability of occurrence and information content of each character is plotted. It is clear that letters such as j, q and z are least likely and therefore provide most information. Introduction to Estimation and Data Fusion Slide 85

86 The entropy for the ensemble as a whole is $H_P(x) = \sum_i p_i\log\frac{1}{p_i} = 2.83$ nats. Interestingly, most English texts have numerically similar information content. It is also interesting to compare this to the information content of letters (and space) chosen randomly: $\log N = \log 27 = 3.30$. The redundancy of a sample is defined as $R(x) = 1 - \frac{H_P(x)}{\log N}$. The redundancy of the example text is 0.14; this means that approximately 14% of letters are redundant. Introduction to Estimation and Data Fusion Slide 86
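A minimal MATLAB sketch of this calculation; 'flatland.txt' is a placeholder file name for the sample text:

% Letter-frequency entropy of an English text sample, in nats.
txt = lower(fileread('flatland.txt'));     % placeholder file name
symbols = ['a':'z' ' '];                   % a=1, ..., z=26, space=27
txt = txt(ismember(txt, symbols));         % ignore all other characters

counts = zeros(1, numel(symbols));
for i = 1:numel(symbols)
    counts(i) = sum(txt == symbols(i));    % occurrences of each symbol
end
p = counts / sum(counts);

H = -sum(p(p > 0) .* log(p(p > 0)));       % entropy, approximately 2.8 nats
R = 1 - H / log(27);                       % redundancy, approximately 0.14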

87 The Entropy of English (Part I) Figure 9: The probability and information content of letters in an English text. (a) The probability of occurrence of each letter. Highly probable letters are space, e, t and a (letters 27, 5, 20 and 1 respectively). (b) The information content of each letter. Highly informative letters are z, j and q (letters 26, 10 and 17 respectively). Introduction to Estimation and Data Fusion Slide 87

88 Joint Entropy The joint entropy of one or more outcomes is defined through the joint probability density of these events. If $x$ and $y$ are two random variables with joint probability density $P(x,y)$, then the joint entropy is defined as $H(x,y) = -\sum_{\mathcal{X}}\sum_{\mathcal{Y}} P(x,y)\log P(x,y)$ in the discrete case, and $H(x,y) = -\int_{\mathcal{X}}\int_{\mathcal{Y}} P(x,y)\log P(x,y)\,dx\,dy$ in the continuous case. The definition of entropy can be extended to any number of random variables and outcomes in the obvious manner. Introduction to Estimation and Data Fusion Slide 88

89 The Entropy of English (Part II) English language example: pairs of successive letters are sampled, so that event $x$ is the selection of the first letter and $y$ the selection of the second letter in a sequence. These pairs of letters are termed bigrams. The joint entropy is found to be $H(x,y) \approx 5.14$ nats, and the redundancy in this case is $R(x,y) = 1 - \frac{H_P(x,y)}{\log N^2} = 1 - \frac{H_P(x,y)}{2\log 27} = 0.22$. Taking letters in pairs, the English language has 22% redundancy. Taking letters in triples and so on leads to the conclusion that approximately 50% of written English is redundant (no surprises there!). Introduction to Estimation and Data Fusion Slide 89

90 The Entropy of English (Part II) Figure 10: The joint probability of a sequence of two letters in the example text. Two-letter sequences are known as bigrams. Note that sequences of high probability include the pair 'th' (point [20, 8]) and 'n' (letter 14) followed by a vowel (a, e, i, etc). Introduction to Estimation and Data Fusion Slide 90

91 Entropy of Gaussian The entropy of an $n$-dimensional Gaussian $P(x) = N(\bar{x}, P) = |2\pi P|^{-1/2}\exp\left(-\frac{1}{2}(x-\bar{x})^T P^{-1}(x-\bar{x})\right)$ is $H_P(x) = -E\{\log P(x)\} = \frac{1}{2}E\{(x-\bar{x})^T P^{-1}(x-\bar{x})\} + \frac{1}{2}\log[(2\pi)^n|P|] = \frac{1}{2}E\left\{\sum_{ij}(x_i-\bar{x}_i)P^{-1}_{ij}(x_j-\bar{x}_j)\right\} + \frac{1}{2}\log[(2\pi)^n|P|] = \frac{1}{2}\sum_{ij}E\{(x_j-\bar{x}_j)(x_i-\bar{x}_i)\}P^{-1}_{ij} + \frac{1}{2}\log[(2\pi)^n|P|] = \frac{1}{2}\sum_j\sum_i P_{ji}P^{-1}_{ij} + \frac{1}{2}\log[(2\pi)^n|P|] = \frac{1}{2}\sum_j(PP^{-1})_{jj} + \frac{1}{2}\log[(2\pi)^n|P|] = \frac{1}{2}\sum_j 1 + \frac{1}{2}\log[(2\pi)^n|P|]$ Introduction to Estimation and Data Fusion Slide 91

92 $= \frac{n}{2} + \frac{1}{2}\log[(2\pi)^n|P|] = \frac{1}{2}\log[(2\pi e)^n|P|]$. The entropy is defined only by the vector length $n$ and the covariance $P$; it is proportional to the log of the determinant of the covariance. The determinant of a matrix is a volume measure (the determinant is the product of the eigenvalues), so the entropy is a measure of the volume enclosed by the covariance matrix and consequently of the compactness of the probability distribution. If the Gaussian is scalar with variance $\sigma^2$, then the entropy is simply given by $H(x) = \log\sigma\sqrt{2\pi e}$. Entropy increases with increasing variance. Introduction to Estimation and Data Fusion Slide 92
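A minimal MATLAB check of this formula; the covariance matrix below is an arbitrary example:

% Entropy of an n-dimensional Gaussian from its covariance matrix P.
P = [4.0 1.0;
     1.0 2.0];
n = size(P, 1);
H = 0.5 * log((2*pi*exp(1))^n * det(P))    % approximately 3.81 nats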

93 Conditional Entropy The definition of entropy can be extended to include conditional entropy. Consider the information (entropy) about a state $x$ contained in the distribution $P(x|y)$ given that the outcome $y = y_j$ has already been observed. For discrete random variables this is $H_P(x|y_j) = -E\{\log P(x|y_j)\} = -\sum_x P(x|y_j)\log P(x|y_j)$, and for continuous-valued random variables $H_P(x|y_j) = -E\{\log P(x|y_j)\} = -\int P(x|y_j)\log P(x|y_j)\,dx$. Introduction to Estimation and Data Fusion Slide 93

94 Conditional Entropy The conditional entropy is defined as the average or expected value of this entropy over all possible realisations of $y$: $H(x|y) = \sum_j P(y = y_j)H(x|y = y_j) = -\sum_j\sum_x P(y = y_j)P(x|y = y_j)\log P(x|y = y_j) = -\sum_j\sum_x P(x, y = y_j)\log P(x|y = y_j) = -\sum_y\sum_x P(x,y)\log P(x|y)$ for discrete random variables, and similarly $H(x|y) = E\{H(x|y)\} = -\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} P(x,y)\log P(x|y)\,dx\,dy$ for continuous random variables. Note that $H(x|y)$ is not a function of either $x$ or $y$; rather it is a measure of the information that will be obtained about $x$ given knowledge of $y$, on the average, before a specific value of $y$ has been determined. Introduction to Estimation and Data Fusion Slide 94
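A minimal MATLAB sketch computing the conditional entropy from a discrete joint distribution; the joint table is an arbitrary example:

% Conditional entropy H(x|y) from a discrete joint distribution Pxy.
% Rows index x, columns index y; the entries sum to one.
Pxy = [0.30 0.10;
       0.10 0.20;
       0.05 0.25];

Py    = sum(Pxy, 1);            % marginal P(y)
Px_y  = Pxy ./ Py;              % conditional P(x|y), column-wise
terms = Pxy .* log(Px_y);       % P(x,y) log P(x|y)
Hxy   = -sum(terms(:))          % conditional entropy, in nats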

95 Conditional Entropy The chain-rule for conditional probabilities can be employed to obtain a chain-rule for conditional entropy. Taking logs of the chain-rule: $\log P(x,y) = \log P(x|y) + \log P(y) = \log P(y|x) + \log P(x)$. Taking expected values of both sides of this equation over $P(x,y)$ yields $H(x,y) = H(x|y) + H(y) = H(y|x) + H(x)$. This quite naturally states that the entropy of the combined outcome is the sum of the entropy of the first outcome plus the entropy of the second outcome given the first. The chain-rule for conditional entropy can be extended to any number of random variables: $H(x_1, x_2, \dots, x_N) = H(x_1|x_2,\dots,x_N) + H(x_2|x_3,\dots,x_N) + \cdots + H(x_N)$. Introduction to Estimation and Data Fusion Slide 95

Chapter I: Fundamental Information Theory

Chapter I: Fundamental Information Theory ECE-S622/T62 Notes Chapter I: Fundamental Information Theory Ruifeng Zhang Dept. of Electrical & Computer Eng. Drexel University. Information Source Information is the outcome of some physical processes.

More information

Simultaneous Localization and Mapping (SLAM) Corso di Robotica Prof. Davide Brugali Università degli Studi di Bergamo

Simultaneous Localization and Mapping (SLAM) Corso di Robotica Prof. Davide Brugali Università degli Studi di Bergamo Simultaneous Localization and Mapping (SLAM) Corso di Robotica Prof. Davide Brugali Università degli Studi di Bergamo Introduction SLAM asks the following question: Is it possible for an autonomous vehicle

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

Review of Probability Theory

Review of Probability Theory Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

More information

CHAPTER 3. P (B j A i ) P (B j ) =log 2. j=1

CHAPTER 3. P (B j A i ) P (B j ) =log 2. j=1 CHAPTER 3 Problem 3. : Also : Hence : I(B j ; A i ) = log P (B j A i ) P (B j ) 4 P (B j )= P (B j,a i )= i= 3 P (A i )= P (B j,a i )= j= =log P (B j,a i ) P (B j )P (A i ).3, j=.7, j=.4, j=3.3, i=.7,

More information

CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory. Tim Campbell CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

More information

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Chapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye

Chapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye Chapter 2: Entropy and Mutual Information Chapter 2 outline Definitions Entropy Joint entropy, conditional entropy Relative entropy, mutual information Chain rules Jensen s inequality Log-sum inequality

More information

Formulas for probability theory and linear models SF2941

Formulas for probability theory and linear models SF2941 Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms

More information

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein Kalman filtering and friends: Inference in time series models Herke van Hoof slides mostly by Michael Rubinstein Problem overview Goal Estimate most probable state at time k using measurement up to time

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Lecture 1: Introduction to Probabilistic Modelling Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Why a

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

Lecture 8: Channel Capacity, Continuous Random Variables

Lecture 8: Channel Capacity, Continuous Random Variables EE376A/STATS376A Information Theory Lecture 8-02/0/208 Lecture 8: Channel Capacity, Continuous Random Variables Lecturer: Tsachy Weissman Scribe: Augustine Chemparathy, Adithya Ganesh, Philip Hwang Channel

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Inference and estimation in probabilistic time series models

Inference and estimation in probabilistic time series models 1 Inference and estimation in probabilistic time series models David Barber, A Taylan Cemgil and Silvia Chiappa 11 Time series The term time series refers to data that can be represented as a sequence

More information

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30 Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain

More information

Signal Processing - Lecture 7

Signal Processing - Lecture 7 1 Introduction Signal Processing - Lecture 7 Fitting a function to a set of data gathered in time sequence can be viewed as signal processing or learning, and is an important topic in information theory.

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Gaussian Processes for Sequential Prediction

Gaussian Processes for Sequential Prediction Gaussian Processes for Sequential Prediction Michael A. Osborne Machine Learning Research Group Department of Engineering Science University of Oxford Gaussian processes are useful for sequential data,

More information

Exercises with solutions (Set D)

Exercises with solutions (Set D) Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where

More information

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,

More information

2 (Statistics) Random variables

2 (Statistics) Random variables 2 (Statistics) Random variables References: DeGroot and Schervish, chapters 3, 4 and 5; Stirzaker, chapters 4, 5 and 6 We will now study the main tools use for modeling experiments with unknown outcomes

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

Consider the joint probability, P(x,y), shown as the contours in the figure above. P(x) is given by the integral of P(x,y) over all values of y.

Consider the joint probability, P(x,y), shown as the contours in the figure above. P(x) is given by the integral of P(x,y) over all values of y. ATMO/OPTI 656b Spring 009 Bayesian Retrievals Note: This follows the discussion in Chapter of Rogers (000) As we have seen, the problem with the nadir viewing emission measurements is they do not contain

More information

Conditional probabilities and graphical models

Conditional probabilities and graphical models Conditional probabilities and graphical models Thomas Mailund Bioinformatics Research Centre (BiRC), Aarhus University Probability theory allows us to describe uncertainty in the processes we model within

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Introduction to Machine Learning

Introduction to Machine Learning What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes

More information

2 Functions of random variables

2 Functions of random variables 2 Functions of random variables A basic statistical model for sample data is a collection of random variables X 1,..., X n. The data are summarised in terms of certain sample statistics, calculated as

More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

Multimedia Communications. Mathematical Preliminaries for Lossless Compression

Multimedia Communications. Mathematical Preliminaries for Lossless Compression Multimedia Communications Mathematical Preliminaries for Lossless Compression What we will see in this chapter Definition of information and entropy Modeling a data source Definition of coding and when

More information

AUTOMOTIVE ENVIRONMENT SENSORS

AUTOMOTIVE ENVIRONMENT SENSORS AUTOMOTIVE ENVIRONMENT SENSORS Lecture 5. Localization BME KÖZLEKEDÉSMÉRNÖKI ÉS JÁRMŰMÉRNÖKI KAR 32708-2/2017/INTFIN SZÁMÚ EMMI ÁLTAL TÁMOGATOTT TANANYAG Related concepts Concepts related to vehicles moving

More information

Introduction to Mobile Robotics Probabilistic Robotics

Introduction to Mobile Robotics Probabilistic Robotics Introduction to Mobile Robotics Probabilistic Robotics Wolfram Burgard 1 Probabilistic Robotics Key idea: Explicit representation of uncertainty (using the calculus of probability theory) Perception Action

More information

9 Multi-Model State Estimation

9 Multi-Model State Estimation Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 9 Multi-Model State

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Mathematical Formulation of Our Example

Mathematical Formulation of Our Example Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot

More information

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information. L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate

More information

Ch. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited

Ch. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited Ch. 8 Math Preliminaries for Lossy Coding 8.4 Info Theory Revisited 1 Info Theory Goals for Lossy Coding Again just as for the lossless case Info Theory provides: Basis for Algorithms & Bounds on Performance

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

Bayesian Linear Regression [DRAFT - In Progress]

Bayesian Linear Regression [DRAFT - In Progress] Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

Lecture 4: Extended Kalman filter and Statistically Linearized Filter

Lecture 4: Extended Kalman filter and Statistically Linearized Filter Lecture 4: Extended Kalman filter and Statistically Linearized Filter Department of Biomedical Engineering and Computational Science Aalto University February 17, 2011 Contents 1 Overview of EKF 2 Linear

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

Modeling and state estimation Examples State estimation Probabilities Bayes filter Particle filter. Modeling. CSC752 Autonomous Robotic Systems

Modeling and state estimation Examples State estimation Probabilities Bayes filter Particle filter. Modeling. CSC752 Autonomous Robotic Systems Modeling CSC752 Autonomous Robotic Systems Ubbo Visser Department of Computer Science University of Miami February 21, 2017 Outline 1 Modeling and state estimation 2 Examples 3 State estimation 4 Probabilities

More information

A review of probability theory

A review of probability theory 1 A review of probability theory In this book we will study dynamical systems driven by noise. Noise is something that changes randomly with time, and quantities that do this are called stochastic processes.

More information

Bayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory

Bayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Bayesian decision theory 8001652 Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Jussi Tohka jussi.tohka@tut.fi Institute of Signal Processing Tampere University of Technology

More information

EKF and SLAM. McGill COMP 765 Sept 18 th, 2017

EKF and SLAM. McGill COMP 765 Sept 18 th, 2017 EKF and SLAM McGill COMP 765 Sept 18 th, 2017 Outline News and information Instructions for paper presentations Continue on Kalman filter: EKF and extension to mapping Example of a real mapping system:

More information

Robotics. Lecture 4: Probabilistic Robotics. See course website for up to date information.

Robotics. Lecture 4: Probabilistic Robotics. See course website   for up to date information. Robotics Lecture 4: Probabilistic Robotics See course website http://www.doc.ic.ac.uk/~ajd/robotics/ for up to date information. Andrew Davison Department of Computing Imperial College London Review: Sensors

More information

Some Concepts of Probability (Review) Volker Tresp Summer 2018

Some Concepts of Probability (Review) Volker Tresp Summer 2018 Some Concepts of Probability (Review) Volker Tresp Summer 2018 1 Definition There are different way to define what a probability stands for Mathematically, the most rigorous definition is based on Kolmogorov

More information

p(z)

p(z) Chapter Statistics. Introduction This lecture is a quick review of basic statistical concepts; probabilities, mean, variance, covariance, correlation, linear regression, probability density functions and

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

INTRODUCTION TO PATTERN RECOGNITION

INTRODUCTION TO PATTERN RECOGNITION INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take

More information

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21 CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about

More information

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence

More information

Preliminary statistics

Preliminary statistics 1 Preliminary statistics The solution of a geophysical inverse problem can be obtained by a combination of information from observed data, the theoretical relation between data and earth parameters (models),

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

1 Introduction to information theory

1 Introduction to information theory 1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through

More information

ECE531: Principles of Detection and Estimation Course Introduction

ECE531: Principles of Detection and Estimation Course Introduction ECE531: Principles of Detection and Estimation Course Introduction D. Richard Brown III WPI 22-January-2009 WPI D. Richard Brown III 22-January-2009 1 / 37 Lecture 1 Major Topics 1. Web page. 2. Syllabus

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Math Review Sheet, Fall 2008

Math Review Sheet, Fall 2008 1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

CS491/691: Introduction to Aerial Robotics

CS491/691: Introduction to Aerial Robotics CS491/691: Introduction to Aerial Robotics Topic: State Estimation Dr. Kostas Alexis (CSE) World state (or system state) Belief state: Our belief/estimate of the world state World state: Real state of

More information

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout :. The Multivariate Gaussian & Decision Boundaries..15.1.5 1 8 6 6 8 1 Mark Gales mjfg@eng.cam.ac.uk Lent

More information

Kalman Filter. Predict: Update: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q

Kalman Filter. Predict: Update: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q Kalman Filter Kalman Filter Predict: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q Update: K = P k k 1 Hk T (H k P k k 1 Hk T + R) 1 x k k = x k k 1 + K(z k H k x k k 1 ) P k k =(I

More information

Sensor Tasking and Control

Sensor Tasking and Control Sensor Tasking and Control Sensing Networking Leonidas Guibas Stanford University Computation CS428 Sensor systems are about sensing, after all... System State Continuous and Discrete Variables The quantities

More information

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2017 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Introduction to Probability and Stocastic Processes - Part I

Introduction to Probability and Stocastic Processes - Part I Introduction to Probability and Stocastic Processes - Part I Lecture 2 Henrik Vie Christensen vie@control.auc.dk Department of Control Engineering Institute of Electronic Systems Aalborg University Denmark

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

+ + ( + ) = Linear recurrent networks. Simpler, much more amenable to analytic treatment E.g. by choosing

+ + ( + ) = Linear recurrent networks. Simpler, much more amenable to analytic treatment E.g. by choosing Linear recurrent networks Simpler, much more amenable to analytic treatment E.g. by choosing + ( + ) = Firing rates can be negative Approximates dynamics around fixed point Approximation often reasonable

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Markov localization uses an explicit, discrete representation for the probability of all position in the state space.

Markov localization uses an explicit, discrete representation for the probability of all position in the state space. Markov Kalman Filter Localization Markov localization localization starting from any unknown position recovers from ambiguous situation. However, to update the probability of all positions within the whole

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

Vector Derivatives and the Gradient

Vector Derivatives and the Gradient ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 10 ECE 275A Vector Derivatives and the Gradient ECE 275AB Lecture 10 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego

More information

Example: Letter Frequencies

Example: Letter Frequencies Example: Letter Frequencies i a i p i 1 a 0.0575 2 b 0.0128 3 c 0.0263 4 d 0.0285 5 e 0.0913 6 f 0.0173 7 g 0.0133 8 h 0.0313 9 i 0.0599 10 j 0.0006 11 k 0.0084 12 l 0.0335 13 m 0.0235 14 n 0.0596 15 o

More information

1 Probabilities. 1.1 Basics 1 PROBABILITIES

1 Probabilities. 1.1 Basics 1 PROBABILITIES 1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability

More information

CIS 390 Fall 2016 Robotics: Planning and Perception Final Review Questions

CIS 390 Fall 2016 Robotics: Planning and Perception Final Review Questions CIS 390 Fall 2016 Robotics: Planning and Perception Final Review Questions December 14, 2016 Questions Throughout the following questions we will assume that x t is the state vector at time t, z t is the

More information

1 Random Variable: Topics

1 Random Variable: Topics Note: Handouts DO NOT replace the book. In most cases, they only provide a guideline on topics and an intuitive feel. 1 Random Variable: Topics Chap 2, 2.1-2.4 and Chap 3, 3.1-3.3 What is a random variable?

More information

Example: Letter Frequencies

Example: Letter Frequencies Example: Letter Frequencies i a i p i 1 a 0.0575 2 b 0.0128 3 c 0.0263 4 d 0.0285 5 e 0.0913 6 f 0.0173 7 g 0.0133 8 h 0.0313 9 i 0.0599 10 j 0.0006 11 k 0.0084 12 l 0.0335 13 m 0.0235 14 n 0.0596 15 o

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

ECE295, Data Assimila0on and Inverse Problems, Spring 2015

ECE295, Data Assimila0on and Inverse Problems, Spring 2015 ECE295, Data Assimila0on and Inverse Problems, Spring 2015 1 April, Intro; Linear discrete Inverse problems (Aster Ch 1 and 2) Slides 8 April, SVD (Aster ch 2 and 3) Slides 15 April, RegularizaFon (ch

More information

Lecture 17: Differential Entropy

Lecture 17: Differential Entropy Lecture 17: Differential Entropy Differential entropy AEP for differential entropy Quantization Maximum differential entropy Estimation counterpart of Fano s inequality Dr. Yao Xie, ECE587, Information

More information

[POLS 8500] Review of Linear Algebra, Probability and Information Theory

[POLS 8500] Review of Linear Algebra, Probability and Information Theory [POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming

More information

Probability, CLT, CLT counterexamples, Bayes. The PDF file of this lecture contains a full reference document on probability and random variables.

Probability, CLT, CLT counterexamples, Bayes. The PDF file of this lecture contains a full reference document on probability and random variables. Lecture 5 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2015 http://www.astro.cornell.edu/~cordes/a6523 Probability, CLT, CLT counterexamples, Bayes The PDF file of

More information

ECE531 Lecture 8: Non-Random Parameter Estimation

ECE531 Lecture 8: Non-Random Parameter Estimation ECE531 Lecture 8: Non-Random Parameter Estimation D. Richard Brown III Worcester Polytechnic Institute 19-March-2009 Worcester Polytechnic Institute D. Richard Brown III 19-March-2009 1 / 25 Introduction

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision

More information

Learning Bayesian network : Given structure and completely observed data

Learning Bayesian network : Given structure and completely observed data Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution

More information

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring / Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical

More information

F denotes cumulative density. denotes probability density function; (.)

F denotes cumulative density. denotes probability density function; (.) BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models

More information

Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization

Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization Wolfram Burgard, Cyrill Stachniss, Maren Bennewitz, Kai Arras 1 Motivation Recall: Discrete filter Discretize the

More information

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x

More information