A Theoretical Overview on Kalman Filtering


Constantinos Mavroeidis, Vanier College. Presented to professors: IVANOV T. IVAN, STAHN CHRISTIAN. June 6, 2018.

Abstract

Kalman filtering is a state estimation algorithm used in tracking and data prediction tasks. Common applications of the Kalman filter include machine navigation, localization systems of autonomous vehicles [8], statistical signal processing, prediction of future network congestion [1], spacecraft orbit calculation and trajectory optimization [7], econometrics [2], and biometrics [3]. The standard Kalman filter relies on a linear dynamical system with discretized time. It is most commonly used in GPS positioning and computer system analysis, and has important applications in network congestion prediction [1]. Many extensions of the algorithm have been developed for non-linear systems and for systems that do not follow the assumptions of the standard Kalman filter; the Extended Kalman filter [9] and the Unscented Kalman filter [10] were developed to handle the non-linear properties of certain systems. In this paper, we focus on a linear underlying system with normally distributed noise, which is the most common case in simple applications. We first introduce the statistical background necessary to understand the theoretical concepts behind Kalman filtering, assuming the reader has a general understanding of probability theory, linear dynamical systems, and basic calculus. We then illustrate the essence of these theoretical results using a basic state estimation example for a linear system. The state observation and estimation models developed in this paper can be used in other, similar applications with a strictly linear underlying dynamical system.

1 Introduction

Kalman filtering is an optimal state estimation process proposed in 1960 by Rudolf E. Kalman as a solution to the filtering problem for linear, discrete-time systems [1-4]. It is widely used in engineering, most commonly in spacecraft navigation and trajectory optimization for robots. It is a useful tool because of its recursive structure and small memory requirements, and it gives optimal state estimates even when part of the system is hidden. In applications, part of a system's state is usually not observable, and the best estimate of the hidden portion of the state can be determined using the Kalman filter. A variation of the standard algorithm, now known as the Extended Kalman filter (EKF) [5-6], was initially used in the aerospace industry, more specifically for the non-linear trajectory computed by the Apollo navigation computer during the Apollo program [7]. The EKF linearizes the non-linear system and forces the system to work with the linearized version of the model. Applications of the EKF include localization systems for autonomous vehicles [8]. Despite its ability to linearize non-linear models, the math behind the EKF can be rather complex, making it difficult to implement such an algorithm efficiently [9]. Another extension of the Kalman filter, the Unscented Kalman filter (UKF) [10], was therefore developed to overcome these difficulties using non-linear estimation techniques that have been shown to be more effective than the EKF in many applications, including vehicle navigation [9].
Despite the wide range of applications of these extensions of the standard Kalman filter, many dynamical systems can be modeled linearly, including position and motion estimation of a road vehicle. Obtaining a highly accurate estimate of the vehicle's actual position through the algorithm is crucial in developing trustworthy GPS systems. In this paper, we introduce and expand on the fundamental ideas behind the Kalman filter and how it can be used to obtain the optimal state estimate of a linear system with Gaussian noise. Kalman filtering is an algorithm that combines imprecise estimates of some true value, whether observable or not, and yields the most precise estimate of that value. Informally, this can be illustrated through the problem of choosing the most trustworthy value out of two proposed values. For instance, we can measure the temperature across a resistor's terminals using different types of thermometers. For now, we use the word independent to indicate that the value displayed by each thermometer does not affect the value displayed by another. Suppose we obtain different results from different

thermometers. How does one decide which is the most accurate temperature across the resistor's terminals? Rather than choosing the value displayed by one thermometer, we can combine all the displayed values using a linear estimator, allocating a weight to each thermometer's displayed value based on how trustworthy that thermometer is. This idea informally defines confidence in an estimate. In the simple case of only two thermometers, we might be inclined to take the average of the two temperatures. If thermometer 1 displays a temperature estimate x_1 and thermometer 2 displays an estimate x_2, we can combine both estimates using the formula 0.5x_1 + 0.5x_2, where each estimate is given equal weight. If we have additional information on the quality of each thermometer, more specifically if one is newer or more appropriate for this type of measurement, we may have more confidence in the results displayed by that thermometer. We can generally consider a linear combination of the two estimates of the form (1 − α)x_1 + αx_2, where 0 ≤ α ≤ 1. Intuitively, the more confidence we have in the second estimate x_2, the closer α should be to 1, at which point x_1 no longer contributes to the combination. The expression (1 − α)x_1 + αx_2 is an example of a linear estimator. In this paper we will use similar linear estimators for both scalar and vector estimates. The Kalman filter is an algorithm that helps us determine the optimal value of α. We will conclude that the weight given to an estimate should be proportional to the confidence we have in that estimate.

Section 2 describes the model used in Kalman filtering and develops scalar and vector estimates and confidence in estimates. A scalar estimate is a sample drawn from a certain distribution with mean µ and variance σ² as its parameters; a vector estimate is a sample drawn from a distribution with mean vector µ and covariance matrix Σ. We quantify confidence in estimates using the variances and covariances of these distributions. Sections 3-5 develop the two most important statistical ideas behind the Kalman filter.

1. Section 3 shows that scalar estimates should be fused by minimizing the variance of the fused estimate. It also shows that fusing more than two estimates can be done by fusing two estimates at a time recursively, without any loss of quality in the final estimate. Section 4 extends these results from scalar estimates to vector estimates, where the entries of a vector estimate are scalar samples drawn from distinct distributions. Extending the fusion of estimates from the scalar to the vector case might seem difficult because of the complexity of the equations, but the main difference is that variances are simply replaced by covariance matrices.

2. In applications where the estimates are vectors, only part of the vector may be directly observable. For instance, if the state consists of position and velocity, the position of an aircraft might not be observable, so Section 5 introduces the Best Linear Unbiased Estimator (BLUE). We use the BLUE to obtain the optimal estimate for the hidden portion of the state that is not directly measured.

Section 6 shows how these results can be used to obtain optimal state estimates for linear systems. We first consider the problem of state estimation where the entire state is observable, using the results of Sections 3 and 4, and then consider the more complex case of a partially observable state, which requires the BLUE estimator introduced in Section 5.
Finally, we illustrate the application of the theoretical results obtained in Sections 3-5 using an example of state estimation for a road vehicle with partial observability of the vehicle's state.

2 SCALAR AND VECTOR ESTIMATES

In this section we further explore scalar and vector estimates, and use basic probability theory to introduce confidence in an estimate.

2.1 Scalar estimates

We use the hypothetical example proposed in Section 1 to develop the basic principles of scalar estimates and the confidence assigned to them. An estimate drawn from thermometer i corresponds to a sample temperature drawn from distribution p_i. Since temperature is a continuous scalar, each distribution p_i assigns a probability density to every possible temperature interval; the probability of obtaining any one exact temperature value is 0, since no thermometer can record exact values of temperature. The i-th estimate is obtained from the i-th distribution as shown in Figure 1, where the x-axis corresponds to all possible temperature values in °C and the y-axis represents the probability density. In most applications, the estimates used in the Kalman filtering process are assumed to be drawn from Gaussian distributions, but for the sake of generality we only assume that the mean µ_i and the variance σ_i² of each distribution p_i are known. Throughout the paper, we write x_i : p_i(µ_i, σ_i²) to denote that x_i is a random sample with mean µ_i and variance σ_i² drawn from distribution p_i. For complicated expressions containing variance parameters, we will use the reciprocal of the variance, called the precision of a distribution, to simplify the equations.

The confidence in the temperature estimate made by each thermometer is determined by the variance of its corresponding distribution. We now quantify the concept of confidence in the context of the example. When measuring the temperature across resistor terminals, a resistance temperature detector (RTD) gives a highly accurate temperature

estimate compared to a thermocouple when dealing with temperatures below 600 °C [4]. Therefore, the distribution corresponding to the RTD has a smaller variance than the distribution used to model the thermocouple's temperature estimate, and we should be more confident in temperature samples drawn from the RTD distribution.

Figure 1: Distributions
Figure 2: Precision vs. accuracy

The estimates drawn from the distributions may seem ambiguous because we are unsure of what the actual temperature across the resistor is and how close the estimates are to the real value. The approach used to model confidence in these estimates does not say how close the estimated temperatures are to the actual temperature across the terminals; instead we rely on how close the estimates are to each other. This requires a distinction between precision and accuracy. Accuracy measures how close an estimate is to the true value, usually called the ground truth, while precision evaluates how close the estimates are to each other, with no reference to the ground truth. Figure 2 makes the distinction between the two using darts as an example: the bullseye represents the ground truth value, and the highly rewarding points around the bullseye represent accurate estimates of the true value. In our context, the real value of the temperature across the resistor terminals is unknown due to the continuous nature of such a scalar quantity, so confidence in a sample is assigned based on the values obtained from the other estimated temperatures. We use the same approach in this paper, and evaluate confidence in estimates based on precision.

An important assumption was made in the introduction of the thermometer example: the estimates x_1, x_2, ..., x_n are assumed to be uncorrelated. This is intuitive, since the result observed by one thermometer is not affected by the other thermometers recording the temperature at the same time. Uncorrelated scalar estimates x_i are formally defined by E[(x_i − µ_i)(x_j − µ_j)] = 0 for i ≠ j. Even though dependence and correlation of random variables are related, there is an important distinction between the two: correlation only captures the linear relationship between two random variables. Two random variables can be uncorrelated and dependent, meaning that they are simply not linearly related. The difference between correlation and dependence is further discussed in Appendix 8.1.

LEMMA 2.1. Let x_1 : p_1(µ_1, σ_1²), ..., x_n : p_n(µ_n, σ_n²) be a set of pairwise uncorrelated random variables, and let y be a linear combination of the x_i, y = Σ_{i=1}^{n} α_i x_i. The mean and variance of y are:

µ_y = Σ_{i=1}^{n} α_i µ_i    (1)

σ_y² = Σ_{i=1}^{n} α_i² σ_i²    (2)

PROOF. Equation 1 follows from the linearity of expectation: µ_y = E[y] = E[Σ_{i=1}^{n} α_i x_i] = Σ_{i=1}^{n} α_i µ_i. Equation 2 follows from the fact that the estimates are pairwise uncorrelated:

σ_y² = E[(y − µ_y)²] = E[(Σ_{i=1}^{n} α_i x_i − Σ_{i=1}^{n} α_i µ_i)(Σ_{j=1}^{n} α_j x_j − Σ_{j=1}^{n} α_j µ_j)]
     = E[(Σ_{i=1}^{n} α_i (x_i − µ_i))(Σ_{j=1}^{n} α_j (x_j − µ_j))]
     = Σ_{i=1}^{n} Σ_{j=1}^{n} E[α_i α_j (x_i − µ_i)(x_j − µ_j)]

Because the x_i are pairwise uncorrelated, E[(x_i − µ_i)(x_j − µ_j)] = 0 for i ≠ j, so the result follows: σ_y² = Σ_{i=1}^{n} α_i² E[(x_i − µ_i)²] = Σ_{i=1}^{n} α_i² σ_i².

2.2 Vector estimates

Instead of dealing with a single random variable, we construct a column matrix with a random variable in each entry, usually called a random vector. For instance, random variables such as the position X and velocity V of a vehicle with respect to time can be collected in the random vector [X(t) V(t)]ᵀ. In this paper, we denote a vector by a boldfaced lowercase letter and a matrix by an uppercase letter. The covariance of a random vector, denoted cov, is analogous to the variance of a random variable, but measures the joint variability of two random variables. The covariance between two jointly distributed random variables X and Y is defined by cov(X, Y) = E[(X − E[X])(Y − E[Y])] and is the expectation of the product of each random variable's deviation from its mean. The covariance matrix of a random vector x with mean µ_x is the matrix E[(x − µ_x)(x − µ_x)ᵀ].

Estimates: A vector estimate x_i is a random sample drawn from a distribution p_i(µ_i, Σ_i). Each µ_i is a vector [µ_1 µ_2 ... µ_n]ᵀ, where n is the number of random variables in the random vector x_i and each entry is the mean of the corresponding random variable. The covariance matrix of an n-dimensional random vector [X_1 X_2 ... X_n]ᵀ is

Σ = [ E[(X_1−µ_1)(X_1−µ_1)]  E[(X_1−µ_1)(X_2−µ_2)]  ...  E[(X_1−µ_1)(X_n−µ_n)] ;
      E[(X_2−µ_2)(X_1−µ_1)]  E[(X_2−µ_2)(X_2−µ_2)]  ...  E[(X_2−µ_2)(X_n−µ_n)] ;
      ... ;
      E[(X_n−µ_n)(X_1−µ_1)]  E[(X_n−µ_n)(X_2−µ_2)]  ...  E[(X_n−µ_n)(X_n−µ_n)] ]

or equivalently Σ = E[(x − E[x])(x − E[x])ᵀ]. Each (i, j) entry is the covariance between the i-th and j-th random variables of the random vector, so the covariance matrix of a random vector is symmetric. If we construct the covariance matrix between two distinct random vectors, the matrix loses this symmetry, but each (i, j) entry still corresponds to the

covariance between the i-th random variable of the first random vector and the j-th random variable of the second random vector. If the dimension of x_i is one, the covariance matrix reduces to its single entry, the variance of the random variable X_1. As in the scalar case, the inverse of the covariance matrix Σ_i is called the precision or information matrix.

Uncorrelated estimates: Estimates x_i and x_j are uncorrelated if E[(x_i − µ_i)(x_j − µ_j)ᵀ] = 0, that is, each pair of their component random variables is uncorrelated. This implies that each entry E[(X_i − µ_i)(X_j − µ_j)] of the cross-covariance matrix is zero. The following lemma generalizes Lemma 2.1 to the vector case.

LEMMA 2.2. Let x_1 : p_1(µ_1, Σ_1), ..., x_n : p_n(µ_n, Σ_n) be a set of pairwise uncorrelated random vectors of length m, and let y = Σ_{i=1}^{n} A_i x_i. The mean and covariance of y are:

µ_y = Σ_{i=1}^{n} A_i µ_i    (3)

Σ_y = Σ_{i=1}^{n} A_i Σ_i A_iᵀ    (4)

PROOF. Equation 3 follows from the linearity of the expectation operator. Equation 4 is proved the same way as Equation 2:

Σ_y = E[(y − µ_y)(y − µ_y)ᵀ] = E[(Σ_{i=1}^{n} A_i (x_i − µ_i))(Σ_{j=1}^{n} A_j (x_j − µ_j))ᵀ] = Σ_{i=1}^{n} Σ_{j=1}^{n} A_i E[(x_i − µ_i)(x_j − µ_j)ᵀ] A_jᵀ

Since x_i and x_j are uncorrelated, E[(x_i − µ_i)(x_j − µ_j)ᵀ] = 0 for i ≠ j, and the result follows.

3 FUSING SCALAR ESTIMATES

In this section we introduce the fusion of scalar estimates. Section 3.1 discusses the fusion of two scalar estimates, and Section 3.2 generalizes this to the fusion of n > 2 scalar estimates. Finally, Section 3.3 shows that, without any loss of quality in the final estimate, we can repeatedly fuse two estimates at a time instead of fusing n > 2 estimates at once.

3.1 Fusing two scalar estimates

We now consider the problem of choosing the optimal value of α when fusing estimates with the linear estimator y_α(x_1, x_2) = (1 − α)x_1 + αx_2, where the x_i are uncorrelated scalar estimates. We also formalize confidence in estimates. A statistical approach to choosing α relies on the variance of y_α(x_1, x_2): the optimal value of α minimizes the variance of the fused estimate. This is intuitive, because lower variance yields higher confidence; that is, confidence in an estimate is inversely proportional to the variance of the distribution from which the estimate is drawn. Minimizing the variance of the estimator therefore produces the fused estimate with the highest confidence, ŷ. The variance of the estimator is called the mean square error (MSE), and the minimum value of this variance as α varies is called the minimum mean square error (MMSE). The following theorem shows how to obtain the MMSE.

THEOREM 3.1. Let x_1 : p_1(µ_1, σ_1²) and x_2 : p_2(µ_2, σ_2²) be uncorrelated scalar estimates fused by the linear estimator y_α(x_1, x_2) = (1 − α)x_1 + αx_2. The variance (MSE) of y_α is minimized for α = σ_1² / (σ_1² + σ_2²).

PROOF. From Lemma 2.1, the two-estimate case (n = 2) gives

σ_y²(α) = (1 − α)² σ_1² + α² σ_2²    (5)

Differentiating σ_y²(α) with respect to α and setting the derivative equal to zero leads to the result:

d/dα σ_y²(α) = 2σ_1²(α − 1) + 2ασ_2² = 0  ⟹  α = σ_1² / (σ_1² + σ_2²)
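As a quick sanity check of Theorem 3.1, the short Python sketch below sweeps α over [0, 1] and confirms numerically that the variance of the fused estimate is minimized at α = σ_1²/(σ_1² + σ_2²). The variances used are arbitrary values chosen for illustration, not quantities from this paper.

import numpy as np

# Arbitrary example variances for the two estimates (assumed for illustration).
var1, var2 = 4.0, 1.0
alphas = np.linspace(0.0, 1.0, 1001)

# Variance of the fused estimate (1 - alpha)*x1 + alpha*x2, Equation 5.
fused_var = (1 - alphas) ** 2 * var1 + alphas ** 2 * var2

alpha_numeric = alphas[np.argmin(fused_var)]   # location of the minimum on the grid
alpha_theory = var1 / (var1 + var2)            # closed form from Theorem 3.1
print(alpha_numeric, alpha_theory)             # both print 0.8

With σ_1² = 4 and σ_2² = 1, both the numerical sweep and the closed form give α = 0.8, so the less precise estimate x_1 receives only 20% of the weight.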

This optimal value of α is denoted by K and is called the Kalman gain. Substituting K into the linear fusion model gives the optimal linear estimator ŷ(x_1, x_2):

ŷ(x_1, x_2) = [σ_2² / (σ_1² + σ_2²)] x_1 + [σ_1² / (σ_1² + σ_2²)] x_2

Multiplying the numerator and denominator of each weight by σ_1⁻²σ_2⁻² helps us generalize this result to the fusion of n > 2 estimates:

ŷ(x_1, x_2) = [σ_1⁻² / (σ_1⁻² + σ_2⁻²)] x_1 + [σ_2⁻² / (σ_1⁻² + σ_2⁻²)] x_2    (6)

Substituting K into Equation 5 gives the variance of ŷ:

1/σ_ŷ² = 1/σ_1² + 1/σ_2²    (7)

We have now derived the equations for the optimally fused estimate ŷ and its variance σ_ŷ². We define new variables for the reciprocals of the variances to simplify the fusion of n > 2 estimates: let v_1 = 1/σ_1² and v_2 = 1/σ_2², where v_i is the precision of the corresponding estimate's distribution. Equations 6 and 7 can then be rewritten as:

ŷ(x_1, x_2) = [v_1 / (v_1 + v_2)] x_1 + [v_2 / (v_1 + v_2)] x_2    (8)

v_ŷ = v_1 + v_2    (9)

Since the confidence in an estimate is inversely proportional to its variance, these results show that the weight we assign to each estimate is proportional to the confidence we have in that estimate. Note that if µ_1 = µ_2, then E[y_α] = µ_1 = µ_2, and y_α is called an unbiased estimator: the bias of y_α, the difference between the mean of the estimator and the mean of the true value, is zero.

3.2 Fusing scalar estimates for n > 2

In Section 3.1 we developed a way to optimally fuse two uncorrelated scalar estimates. Note that the estimates x_i do not have to be mutually independent; pairwise uncorrelated estimates are enough to generalize the fusion approach. The following theorem gives the values of α_i that minimize the variance of the fused estimate.

THEOREM 3.2. Let pairwise uncorrelated estimates x_i drawn from distributions p_i(µ_i, σ_i²) be fused using the linear model y_α(x_1, ..., x_n) = Σ_{i=1}^{n} α_i x_i, where Σ_{i=1}^{n} α_i = 1. The values α_i that minimize the variance of the fused estimate y_α are given by:

α_i = σ_i⁻² / (Σ_{j=1}^{n} σ_j⁻²)

PROOF. Lemma 2.1 gives the variance of the fused estimate, σ_y²(α) = Σ_{i=1}^{n} α_i² σ_i². To find the values that minimize σ_y²(α) under the constraint Σ_{i=1}^{n} α_i = 1, we use the method of Lagrange multipliers. We define the Lagrangian as

L = Σ_{i=1}^{n} α_i² σ_i² − λ (Σ_{i=1}^{n} α_i − 1)

Since we are dealing with scalar estimates, taking the gradient of the Lagrangian amounts to taking the partial derivative with respect to each α_i and setting it to zero:

∂L/∂α_i = 2α_i σ_i² − λ = 0  ⟹  α_i = λ / (2σ_i²)

We then substitute these values of α_i into the constraint Σ_{i=1}^{n} α_i = 1 to determine λ, and substitute λ back into α_i = λ/(2σ_i²) to obtain the result. Substituting the optimal values into σ_y²(α) = Σ_{i=1}^{n} α_i² σ_i² and y_α(x_1, ..., x_n) = Σ_{i=1}^{n} α_i x_i yields the following results, which generalize Equations 6-9:

1/σ_ŷ² = Σ_{i=1}^{n} 1/σ_i²    (10)

ŷ(x_1, ..., x_n) = Σ_{i=1}^{n} [σ_i⁻² / (Σ_{j=1}^{n} σ_j⁻²)] x_i

These expressions can be simplified by changing parameters from variances to precisions:

ŷ(x_1, ..., x_n) = Σ_{i=1}^{n} [v_i / (v_1 + ... + v_n)] x_i    (11)

v_ŷ = Σ_{i=1}^{n} v_i    (12)

3.3 Incremental fusing is optimal

In the case where a scalar estimate represents the position of a vehicle at a specific time, the estimates become available over a period of time as the vehicle moves. If all estimates are available and stored, we can reapply Equations 11 and 12 whenever a new estimate becomes available, fusing all currently available estimates from scratch. Because this requires unnecessary work and storage, we prefer to perform the fusion incrementally, saving both time and space. Rather than re-fusing all stored estimates every time a new one becomes available, we show that we can instead keep a running estimate that is fused with the next available estimate, without any loss in the quality of the final estimate. The desired result of this section is the identity

ŷ(x_1, x_2, ..., x_n) = ŷ(ŷ(...ŷ(x_1, x_2), x_3...), x_n)

where at each step the running estimate is fused with the new estimate. This process is illustrated in Figure 3. The initially available estimate x_1 is first fused with x_2 when it becomes available; the labels on the arrows connecting x_1 and x_2 to the fused estimate represent the weights allocated to the estimates. Using Equation 8, the first two estimates are fused into ŷ(x_1, x_2), whose precision, obtained from Equation 9, is displayed above the estimate. When the next estimate x_3 becomes available, x_3 and ŷ(x_1, x_2) are fused in the same way to produce ŷ(ŷ(x_1, x_2), x_3). The incremental fusion process can be repeated n times, with each estimate that becomes available fused with the previously fused estimate.
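The following Python sketch illustrates Equations 11-12 and the incremental fusion described above; the readings and variances are hypothetical values, not data from this paper. Fusing all estimates at once and fusing them one at a time as they arrive produce the same result.

import numpy as np

def fuse(estimates, variances):
    # Weights are proportional to the precisions (Equation 11); the fused
    # precision is the sum of the individual precisions (Equation 12).
    v = 1.0 / np.asarray(variances, dtype=float)
    y = np.sum(v * np.asarray(estimates, dtype=float)) / np.sum(v)
    return y, 1.0 / np.sum(v)      # fused estimate and its variance

x = [20.1, 19.6, 20.4]             # hypothetical thermometer readings
s2 = [0.5, 1.0, 2.0]               # their (assumed) variances

y_batch, var_batch = fuse(x, s2)   # fuse all three at once

# Incremental version: keep a running estimate and fuse in each new reading.
y_run, var_run = x[0], s2[0]
for xi, vi in zip(x[1:], s2[1:]):
    y_run, var_run = fuse([y_run, xi], [var_run, vi])

print(y_batch, y_run)              # identical up to floating-point rounding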

Figure 3: Dataflow graph for incremental fusion

Each estimate x_i contributes to the final value ŷ(ŷ(...ŷ(x_1, x_2), x_3...), x_n), and its contribution is given by the product of the weights on the path from that estimate to the final value. For instance, the contribution of x_i to the final fused value is

[v_i / (v_1 + ... + v_i)] × [(v_1 + ... + v_i) / (v_1 + ... + v_{i+1})] × ... × [(v_1 + ... + v_{n−1}) / (v_1 + ... + v_n)] = v_i / (v_1 + ... + v_n)

which is the same weight for each x_i as given by Equation 11. Because fusing estimates incrementally as they become available gives the same result as fusing them all at once, incremental fusion is optimal.

3.4 Summary: fusing scalar estimates for n = 2

The results obtained in this section support optimal incremental fusion of scalar estimates. When fusing uncertain scalar estimates with a linear model, the weight allocated to each estimate should be directly proportional to the precision of the distribution the estimate is drawn from; equivalently, it should be inversely proportional to the variance of that estimate. Fusion can be performed incrementally, so fusing n > 2 estimates two at a time results in no loss of quality in the final estimate. Finally, the confidence in the estimate is expressed in terms of the Kalman gain, and the equations for incremental fusing of scalar estimates are given below.

x_1 : p_1(µ_1, σ_1²), x_2 : p_2(µ_2, σ_2²)

K = σ_1² / (σ_1² + σ_2²) = v_2 / (v_1 + v_2)    (13)

ŷ(x_1, x_2) = x_1 + K(x_2 − x_1)    (14)

µ_ŷ = µ_1 + K(µ_2 − µ_1)    (15)

σ_ŷ² = σ_1² − Kσ_1², or v_ŷ = v_1 + v_2    (16)

4 FUSING VECTOR ESTIMATES

In this section we extend the results for fusing n > 2 scalar estimates to fusing n > 2 vector estimates. Incremental fusing is still optimal for vector estimates, but we only discuss the general fusion model since the two-vector case follows from the general case. As in Section 3, we conclude that vector estimates can be fused by a linear model simply by replacing the variance with the covariance matrix defined in Section 2.2.

4.1 Fusing vector estimates for n > 2

For vector estimates, we extend the scalar linear fusion model to the vector case:

y_A(x_1, x_2, ..., x_n) = Σ_{i=1}^{n} A_i x_i    (17)

where each matrix A_i is the matrix parameter for the corresponding x_i, and Σ_{i=1}^{n} A_i = I. If µ_i = µ_j for all i ≠ j,

y_A is an unbiased estimator.

Optimality: The two-norm (l₂-norm) of an n-dimensional real vector is ‖x‖ = sqrt(Σ_{i=1}^{n} x_i²), where x_i is the i-th entry of x. The MSE of the estimator is the expected value of the squared two-norm of (y_A − µ_{y_A}), which is E[(y_A − µ_{y_A})ᵀ(y_A − µ_{y_A})]. The MMSE is attained by the matrices A_i that yield the optimal covariance matrix of the fused vector estimate. The proof of Theorem 4.1 is similar to that of Theorem 3.2, but uses matrix derivatives, which are needed for Lagrangian optimization with matrix constraints; the complete proof is given in Appendix 8.3. Theorem 4.1 generalizes Theorem 3.2, with the variance replaced by the covariance matrix.

THEOREM 4.1. Let pairwise uncorrelated estimates x_i drawn from distributions p_i(µ_i, Σ_i) be fused using the linear model y_A(x_1, x_2, ..., x_n) = Σ_{i=1}^{n} A_i x_i under the constraint Σ_{i=1}^{n} A_i = I. MSE(y_A) is minimized for

A_i = (Σ_{j=1}^{n} Σ_j⁻¹)⁻¹ Σ_i⁻¹    (18)

The covariance matrix of ŷ can be determined by substituting Equation 18 into Equation 4 of Lemma 2.2. To show this, we write the covariance matrices Σ_i as B_i for clarity:

B_ŷ = Σ_{i=1}^{n} A_i B_i A_iᵀ = Σ_{i=1}^{n} (Σ_{j=1}^{n} B_j⁻¹)⁻¹ B_i⁻¹ B_i [(Σ_{j=1}^{n} B_j⁻¹)⁻¹ B_i⁻¹]ᵀ = (Σ_{j=1}^{n} B_j⁻¹)⁻¹ [Σ_{i=1}^{n} B_i⁻¹] [(Σ_{j=1}^{n} B_j⁻¹)⁻¹]ᵀ = (Σ_{j=1}^{n} B_j⁻¹)⁻¹

where the sum Σ_{i=1}^{n} B_i⁻¹ cancels one factor of its inverse, and the last step uses the symmetry of the covariance matrices (and hence of the sum of their inverses). Changing back from B to Σ, the covariance of the fused vector estimate ŷ is

Σ_ŷ = (Σ_{j=1}^{n} Σ_j⁻¹)⁻¹    (19)

The inverse of the covariance matrix, Σ⁻¹, is the precision matrix, denoted N. Using precision matrices, Equations 18 and 19 become Equations 20 and 21, which generalize ŷ(x_1, ..., x_n) and v_ŷ to the vector case:

ŷ(x_1, ..., x_n) = Σ_{i=1}^{n} (Σ_{j=1}^{n} N_j)⁻¹ N_i x_i    (20)

N_ŷ = Σ_{j=1}^{n} N_j    (21)

It can be shown that there is no loss in the quality of the final vector estimate when fusing n > 2 vector estimates incrementally; the proof is similar to the scalar case in Section 3.3 and is omitted. As in the scalar case, we define the Kalman gain K as the weight matrix applied to the second estimate in the optimal two-estimate fusion:

K = Σ_1(Σ_1 + Σ_2)⁻¹    (22)
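The short Python sketch below (with made-up means and covariance matrices) fuses two vector estimates both in the precision-matrix form of Equations 20-21 and in the Kalman-gain form of Equation 22, and checks that the two forms agree.

import numpy as np

# Two assumed vector estimates and their covariance matrices.
x1 = np.array([2.0, 1.0]); S1 = np.array([[1.0, 0.2], [0.2, 2.0]])
x2 = np.array([2.5, 0.5]); S2 = np.array([[0.5, 0.0], [0.0, 1.0]])

# Precision-matrix (information) form, Equations 20-21.
N1, N2 = np.linalg.inv(S1), np.linalg.inv(S2)
y_info = np.linalg.solve(N1 + N2, N1 @ x1 + N2 @ x2)
S_info = np.linalg.inv(N1 + N2)

# Kalman-gain form, Equation 22; the fused mean and covariance in terms of K
# are written out in Section 4.2 below.
K = S1 @ np.linalg.inv(S1 + S2)
y_gain = x1 + K @ (x2 - x1)
S_gain = S1 - K @ S1

print(np.allclose(y_info, y_gain), np.allclose(S_info, S_gain))   # True True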

We can generalize Equations 13 through 16 to the vector case by using the Kalman gain. The covariance matrix of the fused estimate ŷ in terms of the Kalman gain can be written as follows:

Σ_ŷ = Σ_1(Σ_1 + Σ_2)⁻¹Σ_2    (23)

   = KΣ_2 = Σ_1 − KΣ_1    (24)

The equations obtained in this section can be expressed in terms of the Kalman gain by substituting K = Σ_1(Σ_1 + Σ_2)⁻¹ into each equation. Equation 26 is obtained in the following way: for n = 2, Equation 20 gives

ŷ(x_1, x_2) = (N_1 + N_2)⁻¹(N_1 x_1) + (N_1 + N_2)⁻¹(N_2 x_2) = (I − K)x_1 + Kx_2 = x_1 + K(x_2 − x_1)

4.2 Summary: fusing vector estimates for n = 2

The results for fusing two vector estimates in terms of K are as follows:

x_1 : p_1(µ_1, Σ_1), x_2 : p_2(µ_2, Σ_2)

K = Σ_1(Σ_1 + Σ_2)⁻¹ = (N_1 + N_2)⁻¹N_2    (25)

ŷ(x_1, x_2) = x_1 + K(x_2 − x_1)    (26)

µ_ŷ = µ_1 + K(µ_2 − µ_1)    (27)

Σ_ŷ = Σ_1 − KΣ_1, or N_ŷ = N_1 + N_2    (28)

5 BEST LINEAR UNBIASED ESTIMATOR (BLUE)

In most applications, the estimates are vectors. In the example where vector estimates represent the state of a vehicle over time, the entire state of the vehicle might not be directly observable. For instance, the position and velocity of a rocket might not be measurable directly but can be estimated through the acceleration of the rocket: the acceleration is observable, while the position and velocity are hidden. It is important to be able to estimate the hidden portion of the rocket's state for an accurate estimation of its real trajectory.

The Gauss-Markov theorem [5] states that for a linear regression model in which all errors are mutually uncorrelated, have expectation 0, and have equal variances, the best linear unbiased estimator is the OLS estimator. This section introduces a generalization of the ordinary least squares (OLS) estimator in which the set of discrete points to be optimally modeled by a linear relation is replaced by a set of random vectors. A deterministic rule for obtaining a value of y for a given value of x is not enough when x and y are random variables; we instead model a stochastic relationship, estimating the value of y given x and the correlation between the two random variables. The joint probability distribution of scalars x and y is illustrated in Figure 4, where each point (x, y) of the xy-plane is mapped to a point on the positive z-axis by the joint probability density function f(x, y). The ellipse in Figure 5 represents the projection of a joint probability distribution onto the xy-plane; this shaded region contains the points (x, y) with significant probability, that is, the region where most points (x, y) are likely to be found.

Figure 4: A joint probability distribution of two random variables
Figure 5: BLUE line corresponding to Equation 29

For a given value x, there are infinitely many possible values of y, but the values of y inside the ellipse are the most probable. Furthermore, the value ŷ on the line crossing the ellipse is the most reasonable estimate for y, since the variance at this point is minimized. This line is called the best linear unbiased estimator (BLUE) and is the analog of ordinary least squares (OLS) for distributions. In this paper, we use the BLUE with the set of discrete data points {(x_i, y_i)} replaced by a set of random vectors {(x_i, y_i)}.

5.1 Computing BLUE

Let x : p_x(µ_x, Σ_xx) and y : p_y(µ_y, Σ_yy) be random vectors related by the linear model y_{A,b}(x) = Ax + b. As in the OLS approach, we choose the values of A and b that minimize the MSE between the random variable y and its estimate, in order to obtain the optimal estimate ŷ:

MSE_{A,b}(ŷ) = E[(y − ŷ)ᵀ(y − ŷ)] = E[(y − (Ax + b))ᵀ(y − (Ax + b))] = E[yᵀy − 2yᵀ(Ax + b) + (Ax + b)ᵀ(Ax + b)]

Setting the partial derivatives of MSE_{A,b}(ŷ) with respect to b and A equal to zero gives A = Σ_yx Σ_xx⁻¹ and b = µ_y − Σ_yx Σ_xx⁻¹ µ_x, where Σ_yx is the covariance matrix between y and x. Substituting these results into the linear model gives ŷ = (Σ_yx Σ_xx⁻¹)x + µ_y − (Σ_yx Σ_xx⁻¹)µ_x, so the best linear estimator is

ŷ = µ_y + Σ_yx Σ_xx⁻¹(x − µ_x)    (29)

The expected value of the best linear estimator is µ_y, making it an unbiased estimator. If the random vectors x and y are uncorrelated, then E[(X_i − µ_{X_i})(Y_j − µ_{Y_j})] = 0 for all i and j, making Σ_yx the zero matrix; in that case ŷ = µ_y, meaning that different values of x provide no additional information about y. Furthermore, if Σ_yx = 0, the BLUE line is parallel to the x-axis, where all values of x map to the same value of y. If y and x are related by a constant matrix C, y = Cx, then Σ_yx = CΣ_xx by the linearity of the expectation operator, so ŷ = Cx; in Figure 5, this corresponds to the case where the ellipse is projected onto the BLUE line. The covariance matrix of the best estimator of y is

Σ_ŷ = Σ_yy − Σ_yx Σ_xx⁻¹ Σ_xy    (30)

where Σ_yx = Σ_xyᵀ. Intuitively, for a given value x_0, we can reduce the uncertainty in the estimator of y by a factor that depends on the strength of the correlation between x and y. The BLUE given by Equation 29 for the vector case can be shown to generalize the OLS estimator: by computing the means and covariances of the points (x, y) in the set {(x_i, y_i)} and constructing the corresponding mean vectors and covariance matrices, the BLUE line can be obtained using the OLS approach.
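As a small illustration of Equations 29 and 30, the Python sketch below computes the BLUE of y given an observed x for assumed (made-up) means and covariances; none of these numbers come from the paper.

import numpy as np

# Assumed parameters of the joint distribution of x and y (1-D for simplicity).
mu_x = np.array([1.0]); mu_y = np.array([3.0])
Sxx = np.array([[2.0]])        # cov(x, x)
Syx = np.array([[0.8]])        # cov(y, x)
Syy = np.array([[1.5]])        # cov(y, y)

def blue(x):
    # Best linear unbiased estimate of y given an observation x (Equation 29).
    return mu_y + Syx @ np.linalg.inv(Sxx) @ (x - mu_x)

y_hat = blue(np.array([1.7]))
S_yhat = Syy - Syx @ np.linalg.inv(Sxx) @ Syx.T   # reduced uncertainty, Equation 30
print(y_hat, S_yhat)

Note that if Syx were zero, blue(x) would return µ_y for every x, matching the observation above that an uncorrelated x carries no information about y.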

6 KALMAN FILTERING: LINEAR SYSTEMS

The most common application of the Kalman filter for linearly behaving systems is state estimation with uncertain observations. In this section, we apply the theory developed in Sections 3-5 to show how the Kalman filter yields the optimal state of the system at each discrete time step, using observed estimates and estimates drawn from a model.

Before the filter can be applied, the linear model of the system must be determined. This model is constructed from the underlying dynamics of the system; in most cases, the equations needed to construct it are given by the physics of the dynamical system. Figure 6(a) shows that, using such a model, we can determine the evolution of the state of a system if the details behind the linear system are known precisely. The function f_t depends on the system's state at the previous time step t and on known control inputs, such as the acceleration applied to a particle by external variables. This can be expressed by the equation x_{t+1} = f_t(x_t, u_t), where x_t is the state of the system at time t and u_t is the control input. For linear systems, where f_t is strictly a linear function, the evolution of the system's state can be written as x_{t+1} = F_t x_t + B_t u_t, where F_t and B_t are time-dependent matrices determined by the dynamics of the system and the known control variables, respectively. Using such a linear model, we can determine the system's state at any time t, as long as the state at the previous time step is known and the dynamics and control inputs of the system are modeled precisely through the matrices F_t and B_t. In practice, the state predicted by the model can differ from the actual state of the system because of statistical noise.

Apart from the predicted state at each discrete time step, we use observations of the system's state as a second uncertain estimate at each time step, which allows us to fuse both estimates based on our confidence in each one. If either the measurement or the modeled estimate were exact, only one of the two ways of determining the system's state would be needed. In general, both the model and the measurements are noisy, and some components of the state might not be directly observable at every time step. The Kalman filter is therefore used to obtain the optimal estimate of the entire state of the system at each discrete time step.

6.1 Fusing complete observations of the state

If the entire state of the system can be observed through measurements at each time step, we can use both the measurements and the predicted estimates to determine the fused state estimate closest to the actual state of the system. If the state measurements and the predicted states are uncorrelated, and the covariance matrices of the initial modeled state and of all measurements are known, Equations 25-28 can be used to determine the optimal fused estimate and its corresponding covariance.

The fused estimate at time t can be used by the model to obtain the model's estimate at time t+1, which is then fused with the measurement at time t+1 using their respective covariance matrices. The covariance of a fused estimate at time t is obtained through Equations 25 and 28, where Σ_1 and Σ_2 are the covariance matrices of the fused estimate at the previous time step and of the measurement at the new time step, respectively. This recursive process is illustrated in Figure 6(b). The initial state at time t = 0 is fed into the model, which produces the predicted estimate of the state and its covariance at time t = 1, denoted x_{1|0} and Σ_{1|0}, respectively. The observation of the state at time t = 1, with covariance matrix R_1, is then fused with the estimate x_{1|0} using Equation 26 to produce the optimally fused state estimate x_{1|1} at time t = 1. Intuitively, the notation x_{1|0} stands for the estimate of the state at time t = 1 given the information available at time t = 0, called the a priori estimate. Similarly, x_{1|1} is the corresponding fused estimate given the information available at time t = 1, often called the a posteriori estimate; in general, x_{t+1|t+1} denotes the a posteriori estimate at time t+1.

We introduce the following generalized notation to proceed with the problem of obtaining state estimates for a linear system. The initial state of the system is denoted x_{0|0} with covariance Σ_{0|0}. In the example presented in Section 6.3, we make the arbitrary assumption that Σ_{0|0} = Q_t, where Q_t is the covariance matrix of the normally distributed noise term in the state evolution equation. Introducing a zero-mean noise term w_t into the state evolution equation, to account for uncertainty in the model and in the known control inputs, makes x_{t+1|t} a random variable. The state evolution equation becomes

x_{t+1|t} = F_t x_{t|t} + B_t u_t + w_t    (31)

where w_t : N(0, Q_t) is uncorrelated with x_{t|t}. The measured estimate of the state at time t+1 is denoted z_{t+1} and is a random vector because of the normally distributed noise term v_{t+1} : N(0, R_{t+1}). The measurement is modeled as z_{t+1} = x_{t+1} + v_{t+1}, where v_{t+1} is also uncorrelated with the a priori estimate. Figure 6(c) illustrates the application of the process described in Figure 3 to a linear system with uncertainty in both the dynamical system and the measurements. The covariance matrix of the a priori estimate, Σ_{t+1|t}, can be obtained from the covariance matrix Σ_{t|t} using Equation 4, which gives

Σ_{t+1|t} = F_t Σ_{t|t} F_tᵀ + Q_t

Figure 6(c) shows that we can then proceed with the fusion of the two vector estimates as described in Section 4.

6.2 Fusing partial observations of the state

In some applications, part of the state might not be available for measurement. In this case, the prediction is obtained in the same way, and the only difference lies in the fusion phase. The following steps describe the fusion process when part of the state is not observable; we illustrate them in Section 6.3, where the position component of a vehicle's state is hidden and the velocity is observable.

i. The observable component of the a priori estimate x_{t+1|t} is fused with the corresponding measurement, using the equations derived in Sections 3-4, to produce the a posteriori estimate of the observable portion of the state.

ii. To determine the hidden portion of the a posteriori estimate, we use the a posteriori estimate of the observable portion and the BLUE estimator.

iii. The hidden and observable a posteriori estimates are then combined to obtain the a posteriori estimate of the entire state of the system.

An implementation of the filter does not need to follow these steps explicitly, as shown in Figure 6(d), but they are useful for understanding how the BLUE estimator is used when part of the state is hidden.
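The steps above can be condensed into the familiar predict/update form of the filter. The Python sketch below is a minimal illustration, assuming given matrices F, B, H, Q and R; the update step uses the Kalman gain derived for the general case in Section 6.2.2.

import numpy as np

def predict(x, Sigma, F, Q, B=None, u=None):
    # A priori estimate x_{t+1|t} and its covariance (Equation 31 and Lemma 2.2).
    x_pred = F @ x if B is None else F @ x + B @ u
    Sigma_pred = F @ Sigma @ F.T + Q
    return x_pred, Sigma_pred

def update(x_pred, Sigma_pred, z, H, R):
    # A posteriori estimate when only Hx is observed (general case, Section 6.2.2).
    S = H @ Sigma_pred @ H.T + R                    # covariance of the innovation
    K = Sigma_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_post = x_pred + K @ (z - H @ x_pred)
    Sigma_post = (np.eye(len(x_pred)) - K @ H) @ Sigma_pred
    return x_post, Sigma_post

When H is the identity matrix, the update reduces to the full-observation fusion of Section 6.1.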

Figure 6: State estimation using Kalman filtering

6.2.1 2D state application example

We now illustrate the steps outlined in Section 6.2 using a two-dimensional problem of estimating a state vector whose first component can be measured directly and whose second component is hidden. This example is convenient because, with one component hidden, the state components can be treated separately as scalars, so we avoid dealing with vectors. We introduce the following notation to simplify the example.

a priori estimate: x_i = [h_i, c_i]ᵀ, where h_i is the observable component and c_i is the hidden component
covariance matrix of x_i: Σ_i = [σ_h², σ_hc; σ_ch, σ_c²]
a posteriori state estimate: x_o = [h_o, c_o]ᵀ
measured estimate: z
variance of the measured estimate: r²

Steps i through iii are expressed mathematically by the following calculations, which are also illustrated in Figure 7.

i. The a posteriori estimate of the observable component is obtained from the a priori estimate using Equation 14:

h_o = h_i + [σ_h² / (σ_h² + r²)](z − h_i) = h_i + K_h(z − h_i)

where K_h = σ_h² / (σ_h² + r²).

ii. The a posteriori estimate of the hidden component, c_o, is determined from the a priori estimate of the hidden component using Equation 29 reduced to the scalar case, where covariance matrices are replaced by variances:

c_o = c_i + (σ_hc / σ_h²)[σ_h² / (σ_h² + r²)](z − h_i) = c_i + [σ_hc / (σ_h² + r²)](z − h_i) = c_i + K_c(z − h_i)

where K_c = σ_hc / (σ_h² + r²).

iii. Combining steps i and ii, we get

[h_o, c_o]ᵀ = [h_i, c_i]ᵀ + [K_h(z − h_i), K_c(z − h_i)]ᵀ

We express this result in terms of matrices as a step towards the general case presented in Section 6.2.2. Defining H = [1 0] and R = r², we can write x_o and K as follows:

x_o = x_i + K(z − Hx_i)
K = Σ_i Hᵀ(HΣ_i Hᵀ + R)⁻¹

Figure 7: Computing the a posteriori estimate when part of the state is not observable

Figure 7 illustrates the computations of the previous steps, where h_o is obtained by adding K_h(z − h_i) to h_i, and similarly c_o is obtained by adding K_c(z − h_i) to c_i. The BLUE estimator is used to determine the hidden component of the a posteriori estimate, allowing us to construct the a posteriori estimate of the entire state by combining the hidden and observable components.

6.2.2 General case

Suppose that the observable component of x is given by Hx, where H is a full row-rank matrix; that is, the rows of H are linearly independent. Let C be a basis for the orthogonal complement of H, meaning that each row of C is orthogonal to every row of H. If Σ is the covariance matrix of x, it follows from the linearity of the expectation operator that the covariance between Cx and Hx is CΣHᵀ. We now generalize the steps listed in Section 6.2 to compute the a posteriori estimate from the a priori estimate.

i. The a priori estimate of the observable portion of the state is Hx_{t+1|t}, and the corresponding a posteriori estimate is obtained using Equation 26:

Hx_{t+1|t+1} = Hx_{t+1|t} + HΣ_{t+1|t}Hᵀ(HΣ_{t+1|t}Hᵀ + R_{t+1})⁻¹(z_{t+1} − Hx_{t+1|t})

We simplify this result by introducing the Kalman gain K_{t+1} = Σ_{t+1|t}Hᵀ(HΣ_{t+1|t}Hᵀ + R_{t+1})⁻¹ and rewriting the previous equation as

Hx_{t+1|t+1} = Hx_{t+1|t} + HK_{t+1}(z_{t+1} − Hx_{t+1|t})    (32)

ii. The a posteriori estimate of the hidden portion of the state is obtained from the corresponding a priori estimate and Equation 29. The result has the same form as Equation 32:

Cx_{t+1|t+1} = Cx_{t+1|t} + (CΣ_{t+1|t}Hᵀ)(HΣ_{t+1|t}Hᵀ)⁻¹HK_{t+1}(z_{t+1} − Hx_{t+1|t}) = Cx_{t+1|t} + CK_{t+1}(z_{t+1} − Hx_{t+1|t})    (33)

iii. Combining the a posteriori estimates of the hidden and observable components yields the following generalized result:

[H; C] x_{t+1|t+1} = [H; C] x_{t+1|t} + [H; C] K_{t+1}(z_{t+1} − Hx_{t+1|t})

The stacked matrix [H; C], whose rows are the rows of H followed by the rows of C, is invertible because its rows are linearly independent, so the equation becomes

x_{t+1|t+1} = x_{t+1|t} + K_{t+1}(z_{t+1} − Hx_{t+1|t}) = (I − K_{t+1}H)x_{t+1|t} + K_{t+1}z_{t+1}    (34)

Lemma 2.2 allows us to compute the covariance of the entire a posteriori estimate x_{t+1|t+1}. The estimates x_{t+1|t} and z_{t+1} are uncorrelated, so

Σ_{t+1|t+1} = (I − K_{t+1}H)Σ_{t+1|t}(I − K_{t+1}H)ᵀ + K_{t+1}R_{t+1}K_{t+1}ᵀ

Substituting the value of K_{t+1} given in step i then simplifies this to the covariance of the entire a posteriori estimate:

Σ_{t+1|t+1} = (I − K_{t+1}H)Σ_{t+1|t}    (35)

These results are illustrated in Figure 6(d). If the entire state is observable, that is, all components can be measured, the results in this section reduce to those given in Figure 6(c).

6.3 An example of 2D state estimation with partial observability

The simplest application of the Kalman filtering process is the state estimation of a vehicle. We illustrate the concepts discussed in Section 6 with the position and velocity of a vehicle. The vehicle starts at position x = 0 m with initial speed v = 100 m/s and decelerates along a straight line at a constant 5 m/s² until it stops. In this example we assume there are no known control inputs, so the B_t u_t term in Equation 31 vanishes. The underlying linear dynamical system is modeled by the equations d(t) = d_0 + v_0 t + (1/2)at² and v(t) = v_0 + at, which describe motion with constant acceleration along a straight line, where d(t) is the vehicle's distance from the origin at time t and v(t) is the vehicle's speed at time t. For the purposes of the experiment, we discretize time into intervals of 0.25 seconds. With Δt = 0.25 s and a = −5 m/s², the equations for the vehicle's distance and speed become

d(t+1) = d(t) + 0.25 v(t) − (1/2)(0.25)²(5)
v(t+1) = v(t) − (0.25)(5)

Using these equations, we construct a two-dimensional state prediction model with a zero-mean, normally distributed noise term w_t and initial conditions d(0) = 0 m and v(0) = 100 m/s:

[d_{t+1}; v_{t+1}] = [1, 0.25; 0, 1][d_t; v_t] + [−(1/2)(0.25)²(5); −(0.25)(5)] + w_t    (36)

In this example we use MATLAB to plot the different position and velocity trajectories determined by the model itself, with and without a noise term, and we plot the observable component at each time step. Finally, we compare these estimated trajectories to the optimal trajectory determined by the filter, in order to understand the effect of the filter in a real-life application. In our example, the observable component of the state is the speed, and the hidden portion of the state is the position of the vehicle. The black lines in Figures 8 and 9 represent the evolution of velocity and position according to the model without a noise term; these lines are called the model trajectories.

The red lines in the figures show the trajectories obtained from the state evolution model with a normally distributed noise term w_t : N(0, Q_t) added to it, where Q_t is the covariance of the noise; these lines are called the ground truth trajectories. We arbitrarily choose an intuitively acceptable covariance matrix Q_t, kept the same at all time steps.

Furthermore, we arbitrarily set the initial covariance matrix Σ_{0|0} = Q_t. The green points in the velocity graph represent the noisy observations of the system's state at each discrete time step, denoted z; the measurement noise terms are normally distributed with zero mean and constant variance r² = 8. Finally, the blue lines show the a posteriori estimates of the velocity and position at each time t. Because the noise terms are random variables, many different trajectories could be plotted; we illustrate one in Figure 8.

Figure 8: Estimates of the car's state over time
Figure 9: Behavior of the standard deviation of the car's velocity and position

6.4 Analysis of results obtained from Example 6.3

The length of each error bar in Figure 9 is twice the standard deviation of the a posteriori estimate. By maintaining a list of the variances of the position and velocity at each time step, we observe that the standard deviation of the velocity converges to a value of roughly 1.6. This is not counterintuitive, since we repeatedly fuse an uncertain observation with a refined predicted estimate that continuously smooths out the noise produced by the predictor. On the other hand, the standard deviation of the a posteriori estimates of the position diverges. This can be explained by the fact that the BLUE estimator used to determine the hidden portion of the a posteriori estimate gives uncertain estimates, so the estimate of the vehicle's position is always inaccurate and sometimes even imprecise. Additionally, there is no second estimator of the position of the vehicle, as there was for the velocity, so we rely on the estimates produced by the algorithm alone. The accumulation of these imprecise results leads to a diverging standard deviation, meaning that the position estimated by the algorithm becomes increasingly unreliable as time progresses. This is shown in the position graphs of Figures 8 and 9, where the error bars grow in size and the trajectory estimated by the filter (blue line) begins to separate from the ideal trajectory illustrated by the black line.
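For completeness, the Python sketch below reproduces the structure of the experiment in Section 6.3: constant deceleration, a velocity-only measurement, and repeated predict/update steps. The process-noise covariance Q below is an assumed placeholder (the exact matrix used for Figures 8 and 9 is not reproduced here); only r² = 8 and the motion parameters come from the text.

import numpy as np

rng = np.random.default_rng(0)
dt, a = 0.25, -5.0
F = np.array([[1.0, dt], [0.0, 1.0]])            # state transition for [d, v]
g = np.array([0.5 * a * dt**2, a * dt])          # constant term from the deceleration (Eq. 36)
H = np.array([[0.0, 1.0]])                       # only the velocity is measured
Q = np.eye(2)                                    # assumed process-noise covariance
R = np.array([[8.0]])                            # measurement variance r^2 = 8

x_true = np.array([0.0, 100.0])                  # ground truth: d = 0 m, v = 100 m/s
x_est, Sigma = x_true.copy(), Q.copy()           # initial estimate, Sigma_{0|0} = Q

for _ in range(80):                              # 80 steps of 0.25 s = 20 s
    # Simulate the ground truth and a noisy velocity measurement.
    x_true = F @ x_true + g + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)

    # Predict (a priori estimate), Equation 36 without the noise term.
    x_est = F @ x_est + g
    Sigma = F @ Sigma @ F.T + Q

    # Update (a posteriori estimate), Equations 34-35.
    S = H @ Sigma @ H.T + R
    K = Sigma @ H.T @ np.linalg.inv(S)
    x_est = x_est + K @ (z - H @ x_est)
    Sigma = (np.eye(2) - K @ H) @ Sigma

print(np.sqrt(np.diag(Sigma)))   # position std. dev. grows; velocity std. dev. levels off

Running this sketch shows the same qualitative behavior as Figure 9: the velocity standard deviation settles to a constant while the position standard deviation keeps growing.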

6.5 Discussion

Equation 34 shows that the a posteriori estimate is a linear combination of the a priori estimate x_{t+1|t} and the measured estimate of the observable portion of the state, z_{t+1}. Equation 34 was not derived as the optimal unbiased linear estimator for combining the two estimates; an MSE-minimizing approach can be used instead: if the a posteriori estimate is written in the form K_1 x_{t+1|t} + K_2 z_{t+1}, we choose the values K_1 and K_2 that produce the optimal unbiased estimator. These results are outlined in extensions of the Kalman filter and are not covered in this paper. The assumption that the noise terms w_t and v_t are Gaussian restricts certain applications, so other filtering techniques replace the standard Kalman filtering process when the noise can no longer be assumed to be normally distributed. The extensions of the Kalman filter mentioned in this paper consider non-linear systems, where the results are much more complex. Also, if the noise variables are not Gaussian, it can be shown that there exist non-linear estimators with a lower MSE than the linear estimator introduced in this paper; if the noise variables are normally distributed, the linear estimator is a minimum variance unbiased estimator (MVUE). This property, however, is not needed to develop a basic understanding of the Kalman filtering process.

7 CONCLUSIONS

Kalman filtering is a widely used state estimation process in computer systems and has many applications in engineering. Robot motion and vehicle navigation are the most common standard applications of the Kalman filter, and they will appear frequently as autonomous vehicles are deployed in society. Such tasks require highly accurate state estimation, and an understanding of the basic concepts behind the Kalman filtering process allows the systems to be filtered to be extended to the non-linear case. Understanding this simplified approach to the algorithm is a crucial step toward applying the filter to problems in computer science and engineering.


More information

Robotics 2 Target Tracking. Kai Arras, Cyrill Stachniss, Maren Bennewitz, Wolfram Burgard

Robotics 2 Target Tracking. Kai Arras, Cyrill Stachniss, Maren Bennewitz, Wolfram Burgard Robotics 2 Target Tracking Kai Arras, Cyrill Stachniss, Maren Bennewitz, Wolfram Burgard Slides by Kai Arras, Gian Diego Tipaldi, v.1.1, Jan 2012 Chapter Contents Target Tracking Overview Applications

More information

MS&E 226: Small Data. Lecture 6: Bias and variance (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 6: Bias and variance (v2) Ramesh Johari MS&E 226: Small Data Lecture 6: Bias and variance (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 47 Our plan today We saw in last lecture that model scoring methods seem to be trading o two di erent

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

Xβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X =

Xβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X = The Gauss-Markov Linear Model y Xβ + ɛ y is an n random vector of responses X is an n p matrix of constants with columns corresponding to explanatory variables X is sometimes referred to as the design

More information

From Bayes to Extended Kalman Filter

From Bayes to Extended Kalman Filter From Bayes to Extended Kalman Filter Michal Reinštein Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception http://cmp.felk.cvut.cz/

More information

The Scaled Unscented Transformation

The Scaled Unscented Transformation The Scaled Unscented Transformation Simon J. Julier, IDAK Industries, 91 Missouri Blvd., #179 Jefferson City, MO 6519 E-mail:sjulier@idak.com Abstract This paper describes a generalisation of the unscented

More information

Kalman Filter and Parameter Identification. Florian Herzog

Kalman Filter and Parameter Identification. Florian Herzog Kalman Filter and Parameter Identification Florian Herzog 2013 Continuous-time Kalman Filter In this chapter, we shall use stochastic processes with independent increments w 1 (.) and w 2 (.) at the input

More information

Review of Probability Theory

Review of Probability Theory Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

More information

Econ 2120: Section 2

Econ 2120: Section 2 Econ 2120: Section 2 Part I - Linear Predictor Loose Ends Ashesh Rambachan Fall 2018 Outline Big Picture Matrix Version of the Linear Predictor and Least Squares Fit Linear Predictor Least Squares Omitted

More information

Covariance and Correlation

Covariance and Correlation Covariance and Correlation ST 370 The probability distribution of a random variable gives complete information about its behavior, but its mean and variance are useful summaries. Similarly, the joint probability

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

The Multivariate Gaussian Distribution

The Multivariate Gaussian Distribution The Multivariate Gaussian Distribution Chuong B. Do October, 8 A vector-valued random variable X = T X X n is said to have a multivariate normal or Gaussian) distribution with mean µ R n and covariance

More information

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53 State-space Model Eduardo Rossi University of Pavia November 2014 Rossi State-space Model Fin. Econometrics - 2014 1 / 53 Outline 1 Motivation 2 Introduction 3 The Kalman filter 4 Forecast errors 5 State

More information

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations.

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations. POLI 7 - Mathematical and Statistical Foundations Prof S Saiegh Fall Lecture Notes - Class 4 October 4, Linear Algebra The analysis of many models in the social sciences reduces to the study of systems

More information

Multi-Robotic Systems

Multi-Robotic Systems CHAPTER 9 Multi-Robotic Systems The topic of multi-robotic systems is quite popular now. It is believed that such systems can have the following benefits: Improved performance ( winning by numbers ) Distributed

More information

4 Bias-Variance for Ridge Regression (24 points)

4 Bias-Variance for Ridge Regression (24 points) Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 12 Dynamical Models CS/CNS/EE 155 Andreas Krause Homework 3 out tonight Start early!! Announcements Project milestones due today Please email to TAs 2 Parameter learning

More information

Simple Linear Regression Estimation and Properties

Simple Linear Regression Estimation and Properties Simple Linear Regression Estimation and Properties Outline Review of the Reading Estimate parameters using OLS Other features of OLS Numerical Properties of OLS Assumptions of OLS Goodness of Fit Checking

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

DS-GA 1002 Lecture notes 10 November 23, Linear models

DS-GA 1002 Lecture notes 10 November 23, Linear models DS-GA 2 Lecture notes November 23, 2 Linear functions Linear models A linear model encodes the assumption that two quantities are linearly related. Mathematically, this is characterized using linear functions.

More information

Lecture 4: Least Squares (LS) Estimation

Lecture 4: Least Squares (LS) Estimation ME 233, UC Berkeley, Spring 2014 Xu Chen Lecture 4: Least Squares (LS) Estimation Background and general solution Solution in the Gaussian case Properties Example Big picture general least squares estimation:

More information

Gaussian, Markov and stationary processes

Gaussian, Markov and stationary processes Gaussian, Markov and stationary processes Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ November

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

9 Multi-Model State Estimation

9 Multi-Model State Estimation Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 9 Multi-Model State

More information

Miscellaneous. Regarding reading materials. Again, ask questions (if you have) and ask them earlier

Miscellaneous. Regarding reading materials. Again, ask questions (if you have) and ask them earlier Miscellaneous Regarding reading materials Reading materials will be provided as needed If no assigned reading, it means I think the material from class is sufficient Should be enough for you to do your

More information

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes Shannon meets Wiener II: On MMSE estimation in successive decoding schemes G. David Forney, Jr. MIT Cambridge, MA 0239 USA forneyd@comcast.net Abstract We continue to discuss why MMSE estimation arises

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes Statistical Techniques in Robotics (6-83, F) Lecture# (Monday November ) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan Applications of Gaussian Processes (a) Inverse Kinematics

More information

1 Kalman Filter Introduction

1 Kalman Filter Introduction 1 Kalman Filter Introduction You should first read Chapter 1 of Stochastic models, estimation, and control: Volume 1 by Peter S. Maybec (available here). 1.1 Explanation of Equations (1-3) and (1-4) Equation

More information

9.2 Support Vector Machines 159

9.2 Support Vector Machines 159 9.2 Support Vector Machines 159 9.2.3 Kernel Methods We have all the tools together now to make an exciting step. Let us summarize our findings. We are interested in regularized estimation problems of

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix

More information

Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes

Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan 1, M. Koval and P. Parashar 1 Applications of Gaussian

More information

An Introduction to Parameter Estimation

An Introduction to Parameter Estimation Introduction Introduction to Econometrics An Introduction to Parameter Estimation This document combines several important econometric foundations and corresponds to other documents such as the Introduction

More information

Joint Distributions. (a) Scalar multiplication: k = c d. (b) Product of two matrices: c d. (c) The transpose of a matrix:

Joint Distributions. (a) Scalar multiplication: k = c d. (b) Product of two matrices: c d. (c) The transpose of a matrix: Joint Distributions Joint Distributions A bivariate normal distribution generalizes the concept of normal distribution to bivariate random variables It requires a matrix formulation of quadratic forms,

More information

Problem Solving. Correlation and Covariance. Yi Lu. Problem Solving. Yi Lu ECE 313 2/51

Problem Solving. Correlation and Covariance. Yi Lu. Problem Solving. Yi Lu ECE 313 2/51 Yi Lu Correlation and Covariance Yi Lu ECE 313 2/51 Definition Let X and Y be random variables with finite second moments. the correlation: E[XY ] Yi Lu ECE 313 3/51 Definition Let X and Y be random variables

More information

Implicit Functions, Curves and Surfaces

Implicit Functions, Curves and Surfaces Chapter 11 Implicit Functions, Curves and Surfaces 11.1 Implicit Function Theorem Motivation. In many problems, objects or quantities of interest can only be described indirectly or implicitly. It is then

More information

Lessons in Estimation Theory for Signal Processing, Communications, and Control

Lessons in Estimation Theory for Signal Processing, Communications, and Control Lessons in Estimation Theory for Signal Processing, Communications, and Control Jerry M. Mendel Department of Electrical Engineering University of Southern California Los Angeles, California PRENTICE HALL

More information

A Matrix Theoretic Derivation of the Kalman Filter

A Matrix Theoretic Derivation of the Kalman Filter A Matrix Theoretic Derivation of the Kalman Filter 4 September 2008 Abstract This paper presents a matrix-theoretic derivation of the Kalman filter that is accessible to students with a strong grounding

More information

ESTIMATION THEORY. Chapter Estimation of Random Variables

ESTIMATION THEORY. Chapter Estimation of Random Variables Chapter ESTIMATION THEORY. Estimation of Random Variables Suppose X,Y,Y 2,...,Y n are random variables defined on the same probability space (Ω, S,P). We consider Y,...,Y n to be the observed random variables

More information

Appendix A : Introduction to Probability and stochastic processes

Appendix A : Introduction to Probability and stochastic processes A-1 Mathematical methods in communication July 5th, 2009 Appendix A : Introduction to Probability and stochastic processes Lecturer: Haim Permuter Scribe: Shai Shapira and Uri Livnat The probability of

More information

OPTIMAL ESTIMATION of DYNAMIC SYSTEMS

OPTIMAL ESTIMATION of DYNAMIC SYSTEMS CHAPMAN & HALL/CRC APPLIED MATHEMATICS -. AND NONLINEAR SCIENCE SERIES OPTIMAL ESTIMATION of DYNAMIC SYSTEMS John L Crassidis and John L. Junkins CHAPMAN & HALL/CRC A CRC Press Company Boca Raton London

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II

More information

Topics in Probability and Statistics

Topics in Probability and Statistics Topics in Probability and tatistics A Fundamental Construction uppose {, P } is a sample space (with probability P), and suppose X : R is a random variable. The distribution of X is the probability P X

More information

UAV Navigation: Airborne Inertial SLAM

UAV Navigation: Airborne Inertial SLAM Introduction UAV Navigation: Airborne Inertial SLAM Jonghyuk Kim Faculty of Engineering and Information Technology Australian National University, Australia Salah Sukkarieh ARC Centre of Excellence in

More information

Chapter I: Fundamental Information Theory

Chapter I: Fundamental Information Theory ECE-S622/T62 Notes Chapter I: Fundamental Information Theory Ruifeng Zhang Dept. of Electrical & Computer Eng. Drexel University. Information Source Information is the outcome of some physical processes.

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

X t = a t + r t, (7.1)

X t = a t + r t, (7.1) Chapter 7 State Space Models 71 Introduction State Space models, developed over the past 10 20 years, are alternative models for time series They include both the ARIMA models of Chapters 3 6 and the Classical

More information

EKF, UKF. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

EKF, UKF. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics EKF, UKF Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Kalman Filter Kalman Filter = special case of a Bayes filter with dynamics model and sensory

More information

EKF, UKF. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

EKF, UKF. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics EKF, UKF Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Kalman Filter Kalman Filter = special case of a Bayes filter with dynamics model and sensory

More information

Vectors and Matrices Statistics with Vectors and Matrices

Vectors and Matrices Statistics with Vectors and Matrices Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,

More information

FIR Filters for Stationary State Space Signal Models

FIR Filters for Stationary State Space Signal Models Proceedings of the 17th World Congress The International Federation of Automatic Control FIR Filters for Stationary State Space Signal Models Jung Hun Park Wook Hyun Kwon School of Electrical Engineering

More information

1 Introduction ISSN

1 Introduction ISSN Techset Composition Ltd, Salisbury Doc: {IEE}CTA/Articles/Pagination/CTA58454.3d www.ietdl.org Published in IET Control Theory and Applications Received on 15th January 2009 Revised on 18th May 2009 ISSN

More information

A Crash Course on Kalman Filtering

A Crash Course on Kalman Filtering A Crash Course on Kalman Filtering Dan Simon Cleveland State University Fall 2014 1 / 64 Outline Linear Systems Probability State Means and Covariances Least Squares Estimation The Kalman Filter Unknown

More information

Lecture Note 12: Kalman Filter

Lecture Note 12: Kalman Filter ECE 645: Estimation Theory Spring 2015 Instructor: Prof. Stanley H. Chan Lecture Note 12: Kalman Filter LaTeX prepared by Stylianos Chatzidakis) May 4, 2015 This lecture note is based on ECE 645Spring

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

STAT 111 Recitation 7

STAT 111 Recitation 7 STAT 111 Recitation 7 Xin Lu Tan xtan@wharton.upenn.edu October 25, 2013 1 / 13 Miscellaneous Please turn in homework 6. Please pick up homework 7 and the graded homework 5. Please check your grade and

More information

A new unscented Kalman filter with higher order moment-matching

A new unscented Kalman filter with higher order moment-matching A new unscented Kalman filter with higher order moment-matching KSENIA PONOMAREVA, PARESH DATE AND ZIDONG WANG Department of Mathematical Sciences, Brunel University, Uxbridge, UB8 3PH, UK. Abstract This

More information

Stochastic Analogues to Deterministic Optimizers

Stochastic Analogues to Deterministic Optimizers Stochastic Analogues to Deterministic Optimizers ISMP 2018 Bordeaux, France Vivak Patel Presented by: Mihai Anitescu July 6, 2018 1 Apology I apologize for not being here to give this talk myself. I injured

More information

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Financial Econometrics / 49

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Financial Econometrics / 49 State-space Model Eduardo Rossi University of Pavia November 2013 Rossi State-space Model Financial Econometrics - 2013 1 / 49 Outline 1 Introduction 2 The Kalman filter 3 Forecast errors 4 State smoothing

More information

Chapter 2: simple regression model

Chapter 2: simple regression model Chapter 2: simple regression model Goal: understand how to estimate and more importantly interpret the simple regression Reading: chapter 2 of the textbook Advice: this chapter is foundation of econometrics.

More information

Link lecture - Lagrange Multipliers

Link lecture - Lagrange Multipliers Link lecture - Lagrange Multipliers Lagrange multipliers provide a method for finding a stationary point of a function, say f(x, y) when the variables are subject to constraints, say of the form g(x, y)

More information

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain

More information

ECON The Simple Regression Model

ECON The Simple Regression Model ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In

More information

Gaussian Processes. 1 What problems can be solved by Gaussian Processes?

Gaussian Processes. 1 What problems can be solved by Gaussian Processes? Statistical Techniques in Robotics (16-831, F1) Lecture#19 (Wednesday November 16) Gaussian Processes Lecturer: Drew Bagnell Scribe:Yamuna Krishnamurthy 1 1 What problems can be solved by Gaussian Processes?

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

Linear Model Under General Variance

Linear Model Under General Variance Linear Model Under General Variance We have a sample of T random variables y 1, y 2,, y T, satisfying the linear model Y = X β + e, where Y = (y 1,, y T )' is a (T 1) vector of random variables, X = (T

More information

The Kalman filter is arguably one of the most notable algorithms

The Kalman filter is arguably one of the most notable algorithms LECTURE E NOTES «Kalman Filtering with Newton s Method JEFFREY HUMPHERYS and JEREMY WEST The Kalman filter is arguably one of the most notable algorithms of the 0th century [1]. In this article, we derive

More information

Statistics 910, #15 1. Kalman Filter

Statistics 910, #15 1. Kalman Filter Statistics 910, #15 1 Overview 1. Summary of Kalman filter 2. Derivations 3. ARMA likelihoods 4. Recursions for the variance Kalman Filter Summary of Kalman filter Simplifications To make the derivations

More information

BACKGROUND NOTES FYS 4550/FYS EXPERIMENTAL HIGH ENERGY PHYSICS AUTUMN 2016 PROBABILITY A. STRANDLIE NTNU AT GJØVIK AND UNIVERSITY OF OSLO

BACKGROUND NOTES FYS 4550/FYS EXPERIMENTAL HIGH ENERGY PHYSICS AUTUMN 2016 PROBABILITY A. STRANDLIE NTNU AT GJØVIK AND UNIVERSITY OF OSLO ACKGROUND NOTES FYS 4550/FYS9550 - EXERIMENTAL HIGH ENERGY HYSICS AUTUMN 2016 ROAILITY A. STRANDLIE NTNU AT GJØVIK AND UNIVERSITY OF OSLO efore embarking on the concept of probability, we will first define

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 26. Estimation: Regression and Least Squares

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 26. Estimation: Regression and Least Squares CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 26 Estimation: Regression and Least Squares This note explains how to use observations to estimate unobserved random variables.

More information

Estimation techniques

Estimation techniques Estimation techniques March 2, 2006 Contents 1 Problem Statement 2 2 Bayesian Estimation Techniques 2 2.1 Minimum Mean Squared Error (MMSE) estimation........................ 2 2.1.1 General formulation......................................

More information