Joint GPS and Vision Estimation Using an Adaptive Filter

Shubhendra Vikram Singh Chauhan and Grace Xingxin Gao, University of Illinois at Urbana-Champaign

Shubhendra Vikram Singh Chauhan received his B.Tech. and M.Tech. degrees in aerospace engineering from the Indian Institute of Technology, Bombay, India in 2016, receiving the Institute Silver Medal on graduation. He is currently pursuing his PhD degree at the University of Illinois at Urbana-Champaign. His research interests include robotics, controls and sensor fusion.

Grace Xingxin Gao received the B.S. degree in mechanical engineering and the M.S. degree in electrical engineering from Tsinghua University, Beijing, China in 2001 and 2003, respectively. She received the PhD degree in electrical engineering from Stanford University in 2008. From 2008 to 2012, she was a research associate at Stanford University. Since 2012, she has been with the University of Illinois at Urbana-Champaign, where she is presently an assistant professor in the Aerospace Engineering Department. Her research interests are systems, signals, control, and robotics.

Abstract

Navigation in urban environments with standalone GPS is a challenging task. In urban environments, GPS signals are frequently blocked or reflected by buildings, which degrades position accuracy. It is therefore desirable to augment GPS with another sensor using a sensor fusion technique. The performance of a sensor fusion technique depends heavily on its covariance matrices, and tuning these matrices is time consuming. In the case of GPS, the covariance matrices may change with time and may vary in size: the expected noise in GPS measurements in urban environments differs from that in open environments, and the number of visible satellites may increase or decrease with time, changing the size of the measurement covariance matrix. This time and size variation of the covariance matrices makes position estimation even harder.
In this paper, we propose an adaptive filter for GPS-vision fusion that adapts to time- and size-varying covariance matrices. In the proposed method, we assume that the noise is Gaussian over small time intervals, and then use the innovation sequence to derive an expression for the covariance matrices. The camera image is compared with Google Street View (GSV) to obtain position. We use pseudoranges from a GPS receiver and positions obtained from the camera as measurements for our Extended Kalman Filter (EKF). The covariance matrices are obtained from the innovation sequence and the Kalman gain. The proposed filter is tested in GPS-challenged urban environments on the University of Illinois at Urbana-Champaign campus, where it demonstrates improved performance over a filter with fixed covariance matrices.

Keywords: Adaptive filter, image localization, global positioning system (GPS)

I. INTRODUCTION

Navigation in urban environments has always been a challenging task, especially with standalone GPS. In urban environments, there is a high chance of GPS satellite signals being blocked or reflected by buildings, which degrades GPS position accuracy. In the literature, this problem has been approached in two ways. The first approach is to identify the multipath signal and then either remove it or down-weight it in the estimation process [1], [2]. The second approach is to use another sensor to augment GPS. A vision sensor is commonly used for augmentation, as urban environments are rich in features such as buildings and street lights, which can be exploited in the localization process. A sensor fusion technique is then used to fuse the measurements. The performance of a sensor fusion technique depends heavily on its covariance matrices, and tuning them is a laborious task. [3] uses ground truth to train covariance matrices; however, this method risks overfitting, i.e.,
once tuned for a particular scenario, the matrices may not work for a different scenario, since the noise may change with time. The second challenge, which is not addressed in the literature, is that the size of the measurement vector may change with time. [4] uses a covariance matching method to estimate covariance matrices adaptively, but does not address the change in size of the measurement covariance matrix.

A. Approach

The main contribution of this paper is the development of a filter that adapts to time- and size-varying noise. The assumption of stationary Gaussian noise is not valid for measurements from GPS or vision. The noise in GPS measurements in urban areas is expected to vary with time, given that the chance of reflection is high. Also, the number of visible satellites may increase or decrease during the course of an experiment. Similarly, the noise in vision depends on the number of features present in the environment, so we expect it to vary with time as well. We use the measurements themselves to estimate the covariance matrices, since any time and size variation in the noise affects the measurements. The overall architecture is shown in Figure 1. The input measurements of our EKF are pseudoranges, obtained from a GPS receiver, and position, obtained from vision. We create a set of grid points around the previous position estimate and pull images from the street view database. In the image matching module, we compare the raw camera image with GSV images to obtain the position; this is what we refer to as the position obtained from vision. In the covariance estimation module, we use
the innovation sequence and Kalman gain to estimate the covariance matrices. The estimated covariance matrices are fed to the EKF for the next time step.

Fig. 1: Overall architecture

We conducted an experiment in urban environments on the University of Illinois at Urbana-Champaign (UIUC) campus and implemented the proposed filter on the collected dataset. The proposed filter adapted to time- and size-varying noise, and we were able to improve position accuracy in urban environments.

The rest of the paper is organized as follows: Section II describes the algorithm to match images from the GSV database. Section III presents our method to estimate covariance matrices. Section IV details the EKF measurement and process models. The experimental scenario is given in Section V. Section VI shows the results, and the paper concludes in Section VII.

II. IMAGE MATCHING USING SIFT FEATURES

GSV contains a database of geo-referenced images. This map is discrete, i.e., if two pull requests differ in position by less than 10 m, GSV returns the same image. SIFT [5] features are invariant to scale, rotation and lighting, and each feature comes with a vector descriptor. Features are matched by finding the nearest neighbor in descriptor space, subject to an additional constraint. We use SIFT features for image matching. Figure 2 shows the steps involved.

Fig. 2: Image matching using SIFT features

First, SIFT features are computed for all images in the database as well as for the camera image. These features are then matched using Lowe's criterion, which considers two features to correspond if the ratio of the distances to the two nearest neighbors is below a certain threshold. The feature matching step finds the feature correspondences, but is susceptible to outliers. The transformation matrix relating the images is found in the outlier rejection step.
The matrix is found by randomly selecting feature correspondences and then estimating the matrix elements. This random selection is repeated, and a transformation matrix is estimated for each iteration. The matrix that yields the largest number of inliers is taken as the transformation matrix. After applying the transformation matrix to the images, the number of inliers is counted and stored for every database image. The image with the largest number of inliers is considered the match, and the position corresponding to that image is taken as the position obtained from vision.

III. ADAPTIVE COVARIANCE ESTIMATION

We use the innovation sequence to estimate the noise covariance matrices. The innovation sequence is the difference between the measurements and the expected measurements. Any time and size variation of the noise will affect the measurements, and this effect will be observed in the innovation sequence. Figure 3 conceptually shows the steps involved in covariance estimation.

Fig. 3: Estimation of covariance matrices

Measurements and states are input to the EKF. We use the innovation sequence and the measurement model to derive an expression for the measurement noise covariance. Similarly, we use the process model, the Kalman gain and the innovation sequence to derive an expression for the process noise covariance. These covariance matrices are then fed to the EKF at the next time step.

A. Filter Equations

Predict and update are the two steps involved in Kalman filtering.
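To make the ratio test of Section II concrete, the matching criterion can be sketched in a few lines of numpy. This is a minimal sketch, not the authors' implementation: the function name and the 0.8 threshold are illustrative, and in practice the descriptors would come from a SIFT library.

```python
import numpy as np

def lowe_ratio_match(desc_a, desc_b, ratio=0.8):
    """Match descriptors from image A to image B using Lowe's ratio test.

    A feature in A is matched to its nearest neighbor in B only if the
    nearest distance is below `ratio` times the second-nearest distance.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, d in enumerate(desc_a):
        # Euclidean distance from descriptor d to every descriptor in B.
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        if nearest < ratio * second:
            matches.append((i, int(order[0])))
    return matches
```

Ambiguous features, whose two nearest neighbors are nearly equidistant, are rejected by the test; the surviving correspondences then feed the outlier rejection step described above.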
The equations for the predict step are given below:

x_{k|k-1} = f(x_{k-1|k-1}) + ω_k    (1)

P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k    (2)

where the subscript k|k-1 denotes the value at time step k given k-1 measurements, k-1|k-1 denotes the value at time step k-1 given k-1 measurements, x is the state vector, f is a nonlinear function that maps the previous states to the current time step, ω_k is assumed to be zero-mean Gaussian, P is the error covariance matrix, F_k is the Jacobian of the nonlinear function f evaluated at x_{k-1|k-1}, and Q_k is the covariance matrix of the process noise ω_k.
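As a compact reference, the predict step of equations (1) and (2) can be sketched in numpy; the function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def ekf_predict(x_prev, P_prev, f, F, Q):
    """EKF predict step.

    Equation (1): propagate the state through the dynamics f.
    Equation (2): propagate the error covariance using the Jacobian F
    of f and the process noise covariance Q.
    """
    x_pred = f(x_prev)               # x_{k|k-1} = f(x_{k-1|k-1})
    P_pred = F @ P_prev @ F.T + Q    # P_{k|k-1} = F P F^T + Q
    return x_pred, P_pred
```

For the linear constant velocity model used later in Section IV, f(x) reduces to A x and F is simply A.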
Equation 1 uses the dynamics of the system to propagate the states forward in time, and equation 2 propagates the error covariance forward in time. The update step is performed with the following equations:

z_k = h(x_k) + ν_k    (3)

ỹ_k = z_k - h(x_{k|k-1})    (4)

S_k = R_k + H_k P_{k|k-1} H_k^T    (5)

K_k = P_{k|k-1} H_k^T S_k^{-1}    (6)

x_{k|k} = x_{k|k-1} + K_k ỹ_k    (7)

P_{k|k} = (I - K_k H_k) P_{k|k-1}    (8)

where the subscript k denotes the time step, h is the nonlinear function that maps states to measurements, ν_k is assumed to be zero-mean Gaussian, ỹ_k is the innovation, z_k is the measurement, H_k is the Jacobian of the nonlinear function h evaluated at x_{k|k-1}, S_k is the theoretical covariance of the innovation sequence, and K_k is the Kalman gain. The measurement model is given by equation 3. The innovation, given by equation 4, is the difference between the measurement and the expected measurement. Equation 5 gives the covariance of the innovation sequence and equation 6 the Kalman gain. The states are updated using equation 7 and the error covariance using equation 8. In the standard EKF, both process and measurement noise are assumed to be Gaussian.

B. Covariance Estimation

We relax the assumption that the noise is Gaussian for the whole duration to the noise being Gaussian over a small interval of time. Let the length of this interval be N. The actual covariance of the innovation sequence is calculated as

E[ỹ_k ỹ_k^T] = (1/(N-1)) Σ_{i=1}^{N} (ỹ_{k-i} - ȳ)(ỹ_{k-i} - ȳ)^T    (9)

where E[·] denotes the expectation with respect to time and ȳ is the mean of the N samples of the innovation sequence around time step k. Let this actual covariance be denoted by S_k^m, i.e., E[ỹ_k ỹ_k^T] = S_k^m. Substituting this expression for the covariance of the innovation sequence into equation 5 and rearranging gives

R̂_k = S_k^m - H_k P_{k|k-1} H_k^T    (10)

where R̂_k is the estimated measurement noise covariance. We also assume the noise elements to be independent.
This, along with the assumption that the noise is Gaussian over small time intervals, makes R̂_k a diagonal matrix. The diagonal elements are then given by

r̂_{ik} = S_k^m(i, i) - H_k(i, :) P_{k|k-1} H_k(i, :)^T    (11)

where r̂_{ik} is the i-th diagonal element of R̂_k and H_k(i, :) denotes the i-th row of the measurement matrix. Any change in the size of the measurement vector changes the size of S_k^m, and equation 11 is used to estimate the additional noise elements.

Using the process model, the process noise can be written as

ω̂_k = x_{k|k} - f(x_{k-1|k-1})    (12)

ω̂_k = x_{k|k} - x_{k|k-1}    (13)

Using equation 7, the above simplifies to

ω̂_k = K_k ỹ_k    (14)

Taking the expectation of both sides of equation 14 gives

Q̂_k = K_k S_k^m K_k^T    (15)

where Q̂_k is the estimated process noise covariance. We smooth the covariance estimates by weighting the previous estimates:

r̂_{ik} = α r̂_{ik} + (1 - α) r̂_{i,k-1}    (16)

Q̂_k = α Q̂_k + (1 - α) Q̂_{k-1}    (17)

where α is the averaging parameter. Equations 16 and 17 do not ensure positive definiteness; therefore, in the implementation we take the absolute value of their right-hand sides. Equations 16 and 17 are used to adaptively estimate the covariance matrices. The next section describes the process and measurement models used in our implementation.

IV. GPS-VISION FUSION

The state vector for our system is

x = [x ẋ y ẏ z ż cδt cδṫ]^T    (18)

where (x, y, z) denotes the position in the Earth-Centered Earth-Fixed (ECEF) coordinate system, (ẋ, ẏ, ż) denotes the velocity in the ECEF coordinate system, and (cδt, cδṫ) denote the clock bias and clock bias rate, respectively. We use a constant velocity model for our system.
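Under the stated assumptions, the covariance estimation of equations (9) through (17) can be sketched in numpy. The function names are illustrative; this is a sketch of the derivation above, not the authors' code.

```python
import numpy as np

def innovation_covariance(innovations):
    """Sample covariance S_k^m of the last N innovations, equation (9)."""
    Y = np.asarray(innovations)
    D = Y - Y.mean(axis=0)
    return D.T @ D / (len(Y) - 1)

def estimate_R_diag(S_m, H, P_pred):
    """Diagonal measurement-noise estimate, equation (11).

    Each diagonal element is S_k^m(i, i) - H(i, :) P_{k|k-1} H(i, :)^T;
    off-diagonal terms are dropped under the independence assumption.
    """
    r = np.array([S_m[i, i] - H[i] @ P_pred @ H[i] for i in range(H.shape[0])])
    return np.diag(r)

def estimate_Q(K, S_m):
    """Process-noise estimate Q_k = K_k S_k^m K_k^T, equation (15)."""
    return K @ S_m @ K.T

def smooth(current, previous, alpha):
    """Exponential smoothing of equations (16)-(17); the absolute value
    mirrors the paper's fix for positive definiteness."""
    return np.abs(alpha * current + (1 - alpha) * previous)
```

Because equation (11) works row by row, a change in the number of visible satellites only changes how many diagonal elements are computed, which is what allows the filter to handle size-varying measurements.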
The dynamics of the system are given by

x_{k|k-1} = A x_{k-1|k-1} + ω_k    (19)

where ω_k is the process noise, t is a small time step and A is the constant-velocity transition matrix

A = [1 t 0 0 0 0 0 0;
     0 1 0 0 0 0 0 0;
     0 0 1 t 0 0 0 0;
     0 0 0 1 0 0 0 0;
     0 0 0 0 1 t 0 0;
     0 0 0 0 0 1 0 0;
     0 0 0 0 0 0 1 t;
     0 0 0 0 0 0 0 1]

As this system is linear, A is the same as F_k. We use pseudoranges from GPS
and position obtained from vision as measurements. The measurement vector is given by

z = [ρ^{SV1} ... ρ^{SVi} ... ρ^{SVn} x_vision y_vision z_vision]^T    (20)

where ρ^{SVi} is the pseudorange obtained from satellite i, n is the number of visible satellites and (x_vision, y_vision, z_vision) is the position obtained from vision. The measurement model is given by

z = [ √((X^{SV1} - x)^2 + (Y^{SV1} - y)^2 + (Z^{SV1} - z)^2) + cδt;
      ...;
      √((X^{SVi} - x)^2 + (Y^{SVi} - y)^2 + (Z^{SVi} - z)^2) + cδt;
      ...;
      √((X^{SVn} - x)^2 + (Y^{SVn} - y)^2 + (Z^{SVn} - z)^2) + cδt;
      x; y; z ] + ν_k    (21)

where (X^{SVi}, Y^{SVi}, Z^{SVi}) denotes the position of satellite i in the ECEF frame and ν_k is the measurement noise. The appropriate Jacobian is used in the filter implementation. We used the specified process and measurement models for our system. The next section briefly describes the experimental scenario.

V. EXPERIMENTAL SCENARIO

We conducted the experiment in urban environments of the UIUC campus. The scenario is shown in Figure 4.

Fig. 4: Experimental scenario in an urban environment

We used a commercial GPS receiver to record GPS data and a hand-held mobile camera to record a video while conducting the experiment. The traveled path is shown with a red line in Figure 4. The initial path is surrounded by large structures, so it is expected that a few satellites will be blocked in the north-south direction. The rest of the path can be considered an open environment, as the surrounding structures there are relatively low compared to those along the initial path.

VI. RESULTS

We implemented our image matching algorithm and tested it on the collected dataset. The result is shown in Figure 5.

Fig. 5: Matched features for a single image in the database

In Figure 5, SIFT features are shown with green circles. The top image is obtained from GSV and the bottom image was collected while conducting the experiment. The subsequent images, from left to right, correspond to the steps shown in the block diagram. The rightmost images show the features after outlier removal. It can be visually verified that the features in the bottom-right image correspond to the same features in the top-right image, showing that reliable matching is achieved with our algorithm.

In equation 20, z denotes the measurements: a column vector with pseudoranges and positions obtained from vision. The length of this vector is referred to as the size of the measurements; in equation 20, this length is n + 3. Figure 6 shows that the length of z is not constant but varies with time. The variance of the vision noise is estimated using the proposed method and is shown as a function of time in Figure 7, demonstrating that the noise varies with time.

We implemented the filter with time- and size-varying noise. Figures 8 and 9 show the east and north position with time, respectively. In Figures 8 and 9, the vision noise changes its magnitude around 450 seconds, and the proposed filter successfully adapts to this time variation. In Figure 8, the filtered output is smooth compared to the position obtained from vision.

Using the pseudoranges, we obtained a position by finding the least squares estimate (LSE). These estimates are shown in Figure 10 with yellow circles. For comparison, we implemented an EKF with fixed covariance matrices; its result is shown in Figure 10 with a red line, while our proposed filter's output is shown in blue. The proposed filter improved the position accuracy in urban environments.
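To make the measurement model concrete, a numpy sketch of h(x) and its Jacobian H for equations (20) and (21) is given below. The state ordering follows equation (18) and the stacking order (pseudoranges first, then the three vision rows) follows equation (20); the function name and interfaces are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pseudorange_model(x, sat_pos):
    """Expected measurements h(x) and Jacobian H for equation (21).

    x: state [x, xdot, y, ydot, z, zdot, c*dt, c*dt_dot] in ECEF.
    sat_pos: (n, 3) array of satellite ECEF positions.
    Returns the n pseudoranges plus the 3 vision position rows,
    and the (n + 3, 8) Jacobian H.
    """
    p = x[[0, 2, 4]]                       # receiver position
    cdt = x[6]                             # clock bias (in meters)
    diff = p - sat_pos                     # (n, 3) receiver-to-satellite offsets
    ranges = np.linalg.norm(diff, axis=1)  # geometric ranges
    h = np.concatenate([ranges + cdt, p])  # pseudoranges, then vision rows

    n = sat_pos.shape[0]
    H = np.zeros((n + 3, 8))
    H[:n, [0, 2, 4]] = diff / ranges[:, None]  # unit line-of-sight vectors
    H[:n, 6] = 1.0                             # d(rho)/d(c*dt)
    H[n:, [0, 2, 4]] = np.eye(3)               # vision measures position directly
    return h, H
```

The row-wise structure makes the size variation explicit: when a satellite drops out, the corresponding pseudorange row of h and H simply disappears, while the three vision rows remain.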
Fig. 6: Length of measurement vector (z) with time

Fig. 7: Variance of vision noise with time

Fig. 8: Variation of east position with time

Fig. 9: Variation of north position with time

Fig. 10: LSE, filtered position with fixed and proposed covariance matrices

VII. CONCLUSION

This paper proposed a GPS-vision fusion method that accounts for time and size variation of the noise. We used measurements to estimate time- and size-varying covariance matrices, as the effect of the noise is observed in the measurements. We conducted an experiment in urban environments of the UIUC campus. The proposed filter adapts to time- and size-varying noise. We compared our method against a filter with fixed covariance matrices and observed improved accuracy in urban environments.

REFERENCES

[1] P. D. Groves, "Shadow matching: A new GNSS positioning technique for urban canyons," Journal of Navigation, vol. 64, no. 03, pp. 417-430, 2011.
[2] P. D. Groves, Z. Jiang, L. Wang, and M. K. Ziebart, "Intelligent urban positioning using multi-constellation GNSS with 3D mapping and NLOS signal detection," in Proceedings of the 25th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS+ 2012), Nashville, TN, USA, 2012, pp. 458-472.
[3] P. Abbeel, A. Coates, M. Montemerlo, A. Y. Ng, and S. Thrun, "Discriminative training of Kalman filters," in Proceedings of Robotics: Science and Systems, Cambridge, USA, June 2005.
[4] S. Akhlaghi, N. Zhou, and Z. Huang, "Adaptive adjustment of noise covariance in Kalman filter for dynamic state estimation," arXiv e-prints, Feb. 2017.
[5] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, Nov 2004. [Online]. Available: https://doi.org/10.1023/b:visi.0000029664.99615.94