Reducing the Computational Cost of Bayesian Indoor Positioning Systems
Konstantinos Kleisouris, Richard P. Martin
Computer Science Department, Rutgers University
WINLAB Research Review, May 15th, 2006
Motivation
- Bayesian networks (BNs) have recently been used for location estimation in wireless networks
  - Networks M1, M2, M3, A1 [Madigan 05, Elnahrawy]
- Can incorporate radio properties
  - Received signal strength (RSS)
  - Angle of arrival of the signal (AoA)
- Attractive approach: similar performance with a smaller training set compared to other solutions
  - Training set: radio properties measured at specific locations
Motivation (cont'd)
M1 network
- X, Y: latent variables (location)
- S_i: observable data (RSS)
- b_0i, b_1i, tau_i: stochastic variables
- Linear regression model for RSS: S_i = b_0i + b_1i log(1 + D_i), where D_i is the distance from (X, Y) to access point i
- We want to compute the probability density function (PDF) of X, Y given the training set
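The regression model on this slide can be sketched directly. The parameter values and access-point position below are hypothetical placeholders, not the fitted values from the paper:

```python
import math

def expected_rss(x, y, ap_x, ap_y, b0, b1):
    """Expected RSS at location (x, y) under the model
    S_i = b0_i + b1_i * log(1 + D_i), where D_i is the
    distance from (x, y) to access point i."""
    d = math.hypot(x - ap_x, y - ap_y)
    return b0 + b1 * math.log(1 + d)

# Hypothetical parameters: b0 is the RSS at zero distance and
# b1 < 0 so the signal strength decays with distance.
s = expected_rss(10.0, 20.0, 0.0, 0.0, -30.0, -10.0)
```

The latent location (X, Y) enters the model only through the distance D_i, which is what lets the posterior over location be inferred from observed RSS readings.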
Motivation (cont'd)
- Computational cost of using BNs with statistical packages like WinBugs is large
- Platform: Pentium 4 PC, 2.80 GHz, 1 GB RAM, Windows XP
- Training set size: 253
[Bar charts: WinBugs time in seconds to localize 1 point and 10 points for networks M1, M2, M3, A1]
- Can we be faster?
Talk Outline
- Motivation
- Our Approach & Contributions
- Experimental Results
- Analytic Analysis
- Conclusions & Future Work
Our Approach
- We have implemented our own solvers
- Our Bayesian networks have no closed-form solution
- We use Markov Chain Monte Carlo (MCMC) simulation
- MCMC explores the PDF of the variables in a BN using sampling
MCMC Methods
- Gibbs sampling
  - Draws a new value for a variable v from its full conditional f(v) = p(v | V \ v): the conditional PDF given that all other quantities are fixed at their current values
  - p(v | V \ v) is proportional to p(v | pa(v)) * prod over w in child(v) of p(w | v, pa(w))
  - E.g. conjugate sampling, slice sampling
- Metropolis algorithm
  - Draws candidate values for v from a proposal distribution
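A minimal sketch of the Metropolis update described above, with a proposal uniform over the whole domain. The target `log_f` and the domain bounds are illustrative stand-ins for a real full conditional:

```python
import math
import random

def metropolis_step(v, log_f, lo, hi):
    """One Metropolis update for a scalar v with a proposal uniform
    over the whole domain [lo, hi]. A symmetric proposal means the
    acceptance ratio reduces to f(candidate) / f(current)."""
    cand = random.uniform(lo, hi)
    if math.log(random.random()) < log_f(cand) - log_f(v):
        return cand          # accept the candidate
    return v                 # reject: keep the current value

# Illustrative target: unnormalized standard Gaussian on [-10, 10].
random.seed(0)
log_f = lambda v: -0.5 * v * v
v = 0.0
samples = []
for _ in range(5000):
    v = metropolis_step(v, log_f, -10.0, 10.0)
    samples.append(v)
```

Working in log space avoids underflow when the density values are tiny, which matters once many RSS observations multiply into the full conditional.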
Contributions
- Low latency in localization
  - 1 or 10 points: 0.5 sec
  - 51 points with no location info in the training set: 6 secs
  - Over 10 times faster than WinBugs
- Full conditionals are flat, so the most efficient algorithm is whole-domain sampling
- Analytic model: how flat a distribution should be in order to use whole-domain sampling
Slice Sampling
- Goal: compute the histogram of a variable v given that we can only evaluate f(v)
- Challenge: the shape of f(v) is unknown
Slice Sampling (cont'd)
- Defines an interval I given the current value v_0 of a variable
- The edges of the slice S are hard to estimate, so S is approximated by the interval I
- Schemes to find I: whole-domain sampling, stepping out, doubling out
- The new value of v is chosen from I; the better I approximates S, the fewer the rejections (i.e., values drawn that fall outside S)
[Figure: slice S under f(v) and the approximating interval I around v_0]
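One way to realize the whole-domain scheme is the following sketch; the shrinkage step for rejected points follows Neal's standard slice-sampling formulation, and `log_f` is an illustrative target, not one of the paper's full conditionals:

```python
import math
import random

def slice_sample_whole_domain(v, log_f, lo, hi):
    """One slice-sampling update for a scalar v, approximating the
    slice S by the whole domain I = [lo, hi]. Draws that fall outside
    the slice (log density below the level) are rejected, and the
    interval is shrunk toward the current value."""
    # Draw the slice level: log(u * f(v)) for u uniform on (0, 1).
    level = log_f(v) + math.log(random.random())
    left, right = lo, hi
    while True:
        cand = random.uniform(left, right)
        if log_f(cand) > level:
            return cand      # candidate lies inside the slice
        # Shrink the interval so it still contains v.
        if cand < v:
            left = cand
        else:
            right = cand

# Illustrative target: unnormalized standard Gaussian on [-10, 10].
random.seed(0)
log_f = lambda v: -0.5 * v * v
v, draws = 0.0, []
for _ in range(2000):
    v = slice_sample_whole_domain(v, log_f, -10.0, 10.0)
    draws.append(v)
```

When the target is flat relative to its domain, the slice covers most of I and few draws are rejected, which is the regime the contributions slide identifies for the localization networks.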
Tradeoffs of Estimating I
- Whole domain
  - Pros: easy to compute
  - Cons: potentially many rejections
- Step out
  - Pros: few rejections
  - Cons: many evaluations of f(v) to compute I
- Double out: an intermediate solution
Localization Models
[Diagrams of the four Bayesian networks: M1, M2, M3, A1]
MCMC Algorithms
- Met algorithms: Metropolis for X, Y, angle; conjugate sampling for the other variables
- Slice algorithms: slice sampling for X, Y, angle; conjugate sampling for the other variables

Algorithm / Description:
- met wd: Metropolis with a proposal distribution uniform over the whole domain of X, Y (angle)
- met sd=k (or sd=k, l): Metropolis with a Gaussian proposal whose standard deviation is k ft (l radians for angle)
- slice wd: slice sampling over the whole domain of X, Y (angle)
- slice so=k (or so=k, l): slice sampling by stepping out with w = k ft (w = l radians for angle)
- slice do=k (or do=k, l): slice sampling by doubling out with w = k ft (w = l radians for angle)
- slice2d wd: 2D slice sampling updating (X, Y) together over the whole domain of X, Y (1D for angle over its whole domain)
Comparing Algorithms
- Metric (relative accuracy): Euclidean distance of the mean of our results from the WinBugs reference
- Reference: WinBugs results with 10,000 burn-in iterations and 100,000 additional iterations
- Intuition: the results of our solver should converge to the statistics of a well-tested solver after a long run
Comparing Algorithms (cont'd)
[Plots of relative accuracy (ft) vs. time (secs) for met wd, met sd=20, met sd=43, slice wd, slice so=10, slice do=1, and slice2d wd on M1 (N=253, NA=1), M2 (N=253, NA=10), and M3 (N=253, NA=1); a fourth panel compares met sd=1 and slice so=1 on M1 (N=253, NA=1)]
Comparing Against WinBugs
- Whole-domain sampling is faster than WinBugs by a factor that ranges:
  - Localize 1 point: 9.8 (M1) to 17.9 (A1)
  - Localize 10 points: 9.1 (M1) to 16.1 (A1)
[Bar charts: slice wd vs. WinBugs time in seconds to localize 1 point and 10 points for networks M1, M2, M3, A1]
Analytic Analysis
- When is whole-domain sampling computationally more efficient than stepping out?
- We use the double exponential distribution with PDF H(x; lambda) = (lambda / 2) * e^(-lambda * |x|), lambda > 0
- lambda determines how peaky the distribution is
[Plot of the double exponential PDF for lambda = 0.5, 1, 2]
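The double exponential (Laplace) density used in the analysis can be written down directly; the lambda values below simply illustrate how the parameter controls peakiness:

```python
import math

def double_exp_pdf(x, lam):
    """Double exponential (Laplace) PDF:
    H(x; lam) = (lam / 2) * exp(-lam * |x|), lam > 0."""
    return 0.5 * lam * math.exp(-lam * abs(x))

# Larger lambda gives a taller, peakier density at the mode x = 0:
peak_flat  = double_exp_pdf(0.0, 0.5)   # flatter regime
peak_peaky = double_exp_pdf(0.0, 2.0)   # peaky regime
```

For a flat density (small lambda), the slice under the level covers most of the domain, so whole-domain sampling rejects few draws; for a peaky one (large lambda), the slice is narrow and stepping out pays off.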
Comparison of Flatness
- Analytic model: when lambda < 2, whole-domain sampling is faster
- The full conditionals fall into this regime
[Plots: the full conditional of the x coordinate in network M1 and of the angle in network A1, each compared against the double exponential with lambda = 2]
Conclusions & Future Work
- Conclusions
  - Low latency in localization
  - Whole-domain sampling has the best performance
    - Relative accuracy vs. time
    - Requires no tuning
  - Full conditionals are flat
- Future Work
  - Parallel implementation of solvers
Thank you!