Sensor Deployment Recommendation for 3D Fine-Grained Air Quality Monitoring using Semi-Supervised Learning


Yuzhe Yang, Zijie Zheng, Kaigui Bian, Lingyang Song, and Zhu Han
School of Electrical Engineering and Computer Science, Peking University, Beijing, China
Electrical and Computer Engineering Department, University of Houston, Houston, TX, USA

Abstract—Driven by the increasingly serious air pollution problem, the monitoring of the fine-grained air quality index (AQI) in urban areas has drawn considerable attention. In this paper, we design a novel algorithm to recommend the placement of sensors for energy-efficient AQI monitoring in urban three-dimensional (3D) space. Specifically, we first propose an entropy-based semi-supervised learning (ESSL) model to estimate the AQI distribution at unobserved locations, using sparse historical spatial-temporal data and other features, including 3D coordinates, wind speed, and weather conditions. Based on ESSL, we then design an entropy minimization ranking (EMR) algorithm to recommend the best sensor locations for AQI monitoring. Through emulation on a fine-grained AQI dataset, the results demonstrate that our scheme provides energy-efficient solutions, using the fewest sensors while achieving higher accuracy than existing approaches.

I. INTRODUCTION

Air pollution has been proven to have significantly negative effects on human health and sustainable development, and has attracted considerable attention around the world [1]. Government agencies have defined the air quality index (AQI) to evaluate the pollution degree. AQI is calculated based on the concentration of a number of air pollutants, e.g., particulate matter (PM) such as PM2.5 and PM10 particles. A higher AQI indicates that air pollution is more severe and people are more likely to experience harmful health effects [2]. Thus, AQI monitoring is a critical issue: the more accurate the AQI distribution we can obtain in a region, the more effective the methods we can find to deal with the air pollution.

Air quality monitoring is usually done by setting up a few monitoring stations on dedicated sites in a city [2]. However, these fixed stations can only provide coarse-grained monitoring, i.e., two measurements are separated by a few hundred meters in the three-dimensional (3D) space. Existing studies have shown that AQI exhibits intrinsic changes over a scale of meters, and it is preferable to perform AQI monitoring in the 3D space surrounding an office building or throughout a university campus, rather than city-wide [3]. A fine-grained AQI distribution over meter-sliced areas would be desirable for people, particularly those living in urban areas [4].

For monitoring fine-grained AQI in a 3D area, placing a number of low-cost sensors with laser-based AQI detectors is a desirable method. However, since we want to utilize a limited number of sensors for accurate AQI monitoring and prediction, recommending suitable sensor locations in a given 3D space is in demand. In [5]–[7], the authors have investigated in depth how to select monitoring stations over the city-wide range. However, they focus only on 2D coarse-grained scenarios; the 3D fine-grained scenario has not yet been fully addressed. Moreover, considering the communication overhead and data transmission of the sensors, battery life also acts as a key factor for monitoring [8]. In order to reduce the total energy consumption of such sensor networks, the number of sensors should be as small as possible [9].
Thus, the goal is to use as few sensors as possible while maintaining high accuracy in AQI estimation, which we call an energy-efficient scheme.

In this paper, we design a novel scheme for energy-efficient sensor deployment recommendation in urban 3D space, e.g., around an office building. We first propose an entropy-based semi-supervised learning (ESSL) model to estimate the AQI distribution at unobserved locations, based on sparse historical spatial-temporal data. The proposed ESSL utilizes key features of the fine-grained AQI distribution and is robust to very sparse historical air quality data. Further, we treat the entropy of the AQI distribution at unobserved locations as the uncertainty of our model; the objective is to select locations that minimize the model's entropy. We then propose an entropy minimization ranking (EMR) algorithm that recommends such a set of locations for sensor deployment, to obtain the best estimation accuracy. The main contributions are summarized as follows.

- The proposed ESSL model provides higher AQI estimation accuracy on a fine-grained AQI monitoring dataset [10], and performs better than other learning methods;
- The EMR algorithm recommends the most suitable sensor locations, which approximate the optimal deployment;
- The proposed scheme realizes energy-efficient deployment by using far fewer sensors while maintaining higher estimation accuracy than existing approaches.

The rest of this paper is organized as follows. The preliminaries about the dataset and feature selection are introduced in Section II. In Section III, we present our system model and formulate the problem.

Section IV introduces the ESSL model for AQI estimation. In Section V, we propose the algorithms for sensor deployment recommendation. Experimental results are provided in Section VI, and conclusions are drawn in Section VII.

II. DATASET DESCRIPTION

In this section, we introduce the dataset that we use to design and test our scheme throughout this paper. The dataset includes more than 100 days of data collected in a typical 3D scenario, i.e., the courtyard of an office building inside Peking University [10]. In the dataset, each .txt file contains one complete measurement over a day. In each .txt file, every sample has four parameters: the 3D coordinates (x, y, z) and an AQI value. Each value represents the measured AQI, while its coordinates in the matrix reflect the measuring position: every row represents a fixed position in the x-y plane, while every column represents the height at an interval of 5 m in the z direction. The courtyard is of size 40 m × 40 m × 50 m, which can be divided into 640 cubes of 5 m × 5 m × 5 m.

Based on the dataset [10] and previous experimental results of AQI monitoring in fine-grained scenarios [8], wind, location, and weather conditions are highly related to the fine-grained AQI distribution. Thus, we consider these highly correlated parameters as key features as well.

III. SYSTEM MODEL

We consider AQI monitoring in a common scenario, i.e., the 3D space surrounding a campus building within a hundred-meter scale. The main objective is to recommend s suitable locations to deploy sensors for AQI monitoring. Based on the graph model, the AQI values at unmeasured locations can then be estimated from the data at the recommended locations.

A. Multi-Layer 3D Spatial-Temporal Correlation Graph

As shown in Fig. 1, the target 3D space is divided into a set of cubes, each represented by a node in the graph. Motivated by the correlation of AQI values in both the spatial and temporal perspectives, these nodes are connected in both the spatial and temporal dimensions to form a multi-layer 3D graph G = (V, E). The spatial dimension is represented by the 3D coordinates in the target area, while the temporal dimension is described by a series of time points {T_1, T_2, ..., T_d} with equal intervals (e.g., several hours in the dataset [10]), and each layer represents the 3D spatial graph at one specific time point T_k. The nodes with monitoring data are called labeled nodes, while nodes without monitoring data are called unlabeled nodes. Each labeled node l has the true AQI value, while the AQI of each unlabeled node u is estimated through a probability function p_u. For convenience, the node set can be written as V = {V_L ∪ V_U}, where V_L denotes the set of labeled nodes and V_U the unlabeled ones.

Fig. 1. An example of the proposed multi-layer 3D spatial-temporal correlation graph model (labeled and unlabeled cube nodes, with edges between labeled and unlabeled nodes, between spatial neighbors, and between temporal neighbors).
The edges in E are constructed as follows:

1) Connected with labeled nodes. Each unlabeled node is connected with all labeled nodes in V_L at the same time point T_k. Since labeled nodes are sparse, these connections do not increase the complexity of the whole model, while they increase the accuracy and the speed of convergence.

2) Connected with spatial neighbors. Each unlabeled node is also connected with neighboring nodes within a given spatial radius r, since the AQI value of one node is highly correlated with the AQI of its spatial neighbors.

3) Connected with temporal neighbors. Each unlabeled node is connected to the nodes at the same location but at neighboring time points, e.g., node u_0 at (x_k, y_k, z_k) at time point T_k is connected with u_1 and u_2, both at (x_k, y_k, z_k) but at time points T_{k-1} and T_{k+1}, respectively. This captures the temporal correlation of the AQI value at the same location (see the sketch below).
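To make the construction concrete, the following is a minimal Python sketch (our own illustration, not code from the paper) that enumerates the cube nodes of the multi-layer graph and adds the three kinds of edges. The grid size matches the 40 m × 40 m × 50 m courtyard at 5 m resolution; the number of time layers and the neighbor radius R are assumed values.

```python
# Hypothetical sketch (not the authors' code): build the multi-layer
# spatial-temporal graph over 5 m cubes, assuming an 8 x 8 x 10 grid
# (40 m x 40 m x 50 m) and d time layers.
import itertools

GRID = (8, 8, 10)          # cubes along x, y, z (5 m resolution)
D_LAYERS = 4               # number of time points T_1..T_d (assumed)
R = 1                      # spatial neighbor radius, in cubes (assumed)

def build_graph(labeled_coords):
    """labeled_coords: set of (x, y, z) cube indices that have measurements."""
    nodes = [(t, x, y, z)
             for t in range(D_LAYERS)
             for x, y, z in itertools.product(*map(range, GRID))]
    labeled = {n for n in nodes if n[1:] in labeled_coords}
    edges = set()
    for t, x, y, z in nodes:
        u = (t, x, y, z)
        if u in labeled:
            continue
        # 1) connect each unlabeled node to all labeled nodes in the same layer
        edges.update((u, l) for l in labeled if l[0] == t)
        # 2) connect to spatial neighbors within radius R (same layer)
        for dx, dy, dz in itertools.product(range(-R, R + 1), repeat=3):
            v = (t, x + dx, y + dy, z + dz)
            if v != u and all(0 <= c < g for c, g in zip(v[1:], GRID)):
                edges.add((u, v))
        # 3) connect to the same cube in the previous / next time layer
        for dt in (-1, 1):
            if 0 <= t + dt < D_LAYERS:
                edges.add((u, (t + dt, x, y, z)))
    return nodes, labeled, edges
```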

For every edge {(v_1, v_2) ∈ E}, there is a corresponding weight. The weight of an edge represents how strongly the features, e.g., wind speed, of the two nodes v_1 and v_2 are correlated. The following correlation function is defined to describe this correlation:

Definition 1. (Correlation Function) Given a set of features e = {e^{(1)}, e^{(2)}, ..., e^{(M)}}, the correlation function of each feature between nodes v_1 and v_2 is defined to have the form of a Taylor series expansion, expressed as

Q_{e^{(m)}}(v_1, v_2) = \varepsilon_m + \alpha_m + \beta_m \big(e^{(m)}(v_1) - e^{(m)}(v_2)\big) + \gamma_m \big(e^{(m)}(v_1) - e^{(m)}(v_2)\big)^2 + o\big((e^{(m)}(v_1) - e^{(m)}(v_2))^2\big),  m = 1, 2, ..., M.  (1)

In (1), \varepsilon_m is a random variable assumed to follow \varepsilon_m \sim \mathcal{N}(0, \sigma^2). For the fine-grained scenario, the correlation function is adopted as the second-order approximation of the Taylor expansion of the feature distance \delta(e^{(m)}) = e^{(m)}(v_1) - e^{(m)}(v_2). Here \alpha_m, \beta_m and \gamma_m are parameters that need to be estimated. Let N be the number of samples, let \tau^{(m)} = [\alpha_m\ \beta_m\ \gamma_m] denote the parameter vector, and let e^{(m)}_j = [1\ \delta(e^{(m)}_j)\ (\delta(e^{(m)}_j))^2] denote the feature-distance vector of the j-th sample. The log-likelihood function l can then be computed as

l^{(m)}(\tau^{(m)}) = \ln \prod_{j=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big(-\frac{(Q_{e^{(m)},j} - e^{(m)}_j (\tau^{(m)})^T)^2}{2\sigma^2}\Big) = -\frac{1}{2\sigma^2} \sum_{j=1}^{N} \big(Q_{e^{(m)},j} - e^{(m)}_j (\tau^{(m)})^T\big)^2 - N \ln \sqrt{2\pi}\,\sigma.  (2)

The parameters \tau^{(m)} of the correlation function can be estimated through least-square estimation, given by

\arg\max_{\tau^{(m)}} l^{(m)} = \arg\min_{\tau^{(m)}} \sum_{j=1}^{N} \big(Q_{e^{(m)},j} - e^{(m)}_j (\tau^{(m)})^T\big)^2 = \arg\min_{\tau^{(m)}} \big\| Q_{e^{(m)}} - e^{(m)} (\tau^{(m)})^T \big\|^2.  (3)

Based on the correlation function in (1), we define the weight matrix W = {w_{ij}}, where the weight on edge {(v_1, v_2) ∈ E} is expressed as

w_{v_1, v_2} = \exp\Big(-\sum_{m=1}^{M} \theta_m\, Q_{e^{(m)}}(v_1, v_2)\Big),  (4)

where \theta_m is the weight of feature e^{(m)}, and needs to be further learned to determine the AQI distribution of the unlabeled nodes.

B. Problem Formulation for AQI Estimation

The main objective for the model's convergence is to minimize the model's uncertainty when estimating unlabeled nodes. Since the knowledge we hold about the labeled nodes is sparse, it is infeasible to minimize the error probability, because the true values of most nodes are unknown. Instead, we first use the weighted average AQI of neighboring nodes to express the probability function p_u at unlabeled nodes [11]. The objective then becomes to minimize the entropy of the whole model, i.e., H(p_u) = -\sum_u p_u \log p_u, to achieve an accurate estimation [11]. This idea comes from the fact that an unlabeled node should possess an AQI value similar to those of the adjacent labeled nodes connected to it. Therefore, based on the edge weight function in (4), we define the loss function of the correlation graph to enable propagation between highly correlated nodes with higher edge weights, expressed as

L(p) = \sum_{(v_1, v_2) \in E} w_{v_1, v_2} \| p_{v_1} - p_{v_2} \|,  (5)

where p_{v_1} and p_{v_2} are the AQI distributions at nodes v_1 and v_2.

Definition 2. (AQI Distribution Distance) The degree of similarity of the AQI distributions of two nodes is defined by their symmetrical Kullback-Leibler (KL) divergence [12], which is written as

\| p_{v_1} - p_{v_2} \| = D_{KL}(p_{v_1} \| p_{v_2}) + D_{KL}(p_{v_2} \| p_{v_1}).  (6)

We assume the AQI value of each node takes discrete quantized values on the positive integers. The AQI distribution distance can then be elaborated as

\| p_{v_1} - p_{v_2} \| = \sum_{x \in \mathcal{X}} \Big[ p_{v_1}(x) \log \frac{p_{v_1}(x)}{p_{v_2}(x)} + p_{v_2}(x) \log \frac{p_{v_2}(x)}{p_{v_1}(x)} \Big],  \mathcal{X} = \{0, 1, 2, ..., X\},  (7)

where X denotes the maximum possible AQI value. Thus, our goal is to determine the AQI distribution that minimizes L(p), which is given by the objective function

p^* = \arg\min L(p).  (8)

By minimizing the loss function L(p), nodes with higher edge weights possess more similar AQI values, while nodes with lower edge weights remain more independent. Thus, the objective function enables AQI value propagation between highly correlated nodes, thereby improving the estimation accuracy.
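As an illustration of (1)–(7), the following Python sketch (ours, with assumed helper names) fits the correlation-function parameters by least squares, computes an edge weight from per-feature correlation values, and evaluates the symmetrical KL distance between two discrete AQI distributions.

```python
# Hypothetical illustration of eqs. (1)-(7); function names are ours.
import numpy as np

def fit_correlation(deltas, q_obs):
    """Eqs. (1)-(3): least-squares fit of tau = [alpha, beta, gamma] from
    observed feature distances (deltas) and correlation samples (q_obs)."""
    A = np.column_stack([np.ones_like(deltas), deltas, deltas ** 2])
    tau, *_ = np.linalg.lstsq(A, q_obs, rcond=None)
    return tau

def edge_weight(q_values, theta):
    """Eq. (4): q_values[m] = Q_{e^(m)}(v1, v2), theta[m] = feature weight."""
    return np.exp(-np.dot(theta, q_values))

def symmetric_kl(p1, p2, eps=1e-12):
    """Eqs. (6)-(7): symmetrical KL divergence between two discrete PMFs."""
    p1 = np.clip(p1, eps, None)
    p2 = np.clip(p2, eps, None)
    return np.sum(p1 * np.log(p1 / p2) + p2 * np.log(p2 / p1))

def graph_loss(edges, weights, dists):
    """Eq. (5): weighted sum of distribution distances over all edges;
    dists[v] is the PMF over AQI bins {0, ..., X} at node v."""
    return sum(w * symmetric_kl(dists[v1], dists[v2])
               for (v1, v2), w in zip(edges, weights))
```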
IV. AQI ESTIMATION USING SEMI-SUPERVISED LEARNING

In this section, we investigate the entropy-based semi-supervised learning (ESSL) solution for problem (8). We first derive the AQI estimation on unlabeled nodes, and then introduce the entropy-based learning method.

A. AQI Estimation on Unlabeled Nodes

According to the literature, the minimizing function of (8) is harmonic [11], which means it satisfies \Delta p_u = 0 on the unlabeled nodes U, while p_l = P(v_l) on the labeled nodes L. Here \Delta is the combinatorial Laplacian, defined by \Delta = D - W, where D = diag(d_i) denotes the diagonal matrix with d_i = \sum_j u(w_{i,j}), u(\cdot) is the unit step function, and W = {w_{i,j}} is the weight matrix defined in (4).
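The role of \Delta is easiest to see in matrix form. The following minimal sketch (our own illustration, not the authors' implementation) builds the degree matrix and the combinatorial Laplacian from a weight matrix and splits them into labeled/unlabeled blocks, anticipating the closed-form solution derived next.

```python
# Hypothetical sketch of the combinatorial Laplacian used by ESSL.
import numpy as np

def laplacian_blocks(W, labeled_idx, unlabeled_idx):
    """W: symmetric edge-weight matrix from (4); returns the blocks of
    Delta = D - W, with d_i counting the neighbors of node i (unit step)."""
    D = np.diag((W > 0).sum(axis=1))     # d_i = sum_j u(w_ij)
    Delta = D - W
    ll = Delta[np.ix_(labeled_idx, labeled_idx)]
    lu = Delta[np.ix_(labeled_idx, unlabeled_idx)]
    ul = Delta[np.ix_(unlabeled_idx, labeled_idx)]
    uu = Delta[np.ix_(unlabeled_idx, unlabeled_idx)]
    return ll, lu, ul, uu
```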

The harmonic property of p_u provides the solution for the distribution of an unlabeled node, expressed as the weighted average of its neighboring nodes [11]:

p_u(x) = \frac{1}{d_u} \sum_{(u,l) \in E} w_{u,l}\, p_l(x),  x \in \mathcal{X}.  (9)

The solution again reflects the influence of the highly correlated nodes connected by higher-weight edges. To normalize the solution, we redefine it as

p_u(x) = \frac{\frac{1}{d_u} \sum_{(u,l) \in E} w_{u,l}\, p_l(x)}{\sum_{x' \in \mathcal{X}} p_u(x')} = \frac{\sum_{(u,l) \in E} w_{u,l}\, p_l(x)}{\sum_{x' \in \mathcal{X}} \sum_{(u,l) \in E} w_{u,l}\, p_l(x')}.  (10)

Proposition 1. p_u in (10) is a probability mass function (PMF) on x.

Proof: To be a PMF on x, p_u must satisfy the following three properties [12]: the domain of p_u is the set of all possible states of x; \forall x \in \mathcal{X}, 0 \le p_u(x) \le 1; and \sum_{x \in \mathcal{X}} p_u(x) = 1. Considering the expression in (10), the conclusion is immediate: p_u is a PMF on x.

The solution of the harmonic function can be written explicitly in matrix form. We split W, \Delta and P into labeled and unlabeled parts as

W = \begin{bmatrix} W_{LL} & W_{LU} \\ W_{UL} & W_{UU} \end{bmatrix}, \quad \Delta = \begin{bmatrix} \Delta_{LL} & \Delta_{LU} \\ \Delta_{UL} & \Delta_{UU} \end{bmatrix}, \quad P = \begin{bmatrix} P_L \\ P_U \end{bmatrix},

where P_L and P_U are the AQI distributions at the labeled and unlabeled nodes, respectively. Applying \Delta p_u = 0 on the unlabeled part, we have

P_U = -\Delta_{UU}^{-1} \Delta_{UL} P_L.  (11)

The result of (11) determines the AQI probability distribution of every unlabeled node. To provide an exact estimated label, since Proposition 1 shows that p_u is a PMF on x, we quantize it using the expectation of p_u:

\hat{P}_u = \mathbb{E}_{x \sim p_u}[x] = \sum_{x=1}^{X} x\, p_u(x),  x \in \mathcal{X}.  (12)

B. Entropy-based Learning

Now that the expression of p_u is determined, the next step is to learn the weight functions given by (4). We learn \theta_m from both labeled and unlabeled data, thereby forming a semi-supervised mechanism. As \theta_m influences the correlation between nodes, and thus the distribution at unlabeled locations, learning a suitable set of {\theta_m} is of vital importance.

The common criterion for learning \theta_m is to maximize the likelihood of the labeled data. However, this method is infeasible in our case since the labeled nodes are sparse, and thus it would not improve the estimation accuracy on the unlabeled ones. We instead adopt the model's entropy as the criterion, which represents the confidence of the estimation. This is intuitive since high entropy corresponds to unpredictable values, resulting in poor estimation capability and low accuracy. Thus, the objective of ESSL is to minimize the overall entropy of the unlabeled nodes. The average entropy H(p_u) of the unlabeled node set U is defined as

H(p_u) = \frac{1}{|U|} \sum_{j=1}^{|U|} H_j(p_j) = -\frac{1}{|U|} \sum_{j=1}^{|U|} \sum_{x=1}^{X} p_j(x) \log p_j(x),  x \in \mathcal{X},  (13)

where |U| denotes the number of unlabeled nodes. For simplicity, denoting \sum_x p_j(x) \log p_j(x) as p_j \log p_j, the gradient can be derived as

\frac{\partial H}{\partial \theta_m} = -\frac{1}{|U|} \sum_{j=1}^{|U|} \Big( \log p_j + \frac{1}{\ln 2} \Big) \frac{\partial p_j}{\partial \theta_m}.  (14)

For every unlabeled p_j, we evaluate \partial p_j / \partial \theta_m based on (10) and (4). Applying the chain rule of differentiation, the final gradient follows from

\frac{\partial w_{u,l}}{\partial \theta_m} = -w_{u,l}\, Q_{e^{(m)}}(u, l).  (15)

Thus, by iteratively learning and updating \theta_m using (15), the edge weights w_{u,l} are refined and generate the final AQI distribution P_U when the iteration converges.
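To summarize the estimation step in code, the following is a minimal sketch (ours, not the authors' implementation) of the closed-form propagation (11) and the expectation quantization (12); the Laplacian blocks are assumed to have been formed as sketched earlier.

```python
# Hypothetical sketch of the ESSL propagation step, eqs. (11)-(12).
import numpy as np

def propagate_labels(Delta_uu, Delta_ul, P_L):
    """Eq. (11): P_U = -Delta_uu^{-1} Delta_ul P_L.
    P_L has shape (n_labeled, X+1): one PMF over AQI bins per labeled node."""
    P_U = -np.linalg.solve(Delta_uu, Delta_ul @ P_L)
    # Clip tiny negatives from numerical error and renormalize each row
    # into a valid PMF, mirroring eq. (10).
    P_U = np.clip(P_U, 0.0, None)
    P_U /= P_U.sum(axis=1, keepdims=True)
    return P_U

def quantize(P_U):
    """Eq. (12): estimated AQI = expectation of the PMF at each node."""
    x = np.arange(P_U.shape[1])
    return P_U @ x
```

In practice this step would sit inside the \theta-learning loop of Section IV-B, with W, \Delta and the gradient (14)–(15) recomputed until the entropy converges.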
V. SENSOR DEPLOYMENT RECOMMENDATION ALGORITHM

In this section, based on the proposed ESSL, we design an entropy minimization ranking (EMR) algorithm for sensor deployment recommendation. More specifically, we first introduce a reverse entropy minimization (REM) algorithm that suggests how to choose a single location. Based on REM, we then introduce EMR for multiple sensors.

A. Reverse Entropy Minimization

We first introduce the REM algorithm based on [5]. Given a set of unlabeled nodes {U_R}, one may intuitively choose the target location with the highest entropy. However, once the chosen node is labeled, the correlation model changes correspondingly, which can leave other unlabeled nodes with even higher entropy. Hence, REM instead chooses the lowest-entropy node in U_R at each step and ranks it reversely, from rank |U_R| down to rank 1, thus generating a reverse ranking of U_R iteratively. Finally, we choose the top-ranked node as the target location, since its correlation with the other nodes is the lowest; it is therefore the most uncertain node and the most difficult one to estimate. Let |U_R| denote the size of U_R; since we traverse the set U_R once, the complexity of REM is O(|U_R|). The process of REM is described in Algorithm 1.
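For concreteness, here is a minimal Python sketch of the reverse ranking (our own illustration, preceding the formal pseudocode in Algorithm 1); entropy_of(u, labeled) is an assumed helper that evaluates the ESSL entropy of node u given the current labeled set.

```python
# Hypothetical REM sketch; entropy_of() is an assumed helper that returns
# the ESSL entropy H(p_u) of node u given the currently labeled set.
def rem(labeled, unlabeled, entropy_of):
    """Return the unlabeled node ranked 1st by reverse entropy minimization."""
    labeled = set(labeled)
    remaining = set(unlabeled)
    rank = {}
    for r in range(len(remaining), 0, -1):       # ranks |U_R| down to 1
        u = min(remaining, key=lambda n: entropy_of(n, labeled))
        rank[u] = r                              # lowest entropy -> worst rank
        labeled.add(u)                           # treat it as labeled from now on
        remaining.remove(u)
    return min(rank, key=rank.get)               # the node that ranks 1st
```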

Algorithm 1: Reverse Entropy Minimization (REM)
Input: labeled set {L_R}, unlabeled set {U_R}
Output: target location l
for i = 0 to (|U_R| - 1) do
    (a) Choose the node u with the lowest entropy in {U_R}, and rank u as (|U_R| - i);
    (b) Add u to {L_R} and remove u from {U_R};
end
Return the node l that ranks 1st.

B. Entropy Minimization Ranking

Based on REM, we now propose EMR to recommend s prospective sensor locations. Since the historical data is sparse, we assume the size of the labeled set {L} is less than s. This assumption ensures the universal applicability of EMR in the fine-grained AQI monitoring scenario.

Intuitively, the labeled nodes that already have historical data could be chosen as target locations. However, this cannot guarantee the minimization of the model entropy. That is to say, we need to balance the choice between the labeled set {L} and the unlabeled set {U}. Inspired by this idea, we first take all nodes in L as potential locations, and use REM to choose and label s - |L| (|L| denotes the size of L) locations from U, generating the initial recommendation set M. Then we iteratively find the one node in L whose removal increases the entropy the least, and replace it with the node in U that best reduces the entropy. When the replacement can no longer reduce the total entropy, the iteration stops. This balanced selection of target locations yields a minimized entropy; thus, EMR provides a sub-optimal solution to the location selection problem.

Note that at every specific time point T_k, we perform the above steps to obtain a ranked list M^(k). Since the entropy distribution can vary over time, we average the results of {M^(1), M^(2), ..., M^(d)} to select the final recommended locations. This step is vital for the performance of EMR, because it identifies nodes with consistently low rankings. The physical meaning of these nodes is that they are more independent of the other nodes, and thus need to be picked out as target locations.

For the complexity of EMR, the outer loop contains d time points, and the inner loop needs to compute (s + |L|) times in the worst case. Since s > |L|, the total complexity is O(d(s + |L|)) = O(d s), which is low, i.e., linear in s. The process of EMR is described in Algorithm 2.

Algorithm 2: Entropy Minimization Ranking (EMR)
Input: labeled set {L}, unlabeled set {U}, time series T = {T_1, T_2, ..., T_d}, number of recommended locations s
Output: target location set M
forall T_k in T do
    (a) Use REM(L^(k), U^(k)) to select (s - |L|) unlabeled nodes, and combine them with L^(k) to initialize the recommendation set M^(k);
    (b) Remove the one node l_0 in L^(k) whose removal increases the entropy the least, and remove l_0 from M^(k);
    (c) Use REM(L^(k), U^(k)) to select one node u_0 in U^(k), and add u_0 into M^(k);
    (d) Compare the new entropy H' with the entropy H of the previous iteration. If H' < H, go to (b); otherwise the iteration stops with result M^(k) for T_k;
end
Select M from {M^(k)}, k = 1, 2, ..., d.

Fig. 2. Comparison of AQI estimation accuracy between different methods, when M labeled locations are unselected.

VI. SIMULATION RESULTS

In this section, we evaluate the performance of the proposed ESSL model and the EMR algorithm. As described in Section II, we use the fine-grained AQI dataset [10] for verification. The dataset contains more than 100 days of data, each day with 45 labeled locations; hence, there are 4500 labeled samples in total. Although the 3D space can be divided into 640 cubes, we do not know the ground truth AQI values of the unlabeled data.
Thus, we divide the labeled samples into a training set of 3500 samples and a testing set of 1000 samples, performing cross-validation by randomly choosing the training data and repeating 1000 times to avoid stochastic errors.

A. Estimation Accuracy

We first evaluate the estimation accuracy of the proposed ESSL against other commonly used methods. We use the root-mean-square error (RMSE) as the metric for estimation accuracy (the lower the better). The proposed scheme is compared to the following baselines [12]; a sketch of the evaluation loop is given after the list.

- Deep Neural Networks (DNN) with 50 hidden layers.
- k-Nearest Neighbors (kNN).
- Classification and Regression Tree (CART).
- Support Vector Regression (SVR).
- Linear Interpolation (LI).
- Multi-variable Linear Regression (MLR).
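The comparison follows a standard repeated random-split protocol. The sketch below (ours, with assumed array shapes) shows how such an RMSE comparison can be organized; any of the listed baselines can be plugged in as `model`, provided it exposes fit/predict methods.

```python
# Hypothetical evaluation loop: repeated random 3500/1000 splits, RMSE metric.
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def evaluate(model, X, y, n_train=3500, repeats=1000, seed=0):
    """X: (4500, n_features) feature matrix, y: (4500,) measured AQI values.
    `model` is any regressor with fit(X, y) and predict(X) methods."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(repeats):
        idx = rng.permutation(len(y))
        tr, te = idx[:n_train], idx[n_train:]
        model.fit(X[tr], y[tr])
        scores.append(rmse(y[te], model.predict(X[te])))
    return float(np.mean(scores))
```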

Fig. 3. Accuracy comparison of different recommendation algorithms, for different numbers of recommended locations s.

Fig. 2 evaluates the estimation accuracy of the different methods when several different numbers of labeled locations are removed. The testing set remains 10 randomly chosen locations. From the figure, we can see that our proposed ESSL significantly outperforms the other common methods. As labeled locations are removed, ESSL consistently performs better than the other solutions, which validates the robustness of our method.

B. Energy-Efficiency of Recommended Locations

We verify the ability of EMR to recommend the most suitable locations by testing whether it brings the largest improvement in estimation accuracy. We use 10 labeled locations as the testing set, and another set of labeled locations is used as the candidate pool from which we recommend the s best locations. We randomly choose |L| = 10 locations from the candidate pool to serve as the known labeled data, while assuming the others are unknown. Since s > |L|, we vary s accordingly. For each s, we traverse all possible recommendation sets M from the candidate pool to find an optimal M as one baseline. Also, EMR is compared to the following baselines [5]:

- Maximum Coverage: it repeatedly selects the location that has the longest distance from the last chosen one, to achieve maximum coverage of the 3D space.
- Spatial-Temporal Greedy Search: it greedily chooses the locations that are most dissimilar in both the spatial and temporal dimensions, based on (1).
- Entropy Greedy Search: it greedily selects the locations with the highest entropy as candidates.

In Fig. 3, we report the total RMSE that each algorithm achieves. The proposed EMR generally brings a much larger improvement than the other methods. Moreover, EMR also performs very close to the optimal choice of sensor locations, which demonstrates EMR's effectiveness. In Fig. 4, we show the minimum number of sensors needed to reach a given estimation accuracy (i.e., RMSE). The result indicates that EMR achieves high accuracy with much fewer sensors, and thus demonstrates that EMR provides energy-efficient solutions.

Fig. 4. The minimum number of sensors needed by different methods to reach a given RMSE value as the estimation accuracy.

VII. CONCLUSION

In this paper, we investigated how to recommend the most suitable sensor locations in 3D space for fine-grained AQI monitoring. We first built a multi-layer 3D spatial-temporal correlation model and proposed an entropy-based learning model, ESSL, for estimation. We then proposed a recommendation algorithm, EMR, to recommend the most suitable sensor locations. Experimental results showed that ESSL achieves higher AQI estimation accuracy than existing methods, and that EMR provides a near-optimal, energy-efficient sensor deployment recommendation.

REFERENCES

[1] Q. Di et al., "Air pollution and mortality in the Medicare population," New England J. of Medicine, vol. 376, no. 26, Jul. 2017.
[2] B. Zou, J. Wilson, F. Zhan, and Y. Zeng, "Air pollution exposure assessment methods utilized in epidemiological studies," J. of Environmental Monitoring, vol. 11, no. 3, Feb. 2009.
[3] T. Quang et al., "Vertical particle concentration profiles around urban office buildings," Atmospheric Chemistry and Physics, vol. 12, May 2012.
[4] C. Borrego et al., "How urban structure can affect city sustainability from an air quality perspective," Environmental Modelling & Software, vol. 21, no. 4, Apr. 2006.
[5] H. Hsieh, S. Lin, and Y. Zheng, "Inferring air quality for station location recommendation based on urban big data," ACM Int. Conf. on Knowledge Discovery and Data Mining (KDD'15), Sydney, Australia, Aug. 2015.
[6] T. Liu et al., "Finding optimal meteorological observation locations by multi-source urban big data analysis," IEEE Int. Conf. on Cloud Computing and Big Data (CCBD'16), Macau, China, Nov. 2016.
[7] Y. Zheng et al., "Forecasting fine-grained air quality based on big data," ACM Int. Conf. on Knowledge Discovery and Data Mining (KDD'15), Sydney, Australia, Aug. 2015.
[8] Y. Yang et al., "ARMS: a fine-grained 3D AQI realtime monitoring system by UAV," IEEE Global Commun. Conf. (GLOBECOM'17), Singapore, Dec. 2017.
[9] Y. Yang et al., "AQNet: fine-grained 3D spatio-temporal air quality monitoring by aerial-ground WSN," IEEE Int. Conf. on Comput. Commun. (INFOCOM'18), Honolulu, HI, Apr. 2018.
[10] Y. Yang, Z. Zheng, K. Bian, L. Song, and Z. Han, "Real-time profiling of fine-grained air quality index distribution using UAV sensing," IEEE Internet of Things Journal, Nov. 2017.
[11] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," Proc. of the 20th Int. Conf. on Machine Learning (ICML'03), Washington, DC, Aug. 2003.
[12] I. Goodfellow, Y. Bengio, and A. Courville, "Applied Math and Machine Learning," in Deep Learning. Cambridge, MA: MIT Press, 2016.
