Bayesian Reconstruction of Missing Observations

Interdisciplinary Information Sciences, Vol. 21, No. 1 (2015), 11–23
© Graduate School of Information Sciences, Tohoku University

Bayesian Reconstruction of Missing Observations

Shun KATAOKA 1, Muneki YASUDA 2,* and Kazuyuki TANAKA 1

1 Graduate School of Information Sciences, Tohoku University, Sendai, Japan
2 Graduate School of Science and Engineering, Yamagata University, Yonezawa, Yamagata, Japan

Received April 4, 2014; final version accepted September 19, 2014

We focus on an interpolation method referred to as Bayesian reconstruction in this paper. Whereas in standard interpolation methods missing data are interpolated deterministically, in Bayesian reconstruction missing data are interpolated probabilistically using a Bayesian treatment. In this paper, we address the framework of Bayesian reconstruction and its application to the traffic data reconstruction problem in the field of traffic engineering. In the latter part of this paper, we describe the evaluation of the statistical performance of our Bayesian traffic reconstruction model using a statistical mechanical approach and clarify its statistical behavior.

KEYWORDS: Bayesian reconstruction, missing data, traffic data reconstruction, Markov random field, mean-field analysis

1. Introduction

Methods for interpolating missing data are important in various scientific fields. A standard interpolation method, such as spline interpolation, is a deterministic interpolation technique. An alternative, probabilistic, interpolation technique has been developed in the last few years. In this probabilistic interpolation technique, which is called Bayesian reconstruction, a Bayesian treatment is used to interpolate and reconstruct missing regions. To the best of our knowledge, Bayesian reconstruction was first implemented in the digital image inpainting filter. The digital image inpainting filter is used in the process of reconstructing lost or deteriorated parts of images [1] (see Fig. 1). Previously, two of the authors applied Bayesian reconstruction to the digital image inpainting filter [3, 4]. Bayesian reconstruction is now becoming the standard technique in digital image inpainting [5]. It can be expected that the framework of Bayesian reconstruction will be utilized in various reconstruction problems, and therefore its use should not be limited to image processing.

Recently, the authors applied Bayesian reconstruction to the traffic data reconstruction problem [6]. Traffic data reconstruction is an important preprocessing step that precedes traffic prediction tasks such as travel time prediction, density prediction, and route planning. In order to provide accurate information to drivers, a broad-scale database of real-time vehicular traffic over an entire city is required. However, in practice, it is difficult to collect the traffic data for an entire city, because traffic sensors are not installed on all roads. The objective of traffic data reconstruction is therefore to reconstruct the states of unobserved roads, where traffic sensors are not installed, by using information from observed roads, where traffic sensors are installed.

In the first part of this paper, we introduce the details of Bayesian reconstruction and subsequently give an overview of the Bayesian traffic data reconstruction method proposed in Ref. 6, together with some new numerical results. In the latter part of this paper, we show a statistical mechanical analysis of our Bayesian traffic reconstruction and clarify its statistical performance.

Fig. 1. Example of the digital image inpainting filter. The left image is the original scratched image, and the center image is the image masked by the black regions. We reconstructed the masked regions using the digital image inpainting filter to restore the original damaged image. The right image is the reconstructed image obtained by using the digital image inpainting method proposed in Ref. 2.

* Corresponding author. muneki@yz.yamagata-u.ac.jp

The remainder of this paper is organized as follows. In Section 2, we introduce the framework of Bayesian reconstruction based on Markov random fields (MRFs). We explain a machine learning strategy for model selection based on maximum likelihood estimation (MLE) in Section 2.2. In Section 3, we present an overview of Bayesian traffic data reconstruction according to the method proposed in Ref. 6, and we show some new numerical results in Section 3.2. We describe our evaluation of the statistical performance of our Bayesian traffic reconstruction in terms of a statistical mechanical analysis in Section 4. Finally, we present the conclusions of this paper and outline future work in Section 5.

2. Scheme of Bayesian Reconstruction of Missing Observations

In the Bayesian framework, we suppose that observations (observed data) are probabilistically drawn from a specific probability distribution, referred to as the prior probability, because observations suffer from uncertainty originating, for example, in physical noise and in the incompleteness of some elements. Suppose that there exists an n-dimensional observation x = {x_i ∈ R | i ∈ V = {1, 2, ..., n}} generated from the prior probability P_prior(x), and that we cannot observe some of its elements for some reason. Since the observation is probabilistically generated, we can treat its elements as random variables. We define the set of labels of the missing elements by M ⊆ V and the complement of M by O := V \ M, so that O is the set of labels of the observed elements. Given an observation x, we denote the values of the observed elements by y to distinguish them from the unobserved elements and collectively express the observed elements by y = {y_i ∈ R | i ∈ O}. The values of the elements in the set O are fixed by the observation y. The Bayesian reconstruction considered in this paper consists of reconstructing the unobserved elements, x_M, of the observation by using the observed elements, x_O; in other words, the objective is to estimate the values of x_M by using y, where x_A denotes the set of x_i belonging to a set A ⊆ V, i.e., x_A := {x_i | i ∈ A}.

In order to reconstruct the missing elements from the Bayesian point of view, we first formulate the posterior probability of the missing elements, P(x_M | y), by the Bayes rule

P(x_M | y) = P(x_M, y) / P(y),    (2.1)

where the value of y is fixed by the observation. By using the Dirac delta function, we have

P(x_M, y) = [ Π_{i∈O} δ(y_i − x_i) ] P(x_M, x_O) = [ Π_{i∈O} δ(y_i − x_i) ] P_prior(x).    (2.2)

It should be noted that, if x are discrete variables, the Dirac delta is replaced by the Kronecker delta. From Eqs. (2.1) and (2.2), we have

P(x_M | y) ∝ P_likelihood(y | x) P_prior(x),    (2.3)

where

P_likelihood(y | x) := Π_{i∈O} δ(y_i − x_i).

The probability P_likelihood(y | x) is referred to as the likelihood in the Bayesian framework. In Bayesian reconstruction, we take as the reconstructed values of the unobserved elements the values x̂_M that maximize the posterior probability in Eq. (2.3), i.e.,

x̂_M = argmax_{x_M} P(x_M | y) = argmax_{x_M} P_likelihood(y | x) P_prior(x).    (2.4)

The above reconstruction scheme requires prior probabilities that describe the hidden probabilistic mechanisms of the observations. Unfortunately, in almost all situations we do not know the details of the prior probabilities. Therefore, in order to implement a Bayesian reconstruction system, we have to model the unknown prior probabilities.
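As a concrete illustration of Eq. (2.4), the following minimal Python sketch (ours, not part of the original paper) reconstructs the missing elements of a toy three-dimensional binary observation by direct enumeration; the prior used here is an assumed nearest-neighbor model on a chain that favors equal neighboring values.

    import itertools
    import numpy as np

    # Toy prior on x = (x_0, x_1, x_2), x_i in {0, 1}: neighboring elements on a
    # chain prefer to take the same value (unnormalized weight).
    def prior_weight(x, coupling=2.0):
        return np.exp(coupling * ((x[0] == x[1]) + (x[1] == x[2])))

    observed = {0: 1}      # labels in O and their observed values y
    missing = [1, 2]       # labels in M

    # Enumerate all candidate completions of x_M and keep the one maximizing the
    # posterior of Eq. (2.4); the Kronecker-delta likelihood simply pins x_O to y.
    best_xM, best_weight = None, -np.inf
    for xM in itertools.product([0, 1], repeat=len(missing)):
        x = [0, 0, 0]
        for i, v in observed.items():
            x[i] = v
        for i, v in zip(missing, xM):
            x[i] = v
        w = prior_weight(x)
        if w > best_weight:
            best_xM, best_weight = xM, w

    print("reconstructed x_M =", best_xM)   # (1, 1): the missing elements follow the observed one

For continuous variables and realistic priors this enumeration is of course infeasible, which is why the Gaussian models and belief-propagation updates described below are used instead.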
2.1 Prior modeling based on Markov random fields

One of the models of prior probabilities of observed data that is presently available is the MRF. MRFs can easily treat the complex spatial interactions among observed data points that create a variety of appearance patterns. Consider an undirected graph G(V, E), where V = {1, 2, ..., n} is the set of vertices and E = {(i, j)} is the set of undirected edges between the vertices. An MRF is usually defined on such an undirected graph G(V, E) by assigning each variable x_i to the corresponding vertex i. The edge (i, j) expresses a spatial interaction between the variables x_i and x_j. To construct a probabilistic model, we define the joint probability of x, P_model(x). On the undirected graph G(V, E), if we assume a spatial Markov property among the random variables x and the positivity of the model, P_model(x) > 0, then, by the Hammersley-Clifford theorem, the model can be expressed, without loss of generality, as

P_model(x) = (1/Z) exp( Σ_{i∈V} φ_i(x_i) + Σ_{(i,j)∈E} ψ_ij(x_i, x_j) ).    (2.5)

The first term in the exponent, φ_i(x_i), is a potential function on vertex i that determines the characteristics of x_i, and the second term in the exponent, ψ_ij(x_i, x_j), is a potential function between vertices i and j that determines the interaction between x_i and x_j. Σ_{(i,j)∈E} represents the summation running over all edges, and Z denotes the normalization constant, sometimes referred to as the partition function, defined by

Z := ∫_{−∞}^{∞} exp( Σ_{i∈V} φ_i(x_i) + Σ_{(i,j)∈E} ψ_ij(x_i, x_j) ) dx.

The model in Eq. (2.5) is the MRF that is most frequently used. In the MRF, let us consider the conditional probability of x_i, expressed as

P_model(x_i | x_{−i}) = P_model(x) / ∫_{−∞}^{∞} P_model(x) dx_i,    (2.6)

where x_{−i} denotes the set of all variables except x_i: x_{−i} = {x_j | j ∈ V \ {i}}. Equations (2.5) and (2.6) lead to

P_model(x_i | x_{−i}) = P_model(x_i | x_{∂(i)}),    (2.7)

where ∂(i) denotes the set of vertices connected to vertex i in the graph, and x_{∂(i)} denotes the set of variables on the vertices belonging to ∂(i); that is, x_{∂(i)} is the set of nearest-neighbor variables of x_i: x_{∂(i)} = {x_j | j ∈ ∂(i)}. Equation (2.7) states that the variable x_i depends only on its nearest-neighbor variables in the conditional probability, and this constitutes the spatial Markov property of the MRF.

2.2 Model selection using parametric machine learning

In order to implement the MRF in Eq. (2.5), the forms of the potential functions should be determined. This is one of the most important points in MRF modeling. Parametrically, we model the potential functions by certain parametric functions with parameters θ,

P_model(x | θ) = (1/Z(θ)) exp( Σ_{i∈V} φ_i(x_i | θ_i) + Σ_{(i,j)∈E} ψ_ij(x_i, x_j | θ_ij) ).    (2.8)

Thus, we should find the optimal values of the parameters. The standard method for achieving this is provided by machine learning theory, as described in the following. The objective of MRF modeling is to model the unknown prior probability of the observations, P_prior(x). Therefore, the optimal values of the parameters, θ, should minimize some distance between the prior probability and our model P_model(x | θ). The Kullback-Leibler divergence (KLD)

K(P_0 ‖ P_1) := ∫_{−∞}^{∞} P_0(x) ln( P_0(x) / P_1(x) ) dx    (2.9)

is often utilized as a measure of the difference between two distinct probabilities, P_0(x) and P_1(x). The value of the KLD is always non-negative and is zero when the two probabilities are equivalent. Thus, we consider two distinct probabilities to be close to each other when the value of the KLD is small. In terms of the KLD, we suppose the optimal values of the parameters are given by minimizing the value of the KLD between the prior probability and our model,

θ* = argmin_θ K(P_prior ‖ P_model).

However, we cannot perform this minimization because we do not know the prior probability. Since we do not know the prior probability, we suppose instead that we have many complete observations generated from the prior probability (here, "complete" means that each observation includes no missing points). We describe the observations by D = {y^(μ) ∈ R^n | μ = 1, 2, ..., N}, and we define the empirical distribution of the N complete observations by

Q_D(x) := (1/N) Σ_{μ=1}^{N} Π_{i∈V} δ(y_i^(μ) − x_i).    (2.10)

The empirical distribution is the frequency distribution of the N complete observations. It should be noted that, if x are discrete variables, the Dirac delta is again replaced by the Kronecker delta.
We suppose that the empirical distribution has some important properties of the prior probability, and we suppose that the optimal values of the parameters are approximately obtained by minimizing the value of the KLD between the empirical distribution and our model,

θ* = argmin_θ K(P_prior ‖ P_model) ≈ argmin_θ K(Q_D ‖ P_model).    (2.11)

This minimization can be performed if we have complete observations generated from the prior probability. Equation (2.11) can be rewritten as

argmin_θ K(Q_D ‖ P_model) = argmax_θ ∫_{−∞}^{∞} Q_D(x) ln P_model(x | θ) dx.

This corresponds to the MLE in statistics.
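The following short sketch (ours, not from the paper) illustrates this equivalence numerically: minimizing the KLD between the empirical distribution of complete observations and a parametric model amounts to maximizing the average log-likelihood, shown here for a simple one-dimensional Gaussian model whose generating parameters are assumed values.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    data = rng.normal(loc=1.5, scale=0.8, size=1000)   # stand-in for N complete observations

    # Average negative log-likelihood under a 1-D Gaussian model; up to an additive
    # constant this equals K(Q_D || P_model), so minimizing it realizes Eq. (2.11).
    def neg_avg_loglik(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)                      # keeps sigma > 0
        return np.mean(0.5 * ((data - mu) / sigma) ** 2 + np.log(sigma))

    theta_hat = minimize(neg_avg_loglik, x0=np.array([0.0, 0.0])).x
    print("mu* =", theta_hat[0], "sigma* =", np.exp(theta_hat[1]))

The recovered values approach the generating parameters as the number of complete observations N grows.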

In the above scheme, we assumed that there are no missing points in the N observations used in the estimation of the parameters. If the observations include missing points, we would use an alternative strategy and apply the expectation-maximization (EM) algorithm.

From the above arguments, the (parametric) Bayesian reconstruction system is summarized as follows. Before the reconstructions, we design the potential functions in our MRF model P_model(x | θ) and estimate the optimal values of the parameters, θ*, in advance by using many complete observations and Eq. (2.11). Then, the reconstruction is approximately performed using the constructed model P_model(x | θ*) instead of the prior probability in Eq. (2.4), i.e.,

x̂_M = argmax_{x_M} P_likelihood(y | x) P_prior(x) ≈ argmax_{x_M} P_likelihood(y | x) P_model(x | θ*).    (2.12)

Since

argmax_{x_M} P_likelihood(y | x) P_model(x | θ*) = argmax_{x_M} P_model(x_M | x_O = y, θ*),

the suitable reconstructed values of the unobserved missing points, x̂_M, are the values that maximize the conditional probability of our model,

P_model(x_M | x_O, θ*) = P_model(x | θ*) / ∫_{−∞}^{∞} P_model(x | θ*) dx_M,

with x_O fixed by the observation y.

3. Overview of Bayesian Traffic Data Reconstruction

In this section, we give an overview of the application of the Bayesian reconstruction scheme presented in the previous section to the traffic data reconstruction problem, as proposed by the authors [6], together with some new numerical results that were obtained under conditions different from those in Ref. 6.

3.1 MRF model for Bayesian traffic data reconstruction

We applied the Bayesian reconstruction scheme to traffic data reconstruction. Our goal is to reconstruct the states of roads, and therefore the random variables x are assigned to roads. In order to formulate an MRF for a road network, we construct an undirected graph G(V, E) as follows: we assign a vertex to each road and draw an edge between two roads that are connected to each other at a traffic intersection (see Fig. 2).

Fig. 2. Undirected graph representation of a road network. (a) Road network with six roads and two intersections. (b) Vertices are assigned to roads. (c) Edges are drawn between two roads that are connected to each other at intersections.

On the undirected graph, we define the MRF by

P_model(x | θ) = (1/Z(θ)) exp( Σ_{i∈V} h_i x_i − (α/2) Σ_{i∈V} x_i² − (J/2) Σ_{(i,j)∈E} (x_i − x_j)² ),    (3.1)

where θ = {h, α, J} are the parameters of the model†. The variable x_i ∈ R expresses the state of road i. In this paper, we consider x to be traffic densities, according to the method in Ref. 6. The traffic density on road i is defined as the number of cars per unit area on road i; high densities tend to lead to traffic jams. This MRF is obtained by setting φ_i(x_i) = h_i x_i − α x_i²/2 and ψ_ij(x_i, x_j) = −J(x_i − x_j)²/2 in Eq. (2.5). The parameter h_i is a bias that controls the level of the traffic density of road i, and the parameter α > 0 controls the variances of the traffic densities. The interaction term, the last term in the exponent in Eq. (3.1), corresponds to our assumption that the traffic densities of neighboring roads take close values. The parameter J controls the strength of this assumption. This MRF is a multi-dimensional Gaussian distribution and is known as a Gaussian graphical model (GGM).

† Although this expression seems to differ slightly from the original model proposed in Ref. 6, it is essentially equivalent to the original model.
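Because the exponent of Eq. (3.1) is quadratic in x, the joint distribution is a multivariate Gaussian whose precision matrix is α I + J L, where L is the graph Laplacian of the road graph, and whose mean is (α I + J L)^{-1} h. The following sketch builds this precision matrix for a small illustrative graph in the spirit of Fig. 2(c); the edge list and parameter values are assumptions for illustration, not taken from the paper.

    import numpy as np

    # Toy road graph: 6 roads, edges between roads meeting at an intersection.
    n = 6
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    h = np.full(n, 0.5)          # bias parameters h_i (assumed values)
    alpha, J = 1.0, 0.8          # assumed values of the parameters in Eq. (3.1)

    # Graph Laplacian L: the exponent of Eq. (3.1) equals h^T x - (1/2) x^T (alpha*I + J*L) x,
    # so the GGM has precision matrix A = alpha*I + J*L and mean A^{-1} h.
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0

    A = alpha * np.eye(n) + J * L
    mean = np.linalg.solve(A, h)   # expected traffic densities under the model
    cov = np.linalg.inv(A)         # covariances between road densities
    print(mean)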

We represent the relationships among roads by undirected links, ignoring road directions, and model the distribution of traffic densities by a time-independent static distribution. Hence, each node in our model represents the time-independent static distribution of the density on the corresponding road. Our model can represent a time-independent static property of the traffic density on each road, but it cannot represent dynamical flows of traffic densities or dynamical time variations of the traffic densities. In order to consider the time-dependent dynamical properties of roads, we would have to extend our static model to a dynamical model based on a stochastic process.

As in the previous section, we represent the set of unobserved roads by M and the set of observed roads by O. After determining the values of the parameters by the machine learning method in Eq. (2.11), from Eq. (2.12), the reconstructed densities on the unobserved roads are obtained by

x̂_M = argmax_{x_M} P_model(x_M | x_O = y, θ*),    (3.2)

where y represents the densities on the observed roads. Since our model in Eq. (3.1) is a multi-dimensional Gaussian, the conditional probability is also a multi-dimensional Gaussian. Therefore, Eq. (3.2) can be rewritten as

x̂_M = ∫_{−∞}^{∞} x_M P_model(x_M | x_O = y, θ*) dx_M.    (3.3)

Hence, we find that the reconstructed densities x̂_M are the expectations of P_model(x_M | x_O = y, θ*). The expectations are obtained by solving the simultaneous equations

x̂_i = ( h_i + J Σ_{j∈∂(i)} z_j ) / ( α + J |∂(i)| ),   i ∈ M,    (3.4)

by an iteration method, where |A| denotes the number of elements in a set A and

z_j = x̂_j (j ∈ M),   z_j = y_j (j ∈ O).

Equation (3.4) is known as Gaussian belief propagation (GaBP) [7] for P_model(x_M | x_O = y, θ*) and is also known as the Gauss-Seidel method. It is known that, in GGMs, GaBP and the Gauss-Seidel method are equivalent in general and that GaBP always provides exact expectations.
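A minimal sketch of the iteration in Eq. (3.4) is given below; the graph, parameter values, and observed densities are illustrative assumptions, and the loop is a plain Gauss-Seidel sweep over the unobserved vertices.

    import numpy as np

    n = 6
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    alpha, J = 1.0, 0.8
    h = np.full(n, 0.5)
    observed = {0: 0.9, 2: 0.7, 4: 0.1}               # x_O = y (observed traffic densities)
    missing = [i for i in range(n) if i not in observed]

    x = {i: observed.get(i, 0.0) for i in range(n)}   # initial guess for the missing roads
    for _ in range(200):                              # Gauss-Seidel sweeps, Eq. (3.4)
        for i in missing:
            s = sum(x[j] for j in neighbors[i])
            x[i] = (h[i] + J * s) / (alpha + J * len(neighbors[i]))

    print({i: round(x[i], 3) for i in missing})       # reconstructed densities on M

Because the precision matrix of this GGM is strictly diagonally dominant for α > 0, the sweeps converge to the exact conditional means.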
3.2 Results of numerical simulation using the road network of Sendai city

In this section, we describe the application of our Bayesian traffic data reconstruction to the road network of the city of Sendai (shown in Fig. 3) and show the performance of our model. Figure 3 shows the road network of Sendai, which is the same as the road network shown in Ref. 6 and consists of about ten thousand roads.

Fig. 3. Road network of Sendai, Japan. This network consists of about ten thousand roads.

According to our Bayesian traffic data reconstruction scheme, we first defined the MRF model shown in Eq. (3.1) for the road network. The structure of the MRF was constructed according to Fig. 2. The parameters were determined by the maximum likelihood estimation described in Section 2.2 with L2 regularization, in the same setting as in Ref. 6. Regularization is frequently used to avoid over-fitting to noise in the training data. In the maximum likelihood estimation, we used N = 359 complete traffic data sets generated by a traffic simulator named Simulation On Urban road Network with Dynamic route choice (SOUND) [8] for the road network of Sendai. SOUND is a mesoscopic traffic simulation model that can simulate realistic dynamical traffic on given road networks. Although the traffic data were not real, they were presumed to represent the typical behavior of traffic in Sendai. Using the MRF model, we reconstructed the traffic densities of Sendai.

Figure 4 shows the traffic density data that were not used in the learning, namely, the test data. In order to visually represent the densities, we quantized the densities into five stages at intervals of 0.3; each road is colored according to its quantized traffic density.

Fig. 4. True traffic density used in our numerical experiment. Each road is colored according to its traffic density. The network in the left panel is the entire network, and the network in the right panel is an enlarged image of the center of Sendai.

The densities increase in the following order: black, blue, green, yellow, and red. Thus, a road colored black has a density in the interval [0, 0.3]. For the traffic density data shown in Fig. 4, we suppose that the densities on some roads are unobserved, and we randomly select the unobserved roads with probability p = 0.8. Figure 5 shows the positions of the unobserved roads, which are colored red.

Fig. 5. Positions of the unobserved roads, where the unobserved roads are colored red. About 80% of the roads are unobserved. The network in the left panel is the entire network, and the network in the right panel is an enlarged image of the center of Sendai.

We reconstructed the densities on the unobserved roads using the densities on the observed roads, which are colored black in the figure. Our Bayesian reconstruction result is shown in Fig. 6.

Fig. 6. Reconstruction result obtained by using our Bayesian reconstruction, where each road is colored according to its traffic density. The network in the left panel is the entire network, and the network in the right panel is an enlarged image of the central area of Sendai.

The mean square error (MSE) between the true densities shown in Fig. 4 and the reconstructed densities shown in Fig. 6 on the unobserved roads is approximately 0.1447, where the MSE is defined by

[MSE] := (1/t) Σ_{i∈M} (x_i^data − x_i^recon)²,    (3.5)

where x_i^data is the true density on road i and x_i^recon is the density on road i reconstructed by our method. The notation t := |M| denotes the number of missing elements. The scatter plot of this reconstruction is shown in Fig. 7. The correlation coefficient of this scatter plot is approximately 0.919. Since the correlation coefficient is close to one, our simple MRF model can be expected to capture a static statistical property of the traffic data.
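For reference, the two quantities reported above, the MSE of Eq. (3.5) and the correlation coefficient of the scatter plot, can be computed as follows; the arrays are placeholder values, not the experimental data.

    import numpy as np

    x_data = np.array([0.12, 0.45, 0.80, 0.33, 0.95])    # true densities on the missing roads
    x_recon = np.array([0.10, 0.50, 0.70, 0.30, 0.90])   # reconstructed densities

    mse = np.mean((x_data - x_recon) ** 2)               # Eq. (3.5) with t = |M|
    corr = np.corrcoef(x_data, x_recon)[0, 1]            # correlation of the scatter plot
    print(f"MSE = {mse:.4f}, correlation = {corr:.3f}")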

Next, we address the average performance of our reconstruction method versus the value of the missing probability p.

Fig. 7. Scatter plot of the true densities in Fig. 4 (horizontal axis) and the reconstructed densities in Fig. 6 (vertical axis).

Fig. 8. MSE versus the missing probability p. The solid curve is the Bezier interpolation of the points.

Figure 8 shows the MSE versus the value of the missing probability p. Each point is the average value of the MSE over repeated trials using the leave-one-out cross-validation method. It can be seen that the error increases with the value of the missing probability p.

4. A Statistical Mechanical Analysis of Bayesian Traffic Data Reconstruction

In this section, we clarify the relationship between the model parameters in the MRF model in Eq. (3.1) and the reconstruction performance from a statistical mechanical point of view. In our analysis, we assume that the prior probability of the traffic data has the same form as our model in Eq. (3.1),

P_prior(x | θ) = (1/Z(θ)) exp( Σ_{i∈V} h_i x_i − (α/2) Σ_{i∈V} x_i² − (J/2n) Σ_{i<j∈V} (x_i − x_j)² ),    (4.1)

and assume that the values of h are independently drawn from an identical distribution p_h(h), where Z(θ) is the normalization constant and Σ_{i<j∈V} denotes the summation running over all distinct pairs of vertices i and j in V = {1, 2, ..., n}, i.e., Σ_{i<j∈V} = Σ_{i=1}^{n} Σ_{j=i+1}^{n}. Although road networks have complex structures, we neglect the structures and employ the fully-connected model with no structure for simplicity of analysis. For the observations generated from the prior probability in Eq. (4.1), we conduct the reconstructions by using a model of the form

P_model(x | θ*) = (1/Z(θ*)) exp( Σ_{i∈V} β_i x_i − (α*/2) Σ_{i∈V} x_i² − (J*/2n) Σ_{i<j∈V} (x_i − x_j)² ),    (4.2)

where θ* = {β, α*, J*}. The bias parameters in the reconstruction model in Eq. (4.2) are defined by β_i := h_i + ε_i, and we assume that the values of ε are independently drawn from an identical distribution p_ε(ε). If J* = J, α* = α, and ε_i = 0, the prior model and the reconstruction model are equivalent. For an observation with some missing elements, x = {x_M, x_O}, generated from the prior model, the reconstruction using the model in Eq. (4.2) is conducted, as in Eq. (3.3), by

x̂_i(x_O) = ∫_{−∞}^{∞} x_i P_model(x_M | x_O, θ*) dx_M    (4.3)

for all i ∈ M. We measure the statistical performance of the reconstruction by the MSE in Eq. (3.5) averaged over all possible observations generated from the prior probability and over all possible values of the bias parameters h and ε generated from p_h(h) and p_ε(ε), respectively. The averaged MSE is expressed by

E := ∫_{−∞}^{∞} ∫_{−∞}^{∞} E(h, ε) Π_{i∈V} p_h(h_i) p_ε(ε_i) dh dε,    (4.4)

where

E(h, ε) := ∫_{−∞}^{∞} P_prior(x | θ) ( (1/t) Σ_{i∈M} (x_i − x̂_i(x_O))² ) dx    (4.5)

is the MSE averaged over all possible observations for the specific biases. Equation (4.4) represents the MSE of our Bayesian reconstruction averaged over all possible situations that appear under our assumption for the prior model. Since road networks are quite large, we consider the thermodynamic limit of the mean square error by taking the limits n, t → ∞ with p := t/n fixed at a finite constant. The parameter p corresponds to the missing rate; it must be in the interval (0, 1]. In the thermodynamic limit, from Eqs. (4.3) and (A.10), we have

x̂_i(x_O) = (β_i + f(x_O)) / (α* + J*) + p J* ( μ_h + μ_ε + f(x_O) ) / ( (α* + J*)(α* + (1 − p)J*) ),    (4.6)

where

f(x_O) = (J*/n) Σ_{i∈O} x_i    (4.7)

(see Appendix A for the detailed derivation). The notations μ_h and μ_ε are the averages of p_h(h) and p_ε(ε), respectively. From Eq. (A.9), we have

⟨x_i x_j⟩_prior = ⟨x_i⟩_prior ⟨x_j⟩_prior for i ≠ j,   and   ⟨x_i²⟩_prior = 1/(α + J) + ⟨x_i⟩_prior²,

where ⟨·⟩_prior := ∫_{−∞}^{∞} (·) P_prior(x | θ) dx. Thus, Eq. (4.5) can be rewritten as

E(h, ε) = (1/t) Σ_{i∈M} ( ⟨x_i²⟩_prior − 2⟨x_i x̂_i(x_O)⟩_prior + ⟨x̂_i(x_O)²⟩_prior )
        = (1/t) Σ_{i∈M} ( 1/(α + J) + ⟨x_i⟩_prior² − 2⟨x_i⟩_prior ⟨x̂_i(x_O)⟩_prior + ⟨x̂_i(x_O)²⟩_prior ).

Similarly, from Eq. (A.9),

⟨f(x_O)²⟩_prior = (J*²/n²) Σ_{i∈O} Σ_{j∈O} ⟨x_i x_j⟩_prior = (1 − p)J*² / ( (α + J)n ) + ( (J*/n) Σ_{i∈O} ⟨x_i⟩_prior )² → ⟨f(x_O)⟩_prior²   (n → ∞).

This means that f(x_O) coincides with ⟨f(x_O)⟩_prior with probability one. This leads to the relation ⟨x̂_i(x_O)²⟩_prior = ⟨x̂_i(x_O)⟩_prior², and then Eq. (4.5) is rewritten as

E(h, ε) = 1/(α + J) + (1/t) Σ_{i∈M} ( ⟨x_i⟩_prior − ⟨x̂_i(x_O)⟩_prior )².    (4.8)

Since

⟨f(x_O)⟩_prior → (1 − p)J* μ_h / α   (n → ∞)

from Eq. (A.8), we obtain the average of x̂_i(x_O) in Eq. (4.6) with respect to the prior probability as

⟨x̂_i(x_O)⟩_prior = β_i / (α* + J*) + (1 − p)J* μ_h / ( α(α* + J*) ) + p J* ( μ_h + μ_ε + (1 − p)J* μ_h / α ) / ( (α* + J*)(α* + (1 − p)J*) ).    (4.9)

By using Eqs. (4.4), (4.8), (4.9), and (A.8), with a straightforward calculation, we finally obtain the explicit form of Eq. (4.4) as

E = 1/(α + J) + σ_h² ( 1/(α + J) − 1/(α* + J*) )² + σ_ε² / (α* + J*)² + ( (α* − α) μ_h − α μ_ε )² / ( α² (α* + (1 − p)J*)² ),    (4.10)

where σ_h² and σ_ε² denote the variances of p_h(h) and p_ε(ε), respectively. Equation (4.10) does not require information about which roads are selected as the unobserved ones; the dependency on this choice disappears as a result of the averaging operations.
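As a sanity check of the analysis (our sketch, not part of the paper), the averaged MSE of Eq. (4.4) can also be estimated by direct Monte Carlo simulation: sample x from the fully-connected prior (4.1), hide each element with probability p, reconstruct the hidden elements by the conditional mean of the model (4.2), and average the squared error. For moderately large n the estimate should roughly agree with Eq. (4.10). All parameter values below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)

    n, p = 200, 0.5
    alpha, J, mu_h, sigma_h = 1.0, 1.0, 1.0, 0.5           # prior parameters (assumed)
    alpha_s, J_s, mu_e, sigma_e = 1.2, 1.0, 0.1, 0.0       # reconstruction model with a model error

    def precision(a, j):
        # exponent of Eqs. (4.1)-(4.2): -(1/2) x^T A x + b^T x with A = (a + j) I - (j/n) 11^T
        return (a + j) * np.eye(n) - (j / n) * np.ones((n, n))

    mses = []
    for _ in range(50):
        h = rng.normal(mu_h, sigma_h, size=n)
        eps = rng.normal(mu_e, sigma_e, size=n)

        A = precision(alpha, J)
        x = rng.multivariate_normal(np.linalg.solve(A, h), np.linalg.inv(A))   # sample from the prior (4.1)

        miss = rng.random(n) < p                      # unobserved roads (missing rate p)
        A_s = precision(alpha_s, J_s)
        m_s = np.linalg.solve(A_s, h + eps)           # mean of the reconstruction model (4.2)

        # conditional mean of x_M given x_O under the reconstruction model, Eq. (4.3)
        L_mm = A_s[np.ix_(miss, miss)]
        L_mo = A_s[np.ix_(miss, ~miss)]
        x_hat = m_s[miss] - np.linalg.solve(L_mm, L_mo @ (x[~miss] - m_s[~miss]))

        mses.append(np.mean((x[miss] - x_hat) ** 2))

    print("Monte Carlo averaged MSE:", np.mean(mses))
    print("mean-field prediction, Eq. (4.10):",
          1 / (alpha + J)
          + sigma_h**2 * (1 / (alpha + J) - 1 / (alpha_s + J_s))**2
          + sigma_e**2 / (alpha_s + J_s)**2
          + ((alpha_s - alpha) * mu_h - alpha * mu_e)**2
          / (alpha**2 * (alpha_s + (1 - p) * J_s)**2))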

It is obvious that E takes the minimum value

E_min = 1/(α + J)    (4.11)

when there is no model error between the prior model in Eq. (4.1) and the reconstruction model in Eq. (4.2), that is, when J* = J, α* = α, and μ_ε = σ_ε² = 0. In the following, we examine numerically the relation between the averaged MSE and the parameters in the reconstruction model. For the numerical experiments, we set the parameters of the prior model to J = 1, μ_h = 1, and σ_h² = 0.5, with α fixed.

First, we examine the relation when a model error exists only in the interaction parameter, that is, when α* = α, μ_ε = σ_ε² = 0, and J* = J + r. The parameter r is the error in the interaction. Figure 9 shows the plot of E against the error r when p = 0.5.

Fig. 9. Plot of E against the error r when α* = α, μ_ε = σ_ε² = 0, p = 0.5, and J* = J + r. The vertical axis is E in Eq. (4.10). The inset is an enlarged plot around r = 0.

It can be seen that, when J* is larger than J, the performance level is relatively robust. In contrast, the performance level drastically decreases when J* is smaller than J. The presented model reconstructs the densities of the missing roads by using information from the surrounding roads, which comes in through the interaction. When the interaction is quite weak, the nodes corresponding to the missing roads cannot receive enough information to be reconstructed. In consequence, the performance level decreases.

Next, we examine the relation when a model error exists in the bias and variance parameters. Figures 10 and 11 show the plots of E against the error r in the variance parameter and in the bias parameters, respectively, when p = 0.5. Figure 10 shows the plot when a model error exists only in the variance parameter, that is, when α* = α + r, μ_ε = σ_ε² = 0, and J* = J.

Fig. 10. Plot of E against the error r when α* = α + r, μ_ε = σ_ε² = 0, p = 0.5, and J* = J. The vertical axis is E in Eq. (4.10).

We see that the error in the variance parameter drastically decreases the performance level. Figure 11 shows the plot when a model error exists in the bias parameters. The solid line shows the plot when a model error exists only in the mean value of the error distribution of the biases, p_ε(ε), and the dotted line shows the plot when a model error exists only in the variance of the error distribution of the biases. The reconstruction performance level monotonically decreases with increasing error r in Fig. 11. From Eq. (4.10), it can be seen that the dependency on the missing rate p arises when model errors exist in the bias parameters or in the variance parameters α and α*. Here, we consider the case where model errors exist in both the bias parameters and the variance parameters.

Fig. 11. Plot of E against the error r when α* = α, p = 0.5, and J* = J. The solid line is the plot when μ_ε = r and σ_ε² = 0, and the dotted line is the plot when μ_ε = 0 and σ_ε² = r. The vertical axis is E in Eq. (4.10).

Fig. 12. Plot of E against the missing rate p when α* = 0.4, μ_ε = 0.1, σ_ε² = 0, and J* = J. The vertical axis is E in Eq. (4.10).

Figure 12 shows the plot of E against the missing rate p when α* = 0.4, μ_ε = 0.1, σ_ε² = 0, and J* = J. The reconstruction performance level decreases with the increase in the value of p. This performance behavior seems to be qualitatively similar to the behavior of our numerical traffic density reconstruction in Fig. 8. From this result, we can presume that the behavior of the MSE in Fig. 8 is caused primarily by model errors in the biases, in the variance, or in both. In Fig. 13, we show the regression result obtained by fitting the mean-field result in Eq. (4.10) to the numerical reconstruction result shown in Fig. 8.

Fig. 13. Regression result for E obtained by using least squares. The crosses (+) are the results of the numerical simulation of the Bayesian reconstruction shown in Fig. 8, and the curve is the fitted mean-field theory.

The regression shows an excellent fit to the result of the numerical simulation of the Bayesian reconstruction. The parameters in our reconstruction model were determined from the training data as described in Section 3.2. However, in the training, it was not easy to find the truly optimal values of the parameters from the training data, because the number of training data was much smaller than the number of parameters. This could be one of the reasons why the model errors exist.

5. Conclusion

In this paper, we introduced the Bayesian reconstruction framework, in which missing data are probabilistically interpolated, and gave an overview of the application of Bayesian reconstruction to the problem of traffic data reconstruction in the field of traffic engineering. Our traffic reconstruction model in Eq. (3.1) neglects some real traffic properties such as traffic lanes and contraflows. Nevertheless, the results of our reconstruction seem to be accurate. It can be expected that our simple GGM captures the static statistical property of the traffic data and that it can be used as an important base model for Bayesian traffic reconstruction. The extension of our model by taking real traffic properties into account should be addressed in a future study.

In the latter part of this paper, we evaluated the statistical performance of our reconstruction by using a statistical mechanical analysis, that is, a mean-field analysis. In our analysis, we used a simplified reconstruction model, which has no network structure, for the convenience of the calculations. However, since real road networks do have network structures, the result of our evaluation can only be a rough approximation. The authors previously proposed a method based on belief propagation to evaluate the statistical properties of Bayesian reconstructions on structured networks in the context of image processing [9]. We strongly suggest that this method can be applied to Bayesian traffic reconstruction and can lead to more realistic evaluations.

Appendix A. Mean-field Analysis for the Traffic Data Reconstruction Model

The partition function in Eq. (4.1) is

Z(θ) = ∫_{−∞}^{∞} exp( Σ_{i∈V} h_i x_i − (α/2) Σ_{i∈V} x_i² − (J/2n) Σ_{i<j∈V} (x_i − x_j)² ) dx
     = ∫_{−∞}^{∞} exp( Σ_{i∈V} h_i x_i − (1/2)( α + (n − 1)J/n ) Σ_{i∈V} x_i² + (J/n) Σ_{i<j∈V} x_i x_j ) dx.

By using the Hubbard-Stratonovich transformation, we have

Z(θ) = ∫_{−∞}^{∞} exp( Σ_{i∈V} h_i x_i − ((α + J)/2) Σ_{i∈V} x_i² + (J/2n) ( Σ_{i∈V} x_i )² ) dx
     = √(Jn/2π) ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp( Σ_{i∈V} (h_i + Jz) x_i − ((α + J)/2) Σ_{i∈V} x_i² − (Jn/2) z² ) dx dz.

The Gaussian integrals lead to

Z(θ) = √( (α + J)/α ) ( 2π/(α + J) )^{n/2} exp( (1/(2(α + J))) Σ_{i∈V} h_i² + ( J/(2nα(α + J)) ) ( Σ_{i∈V} h_i )² ).    (A.1)

Therefore, the free energy (per variable) for the prior model in Eq. (4.1) is expressed by

F(θ) := −(1/n) ln Z(θ) = −(1/2n) ln( (α + J)/α ) − (1/2) ln( 2π/(α + J) ) − (1/(2n(α + J))) Σ_{i∈V} h_i² − ( J/(2n²α(α + J)) ) ( Σ_{i∈V} h_i )²,    (A.2)

so that the first- and second-order moments of the prior model are given by

⟨x_i⟩_prior = ∂ ln Z(θ)/∂h_i = h_i/(α + J) + ( J/(nα(α + J)) ) Σ_{j∈V} h_j    (A.3)

and

⟨x_i x_j⟩_prior = ∂⟨x_i⟩_prior/∂h_j + ⟨x_i⟩_prior ⟨x_j⟩_prior = δ_ij/(α + J) + J/(nα(α + J)) + ⟨x_i⟩_prior ⟨x_j⟩_prior,    (A.4)

respectively, where ⟨·⟩_prior = ∫_{−∞}^{∞} (·) P_prior(x | θ) dx and δ_ij is the Kronecker delta.

Next, we find the free energy for the conditional probability of the reconstruction model in Eq. (4.2). The conditional probability is expressed by

P_model(x_M | x_O, θ*) = (1/Z(θ*, x_O)) exp( Σ_{i∈M} (β_i + f(x_O)) x_i − (1/2)( α* + (n − 1)J*/n ) Σ_{i∈M} x_i² + (J*/n) Σ_{i<j∈M} x_i x_j ),

where f(x_O) is defined in Eq. (4.7), Σ_{i<j∈M} represents the summation running over all distinct pairs of vertices i and j in M ⊆ V, and Z(θ*, x_O) represents the corresponding partition function.
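Equation (A.3) is an exact property of the fully-connected GGM and can be checked numerically against the mean obtained by solving the linear system defined by the precision matrix (α + J)I − (J/n)11^T of Eq. (4.1); the sketch below, with assumed parameter values, does this.

    import numpy as np

    rng = np.random.default_rng(2)
    n, alpha, J = 50, 1.0, 0.7
    h = rng.normal(1.0, 0.5, size=n)

    A = (alpha + J) * np.eye(n) - (J / n) * np.ones((n, n))   # precision matrix of Eq. (4.1)
    exact_mean = np.linalg.solve(A, h)
    a3_mean = h / (alpha + J) + J * h.sum() / (n * alpha * (alpha + J))   # Eq. (A.3)

    print(np.max(np.abs(exact_mean - a3_mean)))   # of the order of machine precision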

The partition function Z(θ*, x_O) is defined by

Z(θ*, x_O) := ∫_{−∞}^{∞} exp( Σ_{i∈M} (β_i + f(x_O)) x_i − (1/2)( α* + (n − 1)J*/n ) Σ_{i∈M} x_i² + (J*/n) Σ_{i<j∈M} x_i x_j ) dx_M.

Using almost the same derivation as the above mean-field derivation for the prior model, we find

Z(θ*, x_O) = √( (α* + J*)/(α* + (1 − p)J*) ) ( 2π/(α* + J*) )^{t/2} exp( (1/(2(α* + J*))) Σ_{i∈M} (β_i + f(x_O))² + ( pJ*/(2t(α* + J*)(α* + (1 − p)J*)) ) ( Σ_{i∈M} (β_i + f(x_O)) )² ),    (A.5)

where t is the number of missing elements defined in Section 4 and p = t/n is associated with the missing rate. From Eq. (A.5), we can obtain the free energy (per variable) for the conditional probability of the reconstruction model,

F*(θ*, x_O) := −(1/t) ln Z(θ*, x_O) = −(1/2t) ln( (α* + J*)/(α* + (1 − p)J*) ) − (1/2) ln( 2π/(α* + J*) ) − (1/(2t(α* + J*))) Σ_{i∈M} (β_i + f(x_O))² − ( pJ*/(2t²(α* + J*)(α* + (1 − p)J*)) ) ( Σ_{i∈M} (β_i + f(x_O)) )²,    (A.6)

so that the first-order moments of the conditional probability of the reconstruction model are given by

⟨x_i⟩_{M|O} = −t ∂F*(θ*, x_O)/∂β_i = (β_i + f(x_O))/(α* + J*) + ( pJ*/(t(α* + J*)(α* + (1 − p)J*)) ) Σ_{j∈M} (β_j + f(x_O))    (A.7)

for all i ∈ M, where we define ⟨·⟩_{M|O} := ∫_{−∞}^{∞} (·) P_model(x_M | x_O, θ*) dx_M.

We consider the thermodynamic limit by taking the limits n, t → ∞, where p is fixed at a finite constant. In the thermodynamic limit, the free energies in Eqs. (A.2) and (A.6) reduce to

F(θ) = −(1/2) ln( 2π/(α + J) ) − ( σ_h² + μ_h² )/(2(α + J)) − J μ_h²/(2α(α + J))

and

F*(θ*, x_O) = −(1/2) ln( 2π/(α* + J*) ) − ( σ_h² + σ_ε² + (μ_h + μ_ε + f(x_O))² )/(2(α* + J*)) − pJ* (μ_h + μ_ε + f(x_O))²/(2(α* + J*)(α* + (1 − p)J*)),

respectively, where μ_h and σ_h² are the average and variance of p_h(h), respectively, and μ_ε and σ_ε² are the average and variance of p_ε(ε), respectively. The quantities {μ_h, μ_ε, σ_h², σ_ε²} appear because of the law of large numbers. Similarly, the moments in Eqs. (A.3), (A.4), and (A.7) reduce to

⟨x_i⟩_prior = h_i/(α + J) + J μ_h/(α(α + J)),    (A.8)

⟨x_i x_j⟩_prior = δ_ij/(α + J) + ⟨x_i⟩_prior ⟨x_j⟩_prior,    (A.9)

and

⟨x_i⟩_{M|O} = (β_i + f(x_O))/(α* + J*) + pJ* ( μ_h + μ_ε + f(x_O) )/( (α* + J*)(α* + (1 − p)J*) ),    (A.10)

respectively, in the thermodynamic limit.

Acknowledgment

The authors are very grateful to Professor Masao Kuwahara and Dr. Jinyoung Kim of the Graduate School of Information Sciences, Tohoku University, for providing the road network data and the traffic simulation data. This work was partially supported by Grants-in-Aid from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

REFERENCES

[1] Bertalmío, M., Sapiro, G., Caselles, V., and Ballester, C., Image inpainting, Proceedings of SIGGRAPH 2000 (2000).
[2] Yasuda, M., and Tanaka, K., Mean field approximation for fields of experts, Interdisciplinary Information Sciences, 19 (2013).
[3] Yasuda, M., Ohkubo, J., and Tanaka, K., Digital image inpainting based on Markov random field, Proceedings of the 2005 International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA 2005) (2005).
[4] Yasuda, M., Ohkubo, J., and Tanaka, K., Digital image inpainting algorithm by using Gaussian graphical model, Information Technology Letters (Proceedings of the Forum of Information Technology (FIT) 2006), 5 (2006) (in Japanese).
[5] Roth, S., and Black, M. J., Fields of experts, International Journal of Computer Vision, 82: 205–229 (2009).
[6] Kataoka, S., Yasuda, M., Furtlehner, C., and Tanaka, K., Traffic data reconstruction based on Markov random field modeling, Inverse Problems, 30: 025003 (2014).
[7] Weiss, Y., and Freeman, W. T., Correctness of belief propagation in Gaussian graphical models of arbitrary topology, Neural Computation, 13: 2173–2200 (2001).
[8] SOUND is commercialized by i-Transport Lab. Co., Ltd.
[9] Kataoka, S., Yasuda, M., and Tanaka, K., Statistical analysis of Gaussian image inpainting problems, Journal of the Physical Society of Japan, 81 (2012).


More information

Graphical Models for Collaborative Filtering

Graphical Models for Collaborative Filtering Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

Density Propagation for Continuous Temporal Chains Generative and Discriminative Models

Density Propagation for Continuous Temporal Chains Generative and Discriminative Models $ Technical Report, University of Toronto, CSRG-501, October 2004 Density Propagation for Continuous Temporal Chains Generative and Discriminative Models Cristian Sminchisescu and Allan Jepson Department

More information

CSC 412 (Lecture 4): Undirected Graphical Models

CSC 412 (Lecture 4): Undirected Graphical Models CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:

More information

Probabilistic Graphical Models Lecture Notes Fall 2009

Probabilistic Graphical Models Lecture Notes Fall 2009 Probabilistic Graphical Models Lecture Notes Fall 2009 October 28, 2009 Byoung-Tak Zhang School of omputer Science and Engineering & ognitive Science, Brain Science, and Bioinformatics Seoul National University

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions

Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions - Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions Simon Luo The University of Sydney Data61, CSIRO simon.luo@data61.csiro.au Mahito Sugiyama National Institute of

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Statistical learning. Chapter 20, Sections 1 4 1

Statistical learning. Chapter 20, Sections 1 4 1 Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the

More information

Active and Semi-supervised Kernel Classification

Active and Semi-supervised Kernel Classification Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),

More information

10-701/ Machine Learning, Fall

10-701/ Machine Learning, Fall 0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane c Ghane/Mori 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 5 5 5 4 6 8 4 4 6 8 4 5 5 5 5 5 5 µ, Σ) α f Learningscale is slightly Parameters is slightly larger larger

More information

Random Field Models for Applications in Computer Vision

Random Field Models for Applications in Computer Vision Random Field Models for Applications in Computer Vision Nazre Batool Post-doctorate Fellow, Team AYIN, INRIA Sophia Antipolis Outline Graphical Models Generative vs. Discriminative Classifiers Markov Random

More information

Path and travel time inference from GPS probe vehicle data

Path and travel time inference from GPS probe vehicle data Path and travel time inference from GPS probe vehicle data Timothy Hunter Department of Electrical Engineering and Computer Science University of California, Berkeley tjhunter@eecs.berkeley.edu Pieter

More information

Figure : Learning the dynamics of juggling. Three motion classes, emerging from dynamical learning, turn out to correspond accurately to ballistic mot

Figure : Learning the dynamics of juggling. Three motion classes, emerging from dynamical learning, turn out to correspond accurately to ballistic mot Learning multi-class dynamics A. Blake, B. North and M. Isard Department of Engineering Science, University of Oxford, Oxford OX 3PJ, UK. Web: http://www.robots.ox.ac.uk/vdg/ Abstract Standard techniques

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 9: Expectation Maximiation (EM) Algorithm, Learning in Undirected Graphical Models Some figures courtesy

More information

Stochastic Spectral Approaches to Bayesian Inference

Stochastic Spectral Approaches to Bayesian Inference Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

p(d θ ) l(θ ) 1.2 x x x

p(d θ ) l(θ ) 1.2 x x x p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Introduction to Bayesian Learning. Machine Learning Fall 2018

Introduction to Bayesian Learning. Machine Learning Fall 2018 Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability

More information

Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji

Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient

More information

ECE521 Lectures 9 Fully Connected Neural Networks

ECE521 Lectures 9 Fully Connected Neural Networks ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance

More information

Intelligent Systems:

Intelligent Systems: Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition

More information