Bayesian Reconstruction of Missing Observations

Interdisciplinary Information Sciences, Vol. 21, No. 1 (2015), 11–23
© Graduate School of Information Sciences, Tohoku University

Bayesian Reconstruction of Missing Observations

Shun KATAOKA 1, Muneki YASUDA 2,* and Kazuyuki TANAKA 1

1 Graduate School of Information Sciences, Tohoku University, Sendai, Japan
2 Graduate School of Science and Engineering, Yamagata University, Yonezawa, Yamagata, Japan

Received April 4, 2014; final version accepted September 19, 2014

We focus on an interpolation method referred to as Bayesian reconstruction in this paper. Whereas in standard interpolation methods missing data are interpolated deterministically, in Bayesian reconstruction missing data are interpolated probabilistically using a Bayesian treatment. In this paper, we address the framework of Bayesian reconstruction and its application to the traffic data reconstruction problem in the field of traffic engineering. In the latter part of this paper, we describe the evaluation of the statistical performance of our Bayesian traffic reconstruction model using a statistical mechanical approach and clarify its statistical behavior.

KEYWORDS: Bayesian reconstruction, missing data, traffic data reconstruction, Markov random field, mean-field analysis

1. Introduction

Methods for interpolating missing data are important in various scientific fields. A standard interpolation method, such as spline interpolation, is a deterministic interpolation technique. An alternative, probabilistic, interpolation technique has been developed in the last few years. In this probabilistic interpolation technique, which is called Bayesian reconstruction, a Bayesian treatment is used to interpolate and reconstruct missing regions. To the best of our knowledge, Bayesian reconstruction was first implemented in the digital image inpainting filter. The digital image inpainting filter is used in the process of reconstructing lost or deteriorated parts of images [1] (see Fig. 1). Previously, two of the authors applied Bayesian reconstruction to the digital image inpainting filter [3, 4]. Bayesian reconstruction is now becoming the standard technique in digital image inpainting [5]. It can be expected that the framework of Bayesian reconstruction will be utilized in various reconstruction problems, and therefore its use should not be limited to image processing.

Recently, the authors applied Bayesian reconstruction to the traffic data reconstruction problem [6]. Traffic data reconstruction is an important preprocessing step that precedes traffic prediction tasks such as travel time prediction, density prediction, and route planning. In order to provide accurate information to drivers, a broad-scale database of real-time vehicular traffic over an entire city is required. However, in practice, it is difficult to collect the traffic data for an entire city, because traffic sensors are not installed on all roads. The objective of traffic data reconstruction is therefore to reconstruct the states of unobserved roads, where traffic sensors are not installed, by using information from observed roads, where traffic sensors are installed.

In the first part of this paper, we introduce the details of Bayesian reconstruction and subsequently give an overview of the Bayesian traffic data reconstruction method proposed in Ref. 6, together with some new numerical results. In the latter part of this paper, we show a statistical mechanical analysis of our Bayesian traffic reconstruction and clarify its statistical performance.

Fig. 1. Example of the digital image inpainting filter. The left image is the original scratched image, and the center image is the image masked by the black regions. We reconstructed the masked regions using the digital image inpainting filter to restore the original damaged image. The right image is the reconstructed image obtained by using the digital image inpainting method proposed in Ref. 2.

* Corresponding author. muneki@yz.yamagata-u.ac.jp

The remainder of this paper is organized as follows. In Section 2, we introduce the framework of Bayesian reconstruction based on Markov random fields (MRFs). We explain a machine learning strategy for model selection based on maximum likelihood estimation (MLE) in Section 2.2. In Section 3, we present an overview of Bayesian traffic data reconstruction according to the method proposed in Ref. 6, and we show some new numerical results in Section 3.2. We describe our evaluation of the statistical performance of our Bayesian traffic reconstruction in terms of a statistical mechanical analysis in Section 4. Finally, we present the conclusions of this paper and outline future work in Section 5.

2. Scheme of Bayesian Reconstruction of Missing Observations

In the Bayesian framework, we suppose that observations (observed data) are probabilistically drawn from a specific probability distribution, referred to as the prior probability, because observations suffer from uncertainty originating, for example, in physical noise and in the incompleteness of some elements. Suppose that there exists an n-dimensional observation x = {x_i ∈ R | i ∈ V = {1, 2, ..., n}} generated from the prior probability P_prior(x), and that we cannot observe some of its elements for some reason. Since the observation is probabilistically generated, we can treat its elements as random variables. We define the set of labels of the missing elements by M ⊆ V and the complement of M by O := V \ M, so that O is the set of labels of the observed elements. Given an observation x, we denote the values of the observed elements by y to distinguish them from the unobserved elements and collectively express the observed elements by y = {y_i ∈ R | i ∈ O}. The values of the elements in the set O are fixed by the observation y. The Bayesian reconstruction considered in this paper consists of reconstructing the unobserved elements, x_M, of the observation by using the observed elements, x_O; in other words, the objective is to estimate the values of x_M by using y, where x_A denotes the set of x_i belonging to a set A ⊆ V, i.e., x_A := {x_i | i ∈ A}.

In order to reconstruct the missing elements from the Bayesian point of view, we first formulate the posterior probability of the missing elements, P(x_M | y), by the Bayes rule

P(x_M | y) = P(x_M, y) / P(y),    (2.1)

where the value of y is fixed by the observation. By using the Dirac delta function, we have

P(x_M, y) = [ Π_{i∈O} δ(y_i − x_i) ] P(x_M, x_O) = [ Π_{i∈O} δ(y_i − x_i) ] P_prior(x).    (2.2)

It should be noted that, if x are discrete variables, the Dirac delta is replaced by the Kronecker delta. From Eqs. (2.1) and (2.2), we have

P(x_M | y) ∝ P_likelihood(y | x) P_prior(x),    (2.3)

where

P_likelihood(y | x) := Π_{i∈O} δ(y_i − x_i).

The probability P_likelihood(y | x) is referred to as the likelihood in the Bayesian framework. In Bayesian reconstruction, we take as the reconstructed values of the unobserved elements the values x̂_M that maximize the posterior probability in Eq. (2.3), i.e.,

x̂_M = argmax_{x_M} P(x_M | y) = argmax_{x_M} P_likelihood(y | x) P_prior(x).    (2.4)

The above reconstruction scheme requires prior probabilities that describe the hidden probabilistic mechanisms of the observations. Unfortunately, in almost all situations we do not know the details of the prior probabilities. Therefore, in order to implement a Bayesian reconstruction system, we have to model the unknown prior probabilities.
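As a concrete illustration of Eq. (2.4), the following minimal Python sketch (ours, not part of the original paper) reconstructs the missing elements of a toy three-dimensional binary observation by direct enumeration; the prior used here is an assumed nearest-neighbor model on a chain that favors equal neighboring values.

    import itertools
    import numpy as np

    # Toy prior on x = (x_0, x_1, x_2), x_i in {0, 1}: neighboring elements on a
    # chain prefer to take the same value (unnormalized weight).
    def prior_weight(x, coupling=2.0):
        return np.exp(coupling * ((x[0] == x[1]) + (x[1] == x[2])))

    observed = {0: 1}      # labels in O and their observed values y
    missing = [1, 2]       # labels in M

    # Enumerate all candidate completions of x_M and keep the one maximizing the
    # posterior of Eq. (2.4); the Kronecker-delta likelihood simply pins x_O to y.
    best_xM, best_weight = None, -np.inf
    for xM in itertools.product([0, 1], repeat=len(missing)):
        x = [0, 0, 0]
        for i, v in observed.items():
            x[i] = v
        for i, v in zip(missing, xM):
            x[i] = v
        w = prior_weight(x)
        if w > best_weight:
            best_xM, best_weight = xM, w

    print("reconstructed x_M =", best_xM)   # (1, 1): the missing elements follow the observed one

For continuous variables and realistic priors this enumeration is of course infeasible, which is why the Gaussian models and belief-propagation updates described below are used instead.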
2.1 Prior modeling based on Markov random fields

One of the models of prior probabilities of observed data that is presently available is the MRF. MRFs can easily treat the complex spatial interactions among observed data points that create a variety of appearance patterns. Consider an undirected graph G(V, E), where V = {1, 2, ..., n} is the set of vertices and E = {(i, j)} is the set of undirected edges between the vertices. An MRF is usually defined on such an undirected graph G(V, E) by assigning each variable x_i to the corresponding vertex i. The edge (i, j) expresses a spatial interaction between the variables x_i and x_j. To construct a probabilistic model, we define the joint probability of x, P_model(x). On the undirected graph G(V, E), if we assume a spatial Markov property among the random variables x and the positivity of the model, P_model(x) > 0, then, by the Hammersley-Clifford theorem, the model can be expressed, without loss of generality, as

P_model(x) = (1/Z) exp( Σ_{i∈V} φ_i(x_i) + Σ_{(i,j)∈E} ψ_ij(x_i, x_j) ).    (2.5)

The first term in the exponent, φ_i(x_i), is a potential function on vertex i that determines the characteristics of x_i, and the second term in the exponent, ψ_ij(x_i, x_j), is a potential function between vertices i and j that determines the interaction between x_i and x_j. Σ_{(i,j)∈E} represents the summation running over all edges, and Z denotes the normalization constant, sometimes referred to as the partition function, defined by

Z := ∫_{−∞}^{∞} exp( Σ_{i∈V} φ_i(x_i) + Σ_{(i,j)∈E} ψ_ij(x_i, x_j) ) dx.

The model in Eq. (2.5) is the MRF that is most frequently used. In the MRF, let us consider the conditional probability of x_i, expressed as

P_model(x_i | x_{−i}) = P_model(x) / ∫_{−∞}^{∞} P_model(x) dx_i,    (2.6)

where x_{−i} denotes the set of all variables except x_i: x_{−i} = {x_j | j ∈ V \ {i}}. Equations (2.5) and (2.6) lead to

P_model(x_i | x_{−i}) = P_model(x_i | x_{∂(i)}),    (2.7)

where ∂(i) denotes the set of vertices connected to vertex i in the graph, and x_{∂(i)} denotes the set of variables on the vertices belonging to ∂(i); that is, x_{∂(i)} is the set of nearest-neighbor variables of x_i: x_{∂(i)} = {x_j | j ∈ ∂(i)}. Equation (2.7) states that the variable x_i depends only on its nearest-neighbor variables in the conditional probability, and this constitutes the spatial Markov property of the MRF.

2.2 Model selection using parametric machine learning

In order to implement the MRF in Eq. (2.5), the forms of the potential functions should be determined. This is one of the most important points in MRF modeling. Parametrically, we model the potential functions by certain parametric functions with parameters θ,

P_model(x | θ) = (1/Z(θ)) exp( Σ_{i∈V} φ_i(x_i | θ_i) + Σ_{(i,j)∈E} ψ_ij(x_i, x_j | θ_ij) ).    (2.8)

Thus, we should find the optimal values of the parameters. The standard method for achieving this is provided by machine learning theory, as described in the following. The objective of MRF modeling is to model the unknown prior probability of the observations, P_prior(x). Therefore, the optimal values of the parameters, θ, should minimize some distance between the prior probability and our model P_model(x | θ). The Kullback-Leibler divergence (KLD)

K(P_0 ‖ P_1) := ∫_{−∞}^{∞} P_0(x) ln( P_0(x) / P_1(x) ) dx    (2.9)

is often utilized as a measure of the difference between two distinct probabilities, P_0(x) and P_1(x). The value of the KLD is always non-negative and is zero when the two probabilities are equivalent. Thus, we consider two distinct probabilities to be close to each other when the value of the KLD is small. In terms of the KLD, we suppose the optimal values of the parameters are given by minimizing the value of the KLD between the prior probability and our model,

θ* = argmin_θ K(P_prior ‖ P_model).

However, we cannot perform this minimization because we do not know the prior probability. Since we do not know the prior probability, we suppose instead that we have many complete observations generated from the prior probability (here, "complete" means that each observation includes no missing points). We describe the observations by D = {y^(μ) ∈ R^n | μ = 1, 2, ..., N}, and we define the empirical distribution of the N complete observations by

Q_D(x) := (1/N) Σ_{μ=1}^{N} Π_{i∈V} δ(y_i^(μ) − x_i).    (2.10)

The empirical distribution is the frequency distribution of the N complete observations. It should be noted that, if x are discrete variables, the Dirac delta is again replaced by the Kronecker delta.
We suppose that the empirical distribution has some important properties of the prior probability, and we suppose that the optimal values of the parameters are approximately obtained by minimizing the value of the KLD between the empirical distribution and our model,

θ* = argmin_θ K(P_prior ‖ P_model) ≈ argmin_θ K(Q_D ‖ P_model).    (2.11)

This minimization can be performed if we have complete observations generated from the prior probability. Equation (2.11) can be rewritten as

argmin_θ K(Q_D ‖ P_model) = argmax_θ ∫_{−∞}^{∞} Q_D(x) ln P_model(x | θ) dx.

This corresponds to the MLE in statistics.
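The following short sketch (ours, not from the paper) illustrates this equivalence numerically: minimizing the KLD between the empirical distribution of complete observations and a parametric model amounts to maximizing the average log-likelihood, shown here for a simple one-dimensional Gaussian model whose generating parameters are assumed values.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    data = rng.normal(loc=1.5, scale=0.8, size=1000)   # stand-in for N complete observations

    # Average negative log-likelihood under a 1-D Gaussian model; up to an additive
    # constant this equals K(Q_D || P_model), so minimizing it realizes Eq. (2.11).
    def neg_avg_loglik(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)                      # keeps sigma > 0
        return np.mean(0.5 * ((data - mu) / sigma) ** 2 + np.log(sigma))

    theta_hat = minimize(neg_avg_loglik, x0=np.array([0.0, 0.0])).x
    print("mu* =", theta_hat[0], "sigma* =", np.exp(theta_hat[1]))

The recovered values approach the generating parameters as the number of complete observations N grows.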

In the above scheme, we assumed that there are no missing points in the N observations used in the estimation of the parameters. If the observations include missing points, we would use an alternative strategy and apply the expectation-maximization (EM) algorithm.

From the above arguments, the (parametric) Bayesian reconstruction system is summarized as follows. Before the reconstructions, we design the potential functions in our MRF model P_model(x | θ) and estimate the optimal values of the parameters, θ*, in advance by using many complete observations and Eq. (2.11). Then, the reconstruction is approximately performed using the constructed model P_model(x | θ*) instead of the prior probability in Eq. (2.4), i.e.,

x̂_M = argmax_{x_M} P_likelihood(y | x) P_prior(x) ≈ argmax_{x_M} P_likelihood(y | x) P_model(x | θ*).    (2.12)

Since

argmax_{x_M} P_likelihood(y | x) P_model(x | θ*) = argmax_{x_M} P_model(x_M | x_O = y, θ*),

the suitable reconstructed values of the unobserved missing points, x̂_M, are the values that maximize the conditional probability of our model,

P_model(x_M | x_O, θ*) = P_model(x | θ*) / ∫_{−∞}^{∞} P_model(x | θ*) dx_M,

with x_O fixed by the observation y.

3. Overview of Bayesian Traffic Data Reconstruction

In this section, we give an overview of the application of the Bayesian reconstruction scheme presented in the previous section to the traffic data reconstruction problem, as proposed by the authors [6], together with some new numerical results that were obtained under conditions different from those in Ref. 6.

3.1 MRF model for Bayesian traffic data reconstruction

We applied the Bayesian reconstruction scheme to traffic data reconstruction. Our goal is to reconstruct the states of roads, and therefore the random variables x are assigned to roads. In order to formulate an MRF for a road network, we construct an undirected graph G(V, E) as follows: we assign a vertex to each road and draw an edge between two roads that are connected to each other at a traffic intersection (see Fig. 2).

Fig. 2. Undirected graph representation of a road network. (a) Road network with six roads and two intersections. (b) Vertices are assigned to roads. (c) Edges are drawn between two roads that are connected to each other at intersections.

On the undirected graph, we define the MRF by

P_model(x | θ) = (1/Z(θ)) exp( Σ_{i∈V} h_i x_i − (α/2) Σ_{i∈V} x_i² − (J/2) Σ_{(i,j)∈E} (x_i − x_j)² ),    (3.1)

where θ = {h, α, J} are the parameters of the model†. The variable x_i ∈ R expresses the state of road i. In this paper, we consider x to be traffic densities, according to the method in Ref. 6. The traffic density on road i is defined as the number of cars per unit area on road i; high densities tend to lead to traffic jams. This MRF is obtained by setting φ_i(x_i) = h_i x_i − α x_i²/2 and ψ_ij(x_i, x_j) = −J(x_i − x_j)²/2 in Eq. (2.5). The parameter h_i is a bias that controls the level of the traffic density of road i, and the parameter α > 0 controls the variances of the traffic densities. The interaction term, the last term in the exponent in Eq. (3.1), corresponds to our assumption that the traffic densities of neighboring roads take close values. The parameter J controls the strength of this assumption. This MRF is a multi-dimensional Gaussian distribution and is known as a Gaussian graphical model (GGM).

† Although this expression seems to differ slightly from the original model proposed in Ref. 6, it is essentially equivalent to the original model.
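Because the exponent of Eq. (3.1) is quadratic in x, the joint distribution is a multivariate Gaussian whose precision matrix is α I + J L, where L is the graph Laplacian of the road graph, and whose mean is (α I + J L)^{-1} h. The following sketch builds this precision matrix for a small illustrative graph in the spirit of Fig. 2(c); the edge list and parameter values are assumptions for illustration, not taken from the paper.

    import numpy as np

    # Toy road graph: 6 roads, edges between roads meeting at an intersection.
    n = 6
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    h = np.full(n, 0.5)          # bias parameters h_i (assumed values)
    alpha, J = 1.0, 0.8          # assumed values of the parameters in Eq. (3.1)

    # Graph Laplacian L: the exponent of Eq. (3.1) equals h^T x - (1/2) x^T (alpha*I + J*L) x,
    # so the GGM has precision matrix A = alpha*I + J*L and mean A^{-1} h.
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0

    A = alpha * np.eye(n) + J * L
    mean = np.linalg.solve(A, h)   # expected traffic densities under the model
    cov = np.linalg.inv(A)         # covariances between road densities
    print(mean)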

We represent the relationships among roads by undirected links, ignoring road directions, and model the distribution of traffic densities by a time-independent static distribution. Hence, each node in our model represents the time-independent static distribution of the density on the corresponding road. Our model can represent a time-independent static property of the traffic density on each road, but it cannot represent dynamical flows of traffic densities or dynamical time variations of the traffic densities. In order to consider the time-dependent dynamical properties of roads, we would have to extend our static model to a dynamical model based on a stochastic process.

As in the previous section, we represent the set of unobserved roads by M and the set of observed roads by O. After determining the values of the parameters by the machine learning method in Eq. (2.11), from Eq. (2.12), the reconstructed densities on the unobserved roads are obtained by

x̂_M = argmax_{x_M} P_model(x_M | x_O = y, θ*),    (3.2)

where y represents the densities on the observed roads. Since our model in Eq. (3.1) is a multi-dimensional Gaussian, the conditional probability is also a multi-dimensional Gaussian. Therefore, Eq. (3.2) can be rewritten as

x̂_M = ∫_{−∞}^{∞} x_M P_model(x_M | x_O = y, θ*) dx_M.    (3.3)

Hence, we find that the reconstructed densities x̂_M are the expectations of P_model(x_M | x_O = y, θ*). The expectations are obtained by solving the simultaneous equations

x̂_i = ( h_i + J Σ_{j∈∂(i)} z_j ) / ( α + J |∂(i)| ),   i ∈ M,    (3.4)

by an iteration method, where |A| denotes the number of elements in a set A and

z_j = x̂_j (j ∈ M),   z_j = y_j (j ∈ O).

Equation (3.4) is known as Gaussian belief propagation (GaBP) [7] for P_model(x_M | x_O = y, θ*) and is also known as the Gauss-Seidel method. It is known that, in GGMs, GaBP and the Gauss-Seidel method are equivalent in general and that GaBP always provides exact expectations.
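A minimal sketch of the iteration in Eq. (3.4) is given below; the graph, parameter values, and observed densities are illustrative assumptions, and the loop is a plain Gauss-Seidel sweep over the unobserved vertices.

    import numpy as np

    n = 6
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    alpha, J = 1.0, 0.8
    h = np.full(n, 0.5)
    observed = {0: 0.9, 2: 0.7, 4: 0.1}               # x_O = y (observed traffic densities)
    missing = [i for i in range(n) if i not in observed]

    x = {i: observed.get(i, 0.0) for i in range(n)}   # initial guess for the missing roads
    for _ in range(200):                              # Gauss-Seidel sweeps, Eq. (3.4)
        for i in missing:
            s = sum(x[j] for j in neighbors[i])
            x[i] = (h[i] + J * s) / (alpha + J * len(neighbors[i]))

    print({i: round(x[i], 3) for i in missing})       # reconstructed densities on M

Because the precision matrix of this GGM is strictly diagonally dominant for α > 0, the sweeps converge to the exact conditional means.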
3.2 Results of numerical simulation using the road network of Sendai city

In this section, we describe the application of our Bayesian traffic data reconstruction to the road network of the city of Sendai (shown in Fig. 3) and show the performance of our model. Figure 3 shows the road network of Sendai, which is the same as the road network shown in Ref. 6 and consists of about ten thousand roads.

Fig. 3. Road network of Sendai, Japan. This network consists of about ten thousand roads.

According to our Bayesian traffic data reconstruction scheme, we first defined the MRF model shown in Eq. (3.1) for the road network. The structure of the MRF was constructed according to Fig. 2. The parameters were determined by the maximum likelihood estimation described in Section 2.2 with L2 regularization, in the same setting as in Ref. 6. Regularization is frequently used to avoid over-fitting to noise in the training data. In the maximum likelihood estimation, we used N = 359 complete traffic data sets generated by a traffic simulator named Simulation On Urban road Network with Dynamic route choice (SOUND) [8] for the road network of Sendai. SOUND is a mesoscopic traffic simulation model that can simulate realistic dynamical traffic on given road networks. Although the traffic data were not real, they were presumed to represent the typical behavior of traffic in Sendai. Using the MRF model, we reconstructed the traffic densities of Sendai.

Figure 4 shows the traffic density data that were not used in the learning, namely, the test data. In order to visually represent the densities, we quantized the densities into five stages at intervals of 0.3; each road is colored according to its quantized traffic density.

Fig. 4. True traffic density used in our numerical experiment. Each road is colored according to its traffic density. The network in the left panel is the entire network, and the network in the right panel is an enlarged image of the center of Sendai.

The densities increase in the following order: black, blue, green, yellow, and red. Thus, a road colored black has a density in the interval [0, 0.3]. For the traffic density data shown in Fig. 4, we suppose that the densities on some roads are unobserved, and we randomly select the unobserved roads with probability p = 0.8. Figure 5 shows the positions of the unobserved roads, which are colored red.

Fig. 5. Positions of the unobserved roads, where the unobserved roads are colored red. About 80% of the roads are unobserved. The network in the left panel is the entire network, and the network in the right panel is an enlarged image of the center of Sendai.

We reconstructed the densities on the unobserved roads using the densities on the observed roads, which are colored black in the figure. Our Bayesian reconstruction result is shown in Fig. 6.

Fig. 6. Reconstruction result obtained by using our Bayesian reconstruction, where each road is colored according to its traffic density. The network in the left panel is the entire network, and the network in the right panel is an enlarged image of the central area of Sendai.

The mean square error (MSE) between the true densities shown in Fig. 4 and the reconstructed densities shown in Fig. 6 on the unobserved roads is approximately 0.1447, where the MSE is defined by

[MSE] := (1/t) Σ_{i∈M} (x_i^data − x_i^recon)²,    (3.5)

where x_i^data is the true density on road i and x_i^recon is the density on road i reconstructed by our method. The notation t := |M| denotes the number of missing elements. The scatter plot of this reconstruction is shown in Fig. 7. The correlation coefficient of this scatter plot is approximately 0.919. Since the correlation coefficient is close to one, our simple MRF model can be expected to capture a static statistical property of the traffic data.
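For reference, the two quantities reported above, the MSE of Eq. (3.5) and the correlation coefficient of the scatter plot, can be computed as follows; the arrays are placeholder values, not the experimental data.

    import numpy as np

    x_data = np.array([0.12, 0.45, 0.80, 0.33, 0.95])    # true densities on the missing roads
    x_recon = np.array([0.10, 0.50, 0.70, 0.30, 0.90])   # reconstructed densities

    mse = np.mean((x_data - x_recon) ** 2)               # Eq. (3.5) with t = |M|
    corr = np.corrcoef(x_data, x_recon)[0, 1]            # correlation of the scatter plot
    print(f"MSE = {mse:.4f}, correlation = {corr:.3f}")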

Next, we address the average performance of our reconstruction method versus the value of the missing probability p.

Fig. 7. Scatter plot of the true densities in Fig. 4 (horizontal axis) and the reconstructed densities in Fig. 6 (vertical axis).

Fig. 8. MSE versus the missing probability p. The solid curve is the Bezier interpolation of the points.

Figure 8 shows the MSE versus the value of the missing probability p. Each point is the average value of the MSE over repeated trials using the leave-one-out cross-validation method. It can be seen that the error increases with the value of the missing probability p.

4. A Statistical Mechanical Analysis of Bayesian Traffic Data Reconstruction

In this section, we clarify the relationship between the model parameters in the MRF model in Eq. (3.1) and the reconstruction performance from a statistical mechanical point of view. In our analysis, we assume that the prior probability of the traffic data has the same form as our model in Eq. (3.1),

P_prior(x | θ) = (1/Z(θ)) exp( Σ_{i∈V} h_i x_i − (α/2) Σ_{i∈V} x_i² − (J/2n) Σ_{i<j∈V} (x_i − x_j)² ),    (4.1)

and assume that the values of h are independently drawn from an identical distribution p_h(h), where Z(θ) is the normalization constant and Σ_{i<j∈V} denotes the summation running over all distinct pairs of vertices i and j in V = {1, 2, ..., n}, i.e., Σ_{i<j∈V} = Σ_{i=1}^{n} Σ_{j=i+1}^{n}. Although road networks have complex structures, we neglect the structures and employ the fully-connected model with no structure for simplicity of analysis. For the observations generated from the prior probability in Eq. (4.1), we conduct the reconstructions by using a model of the form

P_model(x | θ*) = (1/Z(θ*)) exp( Σ_{i∈V} β_i x_i − (α*/2) Σ_{i∈V} x_i² − (J*/2n) Σ_{i<j∈V} (x_i − x_j)² ),    (4.2)

where θ* = {β, α*, J*}. The bias parameters in the reconstruction model in Eq. (4.2) are defined by β_i := h_i + ε_i, and we assume that the values of ε are independently drawn from an identical distribution p_ε(ε). If J* = J, α* = α, and ε_i = 0, the prior model and the reconstruction model are equivalent. For an observation with some missing elements, x = {x_M, x_O}, generated from the prior model, the reconstruction using the model in Eq. (4.2) is conducted, as in Eq. (3.3), by

x̂_i(x_O) = ∫_{−∞}^{∞} x_i P_model(x_M | x_O, θ*) dx_M    (4.3)

for all i ∈ M. We measure the statistical performance of the reconstruction by the MSE in Eq. (3.5) averaged over all possible observations generated from the prior probability and over all possible values of the bias parameters h and ε generated from p_h(h) and p_ε(ε), respectively. The averaged MSE is expressed by

E := ∫_{−∞}^{∞} ∫_{−∞}^{∞} E(h, ε) Π_{i∈V} p_h(h_i) p_ε(ε_i) dh dε,    (4.4)

where

E(h, ε) := ∫_{−∞}^{∞} P_prior(x | θ) ( (1/t) Σ_{i∈M} (x_i − x̂_i(x_O))² ) dx    (4.5)

is the MSE averaged over all possible observations for the specific biases. Equation (4.4) represents the MSE of our Bayesian reconstruction averaged over all possible situations that appear under our assumption for the prior model. Since road networks are quite large, we consider the thermodynamic limit of the mean square error by taking the limits n, t → ∞ with p := t/n fixed at a finite constant. The parameter p corresponds to the missing rate; it must be in the interval (0, 1]. In the thermodynamic limit, from Eqs. (4.3) and (A.10), we have

x̂_i(x_O) = (β_i + f(x_O)) / (α* + J*) + p J* ( μ_h + μ_ε + f(x_O) ) / ( (α* + J*)(α* + (1 − p)J*) ),    (4.6)

where

f(x_O) = (J*/n) Σ_{i∈O} x_i    (4.7)

(see Appendix A for the detailed derivation). The notations μ_h and μ_ε are the averages of p_h(h) and p_ε(ε), respectively. From Eq. (A.9), we have

⟨x_i x_j⟩_prior = ⟨x_i⟩_prior ⟨x_j⟩_prior for i ≠ j,   and   ⟨x_i²⟩_prior = 1/(α + J) + ⟨x_i⟩_prior²,

where ⟨·⟩_prior := ∫_{−∞}^{∞} (·) P_prior(x | θ) dx. Thus, Eq. (4.5) can be rewritten as

E(h, ε) = (1/t) Σ_{i∈M} ( ⟨x_i²⟩_prior − 2⟨x_i x̂_i(x_O)⟩_prior + ⟨x̂_i(x_O)²⟩_prior )
        = (1/t) Σ_{i∈M} ( 1/(α + J) + ⟨x_i⟩_prior² − 2⟨x_i⟩_prior ⟨x̂_i(x_O)⟩_prior + ⟨x̂_i(x_O)²⟩_prior ).

Similarly, from Eq. (A.9),

⟨f(x_O)²⟩_prior = (J*²/n²) Σ_{i∈O} Σ_{j∈O} ⟨x_i x_j⟩_prior = (1 − p)J*² / ( (α + J)n ) + ( (J*/n) Σ_{i∈O} ⟨x_i⟩_prior )² → ⟨f(x_O)⟩_prior²   (n → ∞).

This means that f(x_O) coincides with ⟨f(x_O)⟩_prior with probability one. This leads to the relation ⟨x̂_i(x_O)²⟩_prior = ⟨x̂_i(x_O)⟩_prior², and then Eq. (4.5) is rewritten as

E(h, ε) = 1/(α + J) + (1/t) Σ_{i∈M} ( ⟨x_i⟩_prior − ⟨x̂_i(x_O)⟩_prior )².    (4.8)

Since

⟨f(x_O)⟩_prior → (1 − p)J* μ_h / α   (n → ∞)

from Eq. (A.8), we obtain the average of x̂_i(x_O) in Eq. (4.6) with respect to the prior probability as

⟨x̂_i(x_O)⟩_prior = β_i / (α* + J*) + (1 − p)J* μ_h / ( α(α* + J*) ) + p J* ( μ_h + μ_ε + (1 − p)J* μ_h / α ) / ( (α* + J*)(α* + (1 − p)J*) ).    (4.9)

By using Eqs. (4.4), (4.8), (4.9), and (A.8), with a straightforward calculation, we finally obtain the explicit form of Eq. (4.4) as

E = 1/(α + J) + σ_h² ( 1/(α + J) − 1/(α* + J*) )² + σ_ε² / (α* + J*)² + ( (α* − α) μ_h − α μ_ε )² / ( α² (α* + (1 − p)J*)² ),    (4.10)

where σ_h² and σ_ε² denote the variances of p_h(h) and p_ε(ε), respectively. Equation (4.10) does not require information about which roads are selected as the unobserved ones; the dependency on this choice disappears as a result of the averaging operations.
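As a sanity check of the analysis (our sketch, not part of the paper), the averaged MSE of Eq. (4.4) can also be estimated by direct Monte Carlo simulation: sample x from the fully-connected prior (4.1), hide each element with probability p, reconstruct the hidden elements by the conditional mean of the model (4.2), and average the squared error. For moderately large n the estimate should roughly agree with Eq. (4.10). All parameter values below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)

    n, p = 200, 0.5
    alpha, J, mu_h, sigma_h = 1.0, 1.0, 1.0, 0.5           # prior parameters (assumed)
    alpha_s, J_s, mu_e, sigma_e = 1.2, 1.0, 0.1, 0.0       # reconstruction model with a model error

    def precision(a, j):
        # exponent of Eqs. (4.1)-(4.2): -(1/2) x^T A x + b^T x with A = (a + j) I - (j/n) 11^T
        return (a + j) * np.eye(n) - (j / n) * np.ones((n, n))

    mses = []
    for _ in range(50):
        h = rng.normal(mu_h, sigma_h, size=n)
        eps = rng.normal(mu_e, sigma_e, size=n)

        A = precision(alpha, J)
        x = rng.multivariate_normal(np.linalg.solve(A, h), np.linalg.inv(A))   # sample from the prior (4.1)

        miss = rng.random(n) < p                      # unobserved roads (missing rate p)
        A_s = precision(alpha_s, J_s)
        m_s = np.linalg.solve(A_s, h + eps)           # mean of the reconstruction model (4.2)

        # conditional mean of x_M given x_O under the reconstruction model, Eq. (4.3)
        L_mm = A_s[np.ix_(miss, miss)]
        L_mo = A_s[np.ix_(miss, ~miss)]
        x_hat = m_s[miss] - np.linalg.solve(L_mm, L_mo @ (x[~miss] - m_s[~miss]))

        mses.append(np.mean((x[miss] - x_hat) ** 2))

    print("Monte Carlo averaged MSE:", np.mean(mses))
    print("mean-field prediction, Eq. (4.10):",
          1 / (alpha + J)
          + sigma_h**2 * (1 / (alpha + J) - 1 / (alpha_s + J_s))**2
          + sigma_e**2 / (alpha_s + J_s)**2
          + ((alpha_s - alpha) * mu_h - alpha * mu_e)**2
          / (alpha**2 * (alpha_s + (1 - p) * J_s)**2))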

It is obvious that E takes the minimum value

E_min = 1/(α + J)    (4.11)

when there is no model error between the prior model in Eq. (4.1) and the reconstruction model in Eq. (4.2), that is, when J* = J, α* = α, and μ_ε = σ_ε² = 0. In the following, we examine numerically the relation between the averaged MSE and the parameters in the reconstruction model. For the numerical experiments, we set the parameters of the prior model to J = 1, μ_h = 1, and σ_h² = 0.5, with α fixed.

First, we examine the relation when a model error exists only in the interaction parameter, that is, when α* = α, μ_ε = σ_ε² = 0, and J* = J + r. The parameter r is the error in the interaction. Figure 9 shows the plot of E against the error r when p = 0.5.

Fig. 9. Plot of E against the error r when α* = α, μ_ε = σ_ε² = 0, p = 0.5, and J* = J + r. The vertical axis is E in Eq. (4.10). The inset is an enlarged plot around r = 0.

It can be seen that, when J* is larger than J, the performance level is relatively robust. In contrast, the performance level drastically decreases when J* is smaller than J. The presented model reconstructs the densities of the missing roads by using information from the surrounding roads, which comes in through the interaction. When the interaction is quite weak, the nodes corresponding to the missing roads cannot receive enough information to be reconstructed. In consequence, the performance level decreases.

Next, we examine the relation when a model error exists in the bias and variance parameters. Figures 10 and 11 show the plots of E against the error r in the variance parameter and in the bias parameters, respectively, when p = 0.5. Figure 10 shows the plot when a model error exists only in the variance parameter, that is, when α* = α + r, μ_ε = σ_ε² = 0, and J* = J.

Fig. 10. Plot of E against the error r when α* = α + r, μ_ε = σ_ε² = 0, p = 0.5, and J* = J. The vertical axis is E in Eq. (4.10).

We see that the error in the variance parameter drastically decreases the performance level. Figure 11 shows the plot when a model error exists in the bias parameters. The solid line shows the plot when a model error exists only in the mean value of the error distribution of the biases, p_ε(ε), and the dotted line shows the plot when a model error exists only in the variance of the error distribution of the biases. The reconstruction performance level monotonically decreases with increasing error r in Fig. 11. From Eq. (4.10), it can be seen that the dependency on the missing rate p arises when model errors exist in the bias parameters or in the variance parameters α and α*. Here, we consider the case where model errors exist in both the bias parameters and the variance parameters.

Fig. 11. Plot of E against the error r when α* = α, p = 0.5, and J* = J. The solid line is the plot when μ_ε = r and σ_ε² = 0, and the dotted line is the plot when μ_ε = 0 and σ_ε² = r. The vertical axis is E in Eq. (4.10).

Fig. 12. Plot of E against the missing rate p when α* = 0.4, μ_ε = 0.1, σ_ε² = 0, and J* = J. The vertical axis is E in Eq. (4.10).

Figure 12 shows the plot of E against the missing rate p when α* = 0.4, μ_ε = 0.1, σ_ε² = 0, and J* = J. The reconstruction performance level decreases with the increase in the value of p. This performance behavior seems to be qualitatively similar to the behavior of our numerical traffic density reconstruction in Fig. 8. From this result, we can presume that the behavior of the MSE in Fig. 8 is caused primarily by model errors in the biases, in the variance, or in both. In Fig. 13, we show the regression result obtained by fitting the mean-field result in Eq. (4.10) to the numerical reconstruction result shown in Fig. 8.

Fig. 13. Regression result for E obtained by using least squares. The crosses (+) are the results of the numerical simulation of the Bayesian reconstruction shown in Fig. 8, and the curve is the fitted mean-field theory.

The regression shows an excellent fit to the result of the numerical simulation of the Bayesian reconstruction. The parameters in our reconstruction model were determined from the training data as described in Section 3.2. However, in the training, it was not easy to find the truly optimal values of the parameters from the training data, because the number of training data was much smaller than the number of parameters. This could be one of the reasons why the model errors exist.

5. Conclusion

In this paper, we introduced the Bayesian reconstruction framework, in which missing data are probabilistically interpolated, and gave an overview of the application of Bayesian reconstruction to the problem of traffic data reconstruction in the field of traffic engineering. Our traffic reconstruction model in Eq. (3.1) neglects some real traffic properties such as traffic lanes and contraflows. Nevertheless, the results of our reconstruction seem to be accurate. It can be expected that our simple GGM captures the static statistical property of the traffic data and that it can be used as an important base model for Bayesian traffic reconstruction. The extension of our model by taking real traffic properties into account should be addressed in a future study.

In the latter part of this paper, we evaluated the statistical performance of our reconstruction by using a statistical mechanical analysis, that is, a mean-field analysis. In our analysis, we used a simplified reconstruction model, which has no network structure, for the convenience of the calculations. However, since real road networks do have network structures, the result of our evaluation can only be a rough approximation. The authors previously proposed a method based on belief propagation to evaluate the statistical properties of Bayesian reconstructions on structured networks in the context of image processing [9]. We strongly suggest that this method can be applied to Bayesian traffic reconstruction and can lead to more realistic evaluations.

Appendix A. Mean-field Analysis for the Traffic Data Reconstruction Model

The partition function in Eq. (4.1) is

Z(θ) = ∫_{−∞}^{∞} exp( Σ_{i∈V} h_i x_i − (α/2) Σ_{i∈V} x_i² − (J/2n) Σ_{i<j∈V} (x_i − x_j)² ) dx
     = ∫_{−∞}^{∞} exp( Σ_{i∈V} h_i x_i − (1/2)( α + (n − 1)J/n ) Σ_{i∈V} x_i² + (J/n) Σ_{i<j∈V} x_i x_j ) dx.

By using the Hubbard-Stratonovich transformation, we have

Z(θ) = ∫_{−∞}^{∞} exp( Σ_{i∈V} h_i x_i − ((α + J)/2) Σ_{i∈V} x_i² + (J/2n) ( Σ_{i∈V} x_i )² ) dx
     = √(Jn/2π) ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp( Σ_{i∈V} (h_i + Jz) x_i − ((α + J)/2) Σ_{i∈V} x_i² − (Jn/2) z² ) dx dz.

The Gaussian integrals lead to

Z(θ) = √( (α + J)/α ) ( 2π/(α + J) )^{n/2} exp( (1/(2(α + J))) Σ_{i∈V} h_i² + ( J/(2nα(α + J)) ) ( Σ_{i∈V} h_i )² ).    (A.1)

Therefore, the free energy (per variable) for the prior model in Eq. (4.1) is expressed by

F(θ) := −(1/n) ln Z(θ) = −(1/2n) ln( (α + J)/α ) − (1/2) ln( 2π/(α + J) ) − (1/(2n(α + J))) Σ_{i∈V} h_i² − ( J/(2n²α(α + J)) ) ( Σ_{i∈V} h_i )²,    (A.2)

so that the first- and second-order moments of the prior model are given by

⟨x_i⟩_prior = ∂ ln Z(θ)/∂h_i = h_i/(α + J) + ( J/(nα(α + J)) ) Σ_{j∈V} h_j    (A.3)

and

⟨x_i x_j⟩_prior = ∂⟨x_i⟩_prior/∂h_j + ⟨x_i⟩_prior ⟨x_j⟩_prior = δ_ij/(α + J) + J/(nα(α + J)) + ⟨x_i⟩_prior ⟨x_j⟩_prior,    (A.4)

respectively, where ⟨·⟩_prior = ∫_{−∞}^{∞} (·) P_prior(x | θ) dx and δ_ij is the Kronecker delta.

Next, we find the free energy for the conditional probability of the reconstruction model in Eq. (4.2). The conditional probability is expressed by

P_model(x_M | x_O, θ*) = (1/Z(θ*, x_O)) exp( Σ_{i∈M} (β_i + f(x_O)) x_i − (1/2)( α* + (n − 1)J*/n ) Σ_{i∈M} x_i² + (J*/n) Σ_{i<j∈M} x_i x_j ),

where f(x_O) is defined in Eq. (4.7), Σ_{i<j∈M} represents the summation running over all distinct pairs of vertices i and j in M ⊆ V, and Z(θ*, x_O) represents the corresponding partition function.
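Equation (A.3) is an exact property of the fully-connected GGM and can be checked numerically against the mean obtained by solving the linear system defined by the precision matrix (α + J)I − (J/n)11^T of Eq. (4.1); the sketch below, with assumed parameter values, does this.

    import numpy as np

    rng = np.random.default_rng(2)
    n, alpha, J = 50, 1.0, 0.7
    h = rng.normal(1.0, 0.5, size=n)

    A = (alpha + J) * np.eye(n) - (J / n) * np.ones((n, n))   # precision matrix of Eq. (4.1)
    exact_mean = np.linalg.solve(A, h)
    a3_mean = h / (alpha + J) + J * h.sum() / (n * alpha * (alpha + J))   # Eq. (A.3)

    print(np.max(np.abs(exact_mean - a3_mean)))   # of the order of machine precision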

The partition function Z(θ*, x_O) is defined by

Z(θ*, x_O) := ∫_{−∞}^{∞} exp( Σ_{i∈M} (β_i + f(x_O)) x_i − (1/2)( α* + (n − 1)J*/n ) Σ_{i∈M} x_i² + (J*/n) Σ_{i<j∈M} x_i x_j ) dx_M.

Using almost the same derivation as the above mean-field derivation for the prior model, we find

Z(θ*, x_O) = √( (α* + J*)/(α* + (1 − p)J*) ) ( 2π/(α* + J*) )^{t/2} exp( (1/(2(α* + J*))) Σ_{i∈M} (β_i + f(x_O))² + ( pJ*/(2t(α* + J*)(α* + (1 − p)J*)) ) ( Σ_{i∈M} (β_i + f(x_O)) )² ),    (A.5)

where t is the number of missing elements defined in Section 4 and p = t/n is associated with the missing rate. From Eq. (A.5), we can obtain the free energy (per variable) for the conditional probability of the reconstruction model,

F*(θ*, x_O) := −(1/t) ln Z(θ*, x_O) = −(1/2t) ln( (α* + J*)/(α* + (1 − p)J*) ) − (1/2) ln( 2π/(α* + J*) ) − (1/(2t(α* + J*))) Σ_{i∈M} (β_i + f(x_O))² − ( pJ*/(2t²(α* + J*)(α* + (1 − p)J*)) ) ( Σ_{i∈M} (β_i + f(x_O)) )²,    (A.6)

so that the first-order moments of the conditional probability of the reconstruction model are given by

⟨x_i⟩_{M|O} = −t ∂F*(θ*, x_O)/∂β_i = (β_i + f(x_O))/(α* + J*) + ( pJ*/(t(α* + J*)(α* + (1 − p)J*)) ) Σ_{j∈M} (β_j + f(x_O))    (A.7)

for all i ∈ M, where we define ⟨·⟩_{M|O} := ∫_{−∞}^{∞} (·) P_model(x_M | x_O, θ*) dx_M.

We consider the thermodynamic limit by taking the limits n, t → ∞, where p is fixed at a finite constant. In the thermodynamic limit, the free energies in Eqs. (A.2) and (A.6) reduce to

F(θ) = −(1/2) ln( 2π/(α + J) ) − ( σ_h² + μ_h² )/(2(α + J)) − J μ_h²/(2α(α + J))

and

F*(θ*, x_O) = −(1/2) ln( 2π/(α* + J*) ) − ( σ_h² + σ_ε² + (μ_h + μ_ε + f(x_O))² )/(2(α* + J*)) − pJ* (μ_h + μ_ε + f(x_O))²/(2(α* + J*)(α* + (1 − p)J*)),

respectively, where μ_h and σ_h² are the average and variance of p_h(h), respectively, and μ_ε and σ_ε² are the average and variance of p_ε(ε), respectively. The quantities {μ_h, μ_ε, σ_h², σ_ε²} appear because of the law of large numbers. Similarly, the moments in Eqs. (A.3), (A.4), and (A.7) reduce to

⟨x_i⟩_prior = h_i/(α + J) + J μ_h/(α(α + J)),    (A.8)

⟨x_i x_j⟩_prior = δ_ij/(α + J) + ⟨x_i⟩_prior ⟨x_j⟩_prior,    (A.9)

and

⟨x_i⟩_{M|O} = (β_i + f(x_O))/(α* + J*) + pJ* ( μ_h + μ_ε + f(x_O) )/( (α* + J*)(α* + (1 − p)J*) ),    (A.10)

respectively, in the thermodynamic limit.

Acknowledgment

The authors are very grateful to Professor Masao Kuwahara and Dr. Jinyoung Kim of the Graduate School of Information Sciences, Tohoku University, for providing the road network data and the traffic simulation data. This work was partially supported by Grants-in-Aid from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

REFERENCES

[1] Bertalmío, M., Sapiro, G., Caselles, V., and Ballester, C., Image inpainting, Proceedings of SIGGRAPH 2000 (2000).
[2] Yasuda, M., and Tanaka, K., Mean field approximation for fields of experts, Interdisciplinary Information Sciences, 19 (2013).
[3] Yasuda, M., Ohkubo, J., and Tanaka, K., Digital image inpainting based on Markov random field, Proceedings of the 2005 International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA 2005) (2005).
[4] Yasuda, M., Ohkubo, J., and Tanaka, K., Digital image inpainting algorithm by using Gaussian graphical model, Information Technology Letters (Proceedings of the Forum of Information Technology (FIT) 2006), 5 (2006) (in Japanese).
[5] Roth, S., and Black, M. J., Fields of experts, International Journal of Computer Vision, 82: 205–229 (2009).
[6] Kataoka, S., Yasuda, M., Furtlehner, C., and Tanaka, K., Traffic data reconstruction based on Markov random field modeling, Inverse Problems, 30: 025003 (2014).
[7] Weiss, Y., and Freeman, W. T., Correctness of belief propagation in Gaussian graphical models of arbitrary topology, Neural Computation, 13: 2173–2200 (2001).
[8] SOUND is commercialized by i-Transport Lab. Co., Ltd.
[9] Kataoka, S., Yasuda, M., and Tanaka, K., Statistical analysis of Gaussian image inpainting problems, Journal of the Physical Society of Japan, 81 (2012).


More information

Graphical Models for Collaborative Filtering

Graphical Models for Collaborative Filtering Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

Density Propagation for Continuous Temporal Chains Generative and Discriminative Models

Density Propagation for Continuous Temporal Chains Generative and Discriminative Models $ Technical Report, University of Toronto, CSRG-501, October 2004 Density Propagation for Continuous Temporal Chains Generative and Discriminative Models Cristian Sminchisescu and Allan Jepson Department

More information

CSC 412 (Lecture 4): Undirected Graphical Models

CSC 412 (Lecture 4): Undirected Graphical Models CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:

More information

Probabilistic Graphical Models Lecture Notes Fall 2009

Probabilistic Graphical Models Lecture Notes Fall 2009 Probabilistic Graphical Models Lecture Notes Fall 2009 October 28, 2009 Byoung-Tak Zhang School of omputer Science and Engineering & ognitive Science, Brain Science, and Bioinformatics Seoul National University

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions

Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions - Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions Simon Luo The University of Sydney Data61, CSIRO simon.luo@data61.csiro.au Mahito Sugiyama National Institute of

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Statistical learning. Chapter 20, Sections 1 4 1

Statistical learning. Chapter 20, Sections 1 4 1 Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the

More information

Active and Semi-supervised Kernel Classification

Active and Semi-supervised Kernel Classification Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),

More information

10-701/ Machine Learning, Fall

10-701/ Machine Learning, Fall 0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane c Ghane/Mori 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 5 5 5 4 6 8 4 4 6 8 4 5 5 5 5 5 5 µ, Σ) α f Learningscale is slightly Parameters is slightly larger larger

More information

Random Field Models for Applications in Computer Vision

Random Field Models for Applications in Computer Vision Random Field Models for Applications in Computer Vision Nazre Batool Post-doctorate Fellow, Team AYIN, INRIA Sophia Antipolis Outline Graphical Models Generative vs. Discriminative Classifiers Markov Random

More information

Path and travel time inference from GPS probe vehicle data

Path and travel time inference from GPS probe vehicle data Path and travel time inference from GPS probe vehicle data Timothy Hunter Department of Electrical Engineering and Computer Science University of California, Berkeley tjhunter@eecs.berkeley.edu Pieter

More information

Figure : Learning the dynamics of juggling. Three motion classes, emerging from dynamical learning, turn out to correspond accurately to ballistic mot

Figure : Learning the dynamics of juggling. Three motion classes, emerging from dynamical learning, turn out to correspond accurately to ballistic mot Learning multi-class dynamics A. Blake, B. North and M. Isard Department of Engineering Science, University of Oxford, Oxford OX 3PJ, UK. Web: http://www.robots.ox.ac.uk/vdg/ Abstract Standard techniques

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 9: Expectation Maximiation (EM) Algorithm, Learning in Undirected Graphical Models Some figures courtesy

More information

Stochastic Spectral Approaches to Bayesian Inference

Stochastic Spectral Approaches to Bayesian Inference Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

p(d θ ) l(θ ) 1.2 x x x

p(d θ ) l(θ ) 1.2 x x x p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Introduction to Bayesian Learning. Machine Learning Fall 2018

Introduction to Bayesian Learning. Machine Learning Fall 2018 Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability

More information

Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji

Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient

More information

ECE521 Lectures 9 Fully Connected Neural Networks

ECE521 Lectures 9 Fully Connected Neural Networks ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance

More information

Intelligent Systems:

Intelligent Systems: Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition

More information