Model Complexity of Pseudo-independent Models

Jae-Hyuck Lee and Yang Xiang
Department of Computing and Information Science
University of Guelph, Guelph, Canada
{jaehyuck,

Copyright (c) 2005, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

A type of problem domain known as the pseudo-independent (PI) domain poses difficulty for common probabilistic learning methods based on single-link lookahead search. To learn models of this type of domain, a learning method based on multi-link lookahead search is needed. An improved result can be obtained by incorporating model complexity into the scoring metric, so that model accuracy is explicitly traded off against complexity (and vice versa) when the best model is selected among candidates at each learning step. To implement such a scoring metric for PI learning, a complexity formula for PI models is required. Previous studies found the complexity formula for full PI models, one of the three major types of PI models (the other two are partial and mixed PI models). This study presents the complexity formula for atomic partial PI models, that is, partial PI models that contain no embedded PI submodels. The complexity is obtained by arithmetic operations on the spaces of the domain variables. The new formula provides the basis for further characterizing the complexity of non-atomic PI models, which contain embedded PI submodels in their domains.

Keywords: probabilistic reasoning, knowledge discovery, data mining, machine learning, belief networks, model complexity.

Introduction

Learning probabilistic networks (Lam & Bacchus 1994; Cooper & Herskovits 1992; Heckerman, Geiger, & Chickering 1995; Friedman, Murphy, & Russell 1998) has been an active area of research. The task of learning such networks is NP-hard (Chickering, Geiger, & Heckerman 1995). Therefore, learning algorithms use a heuristic search, and the common search method is the single-link lookahead, which generates network structures that differ by a single link at each level of the search.

Pseudo-independent (PI) models (Xiang, Wong, & Cercone 1996) are a class of probabilistic domain models in which a group of marginally independent variables displays collective dependency. PI models can be classified into three types, full, partial, and mixed PI models, based on the pattern of marginal independency and the extent of collective dependency. The most restrictive type is the full PI model, where variables in every proper subset are marginally independent. This is relaxed in partial PI models, where not every proper subset of variables is marginally independent. While in full or partial PI models all variables in the domain are collectively dependent, this is not the case in mixed PI models, where only proper subsets of the domain variables, called embedded PI subdomains, are collectively dependent. However, the marginal independency pattern of each embedded PI subdomain in a mixed PI model is that of either a full or a partial PI model.

PI models cannot be learned by the single-link lookahead search because the underlying collective dependency cannot be recovered. Incorrectly learned models introduce silent errors when used for decision making. To learn PI models, a more sophisticated search method called multi-link lookahead (Xiang, Wong, & Cercone 1997) should be used. It was implemented in the learning algorithm RML (Xiang et al. 2000). The algorithm is equipped with the Kullback-Leibler cross entropy as the scoring metric for the goodness of fit to data.
The scoring metric of the learning algorithm can be improved by combining cross entropy, as a measure of model accuracy, with a measure of model complexity. A weighted sum of the two measures, as in the illustrative sketch below, is a simple way to combine them, and other alternatives are also possible. With such a scoring metric, of two models with the same accuracy (as measured by cross entropy), the less complex one receives the higher score and is preferred. The focus of this paper is the assessment of model complexity; we defer the details of the combination to future work.

Model complexity is defined as the number of parameters required to fully specify a model. In previous work (Xiang 1997), a formula was presented for estimating the number of parameters in full PI models, one of the three types of PI models (the other two are partial and mixed PI models). However, that formula was very complex and did not show the structural dependence relationships among parameters. A new, concise formula for full PI models was recently presented (Xiang, Lee, & Cercone 2004) using a hypercube representation (Xiang, Lee, & Cercone 2003).

In this study, we present the model complexity formula for atomic partial PI models, that is, partial PI models that contain no embedded PI submodels. Atomic partial PI models are the building blocks of mixed PI models.
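
For concreteness, the sketch below shows one way such a combined score could be formed. It is only an illustration of the weighted-sum idea mentioned above; the weight alpha and the function name are our own assumptions, not the metric used by RML or proposed in this paper.

```python
def combined_score(cross_entropy, num_params, alpha=0.9):
    """Illustrative combined scoring metric (an assumed form, not RML's).

    cross_entropy: Kullback-Leibler cross entropy of the candidate model
                   with respect to the data (lower is better).
    num_params:    model complexity, i.e. the number of independent
                   parameters needed to specify the candidate model.
    alpha:         assumed trade-off weight between accuracy and complexity.
    """
    # Lower cross entropy and lower complexity both raise the score, so of
    # two equally accurate candidates the less complex one is preferred.
    return -(alpha * cross_entropy + (1.0 - alpha) * num_params)
```
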

The new formula is simple in form and provides good insight into the structural dependency relationships among the parameters of partial PI models. Furthermore, the previous complexity formula for full PI models is integrated into this new formula: when the conditions of full PI models are substituted in, the new formula reduces to the formula for full PI models, which also confirms that full PI models are a special case of partial PI models. In addition, the new formula provides the basis for further characterizing the complexity of mixed PI models. We apply the hypercube method to show how the complexity of partial PI models can be obtained from the spaces of the variables.

Background

Let V be a set of n discrete variables X_1, ..., X_n (in what follows we focus on domains of finite, discrete variables). Each variable X_i has a finite space S_i = {x_{i,1}, x_{i,2}, ..., x_{i,D_i}} of cardinality D_i. The space of a set V of variables is defined as the Cartesian product of the spaces of all variables in V, that is, S_V = S_1 × ... × S_n (or ∏_i S_i). Thus S_V contains the tuples made of all possible combinations of values of the variables in V. Each tuple is called a configuration of V, denoted by (x_1, ..., x_n).

Let P(X_i) denote the probability function over X_i and P(x_i) denote the probability value P(X_i = x_i). The following axiom of probability is called the total probability law:

    P(S_i) = 1,  or  P(x_{i,1}) + P(x_{i,2}) + ... + P(x_{i,D_i}) = 1.    (1)

A probabilistic domain model (PDM) M over V defines the probability values of every configuration of every subset A ⊆ V. P(V), or P(X_1, ..., X_n), refers to the joint probability distribution (JPD) function over X_1, ..., X_n, and P(x_1, ..., x_n) refers to the joint probability value of the configuration (x_1, ..., x_n). The probability function P(A) over a proper subset A ⊂ V is the marginal probability distribution (MPD) function over A. If A = {X_1, ..., X_m} (A ⊂ V), then P(x_1, ..., x_m) refers to a marginal probability value.

The probability values that directly specify a PDM are called the parameters of the PDM. A joint probability value P(x_1, ..., x_n) is referred to as a joint parameter, or joint, and a marginal probability value P(x_1, ..., x_m) as a marginal parameter, or marginal. Among the parameters associated with a PDM, some can be derived from others by using constraints such as the total probability law. Such derivable parameters are called constrained or dependent parameters, while underivable parameters are called unconstrained, free, or independent parameters. The number of independent parameters of a PDM is called the model complexity of the PDM, denoted ω.

When no information about constraints on a general PDM is given, the PDM can be specified only by joint parameters. The following ω_g gives the number of joint parameters required. Let M be a general PDM over V = {X_1, ..., X_n}. Then the number of independent parameters of M is upper-bounded by

    ω_g = (∏_{i=1}^{n} D_i) − 1.    (2)

One joint is dependent because it can be derived from the others by the total probability law (Eq. (1)).

For any three disjoint subsets of variables A, B and C in V, A and B are called conditionally independent given C, denoted I(A, B | C), iff P(A | B, C) = P(A | C) for all values of A, B and C such that P(B, C) > 0. Given subsets of variables A, B, C, D ⊆ V, the following property of conditional independence is called Symmetry:

    I(A, B | C) ⟺ I(B, A | C);    (3)

and the following is called Composition:

    I(A, B | C) ∧ I(A, D | C) ⟹ I(A, B ∪ D | C).    (4)
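
As a quick numerical illustration of Eq. (2) above, the sketch below counts the joint parameters of a general PDM from the variable cardinalities; for the three-variable domain with cardinalities 3, 3 and 2 used in the worked example later in the paper it gives 17. The function name is our own.

```python
from math import prod

def omega_general(cardinalities):
    """Upper bound of Eq. (2) on the independent parameters of a general
    PDM: the number of joint probabilities minus the one joint made
    dependent by the total probability law."""
    return prod(cardinalities) - 1

# Three variables with cardinalities D_1 = 3, D_2 = 3, D_3 = 2, as in the
# worked example later in the paper.
print(omega_general([3, 3, 2]))  # 3 * 3 * 2 - 1 = 17
```
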
Two disjoint subsets of variables A and B are said to be marginally independent, denoted I(A, B | ∅), iff P(A | B) = P(A) for all values of A and B such that P(B) > 0. If two subsets of variables are marginally independent, no dependency exists between them; hence each subset can be modeled independently without losing information. If each variable X_i in a set A is marginally independent of the rest, the variables in A are said to be marginally independent. The probability distribution over a set of marginally independent variables can be written as the product of the marginals of the individual variables, that is, P(A) = ∏_{X_i ∈ A} P(X_i).

Variables in a set A are called generally dependent if P(B | A ∖ B) ≠ P(B) for every proper subset B ⊂ A. If a subset of variables is generally dependent, its proper subsets cannot be modeled independently without losing information. Variables in A are collectively dependent if, for each proper subset B ⊂ A, there exists no proper subset C ⊂ A ∖ B that satisfies P(B | A ∖ B) = P(B | C). Collective dependence prevents conditional independence and modeling through proper subsets of variables.

A pseudo-independent (PI) model is a PDM in which proper subsets of a set of collectively dependent variables display marginal independence (Xiang, Wong, & Cercone 1997).

Definition 1 (Full PI model). A PDM over a set V (|V| ≥ 3) of variables is a full PI model if the following properties (called the axioms of full PI models) hold:
(S_I)  Variables in any proper subset of V are marginally independent.
(S_II) Variables in V are collectively dependent.

The complexity of full PI models is given as follows:

Theorem 2 (Complexity of full PI models; Xiang, Lee, & Cercone 2004). Let a PDM M be a full PI model over V = {X_1, ..., X_n}. Then the number of independent parameters of M is upper-bounded by

    ω_f = ∑_{i=1}^{n} (D_i − 1) + ∏_{i=1}^{n} (D_i − 1).    (5)
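
To make Eq. (5) concrete, the sketch below (an illustrative helper of our own, not code from the paper) computes the full PI bound and contrasts it with the general bound of Eq. (2).

```python
from math import prod

def omega_full_pi(cardinalities):
    """Upper bound of Eq. (5) for a full PI model: the sum of (D_i - 1)
    marginal parameters plus a product of (D_i - 1) remaining joint
    parameters."""
    marginal_term = sum(d - 1 for d in cardinalities)
    joint_term = prod(d - 1 for d in cardinalities)
    return marginal_term + joint_term

cards = [3, 3, 2]
print(omega_full_pi(cards))  # (2 + 2 + 1) + (2 * 2 * 1) = 9
print(prod(cards) - 1)       # general PDM bound of Eq. (2) = 17
```
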

The axiom (S_I) of marginal independence is relaxed in partial PI models, which are defined through a marginally independent partition.

Definition 3 (Marginally independent partition). Let V (|V| ≥ 3) be a set of variables and B = {B_1, ..., B_m} (m ≥ 2) be a partition of V. B is a marginally independent partition if, for every subset A = {X_{i,k} | X_{i,k} ∈ B_k, for k = 1, ..., m}, the variables in A are marginally independent. Each partition block B_i in B is called a marginally independent block.

A marginally independent partition of V groups the variables in V into m marginally independent blocks. The defining property of marginally independent blocks is that if a subset A is formed by taking one element from each of several different blocks, then the variables in A are always marginally independent. In a partial PI model it is not necessary that every proper subset be marginally independent.

Definition 4 (Partial PI model). A PDM over a set V (|V| ≥ 3) of variables is a partial PI model on V if the following properties (called the axioms of partial PI models) hold:
(S_I)  V can be partitioned into two or more marginally independent blocks.
(S_II) Variables in V are collectively dependent.

The following definitions of the maximum marginally independent partition are needed later for obtaining the complexity of partial PI models:

Definition 5 (Maximum partition and minimum block). Let B = {B_1, ..., B_m} be a marginally independent partition of a partial PI model over V. B is a maximum marginally independent partition if there exists no marginally independent partition B' of V such that |B| < |B'|. The blocks of a maximum marginally independent partition are called the minimum marginally independent blocks, or minimum blocks.

Complexity of atomic partial PI models

The following defines atomic partial PI models:

Definition 6 (Atomic PI model). A PDM M over a set V (|V| ≥ 3) of variables is an atomic PI model if M is either a full or a partial PI model and no collective dependency exists in any proper subset of V.

Because of the dependency within each block, the complexity of partial PI models is higher than that of full PI models, but lower than that of general PDMs with variables of the same space cardinalities, as we will see later.

The following lemma states that in a PDM that satisfies Composition (Eq. (4)), if every pair of variables taken from two subsets is marginally independent, then the two subsets are marginally independent.

Lemma 7 (Marginal independence of subsets). Let M be a PDM over V where Composition holds in every subset. Let B_α = {Y_1, ..., Y_s} and B_β = {Z_1, ..., Z_t} denote any two disjoint nonempty subsets of variables in V. If I(Y_i, Z_j | ∅) holds for every pair (Y_i, Z_j), then

    I(B_α, B_β | ∅).    (6)

Proof. We prove that I(Y_i, Z_j | ∅) for every (Y_i, Z_j) implies I(Y_i, B_β | ∅), and that I(Y_i, B_β | ∅) for every Y_i implies I(B_α, B_β | ∅). Applying Composition recursively from I(Y_i, Z_1 | ∅) to I(Y_i, Z_t | ∅) gives I(Y_i, {Z_1, ..., Z_t} | ∅), that is, I(Y_i, B_β | ∅). By Symmetry (Eq. (3)), I(Y_i, B_β | ∅) is equivalent to I(B_β, Y_i | ∅). In the same manner, applying Composition recursively from I(B_β, Y_1 | ∅) to I(B_β, Y_s | ∅) gives I(B_β, {Y_1, ..., Y_s} | ∅), that is, I(B_β, B_α | ∅), which by Symmetry is equivalent to I(B_α, B_β | ∅).

Lemma 7 assumes that the PDM satisfies Composition. Composition implies that there is no collective dependency in the PDM, as follows:

Lemma 8 (Composition implies no collective dependency). Let M be a PDM over V that satisfies Composition. Then no collective dependency exists in V.

Proof. The lemma follows directly from the definition of Composition (Eq. (4)).
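
The premise of Lemma 7, that every pair of variables drawn from two different blocks is marginally independent, is straightforward to test numerically when a JPD is available. The sketch below is an illustrative helper under our own representation assumptions (the JPD as a numpy array whose axes are the variables, and blocks as lists of axis indices); it checks only this pairwise condition, not the full transversal condition of Definition 3.

```python
import numpy as np
from itertools import combinations

def pairwise_block_independence(jpd, blocks, tol=1e-9):
    """Check that every pair of variables taken from two different blocks
    is marginally independent, i.e. P(X_i, X_j) = P(X_i) P(X_j)."""
    n = jpd.ndim
    for b1, b2 in combinations(range(len(blocks)), 2):
        for i in blocks[b1]:
            for j in blocks[b2]:
                other_axes = tuple(k for k in range(n) if k not in (i, j))
                p_ij = jpd.sum(axis=other_axes)        # P(X_i, X_j)
                if i > j:                              # numpy keeps axis order,
                    p_ij = p_ij.T                      # so make rows X_i, cols X_j
                p_i = p_ij.sum(axis=1, keepdims=True)  # P(X_i)
                p_j = p_ij.sum(axis=0, keepdims=True)  # P(X_j)
                if not np.allclose(p_ij, p_i * p_j, atol=tol):
                    return False
    return True
```

Passing this check is necessary but not sufficient for B to be a marginally independent partition, since Definition 3 quantifies over every subset formed by taking one variable from each block.
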
The following Lemmas 9 and 10 are required for proving Lemma 11 on the marginal independence of blocks. The two lemmas state that the marginal independency of a partition is preserved after removing a proper subset of variables from the partition blocks, and after merging blocks. These lemmas hold because the marginal independency of a partition is defined by the independency within each set formed by taking one variable from each of the partition blocks.

Lemma 9 (Marginal independency of subpartitions). Let a PDM M be a partial PI model over V = {X_1, ..., X_n} (n ≥ 3) with a marginally independent partition B = {B_1, ..., B_m} (m ≥ 2). A new partition B' = {B'_1, ..., B'_m} over V' (V' ⊂ V) can be defined by removing a proper subset of variables from one or more partition blocks, such that B'_i ⊆ B_i for every B'_i (i = 1, ..., m). Then B' is also a marginally independent partition, called a subpartition.

Lemma 10 (Marginal independency after merging). Let a PDM M be a partial PI model over V = {X_1, ..., X_n} (n ≥ 3) with a marginally independent partition B = {B_1, ..., B_m} (m ≥ 2). A new partition B'' = {B''_1, ..., B''_r} (r < m) over V can be defined by merging one or more blocks in B. Then B'' is also a marginally independent partition.

The following lemma states that, in an atomic partial PI model, any two blocks are marginally independent when the partition has more than two blocks; when it has exactly two blocks, removing one variable from either block makes the two blocks marginally independent.

Lemma 11 (Marginal independence between two independent blocks). Let a PDM M be an atomic partial PI model over V = {X_1, ..., X_n} (n ≥ 3), where Composition holds in every proper subset. Let a marginally independent partition be denoted by B = {B_1, ..., B_m}. When m > 2, for any two distinct blocks B_r and B_q in B,

    I(B_r, B_q | ∅).    (7a)

When m = 2, that is, B = {B_1, B_2}, for any X_i ∈ B_1 or any X_j ∈ B_2,

    I(B_1 ∖ {X_i}, B_2 | ∅)  and  I(B_1, B_2 ∖ {X_j} | ∅).    (7b)

Proof. Case 1, where m > 2: Let B_r = {Y_1, ..., Y_s} and B_q = {Z_1, ..., Z_t}, and let V' = B_r ∪ B_q. By the definition of partial PI models, I(Y, Z | ∅) holds for every Y ∈ B_r and Z ∈ B_q. Since V' ⊂ V, Composition holds in V'. By Lemma 7, I(B_r, B_q | ∅) must hold.

Case 2, where m = 2: Let V'_i = (B_1 ∖ {X_i}) ∪ B_2 and V'_j = B_1 ∪ (B_2 ∖ {X_j}). Then V'_i ⊂ V and V'_j ⊂ V. Therefore, by the same argument as in Case 1, both I(B_1 ∖ {X_i}, B_2 | ∅) and I(B_1, B_2 ∖ {X_j} | ∅) hold.

As shown in the parameterization of full PI models (Xiang, Lee, & Cercone 2004), a sum of joint probabilities can be represented by a product of marginal probabilities. While the size of the joint space grows exponentially in the number of variables, the sizes of the marginal spaces grow only linearly. This results in the lower complexity of a full PI model compared with a general PDM. A similar joint-marginal relationship holds for partial PI models; however, the marginal probability is that of each partition block, not that of each variable as in full PI models. This is shown in the following lemma:

Lemma 12 (Joint-marginal equality of partial PI models). Let a PDM M be an atomic partial PI model over V = {X_1, ..., X_n} (n ≥ 3), where Composition holds in every proper subset. Let a marginally independent partition be denoted by B = {B_1, ..., B_m} (m ≥ 2). For an arbitrary X_i, let B' = {B'_1, ..., B'_m} denote the subpartition of B obtained by removing X_i from B, so that every B'_j is the same as B_j except for the block from which X_i was removed. Then

    ∑_{k=1}^{D_i} P(X_1, ..., x_{i,k}, ..., X_n) = P(B'_1) ⋯ P(B'_m).    (8)

Proof. The summation ∑_{k=1}^{D_i} P(X_1, ..., x_{i,k}, ..., X_n) on the left marginalizes out the variable X_i. It equals P(X_1, ..., X_{i−1}, X_{i+1}, ..., X_n), or, in partition notation, P(B'_1, ..., B'_m). The proof is by induction on m.

Base case, m = 2: We need to show P(B'_1, B'_2) = P(B'_1) P(B'_2). This is equivalent to I(B'_1, B'_2 | ∅), which holds by Lemma 11.

Induction hypothesis: Assume the following holds for m = k:

    P(B'_1, ..., B'_k) = P(B'_1) ⋯ P(B'_k).    (9)

Induction step: We need to show that the following holds for m = k + 1: P(B'_1, ..., B'_{k+1}) = P(B'_1) ⋯ P(B'_{k+1}). Merging {B'_1, ..., B'_k} into one block and applying Lemmas 9, 10 and 11 gives P(B'_1, ..., B'_k, B'_{k+1}) = P(B'_1, ..., B'_k) P(B'_{k+1}). By the induction hypothesis (Eq. (9)), P(B'_1, ..., B'_k) P(B'_{k+1}) = P(B'_1) ⋯ P(B'_k) P(B'_{k+1}). Therefore, by mathematical induction starting from m = 2, Eq. (8) holds for any m.

Figure 1: A partial PI model with blocks B_1 and B_2. (The dotted circles depict the partition blocks.)

Corollary 13, which follows directly from Lemma 12, shows the relation between joint parameters and marginal parameters, by which a joint parameter can be derived from other marginal and joint parameters.

Corollary 13 (Partial PI joint constraint). Let a PDM M be an atomic partial PI model over V = {X_1, ..., X_n} (n ≥ 3), where Composition holds in every proper subset. Let a marginally independent partition be denoted by B = {B_1, ..., B_m} (m ≥ 2). For an arbitrary X_i, let B' = {B'_1, ..., B'_m} denote the subpartition of B obtained by removing X_i from B, so that every B'_j is the same as B_j except for the block from which X_i was removed. Then

    P(X_1, ..., x_{i,r}, ..., X_n) = P(B'_1) ⋯ P(B'_m) − ∑_{k=1, k≠r}^{D_i} P(X_1, ..., x_{i,k}, ..., X_n).    (10)
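
The identity in Lemma 12 (and hence the constraint of Corollary 13) is easy to check numerically for a given JPD. The sketch below is an illustrative helper under our own representation assumptions (the JPD as a numpy array whose axes are the variables, and blocks as lists of axis indices); for a distribution satisfying the hypotheses of Lemma 12 it should return True for every variable X_i.

```python
import numpy as np

def check_joint_marginal_equality(jpd, blocks, i, tol=1e-9):
    """Numerically check Eq. (8) for variable X_i: marginalizing X_i out of
    the JPD should equal the product of the reduced block marginals
    P(B'_1) ... P(B'_m)."""
    n = jpd.ndim
    lhs = jpd.sum(axis=i)                        # left-hand side of Eq. (8)

    rhs = np.ones_like(lhs)
    for block in blocks:
        reduced = [a for a in block if a != i]   # block B'_j with X_i removed
        if not reduced:                          # X_i formed a singleton block
            continue
        other_axes = tuple(a for a in range(n) if a not in reduced)
        marg = jpd.sum(axis=other_axes)          # P(B'_j), axes in increasing order
        # Broadcast P(B'_j) over the axes of the marginalized joint space.
        shape = [jpd.shape[a] if a in reduced else 1
                 for a in range(n) if a != i]
        rhs = rhs * marg.reshape(shape)
    return np.allclose(lhs, rhs, atol=tol)       # does Eq. (8) hold?
```
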
We now derive the number of independent parameters for specifying a PDM by using the constraint of Corollary 13 on atomic partial PI domains. First, we determine the number of independent marginal parameters, denoted ω_m, that are required to specify all of the terms P(B'_1), ..., P(B'_m) in Corollary 13. Next, we determine the number of joint parameters that cannot be derived from these ω_m marginals plus other joint parameters. In a hypercube, this is equivalent to counting the number of cells that cannot be derived, since a cell in the hypercube corresponds to a joint parameter of the JPD. The procedure is as follows: (i) check the cells one by one to see whether each can be derived from the ω_m marginals and any other cells by applying Corollary 13; (ii) as soon as a cell is determined to be derivable, eliminate it from further consideration. Repeat this procedure until no more cells can be eliminated. The remaining cells, together with the ω_m marginal parameters, constitute the independent parameters of the partial PI model.
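
The sketch below carries out this counting mechanically under the structured elimination order used in the example that follows and in the proof of Theorem 14: for each variable in turn, the hyperplane at its last value is removed, since by Corollary 13 each of its cells is derivable from the marginals and the cells with other values on that axis. The representation and function name are our own illustration, not code from the paper.

```python
from itertools import product as cartesian
from math import prod

def count_independent_parameters(cards, blocks):
    """Count the independent parameters of an atomic partial PI model by
    hypercube elimination.  `cards` lists the cardinalities D_1, ..., D_n
    and `blocks` lists the variable indices of each marginally
    independent block."""
    # Marginal parameters: each block needs D_(B_j) - 1 values, since one
    # value per block is fixed by the total probability law.
    omega_m = sum(prod(cards[i] for i in block) - 1 for block in blocks)

    # Start with every cell (joint parameter) of the JPD hypercube.
    cells = set(cartesian(*(range(d) for d in cards)))

    # Eliminate, for each axis in turn, the hyperplane at the last value:
    # by Corollary 13 each such cell is derivable from the marginals and
    # the cells with other values along that axis.
    for axis, d in enumerate(cards):
        cells = {c for c in cells if c[axis] != d - 1}

    return omega_m, len(cells)

# Three variables: X_1, X_2 ternary, X_3 binary; partition {{X_1}, {X_2, X_3}}.
marginals, joints = count_independent_parameters([3, 3, 2], [[0], [1, 2]])
print(marginals, joints, marginals + joints)  # 7 4 11
```
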

For example, consider the partial PI model in Figure 1, which corresponds to the hypercube in Figure 2(a). The PDM consists of three variables {X_1, X_2, X_3}; X_1 and X_2 are ternary, and X_3 is binary. The marginally independent partition is B = {B_1, B_2} = {{X_1}, {X_2, X_3}}. For this PDM, ω_m = (3 − 1) + (3 · 2 − 1) = 7. For example, to specify P(X_1) for B_1, 2 marginals are required, such as P(x_{1,1}) and P(x_{1,2}); and to specify P(X_2, X_3) for B_2, 5 marginals are required, such as

    P(x_{2,1}, x_{3,1}), P(x_{2,2}, x_{3,1}), P(x_{2,3}, x_{3,1}), P(x_{2,1}, x_{3,2}), P(x_{2,2}, x_{3,2}).    (11)

We assume that these 7 marginal parameters have been specified; the other 2 marginal parameters, P(x_{1,3}) and P(x_{2,3}, x_{3,2}), can then be derived by the total probability law (Eq. (1)). Then, by marginalization, the following 5 marginal parameters can also be derived from the 5 parameters in (11) plus P(x_{2,3}, x_{3,2}): P(x_{2,1}), P(x_{2,2}), P(x_{2,3}), P(x_{3,1}), P(x_{3,2}).

We refer to the set of cells with the identical value X_i = x_{i,j} as the hyperplane at X_i = x_{i,j} in the hypercube. For example, the hyperplane at X_1 = x_{1,3} in Figure 2(a) consists of the following 6 cells (we use p(i, j, k) as an abbreviation for P(x_{1,i}, x_{2,j}, x_{3,k})):

    p(3, 1, 1), p(3, 2, 1), p(3, 3, 1), p(3, 1, 2), p(3, 2, 2), p(3, 3, 2).

By Corollary 13, we have

    p(3, 1, 1) = P(x_{2,1}, x_{3,1}) − (p(1, 1, 1) + p(2, 1, 1)).

That is, the cell at the front lower-left corner can be derived from the two cells behind it and the marginal parameters. All other cells on the hyperplane at X_1 = x_{1,3} can be derived similarly. Therefore, these 6 cells can be eliminated from further consideration. The remaining 12 cells are shown in Figure 2(b).

Figure 2: Eliminating derivable cells from a JPD hypercube. (a) The original 3 × 3 × 2 hypercube. (b) The hypercube after eliminating the hyperplane at X_1 = x_{1,3}.

Using the same idea, four of the remaining cells, those at X_2 = x_{2,3}, can be derived, and are therefore eliminated from further consideration. The remaining 8 cells are shown in Figure 3(a). Again, four of the remaining cells, those at X_3 = x_{3,2}, can be derived. After eliminating them, only 4 cells are left, as shown in Figure 3(b):

    p(1, 1, 1), p(2, 1, 1), p(1, 2, 1), p(2, 2, 1).

Figure 3: Eliminating derivable cells from a JPD hypercube. (a) The hypercube after eliminating the hyperplane at X_2 = x_{2,3}. (b) The hypercube after eliminating the hyperplane at X_3 = x_{3,2}.

Since no more cells can be eliminated, the number of independent parameters needed to specify the partial PI model is 11: 7 marginal parameters and 4 joint parameters. Note that it would take 17 parameters to specify the JPD of a general PDM over three variables of the same space cardinalities.

Now we present the general result on the number of independent parameters of atomic partial PI models.

Theorem 14 (Complexity of atomic partial PI models). Let a PDM M be an atomic partial PI model over V = {X_1, ..., X_n} (n ≥ 3), where Composition holds in every proper subset. Let D_1, ..., D_n denote the cardinalities of the spaces of the variables. Let a marginally independent partition of V be denoted by B = {B_1, ..., B_m} (m ≥ 2), and let the cardinalities of the spaces of the blocks B_1, ..., B_m be denoted by D_(B_1), ..., D_(B_m), respectively. Then the number ω_ap of parameters required for specifying the JPD of M is upper-bounded by

    ω_ap = ∏_{i=1}^{n} (D_i − 1) + ∑_{j=1}^{m} (D_(B_j) − 1).    (12)

Proof. Before giving the proof, we explain the result briefly.
The first term on the right is the cardinality of the joint space of a general PDM over the set of variables, except that the space of each variable is reduced by one; this term is the same as the corresponding term in the model complexity formula for the full PI model (Eq. (5)). The second term is the number of marginal parameters for specifying the joint space of each partition block. What we need to show is how to derive all joint parameters from ∑_{j=1}^{m} (D_(B_j) − 1) marginals plus ∏_{i=1}^{n} (D_i − 1) joints.

First, ∑_{j=1}^{m} (D_(B_j) − 1) marginal parameters are required for specifying all of the terms P(B'_1), ..., P(B'_m) in Corollary 13. We construct a hypercube for M in order to apply Eq. (10) among groups of cells. Applying Corollary 13, and using an argument similar to that for the example of Figure 1, we can eliminate the hyperplanes at X_1 = x_{1,D_1}, X_2 = x_{2,D_2}, ..., X_n = x_{n,D_n}, in that order, such that for each X_i all cells on the hyperplane at X_i = x_{i,D_i} can be derived from the cells outside the hyperplane and the marginal parameters. The remaining cells form a hypercube whose length along the X_i axis is D_i − 1 (i = 1, 2, ..., n). Therefore, the total number of cells in this hypercube is ∏_{i=1}^{n} (D_i − 1).
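
A direct reading of Eq. (12) as code might look like the following sketch (our own illustration, with assumed function and argument names). It reproduces the count of 11 for the three-variable example above, and the counts of 55 and 539 for the six-variable domain of Example 15 below.

```python
from math import prod

def omega_atomic_partial_pi(cards, blocks):
    """Upper bound of Eq. (12) on the independent parameters of an atomic
    partial PI model.  `cards` lists D_1, ..., D_n; `blocks` lists the
    variable indices of each marginally independent block, so D_(B_j) is
    the product of the cardinalities of the block's variables."""
    joint_term = prod(d - 1 for d in cards)
    marginal_term = sum(prod(cards[i] for i in block) - 1 for block in blocks)
    return joint_term + marginal_term

# Three-variable example: partition {{X_1}, {X_2, X_3}} -> 4 + 7 = 11.
print(omega_atomic_partial_pi([3, 3, 2], [[0], [1, 2]]))                      # 11

# Example 15 below: partition {{X_1, X_3}, {X_2, X_4, X_5}, {X_6}}.
print(omega_atomic_partial_pi([3, 3, 3, 2, 2, 5], [[0, 2], [1, 3, 4], [5]]))  # 32 + 23 = 55
print(prod([3, 3, 3, 2, 2, 5]) - 1)                                           # general PDM: 539
```
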

The following shows how to use Theorem 14 to compute the complexity of an atomic partial PI model.

Example 15 (Computing model complexity). Consider the partial PI model in Figure 4. The domain consists of 6 variables X_1 to X_6; X_1, X_2 and X_3 are ternary, X_4 and X_5 are binary, and X_6 is 5-ary. The domain has a marginally independent partition {B_1, B_2, B_3} = {{X_1, X_3}, {X_2, X_4, X_5}, {X_6}}.

Figure 4: A partial PI model with 3 blocks and a total of 6 variables. (The dotted circles depict the partition blocks.)

The number of marginal parameters for all partition blocks is 23, given by (3 · 3 − 1) + (3 · 2 · 2 − 1) + (5 − 1). The number of independent joint parameters is 32, given by (2 − 1)² · (3 − 1)³ · (5 − 1). Therefore, the total number of parameters for specifying the domain in this example is 23 + 32 = 55. Compare this with the number of parameters for specifying a general PDM over the same set of variables using only the total probability law, which is 3 · 3 · 3 · 2 · 2 · 5 − 1 = 539. This shows that the complexity of a partial PI model is significantly less than that of a general PDM. From the perspective of using a learned model, this means the model allows faster inference and more accurate results, requires less space to represent and store parameters during inference, and provides more expressive power in a compact form.

Note that Theorem 14 also holds for full PI models, since a full PI model is a special case of partial PI models. This is seen by substituting D_(B_j) in Eq. (12) with D_i (every partition block of a full PI model is a singleton), which yields Eq. (5).

Conclusion

In this work, we present the complexity formula for atomic partial PI models, the building blocks of non-atomic PI models. We employ the hypercube method for analyzing the complexity of PI models. Further work in this line of research is to find the complexity formula for non-atomic PI models. Since a non-atomic PI model contains (recursively embedded) full or partial PI submodels, the complexity can be obtained by applying either the full or the atomic partial PI formula, as appropriate, to each subdomain, recursively along the depth of embedding from top to bottom. The study of PI model complexity will lead to a new generation of PI-learning algorithms equipped with a better scoring metric.

Acknowledgments

The authors are grateful to the anonymous reviewers for their comments on this paper. We especially thank Mr. Chang-Yup Lee for his financial support. This research is supported in part by NSERC of Canada.

References

Chickering, D.; Geiger, D.; and Heckerman, D. 1995. Learning Bayesian networks: search methods and experimental results. In Proceedings of 5th Conference on Artificial Intelligence and Statistics.

Cooper, G., and Herskovits, E. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9.

Friedman, N.; Murphy, K.; and Russell, S. 1998. Learning the structure of dynamic probabilistic networks. In Cooper, G., and Moral, S., eds., Proceedings of 14th Conference on Uncertainty in Artificial Intelligence.

Heckerman, D.; Geiger, D.; and Chickering, D. 1995. Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning 20.

Lam, W., and Bacchus, F. 1994. Learning Bayesian networks: an approach based on the MDL principle. Computational Intelligence 10(3).

Xiang, Y.; Hu, J.; Cercone, N.; and Hamilton, H. 2000. Learning pseudo-independent models: analytical and experimental results. In Hamilton, H., ed., Advances in Artificial Intelligence. Springer.
Xiang, Y.; Lee, J.; and Cercone, N. 2003. Parameterization of pseudo-independent models. In Proceedings of 16th Florida Artificial Intelligence Research Society Conference.

Xiang, Y.; Lee, J.; and Cercone, N. 2004. Towards better scoring metrics for pseudo-independent models. International Journal of Intelligent Systems 20.

Xiang, Y.; Wong, S.; and Cercone, N. 1996. Critical remarks on single link search in learning belief networks. In Proceedings of 12th Conference on Uncertainty in Artificial Intelligence.

Xiang, Y.; Wong, S.; and Cercone, N. 1997. A microscopic study of minimum entropy search in learning decomposable Markov networks. Machine Learning 26(1).

Xiang, Y. 1997. Towards understanding of pseudo-independent domains. In Poster Proceedings of 10th International Symposium on Methodologies for Intelligent Systems.


More information

Published in: Tenth Tbilisi Symposium on Language, Logic and Computation: Gudauri, Georgia, September 2013

Published in: Tenth Tbilisi Symposium on Language, Logic and Computation: Gudauri, Georgia, September 2013 UvA-DARE (Digital Academic Repository) Estimating the Impact of Variables in Bayesian Belief Networks van Gosliga, S.P.; Groen, F.C.A. Published in: Tenth Tbilisi Symposium on Language, Logic and Computation:

More information

Utilitarian Preferences and Potential Games

Utilitarian Preferences and Potential Games Utilitarian Preferences and Potential Games Hannu Salonen 1 Department of Economics University of Turku 20014 Turku Finland email: hansal@utu.fi Abstract We study games with utilitarian preferences: the

More information

Probabilistic and Logistic Circuits: A New Synthesis of Logic and Machine Learning

Probabilistic and Logistic Circuits: A New Synthesis of Logic and Machine Learning Probabilistic and Logistic Circuits: A New Synthesis of Logic and Machine Learning Guy Van den Broeck KULeuven Symposium Dec 12, 2018 Outline Learning Adding knowledge to deep learning Logistic circuits

More information

Non-impeding Noisy-AND Tree Causal Models Over Multi-valued Variables

Non-impeding Noisy-AND Tree Causal Models Over Multi-valued Variables Non-impeding Noisy-AND Tree Causal Models Over Multi-valued Variables Yang Xiang School of Computer Science, University of Guelph, Canada Abstract To specify a Bayesian network (BN), a conditional probability

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete

More information

AN ALGEBRAIC APPROACH TO GENERALIZED MEASURES OF INFORMATION

AN ALGEBRAIC APPROACH TO GENERALIZED MEASURES OF INFORMATION AN ALGEBRAIC APPROACH TO GENERALIZED MEASURES OF INFORMATION Daniel Halpern-Leistner 6/20/08 Abstract. I propose an algebraic framework in which to study measures of information. One immediate consequence

More information

Introduction to Probability Theory, Algebra, and Set Theory

Introduction to Probability Theory, Algebra, and Set Theory Summer School on Mathematical Philosophy for Female Students Introduction to Probability Theory, Algebra, and Set Theory Catrin Campbell-Moore and Sebastian Lutz July 28, 2014 Question 1. Draw Venn diagrams

More information

Beyond Uniform Priors in Bayesian Network Structure Learning

Beyond Uniform Priors in Bayesian Network Structure Learning Beyond Uniform Priors in Bayesian Network Structure Learning (for Discrete Bayesian Networks) scutari@stats.ox.ac.uk Department of Statistics April 5, 2017 Bayesian Network Structure Learning Learning

More information

An Empirical-Bayes Score for Discrete Bayesian Networks

An Empirical-Bayes Score for Discrete Bayesian Networks An Empirical-Bayes Score for Discrete Bayesian Networks scutari@stats.ox.ac.uk Department of Statistics September 8, 2016 Bayesian Network Structure Learning Learning a BN B = (G, Θ) from a data set D

More information

7 RC Simulates RA. Lemma: For every RA expression E(A 1... A k ) there exists a DRC formula F with F V (F ) = {A 1,..., A k } and

7 RC Simulates RA. Lemma: For every RA expression E(A 1... A k ) there exists a DRC formula F with F V (F ) = {A 1,..., A k } and 7 RC Simulates RA. We now show that DRC (and hence TRC) is at least as expressive as RA. That is, given an RA expression E that mentions at most C, there is an equivalent DRC expression E that mentions

More information