Model Complexity of Pseudo-independent Models
Jae-Hyuck Lee and Yang Xiang
Department of Computing and Information Science, University of Guelph, Guelph, Canada
{jaehyuck,

Abstract

A type of problem domain known as pseudo-independent (PI) domains poses difficulty for common probabilistic learning methods based on the single-link lookahead search. To learn this type of domain model, a learning method based on the multiple-link lookahead search is needed. An improved result can be obtained by incorporating model complexity into a scoring metric that explicitly trades off model accuracy for complexity, and vice versa, during selection of the best model among candidates at each learning step. To implement this scoring metric for the PI-learning method, a complexity formula for PI models is required. Previous studies found the complexity formula for full PI models, one of the three major types of PI models (the other two are partial and mixed PI models). This study presents the complexity formula for atomic partial PI models: partial PI models that contain no embedded PI submodels. The complexity is acquired by arithmetic operations on the spaces of the domain variables. The new formula provides the basis for further characterizing the complexity of non-atomic PI models, which contain embedded PI submodels in their domains.

Keywords: probabilistic reasoning, knowledge discovery, data mining, machine learning, belief networks, model complexity.

Introduction

Learning probabilistic networks (Lam & Bacchus 1994; Cooper & Herskovits 1992; Heckerman, Geiger, & Chickering 1995; Friedman, Murphy, & Russell 1998) has been an active area of research. The task of learning networks is NP-hard (Chickering, Geiger, & Heckerman 1995). Therefore, learning algorithms use a heuristic search; the common search method is the single-link lookahead, which generates network structures that differ by a single link at each level of the search.
Pseudo-independent (PI) models (Xiang, Wong, & Cercone 1996) are a class of probabilistic domain models in which a group of marginally independent variables displays collective dependency. PI models can be classified into three types, full, partial, and mixed PI models, based on the pattern of marginal independency and the extent of collective dependency. The most restrictive type is full PI models, where every proper subset of variables is marginally independent. This is relaxed in partial PI models, where not every proper subset of variables is marginally independent. While in full or partial PI models all variables in the domain are collectively dependent, this is not the case in mixed PI models, where only proper subsets of domain variables, called embedded PI subdomains, are collectively dependent. However, the marginal independency pattern of each embedded PI subdomain in a mixed PI model is that of either a full or a partial PI model.

(Copyright 2005, American Association for Artificial Intelligence. All rights reserved.)

PI models cannot be learned by the single-link lookahead search because the underlying collective dependency cannot be recovered. Incorrectly learned models introduce silent errors when used for decision making. To learn PI models, a more sophisticated search method called multi-link lookahead (Xiang, Wong, & Cercone 1997) should be used. It was implemented in the learning algorithm called RML (Xiang et al. 2000). The algorithm is equipped with the Kullback-Leibler cross entropy as the scoring metric for goodness-of-fit to data. The scoring metric of the learning algorithm can be improved by combining cross entropy, as a measure of model accuracy, with a measure of model complexity. A weighted sum of the two measures is a simple way of combining them; other alternatives are also possible.
By using such a scoring metric, between two models of the same accuracy (as measured by cross entropy), the one with lower complexity ends up with a higher score and is preferred. The focus of this paper is the assessment of model complexity; we defer the details of the combination to future work. Model complexity is defined as the number of parameters required to fully specify a model. In previous work (Xiang 1997), a formula was presented for estimating the number of parameters in full PI models, one of the three types of PI models (the other two are partial and mixed PI models). However, that formula was very complex and did not show the structural dependence relationships among parameters. A new, concise formula for full PI models was recently presented (Xiang, Lee, & Cercone 2004) using a hypercube (Xiang, Lee, & Cercone 2003). In this study, we present the model complexity formula for atomic partial PI models: partial PI models that contain no embedded PI submodels. Atomic partial PI models are the building blocks of mixed PI models. This new formula
is simple in form and provides good insight into the structural dependency relationships among the parameters of partial PI models. Furthermore, the previous complexity formula for full PI models is integrated into this new formula: substituted with the conditions of full PI models, the new formula reduces to the formula for full PI models, which also confirms that full PI models are a special case of partial PI models. In addition, the new formula provides the basis for further characterizing the complexity of mixed PI models. We apply the hypercube method to show how the complexity of partial PI models can be acquired from the spaces of variables.

Background

Let V be a set of n discrete variables X_1, ..., X_n (in what follows we focus on domains of finite, discrete variables). Each variable X_i has a finite space S_i = {x_{i,1}, x_{i,2}, ..., x_{i,D_i}} of cardinality D_i. The space of a set V of variables is defined as the Cartesian product of the spaces of all variables in V, that is, S_V = S_1 × ... × S_n. Thus S_V contains the tuples made of all possible combinations of values of the variables in V. Each tuple is called a configuration of V, denoted (x_1, ..., x_n). Let P(X_i) denote the probability function over X_i and P(x_i) denote the probability value P(X_i = x_i). The following axiom of probability is called the total probability law:

$$P(S_i) = 1, \quad \text{or} \quad P(x_{i,1}) + P(x_{i,2}) + \cdots + P(x_{i,D_i}) = 1. \tag{1}$$

A probabilistic domain model (PDM) M over V defines the probability values of every configuration of every subset A ⊆ V. P(V), or P(X_1, ..., X_n), refers to the joint probability distribution (JPD) function over X_1, ..., X_n, and P(x_1, ..., x_n) refers to the joint probability value of the configuration (x_1, ..., x_n). The probability function P(A) over any proper subset A ⊂ V refers to the marginal probability distribution (MPD) function over A.
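The configuration space S_V and the total probability law above can be illustrated concretely. The snippet below is only an illustrative sketch (the variable names and the uniform JPD are our own choices, not from the paper):

```python
from itertools import product

# Spaces of three variables: X1 and X2 ternary, X3 binary
# (the cardinalities used in the paper's running example).
S1, S2, S3 = ["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2"]

# S_V is the Cartesian product S1 x S2 x S3: one tuple per configuration.
S_V = list(product(S1, S2, S3))
print(len(S_V))  # -> 18, i.e. D1 * D2 * D3

# A JPD assigns a probability to each configuration; by the total
# probability law, the values over the whole space must sum to 1.
jpd = {config: 1.0 / len(S_V) for config in S_V}  # uniform, for illustration
print(sum(jpd.values()))  # sums to 1 (up to float rounding)
```

Encoding a JPD as a dictionary keyed by configuration tuples is a convenient representation for the marginalization arguments used later in the paper.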
If A = {X_1, ..., X_m} (A ⊂ V), then P(x_1, ..., x_m) refers to a marginal probability value. A set of probability values that directly specifies a PDM is called the parameters of the PDM. A joint probability value P(x_1, ..., x_n) is referred to as a joint parameter, or joint, and a marginal probability value P(x_1, ..., x_m) as a marginal parameter, or marginal. Among the parameters associated with a PDM, some can be derived from others by using constraints such as the total probability law. Such derivable parameters are called constrained or dependent parameters, while underivable parameters are called unconstrained, free, or independent parameters. The number of independent parameters of a PDM is called the model complexity of the PDM, denoted ω.

When no information about the constraints on a general PDM is given, the PDM must be specified by joint parameters alone. The following ω_g gives the number of joint parameters required. Let M be a general PDM over V = {X_1, ..., X_n}. Then the number of independent parameters of M is upper-bounded by

$$\omega_g = \left(\prod_{i=1}^{n} D_i\right) - 1. \tag{2}$$

One joint is dependent since it can be derived from the others by the total probability law (Eq. (1)).

For any three disjoint subsets of variables A, B, and C in V, A and B are called conditionally independent given C, denoted I(A, B | C), iff P(A | B, C) = P(A | C) for all values of A, B, and C such that P(B, C) > 0. Given subsets of variables A, B, C, D, E ⊆ V, the following property of conditional independence is called Symmetry:

$$I(A, B \mid C) \Leftrightarrow I(B, A \mid C); \tag{3}$$

and the following is Composition:

$$I(A, B \mid C) \wedge I(A, D \mid C) \Rightarrow I(A, B \cup D \mid C). \tag{4}$$

Two disjoint subsets A and B are said to be marginally independent, denoted I(A, B | ∅), iff P(A | B) = P(A) for all values of A and B such that P(B) > 0. If two subsets of variables are marginally independent, no dependency exists between them; hence, each subset can be modeled independently without losing information.
If each variable X_i in a set A is marginally independent of the rest, the variables in A are said to be marginally independent. The probability distribution over a set of marginally independent variables can be written as the product of the marginals of each variable, that is, P(A) = ∏_{X_i ∈ A} P(X_i).

Variables in a set A are called generally dependent if P(B | A \ B) ≠ P(B) for every proper subset B ⊂ A. If a subset of variables is generally dependent, a proper subset cannot be modeled independently without losing information. Variables in A are collectively dependent if, for each proper subset B ⊂ A, there exists no proper subset C ⊂ A \ B that satisfies P(B | A \ B) = P(B | C). Collective dependence prevents conditional independence and modeling through proper subsets of variables.

A pseudo-independent (PI) model is a PDM where proper subsets of a set of collectively dependent variables display marginal independence (Xiang, Wong, & Cercone 1997).

Definition 1 (Full PI model). A PDM over a set V (|V| ≥ 3) of variables is a full PI model if the following properties (called axioms of full PI models) hold:
(S_I) Variables in any proper subset of V are marginally independent.
(S_II) Variables in V are collectively dependent.

The complexity of full PI models is given as follows:

Theorem 2 (Complexity of full PI models; Xiang, Lee, & Cercone 2004). Let a PDM M be a full PI model over V = {X_1, ..., X_n}. Then the number of independent parameters of M is upper-bounded by

$$\omega_f = \sum_{i=1}^{n} (D_i - 1) + \prod_{i=1}^{n} (D_i - 1). \tag{5}$$

The axiom (S_I) of marginal independence is relaxed in partial PI models, which are defined through a marginally independent partition.
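Both Eq. (2) and Eq. (5) are simple arithmetic over the variable cardinalities. The following sketch makes them executable (the function names are ours, not from the paper):

```python
from math import prod

def omega_g(cards):
    """Eq. (2): a general PDM needs all joint parameters except one,
    which is derivable by the total probability law."""
    return prod(cards) - 1

def omega_f(cards):
    """Eq. (5): a full PI model needs (D_i - 1) marginal parameters per
    variable plus prod(D_i - 1) underivable joint parameters."""
    return sum(d - 1 for d in cards) + prod(d - 1 for d in cards)

# Three ternary variables: 3*3*3 - 1 = 26 parameters in general, but only
# (2 + 2 + 2) + 2*2*2 = 14 if the model is known to be full PI.
print(omega_g([3, 3, 3]), omega_f([3, 3, 3]))  # -> 26 14
```

The gap between the two counts widens quickly as variables are added, since the first term of Eq. (5) grows only linearly while Eq. (2) grows exponentially.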
Definition 3 (Marginally independent partition). Let V (|V| ≥ 3) be a set of variables and B = {B_1, ..., B_m} (m ≥ 2) be a partition of V. B is a marginally independent partition if, for every subset A = {X_k | X_k ∈ B_k, k = 1, ..., m}, the variables in A are marginally independent. Each partition block B_i in B is called a marginally independent block.

A marginally independent partition of V groups the variables in V into m marginally independent blocks. The property of marginally independent blocks is that if a subset A is formed by taking one element from different blocks, then the variables in A are always marginally independent. In a partial PI model, it is not necessary that every proper subset be marginally independent.

Definition 4 (Partial PI model). A PDM over a set V (|V| ≥ 3) of variables is a partial PI model on V if the following properties (called axioms of partial PI models) hold:
(S_I) V can be partitioned into two or more marginally independent blocks.
(S_II) Variables in V are collectively dependent.

The following definitions on the maximum marginally independent partition are needed later for obtaining the complexity of partial PI models:

Definition 5 (Maximum partition and minimum block). Let B = {B_1, ..., B_m} be a marginally independent partition of a partial PI model over V. B is a maximum marginally independent partition if there exists no marginally independent partition B′ of V such that |B| < |B′|. The blocks of a maximum marginally independent partition are called the minimum marginally independent blocks, or minimum blocks.

Complexity of atomic partial PI models

The following defines atomic partial PI models:

Definition 6 (Atomic PI model). A PDM M over a set V (|V| ≥ 3) of variables is an atomic PI model if M is either a full or a partial PI model and no collective dependency exists in any proper subset of V.
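The partition condition of Definition 3 can be probed numerically on a small JPD by comparing cross-block marginals against products of single-variable marginals. The sketch below is our own illustration (the JPD encoding, function names, and the parity example are assumptions, not taken from the paper); the parity distribution is a standard example of marginally independent variables with collective dependency:

```python
from itertools import product

def marginal(jpd, axes):
    """Marginal over the variable positions in `axes`, from a JPD
    encoded as {configuration tuple: probability}."""
    out = {}
    for config, p in jpd.items():
        key = tuple(config[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def is_mi_partition(jpd, blocks, tol=1e-9):
    """Definition 3 check: for every selection of one variable per
    block, the joint marginal of the selected variables must factor
    into the product of their single-variable marginals."""
    for selection in product(*blocks):
        joint = marginal(jpd, selection)
        singles = [marginal(jpd, (a,)) for a in selection]
        for key, p in joint.items():
            expect = 1.0
            for single, value in zip(singles, key):
                expect *= single[(value,)]
            if abs(p - expect) > tol:
                return False
    return True

# Parity model: X1, X2 uniform binary, X3 = X1 XOR X2. Any two of the
# three variables are marginally independent, yet all three together
# are collectively dependent -- the PI pattern.
parity = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(is_mi_partition(parity, [[0], [1, 2]]))  # -> True

# Counterexample: X1 = X2 always, so blocks {X1} and {X2, X3} fail.
copy = {(a, a, c): 0.25 for a in (0, 1) for c in (0, 1)}
print(is_mi_partition(copy, [[0], [1, 2]]))    # -> False
```

Note that a check like this only verifies the marginal independency axiom (S_I); collective dependency (S_II) would need a separate test.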
Because of the dependency within each block, the complexity of partial PI models is higher than that of full PI models, but lower than that of general PDMs with variables of the same space cardinalities, as we will see later.

The following lemma states that in a PDM that satisfies Composition (Eq. (4)), if every pair of variables between two subsets is marginally independent, then the two subsets are marginally independent.

Lemma 7 (Marginal independence of subsets). Let M be a PDM over V where Composition holds in every subset. Let B_α = {Y_1, ..., Y_s} and B_β = {Z_1, ..., Z_t} denote any two disjoint nonempty subsets of variables in V. If I(Y_i, Z_j | ∅) holds for every pair (Y_i, Z_j), then

$$I(B_\alpha, B_\beta \mid \emptyset). \tag{6}$$

Proof. We prove that I(Y_i, Z_j | ∅) for every (Y_i, Z_j) implies I(Y_i, B_β | ∅), and that I(Y_i, B_β | ∅) for every Y_i implies I(B_α, B_β | ∅). Applying Composition recursively from I(Y_i, Z_1 | ∅) to I(Y_i, Z_t | ∅) gives I(Y_i, {Z_1, ..., Z_t} | ∅), i.e., I(Y_i, B_β | ∅). By Symmetry (Eq. (3)), I(Y_i, B_β | ∅) is equivalent to I(B_β, Y_i | ∅). In the same manner, applying Composition recursively from I(B_β, Y_1 | ∅) to I(B_β, Y_s | ∅) gives I(B_β, {Y_1, ..., Y_s} | ∅), i.e., I(B_β, B_α | ∅). By Symmetry this is equivalent to I(B_α, B_β | ∅).

Lemma 7 assumes that the PDM satisfies Composition. Composition implies no collective dependency in the PDM, as follows:

Lemma 8 (Composition implies no collective dependency). Let M be a PDM over V that satisfies Composition. Then no collective dependency exists in V.

Proof. This lemma follows directly from the definition of Composition (Eq. (4)).

The following Lemmas 9 and 10 are required for proving Lemma 11 on the marginal independence of blocks. The two lemmas state that the marginal independency of a partition is preserved after removing a proper subset of variables from the partition blocks and after merging blocks. These lemmas hold because the marginal independency of a partition is defined by the independency in a set of single variables taken from the partition blocks.
Lemma 9 (Marginal independency of subpartitions). Let a PDM M be a partial PI model over V = {X_1, ..., X_n} (n ≥ 3) with a marginally independent partition B = {B_1, ..., B_m} (m ≥ 2). A new partition B′ = {B′_1, ..., B′_m} over V′ (V′ ⊂ V) can be defined by removing a proper subset of variables from one or more partition blocks, such that B′_i ⊆ B_i for every B′_i (i = 1, ..., m). Then B′ is also a marginally independent partition, called a subpartition.

Lemma 10 (Marginal independency after merging). Let a PDM M be a partial PI model over V = {X_1, ..., X_n} (n ≥ 3) with a marginally independent partition B = {B_1, ..., B_m} (m ≥ 2). A new partition B″ = {B″_1, ..., B″_r} (r < m) over V can be defined by merging one or more blocks in B. Then B″ is also a marginally independent partition.

The following lemma states that in an atomic partial PI model, removing one variable from V makes any two blocks in V marginally independent.

Lemma 11 (Marginal independence between two independent blocks). Let a PDM M be an atomic partial PI model over V = {X_1, ..., X_n} (n ≥ 3), where Composition holds in every proper subset. Let a marginally independent partition be denoted B = {B_1, ..., B_m}. When m > 2, for any two distinct blocks B_r and B_q in B,

$$I(B_r, B_q \mid \emptyset). \tag{7a}$$

When m = 2, that is, B = {B_1, B_2}, for any X_i ∈ B_1 or any X_j ∈ B_2,

$$I(B_1 \setminus \{X_i\}, B_2 \mid \emptyset) \quad \text{and} \quad I(B_1, B_2 \setminus \{X_j\} \mid \emptyset). \tag{7b}$$
Proof. Case 1, where m > 2: Let B_r = {Y_1, ..., Y_s} and B_q = {Z_1, ..., Z_t}. Let V′ = B_r ∪ B_q. By the definition of partial PI models, I(Y, Z | ∅) holds for every Y ∈ B_r and Z ∈ B_q. Since V′ ⊂ V, Composition holds in V′. By Lemma 7, I(B_r, B_q | ∅) must hold.

Case 2, where m = 2: Let V_i = (B_1 \ {X_i}) ∪ B_2 and V_j = B_1 ∪ (B_2 \ {X_j}). Then V_i ⊂ V and V_j ⊂ V. Therefore, by the same argument as Case 1, both I(B_1 \ {X_i}, B_2 | ∅) and I(B_1, B_2 \ {X_j} | ∅) hold.

As shown in the parameterization of full PI models (Xiang, Lee, & Cercone 2004), a sum of joint probabilities can be represented by a product of marginal probabilities. While the size of the joint space grows exponentially in the number of variables, the sizes of the marginal spaces grow only linearly. This results in the lower complexity of a full PI model compared with a general PDM. A similar joint-marginal relationship holds for partial PI models; however, the marginal probability is that of each partition block, not of each variable as is the case with full PI models. This is shown in the following lemma:

Lemma 12 (Joint-marginal equality of partial PI models). Let a PDM M be an atomic partial PI model over V = {X_1, ..., X_n} (n ≥ 3), where Composition holds in every proper subset. Let a marginally independent partition be denoted B = {B_1, ..., B_m} (m ≥ 2). For an arbitrary X_i, let B′ = {B′_1, ..., B′_m} denote the subpartition of B made by removing X_i from B, so that every B′_j is the same as B_j except the block from which X_i was removed. Then

$$\sum_{k=1}^{D_i} P(X_1, \ldots, x_{i,k}, \ldots, X_n) = P(B'_1) \cdots P(B'_m). \tag{8}$$

Proof. The summation on the left represents marginalization over the variable X_i. It is equal to P(X_1, ..., X_{i−1}, X_{i+1}, ..., X_n), or, in partition notation, P(B′_1, ..., B′_m). The proof by induction is as follows:

Base case, where m = 2: We need to show P(B′_1, B′_2) = P(B′_1)P(B′_2). This is equivalent to I(B′_1, B′_2 | ∅), which holds by Lemma 11.
Induction hypothesis: Assume the following holds for m = k:

$$P(B'_1, \ldots, B'_k) = P(B'_1) \cdots P(B'_k). \tag{9}$$

Induction step: We need to show the following holds for m = k + 1:

$$P(B'_1, \ldots, B'_{k+1}) = P(B'_1) \cdots P(B'_{k+1}).$$

Merging {B′_1, ..., B′_k} into one block and applying Lemmas 9, 10, and 11 gives P(B′_1, ..., B′_k, B′_{k+1}) = P(B′_1, ..., B′_k) P(B′_{k+1}). By the induction hypothesis (Eq. (9)), P(B′_1, ..., B′_k) P(B′_{k+1}) = P(B′_1) ··· P(B′_k) P(B′_{k+1}). Therefore, by mathematical induction from m = 2 to k and k + 1, Eq. (8) must hold for any m.

Figure 1: A partial PI model. (The dotted circles depict the partition blocks B_1 and B_2.)

Corollary 13, which directly follows from Lemma 12, shows the relation between joint parameters and marginal parameters, by which a joint parameter can be derived from other marginal and joint parameters.

Corollary 13 (Partial PI joint constraint). Let a PDM M be an atomic partial PI model over V = {X_1, ..., X_n} (n ≥ 3), where Composition holds in every proper subset. Let a marginally independent partition be denoted B = {B_1, ..., B_m} (m ≥ 2). For an arbitrary X_i, let B′ = {B′_1, ..., B′_m} denote the subpartition of B made by removing X_i from B, so that every B′_j is the same as B_j except the block from which X_i was removed. Then

$$P(X_1, \ldots, x_{i,r}, \ldots, X_n) = P(B'_1) \cdots P(B'_m) - \sum_{k=1, k \neq r}^{D_i} P(X_1, \ldots, x_{i,k}, \ldots, X_n). \tag{10}$$

We now derive the number of independent parameters for specifying a PDM by using the constraint (Corollary 13) on atomic partial PI domains. First, we determine the number of independent marginal parameters, denoted ω_m, that are required to specify all the terms P(B′_1), ..., P(B′_m) in Corollary 13. Next, we determine the number of joint parameters that cannot be derived from the ω_m marginals plus other joint parameters. In a hypercube, this is equivalent to counting the number of cells that cannot be derived, since a cell in a hypercube corresponds to a joint parameter in the JPD.
The procedure is as follows: (i) check the cells one by one to see whether each can be derived from the ω_m marginals and any other cells by applying Corollary 13; (ii) as soon as a cell is determined to be derivable, eliminate it from further consideration. Repeat this procedure until no more cells can be eliminated. The remaining cells plus the ω_m marginal parameters constitute the total number of independent parameters of the partial PI model.
For example, consider the partial PI model in Figure 1, which corresponds to the hypercube in Figure 2(a). The PDM consists of three variables {X_1, X_2, X_3}; X_1 and X_2 are ternary, and X_3 is binary. The marginally independent partition is B = {B_1, B_2} = {{X_1}, {X_2, X_3}}. For this PDM, ω_m = (3 − 1) + (3 · 2 − 1) = 7. To specify P(X_1) for B_1, 2 marginals are required, such as P(x_{1,1}) and P(x_{1,2}); and to specify P(X_2, X_3) for B_2, 5 parameters are required, such as

P(x_{2,1}, x_{3,1}), P(x_{2,2}, x_{3,1}), P(x_{2,3}, x_{3,1}), P(x_{2,1}, x_{3,2}), P(x_{2,2}, x_{3,2}).   (11)

We assume that these 7 marginal parameters have been specified; the other 2 marginal parameters, P(x_{1,3}) and P(x_{2,3}, x_{3,2}), can then be derived by the total probability law (Eq. (1)). Then, by marginalization, the following 5 marginal parameters can also be derived from the 5 parameters in (11) plus P(x_{2,3}, x_{3,2}): P(x_{2,1}), P(x_{2,2}), P(x_{2,3}), P(x_{3,1}), P(x_{3,2}).

We refer to the set of cells with the identical value X_i = x_{i,j} as the hyperplane at X_i = x_{i,j} in the hypercube. For example, the hyperplane at X_1 = x_{1,3} in Figure 2(a) refers to the following 6 cells (we use p(i, j, k) as an abbreviation for P(x_{1,i}, x_{2,j}, x_{3,k})):

p(3, 1, 1), p(3, 2, 1), p(3, 3, 1), p(3, 1, 2), p(3, 2, 2), p(3, 3, 2).

By Corollary 13, we have p(3, 1, 1) = P(x_{2,1}, x_{3,1}) − (p(1, 1, 1) + p(2, 1, 1)). That is, the cell at the front lower-left corner can be derived from the two cells behind it and the marginal parameters. All other cells on the hyperplane at X_1 = x_{1,3} can be similarly derived. Therefore, these 6 cells can be eliminated from further consideration. The remaining 12 cells are shown in Figure 2(b).

Figure 2: Eliminating derivable cells from a JPD hypercube. ((a) The original 3 × 3 × 2 hypercube; (b) the hypercube after eliminating the hyperplane at X_1 = x_{1,3}.)

Using the same idea, four of the
remaining cells at X_2 = x_{2,3} can be derived, and therefore eliminated from further consideration. The remaining 8 cells are shown in Figure 3(a). Again, four of the remaining cells at X_3 = x_{3,2} can be derived. After eliminating them, only 4 cells are left, as shown in Figure 3(b):

p(1, 1, 1), p(2, 1, 1), p(1, 2, 1), p(2, 2, 1).

Since no more cells can be eliminated, the number of independent parameters needed to specify this partial PI model is 11: 7 marginal parameters and 4 joint parameters. Note that it would take 17 parameters to specify the JPD of a general PDM over three variables of the same space cardinalities.

Figure 3: Eliminating derivable cells from a JPD hypercube. ((a) The hypercube after eliminating the hyperplane at X_2 = x_{2,3}; (b) after eliminating the hyperplane at X_3 = x_{3,2}.)

Now we present the general result on the number of independent parameters of atomic partial PI models.

Theorem 14 (Complexity of atomic partial PI models). Let a PDM M be an atomic partial PI model over V = {X_1, ..., X_n} (n ≥ 3), where Composition holds in every proper subset. Let D_1, ..., D_n denote the cardinalities of the spaces of the variables. Let a marginally independent partition of V be denoted B = {B_1, ..., B_m} (m ≥ 2), and let the cardinalities of the spaces of the blocks B_1, ..., B_m be denoted D_{(B_1)}, ..., D_{(B_m)}, respectively. Then the number ω_ap of parameters required for specifying the JPD of M is upper-bounded by

$$\omega_{ap} = \prod_{i=1}^{n} (D_i - 1) + \sum_{j=1}^{m} \left(D_{(B_j)} - 1\right). \tag{12}$$

Proof. Before proving the theorem, we explain the result briefly. The first term on the right is the cardinality of the joint space of a general PDM over the set of variables, except that the space of each variable is reduced by one; this term is the same as the product term in the model complexity formula for full PI models (Eq. (5)). The second term is the number of marginal parameters for specifying the joint space of each partition block.
What we need to show is how to derive all joint parameters from ∑_{j=1}^m (D_{(B_j)} − 1) marginals plus ∏_{i=1}^n (D_i − 1) joints. First, ∑_{j=1}^m (D_{(B_j)} − 1) marginal parameters are required for specifying all the terms P(B′_1), ..., P(B′_m) in Corollary 13. We construct a hypercube for M in order to apply Eq. (10) among groups of cells. Applying Corollary 13, and using an argument similar to that for the example in Figure 1, we can eliminate the hyperplanes at X_1 = x_{1,D_1}, X_2 = x_{2,D_2}, ..., X_n = x_{n,D_n}, in that order, such that for each X_i all cells on the hyperplane at X_i = x_{i,D_i} can be derived from the cells outside the hyperplane and the marginal parameters. The remaining cells form a hypercube whose length along the X_i axis is D_i − 1 (i = 1, 2, ..., n). Therefore, the total number of cells in this hypercube is ∏_{i=1}^n (D_i − 1).
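The hyperplane elimination used in the proof can be mechanized for small models. The sketch below (our own code, with our own function names) reproduces the counts from the worked 3 × 3 × 2 example: eliminating the hyperplane at the last value of each variable in turn leaves exactly the 4 underivable joints, and Eq. (12) gives the total of 11 parameters.

```python
from itertools import product
from math import prod

def surviving_cells(cards):
    """Eliminate, for each variable X_i in turn, the hyperplane at its
    last value x_{i,D_i}: by Corollary 13 every cell on it is derivable
    from the cells outside the hyperplane plus the marginal parameters.
    Value indices are 1-based, as in the paper."""
    cells = set(product(*[range(1, d + 1) for d in cards]))
    for axis, d in enumerate(cards):
        cells = {c for c in cells if c[axis] != d}
    return cells

def omega_ap(cards, block_cards):
    """Eq. (12): prod(D_i - 1) underivable joints plus, for each block,
    its space cardinality minus one marginal parameters."""
    return prod(d - 1 for d in cards) + sum(b - 1 for b in block_cards)

# The 3 x 3 x 2 example with blocks {X1} and {X2, X3}:
cells = surviving_cells([3, 3, 2])
print(sorted(cells))                    # the 4 underivable joint parameters
print(omega_ap([3, 3, 2], [3, 3 * 2]))  # -> 11 (4 joints + 7 marginals)
```

The remaining cells are exactly those avoiding the last value of every variable, so their count is ∏(D_i − 1), matching the first term of Eq. (12).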
The following shows how to use Theorem 14 to compute the complexity of an atomic partial PI model.

Example 15 (Computing model complexity). Consider the partial PI model in Figure 4. The domain consists of 6 variables X_1 to X_6; X_1, X_2, X_3 are ternary, X_4, X_5 are binary, and X_6 has five values. The domain has the marginally independent partition {B_1, B_2, B_3} = {{X_1, X_3}, {X_2, X_4, X_5}, {X_6}}.

Figure 4: A partial PI model with 3 blocks and a total of 6 variables. (The dotted circles depict the partition blocks.)

The number of marginal parameters for all partition blocks is 23, given by (3 · 3 − 1) + (3 · 2 · 2 − 1) + (5 − 1). The number of independent joint parameters is 32, given by (3 − 1)^3 · (2 − 1)^2 · (5 − 1). Therefore, the total number of parameters for specifying the domain in this example is 23 + 32 = 55. Compare this number with the number of parameters for specifying a general PDM over the same set of variables by using the total probability law: 3 · 3 · 3 · 2 · 2 · 5 − 1 = 539. This shows that the complexity of a partial PI model is significantly lower than that of a general PDM. From the perspective of using a learned model, this means the model allows faster inference and more accurate results, requires less space to represent and store parameters during inference, and provides more expressive power in a compact form.

Note that Theorem 14 also holds for full PI models, since a full PI model is a special case of partial PI models. The proof is done by substituting D_{(B_j)} in Eq. (12) with D_i (every partition block of a full PI model is a singleton), yielding Eq. (5).

Conclusion

In this work, we present the complexity formula for atomic partial PI models, the building blocks of non-atomic PI models. We employ the hypercube method for analyzing the complexity of PI models. Further work in this line of research is to find the complexity formula for non-atomic PI models.
Since a non-atomic PI model contains (recursively embedded) full or partial PI submodel(s), its complexity can be acquired by applying either the full or the atomic partial PI formula, as appropriate, to each subdomain (recursively along the depth of embedding, from top to bottom). The study of PI model complexity will lead to a new generation of PI-learning algorithms equipped with a better scoring metric.

Acknowledgments

The authors are grateful to the anonymous reviewers for their comments on this paper. We especially thank Mr. Chang-Yup Lee for his financial support. This research is supported in part by NSERC of Canada.

References

Chickering, D.; Geiger, D.; and Heckerman, D. 1995. Learning Bayesian networks: search methods and experimental results. In Proceedings of the 5th Conference on Artificial Intelligence and Statistics.

Cooper, G., and Herskovits, E. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9.

Friedman, N.; Murphy, K.; and Russell, S. 1998. Learning the structure of dynamic probabilistic networks. In Cooper, G., and Moral, S., eds., Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence.

Heckerman, D.; Geiger, D.; and Chickering, D. 1995. Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning 20.

Lam, W., and Bacchus, F. 1994. Learning Bayesian networks: an approach based on the MDL principle. Computational Intelligence 10(3).

Xiang, Y.; Hu, J.; Cercone, N.; and Hamilton, H. 2000. Learning pseudo-independent models: analytical and experimental results. In Hamilton, H., ed., Advances in Artificial Intelligence. Springer.

Xiang, Y.; Lee, J.; and Cercone, N. 2003. Parameterization of pseudo-independent models. In Proceedings of the 16th Florida Artificial Intelligence Research Society Conference.

Xiang, Y.; Lee, J.; and Cercone, N. 2004. Towards better scoring metrics for pseudo-independent models. International Journal of Intelligent Systems 20.
Xiang, Y.; Wong, S.; and Cercone, N. 1996. Critical remarks on single link search in learning belief networks. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence.

Xiang, Y.; Wong, S.; and Cercone, N. 1997. A microscopic study of minimum entropy search in learning decomposable Markov networks. Machine Learning 26(1).

Xiang, Y. 1997. Towards understanding of pseudo-independent domains. In Poster Proceedings of the 10th International Symposium on Methodologies for Intelligent Systems.
More informationExercises for Unit VI (Infinite constructions in set theory)
Exercises for Unit VI (Infinite constructions in set theory) VI.1 : Indexed families and set theoretic operations (Halmos, 4, 8 9; Lipschutz, 5.3 5.4) Lipschutz : 5.3 5.6, 5.29 5.32, 9.14 1. Generalize
More informationOn Markov Properties in Evidence Theory
On Markov Properties in Evidence Theory 131 On Markov Properties in Evidence Theory Jiřina Vejnarová Institute of Information Theory and Automation of the ASCR & University of Economics, Prague vejnar@utia.cas.cz
More information13 : Variational Inference: Loopy Belief Propagation
10-708: Probabilistic Graphical Models 10-708, Spring 2014 13 : Variational Inference: Loopy Belief Propagation Lecturer: Eric P. Xing Scribes: Rajarshi Das, Zhengzhong Liu, Dishan Gupta 1 Introduction
More informationFast and Accurate Causal Inference from Time Series Data
Fast and Accurate Causal Inference from Time Series Data Yuxiao Huang and Samantha Kleinberg Stevens Institute of Technology Hoboken, NJ {yuxiao.huang, samantha.kleinberg}@stevens.edu Abstract Causal inference
More information12 : Variational Inference I
10-708: Probabilistic Graphical Models, Spring 2015 12 : Variational Inference I Lecturer: Eric P. Xing Scribes: Fattaneh Jabbari, Eric Lei, Evan Shapiro 1 Introduction Probabilistic inference is one of
More informationIndependence for Full Conditional Measures, Graphoids and Bayesian Networks
Independence for Full Conditional Measures, Graphoids and Bayesian Networks Fabio G. Cozman Universidade de Sao Paulo Teddy Seidenfeld Carnegie Mellon University February 28, 2007 Abstract This paper examines
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationExtraction of NAT Causal Structures Based on Bipartition
Proceedings of the Thirtieth International Florida Artificial Intelligence Research Society Conference Extraction of NAT Causal Structures Based on Bipartition Yang Xiang School of Computer Science University
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationBranch and Bound for Regular Bayesian Network Structure Learning
Branch and Bound for Regular Bayesian Network Structure Learning Joe Suzuki and Jun Kawahara Osaka University, Japan. j-suzuki@sigmath.es.osaka-u.ac.jp Nara Institute of Science and Technology, Japan.
More informationShort Note: Naive Bayes Classifiers and Permanence of Ratios
Short Note: Naive Bayes Classifiers and Permanence of Ratios Julián M. Ortiz (jmo1@ualberta.ca) Department of Civil & Environmental Engineering University of Alberta Abstract The assumption of permanence
More informationVariational algorithms for marginal MAP
Variational algorithms for marginal MAP Alexander Ihler UC Irvine CIOG Workshop November 2011 Variational algorithms for marginal MAP Alexander Ihler UC Irvine CIOG Workshop November 2011 Work with Qiang
More informationLINDSTRÖM S THEOREM SALMAN SIDDIQI
LINDSTRÖM S THEOREM SALMAN SIDDIQI Abstract. This paper attempts to serve as an introduction to abstract model theory. We introduce the notion of abstract logics, explore first-order logic as an instance
More informationReductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York
Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval Sargur Srihari University at Buffalo The State University of New York 1 A Priori Algorithm for Association Rule Learning Association
More informationCSCE 478/878 Lecture 6: Bayesian Learning
Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,
More informationUncertainty and Bayesian Networks
Uncertainty and Bayesian Networks Tutorial 3 Tutorial 3 1 Outline Uncertainty Probability Syntax and Semantics for Uncertainty Inference Independence and Bayes Rule Syntax and Semantics for Bayesian Networks
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationData Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction
More informationScore Metrics for Learning Bayesian Networks used as Fitness Function in a Genetic Algorithm
Score Metrics for Learning Bayesian Networks used as Fitness Function in a Genetic Algorithm Edimilson B. dos Santos 1, Estevam R. Hruschka Jr. 2 and Nelson F. F. Ebecken 1 1 COPPE/UFRJ - Federal University
More informationCONSTRAINED PERCOLATION ON Z 2
CONSTRAINED PERCOLATION ON Z 2 ZHONGYANG LI Abstract. We study a constrained percolation process on Z 2, and prove the almost sure nonexistence of infinite clusters and contours for a large class of probability
More informationRepresenting Independence Models with Elementary Triplets
Representing Independence Models with Elementary Triplets Jose M. Peña ADIT, IDA, Linköping University, Sweden jose.m.pena@liu.se Abstract An elementary triplet in an independence model represents a conditional
More informationHandout 2 (Correction of Handout 1 plus continued discussion/hw) Comments and Homework in Chapter 1
22M:132 Fall 07 J. Simon Handout 2 (Correction of Handout 1 plus continued discussion/hw) Comments and Homework in Chapter 1 Chapter 1 contains material on sets, functions, relations, and cardinality that
More informationSeminaar Abstrakte Wiskunde Seminar in Abstract Mathematics Lecture notes in progress (27 March 2010)
http://math.sun.ac.za/amsc/sam Seminaar Abstrakte Wiskunde Seminar in Abstract Mathematics 2009-2010 Lecture notes in progress (27 March 2010) Contents 2009 Semester I: Elements 5 1. Cartesian product
More informationUndirected Graphical Models: Markov Random Fields
Undirected Graphical Models: Markov Random Fields 40-956 Advanced Topics in AI: Probabilistic Graphical Models Sharif University of Technology Soleymani Spring 2015 Markov Random Field Structure: undirected
More informationJustifying Multiply Sectioned Bayesian Networks
Justifying Multiply Sectioned Bayesian Networks Y. Xiang and V. Lesser University of Massachusetts, Amherst, MA {yxiang, lesser}@cs.umass.edu Abstract We consider multiple agents who s task is to determine
More informationThe binary entropy function
ECE 7680 Lecture 2 Definitions and Basic Facts Objective: To learn a bunch of definitions about entropy and information measures that will be useful through the quarter, and to present some simple but
More informationAn Embedded Implementation of Bayesian Network Robot Programming Methods
1 An Embedded Implementation of Bayesian Network Robot Programming Methods By Mark A. Post Lecturer, Space Mechatronic Systems Technology Laboratory. Department of Design, Manufacture and Engineering Management.
More informationOn the Conditional Independence Implication Problem: A Lattice-Theoretic Approach
On the Conditional Independence Implication Problem: A Lattice-Theoretic Approach Mathias Niepert Department of Computer Science Indiana University Bloomington, IN, USA mniepert@cs.indiana.edu Dirk Van
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationDistributed Optimization. Song Chong EE, KAIST
Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links
More informationSTATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas. Bayesian networks: Representation
STATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas Bayesian networks: Representation Motivation Explicit representation of the joint distribution is unmanageable Computationally:
More informationSeparable and Transitive Graphoids
Separable and Transitive Graphoids Dan Geiger Northrop Research and Technology Center One Research Park Palos Verdes, CA 90274 David Heckerman Medical Computer Science Group Knowledge Systems Laboratory
More informationCoalitional Structure of the Muller-Satterthwaite Theorem
Coalitional Structure of the Muller-Satterthwaite Theorem Pingzhong Tang and Tuomas Sandholm Computer Science Department Carnegie Mellon University {kenshin,sandholm}@cscmuedu Abstract The Muller-Satterthwaite
More informationOptimization of Quadratic Forms: NP Hard Problems : Neural Networks
1 Optimization of Quadratic Forms: NP Hard Problems : Neural Networks Garimella Rama Murthy, Associate Professor, International Institute of Information Technology, Gachibowli, HYDERABAD, AP, INDIA ABSTRACT
More informationROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti
More informationPDF hosted at the Radboud Repository of the Radboud University Nijmegen
PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is an author's version which may differ from the publisher's version. For additional information about this
More informationAutomatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries
Automatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries Anonymous Author(s) Affiliation Address email Abstract 1 2 3 4 5 6 7 8 9 10 11 12 Probabilistic
More informationEVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE FILTER
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE FILTER Zhen Zhen 1, Jun Young Lee 2, and Abdus Saboor 3 1 Mingde College, Guizhou University, China zhenz2000@21cn.com 2 Department
More informationOn Conditional Independence in Evidence Theory
6th International Symposium on Imprecise Probability: Theories and Applications, Durham, United Kingdom, 2009 On Conditional Independence in Evidence Theory Jiřina Vejnarová Institute of Information Theory
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute BW/WI & Institute for Computer Science, University of Hildesheim
Course on Bayesian Networks, summer term 2010 0/42 Bayesian Networks Bayesian Networks 11. Structure Learning / Constrained-based Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL)
More informationA Comparative Study of Noncontextual and Contextual Dependencies
A Comparative Study of Noncontextual Contextual Dependencies S.K.M. Wong 1 C.J. Butz 2 1 Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail: wong@cs.uregina.ca
More informationGraphical Model Inference with Perfect Graphs
Graphical Model Inference with Perfect Graphs Tony Jebara Columbia University July 25, 2013 joint work with Adrian Weller Graphical models and Markov random fields We depict a graphical model G as a bipartite
More informationFinite Mixture Model of Bounded Semi-naive Bayesian Networks Classifier
Finite Mixture Model of Bounded Semi-naive Bayesian Networks Classifier Kaizhu Huang, Irwin King, and Michael R. Lyu Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin,
More informationIntelligent Systems:
Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition
More informationarxiv: v1 [math.fa] 14 Jul 2018
Construction of Regular Non-Atomic arxiv:180705437v1 [mathfa] 14 Jul 2018 Strictly-Positive Measures in Second-Countable Locally Compact Non-Atomic Hausdorff Spaces Abstract Jason Bentley Department of
More informationA Bayesian Approach to Concept Drift
A Bayesian Approach to Concept Drift Stephen H. Bach Marcus A. Maloof Department of Computer Science Georgetown University Washington, DC 20007, USA {bach, maloof}@cs.georgetown.edu Abstract To cope with
More informationTractable Inference in Hybrid Bayesian Networks with Deterministic Conditionals using Re-approximations
Tractable Inference in Hybrid Bayesian Networks with Deterministic Conditionals using Re-approximations Rafael Rumí, Antonio Salmerón Department of Statistics and Applied Mathematics University of Almería,
More informationDensity Propagation for Continuous Temporal Chains Generative and Discriminative Models
$ Technical Report, University of Toronto, CSRG-501, October 2004 Density Propagation for Continuous Temporal Chains Generative and Discriminative Models Cristian Sminchisescu and Allan Jepson Department
More informationA UNIVERSAL SEQUENCE OF CONTINUOUS FUNCTIONS
A UNIVERSAL SEQUENCE OF CONTINUOUS FUNCTIONS STEVO TODORCEVIC Abstract. We show that for each positive integer k there is a sequence F n : R k R of continuous functions which represents via point-wise
More informationY. Xiang, Inference with Uncertain Knowledge 1
Inference with Uncertain Knowledge Objectives Why must agent use uncertain knowledge? Fundamentals of Bayesian probability Inference with full joint distributions Inference with Bayes rule Bayesian networks
More informationVariable Elimination: Algorithm
Variable Elimination: Algorithm Sargur srihari@cedar.buffalo.edu 1 Topics 1. Types of Inference Algorithms 2. Variable Elimination: the Basic ideas 3. Variable Elimination Sum-Product VE Algorithm Sum-Product
More informationThe cocycle lattice of binary matroids
Published in: Europ. J. Comb. 14 (1993), 241 250. The cocycle lattice of binary matroids László Lovász Eötvös University, Budapest, Hungary, H-1088 Princeton University, Princeton, NJ 08544 Ákos Seress*
More informationEfficient Sensitivity Analysis in Hidden Markov Models
Efficient Sensitivity Analysis in Hidden Markov Models Silja Renooij Department of Information and Computing Sciences, Utrecht University P.O. Box 80.089, 3508 TB Utrecht, The Netherlands silja@cs.uu.nl
More informationA Class of Star-Algebras for Point-Based Qualitative Reasoning in Two- Dimensional Space
From: FLAIRS- Proceedings. Copyright AAAI (www.aaai.org). All rights reserved. A Class of Star-Algebras for Point-Based Qualitative Reasoning in Two- Dimensional Space Debasis Mitra Department of Computer
More informationPartial Order MCMC for Structure Discovery in Bayesian Networks
Partial Order MCMC for Structure Discovery in Bayesian Networks Teppo Niinimäki, Pekka Parviainen and Mikko Koivisto Helsinki Institute for Information Technology HIIT, Department of Computer Science University
More informationTowards an extension of the PC algorithm to local context-specific independencies detection
Towards an extension of the PC algorithm to local context-specific independencies detection Feb-09-2016 Outline Background: Bayesian Networks The PC algorithm Context-specific independence: from DAGs to
More informationA new Approach to Drawing Conclusions from Data A Rough Set Perspective
Motto: Let the data speak for themselves R.A. Fisher A new Approach to Drawing Conclusions from Data A Rough et Perspective Zdzisław Pawlak Institute for Theoretical and Applied Informatics Polish Academy
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationIssues in Modeling for Data Mining
Issues in Modeling for Data Mining Tsau Young (T.Y.) Lin Department of Mathematics and Computer Science San Jose State University San Jose, CA 95192 tylin@cs.sjsu.edu ABSTRACT Modeling in data mining has
More informationBayesian Networks: Representation, Variable Elimination
Bayesian Networks: Representation, Variable Elimination CS 6375: Machine Learning Class Notes Instructor: Vibhav Gogate The University of Texas at Dallas We can view a Bayesian network as a compact representation
More informationThe Coin Algebra and Conditional Independence
The Coin lgebra and Conditional Independence Jinfang Wang Graduate School of Science and Technology, Chiba University conditional in dependence Kyoto University, 2006.3.13-15 p.1/81 Multidisciplinary Field
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationARTICLE IN PRESS. Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect. Journal of Multivariate Analysis
Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Marginal parameterizations of discrete models
More informationFuzzy Rough Sets with GA-Based Attribute Division
Fuzzy Rough Sets with GA-Based Attribute Division HUGANG HAN, YOSHIO MORIOKA School of Business, Hiroshima Prefectural University 562 Nanatsuka-cho, Shobara-shi, Hiroshima 727-0023, JAPAN Abstract: Rough
More informationBayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida
Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:
More informationAn Empirical Investigation of the K2 Metric
An Empirical Investigation of the K2 Metric Christian Borgelt and Rudolf Kruse Department of Knowledge Processing and Language Engineering Otto-von-Guericke-University of Magdeburg Universitätsplatz 2,
More informationPossibilistic Networks: Data Mining Applications
Possibilistic Networks: Data Mining Applications Rudolf Kruse and Christian Borgelt Dept. of Knowledge Processing and Language Engineering Otto-von-Guericke University of Magdeburg Universitätsplatz 2,
More informationBinary Decision Diagrams. Graphs. Boolean Functions
Binary Decision Diagrams Graphs Binary Decision Diagrams (BDDs) are a class of graphs that can be used as data structure for compactly representing boolean functions. BDDs were introduced by R. Bryant
More informationExpectation Propagation in Factor Graphs: A Tutorial
DRAFT: Version 0.1, 28 October 2005. Do not distribute. Expectation Propagation in Factor Graphs: A Tutorial Charles Sutton October 28, 2005 Abstract Expectation propagation is an important variational
More informationAn Introduction to Bayesian Machine Learning
1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems
More informationFusion in simple models
Fusion in simple models Peter Antal antal@mit.bme.hu A.I. February 8, 2018 1 Basic concepts of probability theory Joint distribution Conditional probability Bayes rule Chain rule Marginalization General
More informationPublished in: Tenth Tbilisi Symposium on Language, Logic and Computation: Gudauri, Georgia, September 2013
UvA-DARE (Digital Academic Repository) Estimating the Impact of Variables in Bayesian Belief Networks van Gosliga, S.P.; Groen, F.C.A. Published in: Tenth Tbilisi Symposium on Language, Logic and Computation:
More informationUtilitarian Preferences and Potential Games
Utilitarian Preferences and Potential Games Hannu Salonen 1 Department of Economics University of Turku 20014 Turku Finland email: hansal@utu.fi Abstract We study games with utilitarian preferences: the
More informationProbabilistic and Logistic Circuits: A New Synthesis of Logic and Machine Learning
Probabilistic and Logistic Circuits: A New Synthesis of Logic and Machine Learning Guy Van den Broeck KULeuven Symposium Dec 12, 2018 Outline Learning Adding knowledge to deep learning Logistic circuits
More informationNon-impeding Noisy-AND Tree Causal Models Over Multi-valued Variables
Non-impeding Noisy-AND Tree Causal Models Over Multi-valued Variables Yang Xiang School of Computer Science, University of Guelph, Canada Abstract To specify a Bayesian network (BN), a conditional probability
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationAN ALGEBRAIC APPROACH TO GENERALIZED MEASURES OF INFORMATION
AN ALGEBRAIC APPROACH TO GENERALIZED MEASURES OF INFORMATION Daniel Halpern-Leistner 6/20/08 Abstract. I propose an algebraic framework in which to study measures of information. One immediate consequence
More informationIntroduction to Probability Theory, Algebra, and Set Theory
Summer School on Mathematical Philosophy for Female Students Introduction to Probability Theory, Algebra, and Set Theory Catrin Campbell-Moore and Sebastian Lutz July 28, 2014 Question 1. Draw Venn diagrams
More informationBeyond Uniform Priors in Bayesian Network Structure Learning
Beyond Uniform Priors in Bayesian Network Structure Learning (for Discrete Bayesian Networks) scutari@stats.ox.ac.uk Department of Statistics April 5, 2017 Bayesian Network Structure Learning Learning
More informationAn Empirical-Bayes Score for Discrete Bayesian Networks
An Empirical-Bayes Score for Discrete Bayesian Networks scutari@stats.ox.ac.uk Department of Statistics September 8, 2016 Bayesian Network Structure Learning Learning a BN B = (G, Θ) from a data set D
More information7 RC Simulates RA. Lemma: For every RA expression E(A 1... A k ) there exists a DRC formula F with F V (F ) = {A 1,..., A k } and
7 RC Simulates RA. We now show that DRC (and hence TRC) is at least as expressive as RA. That is, given an RA expression E that mentions at most C, there is an equivalent DRC expression E that mentions
More information