Charles University in Prague
Faculty of Mathematics and Physics
Department of Probability and Mathematical Statistics

Abstract of Doctoral Thesis
INDEPENDENCE MODELS

Mgr. Petr Šimeček

Supervisor: RNDr. Milan Studený, DrSc., ÚTIA AV ČR
Study program: mathematics
Study branch: m4, probability and mathematical statistics

Prague 2007
This doctoral thesis was written within the doctoral study programme completed by the candidate at the Faculty of Mathematics and Physics of Charles University.

Candidate: Mgr. Petr Šimeček

Supervisor: RNDr. Milan Studený, DrSc.
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic
Pod Vodárenskou věží 4, Praha 8

Opponents:
Doc. RNDr. Petr Lachout, CSc.
Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Praha 8

Ing. František Matúš, CSc.
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 4, Praha 8

The abstract was distributed on [date]. The defence takes place on [date] at 12:00 before the committee for the defence of doctoral theses in the branch m4 - Probability and Mathematical Statistics at the Faculty of Mathematics and Physics of Charles University, Sokolovská 83, Praha 8, in the lecture room KPMS Praktikum.

The thesis is available at the Doctoral Study Office of the Faculty of Mathematics and Physics of Charles University, Ke Karlovu 3, Praha 2.

Chair of the branch board: Prof. RNDr. Marie Hušková, DrSc.
Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Praha 8
Contents

Introduction
1 Basic Concepts
  1.1 Conditional Independence
  1.2 Gaussian Distributional Framework
  1.3 Discrete Distributional Framework
  1.4 Independence Models
  1.5 Graphical Independence Models
2 No Finite Characterization Exists
3 Representable Models Over N = {1, 2, 3, 4}
  3.1 Gaussian Distributional Framework
  3.2 Discrete Distributional Framework
    3.2.1 Discrete representability
    3.2.2 Positive binary representability
    3.2.3 Positive discrete representability
4 Probability Distributions Over an Independence Model
  4.1 Connection Between Graphical Independence Models and Exponential Families In Regular Distributional Framework
  4.2 Model Search
  4.3 Simulation Study
References
List of Publications
Introduction

An essential part of probabilistic reasoning is the theory of graphical models, which endows a multivariate probability distribution with an independence structure. An interesting question, originally posed by J. Pearl, is the problem of probabilistic representability: for which lists of conditional independence constraints does there exist a random vector satisfying these, and only these, independences? In this work the known results are collected and the problem is studied further in several distributional frameworks. The focus is particularly on the special case of four random variables, where the solution is found for a general Gaussian distribution (extending the result of R. Lněnička) and partially for discrete and binary distributions. Finally, the achieved results are used to derive a variance matrix estimator in the case of a regular Gaussian distribution.

This abstract briefly covers the contents of the thesis, omitting proofs and focusing on the most important topics. Lemmas and theorems are not necessarily in the same order as in the thesis (see references).

1 Basic Concepts

For the reader's convenience, auxiliary results related to probability theory and independence models are recalled in this section.

The object of our interest is a random vector ξ = (ξ_a)_{a∈N} indexed by a finite set N = {1, 2, ..., n}. Its distribution and its density with respect to some product σ-finite measure are denoted by P and f. For A ⊆ N, the subvector (ξ_a)_{a∈A} is denoted by ξ_A; ξ_∅ is presumed to be a constant. Analogously, P_A and f_A are the marginal distribution and density; x_A is the corresponding coordinate projection of a constant vector x = (x_a)_{a∈N}. A singleton {a} is shortened to a, and a union of sets A ∪ B is written simply as the juxtaposition AB. For a square matrix Σ, let Σ^{-1}, Σ^-, |Σ| and rank(Σ) denote its inverse, generalized inverse, determinant and rank (the dimension of the space spanned by its rows, or equivalently its columns), respectively.
1.1 Conditional Independence

Provided A, B, C are pairwise disjoint subsets of N, ξ_A ⊥⊥ ξ_B | ξ_C stands for the statement "ξ_A and ξ_B are conditionally independent given ξ_C". In particular, unconditional independence (C = ∅) is abbreviated as ξ_A ⊥⊥ ξ_B. The meaning of the conditional independence (CI) statement ξ_A ⊥⊥ ξ_B | ξ_C is as follows: if the value of ξ_C is known, then knowledge about ξ_A is irrelevant (useless for prediction, ...) for knowledge of ξ_B, and vice versa. There are many equivalent ways to formalize this definition. Let us list a few of them in Lemma 1 (see [2], pp. 29 for a detailed discussion).
Lemma 1. If A, B and C are pairwise disjoint subsets of N, then statements i)-vi) are equivalent:

i) f_{AB|C}(x_{AB} | x_C) = f_{A|C}(x_A | x_C) · f_{B|C}(x_B | x_C)
ii) f_{ABC}(x_{ABC}) · f_C(x_C) = f_{AC}(x_{AC}) · f_{BC}(x_{BC})
iii) f_{A|BC}(x_A | x_{BC}) = f_{A|C}(x_A | x_C)
iv) f_{AC|B}(x_{AC} | x_B) = f_{A|C}(x_A | x_C) · f_{C|B}(x_C | x_B)
v) f_{ABC}(x_{ABC}) = f_{A|C}(x_A | x_C) · f_{BC}(x_{BC})
vi) ∃ functions g, h: f_{ABC}(x_{ABC}) = g(x_{AC}) · h(x_{BC})

where f_{A|B} denotes the conditional density f_{A|B}(x_A | x_B) = f_{AB}(x_{AB}) / f_B(x_B), which is defined under the condition f_B(x_B) > 0. Each of the equations above should be understood in the sense that it holds for all x such that all conditional densities in the equation are defined.

Moreover, if the multiinformation¹ (MI) of ξ is finite, then another characterization of CI exists (see [15] for a proof):

Lemma 2. If ξ is a random vector with finite MI and A, B, C are pairwise disjoint subsets of N, then

  MI(ξ_{ABC}) + MI(ξ_C) − MI(ξ_{AC}) − MI(ξ_{BC}) ≥ 0

and

  MI(ξ_{ABC}) + MI(ξ_C) − MI(ξ_{AC}) − MI(ξ_{BC}) = 0  ⟺  ξ_A ⊥⊥ ξ_B | ξ_C.

If ξ does not have a density with respect to the given σ-finite measure, then generalized conditional independence ξ_A ⊥⊥ ξ_B | ξ_C may be defined as

  L(ξ_{AB} | ξ_C = x_C) = L(ξ_A | ξ_C = x_C) ⊗ L(ξ_B | ξ_C = x_C)  P_C-a.e.,

i.e. the conditional distribution of ξ_{AB} given ξ_C is the product of the corresponding conditional distributions of ξ_A and ξ_B given ξ_C. This generalization will be used only in the case of a Gaussian distribution whose variance matrix is not regular.

As the following Lemma 3 shows, any CI statement can be written in terms of elementary CI statements, i.e. as a collection (ξ_{A_i} ⊥⊥ ξ_{B_i} | ξ_{C_i})_i such that |A_i| = |B_i| = 1 for all i.

Lemma 3. Let A, B, C be pairwise disjoint subsets of N:

  ξ_A ⊥⊥ ξ_B | ξ_C  ⟺  ( ∀ a ∈ A, b ∈ B, D ⊆ ABC \ {a, b} such that C ⊆ D:  ξ_a ⊥⊥ ξ_b | ξ_D ).

Proof. See [4], Lemma 3 for a proof.

¹ That is, the Kullback-Leibler divergence (relative entropy) between the joint distribution and the product of its one-dimensional marginals.
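Lemma 1 ii) can be checked numerically for a small discrete distribution. A minimal sketch in Python (the toy conditional probabilities below are my own illustrative choice, not taken from the thesis):

```python
from itertools import product

# Toy binary variables (xi_a, xi_b, xi_c) with xi_a ⊥⊥ xi_b | xi_c built in
# by construction: p(a, b, c) = p(c) p(a|c) p(b|c).
p_c = {0: 0.4, 1: 0.6}
p_a_given_c = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.7, 1: 0.3}}  # p_a_given_c[c][a]
p_b_given_c = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

joint = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
         for a, b, c in product((0, 1), repeat=3)}

def marg(keep):
    """Marginal pmf over the coordinates named in `keep` ('a', 'b', 'c')."""
    idx = {'a': 0, 'b': 1, 'c': 2}
    out = {}
    for x, p in joint.items():
        key = tuple(x[idx[v]] for v in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_abc, p_ac, p_bc, pc = marg('abc'), marg('ac'), marg('bc'), marg('c')

# Lemma 1 ii): f_{ABC} f_C = f_{AC} f_{BC} at every point.
ok = all(abs(p_abc[(a, b, c)] * pc[(c,)] - p_ac[(a, c)] * p_bc[(b, c)]) < 1e-12
         for a, b, c in product((0, 1), repeat=3))
print(ok)
```

Because the joint factorizes through x_C by construction, the product characterization holds at every point.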
1.2 Gaussian Distributional Framework

Definition. A Gaussian distribution of a random vector ξ = (ξ_1, ..., ξ_n) is a probability distribution specified by its characteristic function

  φ_ξ(t) = E exp(i t'ξ) = exp( i t'µ − t'Σt / 2 ),

where µ ∈ R^n and a symmetric positive semi-definite matrix Σ ∈ R^{n×n} are the mean and variance parameters, respectively. If Σ is positive definite (Σ > 0), i.e. regular, the distribution is called a regular Gaussian distribution; it has a density with respect to Lebesgue measure

  f(x) = (2π)^{−n/2} |Σ|^{−1/2} exp( −(x − µ)' Σ^{-1} (x − µ) / 2 )

and finite multiinformation

  MI(ξ) = −(1/2) ( log|Σ| − Σ_{a=1}^{n} log σ_{aa} ).

On the other hand, if Σ is not regular then ξ takes values in an affine subspace of R^n, i.e. there exists a nonzero a ∈ R^n such that

  a'(ξ − µ) = 0 (a.s.),   (1)

see [1], pp. 71; neither does a density with respect to Lebesgue measure exist, nor is the multiinformation generally finite.

Given a matrix Σ = (σ_ab)_{a,b∈{1,...,n}} and subsets A, B of {1, 2, ..., n}, the submatrix with rows A and columns B is denoted by Σ_{A×B} = (σ_ab)_{a∈A, b∈B}. The following lemma shows that both marginal and conditional distributions of a Gaussian distribution are again Gaussian.

Lemma 4. Let A and B be disjoint subsets of N = {1, 2, ..., n}.

i) The marginal distribution of ξ_A is a Gaussian distribution with variance matrix Σ_{A×A}.
ii) The conditional distribution of ξ_A given ξ_B = x_B is a Gaussian distribution with variance matrix Σ_{A×A|B} = Σ_{A×A} − Σ_{A×B} Σ_{B×B}^- Σ_{B×A}, where Σ_{B×B}^- is any generalized inverse of Σ_{B×B}.
iii) Moreover, if Σ > 0 then also Σ_{A×A} > 0 and Σ_{A×A|B} > 0.
Proof. See [2], pp. 256 for a proof of i) and ii). Regularity of Σ_{A×A} is obvious; the Cholesky decomposition of Σ > 0 yields Σ_{A×A|B} > 0.

Let us emphasize that the variance matrix Σ_{A×A|B} of the conditional distribution does not depend on the choice of x_B.

Lemma 5. Let a and b be distinct elements of {1, ..., n} and C ⊆ {1, ..., n} \ ab:

i) ξ_a ⊥⊥ ξ_b ⟺ σ_ab = 0.
ii) If Σ_{C×C} > 0, then ξ_a ⊥⊥ ξ_b | ξ_C ⟺ |Σ_{aC×bC}| = 0.
iii) If D ⊆ C is such that Σ_{D×D} > 0 and rank(Σ_{D×D}) = rank(Σ_{C×C}), then ξ_a ⊥⊥ ξ_b | ξ_C ⟺ ξ_a ⊥⊥ ξ_b | ξ_D ⟺ |Σ_{aD×bD}| = 0.
iv) If Σ > 0 then ξ_a ⊥⊥ ξ_b | ξ_C ⟺ |(Σ^{-1})_{N\(aC) × N\(bC)}| = 0.

Proof. The first part is a well-known fact (cf. [2], pp. 257). The second follows from the Cholesky decomposition, yielding

  |Σ_{aC×bC}| = 0 ⟺ σ_ab − Σ_{a×C} Σ_{C×C}^{-1} Σ_{C×b} = 0 ⟺ (Σ_{ab×ab|C})_{a,b} = 0 ⟺ ξ_a ⊥⊥ ξ_b | ξ_C.

The third part follows from Lemma 4 ii) by constructing the generalized inverse (cf. [9], section 1b.5) as

  Σ_{C×C}^- = ( Σ_{D×D}^{-1}  0
                0             0 ).

See [3] for a proof of iv).

1.3 Discrete Distributional Framework

Definition. A random vector ξ = (ξ_1, ..., ξ_n) is called discrete if each ξ_a takes values in a state space X_a such that 1 < |X_a| < ∞. If |X_a| = 2 for all a, then ξ is called binary. In the binary case, without loss of generality we assume X_a = {−1, 1} for all a.

Definition. A discrete random vector ξ is called positive if

  0 < P(ξ = x) < 1  for all x ∈ X = ∏_{a=1}^{n} X_a.
In the case of a discretely distributed random vector, MI is always finite and the variables ξ_a and ξ_b are conditionally independent given ξ_C if and only if

  P(ξ_{abC} = x_{abC}) P(ξ_C = x_C) = P(ξ_{aC} = x_{aC}) P(ξ_{bC} = x_{bC})  for all x_{abC} ∈ X_{abC}.  (2)

As shown in the thesis, the class of binary distributions can be parametrized by its moments, A ⊆ N,

  e_A = E ∏_{a∈A} ξ_a = Σ_{x_N ∈ {−1,1}^N} P(ξ_N = x_N) ∏_{a∈A} x_a,  (3)

where e_∅ is presumed to be one. The parametrization (3) implies

  P(ξ_A = x_A) = (1/2)^{|A|} Σ_{B⊆A} ( ∏_{b∈B} x_b ) e_B

and the CI condition (2) may be written in the following form: ξ_a ⊥⊥ ξ_b | ξ_C if and only if, for all x_C ∈ {−1, 1}^C,

  ( Σ_{D⊆C} e_{abD} ∏_{d∈D} x_d ) ( Σ_{D⊆C} e_D ∏_{d∈D} x_d ) = ( Σ_{D⊆C} e_{aD} ∏_{d∈D} x_d ) ( Σ_{D⊆C} e_{bD} ∏_{d∈D} x_d ).

The CI statements ξ_a ⊥⊥ ξ_b | ξ_C for |C| = 0, 1, 2 are below:

  ξ_a ⊥⊥ ξ_b:
    e_ab = e_a e_b

  ξ_a ⊥⊥ ξ_b | ξ_c:
    e_abc + e_ab e_c = e_a e_bc + e_b e_ac
    e_ab + e_abc e_c = e_ac e_bc + e_a e_b

  ξ_a ⊥⊥ ξ_b | ξ_cd:
    e_abcd + e_abc e_d + e_abd e_c + e_ab e_cd = e_a e_bcd + e_b e_acd + e_ac e_bd + e_ad e_bc
    e_abc + e_abd e_cd + e_ab e_c + e_d e_abcd = e_ad e_bcd + e_bd e_acd + e_a e_bc + e_b e_ac
    e_abd + e_abc e_cd + e_ab e_d + e_c e_abcd = e_ac e_bcd + e_bc e_acd + e_a e_bd + e_b e_ad
    e_ab + e_c e_abc + e_d e_abd + e_cd e_abcd = e_a e_b + e_ac e_bc + e_ad e_bd + e_acd e_bcd

1.4 Independence Models

Let N = {1, 2, ..., n} be a finite set and let T_N denote the set of all triplets ⟨ab|C⟩ such that ab is an (unordered) pair of distinct elements of N and C ⊆ N \ ab.

Definition. Subsets of T_N will be referred to here as formal independence models over N.
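For |N| = 4 the set T_N of triplets can be enumerated mechanically; a small sketch (representing ⟨ab|C⟩ as a pair of frozensets is my own convention, not notation from the thesis):

```python
from itertools import combinations

# T_N: all triplets <ab|C> with ab an unordered pair of distinct elements
# of N and C a subset of N \ ab.  A formal independence model over N is
# any subset of T_N.
N = {1, 2, 3, 4}
T_N = {(frozenset(ab), frozenset(C))
       for ab in combinations(sorted(N), 2)
       for r in range(len(N) - 1)
       for C in combinations(sorted(N - set(ab)), r)}
print(len(T_N))   # 6 unordered pairs * 2^2 conditioning sets = 24
```

So over four variables there are 2^24 formal independence models, of which only a small fraction turn out to be representable.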
The independence model I(ξ) induced by a random vector ξ = (ξ_1, ..., ξ_n) is the independence model over {1, 2, ..., n} defined as

  I(ξ) = { ⟨xy|Z⟩ ; ξ_x ⊥⊥ ξ_y | ξ_Z }.

Let us emphasize that the independence model I(ξ) uniquely determines all other conditional independencies among subvectors of ξ (due to Lemma 3).

Definition. An independence model I is said to be Gaussian (g), regular Gaussian (g+), discrete (d), positive discrete (d+), binary (b) and positive binary (b+) representable if there exists a random vector ξ that is Gaussian, regular Gaussian, discrete, positive discrete, binary and positive binary, respectively, and I = I(ξ).

Lemma 6. Let a, b, c be distinct elements of N = {1, ..., n} and D ⊆ N \ abc. If an independence model I over N is either discrete or Gaussian representable, then

  ( { ⟨ab|cD⟩, ⟨ac|D⟩ } ⊆ I )  ⟺  ( { ⟨ac|bD⟩, ⟨ab|D⟩ } ⊆ I ).  (4)

Moreover, if I is positive discrete or regular Gaussian representable, then

  ( { ⟨ab|cD⟩, ⟨ac|bD⟩ } ⊆ I )  ⟹  ( { ⟨ab|D⟩, ⟨ac|D⟩ } ⊆ I ).  (5)

Proof. These are the so-called semigraphoid and graphoid properties; cf. [2] or [15] for more details.

Diagrams proposed by R. Lněnička will be used for the visualisation of independence models I over N with |N| ≤ 4. Each element of N is plotted as a dot. If ⟨ab|∅⟩ ∈ I, then the dots corresponding to a and b are joined by a line. If ⟨ab|c⟩ ∈ I, then we put a line between the dots corresponding to a and b and add a small line in the middle pointing in the direction of c. If both ⟨ab|c⟩ and ⟨ab|d⟩ are elements of I, then only one line with two small lines in the middle is plotted. Finally, ⟨ab|cd⟩ ∈ I is visualised by a brace between a and b. For instance, see the diagram of I = { ⟨12⟩, ⟨23|1⟩, ⟨23|4⟩, ⟨34|12⟩, ⟨14⟩, ⟨14|2⟩ } in Figure 1.

Fig. 1: Diagram of an independence model.
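Property (4) of Lemma 6 can be verified mechanically for any formal independence model. A sketch under my own triplet encoding (pairs of frozensets; the helper names are not from the thesis):

```python
from itertools import permutations, chain, combinations

# Semigraphoid check for property (4) of Lemma 6 over N = {1,2,3,4}:
# for all distinct a, b, c and D ⊆ N \ abc,
#   {<ab|cD>, <ac|D>} ⊆ I  must hold iff  {<ac|bD>, <ab|D>} ⊆ I.
N = {1, 2, 3, 4}

def subsets(s):
    s = list(s)
    return map(frozenset,
               chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))

def is_semigraphoid(I):
    for a, b, c in permutations(N, 3):
        for D in subsets(N - {a, b, c}):
            left = {(frozenset({a, b}), D | {c}), (frozenset({a, c}), D)} <= I
            right = {(frozenset({a, c}), D | {b}), (frozenset({a, b}), D)} <= I
            if left != right:
                return False
    return True

t = lambda a, b, C: (frozenset({a, b}), frozenset(C))
print(is_semigraphoid({t(1, 2, {3}), t(1, 3, ())}))   # False: violates (4)
print(is_semigraphoid({t(1, 2, {3}), t(1, 3, ()), t(1, 3, {2}), t(1, 2, ())}))  # True
```

The first model contains ⟨12|3⟩ and ⟨13|∅⟩ but not the pair ⟨13|2⟩, ⟨12|∅⟩ required by (4), so it cannot be discrete or Gaussian representable; the second model completes the pair and passes.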
Two independence models I and J over N and M are isomorphic if there exists a bijective mapping π from N to M such that ⟨xy|Z⟩ ∈ I ⟺ ⟨π(x)π(y)|π(Z)⟩ ∈ J, where π(Z) stands for {π(z); z ∈ Z}. See Figure 2 for an example of three isomorphic models. An equivalence class of independence models with respect to the isomorphism relation will be referred to as a type. It is easy to see that models of the same type are either all representable or none of them is representable. Consequently, we can classify an entire type as representable or non-representable.

Fig. 2: Example of three isomorphic models.

Definition. If I is an independence model over N = {1, ..., n} and E, F are disjoint subsets of N, then the minor I[F/E] is the independence model over N \ EF given by

  I[F/E] = { ⟨ab|C⟩ ; E ∩ (abC) = ∅, ⟨ab|CF⟩ ∈ I }.

Minorization of independence models is an analogy of the operations of marginalization and conditioning of distributions. Note that I[∅/∅] = I and (I[F_1/E_1])[F_2/E_2] = I[F_1F_2/E_1E_2].

In the discrete distributional framework, the intersection of representable models is also representable (see Lemma 21, pp. 46 in the thesis, or [17] for a proof):

Lemma 7. If I and I′ are d-representable (d+-representable) models, then I ∩ I′ is d-representable (d+-representable).

From Lemma 7 it follows that if I is a d-representable (d+-representable) independence model, then all its minors I[F/E] are also d-representable (d+-representable); i.e. the classes of d/d+ representable models are closed under the operation of minorization (cf. Lemma 22, pp. 46 in the thesis). The classes of g/g+ representable models are also closed under minors² due to Lemma 4 (cf. Lemma 11, pp. 26 in the thesis).

² However, this property does not hold for the classes of b/b+ representable models: I = { ⟨ab|cd⟩, ⟨ab|d⟩ } is b+ representable, but I[d/∅] = { ⟨ab|c⟩, ⟨ab⟩ } is not b-representable.
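The minor operation has a direct transcription; a sketch under the same illustrative triplet encoding (pairs of frozensets, my own convention):

```python
# Minor I[F/E] of a formal independence model I: keep the triplets <ab|CF>
# whose elements avoid E (and whose pair avoids F), and strip F from the
# conditioning set.
def minor(I, E, F):
    E, F = frozenset(E), frozenset(F)
    out = set()
    for ab, C in I:
        if ab & (E | F) or C & E:
            continue                # a, b and C must avoid E; a, b must avoid F
        if F <= C:                  # only triplets conditioned on all of F survive
            out.add((ab, C - F))
    return out

ab = lambda a, b: frozenset({a, b})
I = {(ab(1, 2), frozenset({4})), (ab(1, 3), frozenset())}
print(minor(I, E=set(), F={4}) == {(ab(1, 2), frozenset())})   # True
```

The toy model above conditions ⟨12|4⟩ on F = {4}, which the minor turns into the unconditional statement ⟨12|∅⟩, while ⟨13|∅⟩ is dropped because it is not conditioned on F.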
1.5 Graphical Independence Models

In applications, the independence model determining a class of probability distributions is usually given not as a list of prescribed conditional independencies but as an undirected or directed acyclic graph. This is not only for reasons of interpretation³ but also because of the nice theoretical properties of such models (like exact formulas for ML estimates and computationally efficient ways to evaluate conditional distributions in the case of decomposable models). The theory is described in detail in [2] and [8]. For a short introduction see Chapter 4 of the thesis. Here, only undirected graphs and independence models derived by the global Markov property will be presented.

Definition. An undirected graph G = (N, E) is a pair of a finite vertex set N and a set of undirected edges between these vertices,

  E ⊆ { ab ; a, b ∈ N, a ≠ b }.

Usually, vertices are visualized as dots and edges as lines between them (see Figure 3). A path between two distinct vertices a, b ∈ N is a sequence of k ≥ 2 vertices a = v_1, ..., v_k = b such that v_i v_{i+1} ∈ E for i = 1, 2, ..., k−1. Let us assume a, b ∈ N are two distinct vertices and C ⊆ N \ ab. The vertices a and b are said to be separated by C (in symbols a ⊥ b | C) if every path between a and b contains a vertex from C.

Fig. 3: An undirected graph G.

Definition. The independence model derived from a graph⁴ G = (N, E) is the independence model over N defined as

  I(G) = { ⟨ab|C⟩ ; a ⊥ b | C in the graph G }.

³ An edge stands for a significant correlation, an arrow means direct causality.
⁴ There exist more ways to derive a model from a graph; see pp. 61 in the thesis. For reasons of simplicity we restrict ourselves here to the so-called global separation criterion.
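The global separation criterion is a plain reachability test once the vertices of C are deleted. A minimal sketch using the graph G of Figure 3 (edge set as stated in the Example):

```python
from collections import deque

# a ⊥ b | C in an undirected graph iff b is unreachable from a after the
# vertices of C are removed, i.e. every path from a to b hits C.
edges = {(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5)}
adj = {v: set() for v in range(1, 6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def separated(a, b, C):
    """True iff a ⊥ b | C in the undirected graph."""
    seen, queue = {a} | set(C), deque([a])   # vertices of C act as blocked
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w == b:
                return False
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return True

print(separated(2, 5, {3}))   # True: vertex 3 blocks every path from 2 to 5
print(separated(2, 5, {1}))   # False: e.g. the path 2-3-5 avoids vertex 1
```

Enumerating `separated(a, b, C)` over all pairs and conditioning sets reproduces the model I(G) listed in the Example.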
Example. Consider the graph G = (N, E) in Figure 3. The vertex set is N = {1, 2, 3, 4, 5}; the graph contains the edges E = {12, 13, 23, 24, 34, 35}. Every two vertices are connected by a path. Vertices 2 and 5 are separated by the vertex 3 or by the set of vertices 34, but not by the vertex 1 or 4 alone. The independence model derived from the graph G is

  I(G) = { ⟨14|23⟩, ⟨14|235⟩, ⟨15|3⟩, ⟨15|34⟩, ⟨15|23⟩, ⟨15|234⟩, ⟨25|3⟩, ⟨25|34⟩, ⟨25|13⟩, ⟨25|134⟩, ⟨45|3⟩, ⟨45|13⟩, ⟨45|23⟩, ⟨45|123⟩ }.

The main drawback of this approach is that there are many more representable models than just the graphical ones. For instance, in the case |N| = 4 there are only 11 graphical types, while there are 80 g-representable, 53 g+-representable, 1098 d-representable and at least 304 d+-representable types. On the other hand, it is unfeasible to fit data to most of the non-graphical independence models.

2 No Finite Characterization Exists

Definition. An independence model J over M is a forbidden minor for a class of independence models C if no model in C has a minor isomorphic to J. A class of independence models C is characterized by a finite collection of forbidden minors D if any independence model I satisfies: I ∈ C if and only if no minor of I is isomorphic to an element of D. Note that there are other possible ways to define a finite characterization, such as a characterization by special inference rules (cf. [17]).

Theorem 1. None of the classes of g/g+/d/d+/b/b+ representable models can be characterized by a finite collection of forbidden minors.

Theorem 1 is proved at the end of Chapters 2 and 3 of the thesis, Theorems 3 and 5 (or see [11]).

3 Representable Models Over N = {1, 2, 3, 4}

For N consisting of fewer than four elements, the problem of representability is easy to solve. All models over |N| ≤ 2 are representable in any of the above-mentioned senses. Potential candidates for representability over N = {1, 2, 3} are the models fulfilling properties (4) and (5) from Lemma 6; these fall into 10 types (8 types in the case of positive representability).
Regarding d/d+ representability there are no other constraints (all such models are representable). Regarding b/b+ representability, I = { ⟨ab|c⟩, ⟨ab⟩ } is not binary representable (cf. Lemma 19, pp. 42 in the thesis). Regarding g/g+ representability, I = { ⟨ab|c⟩, ⟨ab⟩ }, I = { ⟨ab⟩, ⟨ac⟩ } and I = { ⟨ab⟩, ⟨ac⟩, ⟨bc⟩ } are not Gaussian representable (cf. Lemma 10, pp. 24 in the thesis). That is why we focus on N = {1, 2, 3, 4} from now until the end of the section.
3.1 Gaussian Distributional Framework

All minors of a g+ representable model must themselves be g+ representable. The list of all types over N = {1, 2, 3, 4} that have only g+ representable minors is plotted in Figure 4 on page 12. As shown in [3], the last 5 of them, M54-M58, are not g+ representable. For the rest of them, a corresponding variance matrix has been found by a search through the space of all integer positive definite matrices. Therefore, there are 629 g+ representable independence models corresponding to 53 types, M1-M53.

For general Gaussian representability, the following two properties are shown in [12] (or the thesis, Lemma 13, pp. 29):

Lemma 8. Let I be a g-representable independence model and a, b, c, d distinct elements of N.

i) If { ⟨ab|c⟩, ⟨ac|b⟩ } ⊆ I, then either ξ_b ≡ ξ_c or { ⟨ab⟩, ⟨ac⟩ } ⊆ I.
ii) If { ⟨ab|cd⟩, ⟨ac|bd⟩ } ⊆ I, then either ξ_b ≡ ξ_c, or { ⟨ab|d⟩, ⟨ac|d⟩ } ⊆ I, or ⟨ad|bc⟩ ∈ I,

where ξ_b ≡ ξ_c stands for functional dependence⁵ between ξ_b and ξ_c, yielding { ⟨ab|c⟩, ⟨bd|c⟩, ⟨ac|b⟩, ⟨cd|b⟩, ⟨ab|cd⟩, ⟨bd|ac⟩, ⟨ac|bd⟩, ⟨cd|ab⟩ } ⊆ I.

M1-M88 in Figures 4 and 5 are the types having only g-representable minors and not contradicting Lemma 8. However, it is proved in [12] (or the thesis, Lemma 14, pp. 30) that just M54-M58 and M86-M88 are not g-representable. Therefore, there are 877 g-representable independence models falling into 80 types. The achieved results are summarized in the following theorem:

Theorem 2. There exist 877 g-representable independence models over |N| = 4, assorting into 80 types. Among them there are 629 g+ representable independence models (53 types), and these are exactly the models meeting condition (5).

There exists a conjecture that all g+ representable independence models are b+ representable. To show this for |N| = 4, b+ representations of M25 and M37 are needed.
The former analogous conjecture that all g-representable independence models are b-representable was rejected due to M71, which is g-representable but cannot be represented by a discrete distribution (or any other distribution with finite multiinformation). The problem is listed in the Open Problems chapter of [15].

⁵ i.e. the correlation between ξ_b and ξ_c equals ±1
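The induced model I(ξ) of a regular Gaussian vector can be enumerated directly from its variance matrix via Lemma 5 ii). A minimal sketch (the example matrix is my own illustrative choice, not one of the representations from Figure 4; variables are 0-based here):

```python
from itertools import combinations

# <ab|C> belongs to I(xi) iff det Sigma_{aC x bC} = 0 (Lemma 5 ii),
# valid here because the example matrix is positive definite.
N = (0, 1, 2, 3)

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def induced_model(S, tol=1e-9):
    I = set()
    for a, b in combinations(N, 2):
        rest = [v for v in N if v not in (a, b)]
        for r in range(len(rest) + 1):
            for C in combinations(rest, r):
                sub = [[S[i][j] for j in (b,) + C] for i in (a,) + C]
                if abs(det(sub)) < tol:
                    I.add((frozenset((a, b)), frozenset(C)))
    return I

# rho_12 = rho_14 * rho_24 = 0.25 with variable 3 independent of the rest,
# so in particular the elementary statement 12|4 holds (cf. constraint M2).
S = [[1.0, 0.25, 0.0, 0.5],
     [0.25, 1.0, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0],
     [0.5, 0.5, 0.0, 1.0]]
I = induced_model(S)
print((frozenset((0, 1)), frozenset((3,))) in I)   # True: 12|4 is induced
```

Running this over candidate integer matrices is, in spirit, the kind of search used to find the g+ representations of M1-M53, although the thesis does not prescribe this particular code.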
Fig. 4: List of types M1-M53 and their Gaussian representations.
Fig. 5: List of types M54-M80 and their Gaussian representations.
3.2 Discrete Distributional Framework

The problem of (general) discrete representability has been solved by F. Matúš in [5], [6] and [7]; see [16] for an overview of the results. Only partial results exist for d+ representability and b/b+ representability at this moment (cf. [14] or Chapter 3 of the thesis).

3.2.1 Discrete representability

In brief, due to Lemma 7 the intersection of two d-representable models is also d-representable. As a result, the class of all d-representable models over N can be described by a set C of so-called irreducible models, i.e. nontrivial⁶ d-representable models that cannot be written as an intersection of two other d-representable models. It is not difficult to verify that a nontrivial independence model I is d-representable if and only if there exists A ⊆ C such that I = ⋂_{C′∈A} C′. As shown in [5], [6] and [7], there are only 13 types of irreducible models; see Figure 6.

Fig. 6: Irreducible types.

Theorem 3. There exist d-representable independence models assorting into 1098 types.

⁶ i.e. different from the set of all triplets over N, T_N
3.2.2 Positive binary representability

There are many special properties of b+ representable models. A few of them are listed below (proved in Lemma 20, pp. 43 in the thesis):

Lemma 9. If I is a b+ representable independence model over N and a, b, c, d are distinct elements of N, then:

i) { ⟨ab|cd⟩, ⟨ab|c⟩, ⟨ad⟩, ⟨cd⟩ } ⊆ I ⟹ { ⟨ad|c⟩, ⟨cd|a⟩ } ⊆ I or ⟨bd⟩ ∈ I
ii) { ⟨ab|cd⟩, ⟨ab|c⟩, ⟨ad⟩, ⟨bd⟩ } ⊆ I ⟹ { ⟨ad|b⟩, ⟨bd|a⟩ } ⊆ I or ⟨cd⟩ ∈ I
iii) { ⟨ab|cd⟩, ⟨ab⟩, ⟨bc⟩, ⟨cd⟩, ⟨ad⟩ } ⊆ I ⟹ { ⟨ad|c⟩, ⟨cd|a⟩ } ⊆ I or { ⟨bd|c⟩, ⟨cd|b⟩ } ⊆ I
iv) { ⟨ab|cd⟩, ⟨ad|b⟩, ⟨cd|b⟩ } ⊆ I ⟹ { ⟨bc⟩, ⟨cd⟩, ⟨bc|d⟩ } ⊆ I or ⟨ab|c⟩ ∈ I
v) { ⟨ab|cd⟩, ⟨ab⟩, ⟨bc⟩, ⟨bd⟩, ⟨ab|c⟩, ⟨bc|a⟩ } ⊆ I ⟹ ⟨ad⟩ ∈ I or ⟨ac⟩ ∈ I or ⟨cd⟩ ∈ I or ⟨ad|c⟩ ∈ I or { ⟨ab|d⟩, ⟨bd|a⟩ } ⊆ I
vi) { ⟨ab|cd⟩, ⟨ab|d⟩, ⟨bc|a⟩ } ⊆ I ⟹ ⟨ac⟩ ∈ I or ⟨ad⟩ ∈ I or ⟨cd⟩ ∈ I or ⟨cd|a⟩ ∈ I or ⟨ac|d⟩ ∈ I

Using the parametrization from Section 1, a search for rational representations was performed. The result is 139 b+ representable types. However, many types still remain undecided with respect to b+ representability.

3.2.3 Positive discrete representability

On the one hand, all d+ representable independence models must be d-representable and must fulfill condition (5). There are 5547 such models (356 types). On the other hand, all b+ representable models are d+ representable. Closing the 139 b+ representable types found above under intersection (thanks to Lemma 7), we obtain 4555 d+ representable models (299 types). The 356 − 299 = 57 undecided types are shown in Figure 7. Further, the types P12, P49, P50, P52, P54 (5 types, 52 models) were found to be d+ representable by a direct search for a representation (with a little help from computational software). We may conclude (Theorem 4, pp. 51 in the thesis):

Theorem 4. There exist between 4607 and 5547 d+ representable independence models (304-356 types).

It has been conjectured that all 5547 models (356 types) are d+ representable. To prove this, representations of the types P2, P3, P19, P24, P30, P32, P37, P39, P41, P45, P46 and P57 from Figure 7 have to be found.
Fig. 7: d-representable types meeting (5) but not found as an intersection of the 139 types known to be b+ representable.
4 Probability Distributions Over an Independence Model

Definition. If I is an independence model, then the family of distributions of random vectors ξ such that I ⊆ I(ξ) will be denoted by P(I).

Usually we fix the distributional framework; this is indicated by adding a lower index to P. For example, P_g({⟨12⟩}) is the family of all Gaussian distributions of ξ such that ξ_1 ⊥⊥ ξ_2 (i.e. the element σ_12 of the variance matrix is zero).

4.1 Connection Between Graphical Independence Models and Exponential Families In Regular Distributional Framework

In the regular Gaussian distributional framework there is an interesting connection between graphical independence models and exponential families: all graphical models are g+ representable, and the families of (regular) Gaussian distributions over them are exponential, as stated in the following lemma (cf. [2]).

Lemma 10. If G is an undirected graph, then P_{g+}(I(G)) is exponential.

More importantly, this property can be reversed. A sketch of a proof is included in [13], but the idea comes from oral communication with F. Matúš.

Lemma 11. If P_{g+}(I) is exponential, then there exists an undirected graph G such that I = I(G).

4.2 Model Search

Searching for an appropriate statistical model based on data consists of two processes: parametrization of P(I) and estimation of the parameters (usually by maximization of the log-likelihood function) over P(I) for fixed I; and searching for the right independence model I, either by CI testing or by maximization of some information criterion (like AIC or BIC). For an overview see Sections 4.2 and 4.3 of the thesis.

Consider now |N| = 4 and the regular Gaussian distributional framework, so that all representable models can be parameterized. There are 53 g+ representable types, but only 11 of them are graphical. Let X be a dataset with 4 columns and k rows (observations). The estimate of the mean parameter is the same for all models:

  µ̂_a = (1/k) Σ_{b=1}^{k} X_{ba},  a = 1, ..., 4.
The variance matrix Σ may be parameterized as

  Σ = diag(σ_11, σ_22, σ_33, σ_44) · Σ_0 · diag(σ_11, σ_22, σ_33, σ_44),

where Σ_0 is the correlation matrix

  Σ_0 = ( 1     ρ_12  ρ_13  ρ_14
          ρ_21  1     ρ_23  ρ_24
          ρ_31  ρ_32  1     ρ_34
          ρ_41  ρ_42  ρ_43  1    ).

Often a parametrization by the inverse variance matrix V = Σ^{-1} is more natural. Analogously to the above, V may be parameterized as

  V = diag(v_11, v_22, v_33, v_44) · V_0 · diag(v_11, v_22, v_33, v_44),

where V_0 = (w_ij)_{i,j=1,...,4} is a matrix with ones on the diagonal. The correspondence between independence restrictions on Σ and on V has been described in Lemma 5 iv).

The parameters σ_11, ..., σ_44 can be estimated as

  σ̂²_aa = (1/k) Σ_{b=1}^{k} (X_{ba} − µ̂_a)²,  a = 1, ..., 4;

analogously, v_11, ..., v_44 can be estimated from the diagonal of the maximum likelihood estimate of Σ^{-1} in the saturated model (no independence restrictions). The constraints on the remaining parameters for each of the types M1-M53 (see Figure 4, pp. 12) are listed below. The estimates of the off-diagonal elements of Σ_0 or V_0 must be found by some iterative optimization method (like optim in the R environment, [10]).

M1) ρ_12 = 0
M2) ρ_12 = ρ_14 ρ_24
M3) ρ_12 = (ρ_14 ρ_24 + ρ_13 ρ_23 − ρ_13 ρ_24 ρ_34 − ρ_14 ρ_23 ρ_34) / (1 − ρ²_34)
M4) ρ_12 = 0, ρ_13 = ρ_14 (ρ_23 ρ_34 − ρ_24) / (ρ_24 ρ_34 − ρ_23)
M5) ρ_12 = 0, ρ_34 = (ρ_23 (1 − ρ²_14) + ρ_13 ρ_14 ρ_24) / ρ_24
M6) ρ_12 = 0, ρ_34 = ρ_13 ρ_14 + ρ_23 ρ_24
M7) ρ_12 = 0, ρ_34 = ρ_13 ρ_14
M8) ρ_12 = ρ_14 ρ_24, ρ_34 = ρ_13 ρ_14
M9) w_12 = 0, w_34 = 0
M10) ρ_12 = ρ_14 ρ_24, ρ_13 = (ρ_34 + ρ_23 ρ_24 ρ²_14 − ρ_23 ρ_24 − ρ²_14 ρ²_24 ρ_34) / (ρ_14 (1 − ρ²_24))
M11) ρ_12 = 0, ρ_23 = ρ_24 ρ_34
M12) ρ_12 = 0, ρ_34 = 0
M13) ρ_12 = ρ_14 ρ_24, ρ_13 = ρ_14 ρ_24 ρ_23
M14) ρ_12 = ρ_14 ρ_24, ρ_13 = ρ_14 (1 − ρ²_23 − ρ²_24 + ρ_23 ρ_24 ρ_34) / (ρ_34 − ρ_23 ρ_24)
M15) ρ_14 = ρ_13 ρ_34, ρ_12 = ρ_24 ρ_13 ρ_34
M16) w_14 = 0, w_23 = 0, w_12 = w_13 w_24 w_34 / (1 − w²_34)
M17) ρ_12 = 0, ρ_14 = ρ_13 ρ_34, ρ_23 = ρ_34 (1 − ρ²_13) / ρ_24
M18) ρ_12 = 0, ρ_34 = ρ_13 ρ_14, ρ_24 = −ρ_14 (1 − ρ²_13 − ρ²_23) / (ρ_13 ρ_23)
M19) ρ_12 = 0, ρ_34 = 0, ρ_13 = −ρ_23 (1 − ρ²_14) / (ρ_14 ρ_24)
M20) w_12 = 0, w_34 = 0, w_13 = w_14 w_24 / w_23
M21) ρ_12 = 0, ρ_34 = ρ_13 ρ_14, ρ_23 = −ρ_24 ρ_14 (1 − ρ²_13) / (ρ_13 (1 − ρ²_14))
M22) ρ_12 = 0, ρ_34 = ρ_23 ρ_24, ρ_13 = ρ_23 ρ_24 ρ_14
M23) ρ_12 = 0, ρ_23 = ρ_24 ρ_34, ρ_14 = ρ_13 ρ_34
M24) ρ_12 = 0, ρ_34 = ρ_23 ρ_24, ρ_14 = ρ_13 ρ_23 ρ_24
M25) ρ_12 = 0, ρ_14 = ρ_13 ρ_34, ρ_23 = (1 − ρ²_24 − ρ²_34) / (ρ_24 ρ_34)
M26) ρ_12 = ρ_14 ρ_24, ρ_13 = ρ_14 ρ_24 ρ_23, ρ_34 = ρ_23 ρ_24
M27) ρ_12 = 0, ρ_14 = ρ_13 ρ_34, ρ_24 = ρ_23 (1 − ρ²_13 ρ²_34) / (ρ_34 (1 − ρ²_13))
M28) ρ_12 = ρ_14 ρ_24, ρ_23 = ρ_14 ρ_24 ρ_13, ρ_34 = ρ_14 ρ²_24 ρ_13
M29) w_14 = 0, w_12 = w_13 w_23, w_34 = w_23 w_24
M30) ρ_12 = 0, ρ_34 = 0, ρ_13 = −ρ_14 ρ_24 / ρ_23
M31) w_23 = 0, w_14 = w_13 w_34, w_12 = w_13 w_34 w_24
M32) w_34 = 0, w_12 = w_14 w_24, w_13 = w_14 w_24 w_23
M33) ρ_12 = 0, ρ_23 = 0
M34) w_12 = 0, w_23 = 0
M35) ρ_12 = 0, ρ_34 = 0, ρ_13 = ±ρ_24, ρ_14 = ρ_23
M36) ρ_34 = ρ_23 ρ_24, ρ_12 = ±ρ_23 ρ_24, ρ_13 = ±ρ_24, ρ_14 = ±ρ_23
M37) ρ_12 = 0, ρ_23 = ρ_24 ρ_34, ρ_13 = ±√(1 − ρ²_24), ρ_14 = ±ρ_34 √(1 − ρ²_24)
M38) ρ_12 = 0, ρ_14 = 0, ρ_23 = ρ_24 (1 − ρ²_13) / ρ_34
M39) ρ_12 = 0, ρ_23 = 0, ρ_13 = ρ_14 ρ_34
M40) w_14 = 0, w_24 = 0, w_13 = w_12 (1 − w²_34) / w_23
M41) w_14 = 0, w_24 = 0, w_12 = w_13 w_23
M42) ρ_12 = 0, ρ_13 = 0, ρ_23 = 0
M43) w_12 = 0, w_13 = 0, w_23 = 0
M44) ρ_12 = 0, ρ_23 = 0, ρ_14 = ρ_13 ρ_34
M45) ρ_12 = 0, ρ_23 = 0, ρ_34 = 0
M46) w_12 = 0, w_23 = 0, w_34 = 0
M47) ρ_12 = 0, ρ_13 = 0, ρ_14 = 0
M48) ρ_12 = 0, ρ_23 = 0, ρ_13 = 0, ρ_14 = 0
M49) w_12 = 0, w_13 = 0, w_23 = 0, w_14 = 0
M50) ρ_12 = 0, ρ_14 = 0, ρ_23 = 0, ρ_34 = 0
M51) ρ_12 = 0, ρ_13 = 0, ρ_14 = 0, ρ_23 = 0, ρ_34 = 0
M52) ρ_12 = 0, ρ_13 = 0, ρ_14 = 0, ρ_24 = 0, ρ_23 = 0, ρ_34 = 0
M53) saturated model (no constraints)

Model selection can be done by the AIC/BIC criterion. The dimension D of the parameter space equals 14 minus the number of independence constraints for the given type. For example, in the case of M11, D = 14 − 2 = 12 because there are constraints on ρ_12 and ρ_23.
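The dimension bookkeeping can be sketched as follows (the constraint counts are read off the list above for a few example types; `loglik` stands for the maximized log-likelihood of the fitted model, and the function names are mine):

```python
import math

# D = 14 (4 means + 4 variances + 6 off-diagonal parameters) minus the
# number of independence constraints of the type (illustrative subset).
constraints = {'M1': 1, 'M11': 2, 'M12': 2, 'M52': 6, 'M53': 0}

def dimension(model):
    return 14 - constraints[model]

def aic(loglik, model):
    # AIC = -2 * log-likelihood + 2 * D
    return -2.0 * loglik + 2.0 * dimension(model)

def bic(loglik, model, k):
    # BIC = -2 * log-likelihood + D * log(k), k = number of observations
    return -2.0 * loglik + dimension(model) * math.log(k)

print(dimension('M11'))   # 12, as computed for M11 in the text
```

Model selection then amounts to fitting each candidate type and keeping the one with the smallest AIC or BIC value.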
4.3 Simulation Study

The variance matrix estimates and the model selection algorithm from the previous subsection have been implemented in the R package yest attached to the thesis. The model selection process takes about 9 minutes (AMD Sempron 3500+, 1.8 GHz, 1 GB RAM). Note that because the sufficient statistic for all the estimates above is the maximum likelihood parameter estimate in the saturated model, the size of the dataset does not strongly influence the computation time.

Validity has been verified by the following algorithm:

1. For each of the 53 types, take one model (denoted by I) and randomly generate an inverse variance matrix corresponding to I (denoted by V).
2. Generate a dataset with k rows, each row from a multivariate normal distribution with zero mean parameter and variance matrix V^{-1}.
3. Based on AIC/BIC, select one of the 629 g+ representable independence models (denoted by J).
4. The model selection is called successful if I = J; otherwise, it is not successful.

The transcription to R follows:

    I <- ind.identification(type=testovany.typ)$model
    V <- ind.rgauss(model=I)
    data <- generate.data(V, k)
    J <- model.selection(data)$model
    if (I == J) print("Success") else print("Failure")

Fig. 8: Success rate of AIC (left) and BIC (right) model selection; probability of success plotted against log(k).
The simulation was repeated four times for each of the 53 types and 15 possible values of k. A graph of the average success rate as a function of log(k) is shown in Figure 8.

Failure of model selection does not automatically imply bad behavior of the variance matrix estimate. The simulation study was therefore repeated, but this time the Kullback-Leibler divergence⁷ between the estimated and the true distribution was measured. Four estimates were evaluated: an estimate based on the saturated model (the usual one), an estimate based on BIC model selection over graphical models, an estimate based on BIC model selection over all independence models, and an estimate based on the actual independence model.

The results are visualized in Figure 9: the trimmed⁸ (20%) average difference between the KL divergence of the inspected estimate from the true model and the KL divergence of the estimate based on the saturated model from the true model is drawn as a function of the logarithm of the number of lines k in the dataset.

Fig. 9: Saturated model full line, graphical models dashed line, independence models dotted line, actual independence model dash-dotted line.

We may conclude that to have a good chance (> 80%) of determining the true independence model by BIC model selection, we need about 50 thousand observations. However, for such a large amount of data the quality of the variance estimator is similar to that of the estimator based on the saturated model (the usual one). It seems that the improvement (in the sense of average KL divergence in comparison with the estimator based on the saturated model) is significant if the number of observations is between 200 and

It is obvious that the quality of estimation of the variance matrix using just the graphical models is unsatisfactory if the true model is not graphical and the number of observations is below

An interesting direction of future research might be the study of the distributional families above in terms of curved exponential families (to speed up the computation) and the estimation of variance matrices for more than four Gaussian distributed random variables.

⁷ That is, $\mathrm{DIV}(\mu, \Sigma, \bar{\mu}, \bar{\Sigma}) = \frac{1}{2}\left(\log\frac{|\bar{\Sigma}|}{|\Sigma|} + \mathrm{tr}(\bar{\Sigma}^{-1}\Sigma) + (\bar{\mu}-\mu)^T \bar{\Sigma}^{-1} (\bar{\mu}-\mu) - 4\right)$, where $\mu, \Sigma$ and $\bar{\mu}, \bar{\Sigma}$ are the parameters of the first and the second distribution.
⁸ Without trimming the figures are similar but a bit more messy.
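The divergence formula in footnote 7 splits coordinate-wise when both covariance matrices are diagonal, which makes it easy to sanity-check by hand. The following Python sketch is my illustration, not the thesis code (the thesis works with full 4x4 matrices through the R package yest): it implements that diagonal special case, together with a 20% trimmed mean of the kind used for the curves in Figure 9.

```python
import math

def kl_gauss_diag(mu, var, mu_bar, var_bar):
    """KL divergence between Gaussians with diagonal covariances.

    Diagonal specialization of the DIV formula: the log-determinant,
    trace, and quadratic terms each split per coordinate, and the
    constant 4 generalizes to the dimension d."""
    d = len(mu)
    return 0.5 * sum(
        math.log(vb / v) + v / vb + (mb - m) ** 2 / vb
        for m, v, mb, vb in zip(mu, var, mu_bar, var_bar)
    ) - 0.5 * d

def trimmed_mean(xs, frac=0.2):
    """Average after discarding a fraction frac of the smallest and a
    fraction frac of the largest values."""
    xs = sorted(xs)
    k = int(len(xs) * frac)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)
```

For identical parameters the divergence is exactly zero, and the trimmed mean discards extreme KL differences before averaging, which is the smoothing effect footnote 8 alludes to.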
List of Publications

P. Šimeček. A short note on structure learning. In Jana Šafránková, editor, Proceedings of the 14th Annual Conference of Doctoral Students - WDS 2004. Matfyzpress, Prague.

P. Šimeček and M. Studený. Využití Hilbertovy báze k ověřování shodnosti strukturálních a kombinatorických imsetů. In Jaromír Antoch and Gejza Dohnal, editors, Sborník prací 13. letní školy JČMF ROBUST. Jednota českých matematiků a fyziků.

P. Šimeček. On the minimal probability of intersection of conditionally independent events. Submitted to Kybernetika.

P. Šimeček. Gene Expression Data Analysis for In Vitro Toxicology. In Jaromír Antoch and Gejza Dohnal, editors, Sborník prací 14. letní školy JČMF ROBUST. Jednota českých matematiků a fyziků.

Č. Štuka and P. Šimeček. Studium souvislostí mezi úspěšností studia medicíny, známkami na střední škole a výsledky přijímacích zkoušek. In Sborník konference MEDSOFT. Action M Agency.

P. Šimeček. Classes of Gaussian, discrete and binary representable independence models have no finite characterization. In M. Hušková and M. Janžura, editors, Proceedings of Prague Stochastics 2006, CD volume. MATFYZPRESS.

P. Šimeček. Gaussian representation of independence models over four random variables. In A. Rizzi, editor, Proceedings of COMPSTAT 2006, CD volume. Springer.

P. Šimeček. A short note on discrete representability of independence models. In M. Studený and J. Vomlel, editors, Proceedings of Workshop on Probabilistic Graphical Models 2006. Action M Agency.

P. Šimeček. Independence models. In J. Vejnarová, editor, Proceedings of Workshop on Uncertainty Processing 2006. University of Economics.