Concepts of a Discrete Random Variable


Richard Emilion
Laboratoire MAPMO, Université d'Orléans, B.P. Orléans Cedex 2, France
richard.emilion@univ-orleans.fr

Abstract. In the literature, a formal concept is defined as a pair (extent, intent) with respect to a context that is usually empirical, for example a sample of transactions. This is somewhat unsatisfying: concepts, though born from experience, should not depend on it. In this paper we regard such concepts as empirical concepts, and we define the notion of concept in a context-free framework as a limit intent. Using the law of large numbers, we prove the following. Given a random variable X taking its values in a countable σ-semilattice, the random intents of empirical concepts with respect to a sample of X converge almost everywhere to a fixed deterministic limit, called a concept. Identifying this limit shows that it depends only on the distribution $P_X$ of X. Moreover, the set of such concepts is the σ-semilattice generated by the support of X, and it even has the structure of a σ-lattice: the concept lattice of a random variable. We also compute the mean number of concepts and of frequent itemsets for a hierarchical Bernoulli mixtures model. Finally, we propose an algorithm for finding maximal frequent itemsets using the minimal winning coalitions of $P_X$.

1 Introduction

Rule induction is an important component of data mining. It consists in extracting useful if-then rules from data, and a key step in this induction is mining what are usually called frequent itemsets (FIs), as introduced in Agrawal et al. (1993 and 1994). To understand the ideas behind these mining algorithms, it helps to use the notions of Galois connection, intent, extent, closed set and so on.
Wille (1980) called a pair (extent, intent) a concept. This notion of concept is widely used in various domains (artificial intelligence, robotics, psychology, software engineering, text mining and so on), but it depends on the extent, which is, roughly speaking, a random sample. In the present paper we call such a pair a (random) empirical concept, and we define concepts as limits of empirical intents. We show that these limits are no longer random and do not depend on the sample. In other words, concepts are defined with respect to a random variable rather than with respect to a sample of this random variable.

The paper is organized as follows. Random variables taking their values in a σ-semilattice are introduced in Section 2. Galois connections between semilattices are recalled in Section 3. Section 4 defines random empirical Galois lattices, proves the convergence of random intents, and defines the concept lattice of a random variable. In Section 5, the average number of empirical concepts and of frequent itemsets is computed for a hierarchical Bernoulli mixtures model. Frequent itemsets and winning coalitions are studied in Section 6, which leads to an algorithm for mining maximal frequent itemsets.

The present paper answers a question of Edwin Diday, who also defines a concept as an intent (Bock et al. (2000)). It is dedicated to Edwin Diday, who introduced us to several interesting problems.

2 Notations and terminology

2.1 σ-semilattice

Our set of observations, say $\mathcal{L}$, is taken to be very general in order to cover a wide range of applications. It can be a set of real numbers, real vectors, real functions, fuzzy sets, a power set, words of a language, real cumulative distribution functions, real stochastic processes, and so on.

Let $(\mathcal{L}, \le, \wedge)$ be a countable semilattice. That is, $\mathcal{L}$ is a countable set, $\le$ is a partial order relation on $\mathcal{L}$ and $\wedge$ is an infimum operator. We assume in addition that $\mathcal{L}$ is a σ-semilattice. This means that for any (countable) subset $\mathcal{A} \subseteq \mathcal{L}$, there exists a largest element of $\mathcal{L}$, denoted by $\bigwedge_{L \in \mathcal{A}} L$, that is lower than every $L \in \mathcal{A}$. Without loss of generality, we can also assume that $\mathcal{L}$ has a largest element, denoted by $1$, and we set by convention $\bigwedge_{L \in \mathcal{A}} L = 1$ if $\mathcal{A} = \emptyset$.

If $(L_n)_{n \ge 1}$ is a decreasing sequence in $\mathcal{L}$, we say that this sequence is convergent and that its limit is $\bigwedge_{n=1}^{\infty} L_n$.

2.2 $\mathcal{L}$-valued random variable

Let $(\Omega, \mathcal{B}, P)$ be a probability space. Let $X : \Omega \to \mathcal{L}$ be a (discrete) $\mathcal{L}$-valued random variable (r.v.). Its probability distribution $P_X$ is the probability measure on $\mathcal{L}$ defined as usual by
$$\forall L \in \mathcal{L}, \quad P_X(L) = P(X = L) = P(\{\omega \in \Omega : X(\omega) = L\}).$$
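The σ-semilattice conventions can be made concrete with a short Python sketch. It assumes the finite power-set semilattice $(\mathcal{P}(J), \subseteq, \cap)$ used in Section 2.3, with itemsets encoded as frozensets. The helper names `meet` and `limit_of_decreasing` are hypothetical. The meet of an empty family returns the top element $1 = J$, as the convention above requires.

```python
from functools import reduce


def meet(family, top):
    """Infimum of a finite family of itemsets in (P(J), ⊆, ∩).

    Folding from `top` implements the convention that the meet of the
    empty family is the largest element 1 = J.
    """
    return reduce(lambda x, y: x & y, family, top)


def limit_of_decreasing(sequence, top):
    """Limit of a decreasing sequence L_1 ⊇ L_2 ⊇ ..., i.e. the meet of its terms."""
    return meet(sequence, top)


J = frozenset("abcde")  # illustrative item universe
both = meet([frozenset("abe"), frozenset("bce")], J)  # the itemset {b, e}
```

Because the empty meet falls back to `top`, `meet([], J)` returns `J` itself.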

The support of $P_X$ will play a key role. It is defined as the set
$$S_X = \{L \in \mathcal{L} : P_X(L) > 0\}.$$
For any $n \in \{1, 2, 3, \dots\}$, an $n$-sample of $X$ is a sequence $X_1, \dots, X_n$ of independent and identically distributed (iid) r.v.'s, each distributed as $X$.

2.3 Data mining context

To connect the terminology of data mining (which comes from marketing) with the preceding setting, it suffices to take $\mathcal{L} = (\mathcal{P}(J), \subseteq, \cap)$, the power set of a (large) finite set $J$ of items. Any $L \in \mathcal{L}$ is then a subset of $J$ and is usually called an itemset. The random variable $X$ models random transactions made by customers: the itemset $X(\omega) \in \mathcal{L}$ represents the random set of items bought by a customer.

The following simple example illustrates what is usually called a binary context. Take $J = \{a, b, c, d, e\}$ and $n = 10$ transactions, where 1 means that the item was bought and 0 that it was not. The last column of Table 1 below contains the random set considered in our approach.

a  b  c  d  e  | random set
0  1  0  0  1  | X_1(ω) = {b, e}
1  1  0  0  1  | X_2(ω) = {a, b, e}
0  1  1  1  0  | X_3(ω) = {b, c, d}
1  0  0  1  0  | X_4(ω) = {a, d}
0  1  1  0  1  | X_5(ω) = {b, c, e}
1  1  1  1  0  | X_6(ω) = {a, b, c, d}
0  0  1  1  1  | X_7(ω) = {c, d, e}
1  1  0  1  0  | X_8(ω) = {a, b, d}
0  1  1  1  1  | X_9(ω) = {b, c, d, e}
0  0  1  1  0  | X_10(ω) = {c, d}

Table 1. Binary context and random set.

3 Galois lattice for semilattices

The notion of Galois connection (GC) was introduced early, by Ore (1944). It is also discussed in Birkhoff's book (1967), Chapter 5. We first note
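The binary context of Table 1, together with the empirical estimates of $P_X$ and $S_X$ computed from it, can be sketched in Python. This is only an illustration: itemsets are encoded as frozensets, and the function names are hypothetical. The ten transactions are pairwise distinct, so each one has empirical probability 1/10.

```python
from collections import Counter
from fractions import Fraction

ITEMS = ["a", "b", "c", "d", "e"]
# The ten observed transactions X_1(ω), ..., X_10(ω) of Table 1.
SAMPLE = [frozenset(s) for s in
          ["be", "abe", "bcd", "ad", "bce", "abcd", "cde", "abd", "bcde", "cd"]]


def binary_row(itemset):
    """Row of the binary context: 1 if the item was bought, 0 otherwise."""
    return [1 if item in itemset else 0 for item in ITEMS]


def empirical_distribution(sample):
    """Empirical estimate of P_X: relative frequency of each distinct itemset."""
    n = len(sample)
    return {L: Fraction(count, n) for L, count in Counter(sample).items()}


def support(distribution):
    """S_X = {L : P_X(L) > 0}."""
    return {L for L, p in distribution.items() if p > 0}


for X in SAMPLE:
    print(binary_row(X), sorted(X))  # rebuilds the rows of Table 1
```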

that the elegant and general definition of a Galois lattice (GL) given by Barbut and Monjardet for a GC between lattices (Barbut et al. (1970), pages 13 and 25) extends to a GC between semilattices. Let $\langle E, \le, \wedge \rangle$ and $\langle F, \le, \wedge \rangle$ be two semilattices. A GC between $E$ and $F$ is a pair of mappings $(f, g)$ satisfying
$$f : E \to F \text{ and } g : F \to E \text{ are decreasing,} \tag{1}$$
$$h = g \circ f : E \to E \text{ and } k = f \circ g : F \to F \text{ are extensive,} \tag{2}$$
that is, $\forall x \in E,\ x \le h(x)$ and $\forall y \in F,\ y \le k(y)$. These definitions imply that
$$f \circ h = f, \quad h \circ h = h, \quad g \circ k = g, \quad k \circ k = k. \tag{3}$$
Let $I_h = \{x \in E : h(x) = x\}$ (resp. $I_k = \{y \in F : k(y) = y\}$) be the set of closed (or invariant) elements of $E$ (resp. of $F$). One can check that the restriction of $f$ to $I_h$ is a one-to-one mapping onto $I_k$, whose inverse is the restriction of $g$ to $I_k$. The Galois lattice (GL) $\mathcal{G}$ induced by the GC $(f, g)$ is defined as the set of nodes $\mathcal{G} = \{(x, f(x)),\ x \in I_h\}$. It has a lattice structure when $\le$, $\vee$ and $\wedge$ are defined as follows:
$$(x, f(x)) \le (x', f(x')) \iff x \le x' \text{ and } f(x') \le f(x),$$
$$(x, f(x)) \vee (x', f(x')) = \big(g(f(x) \wedge f(x')),\ f(x) \wedge f(x')\big),$$
$$(x, f(x)) \wedge (x', f(x')) = \big(x \wedge x',\ f(x \wedge x')\big).$$
It is easily seen that $\mathcal{G} = \{(x, f(x)),\ x \in I_h\} = \{(g(y), y),\ y \in I_k\}$. The mapping $f$ (resp. $g$) is called an intent (resp. an extent). Any pair $(x, y)$ of the GL satisfies $y = f(x)$ and $x = g(y)$, and Wille (1980) proposed to call such a pair a concept. The name of Galois appears here because of an analogy with a fundamental result of Galois theory: the one-to-one correspondence between the intermediate fields of a field extension and the subgroups of its Galois group (see e.g. Stewart (1975), page 114).
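The identities (3) follow from (1) and (2) in a few lines. We derive the first two here. The last two follow symmetrically by exchanging the roles of $(E, f, h)$ and $(F, g, k)$.

$$
\begin{aligned}
x \le h(x) &\;\Longrightarrow\; f(h(x)) \le f(x) && \text{by (2), then (1)},\\
f(x) \le k(f(x)) &= f\big(g(f(x))\big) = f(h(x)) && \text{by (2) applied to } y = f(x),\\
\text{hence } f \circ h = f, &\quad h \circ h = g \circ (f \circ h) = g \circ f = h.
\end{aligned}
$$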

3.1 Binary GL

Let $I$ be a set of objects and $J$ a set of properties. Let $R$ be a binary relation on $I \times J$, with $iRj$ iff object $i$ has property $j$. For any non-empty set $A \in E = \mathcal{P}(I)$, let
$$f(A) = \{j \in J : iRj \text{ for all } i \in A\} \quad \text{and} \quad f(\emptyset) = J \tag{4}$$
be the intent, or description, of $A$, that is, the set of properties satisfied by all objects of $A$. For any non-empty set $B \in F = \mathcal{P}(J)$, let
$$g(B) = \{i \in I : iRj \text{ for all } j \in B\} \quad \text{and} \quad g(\emptyset) = I \tag{5}$$
be the extent of $B$, that is, the set of objects satisfying all the properties in $B$. The pair $(f, g)$ is a popular example of GC, called a binary GC.

3.2 Explicit formulas for a general GC

Let $E = \mathcal{P}(I)$, where $I$ is a countable set of objects. In most concrete situations, we are given only the descriptions $d(i)$, $i \in I$, which belong to a general σ-semilattice $\mathcal{L}$. A natural question is whether there exists a GC $(f, g)$ such that $f(\{i\}) = d(i)$, with explicit formulas generalizing formulas (4) and (5) of the binary case. Such a GC exists, and it is unique if it is assumed to be maximal (that is, not dominated by another GC):

Theorem (Diday-Emilion (1997), (2003)). There exists a unique maximal GC $(f, g)$ between $E = \mathcal{P}(I)$ and $\mathcal{L}$ satisfying $f(\{i\}) = d(i)$. It is given by the formulas
$$f(A) = \bigwedge_{i \in A} d(i) \text{ for any non-empty } A \in E, \qquad f(\emptyset) = 1, \tag{6}$$
$$g(L) = \{i \in I : L \le d(i)\} \text{ for any } L \in \mathcal{L}. \tag{7}$$
Note that (6) and (7) imply
$$h(A) = g(f(A)) = \Big\{i \in I : \bigwedge_{j \in A} d(j) \le d(i)\Big\} \text{ for any } A \in E, \tag{8}$$
$$k(L) = f(g(L)) = \bigwedge_{i \in I : L \le d(i)} d(i) \text{ for any } L \in \mathcal{L}. \tag{9}$$
In the binary case, $\mathcal{L} = (\mathcal{P}(J), \subseteq, \cap)$ is isomorphic to $(\{0, 1\}^{\#J}, \le, \wedge)$ with the componentwise order and minimum. Therefore (6) and (7) generalize (4) and (5).
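In the power-set case, the meet is intersection and $L \le d(i)$ means $L \subseteq d(i)$. Formulas (6) to (9) can then be sketched in Python, taking the transactions of Table 1 as the descriptions $d(i)$ of ten objects. This is an illustrative sketch: the names `D`, `f`, `g`, `h` and `k` are chosen here only to mirror the notation of the text.

```python
from functools import reduce

ITEMS = frozenset("abcde")  # J, i.e. the top element 1 of (P(J), ⊆, ∩)
# Descriptions d(i) of the objects i = 1, ..., 10: the transactions of Table 1.
D = {i: frozenset(s) for i, s in enumerate(
    ["be", "abe", "bcd", "ad", "bce", "abcd", "cde", "abd", "bcde", "cd"], start=1)}


def f(A):
    """Intent, formula (6): intersection of the d(i), i in A; f(∅) = J."""
    return reduce(lambda x, y: x & y, (D[i] for i in A), ITEMS)


def g(L):
    """Extent, formula (7): the objects i with L ⊆ d(i)."""
    return frozenset(i for i in D if L <= D[i])


def h(A):
    """Closure on sets of objects, formula (8)."""
    return g(f(A))


def k(L):
    """Closure on itemsets, formula (9)."""
    return f(g(L))


print(sorted(k(frozenset("ac"))))  # ['a', 'b', 'c', 'd']: only object 6 contains {a, c}
```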

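4 Random Galois lattices

4.1 Random empirical Galois lattices

As above, let $X : \Omega \to \mathcal{L}$ be a (discrete) $\mathcal{L}$-valued random variable (r.v.), and let $X_1, \dots, X_n, \dots$ be a sequence of iid r.v.'s distributed as $X$. For any $n = 1, 2, \dots$, consider the following random Galois connections:
$$\langle E_n, \subseteq, \cap \rangle = \langle \mathcal{P}(\{1, 2, \dots, n\}), \subseteq, \cap \rangle, \qquad \langle F, \le, \wedge \rangle = (\mathcal{L}, \le, \wedge),$$
$$f_n(A) = \bigwedge_{i \in A} X_i, \qquad g_n(L) = \{i \in \{1, 2, \dots, n\} : L \le X_i\},$$
$$h_n = g_n \circ f_n, \qquad k_n = f_n \circ g_n,$$
for any $A \in E_n$ and $L \in \mathcal{L}$. Note that
$$h_n(A) = \Big\{i \in \{1, \dots, n\} : \bigwedge_{j \in A} X_j \le X_i\Big\}, \quad \text{while} \quad k_n(L) = \bigwedge_{i \in \{1, \dots, n\} : L \le X_i} X_i.$$

4.2 Convergence of random empirical intents

We can now state the announced result on the convergence of random empirical intents, together with the identification of their deterministic limit.

Theorem 1. For any $L \in \mathcal{L}$, the random intents $k_n(L) = \bigwedge_{i = 1, \dots, n : L \le X_i} X_i$ converge a.e. towards the following deterministic limit:
$$k_\infty(L) = \lim_{n \to \infty} k_n(L) = \bigwedge_{L' \in S_X : L \le L'} L'.$$

Proof. For any $L \in \mathcal{L}$, let $\mathbf{1}_{(X_i = L)}(\omega) = 1$ if $X_i(\omega) = L$ and $0$ otherwise. The r.v.'s $\mathbf{1}_{(X_i = L)}$ so defined are iid with expectation $P_X(L)$. The law of large numbers therefore provides a null set $N_L \in \mathcal{B}$, $N_L \subseteq \Omega$, with $P(N_L) = 0$, such that
$$\forall \omega \notin N_L, \quad \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{(X_i = L)}(\omega) \to P_X(L).$$
In particular, for any $L \in S_X$, since $P_X(L) > 0$, we have
$$\forall \omega \notin N_L, \quad \sum_{i=1}^{n} \mathbf{1}_{(X_i = L)}(\omega) \ge 1$$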

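for $n$ large enough. Therefore
$$\forall \omega \notin N_L, \ \exists i \ge 1 : X_i(\omega) = L, \quad \text{that is,} \quad \forall \omega \notin N_L, \ L \in \{X_i(\omega),\ i = 1, 2, \dots\}.$$
Since $S_X$ is countable, the set $N = \bigcup_{L \in S_X} N_L$ belongs to $\mathcal{B}$, $P(N) = 0$, and
$$\forall \omega \notin N, \quad S_X \subseteq \{X_i(\omega),\ i = 1, 2, \dots\}. \tag{10}$$
On the other hand, for any $i = 1, 2, \dots$, let $N_i = \{\omega : X_i(\omega) \notin S_X\}$. Then $P(N_i) = 0$, because $\mathcal{L} \setminus S_X$ is countable and
$$P(N_i) = P(X_i \notin S_X) = P(X \notin S_X) = \sum_{L \notin S_X} P(X = L) = 0$$
by definition of $S_X$. By definition of the $N_i$'s, we have
$$\forall \omega \notin \bigcup_{i=1}^{\infty} N_i, \quad X_i(\omega) \in S_X, \ i = 1, 2, \dots,$$
or equivalently
$$\forall \omega \notin \bigcup_{i=1}^{\infty} N_i, \quad \{X_i(\omega),\ i = 1, 2, \dots\} \subseteq S_X. \tag{11}$$
So, if we let $N_0 = N \cup \bigcup_{i=1}^{\infty} N_i$, then $P(N_0) = 0$, and (10) and (11) imply
$$\forall \omega \notin N_0, \quad \{X_i(\omega),\ i = 1, 2, \dots\} = S_X,$$
that is, in short,
$$\{X_i,\ i = 1, 2, \dots\} = S_X \quad \text{a.e.} \tag{12}$$
Note that (12) holds for any random variable taking its values in a countable set. Observe now that (12) implies that for any $L \in \mathcal{L}$,
$$\{X_i,\ i = 1, 2, \dots : L \le X_i\} = \{L' \in S_X : L \le L'\} \quad \text{a.e.},$$
and thus
$$\bigwedge_{i = 1, 2, \dots : L \le X_i} X_i = \bigwedge_{L' \in S_X : L \le L'} L' \quad \text{a.e.}$$
The sequence $(k_n(L))_{n \ge 1}$ is decreasing, so by the definition in Section 2.1 its limit is the left-hand side. This completes the proof.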
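Theorem 1 can be illustrated with a small Monte Carlo sketch in the power-set case. The assumptions are chosen only for illustration: a hypothetical distribution `P_X` on subsets of $\{a, b, c, d\}$, and the helper names `k_n`, `k_inf` and `draw`, which are introduced here and are not from the text. The sketch computes the empirical intent $k_n(L)$ from iid draws and compares it with the deterministic limit $k_\infty(L)$.

```python
import random
from functools import reduce

TOP = frozenset("abcd")  # assumed item universe J; the full set is the top element 1
# Hypothetical distribution P_X; its support S_X consists of three itemsets.
P_X = {frozenset("ab"): 0.5, frozenset("bc"): 0.3, frozenset("abd"): 0.2}


def meet(family):
    """Intersection of a finite family, with the empty meet equal to TOP."""
    return reduce(lambda x, y: x & y, family, TOP)


def k_n(L, sample):
    """Empirical intent: meet of the sampled X_i such that L ⊆ X_i."""
    return meet(x for x in sample if L <= x)


def k_inf(L):
    """Deterministic limit of Theorem 1: meet of the support elements above L."""
    return meet(s for s, p in P_X.items() if p > 0 and L <= s)


def draw(n, seed=0):
    """n iid draws from P_X, seeded for reproducibility."""
    rng = random.Random(seed)
    itemsets, weights = zip(*P_X.items())
    return rng.choices(itemsets, weights=weights, k=n)


sample = draw(2000)
for L in [frozenset(), frozenset("a"), frozenset("b"), frozenset("cd")]:
    # As sets, k_inf(L) ⊆ k_n(L) always; they coincide once every support element is drawn.
    print(sorted(L), sorted(k_n(L, sample[:3])), sorted(k_n(L, sample)), sorted(k_inf(L)))
```

The draws can only come from $S_X$, so a short sample yields an intent that is at least as large as the limit. After 2000 draws, every support element has almost surely appeared, and the two columns agree.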

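4.3 Limit GL

The closure operator $k_\infty$ above is clearly obtained from the following limit GC:
$$\langle E_\infty, \subseteq, \cap \rangle = \langle \mathcal{P}(\{1, 2, \dots\}), \subseteq, \cap \rangle, \qquad \langle F, \le, \wedge \rangle = (\mathcal{L}, \le, \wedge),$$
$$f_\infty(A) = \bigwedge_{i \in A} X_i, \qquad g_\infty(L) = \{i \in \{1, 2, \dots\} : L \le X_i\},$$
$$h_\infty = g_\infty \circ f_\infty, \qquad k_\infty = f_\infty \circ g_\infty,$$
for any $A \in E_\infty$ and $L \in \mathcal{L}$. So
$$h_\infty(A) = \Big\{i \in \{1, 2, \dots\} : \bigwedge_{j \in A} X_j \le X_i\Big\}, \quad \text{while} \quad k_\infty(L) = \bigwedge_{i \in \{1, 2, \dots\} : L \le X_i} X_i.$$
Hence the random limit GL can be defined as the lattice
$$\mathcal{G}_\infty = \{(g_\infty(L), k_\infty(L)),\ L \in \mathcal{L}\}.$$
Note that the extent $g_\infty(L)$ is random and depends on the sample $(X_i)_{i = 1, 2, \dots}$, while the intent $k_\infty(L)$ is (a.e.) deterministic and does not depend on the sequence $(X_i)_{i = 1, 2, \dots}$.

4.4 Concepts, concept lattice

Definition: A concept of the r.v. $X$ is an element $L$ of $\mathcal{L}$ such that
$$L = \bigwedge_{L' \in S_X : L \le L'} L'.$$
The set of concepts is denoted by $\mathcal{C}(X, \mathcal{L})$, or $\mathcal{C}$ for short. The random set of empirical intents with respect to a sample $X_1, \dots, X_n$ of $X$ is denoted by $\mathcal{C}(X_1, \dots, X_n, \mathcal{L})$, or $\mathcal{C}_n$ for short:
$$\mathcal{C}_n = k_n(\mathcal{L}) = \{k_n(L),\ L \in \mathcal{L}\}.$$
The above theorem states that
$$k_\infty(\mathcal{L}) = \{k_\infty(L),\ L \in \mathcal{L}\} = \mathcal{C}(X, \mathcal{L}) \quad \text{a.e.}$$
Since $L \le k_\infty(L) \le k_{n+1}(L) \le k_n(L)$, we see that $k_n(L) = L$ implies $k_\infty(L) = L$. In other words,
$$\mathcal{C}_n \subseteq \mathcal{C}_{n+1} \subseteq \mathcal{C}. \tag{13}$$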
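In the finite power-set case, the set of concepts can be computed by brute force from the definition, by keeping the fixed points of $k_\infty$. The sketch below assumes a hypothetical support $S_X = \{\{a,b\}, \{b,c\}, \{a,b,d\}\}$ in $\mathcal{P}(\{a,b,c,d\})$. Consistent with the definition, only the support matters, not the values of $P_X$ on it. The function names are illustrative.

```python
from functools import reduce

TOP = frozenset("abcd")  # assumed item universe; TOP is the largest element 1
SUPPORT = [frozenset("ab"), frozenset("bc"), frozenset("abd")]  # assumed S_X


def k_inf(L):
    """Meet of the support elements above L (the empty meet is TOP)."""
    return reduce(lambda x, y: x & y, (s for s in SUPPORT if L <= s), TOP)


def all_itemsets(universe):
    """Every subset of `universe`, enumerated by bit masks."""
    items = sorted(universe)
    return [frozenset(x for j, x in enumerate(items) if mask >> j & 1)
            for mask in range(1 << len(items))]


def concepts():
    """C(X, L): the itemsets L with k_inf(L) == L."""
    return {L for L in all_itemsets(TOP) if k_inf(L) == L}


print(sorted("".join(sorted(L)) for L in concepts()))  # ['ab', 'abcd', 'abd', 'b', 'bc']
```

The fixed points coincide with the image $k_\infty(\mathcal{L})$, because $k_\infty$ is idempotent.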

Proposition 1. $\mathcal{C}(X, \mathcal{L})$ is the σ-semilattice generated by $S_X$. In particular, if $P(X = L) > 0$, then $L$ is a concept.

Note, however, that an $L$ with $P(X = L) = 0$ can still be a concept. Let $\mathcal{L} = \{0, c, a, b, 1\}$, where $0$ (resp. $1$) is the lowest (resp. largest) element of $\mathcal{L}$, $a \not\le b$, $b \not\le a$ and $c = a \wedge b$, and let $X$ be such that $P(X = a) = 1/2$ and $P(X = b) = 1/2$. Then $c$ is a concept and $P(X = c) = 0$.

We further observe the following.

Proposition 2.
i) If $L < 1$ is a concept, then $P(L \le X) > 0$.
ii) $\{L' \in S_X : L \le L'\} = \{L' \in S_X : k_\infty(L) \le L'\}$.
iii) $P(L \le X) = P(k_\infty(L) \le X)$.
iv) If $k_\infty(L) < 1$, then $P(L \le X) > 0$.
v) $1$ is a concept iff $P(X = 1) > 0$.
vi) If $k_\infty(L) = 1$, then $P(L \le X) > 0$ iff $P(X = 1) > 0$.

Note that $P(L \le X) > 0$ means that, for almost every sample, $L \le X_i$ for infinitely many $i$; that is, $L$ appears infinitely often within an itemset. Also, the converse of i) need not hold (use iv)).

Proposition 3. $\mathcal{C}(X, \mathcal{L})$ is a σ-lattice.

5 Average number of concepts for hierarchical Bernoulli mixtures

Consider the case where $\mathcal{L} = (\mathcal{P}(J), \subseteq, \cap)$ is the power set of a (large) finite set $J = \{1, \dots, r\}$ of $r$ items, with $\mathcal{P}(J)$ identified with $\{0, 1\}^r$. Suppose that the distribution of the r.v. $X = (X^{(1)}, \dots, X^{(j)}, \dots, X^{(r)})$ is a finite mixture of products of Bernoulli distributions $\bigotimes_{j=1}^{r} \mathcal{B}(p_{U,j})$. Here the r.v. $U \in \{1, \dots, K\}$ is a latent class variable, and the weight vector $q = (q_1, \dots, q_K)$ of the mixture has a Dirichlet distribution $\mathcal{D}(\gamma_1, \dots, \gamma_K)$. More precisely, we have the following hierarchical mixture model (HMM):
$$X \mid U = u, q \ \sim\ \bigotimes_{j=1}^{r} \mathcal{B}(p_{u,j}), \tag{14}$$
$$P(U = u \mid q) = q_u, \tag{15}$$
$$q \sim \mathcal{D}(\gamma_1, \dots, \gamma_K). \tag{16}$$
The following result generalizes some results in Lhote et al. (2005) and Emilion et al. (2005):
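Proposition 1 can be checked by brute force on a finite power set. The sketch below closes the support under intersections and compares the result with the fixed points of $k_\infty$. To model the five-element example, we assume, only for illustration, an embedding into $\mathcal{P}(\{x, y, z\})$ with $a = \{x, y\}$ and $b = \{x, z\}$. Then $c = \{x\}$ and $1 = \{x, y, z\}$, and the extra subsets of $\{x, y, z\}$ turn out not to be concepts. The function names are hypothetical.

```python
from functools import reduce
from itertools import combinations


def all_itemsets(universe):
    """Every subset of `universe`, enumerated by bit masks."""
    items = sorted(universe)
    return [frozenset(x for j, x in enumerate(items) if mask >> j & 1)
            for mask in range(1 << len(items))]


def generated_semilattice(support, top):
    """Close `support` under pairwise meets; `top` is the meet of the empty family."""
    closed = {top} | set(support)
    changed = True
    while changed:
        changed = False
        for x, y in combinations(list(closed), 2):  # iterate over a snapshot
            if (x & y) not in closed:
                closed.add(x & y)
                changed = True
    return closed


def concepts(support, top):
    """Fixed points of k_inf(L) = meet of the support elements above L."""
    def k_inf(L):
        return reduce(lambda x, y: x & y, (s for s in support if L <= s), top)
    return {L for L in all_itemsets(top) if k_inf(L) == L}


# Illustrative embedding of the example: a = {x, y}, b = {x, z}, c = a ∧ b = {x}.
TOP = frozenset("xyz")
a, b = frozenset("xy"), frozenset("xz")
S_X = [a, b]  # P(X = a) = P(X = b) = 1/2
C = concepts(S_X, TOP)
assert C == generated_semilattice(S_X, TOP)
assert (a & b) in C  # c is a concept although P(X = c) = 0
```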

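Proposition 4. For the HMM defined by equations (14), (15) and (16),
$$E(\#\mathcal{C}_n) = \sum_{u=1}^{K} \frac{\gamma_u}{\gamma} \sum_{i=0}^{n} \sum_{B \in \mathcal{P}(J)} \binom{n}{i} \Big(1 - \prod_{j \in B} p_{u,j}\Big)^{n-i} \Big(\prod_{j \in B} p_{u,j}\Big)^{i} \prod_{j \notin B} \big(1 - p_{u,j}^{\,i}\big),$$
where $\gamma = \gamma_1 + \dots + \gamma_K$, and
$$\lim_{n \to \infty} E(\#\mathcal{C}_n) = 2^r = \#\mathcal{C}.$$
The mean number of closed frequent itemsets can be computed in the same way for this model.

6 Maximal frequent itemsets

6.1 Empirical frequent itemsets

We now return to the case of a general σ-semilattice, whose elements are still called itemsets. Let $\alpha \in (0, 1)$ be a fixed threshold. An itemset $L$ is said to be empirically frequent (with respect to the empirical context $X_i$, $i = 1, \dots, n$) iff
$$\#g_n(L) = \#\{i \in \{1, \dots, n\} : L \le X_i\} \ge n\alpha.$$
We have
$$\#g_n(L) = \sum_{i=1}^{n} \mathbf{1}_{L \le X_i}, \tag{17}$$
and the $X_i$'s are iid, so the r.v.'s $\mathbf{1}_{L \le X_i}$ are iid Bernoulli variables and the r.v. $\#g_n(L)$ has a binomial distribution:
$$\#g_n(L) \sim \mathrm{Binom}(n, p_L), \quad \text{where } p_L = P(L \le X).$$
Hence
$$P(L \text{ empirically frequent}) = P(\#g_n(L) \ge n\alpha) = \sum_{n\alpha \le k \le n} \binom{n}{k} p_L^k (1 - p_L)^{n-k}.$$
The average number of empirically frequent itemsets is then given by the following result.

Proposition 5.
$$\sum_{L \in \mathcal{L}} P(L \ (n, \alpha)\text{-frequent}) = \sum_{L \in \mathcal{L}} \ \sum_{n\alpha \le k \le n} \binom{n}{k} p_L^k (1 - p_L)^{n-k}.$$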
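When $\mathcal{L}$ is finite and the values $p_L = P(L \le X)$ are known, the binomial expression for $P(L \text{ empirically frequent})$ and the sum in Proposition 5 can be evaluated directly. Below is a minimal Python sketch under those assumptions; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb


def prob_empirically_frequent(p_L, n, alpha):
    """P(#g_n(L) >= n * alpha) when #g_n(L) ~ Binom(n, p_L).

    Fraction(alpha) makes the threshold ceil(n * alpha) exact for the given
    alpha; pass alpha as a Fraction such as Fraction(3, 10) to avoid the
    binary rounding of decimal thresholds.
    """
    k_min = ceil(Fraction(alpha) * n)
    return sum(comb(n, k) * p_L**k * (1 - p_L)**(n - k) for k in range(k_min, n + 1))


def mean_number_empirically_frequent(p, n, alpha):
    """Proposition 5 for a finite L: sum over L of P(L is (n, alpha)-frequent).

    `p` maps each itemset L to p_L = P(L <= X).
    """
    return sum(prob_empirically_frequent(p_L, n, alpha) for p_L in p.values())
```

For example, with $p_L = 1/2$, $n = 2$ and $\alpha = 1/2$, the threshold is one occurrence, and the probability is $1 - 1/4 = 3/4$.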

6.2 Frequent itemsets

By the law of large numbers, (17) implies
$$\lim_{n \to \infty} \frac{\#g_n(L)}{n} = P(L \le X) \quad \text{a.e.},$$
which leads to the following definition.

Definition: $L \in \mathcal{L}$ is an α-frequent itemset iff $P(L \le X) \ge \alpha$. A maximal α-frequent itemset is an α-frequent itemset that is maximal (for the order in $\mathcal{L}$) among the α-frequent itemsets.

6.3 Minimal winning coalitions

We now propose an algorithm that finds maximal frequent itemsets using the minimal winning coalitions of $P_X$. Since $S_X$ is countable, let $S_X = \{L_1, \dots, L_r, \dots\}$ and let $p_r = P(X = L_r) > 0$. An α-winning coalition is a subset $A$ of $\{1, \dots, r, \dots\}$ such that
$$\sum_{r \in A} p_r \ge \alpha.$$
A minimal α-winning coalition is an α-winning coalition that is minimal (for inclusion) among the α-winning coalitions. Algorithms for finding minimal coalitions have been studied intensively in game theory (see e.g. Matsui and Matsui (2000)). They can be applied to find maximal frequent itemsets thanks to the following result.

Theorem 2.
i) If $L$ is a maximal α-frequent itemset, then $L = \bigwedge_{r \in A} L_r$, where $A$ is a minimal α-winning coalition.
ii) Conversely, if $A$ is a minimal α-winning coalition, then $L = \bigwedge_{r \in A} L_r$ is α-frequent.

It is easy to construct an example where $A$ is a minimal α-winning coalition but $L = \bigwedge_{r \in A} L_r$ is not maximal.
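Following Theorem 2, the procedure of Section 6.4 can be sketched in Python on the empirical measure of Table 1. This sketch rests on several stated assumptions. It treats the power-set case, where the meet is intersection. It assumes a table small enough to enumerate all coalitions by brute force, which is exponential in the number k of distinct itemsets; it is not one of the game-theoretic algorithms surveyed by Matsui and Matsui (2000). The function names are hypothetical. With positive weights, a winning coalition is minimal exactly when removing any single member makes it losing, and the sketch uses this as its test.

```python
from collections import Counter
from fractions import Fraction
from functools import reduce
from itertools import combinations

# The ten transactions of Table 1 (power-set case: the meet is intersection).
SAMPLE = [frozenset(s) for s in
          ["be", "abe", "bcd", "ad", "bce", "abcd", "cde", "abd", "bcde", "cd"]]


def minimal_winning_coalitions(p, alpha):
    """Brute-force list of the minimal alpha-winning coalitions for positive weights p.

    With positive weights, a winning coalition A is minimal iff dropping any
    single member makes it losing. Exponential in len(p): illustration only.
    """
    result = []
    for size in range(1, len(p) + 1):
        for A in combinations(range(len(p)), size):
            total = sum(p[r] for r in A)
            if total >= alpha and all(total - p[r] < alpha for r in A):
                result.append(A)
    return result


def maximal_frequent_itemsets(sample, alpha, top):
    """Coalition-based search for maximal alpha-frequent itemsets on the empirical measure."""
    # Step 1: distinct itemsets L_1, ..., L_k and their empirical frequencies p_1, ..., p_k.
    counts = Counter(sample)
    itemsets = list(counts)
    p = [Fraction(counts[L], len(sample)) for L in itemsets]
    # Steps 2 and 3: meet of the itemsets of each minimal winning coalition.
    candidates = {reduce(lambda x, y: x & y, (itemsets[r] for r in A), top)
                  for A in minimal_winning_coalitions(p, alpha)}
    # Final extraction: keep the candidates not strictly contained in another one.
    return {L for L in candidates if not any(L < M for M in candidates)}


result = maximal_frequent_itemsets(SAMPLE, Fraction(3, 10), frozenset("abcde"))
print(sorted("".join(sorted(L)) for L in result))  # ['ab', 'ad', 'bcd', 'be', 'ce']
```

With α = 3/10, every transaction has weight 1/10, so the minimal winning coalitions are the 120 triples of transactions. The final step keeps the maximal itemsets among their intersections.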

6.4 Algorithm

The above theorem can be applied to the empirical measure (an estimator of $P_X$) computed from a finite table of observed itemsets, such as the one in Subsection 2.3:
1. Find the distinct itemsets $L_1, \dots, L_k$ and their respective frequencies $p_1, \dots, p_k$.
2. Find the minimal α-winning coalitions from $p_1, \dots, p_k$.
3. For each such coalition, say $A$, compute $L = \bigwedge_{r \in A} L_r$.
The resulting list of itemsets $L$ contains all the maximal frequent itemsets, which can then be extracted from it. Such an algorithm is of interest when the number $k$ of distinct itemsets is much lower than the total number of observed itemsets. Note that the step that finds the minimal winning coalitions should be fast, since it does not require any access to the dataset.

References

AGRAWAL, R., IMIELINSKI, T. and SWAMI, A. (1993): Mining Association Rules Between Sets of Items in Large Databases. In: ACM SIGMOD Int'l Conf. on Management of Data.
AGRAWAL, R. and SRIKANT, R. (1994): Fast Algorithm for Mining Association. In: 20th Int'l Conf. VLDB.
BARBUT, M. and MONJARDET, B. (1970): Ordre et classification. Hachette, Paris.
BIRKHOFF, G. (1967): Lattice Theory. AMS Colloq. Publ. Vol. XXV.
BOCK, H. H. and DIDAY, E. (2000): Analysis of Symbolic Data. Springer Verlag, Berlin.
CASPARD, N. and MONJARDET, B. (2003): The Lattice of Closure Systems. Disc. Appl. Math. J., 127.
DIDAY, E. and EMILION, R. (1997): Maximal and Stochastic Galois Lattices. C. R. Acad. Sci. Paris, 325, I (1).
DIDAY, E. and EMILION, R. (2003): Maximal and Stochastic Galois Lattices. Disc. Appl. Math. J., 27-2.
EMILION, R. and LÉVY, G. (2005): Size of Random Galois Lattices and Number of Frequent Itemsets.
LHOTE, L., RIOULT, F. and SOULET, A. (2005): Average Number of Frequent (Closed) Patterns in Bernouilli and Markovian Databases. In: Fifth IEEE International Conference on Data Mining (ICDM'05), Houston, Texas.
MATSUI, T. and MATSUI, Y. (2000): A Survey of Algorithms for Calculating Power Indices of Weighted Majority Games. J. Oper. Research Soc. Japan, 43.
ORE, O. (1944): Galois Connections. Trans. Amer. Math. Soc., 55.
STEWART, J. (1975): Galois Theory. Chapman and Hall, New York.
WILLE, R. (1980): Restructuring Lattice Theory. In: I. Rival (ed.): Ordered Sets. Reidel.
