Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

A Geometric Theory of Feature Selection and Distance-Based Measures

Kilho Shin and Adrian Pino Angulo
University of Hyogo, Kobe, Japan
yshin@ai.u-hyogo.ac.jp

Abstract

Feature selection measures are often explained by the analogy to a rule that measures the distance of a set of features to the closest ideal set of features, where an ideal feature set is one that determines the classes uniquely and correctly. Before this paper, this way of explanation was only an analogy. In this paper, we show a way to map arbitrary feature sets of datasets into a common metric space, which is indexed by a real number p with 1 ≤ p ≤ ∞. Since this determines the distance between an arbitrary pair of feature sets, even if they belong to different datasets, the distance of a feature set to the closest ideal feature set can be used as a feature selection measure. Surprisingly, when p = 1, the measure is identical to the Bayesian risk, which is probably the feature selection measure used most widely in the literature. For 1 < p ≤ ∞, the measure is novel and has significantly different properties from the Bayesian risk. We also investigate the correlation between measurements by these measures and classification accuracy through experiments. As a result, we show that our novel measures with p > 1 exhibit stronger correlation than the Bayesian risk.

1 Introduction

Feature selection is indeed one of the central focuses of machine learning research. In this paper, we understand the feature selection problem as follows: given a dataset, select an appropriately small number of features that faithfully determine the classes of the examples of the dataset. If we could find a feature subset whose features correctly determine the classes of the examples, it could be the answer we want [Almuallim and Dietterich, 1994]. Such feature subsets are referred to as reducts in rough set theory [Pawlak, 1991] and as consistent feature sets in this paper [Liu et al., 1998]. A dataset, however, does not necessarily include a consistent feature subset, and in such cases, we have to be satisfied with selecting feature subsets that are sufficiently close to being consistent. In other words, we need a measure of the distance of a feature subset to the closest consistent feature set. However, we only know two necessary conditions for such a measure, namely, determinacy and monotonicity: determinacy corresponds to the identity of indiscernibles among the axioms of metrics and requires that the measurement is zero if, and only if, the feature set is consistent; monotonicity requires that the measurement of an arbitrary feature set is no greater than the measurement of any of its subsets. When a measure has the determinacy and monotonicity properties, we also say that it is determinant and monotonous. With a determinant and monotonous measure, a feature selection algorithm attempts to find feature sets whose measurements are close to zero.

Generally, the feature selection measures proposed in the literature are categorized into two groups. One group consists of measures that simply evaluate dependence between two features [Mengle and Goharian, 2009; Bolón-Canedo et al., 2011; Foithong et al., 2012; Suzuki and Sugiyama, 2013; Wang, 2015].
The framework of mRMR [Peng et al., 2005] takes advantage of measures of this group and aims to select a feature set that maximizes the difference between the relevance and the redundancy of the feature set: the relevance is the collective dependence of the feature set on the class labels, while the redundancy is the internal mutual dependence among the features of the feature set. The other group consists of determinant and monotonous measures [Liu et al., 1998; Shin and Xu, 2009; Arauzo-Azofra et al., 2008], and 4 of the 7 feature selection measures studied in [Molina et al., 2002] are known to belong to this group [Shin et al., 2011]. In this paper, we are interested in the latter group.

Example 1 The Bayesian risk, also known as the inconsistency rate [Liu et al., 1998], is defined as follows:

µ_br^D(S) = Σ_{x ∈ Ω_S} ( p(S = x) − max_{c ∈ Ω_C} p(S = x, C = c) ).

Here S is a feature subset of a dataset D, and C is the random variable that represents classes. Ω_S and Ω_C are the sample spaces of S and C, and p is the empirical probability distribution of D. This measure is, indeed, determinant and monotonous.
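The definition of Example 1 is easy to compute from data. The following Python sketch is ours, not the authors' code; the list-of-dicts layout and the class key "C" are assumptions made only for this illustration.

```python
# A minimal sketch of the Bayesian risk / inconsistency rate of Example 1,
# computed from the empirical distribution of a small toy dataset.
from collections import Counter, defaultdict

def bayesian_risk(examples, features, class_key="C"):
    """mu_br(S) = sum over values x of ( p(S=x) - max_c p(S=x, C=c) )."""
    n = len(examples)
    groups = defaultdict(Counter)      # S-value tuple -> class counts
    for e in examples:
        groups[tuple(e[f] for f in features)][e[class_key]] += 1
    return sum((sum(cls.values()) - max(cls.values())) / n for cls in groups.values())

# Toy data: F1 alone cannot separate the classes, but {F1, F2} can.
data = [
    {"F1": "a", "F2": 0, "C": "+"},
    {"F1": "a", "F2": 1, "C": "-"},
    {"F1": "b", "F2": 0, "C": "+"},
    {"F1": "b", "F2": 1, "C": "-"},
]
print(bayesian_risk(data, ["F1"]))        # 0.5: every F1 value is split 1:1
print(bayesian_risk(data, ["F1", "F2"]))  # 0.0: {F1, F2} is consistent (determinacy)
```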

Example 2 Zhao and Liu [Zhao and Liu, 2007] proposed Interact, a feature selection algorithm that leverages the Bayesian risk µ_br to evaluate the closeness of feature sets to the state of being consistent. Accordingly, Interact selects feature sets with small µ_br measurements.

Example 3 Shin and Xu [Shin and Xu, 2009] also proposed the conditional entropy defined by

µ_ce^D(S) = − Σ_{x ∈ Ω_S} Σ_{c ∈ Ω_C} p(S = x, C = c) log ( p(S = x, C = c) / p(S = x) )

as a determinant and monotonous measure. On the other hand, the well-known feature selection algorithm that selects the top n features X_i with respect to the mutual information I(X_i; C) actually selects the feature sets that minimize the sum Σ_{i=1}^{n} µ_ce^D({X_i}). By using this sum instead of µ_ce^D, the algorithm is fast but not very accurate, since the sum is not determinant.

When a measure represents the distance of a feature set to the closest consistent feature set in some metric space, we call it distance-based. About distance-based measures, we only knew two necessary conditions, that is, determinacy and monotonicity, and hence, "distance-based" was only an analogy used to explain the desirable nature of feature selection measures. This paper changes this. We present a way to identify a feature subset of a dataset uniquely with a point in a common metric space. The metric space is parameterized by a real number p ∈ [1, ∞], and hence, we have infinitely many such metric spaces. In each metric space, the consistent feature sets form a closed subspace, and hence, we can determine the distance of an arbitrary feature set to the closest consistent feature set. This is nothing other than a distance-based measure. Different values of p determine different metric functions. For p = 1, we show that the measure is identical to the well-known Bayesian risk. On the other hand, for p > 1, the measure is novel and unknown in the literature. Also, through experiments, we show that measurements by our novel distance-based measures correlate with classification accuracy.

2 Formulating feature selection measures

In this paper, we do not discriminate between features and random variables. Furthermore, for a dataset D, we use the following notations.

- F_D is the entire set of features of D. F_D is finite and includes the feature C_D that represents classes. Also, we denote F_D \ {C_D} by F̃_D.
- Ω_{F_D} = ∏_{X ∈ F_D} Ω_X is the entire sample space, where Ω_X is the sample space of an individual feature X. For a feature set S, Ω_S means the Cartesian product ∏_{X ∈ S} Ω_X, which is the sample space of S.
- p_D : Ω_{F_D} → Q is the empirical probability distribution. We also denote D by the triplet (F_D, Ω_{F_D}, p_D).
- For S ⊆ F̃_D, D_S is the dataset derived from D by eliminating the features in F̃_D \ S.
- For S ⊆ F̃_D, D_⟨S⟩ is the dataset derived from D by replacing the features of S with a single feature ⟨S⟩ such that Ω_⟨S⟩ = Ω_S: the values for S of an example in D are replaced with the vector consisting of those values.

Example 4 For the dataset D determined by (a) below, we have F_D = {B, R, G, C_D} and Ω_{F_D} = {A, B, O, AB} × {M, N, C, A} × {M, F} × {p, n}. When S = {R, G}, (b) and (c) determine D_S and D_⟨S⟩.

(a) D
B   R  G  C_D
A   M  M  p
A   M  M  n
B   N  M  p
O   C  F  n
AB  A  M  n

(b) D_S
R  G  C_D
M  M  p
M  M  n
N  M  p
C  F  n
A  M  n

(c) D_⟨S⟩
B   ⟨S⟩     C_D
A   (M, M)  p
A   (M, M)  n
B   (N, M)  p
O   (C, F)  n
AB  (A, M)  n

On the other hand, a feature selection measure µ is formulated as a family of µ^D indexed by datasets D, and each µ^D is a real-valued function defined over the power set P(F̃_D) of F̃_D.
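For concreteness, the sketch below (ours, reusing the assumed list-of-dicts layout, with base-2 logarithms as an arbitrary choice) evaluates µ_ce of Example 3 on the dataset D of Example 4 and contrasts it with the non-determinant per-feature sum.

```python
# A small sketch of the conditional entropy measure mu_ce of Example 3,
# evaluated on the dataset D of Example 4.
from collections import Counter, defaultdict
from math import log2

def mu_ce(examples, features, class_key="C"):
    """mu_ce(S) = - sum_{x,c} p(S=x, C=c) * log( p(S=x, C=c) / p(S=x) ) = H(C | S)."""
    n = len(examples)
    groups = defaultdict(Counter)             # S-value tuple -> class counts
    for e in examples:
        groups[tuple(e[f] for f in features)][e[class_key]] += 1
    return -sum((cnt / n) * log2(cnt / sum(cls.values()))
                for cls in groups.values() for cnt in cls.values())

D = [
    {"B": "A",  "R": "M", "G": "M", "C": "p"},
    {"B": "A",  "R": "M", "G": "M", "C": "n"},
    {"B": "B",  "R": "N", "G": "M", "C": "p"},
    {"B": "O",  "R": "C", "G": "F", "C": "n"},
    {"B": "AB", "R": "A", "G": "M", "C": "n"},
]
print(mu_ce(D, ["B", "R", "G"]))                     # 0.4 bits: D is not consistent
print(sum(mu_ce(D, [f]) for f in ["B", "R", "G"]))   # 1.6 bits: the non-determinant sum
```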
In this paper, we assume that a feature selection measure supports the following three requirements, which all of the 9 measures surveyed in [Shin et al., 2011] support as well.

Requirement 1 (Naming invariance) Measurements by the measure are invariant under renaming of features and feature values. This also implies that the measure treats every feature as categorical.

Requirement 2 (Localization invariance) If T ⊆ S, then µ^D(T) = µ^{D_S}(T) holds.

Requirement 3 (Feature aggregation invariance) If S ⊆ T, then µ^D(T) = µ^{D_⟨S⟩}((T \ S) ∪ {⟨S⟩}) holds.

Example 5 The Bayesian risk µ_br has all of these invariance properties: µ_br^D(F̃_D) = µ_br^E(F̃_E) = 1/5 for the dataset D of Example 4 and the dataset E of Example 6, which indicates the naming invariance; for S = {R, G}, µ_br^D({G}) and µ_br^{D_S}({G}) are both equal to 2/5, and µ_br^{D_⟨S⟩}({B, ⟨S⟩}) = 1/5 = µ_br^D({B, R, G}) holds. The latter two indicate the localization and feature aggregation invariance properties.

In this paper, we will identify pairs (D, S) and (E, T) of a dataset and a feature subset if µ^D(S) = µ^E(T) holds for every measure µ that supports Requirements 1, 2 and 3. For this, we introduce the notion of coherent mappings and then prove Proposition 1.

Definition 1 A mapping ϕ : F̃_D → F̃_E is coherent between D and E, if, and only if, there exist injections v_Y : Ω_{ϕ^{-1}(Y)} → Ω_Y for Y ∈ Im(ϕ) and an injection v_{C_E} : Ω_{C_D} → Ω_{C_E} such that

p_D(x) = Σ_{y : y|_{Im(ϕ) ∪ {C_E}} = v_π(x)} p_E(y),

where v_π = ∏_{Y ∈ Im(ϕ) ∪ {C_E}} v_Y : Ω_{F_D} → Ω_{Im(ϕ) ∪ {C_E}}.

Example 6 We consider the dataset D of Example 4 and a dataset E whose features are F_1, F_2 and F_3 and whose class feature is C_E. When we define ϕ by ϕ(B) = F_1 and ϕ(R) = ϕ(G) = F_2, ϕ is coherent. To show this, we determine v_{F_1} and v_{F_2} as below and v_{C_E} by v_{C_E}(p) = 1 and v_{C_E}(n) = 2. For example, p_D(A, M, M, p) = 1/5 and p_E(1, 1, 1, 1) + p_E(1, 1, 3, 1) = 1/10 + 1/10 = 1/5 hold.

v_{F_1}:  A → 1,  B → 2,  O → 3,  AB → 4
v_{F_2}:  (M, M) → 1,  (M, F) → 2,  (N, M) → 3,  (N, F) → 4,  (C, M) → 5,  (C, F) → 6,  (A, M) → 7,  (A, F) → 8

Proposition 1 If a feature selection measure µ supports Requirements 1 to 3, then µ^E(S) = µ^D(ϕ^{-1}(S)) holds for an arbitrary coherent mapping ϕ : F̃_D → F̃_E between D and E and an arbitrary S ⊆ Im(ϕ).

Proof. We let ϕ = ϕ_2 ∘ ϕ_1 with ϕ_1 : F̃_D → Im(ϕ) and ϕ_2 : Im(ϕ) → F̃_E. Since ϕ_1 yields a sequence of feature aggregations that converts D into E_{Im(ϕ)} up to renaming of features and feature values, µ^E(S) = µ^{E_{Im(ϕ)}}(S) = µ^D(ϕ^{-1}(S)) follows from Requirements 1 to 3.

Example 7 For the same D, E and ϕ as in Example 6, µ_br^E({F_1, F_2}) = µ_br^D({B, R, G}) = µ_br^E({F_1}) = µ_br^D({B}) = µ_br^E({F_2}) = µ_br^D({R, G}) = 1/5 holds.

3 Projecting a feature set of a dataset into a subspace P^+_R of the ℓ^p space (p ∈ [1, ∞])

We let n ∈ N ∪ {∞}. We introduce the space P^n into which feature sets of datasets are projected. We start with some preparation. Let K be either the rational number field Q or the real number field R. We define P_K and P^+_K as follows; they are sets of functions p : N² → K.

Definition 2 We define P_K and P^+_K by:

P_K = { p | p(i, j) ≥ 0, Σ_{i=1}^{∞} Σ_{j=1}^{∞} p(i, j) = 1 };
P^+_K = { p | p(i, j) ≥ 0, Σ_{i=1}^{∞} Σ_{j=1}^{∞} p(i, j) ≤ 1 }.

We can identify N² with N (for example, f : (i, j) ↦ (i + j − 1)(i + j − 2)/2 + i is bijective), and therefore, P_R and P^+_R can be viewed as subspaces of ℓ^p for 1 ≤ p ≤ ∞. If we focus on datasets that include at most n classes, we can use P^n_K and P^{+n}_K instead.

Definition 3 We define P^n_K and P^{+n}_K by:

P^n_K = { p ∈ P_K | p(i, j) = 0 for j > n };
P^{+n}_K = { p ∈ P^+_K | p(i, j) = 0 for j > n }.

P^n_R and P^{+n}_R are metric spaces with the metric derived from the norm ∥·∥_p of ℓ^p. P^n_R is closed in ℓ^p only when p = 1, while P^{+n}_R is the closure of P^n_R in ℓ^p for 1 < p ≤ ∞. Therefore, P^n_R in ℓ^1 and P^{+n}_R in ℓ^p for 1 < p ≤ ∞ are complete, since ℓ^p is a Banach space. Although P^n_R and P^{+n}_R are bounded, none of them is compact. We define P^n, a subspace of P^n_R, as follows.

Definition 4 We let supp(p) be the smallest Ω_U × Ω_C such that Ω_U ⊆ N, Ω_C ⊆ N and Ω_U × Ω_C ⊇ Supp(p) = {(i, j) | p(i, j) > 0}. We define P and P^n for n ∈ N by:

P = { p ∈ P_Q | |supp(p)| < ∞ };
P^n = { p ∈ P | p(i, j) = 0 for j > n }.

Example 8 When we define p by p(i, i) = (1 − r) r^{i−1} for r ∈ (0, 1), p ∈ P_R holds, because Σ_{i=1}^{∞} (1 − r) r^{i−1} = (1 − r) lim_{n→∞} (1 − r^n)/(1 − r) = 1. Furthermore, if r ∈ Q, p ∈ P_Q holds. For this p, we have Supp(p) = {(i, i) | i ∈ N}, and hence Ω_U = N and Ω_C = N, so that supp(p) = N × N is infinite. Hence, p ∉ P.

For n ∈ N ∪ {∞}, P^n turns out to be dense in P^n_R. For p ∈ P^n, D_p = ({U, C}, N × N, p) is not a dataset, because N × N is infinite. Nevertheless, we can still speak of a coherent mapping ϕ : F̃_D → {U} between a dataset D and D_p. Theorem 1 shows how to project a feature set S of a dataset D into P^n.

Theorem 1 We let D be a dataset with n classes and S ⊆ F̃_D. There exists p ∈ P^n such that ϕ : S → {U} is coherent between D_S and D_p. Conversely, for arbitrary p ∈ P^n, there exists a dataset D with n classes such that ϕ : F̃_D → {U} is coherent between D and D_p.

Proof. Let v_1 : Ω_S → N and v_2 : Ω_{C_D} → {1, ..., n} be injective. It suffices to define p ∈ P^n by p(i, j) = p_{D_S}(v_1^{-1}(i), v_2^{-1}(j)) if (i, j) ∈ Im(v_1 × v_2) and p(i, j) = 0 otherwise. To show the converse, we have only to let D = ({U, C}, supp(p), p).
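The construction in the proof of Theorem 1 is easy to make concrete. The sketch below is ours (with the same assumed list-of-dicts layout as before): it builds the finitely supported point p ∈ P^n for a given feature subset by numbering the observed values and classes.

```python
# A minimal sketch of the projection of Theorem 1: the localized dataset D_S is
# sent to a finitely supported point p by numbering S-values (v_1) and classes (v_2).
from collections import Counter

def project(examples, features, class_key="C"):
    """Return p as a dict {(i, j): p(i, j)}; unlisted cells are 0."""
    n = len(examples)
    counts = Counter(
        (tuple(e[f] for f in features), e[class_key]) for e in examples
    )
    values = sorted({x for x, _ in counts})   # any injection v_1 into N will do
    classes = sorted({c for _, c in counts})  # any injection v_2 into {1, ..., n}
    v1 = {x: i + 1 for i, x in enumerate(values)}
    v2 = {c: j + 1 for j, c in enumerate(classes)}
    return {(v1[x], v2[c]): cnt / n for (x, c), cnt in counts.items()}

# With the dataset of Example 4 and S = {R, G}, the point has five cells of
# mass 1/5 each, two of which share a row because (M, M) occurs with both classes.
data = [
    {"B": "A",  "R": "M", "G": "M", "C": "p"},
    {"B": "A",  "R": "M", "G": "M", "C": "n"},
    {"B": "B",  "R": "N", "G": "M", "C": "p"},
    {"B": "O",  "R": "C", "G": "F", "C": "n"},
    {"B": "AB", "R": "A", "G": "M", "C": "n"},
]
print(project(data, ["R", "G"]))
```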
Example 9 To project D_S of Example 4 into P², we determine p ∈ P² by p(1, 1) = p(1, 2) = p(3, 1) = p(6, 2) = p(7, 2) = 1/5 and p(i, j) = 0 otherwise. When we determine v_U and v_C identically to v_{F_2} and v_{C_E} of Example 6, ϕ : S → {U} is coherent between D_S and D_p, and hence, D_S is projected to p.

Example 10 A single D_S can be projected to infinitely many points in P^n. For example, we let D_S, p and v_U be the same as in Example 9. For an arbitrary bijection π : N → N, we define p_π by p_π(π(i), j) = p(i, j). Apparently, D_S projects to p_π with π ∘ v_U.

Example 11 More than one dataset can be projected to a single point in P^n. For example, D_S of Example 4 and E_{{F_2}} of Example 6 are both projected to any of the p_π of Example 10. To be precise, a single point in P^n has infinitely many datasets that project to it.

Based on Proposition 1 and Theorem 1, Theorem 2 asserts that a feature selection measure can be uniquely viewed as a real-valued function defined over P^n.

Theorem 2 For a feature selection measure µ, there uniquely exists ˆµ : P → R such that ˆµ(p) = µ^D(S) holds for arbitrary D, S and p such that ϕ : S → {U} is coherent between D_S and D_p.

Proof. For p ∈ P, we let D'_p = ({U, C}, supp(p), p) and determine ˆµ(p) by µ^{D'_p}({U}). For a dataset D such that ϕ : S → {U} is coherent between D_S and D_p with v_U : Ω_S → N and v_C : Ω_{C_D} → N, we let D^#_p = ({U, C}, Im(v_U) × Im(v_C), p). Then ϕ : S → {U} is coherent between D_S and D^#_p, and the inclusion supp(p) ⊆ Im(v_U) × Im(v_C) yields coherence between D'_p and D^#_p. Therefore, µ^D(S) = µ^{D^#_p}({U}) = µ^{D'_p}({U}) = ˆµ(p).

Example 12 Datasets that are projected to the same point in P^n have the same measurement for an arbitrary feature selection measure µ. For example, as seen in Example 11, D_S of Example 4 and E_{{F_2}} of Example 6 are both projected to the same point of P². On the other hand, µ_br^D(S) = µ_br^E({F_2}) = 1/5 holds, as seen in Example 7. Therefore, by letting ˆµ(p) = µ^D(S) for the D_S that projects to p, ˆµ is well defined.

The problem with the projection determined by Theorem 1 lies in the fact that a single pair (D, S) is mapped to an infinite number of different points in P^n (Example 10). In the next section, we solve this problem.

4 Defining a quotient space Q^+_R of P^+_R

Our idea for solving the problem is to introduce an equivalence relation that unifies the points of P^n that are images of the same single pair (D, S) and then to use the resulting quotient space. The relation defined below is indeed an equivalence relation. We assume n ∈ N ∪ {∞}.

Definition 5 Let p and q be in P^+_R. We define p ~ q, if, and only if, there exist bijections v_U : N → N and v_C : N → N such that p(i, j) = q(v_U(i), v_C(j)).

Example 13 p_π ~ p holds for the p_π and π of Example 10.

Definition 6 We let Q^+_R = P^+_R / ~. Moreover, for the canonical projection π : P^+_R → Q^+_R, we let Q^{+n}_R = π(P^{+n}_R), Q^n_R = π(P^n_R) and Q^n = π(P^n).

Proposition 2 is easy to see but plays a crucial role in proving Theorems 3 and 4.

Proposition 2 For p and q in P, the following are equivalent.
1. p ~ q.
2. There exists a pair (D, S) such that ϕ : S → {U} is coherent between D_S and D_p and between D_S and D_q.
3. For an arbitrary pair (D, S), ϕ : S → {U} is coherent between D_S and D_p, if, and only if, it is coherent between D_S and D_q.

Theorems 3 and 4 follow from Theorem 1, Theorem 2 and Proposition 2. In the following, [p] denotes π(p), and [D, S] denotes the image of (D, S) in Q^n, that is, [D, S] = [p] for p ∈ P^n such that ϕ : S → {U} is coherent between D_S and D_p.

Theorem 3 We let D be a dataset with n classes and S ⊆ F̃_D. There uniquely exists [p] ∈ Q^n such that ϕ : S → {U} is coherent between D_S and D_p. Conversely, for arbitrary [p] ∈ Q^n, there exists a dataset D with n classes such that ϕ : F̃_D → {U} is coherent between D and D_p.

Theorem 4 For a feature selection measure µ, there uniquely exists [ˆµ] : Q → R such that [ˆµ]([D, S]) = µ^D(S) holds for any dataset D and any S ⊆ F̃_D.

5 Introducing a metric d_p into Q^+_R

Although a quotient of a metric space is not always a metric space, we can derive a metric on Q^+_R from ℓ^p as follows.

Definition 7 For [p] and [q] in Q^+_R, we define d_p([p], [q]) = inf{ ∥p′ − q′∥_p | p′ ~ p, q′ ~ q }.
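For finitely supported points, the infimum of Definition 7 can be found by exhaustive search over relabellings. The brute-force sketch below is our own illustration only (its cost is factorial in the support size, so it is usable for tiny examples at most): it pads the two points onto a common finite grid, which leaves enough empty rows and columns for unmatched cells, and tries every row and column permutation.

```python
# A brute-force sketch of the quotient metric d_p of Definition 7 for two
# finitely supported points given as dicts {(i, j): mass}.
from itertools import permutations
import math

def dp(p, q, order=1):
    """order may be any value in [1, inf]; use math.inf for the sup norm."""
    rows_p = sorted({i for i, _ in p}); cols_p = sorted({j for _, j in p})
    rows_q = sorted({i for i, _ in q}); cols_q = sorted({j for _, j in q})
    R = len(rows_p) + len(rows_q)       # enough rows for q to avoid p entirely
    C = len(cols_p) + len(cols_q)
    P = [[0.0] * C for _ in range(R)]
    for (i, j), v in p.items():
        P[rows_p.index(i)][cols_p.index(j)] = v
    Q = [[0.0] * C for _ in range(R)]
    for (i, j), v in q.items():
        Q[rows_q.index(i)][cols_q.index(j)] = v
    best = math.inf
    for rperm in permutations(range(R)):        # relabel q's rows ...
        for cperm in permutations(range(C)):    # ... and q's columns
            diffs = [abs(P[r][c] - Q[rperm[r]][cperm[c]])
                     for r in range(R) for c in range(C)]
            d = max(diffs) if order == math.inf else sum(x ** order for x in diffs) ** (1 / order)
            best = min(best, d)
    return best

# Relabelling costs nothing: these two points differ only by a row bijection.
a = {(1, 1): 0.5, (2, 2): 0.5}
b = {(7, 1): 0.5, (3, 2): 0.5}
print(dp(a, b, order=1), dp(a, b, order=2))   # 0.0 0.0
```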
Theorem 5 For 1 ≤ p ≤ ∞, d_p is a metric over Q^+_R.

Proof. We only prove the triangle inequality d_p([p], [r]) ≤ d_p([p], [q]) + d_p([q], [r]), taking advantage of Lemma 1.

Lemma 1 For p′ ~ p and q in P^+_R, there exists q′ ~ q such that ∥p′ − q′∥_p = ∥p − q∥_p.

For ε > 0, we take q′ ~ q with ∥p − q′∥_p − d_p([p], [q]) < ε/2 and r′ ~ r with ∥q − r′∥_p − d_p([q], [r]) < ε/2. By Lemma 1, there exists r″ ~ r such that ∥q′ − r″∥_p = ∥q − r′∥_p, and hence d_p([p], [r]) ≤ ∥p − r″∥_p ≤ ∥p − q′∥_p + ∥q′ − r″∥_p < d_p([p], [q]) + d_p([q], [r]) + ε.

6 Deriving a measure µ_{ℓ^p} from d_p

We will see that the minimum d_p-distance from [D, S] to the set of consistent points determines a feature selection measure µ_{ℓ^p}([D, S]). We first define the consistent points of P^+_R.

Definition 8 p ∈ P^+_R is consistent, if, and only if, there exists j̄ : N → N such that p(i, j) = 0 whenever j ≠ j̄(i).

C^+_R denotes the entire set of consistent points, and we let C^{+n}_R = C^+_R ∩ P^{+n}_R, C^n_R = C^+_R ∩ P^n_R and C^n = C^+_R ∩ P^n for n ∈ N ∪ {∞}.

Definition 9 For p ∈ P and j̄ : N → N such that j̄(i) ∈ argmax{ p(i, j) | j ∈ N }, we define:

For 1 ≤ p < ∞, µ_{ℓ^p}(p) = ( Σ_{i∈N} ( Σ_{j∈N} p(i, j)^p − p(i, j̄(i))^p ) )^{1/p};
For p = ∞, µ_{ℓ^∞}(p) = max{ p(i, j) | (i, j) ∈ N², j ≠ j̄(i) }.

Surprisingly, µ_{ℓ^1} is identical to the Bayesian risk.

Proposition 3 If ϕ : S → {U} is coherent between D_S and D_p, then µ_br^D(S) = µ_{ℓ^1}(p) holds.

Proof. We assume that v_U : Ω_S → N and v_C : Ω_{C_D} → N yield the coherence between D_S and D_p. Then

µ_br^D(S) = Σ_{x ∈ Ω_S} ( p_{D_S}(S = x) − max_{c} p_{D_S}(S = x, C = c) )
= Σ_{x ∈ Ω_S} ( Σ_{c} p(v_U(x), v_C(c)) − max_{c} p(v_U(x), v_C(c)) )
= µ_{ℓ^1}(p).

Theorem 6 For p ∈ P, the following hold.
1. µ_{ℓ^1}(p) = (1/2) min{ d_1([p], [q]) | q ∈ C^∞ }.
2. For 1 < p ≤ ∞, µ_{ℓ^p}(p) = inf{ d_p([p], [q]) | q ∈ C^∞ } = min{ d_p([p], [q]) | q ∈ C^+_R, |supp(q)| < ∞ }.

Proof. Here, we prove only assertion 1. For q ∈ C^∞, we take ĵ : N → N such that q(i, j) = 0 if j ≠ ĵ(i). Also, we let j̄ : N → N satisfy j̄(i) ∈ argmax{ p(i, j) | j ∈ N }. We first fix i ∈ N:

Σ_{j∈N} |p(i, j) − q(i, j)| = |p(i, ĵ(i)) − q(i, ĵ(i))| + Σ_{j ∈ N \ {ĵ(i)}} p(i, j)
≥ q(i, ĵ(i)) − p(i, j̄(i)) + Σ_{j∈N} p(i, j) − max_{j∈N} p(i, j).

Then, we sum up both sides over all i ∈ N:

∥p − q∥_1 ≥ Σ_{i∈N} q(i, ĵ(i)) − Σ_{i∈N} p(i, j̄(i)) + Σ_{i∈N} ( Σ_{j∈N} p(i, j) − max{ p(i, j) | j ∈ N } )
= 1 − Σ_{i∈N} p(i, j̄(i)) + µ_{ℓ^1}(p) = 2 µ_{ℓ^1}(p).

On the other hand, for a q ∈ C^∞ that satisfies q(i, j) = 0 for j ≠ j̄(i) and q(i, j̄(i)) ≥ p(i, j̄(i)) for every i, we have ∥p − q∥_1 = 2 ( 1 − Σ_{i∈N} max{ p(i, j) | j ∈ N } ) = 2 µ_{ℓ^1}(p). As q′ ∈ C^∞_R whenever q′ ~ q, Lemma 1 implies the assertion.

Determinacy of µ_{ℓ^p} immediately follows from Theorem 6. Also, it is easy to show the monotonicity of µ_{ℓ^p}. Furthermore, µ_{ℓ^p} turns out to be continuous under d_q for arbitrary p and q.

7 Contrasting d_1 and d_p with p > 1

Proposition 3 asserts that µ_{ℓ^1} is identical to the Bayesian risk, which is well known in the literature. On the other hand, µ_{ℓ^p} with p ∈ (1, ∞] is a novel measure unknown in the literature. Also, as metric functions, d_1 and d_p with p ∈ (1, ∞] are significantly different, as stated below without proof.

- Q_R is complete under d_1, while Q^+_R is complete under d_p for p ∈ (1, ∞].
- Q^n_R is not compact under d_1, while Q^{+n}_R is compact under d_p for p ∈ (1, ∞].
- Q^n is dense in Q^n_R under d_1, while it is dense in Q^{+n}_R under d_p for p ∈ (1, ∞].

Moreover, in Section 8, we will see that µ_{ℓ^1} and µ_{ℓ^p} for p ∈ (1, ∞] have different properties in terms of the correlation between their measurements and classification accuracy. Although we cannot explain the reason for this difference, we conjecture that it is related to the aforementioned differences between d_1 and d_p with p ∈ (1, ∞] as metric functions.

8 Correlation between measurements by µ_{ℓ^p} and classification accuracy

We naturally expect a strong correlation between measurements by a good measure and classification accuracy. From this point of view, we ran experiments with µ_{ℓ^p} for p = 1, 2, ..., 5. In particular, we focus on the comparison between p = 1 and p > 1: µ_{ℓ^1} is identical to the Bayesian risk and is the feature selection measure used most widely in the literature, whereas the other measures are novel and are introduced in this paper for the first time.
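The µ_{ℓ^p} values used in these experiments can be computed directly from Definition 9 for finitely supported points. The sketch below is ours: in every occupied row the largest cell is kept and the remaining cells are aggregated with the ℓ^p norm; on the projection of Example 9 it reproduces the Bayesian risk, as Proposition 3 states for p = 1.

```python
# A small sketch of Definition 9 for a finitely supported point {(i, j): mass}.
import math
from collections import defaultdict

def mu_lp(point, order=1):
    """order in [1, inf]; the row maximum plays the role of p(i, j-bar(i))."""
    rows = defaultdict(list)
    for (i, _), v in point.items():
        rows[i].append(v)
    leftovers = []
    for vals in rows.values():
        vals.sort(reverse=True)
        leftovers.extend(vals[1:])          # everything except the row maximum
    if not leftovers:
        return 0.0                          # consistent point
    if order == math.inf:
        return max(leftovers)
    return sum(v ** order for v in leftovers) ** (1 / order)

# The projection of D_S from Example 9: five cells of mass 1/5, with the row of
# (M, M) carrying both classes.
point = {(1, 1): 0.2, (1, 2): 0.2, (3, 1): 0.2, (6, 2): 0.2, (7, 2): 0.2}
print(mu_lp(point, 1))         # 0.2, the Bayesian risk of Proposition 3
print(mu_lp(point, 2))         # 0.2 (only one off-maximum cell, so equal here)
print(mu_lp(point, math.inf))  # 0.2
```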

Table 1: Datasets

NAME            #FEAT.  #EXAM.  #CLASSES
ARRHYTHMIA
AUDIOLOGY
MFEAT-FACTOR    216     2000    10
MFEAT-FOURIER    76     2000    10
MFEAT-KARHUNEN   64     2000    10
MFEAT-PIXEL     240     2000    10
MFEAT-ZERNIKE    47     2000    10
MUSK
OPTDIGITS
SONAR
SPAMBASE
SPECTROMETER

8.1 Datasets

Table 1 shows the datasets that we use in our experiments together with their important attributes, namely, the numbers of features, examples and classes. All of the datasets are obtained from the UCI repository of machine learning databases [Blake and Merz, 1998].

8.2 Methods

The following are the steps of our experiments.

Sampling feature sets. For each dataset of Table 1, 6 feature sets are selected at random. The sizes of the selected feature sets also vary at random. In total, we obtain 6 × 12 = 72 pairs of a feature set and a dataset.

Localizing datasets. For each pair of a feature set and a dataset, we localize the dataset by eliminating all of the features that do not belong to the relevant feature set. Consequently, we obtain 72 localized datasets.

Measuring accuracy of classification. We apply three classifiers, namely, Naïve Bayes, C4.5 and SVM, to each of the localized datasets in the manner of 10-fold cross validation and record the averaged AUC-ROC scores, which we use as classification accuracy scores.

8.3 Results

Figures 1, 2 and 3 display scatter plots of the results of our experiments with Naïve Bayes, C4.5 and SVM, respectively. The x-axis of each chart represents measurements by one of the measures (a) µ_{ℓ^1}, (b) µ_{ℓ^2}, (c) µ_{ℓ^3}, (d) µ_{ℓ^4} and (e) µ_{ℓ^5}, while the y-axis represents the averaged AUC-ROC scores. From the charts, we have the impression that µ_{ℓ^p} with p > 1 shows a stronger negative correlation with classification accuracy than µ_{ℓ^1}. We look into this more closely in the next subsection.

8.4 Analysis on correlation

To compare µ_{ℓ^1} and µ_{ℓ^p} with p > 1 in terms of correlation with classification accuracy, we introduce a function P_t(x) as an index. We let N_x be the number of plotted points whose µ_{ℓ^p} distances fall within [x, x + 0.1) and N_{t,x} be the number of plotted points whose AUC-ROC scores exceed t and whose µ_{ℓ^p} distances fall within [x, x + 0.1). Then, we define P_t(x) by P_t(x) = N_{t,x} / N_x. Intuitively, P_t(x) approximates the probability that the classification accuracy exceeds t when the measurement by the relevant measure is x.

Figures 4, 5 and 6 show the curves of P_t(x) for Naïve Bayes, C4.5 and SVM. The value of t varies in {0.95, 0.9, 0.85, 0.8}. Since P_t(x) ≥ P_{t′}(x) holds for t < t′, the curve for t is located above the curve for t′. We observe two properties from the charts.

- The curves for µ_{ℓ^p} with p > 1 are akin to one another in shape, while the curves for µ_{ℓ^1} appear significantly different from those for the other measures.
- The curves for µ_{ℓ^p} with p > 1 exhibit a clearer negative correlation between measurements by the measure and classification accuracy than the curves for µ_{ℓ^1}.

In fact, the table below presents the correlation coefficients between the measurements x and the values of P_t(x) for t ∈ {0.95, 0.9, 0.85, 0.8}, for each of µ_{ℓ^1}, ..., µ_{ℓ^5} and for each of Naïve Bayes, C4.5 and SVM. The presented values also support the observations stated above.

[Table: correlation coefficients between x and P_t(x), one row per measure µ_{ℓ^1}, ..., µ_{ℓ^5} for each of Naïve Bayes, C4.5 and SVM, and one column per threshold t.]
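The index P_t(x) is straightforward to compute. The sketch below is ours, and the (measurement, AUC) pairs in it are purely hypothetical stand-ins for the experimental data, included only to show the binning.

```python
# A minimal sketch of P_t(x): plots are binned by their measurement in steps of
# 0.1, and within each bin we take the fraction whose AUC-ROC score exceeds t.
def p_t_curve(points, t, width=0.1):
    """points: iterable of (measurement, auc); returns {bin start x: P_t(x)}."""
    bins = {}
    for m, auc in points:
        x = int(m / width) * width          # left edge of [x, x + width)
        n, n_t = bins.get(x, (0, 0))
        bins[x] = (n + 1, n_t + (auc > t))
    return {round(x, 10): n_t / n for x, (n, n_t) in sorted(bins.items())}

# Hypothetical pairs; accuracy degrades as the measurement grows.
sample = [(0.02, 0.97), (0.05, 0.93), (0.12, 0.91), (0.18, 0.84),
          (0.23, 0.86), (0.27, 0.79), (0.33, 0.74), (0.38, 0.81)]
for t in (0.95, 0.9, 0.85, 0.8):
    print(t, p_t_curve(sample, t))
```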

[Figure 1: Scatter plots of the experimental results (Naïve Bayes). Figure 2: Scatter plots of the experimental results (C4.5). Figure 3: Scatter plots of the experimental results (SVM). Each figure has five panels, one per measure µ_{ℓ^1}, ..., µ_{ℓ^5}, with the measurement on the x-axis and the averaged AUC-ROC score on the y-axis.]

[Figure 4: The curves of P_t(x) for t ∈ {0.95, 0.9, 0.85, 0.8} (Naïve Bayes). Figure 5: The curves of P_t(x) for t ∈ {0.95, 0.9, 0.85, 0.8} (C4.5). Figure 6: The curves of P_t(x) for t ∈ {0.95, 0.9, 0.85, 0.8} (SVM).]

9 Conclusion

We have introduced a family of feature selection measures { µ_{ℓ^p} | p ∈ [1, ∞] } that are derived from the distance functions of metric spaces. Although µ_{ℓ^1} turns out to be identical to the Bayesian risk, µ_{ℓ^p} for p ∈ (1, ∞] is novel. Through experiments, we have shown that µ_{ℓ^p} with p > 1 exhibits a clearer negative correlation between its measurements and classification accuracy than µ_{ℓ^1} does.

References

[Almuallim and Dietterich, 1994] H. Almuallim and T. G. Dietterich. Learning boolean concepts in the presence of many irrelevant features. Artificial Intelligence, 69(1-2), 1994.

[Arauzo-Azofra et al., 2008] A. Arauzo-Azofra, J. M. Benitez, and J. L. Castro. Consistency measures for feature selection. Journal of Intelligent Information Systems, 30(3), November 2008.

[Blake and Merz, 1998] C. S. Blake and C. J. Merz. UCI repository of machine learning databases. Technical report, University of California, Irvine, 1998.

[Bolón-Canedo et al., 2011] V. Bolón-Canedo, S. Seth, and N. Sánchez-Maroño. Statistical dependence measure for feature selection in microarray datasets. In ESANN 2011, pages 23-28, 2011.

[Foithong et al., 2012] S. Foithong, O. Pinngern, and B. Attachoo. Feature subset selection wrapper based on mutual information and rough sets. Expert Systems with Applications, 39(1), 2012.

[Liu et al., 1998] H. Liu, H. Motoda, and M. Dash. A monotonic measure for optimal feature selection. In Proceedings of the European Conference on Machine Learning, 1998.

[Mengle and Goharian, 2009] S. S. R. Mengle and N. Goharian. Ambiguity measure feature-selection algorithm. Journal of the American Society for Information Science and Technology, 60(5):1037-1050, 2009.

[Molina et al., 2002] L. Molina, L. Belanche, and A. Nebot. Feature selection algorithms: A survey and experimental evaluation. In Proceedings of the IEEE International Conference on Data Mining, pages 306-313, 2002.

[Pawlak, 1991] Z. Pawlak. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, 1991.

[Peng et al., 2005] H. Peng, F. Long, and C. Ding. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), August 2005.

[Shin and Xu, 2009] K. Shin and X. M. Xu. Consistency-based feature selection. In 13th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, 2009.

[Shin et al., 2011] K. Shin, D. Fernandes, and S. Miyazaki. Consistency measures for feature selection: A formal definition, relative sensitivity comparison, and a fast algorithm. In 22nd International Joint Conference on Artificial Intelligence, 2011.

[Suzuki and Sugiyama, 2013] T. Suzuki and M. Sugiyama. Sufficient dimension reduction via squared-loss mutual information estimation. Neural Computation, 25(3), 2013.

[Wang, 2015] L. Wang. Feature selection algorithm based on conditional dynamic mutual information. International Journal on Smart Sensing and Intelligent Systems, 8(1), 2015.

[Zhao and Liu, 2007] Z. Zhao and H. Liu. Searching for interacting features. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1156-1161, 2007.
