A Geometric Theory of Feature Selection and Distance-Based Measures
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

A Geometric Theory of Feature Selection and Distance-Based Measures

Kilho Shin and Adrian Pino Angulo
University of Hyogo, Kobe, Japan
yshin@ai.u-hyogo.ac.jp

Abstract

Feature selection measures are often explained by the analogy to a rule that measures the distance of a set of features to the closest ideal set of features. An ideal feature set is one that determines classes uniquely and correctly. Before this paper, this way of explanation was just an analogy. In this paper, we show a way to map arbitrary feature sets of datasets into a common metric space, which is indexed by a real number p with $1 \le p \le \infty$. Since this determines the distance between an arbitrary pair of feature sets, even if they belong to different datasets, the distance of a feature set to the closest ideal feature set can be used as a feature selection measure. Surprisingly, when p = 1, the measure is identical to the Bayesian risk, which is probably the feature selection measure used the most widely in the literature. For $1 < p \le \infty$, the measure is novel and has significantly different properties from the Bayesian risk. We also investigate, through experiments, the correlation between measurements by these measures and classification accuracy. As a result, we show that our novel measures with p > 1 exhibit stronger correlation than the Bayesian risk.

1 Introduction

Feature selection is one of the central focuses of machine learning research. In this paper, we understand the feature selection problem as follows: given a dataset, select an appropriately small number of features that faithfully determine the classes of the examples of the dataset. If we could find a feature subset whose features correctly determine the classes of the examples, it could be the answer we want [Almuallim and Dietterich, 1994].
Such feature subsets are referred to as reducts in rough set theory [Pawlak, 1991] and as consistent feature sets in this paper [Liu et al., 1998]. A dataset, however, does not necessarily include a consistent feature subset, and in such cases, we have to be satisfied with selecting feature subsets that are sufficiently close to being consistent. In other words, we need a measure that quantifies the distance of a feature subset to the closest consistent feature set. However, we only know two necessary conditions for such a measure, namely, determinacy and monotonicity: determinacy corresponds to the identity of indiscernibles among the axioms of metrics and requires that the measurement is zero if, and only if, the feature set is consistent; monotonicity, on the other hand, requires that the measurement of an arbitrary feature set is no greater than the measurement of any of its subsets. When a measure has the determinacy and monotonicity properties, we also say that it is determinant and monotonous. With a determinant and monotonous measure, a feature selection algorithm attempts to find feature sets whose measurements are close to zero. Generally, the feature selection measures proposed in the literature fall into two groups. One consists of measures that simply evaluate dependence between two features [Mengle and Goharian, 2009; Bolón-Canedo et al., 2011; Foithong et al., 2012; Suzuki and Sugiyama, 2013; Wang, 2015]. The mRMR framework [Peng et al., 2005] takes advantage of measures of this group and aims to select a feature set that maximizes the difference between the relevance and the redundancy of the feature set: the relevance is the collective dependence of the feature set on the class labels, while the redundancy is the internal mutual dependence among the features of the feature set.
On the other hand, the other group consists of determinant and monotonous measures [Liu et al., 1998; Shin and Xu, 2009; Arauzo-Azofra et al., 2008], and 4 of the 7 feature selection measures studied in [Molina et al., 2002] are known to belong to this group [Shin et al., 2011]. In this paper, we are interested in the latter group.

Example 1. The Bayesian risk, also known as the inconsistency rate [Liu et al., 1998], is defined as follows:

$\mu_{\mathrm{br}}^{D}(S) = \sum_{x \in \Omega_S} \Bigl( p(S = x) - \max_{c \in \Omega_C} p(S = x, C = c) \Bigr).$

Here, S is a feature subset of a dataset D, and C is the random variable that represents classes. $\Omega_S$ and $\Omega_C$ are the sample spaces of S and C, and p is the empirical probability distribution of D. This measure is, indeed, determinant and monotonous.

Example 2. Zhao et al. [Zhao and Liu, 2007] proposed Interact, a feature selection algorithm that leverages the Bayes
risk $\mu_{\mathrm{br}}$ to evaluate the closeness of feature sets to the state of being consistent. Therefore, Interact selects feature sets with small $\mu_{\mathrm{br}}$ measurements.

Example 3. Shin et al. [Shin and Xu, 2009] also proposed the conditional entropy, defined by

$\mu_{\mathrm{ce}}^{D}(S) = - \sum_{x \in \Omega_S} \sum_{c \in \Omega_C} p(S = x, C = c) \log \frac{p(S = x, C = c)}{p(S = x)},$

as a determinant and monotonous measure. On the other hand, the well-known feature selection algorithm that selects the top n features X with respect to the mutual information I(X; C) actually selects those feature sets that minimize

$\mu^{D}(\{X_1, \ldots, X_n\}) = \sum_{i=1}^{n} \mu_{\mathrm{ce}}^{D}(\{X_i\}).$

By using $\mu^{D}$ instead of $\mu_{\mathrm{ce}}^{D}$, this algorithm is fast but not very accurate, since $\mu^{D}$ is not determinant.

When a measure represents the distance of a feature set to the closest consistent feature set in some metric space, we call it distance-based. About distance-based measures, we only knew the two necessary conditions above, that is, determinacy and monotonicity, and hence, the distance-based measure was only an analogy used to explain the desirable nature of feature selection measures. This paper changes this. We present a way to identify a feature subset of a dataset uniquely with a point in a common metric space. The metric space is parameterized by a real number $p \in [1, \infty]$, and hence, we have infinitely many such metric spaces. In each metric space, the consistent feature sets form a closed subspace, and hence, we can determine the distance of an arbitrary feature set to the closest consistent feature set. This is nothing other than a distance-based measure. Each value of p determines a different metric function. For p = 1, we show that the measure is identical to the well-known Bayesian risk. On the other hand, for p > 1, the measure is novel and unknown in the literature. Also, through experiments, we show that measurements by our novel distance-based measures correlate with classification accuracy.
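As a concrete illustration of Examples 1 and 3, both the empirical Bayesian risk and the conditional entropy can be computed from a table of examples in a few lines. The following is a minimal Python sketch, not the authors' implementation; the function names and the list-of-tuples encoding of a dataset (class value stored last) are our own conventions, and the dataset used is the one of Example 4 below.

```python
import math
from collections import Counter

def bayes_risk(rows, feature_idx):
    """mu_br: sum over patterns x of p(S = x) - max_c p(S = x, C = c)."""
    n = len(rows)
    joint = Counter()
    for *values, label in rows:           # class label is stored last
        joint[(tuple(values[i] for i in feature_idx), label)] += 1
    best, total = Counter(), Counter()
    for (x, _), c in joint.items():
        total[x] += c
        best[x] = max(best[x], c)        # the majority class count per pattern
    return sum(total[x] - best[x] for x in total) / n

def conditional_entropy(rows, feature_idx):
    """mu_ce: -sum_{x,c} p(S = x, C = c) * log(p(S = x, C = c) / p(S = x))."""
    n = len(rows)
    joint, marg = Counter(), Counter()
    for *values, label in rows:
        x = tuple(values[i] for i in feature_idx)
        joint[(x, label)] += 1
        marg[x] += 1
    # (c/n) / (marg[x]/n) simplifies to c / marg[x]
    return -sum((c / n) * math.log(c / marg[x]) for (x, _), c in joint.items())

# Dataset D of Example 4: features (B, R, G), class last.
D = [("A", "M", "M", "p"), ("A", "M", "M", "n"), ("B", "N", "M", "p"),
     ("O", "C", "F", "n"), ("AB", "A", "M", "n")]
print(bayes_risk(D, [0, 1, 2]))   # 0.2, i.e. 1/5: the two (A, M, M) rows disagree
print(bayes_risk(D, [2]))         # 0.4, i.e. 2/5
```

Both functions return 0 exactly when every value pattern of the selected features carries a single class, i.e., when the feature set is consistent.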
2 Formulating feature selection measures

In this paper, we do not discriminate between features and random variables. Furthermore, for a dataset D, we use the following notation.

$F_D$ is the entire set of features of D. $F_D$ is finite and includes the feature $C_D$ that represents classes. Also, we denote $F_D \setminus \{C_D\}$ by $\tilde{F}_D$.

$\Omega_{F_D} = \prod_{X \in F_D} \Omega_X$ is the entire sample space, where $\Omega_X$ is the sample space of an individual feature X. For a feature set S, $\Omega_S$ means the Cartesian product $\prod_{X \in S} \Omega_X$, which is the sample space of S.

$p_D : \Omega_{F_D} \to \mathbb{Q}$ is the empirical probability distribution. We also identify D with the triplet $(F_D, \Omega_{F_D}, p_D)$.

For $S \subseteq F_D$, $D_{\downarrow S}$ is the dataset derived from D by eliminating the features in $F_D \setminus S$.

For $S \subseteq F_D$, $D_{\to S}$ is the dataset derived from D by replacing the features of S with a single feature $\bar{S}$ such that $\Omega_{\bar{S}} = \Omega_S$. The values for S of an example in D are replaced with the vector consisting of those values.

Example 4. For the dataset D determined by (a) below, we have $F_D = \{B, R, G, C_D\}$ and $\Omega_{F_D} = \{A, B, O, AB\} \times \{M, N, C, A\} \times \{M, F\} \times \{p, n\}$. When $S = \{R, G\}$, (b) and (c) determine $D_{\downarrow S}$ and $D_{\to S}$.

(a) D
B   R  G  C_D
A   M  M  p
A   M  M  n
B   N  M  p
O   C  F  n
AB  A  M  n

(b) $D_{\downarrow S}$
R  G  C_D
M  M  p
M  M  n
N  M  p
C  F  n
A  M  n

(c) $D_{\to S}$
B   $\bar{S}$   C_D
A   (M, M)  p
A   (M, M)  n
B   (N, M)  p
O   (C, F)  n
AB  (A, M)  n

On the other hand, a feature selection measure µ is formulated as a family of $\mu^D$ indexed by datasets D, and each $\mu^D$ is a real-valued function defined over the power set $\mathcal{P}(F_D)$ of $F_D$. In this paper, we assume that a feature selection measure satisfies the following three requirements, which all of the 9 measures surveyed in [Shin et al., 2011] satisfy as well.

Requirement 1 (Naming invariance). Measurements by the measure are invariant to renaming of features and feature values. This also implies that the measure treats every feature as categorical.

Requirement 2 (Localization invariance). If $T \subseteq S$, then $\mu^D(T) = \mu^{D_{\downarrow S}}(T)$ holds.
Requirement 3 (Feature aggregation invariance). If $S \subseteq T$, then $\mu^D(T) = \mu^{D_{\to S}}((T \setminus S) \cup \{\bar{S}\})$ holds.

Example 5. The Bayesian risk $\mu_{\mathrm{br}}$ has all of these invariance properties, where E is the dataset of Example 6: $\mu_{\mathrm{br}}^{D}(\tilde{F}_D) = \mu_{\mathrm{br}}^{E}(\tilde{F}_E) = 1/5$ indicates the naming invariance; for $S = \{R, G\}$, $\mu_{\mathrm{br}}^{D}(\{G\})$ and $\mu_{\mathrm{br}}^{D_{\downarrow S}}(\{G\})$ are both identical to $2/5$, and $\mu_{\mathrm{br}}^{D_{\to S}}(\{B, \bar{S}\}) = 1/5$ holds. These indicate the localization and feature aggregation invariance properties.

In this paper, we will identify pairs (D, S) and (E, T) of a dataset and a feature subset, if $\mu^D(S) = \mu^E(T)$ holds for every measure µ that satisfies Requirements 1, 2 and 3. For this purpose, we introduce the notion of coherent mappings and then prove Proposition 1.

Definition 1. A mapping $\phi : F_D \to F_E$ is coherent between D and E, if, and only if, there exist injections $v_Y : \Omega_{\phi^{-1}(Y)} \to \Omega_Y$ for $Y \in \mathrm{Im}(\phi)$ and an injection $v_{C_E} : \Omega_{C_D} \to \Omega_{C_E}$ such that

$p_D(x) = \sum_{y \,:\, y|_{\mathrm{Im}(\phi) \cup \{C_E\}} = v_\pi(x)} p_E(y),$

where $v_\pi = \prod_{Y \in \mathrm{Im}(\phi) \cup \{C_E\}} v_Y : \Omega_{F_D} \to \Omega_{\mathrm{Im}(\phi) \cup \{C_E\}}$.
Example 6. We consider the dataset D of Example 4 and a dataset E with features $F_1$, $F_2$, $F_3$ and class $C_E$. When we define ϕ by $\phi(B) = F_1$ and $\phi(R) = \phi(G) = F_2$, ϕ is coherent. To show this, we determine $v_{F_1}$ and $v_{F_2}$ as below and $v_{C_E}$ by $v_{C_E}(p) = 1$ and $v_{C_E}(n) = 2$. For example, $p_D(A, M, M, p) = 1/5$ and $p_E(1, 1, 1, 1) + p_E(1, 1, 3, 1) = 1/10 + 1/10 = 1/5$ hold.

$v_{F_1}$: A → 1, B → 2, O → 3, AB → 4
$v_{F_2}$: (M, M) → 1, (M, F) → 2, (N, M) → 3, (N, F) → 4, (C, M) → 5, (C, F) → 6, (A, M) → 7, (A, F) → 8

Proposition 1. If a feature selection measure µ satisfies Requirements 1 to 3, then $\mu^E(S) = \mu^D(\phi^{-1}(S))$ holds for an arbitrary coherent mapping $\phi : F_D \to F_E$ between D and E and an arbitrary $S \subseteq \mathrm{Im}(\phi)$.

Proof. We decompose ϕ as $\phi = \iota \circ \phi'$, where $\phi' : F_D \to \mathrm{Im}(\phi)$ and $\iota : \mathrm{Im}(\phi) \hookrightarrow F_E$ is the inclusion. Since ϕ′ yields a sequence of feature aggregations that convert D into $E_{\downarrow \mathrm{Im}(\phi)}$ up to renaming of features and feature values, $\mu^E(S) = \mu^{E_{\downarrow \mathrm{Im}(\phi)}}(S) = \mu^D(\phi^{-1}(S))$ follows from Requirements 1 to 3. ∎

Example 7. For the same D, E and ϕ as in Example 6, $\mu_{\mathrm{br}}^{E}(\{F_1, F_2\}) = \mu_{\mathrm{br}}^{D}(\{B, R, G\}) = \mu_{\mathrm{br}}^{E}(\{F_1\}) = \mu_{\mathrm{br}}^{D}(\{B\}) = \mu_{\mathrm{br}}^{E}(\{F_2\}) = \mu_{\mathrm{br}}^{D}(\{R, G\}) = 1/5$ holds.

3 Projecting a feature set of a dataset into a subspace $P_{\mathbb{R}}^{+}$ of the $\ell_p$ space ($p \in [1, \infty]$)

We let $n \in \mathbb{N} \cup \{\infty\}$. We introduce the space $P^n$ into which feature sets of datasets are projected. We start with some preparation. Let K be either the rational number field $\mathbb{Q}$ or the real number field $\mathbb{R}$. We define $P_K$ and $P_K^{+}$ as follows; they are sets of functions $p : \mathbb{N}^2 \to K$.

Definition 2. We define $P_K$ and $P_K^{+}$ by:

$P_K = \bigl\{ p \bigm| p(i, j) \ge 0,\ \textstyle\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} p(i, j) = 1 \bigr\}$;
$P_K^{+} = \bigl\{ p \bigm| p(i, j) \ge 0,\ \textstyle\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} p(i, j) \le 1 \bigr\}$.

We can identify $\mathbb{N}^2$ with $\mathbb{N}$ (for example, $f : (i, j) \mapsto \frac{(i+j-1)(i+j-2)}{2} + i$ is bijective), and therefore, $P_{\mathbb{R}}$ and $P_{\mathbb{R}}^{+}$ can be viewed as subspaces of $\ell_p$ for $1 \le p \le \infty$. If we focus on datasets that include at most n classes, we can use $P_K^{n}$ and $P_K^{+n}$ instead.

Definition 3. We define $P_K^{n}$ and $P_K^{+n}$ by:

$P_K^{n} = \{ p \in P_K \mid p(i, j) = 0 \text{ for all } j > n \}$;
$P_K^{+n} = \{ p \in P_K^{+} \mid p(i, j) = 0 \text{ for all } j > n \}$.

$P_{\mathbb{R}}^{n}$ and $P_{\mathbb{R}}^{+n}$ are metric spaces with the metric derived from the norm $\|\cdot\|_p$ of $\ell_p$.
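The bijection $f(i, j) = (i+j-1)(i+j-2)/2 + i$ mentioned after Definition 2 enumerates $\mathbb{N}^2$ along anti-diagonals. A quick sketch (ours, for illustration only) confirms that the pairs on the first anti-diagonals map onto an initial segment of $\mathbb{N}$ with no gaps or collisions:

```python
def f(i, j):
    # f(i, j) = (i+j-1)(i+j-2)/2 + i, a bijection N x N -> N (N starting at 1)
    return (i + j - 1) * (i + j - 2) // 2 + i

# The 190 pairs with i + j <= 20 map exactly onto 1..190.
values = sorted(f(i, j) for i in range(1, 20) for j in range(1, 20) if i + j <= 20)
print(values[:5], values[-1])  # [1, 2, 3, 4, 5] 190
```

Composing a point of $P_K$ with $f^{-1}$ is what lets a function on $\mathbb{N}^2$ be read as an $\ell_p$ sequence.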
$P_{\mathbb{R}}^{n}$ is closed in $\ell_p$ only when p = 1, while $P_{\mathbb{R}}^{+n}$ is the closure of $P_{\mathbb{R}}^{n}$ in $\ell_p$ for $1 < p \le \infty$. Therefore, $P_{\mathbb{R}}^{n}$ in $\ell_1$ and $P_{\mathbb{R}}^{+n}$ in $\ell_p$ for $1 < p \le \infty$ are complete, since $\ell_p$ is a Banach space. Although $P_{\mathbb{R}}^{n}$ and $P_{\mathbb{R}}^{+n}$ are bounded, neither of them is compact. We define $P^n$, a subspace of $P_{\mathbb{R}}^{n}$, as follows.

Definition 4. We let supp(p) be the smallest $\Omega_U \times \Omega_C$ such that $\Omega_U \subseteq \mathbb{N}$, $\Omega_C \subseteq \mathbb{N}$ and $\Omega_U \times \Omega_C \supseteq \mathrm{Supp}(p) = \{(i, j) \mid p(i, j) > 0\}$. We define P and $P^n$ for $n \in \mathbb{N}$ by:

$P = \{ p \in P_{\mathbb{Q}} \mid |\mathrm{supp}(p)| < \infty \}$;
$P^n = \{ p \in P \mid p(i, j) = 0 \text{ for all } j > n \}$.

Example 8. When we define p by $p(i, i) = (1 - r)r^{i-1}$ for $r \in (0, 1)$, $p \in P_{\mathbb{R}}$ holds, because $\sum_{i=1}^{\infty} (1 - r)r^{i-1} = (1 - r) \lim_{n \to \infty} \frac{1 - r^n}{1 - r} = 1$. Furthermore, if $r \in \mathbb{Q}$, $p \in P_{\mathbb{Q}}$ holds. For this p, we have $\Omega_U = \mathbb{N}$, $\Omega_C = \mathbb{N}$ and $\mathrm{Supp}(p) = \{(i, i) \mid i \in \mathbb{N}\}$. Hence, $p \notin P$.

For $n \in \mathbb{N} \cup \{\infty\}$, $P^n$ turns out to be dense in $P_{\mathbb{R}}^{n}$. For $p \in P^n$, $D_p = (\{U, C\}, \mathbb{N} \times \mathbb{N}, p)$ is not a dataset, because $\mathbb{N} \times \mathbb{N}$ is infinite. Nevertheless, we can think of a coherent mapping $\phi : \tilde{F}_D \to \{U\}$ between a dataset D and $D_p$. Theorem 1 shows how to project a feature set S of a dataset D into $P^n$.

Theorem 1. Let D be a dataset with n classes and $S \subseteq \tilde{F}_D$. There exists $p \in P^n$ such that $\phi : S \to \{U\}$ is coherent between $D_{\downarrow S}$ and $D_p$. Conversely, for arbitrary $p \in P^n$, there exists a dataset D with n classes such that $\phi : \tilde{F}_D \to \{U\}$ is coherent between D and $D_p$.

Proof. Let $v_1 : \Omega_S \to \mathbb{N}$ and $v_2 : \Omega_{C_D} \to \{1, \ldots, n\}$ be injective. It suffices to define $p \in P^n$ by $p(i, j) = p_{D_{\downarrow S}}(v_1^{-1}(i), v_2^{-1}(j))$ if $(i, j) \in \mathrm{Im}(v_1 \times v_2)$ and $p(i, j) = 0$ otherwise. To show the converse, we have only to let $D = (\{U, C\}, \mathrm{supp}(p), p)$. ∎

Example 9. To project $D_{\downarrow S}$ of Example 4 into $P^2$, we determine $p \in P^2$ by $p(1, 1) = p(1, 2) = p(3, 1) = p(6, 2) = p(7, 2) = 1/5$ and $p(i, j) = 0$ otherwise. When we determine $v_U$ and $v_C$ to be identical to $v_{F_2}$ and $v_{C_E}$ of Example 6, $\phi : S \to \{U\}$ is coherent between $D_{\downarrow S}$ and $D_p$, and hence, $D_{\downarrow S}$ is projected to p.

Example 10. A single $D_{\downarrow S}$ can be projected to infinitely many points in $P^n$. For example, we let $D_{\downarrow S}$, p and $v_U$ be the same as in Example 9.
For an arbitrary bijection $\pi : \mathbb{N} \to \mathbb{N}$, we define $p_\pi$ by $p_\pi(\pi(i), j) = p(i, j)$. Apparently, $D_{\downarrow S}$ projects to $p_\pi$ with $\pi \circ v_U$.
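Theorem 1's construction is essentially bookkeeping: number the observed value patterns of S and the classes injectively and carry the empirical probabilities over. A minimal sketch under our own encoding follows; here the numbering is first-come-first-served rather than the $v_{F_2}$ of Example 6, which, by Example 10, only yields another of the infinitely many projections of the same $D_{\downarrow S}$:

```python
from collections import Counter

def project(rows, feature_idx):
    """Project D (class value last in each row) restricted to S = feature_idx
    to a finitely supported point p: number patterns and classes as they appear."""
    n = len(rows)
    v_u, v_c, p = {}, {}, Counter()
    for *values, label in rows:
        i = v_u.setdefault(tuple(values[k] for k in feature_idx), len(v_u) + 1)
        j = v_c.setdefault(label, len(v_c) + 1)
        p[(i, j)] += 1 / n
    return dict(p)

D = [("A", "M", "M", "p"), ("A", "M", "M", "n"), ("B", "N", "M", "p"),
     ("O", "C", "F", "n"), ("AB", "A", "M", "n")]
point = project(D, [1, 2])   # S = {R, G}
# Five cells of mass 1/5 each; pattern (M, M) becomes row 1 and meets both classes.
```

Since the class axis receives at most two indices here, the resulting point lies in $P^2$, as Theorem 1 promises for a two-class dataset.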
Example 11. More than one dataset can be projected to a single point in $P^n$. For example, $D_{\downarrow S}$ of Example 4 and $E_{\downarrow \{F_2\}}$ of Example 6 are both projected to any of the points $p_\pi$ of Example 10. To be precise, every point in $P^n$ has infinitely many datasets that project to it.

Based on Proposition 1 and Theorem 1, Theorem 2 asserts that a feature selection measure can be uniquely viewed as a real-valued function defined over $P^n$.

Theorem 2. For a feature selection measure µ, there uniquely exists $\hat{\mu} : P \to \mathbb{R}$ such that $\hat{\mu}(p) = \mu^D(S)$ holds for arbitrary D, S and p such that $\phi : S \to \{U\}$ is coherent between $D_{\downarrow S}$ and $D_p$.

Proof. We let $D_p^{\circ}$ be $(\{U, C\}, \mathrm{supp}(p), p)$ for $p \in P$, and we determine $\hat{\mu}(p)$ by $\mu^{D_p^{\circ}}(\{U\})$. For a dataset D such that $\phi : S \to \{U\}$ is coherent between $D_{\downarrow S}$ and $D_p$ with $v_U : \Omega_S \to \mathbb{N}$ and $v_C : \Omega_{C_D} \to \mathbb{N}$, we let $D_p^{\#} = (\{U, C\}, \mathrm{Im}(v_U) \times \mathrm{Im}(v_C), p)$. $\phi : S \to \{U\}$ is coherent between $D_{\downarrow S}$ and $D_p^{\#}$, and the inclusion $\mathrm{supp}(p) \subseteq \mathrm{Im}(v_U) \times \mathrm{Im}(v_C)$ yields coherence between $D_p^{\circ}$ and $D_p^{\#}$. Therefore, $\mu^D(S) = \mu^{D_p^{\#}}(\{U\}) = \mu^{D_p^{\circ}}(\{U\}) = \hat{\mu}(p)$. ∎

Example 12. Datasets that are projected to the same point in $P^n$ have the same measurement under an arbitrary feature selection measure µ. For example, as seen in Example 11, $D_{\downarrow S}$ of Example 4 and $E_{\downarrow \{F_2\}}$ of Example 6 are both projected to the same point of $P^2$, and $\mu_{\mathrm{br}}^{D}(S) = \mu_{\mathrm{br}}^{E}(\{F_2\}) = 1/5$ holds as seen in Example 7. Therefore, by letting $\hat{\mu}(p) = \mu^D(S)$ for $D_{\downarrow S}$ that projects to p, $\hat{\mu}$ is well defined.

The problem with the projection determined by Theorem 1 is that a single pair (D, S) is mapped to an infinite number of different points in $P^n$ (Example 10). In the next section, we solve this problem.

4 Defining a quotient space $Q_{\mathbb{R}}^{+}$ of $P_{\mathbb{R}}^{+}$

Our idea for solving the problem is to introduce an equivalence relation that unifies the points of $P^n$ that are images of the same pair (D, S), and then to use the resulting quotient space. The relation defined below is indeed an equivalence relation. We assume $n \in \mathbb{N} \cup \{\infty\}$.

Definition 5. Let p and q be in $P_{\mathbb{R}}^{+}$.
We define $p \sim q$, if, and only if, there exist bijections $v_U : \mathbb{N} \to \mathbb{N}$ and $v_C : \mathbb{N} \to \mathbb{N}$ such that $p(i, j) = q(v_U(i), v_C(j))$.

Example 13. $p_\pi \sim p$ holds for the $p_\pi$ and π of Example 10.

Definition 6. We let $Q_{\mathbb{R}}^{+} = P_{\mathbb{R}}^{+}/{\sim}$. Moreover, for the canonical projection $\pi : P_{\mathbb{R}}^{+} \to Q_{\mathbb{R}}^{+}$, we let $Q_{\mathbb{R}}^{+n} = \pi(P_{\mathbb{R}}^{+n})$, $Q_{\mathbb{R}}^{n} = \pi(P_{\mathbb{R}}^{n})$ and $Q^n = \pi(P^n)$.

Proposition 2 is easy to see but plays a crucial role in proving Theorems 3 and 4.

Proposition 2. For p and q in P, the following are equivalent.
1. $p \sim q$.
2. There exists (D, S) such that $\phi : S \to \{U\}$ is coherent between $D_{\downarrow S}$ and $D_p$ and between $D_{\downarrow S}$ and $D_q$.
3. $\phi : S \to \{U\}$ is coherent between $D_{\downarrow S}$ and $D_p$, if, and only if, it is coherent between $D_{\downarrow S}$ and $D_q$.

Theorems 3 and 4 follow from Theorems 1 and 2 and Proposition 2. In the following, [p] denotes π(p), and [D, S] denotes the image of (D, S) in Q.

Theorem 3. Let D be a dataset with n classes and $S \subseteq \tilde{F}_D$. There uniquely exists $[p] \in Q^n$ such that $\phi : S \to \{U\}$ is coherent between $D_{\downarrow S}$ and $D_p$. Conversely, for arbitrary $[p] \in Q^n$, there exists a dataset D with n classes such that $\phi : \tilde{F}_D \to \{U\}$ is coherent between D and $D_p$.

Theorem 4. For a feature selection measure µ, there uniquely exists $[\hat{\mu}] : Q \to \mathbb{R}$ such that $[\hat{\mu}]([D, S]) = \mu^D(S)$ holds for any dataset D and $S \subseteq \tilde{F}_D$.

5 Introducing a metric $d_p$ into $Q_{\mathbb{R}}^{+}$

Although a quotient of a metric space is not always a metric space, we can derive a metric on $Q_{\mathbb{R}}^{+}$ from $\ell_p$ as follows.

Definition 7. For [p] and [q] in $Q_{\mathbb{R}}^{+}$, we define $d_p([p], [q]) = \inf\{ \|p' - q'\|_p \mid p' \sim p,\ q' \sim q \}$.

Theorem 5. For $1 \le p \le \infty$, $d_p$ is a metric over $Q_{\mathbb{R}}^{+}$.

Proof. We only prove the triangle inequality $d_p([p], [r]) \le d_p([p], [q]) + d_p([q], [r])$, taking advantage of Lemma 1.

Lemma 1. For $p' \sim p$ and q in $P_{\mathbb{R}}^{+}$, there exists $q' \sim q$ such that $\|p - q'\|_p = \|p' - q\|_p$.

For $\varepsilon > 0$, we take $q' \sim q$ such that $\|p - q'\|_p - d_p([p], [q]) < \varepsilon/2$ and $r' \sim r$ such that $\|q - r'\|_p - d_p([q], [r]) < \varepsilon/2$. By Lemma 1, there exists $r'' \sim r'$ such that $\|q' - r''\|_p = \|q - r'\|_p$, and hence, $d_p([p], [r]) \le \|p - r''\|_p \le \|p - q'\|_p + \|q' - r''\|_p < d_p([p], [q]) + d_p([q], [r]) + \varepsilon$. Since ε is arbitrary, the triangle inequality follows. ∎
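For points with small finite support, the infimum in Definition 7 can be evaluated by brute force over relabelings of rows and classes. The sketch below is our own illustration (exponential in the support size, so only for toy examples), not an algorithm from the paper:

```python
from itertools import permutations

def quotient_distance(p, q, p_norm, size):
    """d_p([p], [q]) for points supported inside {1..size} x {1..size}:
    minimise ||p' - q||_p over all relabelings p' ~ p (Definition 7)."""
    idx = range(1, size + 1)
    best = float("inf")
    for ru in permutations(idx):            # v_U: relabel the rows
        for rc in permutations(idx):        # v_C: relabel the classes
            diff = [abs(p.get((ru[i - 1], rc[j - 1]), 0.0) - q.get((i, j), 0.0))
                    for i in idx for j in idx]
            d = max(diff) if p_norm == float("inf") else \
                sum(x ** p_norm for x in diff) ** (1 / p_norm)
            best = min(best, d)
    return best

p = {(1, 1): 0.5, (2, 2): 0.5}
q = {(1, 2): 0.5, (2, 1): 0.5}      # p up to a swap of the class labels
print(quotient_distance(p, q, 1, 2))   # 0.0, since [p] = [q]
```

Because p and q here differ only by relabeling, the distance between their classes is zero even though $\|p - q\|_1 = 2$ for these particular representatives.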
6 Deriving a measure $\mu_{\ell^p}$ from $d_p$

We will see that the minimum $d_p$ distance from [D, S] to a consistent point determines a feature selection measure. We first define the consistent points of $P_{\mathbb{R}}^{+}$.

Definition 8. $p \in P_{\mathbb{R}}^{+}$ is consistent, if, and only if, there exists $\bar{j} : \mathbb{N} \to \mathbb{N}$ such that $p(i, j) = 0$ if $j \ne \bar{j}(i)$.
$C_{\mathbb{R}}^{+}$ denotes the entire set of consistent points, and we let $C_{\mathbb{R}}^{+n} = C_{\mathbb{R}}^{+} \cap P_{\mathbb{R}}^{+n}$, $C_{\mathbb{R}}^{n} = C_{\mathbb{R}}^{+} \cap P_{\mathbb{R}}^{n}$ and $C^n = C_{\mathbb{R}}^{+} \cap P^n$ for $n \in \mathbb{N} \cup \{\infty\}$.

Definition 9. For $\bar{j} : \mathbb{N} \to \mathbb{N}$ such that $\bar{j}(i) \in \mathrm{argmax}\{p(i, j) \mid j \in \mathbb{N}\}$ and $p \in P$, we define:

For $1 \le p < \infty$,
$\mu_{\ell^p}(p) = \Bigl( \sum_{i \in \mathbb{N}} \sum_{j \in \mathbb{N}} p(i, j)^p - \sum_{i \in \mathbb{N}} p(i, \bar{j}(i))^p \Bigr)^{1/p}$;

For $p = \infty$,
$\mu_{\ell^\infty}(p) = \max\{ p(i, j) \mid (i, j) \in \mathbb{N}^2,\ (i, j) \ne (i, \bar{j}(i)) \}$.

Surprisingly, $\mu_{\ell^1}$ is identical to the Bayesian risk.

Proposition 3. If $\phi : S \to \{U\}$ is coherent between $D_{\downarrow S}$ and $D_p$, then $\mu_{\mathrm{br}}^{D}(S) = \mu_{\ell^1}(p)$ holds.

Proof. We assume that $v_U : \Omega_S \to \mathbb{N}$ and $v_C : \Omega_{C_D} \to \mathbb{N}$ yield the coherence between $D_{\downarrow S}$ and $D_p$. Then

$\mu_{\mathrm{br}}^{D}(S) = \sum_{x \in \Omega_S} \Bigl( p_{D_{\downarrow S}}(S = x) - \max_{c} p_{D_{\downarrow S}}(S = x, C = c) \Bigr) = \sum_{x \in \Omega_S} \Bigl( \sum_{c} p(v_U(x), v_C(c)) - \max_{c} p(v_U(x), v_C(c)) \Bigr) = \mu_{\ell^1}(p).$ ∎

Theorem 6. For $p \in P$, the following hold.
1. $\mu_{\ell^1}(p) = \frac{1}{2} \min\{ d_1([p], [q]) \mid q \in C \}$.
2. For $1 < p \le \infty$, $\mu_{\ell^p}(p) = \inf\{ d_p([p], [q]) \mid q \in C \} = \min\{ d_p([p], [q]) \mid q \in C_{\mathbb{R}}^{+},\ |\mathrm{supp}(q)| < \infty \}$.

Proof. Here, we prove only 1. For $q \in C$, we take $j' : \mathbb{N} \to \mathbb{N}$ such that $q(i, j) = 0$ if $j \ne j'(i)$. Also, we let $\bar{j} : \mathbb{N} \to \mathbb{N}$ satisfy $\bar{j}(i) \in \mathrm{argmax}\{p(i, j) \mid j \in \mathbb{N}\}$. We first fix $i \in \mathbb{N}$:

$\sum_{j} |p(i, j) - q(i, j)| = |p(i, j'(i)) - q(i, j'(i))| + \sum_{j \ne j'(i)} p(i, j) \ge q(i, j'(i)) - \max_{j} p(i, j) + \sum_{j} p(i, j) - \max_{j} p(i, j).$

Then, we sum up both sides across all $i \in \mathbb{N}$. Since $\sum_i q(i, j'(i)) = 1$, $\sum_{i, j} p(i, j) = 1$ and $\sum_i \max_j p(i, j) = 1 - \mu_{\ell^1}(p)$, we obtain $\|p - q\|_1 \ge 2\mu_{\ell^1}(p)$. On the other hand, for $q \in C$ such that $q(i, j) = p(i, j)/(1 - \mu_{\ell^1}(p))$ if $j = \bar{j}(i)$ and $q(i, j) = 0$ otherwise, we have $\|p - q\|_1 = 2\bigl(1 - \sum_i \max_j p(i, j)\bigr) = 2\mu_{\ell^1}(p)$. As $q' \in C$ whenever $q' \sim q$ for $q \in C$, Lemma 1 implies the assertion. ∎

Determinacy of $\mu_{\ell^p}$ immediately follows from Theorem 6. Also, it is easy to show the monotonicity of $\mu_{\ell^p}$. Furthermore, $\mu_{\ell^p}$ turns out to be continuous under $d_q$ for arbitrary $p, q \in [1, \infty]$.

7 Contrasting $d_1$ and $d_p$ with p > 1

Proposition 3 asserts that $\mu_{\ell^1}$ is identical to the Bayesian risk, which is well known in the literature. On the other hand, $\mu_{\ell^p}$ with $p \in (1, \infty]$ is a novel measure, unknown in the literature. Also, as metric functions, $d_1$ and $d_p$ with $p \in (1, \infty]$ are significantly different, as stated below without proof.
$Q_{\mathbb{R}}$ is complete under $d_1$, while $Q_{\mathbb{R}}^{+}$ is complete under $d_p$ for $p \in (1, \infty]$. $Q_{\mathbb{R}}^{n}$ is not compact under $d_1$, while $Q_{\mathbb{R}}^{+n}$ is compact under $d_p$ for $p \in (1, \infty]$. $Q^n$ is dense in $Q_{\mathbb{R}}^{n}$ under $d_1$, while it is dense in $Q_{\mathbb{R}}^{+n}$ under $d_p$ for $p \in (1, \infty]$.

Moreover, in Section 8, we will see that $\mu_{\ell^1}$ and $\mu_{\ell^p}$ for $p \in (1, \infty]$ have different properties in terms of the correlation between their measurements and classification accuracy. Although we cannot explain the reason for this difference, we suspect that it is related to the aforementioned differences between $d_1$ and $d_p$ with $p \in (1, \infty]$ as metric functions.

8 Correlation between measurements by $\mu_{\ell^p}$ and classification accuracy

We naturally expect strong correlation between measurements by a good measure and classification accuracy. From this point of view, we ran experiments with $\mu_{\ell^p}$ for $p = 1, 2, \ldots, 5$. In particular, we focus on the comparison between p = 1 and p > 1: $\mu_{\ell^1}$ is identical to the Bayesian risk and is the feature selection measure used the most widely in the literature, while the other measures are novel and introduced in this paper for the first time.
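On a projected point, the measures $\mu_{\ell^p}$ of Definition 9 are cheap to evaluate: they are the $\ell_p$ norm of what remains after removing one maximal entry from each row. A sketch under our dict-of-cells encoding (ours, not the authors' code):

```python
from collections import defaultdict

def mu_lp(point, p_norm):
    """mu_{l^p}(point) per Definition 9: drop one maximal entry per row i,
    then take the l_p norm of the rest (for p = inf, the largest survivor)."""
    rows = defaultdict(list)
    for (i, _), v in point.items():
        rows[i].append(v)
    leftovers = []
    for vals in rows.values():
        vals.sort(reverse=True)
        leftovers.extend(vals[1:])          # everything but one argmax cell
    if p_norm == float("inf"):
        return max(leftovers, default=0.0)
    return sum(v ** p_norm for v in leftovers) ** (1 / p_norm)

# The point of Example 9: only row 1 keeps a leftover cell of mass 1/5,
# so every mu_{l^p} equals 0.2 here, and mu_{l^1} equals the Bayesian risk.
point = {(1, 1): 0.2, (1, 2): 0.2, (3, 1): 0.2, (6, 2): 0.2, (7, 2): 0.2}
print([mu_lp(point, p) for p in (1, 2, 3, float("inf"))])
```

The value 0 is returned exactly when each row carries at most one positive cell, i.e., when the point is consistent, which is the determinacy guaranteed by Theorem 6.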
Table 1: Datasets

NAME            #FEAT.  #EXAM.  #CLASSES
ARRHYTHMIA
AUDIOLOGY
MFEAT-FACTOR     216     2000     10
MFEAT-FOURIER     76     2000     10
MFEAT-KARHUNEN    64     2000     10
MFEAT-PIXEL      240     2000     10
MFEAT-ZERNIKE     47     2000     10
MUSK
OPTDIGITS
SONAR
SPAMBASE
SPECTROMETER

8.1 Datasets

Table 1 shows the datasets that we use in our experiments, together with their important attributes, namely the numbers of features, examples and classes. All of the datasets are obtained from the UCI repository of machine learning databases [Blake and Merz, 1998].

8.2 Methods

The following are the steps of our experiments.

Sampling feature sets. For each dataset of Table 1, 60 feature sets are selected at random. The sizes of the selected feature sets also vary at random. In total, we obtain 60 × 12 = 720 pairs of a feature set and a dataset.

Localizing datasets. For each pair of a feature set and a dataset, we localize the dataset by eliminating all of the features that do not belong to the relevant feature set. Consequently, we obtain 720 localized datasets.

Measuring classification accuracy. We apply three classifiers, namely Naïve Bayes, C4.5 and SVM, to each of the localized datasets in the manner of 10-fold cross validation and record the averaged AUC-ROC scores, which we use as classification accuracy scores.

8.3 Results

Figures 1, 2 and 3 display scatter plots of the results of our experiments with Naïve Bayes, C4.5 and SVM, respectively. The x-axis of each chart represents measurements by one of the measures (a) $\mu_{\ell^1}$, (b) $\mu_{\ell^2}$, (c) $\mu_{\ell^3}$, (d) $\mu_{\ell^4}$ and (e) $\mu_{\ell^5}$, while the y-axis represents the averaged AUC-ROC scores. From the charts, we have the impression that $\mu_{\ell^p}$ with p > 1 shows stronger negative correlation with classification accuracy than $\mu_{\ell^1}$. We look into this more closely in the next subsection.

8.4 Analysis of correlation

To compare $\mu_{\ell^1}$ and $\mu_{\ell^p}$ with p > 1 in terms of correlation with classification accuracy, we introduce a function $P_t(x)$ as an index. We let $N_x$ be the number of plots whose $\mu_{\ell^p}$ distances fall within $[x, x + 0.1)$
and $N_{t,x}$ be the number of plots whose AUC-ROC scores exceed t and whose $\mu_{\ell^p}$ distances fall within $[x, x + 0.1)$. Then, we define $P_t(x)$ by $P_t(x) = N_{t,x}/N_x$. Intuitively, $P_t(x)$ approximates the probability that the classification accuracy exceeds t when the measurement by the relevant measure is x.

Figures 4, 5 and 6 show the curves of $P_t(x)$ for Naïve Bayes, C4.5 and SVM. The value of t varies in $\{0.95, 0.9, 0.85, 0.8\}$. Since $P_{t'}(x) \ge P_t(x)$ holds for $t' < t$, the curve for $t'$ is located above the curve for t. We observe two properties from the charts. First, the curves for $\mu_{\ell^p}$ with p > 1 are akin to one another in shape, while the curves for $\mu_{\ell^1}$ appear significantly different from those of the other measures. Second, the curves for $\mu_{\ell^p}$ with p > 1 exhibit clearer negative correlation between measurements by the measure and classification accuracy than the curves for $\mu_{\ell^1}$. The correlation coefficients between the measurements x and the values of $P_t(x)$, presented per measure and classifier, also support this observation.

[Table: correlation coefficients between x and $P_t(x)$ for $\mu_{\ell^1}$ through $\mu_{\ell^5}$ under Naïve Bayes, C4.5 and SVM, for $t \in \{0.95, 0.9, 0.85, 0.8\}$.]

9 Conclusion

We have introduced a family of feature selection measures $\{\mu_{\ell^p} \mid p \in [1, \infty]\}$ that are derived from the distance functions of certain metric spaces. Although $\mu_{\ell^1}$ turns out to be identical to the Bayesian risk, $\mu_{\ell^p}$ for $p \in (1, \infty]$ is novel. Through experiments, we have shown that $\mu_{\ell^p}$ with p > 1 exhibits clearer negative correlation between its measurements and classification accuracy than $\mu_{\ell^1}$ does.
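The index $P_t(x)$ of Section 8.4 is a per-bin precision, computable in one pass over the (distance, AUC-ROC) pairs. A sketch of that computation follows; the pairs below are made-up illustrations, not the paper's experimental data:

```python
def p_t_curve(points, t, width=0.1):
    """P_t(x) = N_{t,x} / N_x over bins [x, x + width) of the distance axis."""
    bins = {}
    for dist, auc in points:
        x = round(int(dist / width) * width, 10)   # left edge of the bin
        n, n_t = bins.get(x, (0, 0))
        bins[x] = (n + 1, n_t + (auc > t))         # count, and count above t
    return {x: n_t / n for x, (n, n_t) in sorted(bins.items())}

pts = [(0.05, 0.96), (0.07, 0.80), (0.15, 0.99), (0.18, 0.50)]
print(p_t_curve(pts, 0.9))   # {0.0: 0.5, 0.1: 0.5}
```

A clearly negative correlation between a measure and accuracy shows up as $P_t(x)$ curves that decrease in x, which is what the paper reports for $\mu_{\ell^p}$ with p > 1.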
Figure 1: Scatter plots of the experimental results (Naïve Bayes); panels (a)-(e) show $\mu_{\ell^1}$ through $\mu_{\ell^5}$, with the measure's value on the x-axis and the averaged AUC-ROC on the y-axis.
Figure 2: Scatter plots of the experimental results (C4.5).
Figure 3: Scatter plots of the experimental results (SVM).
Figure 4: The curves of $P_t(x)$ for $t \in \{0.95, 0.9, 0.85, 0.8\}$ (Naïve Bayes).
Figure 5: The curves of $P_t(x)$ for $t \in \{0.95, 0.9, 0.85, 0.8\}$ (C4.5).
Figure 6: The curves of $P_t(x)$ for $t \in \{0.95, 0.9, 0.85, 0.8\}$ (SVM).
References

[Almuallim and Dietterich, 1994] H. Almuallim and T. G. Dietterich. Learning boolean concepts in the presence of many irrelevant features. Artificial Intelligence, 69(1-2), 1994.

[Arauzo-Azofra et al., 2008] A. Arauzo-Azofra, J. M. Benitez, and J. L. Castro. Consistency measures for feature selection. Journal of Intelligent Information Systems, 30(3), November 2008.

[Blake and Merz, 1998] C. S. Blake and C. J. Merz. UCI repository of machine learning databases. Technical report, University of California, Irvine, 1998.

[Bolón-Canedo et al., 2011] V. Bolón-Canedo, S. Seth, and N. Sánchez-Maroño. Statistical dependence measure for feature selection in microarray datasets. In ESANN 2011, pages 23-28, 2011.

[Foithong et al., 2012] S. Foithong, O. Pinngern, and B. Attachoo. Feature subset selection wrapper based on mutual information and rough sets. Expert Systems with Applications, 39(1), 2012.

[Liu et al., 1998] H. Liu, H. Motoda, and M. Dash. A monotonic measure for optimal feature selection. In Proceedings of the European Conference on Machine Learning, 1998.

[Mengle and Goharian, 2009] S. S. R. Mengle and N. Goharian. Ambiguity measure feature-selection algorithm. Journal of the American Society for Information Science and Technology, 60(5), 2009.

[Molina et al., 2002] L. Molina, L. Belanche, and A. Nebot. Feature selection algorithms: A survey and experimental evaluation. In Proceedings of the IEEE International Conference on Data Mining, pages 306-313, 2002.

[Pawlak, 1991] Z. Pawlak. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, 1991.

[Peng et al., 2005] H. Peng, F. Long, and C. Ding. Feature selection based on mutual information: Criteria of max-dependency, max-relevance and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), August 2005.

[Shin and Xu, 2009] K. Shin and X. M. Xu. Consistency-based feature selection. In 13th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, 2009.
[Shin et al., 2011] K. Shin, D. Fernandes, and S. Miyazaki. Consistency measures for feature selection: A formal definition, relative sensitivity comparison, and a fast algorithm. In 22nd International Joint Conference on Artificial Intelligence, 2011.

[Suzuki and Sugiyama, 2013] T. Suzuki and M. Sugiyama. Sufficient dimension reduction via squared-loss mutual information estimation. Neural Computation, 25(3), 2013.

[Wang, 2015] L. Wang. Feature selection algorithm based on conditional dynamic mutual information. International Journal on Smart Sensing and Intelligent Systems, 8(1), 2015.

[Zhao and Liu, 2007] Z. Zhao and H. Liu. Searching for interacting features. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1156-1161, 2007.
More informationWhen is undersampling effective in unbalanced classification tasks?
When is undersampling effective in unbalanced classification tasks? Andrea Dal Pozzolo, Olivier Caelen, and Gianluca Bontempi 09/09/2015 ECML-PKDD 2015 Porto, Portugal 1/ 23 INTRODUCTION In several binary
More information5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.
5 Measure theory II 1. Charges (signed measures). Let (Ω, A) be a σ -algebra. A map φ: A R is called a charge, (or signed measure or σ -additive set function) if φ = φ(a j ) (5.1) A j for any disjoint
More informationMETRIC BASED ATTRIBUTE REDUCTION IN DYNAMIC DECISION TABLES
Annales Univ. Sci. Budapest., Sect. Comp. 42 2014 157 172 METRIC BASED ATTRIBUTE REDUCTION IN DYNAMIC DECISION TABLES János Demetrovics Budapest, Hungary Vu Duc Thi Ha Noi, Viet Nam Nguyen Long Giang Ha
More informationSUPPORT VECTOR MACHINE
SUPPORT VECTOR MACHINE Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf 1 Overview SVM is a huge topic Integration of MMDS, IIR, and Andrew Moore s slides here Our foci: Geometric intuition
More informationPOINT SET TOPOLOGY. Definition 2 A set with a topological structure is a topological space (X, O)
POINT SET TOPOLOGY Definition 1 A topological structure on a set X is a family O P(X) called open sets and satisfying (O 1 ) O is closed for arbitrary unions (O 2 ) O is closed for finite intersections.
More informationAfter taking the square and expanding, we get x + y 2 = (x + y) (x + y) = x 2 + 2x y + y 2, inequality in analysis, we obtain.
Lecture 1: August 25 Introduction. Topology grew out of certain questions in geometry and analysis about 100 years ago. As Wikipedia puts it, the motivating insight behind topology is that some geometric
More informationDYNAMICAL CUBES AND A CRITERIA FOR SYSTEMS HAVING PRODUCT EXTENSIONS
DYNAMICAL CUBES AND A CRITERIA FOR SYSTEMS HAVING PRODUCT EXTENSIONS SEBASTIÁN DONOSO AND WENBO SUN Abstract. For minimal Z 2 -topological dynamical systems, we introduce a cube structure and a variation
More informationII - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define
1 Measures 1.1 Jordan content in R N II - REAL ANALYSIS Let I be an interval in R. Then its 1-content is defined as c 1 (I) := b a if I is bounded with endpoints a, b. If I is unbounded, we define c 1
More informationMinimal Attribute Space Bias for Attribute Reduction
Minimal Attribute Space Bias for Attribute Reduction Fan Min, Xianghui Du, Hang Qiu, and Qihe Liu School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu
More informationCombinatorics in Banach space theory Lecture 12
Combinatorics in Banach space theory Lecture The next lemma considerably strengthens the assertion of Lemma.6(b). Lemma.9. For every Banach space X and any n N, either all the numbers n b n (X), c n (X)
More informationDavid Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent
Chapter 5 ddddd dddddd dddddddd ddddddd dddddddd ddddddd Hilbert Space The Euclidean norm is special among all norms defined in R n for being induced by the Euclidean inner product (the dot product). A
More informationGroup Decision-Making with Incomplete Fuzzy Linguistic Preference Relations
Group Decision-Making with Incomplete Fuzzy Linguistic Preference Relations S. Alonso Department of Software Engineering University of Granada, 18071, Granada, Spain; salonso@decsai.ugr.es, F.J. Cabrerizo
More informationTopological vectorspaces
(July 25, 2011) Topological vectorspaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ Natural non-fréchet spaces Topological vector spaces Quotients and linear maps More topological
More informationGenerative MaxEnt Learning for Multiclass Classification
Generative Maximum Entropy Learning for Multiclass Classification A. Dukkipati, G. Pandey, D. Ghoshdastidar, P. Koley, D. M. V. S. Sriram Dept. of Computer Science and Automation Indian Institute of Science,
More informationProblem Set 2: Solutions Math 201A: Fall 2016
Problem Set 2: s Math 201A: Fall 2016 Problem 1. (a) Prove that a closed subset of a complete metric space is complete. (b) Prove that a closed subset of a compact metric space is compact. (c) Prove that
More informationHilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality
(October 29, 2016) Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/fun/notes 2016-17/03 hsp.pdf] Hilbert spaces are
More informationA New Wrapper Method for Feature Subset Selection
A New Wrapper Method for Feature Subset Selection Noelia Sánchez-Maroño 1 and Amparo Alonso-Betanzos 1 and Enrique Castillo 2 1- University of A Coruña- Department of Computer Science- LIDIA Lab. Faculty
More informationBanach Spaces II: Elementary Banach Space Theory
BS II c Gabriel Nagy Banach Spaces II: Elementary Banach Space Theory Notes from the Functional Analysis Course (Fall 07 - Spring 08) In this section we introduce Banach spaces and examine some of their
More informationChapter 6: Classification
Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant
More informationMA651 Topology. Lecture 9. Compactness 2.
MA651 Topology. Lecture 9. Compactness 2. This text is based on the following books: Topology by James Dugundgji Fundamental concepts of topology by Peter O Neil Elements of Mathematics: General Topology
More informationLebesgue Measure on R n
8 CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationTopological properties of Z p and Q p and Euclidean models
Topological properties of Z p and Q p and Euclidean models Samuel Trautwein, Esther Röder, Giorgio Barozzi November 3, 20 Topology of Q p vs Topology of R Both R and Q p are normed fields and complete
More informationLebesgue Measure on R n
CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets
More informationEasy Categorization of Attributes in Decision Tables Based on Basic Binary Discernibility Matrix
Easy Categorization of Attributes in Decision Tables Based on Basic Binary Discernibility Matrix Manuel S. Lazo-Cortés 1, José Francisco Martínez-Trinidad 1, Jesús Ariel Carrasco-Ochoa 1, and Guillermo
More information4 Choice axioms and Baire category theorem
Tel Aviv University, 2013 Measure and category 30 4 Choice axioms and Baire category theorem 4a Vitali set....................... 30 4b No choice....................... 31 4c Dependent choice..................
More informationFinal Examination CS540-2: Introduction to Artificial Intelligence
Final Examination CS540-2: Introduction to Artificial Intelligence May 9, 2018 LAST NAME: SOLUTIONS FIRST NAME: Directions 1. This exam contains 33 questions worth a total of 100 points 2. Fill in your
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationAnalysis Finite and Infinite Sets The Real Numbers The Cantor Set
Analysis Finite and Infinite Sets Definition. An initial segment is {n N n n 0 }. Definition. A finite set can be put into one-to-one correspondence with an initial segment. The empty set is also considered
More informationFunctional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...
Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................
More informationStatistical Theory 1
Statistical Theory 1 Set Theory and Probability Paolo Bautista September 12, 2017 Set Theory We start by defining terms in Set Theory which will be used in the following sections. Definition 1 A set is
More informationChapter 2 Metric Spaces
Chapter 2 Metric Spaces The purpose of this chapter is to present a summary of some basic properties of metric and topological spaces that play an important role in the main body of the book. 2.1 Metrics
More informationI teach myself... Hilbert spaces
I teach myself... Hilbert spaces by F.J.Sayas, for MATH 806 November 4, 2015 This document will be growing with the semester. Every in red is for you to justify. Even if we start with the basic definition
More informationThe Completion of a Metric Space
The Completion of a Metric Space Let (X, d) be a metric space. The goal of these notes is to construct a complete metric space which contains X as a subspace and which is the smallest space with respect
More informationRegret Analysis for Performance Metrics in Multi-Label Classification The Case of Hamming and Subset Zero-One Loss
Regret Analysis for Performance Metrics in Multi-Label Classification The Case of Hamming and Subset Zero-One Loss Krzysztof Dembczyński 1, Willem Waegeman 2, Weiwei Cheng 1, and Eyke Hüllermeier 1 1 Knowledge
More informationComparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees
Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University Grudzi adzka 5, 87-100 Toruń, Poland
More information1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3
Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationAN EXPLORATION OF THE METRIZABILITY OF TOPOLOGICAL SPACES
AN EXPLORATION OF THE METRIZABILITY OF TOPOLOGICAL SPACES DUSTIN HEDMARK Abstract. A study of the conditions under which a topological space is metrizable, concluding with a proof of the Nagata Smirnov
More informationMulticlass Classification-1
CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass
More informationA Logical Formulation of the Granular Data Model
2008 IEEE International Conference on Data Mining Workshops A Logical Formulation of the Granular Data Model Tuan-Fang Fan Department of Computer Science and Information Engineering National Penghu University
More informationDreem Challenge report (team Bussanati)
Wavelet course, MVA 04-05 Simon Bussy, simon.bussy@gmail.com Antoine Recanati, arecanat@ens-cachan.fr Dreem Challenge report (team Bussanati) Description and specifics of the challenge We worked on the
More informationStat 451: Solutions to Assignment #1
Stat 451: Solutions to Assignment #1 2.1) By definition, 2 Ω is the set of all subsets of Ω. Therefore, to show that 2 Ω is a σ-algebra we must show that the conditions of the definition σ-algebra are
More informationBoolean Algebras, Boolean Rings and Stone s Representation Theorem
Boolean Algebras, Boolean Rings and Stone s Representation Theorem Hongtaek Jung December 27, 2017 Abstract This is a part of a supplementary note for a Logic and Set Theory course. The main goal is to
More informationFirst In-Class Exam Solutions Math 410, Professor David Levermore Monday, 1 October 2018
First In-Class Exam Solutions Math 40, Professor David Levermore Monday, October 208. [0] Let {b k } k N be a sequence in R and let A be a subset of R. Write the negations of the following assertions.
More informationPrinciples of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata
Principles of Pattern Recognition C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata e-mail: murthy@isical.ac.in Pattern Recognition Measurement Space > Feature Space >Decision
More informationThe small ball property in Banach spaces (quantitative results)
The small ball property in Banach spaces (quantitative results) Ehrhard Behrends Abstract A metric space (M, d) is said to have the small ball property (sbp) if for every ε 0 > 0 there exists a sequence
More informationNotions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy
Banach Spaces These notes provide an introduction to Banach spaces, which are complete normed vector spaces. For the purposes of these notes, all vector spaces are assumed to be over the real numbers.
More informationMeasures. Chapter Some prerequisites. 1.2 Introduction
Lecture notes Course Analysis for PhD students Uppsala University, Spring 2018 Rostyslav Kozhan Chapter 1 Measures 1.1 Some prerequisites I will follow closely the textbook Real analysis: Modern Techniques
More informationAn Approach to Classification Based on Fuzzy Association Rules
An Approach to Classification Based on Fuzzy Association Rules Zuoliang Chen, Guoqing Chen School of Economics and Management, Tsinghua University, Beijing 100084, P. R. China Abstract Classification based
More informationTopology. Xiaolong Han. Department of Mathematics, California State University, Northridge, CA 91330, USA address:
Topology Xiaolong Han Department of Mathematics, California State University, Northridge, CA 91330, USA E-mail address: Xiaolong.Han@csun.edu Remark. You are entitled to a reward of 1 point toward a homework
More informationFeature selection and extraction Spectral domain quality estimation Alternatives
Feature selection and extraction Error estimation Maa-57.3210 Data Classification and Modelling in Remote Sensing Markus Törmä markus.torma@tkk.fi Measurements Preprocessing: Remove random and systematic
More informationROUGH SETS THEORY AND DATA REDUCTION IN INFORMATION SYSTEMS AND DATA MINING
ROUGH SETS THEORY AND DATA REDUCTION IN INFORMATION SYSTEMS AND DATA MINING Mofreh Hogo, Miroslav Šnorek CTU in Prague, Departement Of Computer Sciences And Engineering Karlovo Náměstí 13, 121 35 Prague
More informationFORMULATION OF THE LEARNING PROBLEM
FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we
More informationGene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm
Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm Zhenqiu Liu, Dechang Chen 2 Department of Computer Science Wayne State University, Market Street, Frederick, MD 273,
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationMATH 54 - TOPOLOGY SUMMER 2015 FINAL EXAMINATION. Problem 1
MATH 54 - TOPOLOGY SUMMER 2015 FINAL EXAMINATION ELEMENTS OF SOLUTION Problem 1 1. Let X be a Hausdorff space and K 1, K 2 disjoint compact subsets of X. Prove that there exist disjoint open sets U 1 and
More informationAn Empirical Study of Building Compact Ensembles
An Empirical Study of Building Compact Ensembles Huan Liu, Amit Mandvikar, and Jigar Mody Computer Science & Engineering Arizona State University Tempe, AZ 85281 {huan.liu,amitm,jigar.mody}@asu.edu Abstract.
More informationLECTURE 3 Functional spaces on manifolds
LECTURE 3 Functional spaces on manifolds The aim of this section is to introduce Sobolev spaces on manifolds (or on vector bundles over manifolds). These will be the Banach spaces of sections we were after
More informationChapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries
Chapter 1 Measure Spaces 1.1 Algebras and σ algebras of sets 1.1.1 Notation and preliminaries We shall denote by X a nonempty set, by P(X) the set of all parts (i.e., subsets) of X, and by the empty set.
More informationModule 1. Probability
Module 1 Probability 1. Introduction In our daily life we come across many processes whose nature cannot be predicted in advance. Such processes are referred to as random processes. The only way to derive
More informationMeasurable Choice Functions
(January 19, 2013) Measurable Choice Functions Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/fun/choice functions.pdf] This note
More informationThe Learning Problem and Regularization Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee
The Learning Problem and Regularization 9.520 Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing
More informationOn minimal models of the Region Connection Calculus
Fundamenta Informaticae 69 (2006) 1 20 1 IOS Press On minimal models of the Region Connection Calculus Lirong Xia State Key Laboratory of Intelligent Technology and Systems Department of Computer Science
More informationLecture 9 Metric spaces. The contraction fixed point theorem. The implicit function theorem. The existence of solutions to differenti. equations.
Lecture 9 Metric spaces. The contraction fixed point theorem. The implicit function theorem. The existence of solutions to differential equations. 1 Metric spaces 2 Completeness and completion. 3 The contraction
More informationThe Decision List Machine
The Decision List Machine Marina Sokolova SITE, University of Ottawa Ottawa, Ont. Canada,K1N-6N5 sokolova@site.uottawa.ca Nathalie Japkowicz SITE, University of Ottawa Ottawa, Ont. Canada,K1N-6N5 nat@site.uottawa.ca
More informationOn Improving the k-means Algorithm to Classify Unclassified Patterns
On Improving the k-means Algorithm to Classify Unclassified Patterns Mohamed M. Rizk 1, Safar Mohamed Safar Alghamdi 2 1 Mathematics & Statistics Department, Faculty of Science, Taif University, Taif,
More informationSupport Vector Machines with Example Dependent Costs
Support Vector Machines with Example Dependent Costs Ulf Brefeld, Peter Geibel, and Fritz Wysotzki TU Berlin, Fak. IV, ISTI, AI Group, Sekr. FR5-8 Franklinstr. 8/9, D-0587 Berlin, Germany Email {geibel
More informationAdditive Preference Model with Piecewise Linear Components Resulting from Dominance-based Rough Set Approximations
Additive Preference Model with Piecewise Linear Components Resulting from Dominance-based Rough Set Approximations Krzysztof Dembczyński, Wojciech Kotłowski, and Roman Słowiński,2 Institute of Computing
More informationLinear Classifiers as Pattern Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2014/2015 Lesson 16 8 April 2015 Contents Linear Classifiers as Pattern Detectors Notation...2 Linear
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More informationChapter 4. Measure Theory. 1. Measure Spaces
Chapter 4. Measure Theory 1. Measure Spaces Let X be a nonempty set. A collection S of subsets of X is said to be an algebra on X if S has the following properties: 1. X S; 2. if A S, then A c S; 3. if
More informationOptimization Models for Detection of Patterns in Data
Optimization Models for Detection of Patterns in Data Igor Masich, Lev Kazakovtsev, and Alena Stupina Siberian State University of Science and Technology, Krasnoyarsk, Russia is.masich@gmail.com Abstract.
More informationExercises for Unit VI (Infinite constructions in set theory)
Exercises for Unit VI (Infinite constructions in set theory) VI.1 : Indexed families and set theoretic operations (Halmos, 4, 8 9; Lipschutz, 5.3 5.4) Lipschutz : 5.3 5.6, 5.29 5.32, 9.14 1. Generalize
More information(c) For each α R \ {0}, the mapping x αx is a homeomorphism of X.
A short account of topological vector spaces Normed spaces, and especially Banach spaces, are basic ambient spaces in Infinite- Dimensional Analysis. However, there are situations in which it is necessary
More informationDiscriminative Direction for Kernel Classifiers
Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationNotes for Functional Analysis
Notes for Functional Analysis Wang Zuoqin (typed by Xiyu Zhai) November 6, 2015 1 Lecture 18 1.1 The convex hull Let X be any vector space, and E X a subset. Definition 1.1. The convex hull of E is the
More informationHILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define
HILBERT SPACES AND THE RADON-NIKODYM THEOREM STEVEN P. LALLEY 1. DEFINITIONS Definition 1. A real inner product space is a real vector space V together with a symmetric, bilinear, positive-definite mapping,
More informationMAT 771 FUNCTIONAL ANALYSIS HOMEWORK 3. (1) Let V be the vector space of all bounded or unbounded sequences of complex numbers.
MAT 771 FUNCTIONAL ANALYSIS HOMEWORK 3 (1) Let V be the vector space of all bounded or unbounded sequences of complex numbers. (a) Define d : V V + {0} by d(x, y) = 1 ξ j η j 2 j 1 + ξ j η j. Show that
More information