An Efficient Algorithm for Large-Scale Text Categorization

CHANG-RUI YU 1, YAN LUO 2
1 School of Information Management and Engineering, Shanghai University of Finance and Economics, No.777, Guoding Rd., Shanghai, P.R. China, yucr@sjtu.edu.cn
2 School of Management, Shanghai Jiao Tong University, No.535, Fahuazhen Rd., Shanghai, P.R. China, yanluo@sjtu.edu.cn

Abstract: - Text categorization is a core technique in the knowledge mining field. Most current categorization methods are based on the VSM, among which knn is the most widely used. However, most of them are computationally expensive and can hardly be used to classify large-scale samples. Moreover, their classifiers must be rebuilt whenever training samples are added or deleted, which makes them scale poorly. In this paper, a new categorization method based on Mutual Dependence and Equivalent Radius (called MDER) is proposed. MDER can classify large-scale samples and has good scalability. A series of experiments on classifying Chinese texts shows that MDER outperforms the knn and CCC methods and can be used online to classify large-scale samples while keeping high precision and recall.

Key-Words: - Text categorization, Mutual dependence, Equivalent radius, VSM

1 Introduction
In recent years, with the increasing interest in automatic text categorization, several algorithms have been presented, such as Bayes [1], SVM (support vector machine) [2], Boosting [3], and knn (the k-Nearest Neighbors) [4]. Most of them are based on the VSM, which has been applied to text categorization with some success [5]. However, these algorithms are computationally expensive and can hardly be used to classify large-scale samples. Moreover, their classifiers must be rebuilt when the corpus of training samples grows. As a result, they scale poorly. Therefore, it is essential to find a new method that classifies large-scale samples with high precision and recall.

This paper presents an efficient algorithm for large-scale text categorization based on mutual dependence and equivalent radius (MDER for short). The MDER algorithm has low computational complexity, enabling quick classification responses and good scalability. Moreover, unlike VSM-based algorithms, MDER uses the equivalent radius to construct its judging function, which returns a relative value according to the distribution of training samples rather than an absolute value based on the number of training samples. As a result, misjudgments caused by large disparities among categories in the number of samples or in the scope of their distributions can be avoided. MDER therefore makes it possible to classify large-scale online samples while keeping high precision and recall.

2 The MDER Algorithm

2.1 Attribute Selection Based on Mutual Dependence
In information processing, samples are always pre-processed to improve their representation [6,7]. An important step in this pre-processing is attribute selection. Generally, attribute selection algorithms fall into two categories: statistics-based algorithms and data-dictionary-based algorithms [8,9]. Both have their own advantages and disadvantages. The former are domain-independent and require no data dictionary, but they are less precise. The latter need the support of a data dictionary but offer higher precision.

As we know, it is impossible to express the whole real world at the semantic level with a single data dictionary, so data-dictionary-based algorithms must be domain-specific. Moreover, building a data dictionary is time-consuming, so statistics-based algorithms such as N-gram are widely used for attribute selection in practice. However, the N-gram algorithm is still not satisfactory, often producing nonsensical word segmentations. Considering this problem, this paper compares the advantages and disadvantages of the mutual information model and the dependent value model, and constructs a mutual dependence (MD) model that combines their advantages and discards their disadvantages. The MD model can determine to what degree two morphemes depend on each other, so it can be used to select important attributes for text classification. The MD-based attribute selection algorithm not only helps select the right attributes but also reduces the dimension, and it can ignore attributes that contribute little to text classification. Based on the Apriori property (i.e., any non-empty subset of a frequent item set must be frequent), this paper proposes a new segmentation rule (called Rule 1 below), by which words of maximal length can be obtained to reduce the dimension. In Algorithm 1, the MD model is used to process words in a way similar to Rule 1.

Definition 1 (Mutual Dependence, MD for short): the mutual dependence between two variables χ and η is defined as

MD(χ, η) = (1/L) * [F(s)*L - F(χ)*F(η)] * Log2( F(s)*L / (F(χ)*F(η)) )    (1)

where F(χ) and F(η) are the frequencies with which χ and η occur respectively, F(s) is the frequency with which χ and η occur together, and L is the length of the samples. All these frequencies are obtained directly from the samples to be processed. Thus, no data dictionary is required, and the words segmented from the samples are closer to the domain knowledge.

Rule 1 (Eliminating Redundant Words): If there exist two N-gram items t_i and t_j satisfying t_j ⊆ t_i and score(t_i) = score(t_j), then one of them is redundant; t_i, the longer item, is kept by default. Here, score(*) is the score of an N-gram item, equal to the item's occurrence frequency.

Theorem 1 (Range of Mutual Dependence): The mutual dependence between χ and η ranges within [0, (1/4)*Log L), where L is the length of the samples.

Theorem 1 can be proved easily, so the proof is omitted here. Under Theorem 1, we set an upper limit μ_U and a lower limit μ_L (0 < μ_L < μ_U < (1/4)*Log L) and select the attributes whose mutual dependence lies between μ_L and μ_U; μ_U may be set close to (1/4)*Log L and μ_L close to 0. The basic idea of attribute selection is as follows:
(1) Use the N-gram algorithm to segment words, discard the words with too high or too low frequency, and obtain a word set U(x) = { x | x ∈ U_j-gram, 1 ≤ j ≤ n }, where U_j-gram is the set of words of length j in the original text.
(2) Judge the correlation between the words in U(x) using the MD model: first process the n-gram words, then the (n-1)-gram words, and so on down to the 1-gram words. When processing an i-gram word x_i ∈ U(x) (2 ≤ i ≤ n), segment x_i into x_i^j of length j (1 ≤ j < i) and x_i^k of length k (k = i - j). If x_i^j ∈ U(x) and x_i^k ∈ U(x), then compute MD(x_i^j, x_i^k).
When selecting attributes, use μ_U and μ_L as follows:
1) If MD(x_i^j, x_i^k) > μ_U, then subtract the frequency of x_i from the frequencies of x_i^j and x_i^k respectively;
2) If MD(x_i^j, x_i^k) < μ_L, then delete x_i from U(x);
3) If MD(x_i^j, x_i^k) lies within [μ_L, μ_U], then U(x) does not change.
(3) Finally, delete the words with too high or too low frequency from U(x).
When the i-gram words are processed, the j-gram words (0 < j < i) have already been processed, so the computational complexity decreases and the remaining words are more precise. Since most Chinese words consist of two characters [10], n in the N-gram is often set to 4.
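As an illustration of the selection procedure above, the following Python sketch computes MD for each candidate split of a longer word and applies the μ_U/μ_L thresholds of steps 1)-3). It is a sketch under assumptions rather than the authors' code: the function names mutual_dependence and prune_attributes are invented here, Formula (1) is used in the reconstructed form given above, and word frequencies are assumed to be raw counts over a sample of length L.

```python
# Illustrative sketch of MD-based pruning of N-gram attributes; names are assumptions.
import math

def mutual_dependence(f_xy, f_x, f_y, L):
    """MD(x, y) as read from Formula (1): a covariance-like term scaled by 1/L
    times a log-ratio term; returns 0 when the pair or either part never occurs."""
    if f_xy == 0 or f_x == 0 or f_y == 0:
        return 0.0
    return (f_xy * L - f_x * f_y) / L * math.log2(f_xy * L / (f_x * f_y))

def prune_attributes(freq, L, mu_low, mu_high):
    """freq: dict mapping a candidate word to its frequency (U(x) with its counts).
    Longer words are processed first, mirroring step (2) above."""
    words = dict(freq)
    for w in sorted(freq, key=len, reverse=True):
        if len(w) < 2 or w not in words:
            continue
        for j in range(1, len(w)):                 # every split into a prefix and a suffix
            left, right = w[:j], w[j:]
            if left not in words or right not in words:
                continue
            md = mutual_dependence(words[w], words[left], words[right], L)
            if md > mu_high:                       # parts strongly depend on each other:
                words[left] = max(words[left] - words[w], 0)    # discount the parts, keep w
                words[right] = max(words[right] - words[w], 0)
            elif md < mu_low:                      # parts weakly depend on each other: drop w
                del words[w]
                break
            # otherwise U(x) is left unchanged for this split
    return words
```

Algorithm 1 below adds the frequency thresholds δ_U and δ_L and Rule 1 on top of this core loop.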

Algorithm 1: Attribute Selection Algorithm Based on Mutual Dependence
Input: D: texts whose attributes are to be selected; μ_U: upper limit of MD; μ_L: lower limit of MD; δ_U: upper-limit word frequency; δ_L: lower-limit word frequency; n: the N of the N-gram
Output: U_f: attribute set; W_f: adjusted frequency set corresponding to U_f
Steps:
(1) Segment words using the N-gram algorithm, then delete the words with frequency higher than δ_U or lower than δ_L, obtaining a word set U_f and a word frequency set W_f.
(2) For I = n To 2 Step -1 Do
    Pick an unprocessed I-gram word from U_f, and try to segment it into words x_i and x_j that both exist in U_f. According to the frequencies in W_f and Formula (1), compute MD(x_i, x_j).
    If MD(x_i, x_j) > μ_U then subtract the frequency of the I-gram word from the frequencies of x_i and x_j in W_f respectively;
    Else if MD(x_i, x_j) < μ_L then delete the current I-gram word from U_f, and delete its frequency from W_f.
    End if
    Repeat until all I-gram words in U_f are processed.
    Next I
(3) Delete from U_f those words whose frequency is larger than δ_U or less than δ_L, and delete their frequencies from W_f.
(4) Use Rule 1 to reduce the dimension, yielding the attribute set U_f and the corresponding frequency set W_f.

2.2 Classification Model
Definition 2 (Equivalent Radius, ER for short): Let the number of training samples for category ω_i (i=1,...,c) be n_i, and let the dimension of the vector space be d. Project all samples of ω_i on every dimension d_j (j=1,...,d) respectively to get the projection range of ω_i on d_j, denoted as Range_ij = [R-_ij, R+_ij]. Compute the center of gravity of ω_i on d_j, denoted as Center_ij. Here, R-_ij is the radius from Center_ij toward the origin O, and R+_ij is the radius from Center_ij in the positive direction; generally, R-_ij ≠ R+_ij. n-_ij is the number of samples whose projection lies between O and Center_ij, and n+_ij is the number of samples whose projection lies in the positive direction from Center_ij. The equivalent radius of ω_i projected on d_j is defined as

R_ij = a_ij * R-_ij + (1 - a_ij) * R+_ij    (2)

where 1 ≤ i ≤ c, 1 ≤ j ≤ d, and a_ij is a distribution coefficient. In the following, an algorithm is given for computing R_ij and a_ij.

Algorithm 2: Classification Model
Input: d_ti: training sample set, 1 ≤ t ≤ n_i, 1 ≤ i ≤ c, where n_i is the number of training samples for category ω_i (i=1,...,c) and c is the number of categories.
Output: R_ij: equivalent radius of category i on dimension j; Center_ij: center of gravity of category i on dimension j; a_ij: distribution coefficient of category i on dimension j, 1 ≤ i ≤ c, 1 ≤ j ≤ d.
Phase 1: Compute Center_ij, R-_ij, and R+_ij
(1) Assign sequence numbers to categories and attributes respectively, where the category number ranges over [1, c] and the attribute dimension ranges over [1, d].
(2) Let i = 1.
(3) For each category ω_i, do the following: normalize all the training sample vectors of ω_i; project all the training samples of ω_i on each dimension d_j (j=1,...,d) and get the projection range Range_ij = [R-_ij, R+_ij]; compute Center_ij, n-_ij, n+_ij, R-_ij, R+_ij.
(4) If all the categories have been processed, then enter Phase 2. Else, let i = i + 1 and go to Step (3) to process the next category.
Phase 2: Determine the Equivalent Radius
(1) Let i = 1. For each category ω_i, do the following operations:
(2) Let j = 1.

(3) For each dimension d_j, compute

a_ij = n+_ij / (n-_ij + n+_ij),    R_ij = a_ij * R-_ij + (1 - a_ij) * R+_ij

(4) If all the dimensions have been processed, then enter Step (5); else, let j = j + 1 and go to Step (3) to process the next dimension.
(5) If all categories have been processed, then stop; else, let i = i + 1 and go to Phase 2 to process the next category.

The coefficient a_ij can be determined by the golden section or other methods. Algorithm 2 sets a_ij = n+_ij / (n-_ij + n+_ij) according to the sample distribution. The reason is that, if the distribution of samples is denser in the area toward the origin O from Center_ij, then R-_ij < R+_ij and n-_ij < n+_ij, so R-_ij should have a greater weight; with a_ij = n+_ij / (n-_ij + n+_ij) and R_ij = a_ij * R-_ij + (1 - a_ij) * R+_ij, this is the case, and vice versa.

2.3 MDER Algorithm
First, construct a judging function for each category ω_i, where (x_j - Center_ij)^2 / (R_ij)^2 is defined as the relative distance between a sample x = {x_1, x_2,..., x_d} and ω_i on dimension j. The judging function is thus based on relative distance rather than absolute distance, so as to avoid the misjudgment caused by great disparities among categories in the number of training samples or in the scope of their distributions. Under Formula (2), if the projection of ω_i on d_j is equal to 0, then R_ij = 0 and a division overflow arises; therefore, a distance coefficient denoted 1/β is introduced for such dimensions. The judging function is written as

g_i(x) = Σ_{j=1..k} (x_j - Center_ij)^2 / (R_ij)^2 + Σ_{j=k+1..d} x_j^2 / β^2,    i = 1, 2,..., c    (3)

where the first sum runs over the dimensions on which ω_i has a non-zero projection and the second over the remaining dimensions. Among all categories, if ω_i is the closest to the testing sample x in relative distance, i.e. g_i(x) is the smallest, then we say x belongs to ω_i. So the rule is: if g_j(x) = Min_i g_i(x), i = 1, 2,..., c, then x ∈ ω_j.

Next, it can be shown that the classification result is not highly sensitive to 1/β^2: if the projection of one category on a dimension is equal to 0, then that dimension is generally specific to other categories, and a small change in 1/β^2 cannot greatly influence the final result.

Algorithm 3: MDER Algorithm
Input: d_ti: training sample set, where 1 ≤ t ≤ n_i, 1 ≤ i ≤ c, n_i is the number of texts to be trained for category ω_i (i=1,2,...,c), and c is the number of categories; t_j: testing sample set, where 1 ≤ j ≤ m, and m is the number of texts to be tested.
Output: the categories of the testing samples.
Steps:
(1) According to Algorithm 1, extract attributes from the training sample set.
(2) According to Algorithm 2, compute the center of gravity and the equivalent radius of the classifier's projection on every dimension.
(3) From the testing sample set, pick a sample x that has not been classified yet.
(4) Get the attribute values of x.
(5) Compute g_i(x) according to Formula (3).
(6) If g_j(x) = Min_i g_i(x), i = 1, 2,..., c, then x ∈ ω_j.
(7) If all the testing samples have been classified, then stop; else, go to Step (3).

Some notation: c is the number of categories, d is the number of dimensions, n is the average number of testing texts per category, and m is the average number of training texts per category. The performance of the MDER algorithm is analyzed as follows:
(1) Classification efficiency: from Algorithm 3, the worst-case complexity of MDER is (c * n * (log2 c)!), while it is (c^2 * n * m + c * n * (log2(c * m))!) for knn. Generally n << m, so the complexity of MDER is less than that of knn.
(2) Updating performance: MDER is better at updating, because when training samples are added or deleted, the trained classifier only has to be adjusted according to the attributes of the added or deleted samples, rather than retrained on all samples.
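To illustrate how Algorithm 2 and the judging function of Formula (3) fit together, the following Python sketch builds the per-category centers and equivalent radii and then classifies a vector by the smallest relative distance. It is a reconstruction under stated assumptions, not the authors' implementation: the class name EquivalentRadiusClassifier and the parameter inv_beta_sq are invented here, samples are assumed to be dense non-negative attribute vectors produced by the attribute-selection step, and a_ij is taken as n+_ij/(n-_ij + n+_ij) as read from Algorithm 2.

```python
# Illustrative sketch of the equivalent-radius classifier (Algorithm 2 + Formula 3).
# Names and data layout are assumptions made for this example only.
import numpy as np

class EquivalentRadiusClassifier:
    def __init__(self, inv_beta_sq=40.0):
        # 1/beta^2: coefficient applied on dimensions where a category's radius is zero
        self.inv_beta_sq = inv_beta_sq

    def fit(self, samples_by_category):
        """samples_by_category: list of (n_i x d) arrays, one per category omega_i."""
        self.center, self.radius = [], []
        for X in samples_by_category:
            X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)  # normalize
            lo, hi = X.min(axis=0), X.max(axis=0)          # projection range on each dimension
            center = X.mean(axis=0)                        # center of gravity
            r_minus, r_plus = center - lo, hi - center     # radii toward the origin / positive side
            n_minus = (X < center).sum(axis=0)             # samples projected below the center
            n_plus = (X > center).sum(axis=0)              # samples projected above the center
            a = n_plus / np.maximum(n_minus + n_plus, 1)   # distribution coefficient a_ij
            self.center.append(center)
            self.radius.append(a * r_minus + (1.0 - a) * r_plus)   # Formula (2)
        return self

    def judge(self, x, i):
        """Relative distance g_i(x) of Formula (3)."""
        c, r = self.center[i], self.radius[i]
        nonzero = r > 0
        g = np.sum(((x[nonzero] - c[nonzero]) / r[nonzero]) ** 2)
        g += self.inv_beta_sq * np.sum(x[~nonzero] ** 2)   # zero-radius dimensions use 1/beta^2
        return g

    def predict(self, x):
        """Assign x to the category with the smallest relative distance."""
        return int(np.argmin([self.judge(x, i) for i in range(len(self.center))]))
```

The default of 40 for the coefficient in the sketch mirrors the experimental setting of 1/β^2 reported in Section 3.2.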
Next, we introduce how to adjust the classifier when adding a training sample; deletion is handled in a similar way. When adding a new sample x = (x_1, x_2,..., x_d) to category ω_i, it is only required to adjust Center_ij, the center of gravity of ω_i, and R_ij. The relationship between the adjusted center of gravity, written as Center_ij, and the original center of gravity, written as Center_ij^(0), is

Center_ij = (n_i * Center_ij^(0) + x_j) / (n_i + 1)    (4)
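As a small illustration of Formula (4), the sketch below folds one new sample into the stored center of gravity. The function name update_center is invented for this example; the corresponding adjustments of the equivalent radius and of n-_ij, n+_ij (Formula (5) and Algorithm 4 below) would be applied alongside it.

```python
# Minimal sketch of the incremental center-of-gravity update in Formula (4).
import numpy as np

def update_center(center_old, n_old, x):
    """Running-mean update: fold one new sample x (a d-dimensional vector)
    into the stored center of gravity of its category."""
    return (n_old * center_old + x) / (n_old + 1)

# usage, assuming per-category arrays center[i] and counts n_samples[i]:
# center[i] = update_center(center[i], n_samples[i], x_new)
# n_samples[i] += 1
```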

The relationship between the adjusted ER, written as R_ij, and the original ER, written as R_ij^(0), is

R_ij = R_ij^(0) + (n-_ij - n+_ij) * (Center_ij^(0) - x_j) / (n_i + 1) + 2 * (Center_ij^(0) - x_j) / (n_i + 1)    (5)

After the adjustment of Center_ij and R_ij, n-_ij and n+_ij also need to be revised, according to Algorithm 4. When more samples are to be added, Formulas (4) and (5) are applied repeatedly to get a new classifier.

Algorithm 4: Updating MDER Classifier
Input: (1) Center_ij^(0), R_ij^(0), n-_ij^(0), n+_ij^(0) (for the existing classifier); (2) U: the set of training samples to be added.
Output: Center_ij, R_ij (for the new classifier).
Steps:
(1) Pick a sample x = (x_1, x_2,..., x_d) from U.
(2) According to Formula (4) and Formula (5), compute Center_ij and R_ij on the category of x for the new classifier.
(3) If Center_ij^(0) > x_j Then { Ω = (Center_ij^(0) - Center_ij) * n_i / R_ij^(0); n-_ij = n-_ij^(0) - Ω + 1; n+_ij = n+_ij^(0) + Ω }
    Else { Ω = (Center_ij - Center_ij^(0)) * n_i / R_ij^(0); n+_ij = n+_ij^(0) - Ω + 1; n-_ij = n-_ij^(0) + Ω }
(4) If all the training samples to be added have been processed in the above way, then the algorithm stops. Else, let Center_ij^(0) = Center_ij, R_ij^(0) = R_ij, and go to Step (1).

2.4 Updating Performance
As shown by the performance analysis above, the MDER classifier can be updated incrementally: when training samples are added or deleted, Algorithm 4 adjusts the existing classifier according to the new samples instead of retraining on the whole corpus.

3 Experimental Result and Analysis

3.1 Experiment Content and Performance
Next, the MDER algorithm is applied to the classification of Chinese texts, and its experimental results are compared with those of knn and CCC (Classification based on the Center of Classes). The experimental data are stored in two corpora, G1 and G2, which come from the 2005 People's Daily and the SOHU website respectively, as shown in Table 1 and Table 2.

Table 1 The testing corpora (G1)
Categories       Number of Texts
Politics         743
Sports           321
Economy          293
Agriculture      86
Environment      104
Astronautics     135
Art              156
Education        131
Medicine         99
Transportation   102
Energy           42
Computer         51
Mining           68
History          102
Military         122
Total            2555

Table 2 The testing corpora (G2)
Categories       Number of Texts
Mining           599
Military         321
Computer         246
Electronics      101
Communications   89
Energy           129
Philosophy       132
History          103
Law              124
Literature       99
Total            1943

Definition 3 (Precision and Recall): In classification, a is the number of texts belonging to category ω_i that are correctly identified as ω_i; b is the number of texts not belonging to ω_i that are mistakenly identified as ω_i; c is the number of texts belonging to ω_i that fail to be identified as ω_i; and d is the number of texts not belonging to ω_i that are not identified as ω_i. Then Recall = a/(a + c) and Precision = a/(a + b).

Definition 4 (F-value): In multi-class classification, the F-value is a function of the precision ratio P and the recall ratio R that measures classification performance:

F = 2 * P * R / (P + R)    (6)

The experiment proceeds as follows:
(1) Let the distance coefficient be 1, 6.5, 12.5, 25, 50, 100, and 200 respectively. Taking different dimensions of the vector space, we use the MDER algorithm to test the texts in G1 and G2 in order to evaluate the influence of the distance coefficient on the classification results.
(2) Pick 70%, 80%, and 90% of the samples from G1 and G2 at random for training, and use the remaining 30%, 20%, and 10% for testing. Taking different dimensions of the vector space, we compare the MDER algorithm with knn and CCC in classification performance for the open test.
(3) Pick 10%, 25%, and 50% of the samples of every category in G1 and G2 at random for testing, while 100% take part in training. Taking different dimensions of the vector space, we compare the MDER algorithm with knn and CCC in classification performance for the closed test.
(4) Use Algorithm 4 to add 100 texts belonging to the 15 categories to G1, then use the MDER algorithm to test, in order to evaluate the incremental learning ability.

3.2 Experimental Results
Fig. 1 shows the F-value versus the distance coefficient and the dimension. With the dimension fixed, the F-value improves greatly at the beginning as the distance coefficient increases; after reaching its maximum, the F-value decreases slowly and levels off (Fig. 1(b)). As mentioned above, the improvement is no longer noticeable after the F-value reaches its peak, so beyond that point the classification result is not sensitive to the distance coefficient. The reason is that, for a category, if the projection on an attribute is equal to 0, then that attribute is generally distinctive of other categories. The distance coefficient emphasizes the distance between the sample being classified and the category, but the degree of emphasis does not affect the testing result, so a small change of the distance coefficient within a wide range cannot greatly influence the classification performance. When the distance coefficient is larger than 40, the performance at different dimensions decreases smoothly, so in the experiments 1/β^2 is set to 40. Fig. 1(a) shows the influence of the dimension on classification performance. Too few dimensions cannot express the characteristics of the categories, while too many dimensions introduce disturbance. In this work's experiments, the best performance is obtained when the dimension is 300.

Fig. 1. The relationship of F-value to distance coefficient and dimension: (a) F-value versus distance coefficient and dimension; (b) F-value versus distance coefficient when the dimension is 300.
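Before turning to the overall comparison, the following sketch spells out the measures of Definitions 3 and 4. The function name precision_recall_f and the (true_labels, predicted_labels) interface are choices made for this example, not taken from the paper.

```python
# Small sketch of the evaluation measures of Definitions 3 and 4; names are assumptions.
def precision_recall_f(true_labels, predicted_labels, category):
    a = sum(1 for t, p in zip(true_labels, predicted_labels)
            if t == category and p == category)   # correctly assigned to omega_i
    b = sum(1 for t, p in zip(true_labels, predicted_labels)
            if t != category and p == category)   # wrongly assigned to omega_i
    c = sum(1 for t, p in zip(true_labels, predicted_labels)
            if t == category and p != category)   # texts of omega_i that were missed
    precision = a / (a + b) if a + b else 0.0
    recall = a / (a + c) if a + c else 0.0
    f_value = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_value
```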

When classifying large-scale text, speed should be considered in addition to recall and precision. In this paper, we therefore take the average F-value and the response time as the factors for evaluating the MDER algorithm comprehensively: the larger the F-value and the shorter the response time, the higher the overall performance. Fig. 2 and Fig. 3 illustrate the relationship of F-value and response time to the dimension for the closed and open tests respectively. The experiments show that the MDER algorithm is better than knn and CCC with respect to recall and precision. With respect to response time, MDER is better than knn and equivalent to CCC; however, the F-value of CCC is far less than that of MDER. So we can conclude that MDER outperforms knn and CCC in overall performance. Moreover, after adding 100 texts to G1 by Algorithm 4, both precision and recall increase by 1%, so the performance of the classifier is still guaranteed when the corpora are updated. Therefore, MDER not only offers higher classification precision and speed, but also supports incremental learning.

Fig. 2. The relationship of F-value and response time to dimension for the closed test (knn, MDER, CCC): (a) F-value versus dimension; (b) response time versus dimension.

Fig. 3. The relationship of F-value and response time to dimension for the open test (knn, MDER, CCC): (a) F-value versus dimension; (b) response time versus dimension.

4 Conclusion
Most current text categorization algorithms are based on the vector space model, among which knn is the most widely used. However, these algorithms do not adapt well to large-scale text because of their high computational complexity. Moreover, when the corpus of training samples grows, the classifier must be rebuilt, so they scale poorly. This paper presents two concepts, mutual dependence (MD) and equivalent radius (ER), and proposes a new MD- and ER-based algorithm that suits classifying large-scale text and has good scalability. Compared with knn and CCC, the new algorithm improves not only precision and recall but also response speed, which makes it possible to classify large-scale online text.

References:
[1] Aas K and Eikvil L, Text categorization: A survey, Technical Report, NR 941, Oslo: Norwegian Computing Center.
[2] Joachims T, Text categorization with support vector machines: Learning with many relevant features, Proc. of the 10th European Conf. on Machine Learning (ECML-98), Chemnitz: Springer-Verlag, 1998.

[3] Schapire R E and Singer Y, BoosTexter: A boosting-based system for text categorization, Machine Learning, Vol. 39, No. 2-3, 2000.
[4] Tan S, Neighbor-weighted k-nearest neighbor for unbalanced text corpus, Expert Systems with Applications, Vol. 28, No. 4, 2005.
[5] Sebastiani F, Machine learning in automated text categorization, ACM Computing Surveys, Vol. 34, No. 1, 2002, pp. 1-47.
[6] Chien-I Lee and Hsiu-Min Chuang, An effective soft clustering approach to mining gene expressions from multi-source databases, Proceedings of the 6th WSEAS Int. Conf. on Artificial Intelligence, Knowledge Engineering and Data Bases, 2007.
[7] Sen-chi Yu and Yuan-horng Lin, Comparisons of possibility- and probability-based classification: An example of depression severity clustering, Proceedings of the 8th WSEAS International Conference on Fuzzy Systems, 2007.
[8] Michael K, Andreas K and Dave R, Stochastic Language Models (N-Gram) Specification, W3C Working Draft, 3 January 2001.
[9] Rogati M and Yang Y, High-performing feature selection for text classification, Proc. of the 11th ACM Int'l Conf. on Information and Knowledge Management (CIKM-02), McLean: ACM Press, 2002.
[10] Thomas E, Segmenting Chinese in Unicode, Proc. of the 16th International Unicode Conference, 2000.
