Dimensionality reduction
ML for NLP
Lecturer: Kevin Koidl. Assistant Lecturer: Alfredo Maldonado.

Recapitulating: Evaluating TC systems

Evaluation compares actual performance (f̂) to ideal performance (f). The most commonly used metrics:

- Recall (ρ): how good the system is at finding relevant documents for a given category:

  ρ = true positives / (true positives + false negatives)   (1)

- Precision (π): the quality of the classified data:

  π = true positives / (true positives + false positives)   (2)

Recapitulating: Machine Learning and Text Categorisation

- Foundations; Term Clustering; Life Cycle: the overall approach
- Preliminaries: linguistic jargon
- Development approaches: train and test, k-fold validation, category generality
- Text representation: feature vectors, implementations, indexing
- Defining features: words vs. phrases, n-grams
- Computing weights: TF-IDF, normalisation, DIA, AIR/X
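The two definitions above translate directly into code; a minimal sketch in Python (the function and the toy decision lists are illustrative, not from the lecture):

```python
def precision_recall(predicted, actual):
    """Precision and recall for one category, from parallel lists of
    boolean per-document decisions (system output vs. gold standard)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 4 documents: the system accepts the first two; only docs 1 and 3 are relevant
pi, rho = precision_recall([True, True, False, False],
                           [True, False, True, False])
# pi == 0.5  (1 of 2 positive decisions is correct)
# rho == 0.5 (1 of 2 relevant documents is found)
```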
Dimensionality reduction in IR and TC

- Basics of dimensionality reduction: vector representation, vector length, Euclidean distance, cosine similarity, cosine matching
- Types of DR: local vs. global, feature selection vs. feature extraction
- Term Space Reduction (TSR): aggressiveness, filtering by document frequency
- Selecting TSR functions: DIA, information gain, mutual information, chi-square, NGL, relevance score, odds ratio, GSS coefficient
- Refresher: information theory and the entropy function
- Examples: mutual information (PMI and normalised) and information gain
- TF-IDF
- Advanced: from local to global TSR, and comparing TSR techniques

Dimensionality reduction in IR and TC

A 3-dimensional term set: T = {football, politics, economy}
- IR: calculate distances between document vectors (e.g. via cosine matching)
- TC: high dimensionality may be problematic

[Figure: document vectors d1 = <0.5, 0.5, 0.3> and d2 = <0.5, 0.3, 0.3> plotted in the space spanned by football, politics and economy]

Cosine similarity between documents d and e is given by:

cos(d, e) = (d · e) / (||d|| ||e||)

where ||d|| is the Euclidean norm of d. In the case of the example above (normalised vectors) the Euclidean distance can be used instead, as it gives the same rank order as cosine similarity:

dist(d, e) = sqrt( Σ_{i=1}^{|T|} (d_i − e_i)² )   (3)

Basics of Dimensionality Reduction

It starts with an n-dimensional classification (vector) space.
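The claim that Euclidean distance preserves the cosine ranking for normalised vectors follows from the identity dist(d, e)² = 2 − 2·cos(d, e); a small check in Python (the vectors are the illustrative ones from the figure, plus an invented third):

```python
import math

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(d, e):
    return sum(x * y for x, y in zip(d, e))  # assumes unit-length inputs

def dist(d, e):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d, e)))

q, d1, d2 = unit([0.5, 0.5, 0.3]), unit([0.5, 0.3, 0.3]), unit([0.1, 0.9, 0.2])

# dist**2 == 2 - 2*cos for unit vectors, so both criteria rank documents alike
assert abs(dist(q, d1) ** 2 - (2 - 2 * cosine(q, d1))) < 1e-9
assert (cosine(q, d1) > cosine(q, d2)) == (dist(q, d1) < dist(q, d2))
```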
In text classification the axes of the space represent features, and a vector in the n-dimensional classification space represents a document. The classification targets are labels or values (binary labels, etc.).

Q: Why a vector space? Are the properties of the vectors important? Is it important how the vectors stand to each other?

Basics of Dimensionality Reduction: an example

- Document 1: "Economy and Politics is important for Football."
- Document 2: "In Economy the Politics of Football counts."
- Classification towards three labels: Football, Economy, Politics
- Typical approach: bag of words and frequency
- Resulting vectors for d1 and d2: (1, 1, 1) and (1, 1, 1), a trivial example

[Figure: d1 = <0.5, 0.5, 0.3> and d2 = <0.5, 0.3, 0.3> in the football, politics, economy space]

Basics of Dimensionality Reduction: an example

Documents 1 and 2 are longer documents, with keyword frequencies (5, 5, 3) and (5, 3, 3). How do we compare both vectors?

Normalisation to a unit vector (on the surface of the unit hypersphere):

u = v / ||v||, e.g. (5, 5, 3) / sqrt(59) ≈ (0.65, 0.65, 0.39)
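The unit-vector normalisation above, sketched in Python:

```python
import math

def unit_vector(v):
    norm = math.sqrt(sum(x * x for x in v))  # Euclidean norm ||v||
    return tuple(x / norm for x in v)

# the two longer documents from the example
d1 = unit_vector((5, 5, 3))
d2 = unit_vector((5, 3, 3))
print(tuple(round(x, 2) for x in d1))  # (0.65, 0.65, 0.39)
# both vectors now lie on the unit hypersphere, so differing document
# lengths no longer dominate the comparison
```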
Cosine similarity example with three documents and four terms

Three novels: SaS (Sense and Sensibility), PaP (Pride and Prejudice) and WH (Wuthering Heights). Raw term frequencies:

Term        SaS   PaP    WH
affection   115    58    20
jealous      10     7    11
gossip        2     0     6
wuthering     0     0    38
Normalisation using log-frequency weighting:

log-frequency weight = 1 + log10(term frequency), or 0 if the term frequency is 0

Log-frequency weighted values:

Term        SaS    PaP    WH
affection   3.06   2.76   2.30
jealous     2.00   1.85   2.04
gossip      1.30   0      1.78
wuthering   0      0      2.58

Length-normalised values:

Term        SaS    PaP    WH
affection   0.789  0.832  0.524
jealous     0.515  0.555  0.465
gossip      0.335  0      0.405
wuthering   0      0      0.588

Resulting similarities:

cos(SaS, PaP) ≈ 0.94
cos(SaS, WH) ≈ 0.79
cos(PaP, WH) ≈ 0.69

What is Dimensionality Reduction?

DR: a processing step whose goal is to reduce the size of the vector space from |T| to |T'|, with |T'| much smaller than |T|. T' is called the reduced term set.

Benefits of DR:
- Lower computational cost for ML
- Helps avoid overfitting (training on constitutive features rather than contingent ones)

A rule of thumb: overfitting is avoided if the number of training examples is proportional to the size of T (for TC, experiments have suggested a minimum ratio of training texts per feature).

Local vs. Global DR

DR can be done for each category or for the whole set of categories:

- Local DR: for each category c_i, a set T'_i of terms (|T'_i| much smaller than |T|, typically 10 ≤ |T'_i| ≤ 50) is chosen for classification under c_i.
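The whole pipeline (log weighting, length normalisation, cosine) fits in a few lines of Python; the raw counts below are the ones from the Manning et al. slides this example is drawn from, and are assumed here:

```python
import math

tf = {  # raw term frequencies per novel (assumed from the source slides)
    "SaS": {"affection": 115, "jealous": 10, "gossip": 2, "wuthering": 0},
    "PaP": {"affection": 58, "jealous": 7, "gossip": 0, "wuthering": 0},
    "WH": {"affection": 20, "jealous": 11, "gossip": 6, "wuthering": 38},
}

def log_weight(f):
    return 1 + math.log10(f) if f > 0 else 0.0

def normalise(vec):
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()}

docs = {name: normalise({t: log_weight(f) for t, f in counts.items()})
        for name, counts in tf.items()}

def cos(a, b):
    return sum(docs[a][t] * docs[b][t] for t in docs[a])

print(round(cos("SaS", "PaP"), 2))  # 0.94
print(round(cos("SaS", "WH"), 2))   # 0.79
print(round(cos("PaP", "WH"), 2))   # 0.69
```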
  Different term sets are used for different categories.
- Global DR: a set T' of terms (|T'| much smaller than |T|) is chosen for classification under all categories C = {c_1, ..., c_|C|}.

N.B.: Most feature selection techniques can be used for local and global DR alike.

DR by feature selection vs. DR by feature extraction

- DR by term selection, or Term Space Reduction (TSR): T' is a subset of T. Select from the original feature set the T' which yields the highest effectiveness w.r.t. document indexing.
- DR by term extraction: the terms in T' are not of the same type as the terms in T, but are obtained by combinations or transformations of the original ones. E.g. if the terms in T are words, the terms in T' may not be words at all.

Term Space Reduction

There are two ways to reduce the term space:

- TSR by term wrapping: the ML algorithm itself is used to reduce term space dimensionality.
- TSR by term filtering: terms are ranked according to their importance for the TC task and the highest-scoring ones are chosen.

The degree of reduction is measured in terms of aggressiveness, the ratio between the sizes of the original and reduced feature sets: |T| / |T'|.

Empirical comparisons of TSR techniques can be found in (Yang and Pedersen, 1997) and (Forman, 2003).

Filtering by document frequency

The simplest TSR technique:
1. Remove stop words, etc. (see pre-processing steps).
2. Order all features t_k in T according to the number of documents in which they occur. Call this metric #Tr(t_k).
3. Choose T' = {t_1, ..., t_n} s.t. it contains the n highest-scoring t_k.

Advantages:
- Low computational cost
- DR up to a factor of 10 with only a small reduction in effectiveness (Yang and Pedersen, 1997)
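The three steps above can be sketched as follows (the toy corpus and function name are illustrative):

```python
def df_filter(docs, n):
    """Term Space Reduction by document frequency: keep the n terms
    that occur in the most documents (docs: lists of tokens, assumed
    to be stop-word filtered already)."""
    df = {}
    for doc in docs:
        for term in set(doc):  # count each term at most once per document
            df[term] = df.get(term, 0) + 1
    ranked = sorted(df, key=lambda t: -df[t])
    return set(ranked[:n])

docs = [["economy", "politics"],
        ["economy", "football"],
        ["economy", "politics", "golf"]]
print(df_filter(docs, 2))  # {'economy', 'politics'}
```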
Information-theoretic TSR: preliminaries

Probability distributions: probabilities on an event space of documents. P(t̄_k, c_i): for a random document x, term t_k does not occur in x and x is classified under category c_i. Similarly, we represent the probability that t_k does occur in x and x is filed under c_i by P(t_k, c_i).

N.B.: This notation will be used as shorthand for instantiations of the appropriate random variables. That is, for multivariate Bernoulli models: P(T_k = 0, C_i = 1) and P(T_k = 1, C_i = 1), respectively.

Commonly used TSR functions (from Sebastiani, 2002):

- DIA factor:
  z(t_k, c_i) = P(c_i | t_k)
- Information gain, AKA expected mutual information, IG(T_k, C_i) or I(T_k; C_i):
  Σ_{c ∈ {c_i, c̄_i}} Σ_{t ∈ {t_k, t̄_k}} P(t, c) log [ P(t, c) / (P(t) P(c)) ]
- Mutual information:
  MI(T_k, C_i) = P(t_k, c_i) log [ P(t_k, c_i) / (P(t_k) P(c_i)) ]
- Chi-square:
  χ²(T_k, C_i) = |Tr| [ P(t_k, c_i) P(t̄_k, c̄_i) − P(t_k, c̄_i) P(t̄_k, c_i) ]² / [ P(t_k) P(t̄_k) P(c_i) P(c̄_i) ]
- NGL coefficient:
  NGL(T_k, C_i) = sqrt(|Tr|) [ P(t_k, c_i) P(t̄_k, c̄_i) − P(t_k, c̄_i) P(t̄_k, c_i) ] / sqrt( P(t_k) P(t̄_k) P(c_i) P(c̄_i) )
- Relevancy score:
  RS(T_k, C_i) = log [ (P(t_k | c_i) + d) / (P(t̄_k | c̄_i) + d) ]
- Odds ratio:
  OR(t_k, c_i) = P(t_k | c_i) [1 − P(t_k | c̄_i)] / ( [1 − P(t_k | c_i)] P(t_k | c̄_i) )
- GSS coefficient:
  GSS(T_k, C_i) = P(t_k, c_i) P(t̄_k, c̄_i) − P(t_k, c̄_i) P(t̄_k, c_i)

The two more exotic acronyms, GSS and NGL, stand for the initials of the researchers who first proposed those metrics: the Galavotti-Sebastiani-Simi coefficient (GSS), proposed by (Galavotti et al., 2000), and the Ng-Goh-Low coefficient (NGL), proposed by (Ng et al., 1997).
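Several of these functions are simple arithmetic over a 2×2 term-category contingency table; a sketch for χ² and the GSS coefficient, using maximum-likelihood probability estimates (the function names and the example counts are illustrative):

```python
def joint_probs(n11, n10, n01, n00):
    """MLE joint probabilities from a 2x2 contingency table:
    n11 = docs with t_k filed under c_i, n10 = with t_k but not c_i,
    n01 = under c_i without t_k, n00 = neither."""
    n = n11 + n10 + n01 + n00
    return n11 / n, n10 / n, n01 / n, n00 / n, n

def gss(n11, n10, n01, n00):
    p11, p10, p01, p00, _ = joint_probs(n11, n10, n01, n00)
    return p11 * p00 - p10 * p01

def chi_square(n11, n10, n01, n00):
    p11, p10, p01, p00, n = joint_probs(n11, n10, n01, n00)
    p_t, p_c = p11 + p10, p11 + p01  # marginals P(t_k), P(c_i)
    return (n * (p11 * p00 - p10 * p01) ** 2
            / (p_t * (1 - p_t) * p_c * (1 - p_c)))

# a term occurring in 40 of 50 positive documents but only 10 of 950 negatives
print(round(chi_square(40, 10, 10, 940), 1))  # a large score: 623.3
```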
Some functions in detail

Basic intuition: the best features for a category are those distributed most differently on the sets of positive and negative instances of documents filed under that category.

Pointwise mutual information:

PMI(T_i, C_j) = log [ P(t_i, c_j) / (P(t_i) P(c_j)) ]   (7)

Calculations to be performed: co-occurrence of terms and categories in the training corpus (Tr), and frequency of occurrence of words and categories in Tr.

Implementing I(·, ·)

Example: extracting keywords from paragraphs.
pointwisemi(d): wordpartable
    pmitable = ()
    parlist = split_as_paragraphs(d)
    foreach par in parlist do
        ptypelist = gettypelist(par)
        pindex = indexof(par, parlist)
        foreach word in ptypelist do
            i_w_p = log( getwordprobability(word, par) / getwordprobability(word, d) )
            addtotable(<word, pindex>, i_w_p, pmitable)
        done
    done
    return pmitable

The keyword-spotting examples in this chapter use a slightly different sample space model from the one we will be using in the TC application. The intention is to illustrate alternative ways of modelling linguistic data in a probabilistic framework, and the fact that the TSR metrics can be used in different contexts. For the algorithm above, term occurrences are taken to be the elementary events. Term occurrences in the whole text generate the prior probabilities for terms, P(t). Term occurrences in certain paragraphs give conditional probabilities P(t | c) (i.e. occurrences of terms conditioned on the paragraph, taken in this case to be the "category"). Paragraphs are assumed to have a uniform prior P(c) (i.e. they are all equally likely to occur).

In the case of PMI(T, C) (pointwise mutual information of a term and a paragraph), we can simply work with priors and conditionals for words:

PMI(T, C) = log [ P(t, c) / (P(t) P(c)) ] = log [ P(t | c) P(c) / (P(t) P(c)) ] = log [ P(t | c) / P(t) ]   (8)

The conditional P(t | c) can be calculated by dividing the number of times t occurs in documents of category c by the total number of tokens in those documents (the probability space for documents of category c). P(t) can be calculated by dividing the frequency of t in the training corpus by the total number of tokens in that corpus.

Normalised mutual information:

MI(T_i, C_j) = P(t_i, c_j) log [ P(t_i, c_j) / (P(t_i) P(c_j)) ]   (9)

mi(d): wordpartable    /* rank each word in d */
    mitable = ()
    parlist = split_as_paragraphs(d)
    p_par = 1 / sizeof(parlist)
    foreach par in parlist do
        ptypelist = gettypelist(par)
        pindex = indexof(par, parlist)
        foreach word in ptypelist do
            mi_w_p = getwordprobability(word, par) * p_par * log( getwordprobability(word, par) / getwordprobability(word, d) )
            addtotable(<word, pindex>, mi_w_p, mitable)
        done
    done
    return mitable

Similarly to (8), we can simplify the computation of MI(T, C) as follows:

MI(T, C) = P(t, c) log [ P(t, c) / (P(t) P(c)) ] = P(t | c) P(c) log [ P(t | c) / P(t) ]   (10)

Expected Mutual Information (information gain)

A formalisation of how much information about category c_j one gains by knowing term t_i (and vice versa).

IG(T_i, C_j) = Σ_{t ∈ {t_i, t̄_i}} Σ_{c ∈ {c_j, c̄_j}} P(t, c) log [ P(t, c) / (P(t) P(c)) ]   (11)

The computational cost of calculating IG(·, ·) is higher than that of estimating MI(·, ·).

IG: a simplified example

ig(d): wordpartable    /* features = words; categories = paragraphs */
    igtable = ()
    parlist = split_as_paragraphs(d)
    p_par = 1 / sizeof(parlist)
    foreach par in parlist do
        ptypelist = gettypelist(par)
        pindex = indexof(par, parlist)
        foreach word in ptypelist do    /* oversimplification: assuming T = {word} */
            ig_w_p = 0
            foreach par2 in parlist do
                ig_w_p += getwordprobability(word, par2) * p_par * log( getwordprobability(word, par2) / getwordprobability(word, d) )
            done
            addtotable(<word, pindex>, ig_w_p, igtable)
        done
    done
    return igtable
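A runnable Python version of the pointwise-MI scorer sketched above, with paragraphs playing the role of categories and tokenisation assumed to be done already (the names and the toy input are illustrative):

```python
import math
from collections import Counter

def pmi_keywords(paragraphs):
    """Score each (word, paragraph index) pair by log P(t|c) / P(t),
    where the paragraph plays the role of the category c."""
    doc_counts = Counter(w for par in paragraphs for w in par)
    doc_total = sum(doc_counts.values())
    table = {}
    for i, par in enumerate(paragraphs):
        par_counts = Counter(par)
        for word, count in par_counts.items():
            p_t_given_c = count / len(par)
            p_t = doc_counts[word] / doc_total
            table[(word, i)] = math.log(p_t_given_c / p_t)
    return table

paras = [["economy", "politics", "economy"],
         ["football", "politics", "football"]]
scores = pmi_keywords(paras)
# "economy" is concentrated in paragraph 0, so its PMI there is positive;
# "politics" is spread evenly over both paragraphs, so its PMI is 0
assert scores[("economy", 0)] > 0
assert abs(scores[("politics", 0)]) < 1e-9
```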
From local to global TSR

A locally specified TSR function tsr(t_k, c_i), i.e. ranging over terms t_k with respect to a specific category c_i, can be made global by:

- summing over the set of all categories:

  tsr_sum(t_k) = Σ_{i=1}^{|C|} tsr(t_k, c_i)   (12)

- taking a weighted average:

  tsr_wavg(t_k) = Σ_{i=1}^{|C|} P(c_i) tsr(t_k, c_i)   (13)

- picking the maximum:

  tsr_max(t_k) = max_{i=1}^{|C|} tsr(t_k, c_i)   (14)

Comparing TSR techniques

- Effectiveness depends on the chosen task, domain, etc.
- Reduction factors of up to 100 have been obtained with IG_sum and χ²_max.
- Summary of empirical studies on the performance of different information-theoretic measures (Sebastiani, 2002):

  {OR_sum, NGL_sum, GSS_max} > {IG_sum, χ²_max} > {#_wavg, χ²_wavg}

References

Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3:1289-1305.

Galavotti, L., Sebastiani, F., and Simi, M. (2000). Experiments on the use of feature selection and negative evidence in automated text categorization. Technical report, Paris, France.

Ng, H. T., Goh, W. B., and Low, K. L. (1997). Feature selection, perceptron learning, and a usability case study for text categorization. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '97, pages 67-73, New York, NY, USA. ACM.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1-47.

Yang, Y. and Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. In Fisher, D. H., editor, Proceedings of ICML-97, 14th International Conference on Machine Learning, pages 412-420, Nashville. Morgan Kaufmann.
More informationCLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition
CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition Ad Feelders Universiteit Utrecht Department of Information and Computing Sciences Algorithmic Data
More informationDeep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017
Deep Learning for Natural Language Processing Sidharth Mudgal April 4, 2017 Table of contents 1. Intro 2. Word Vectors 3. Word2Vec 4. Char Level Word Embeddings 5. Application: Entity Matching 6. Conclusion
More informationLearning theory. Ensemble methods. Boosting. Boosting: history
Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationECE521 Lecture7. Logistic Regression
ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard
More informationClassification Algorithms
Classification Algorithms UCSB 290N, 2015. T. Yang Slides based on R. Mooney UT Austin 1 Table of Content roblem Definition Rocchio K-nearest neighbor case based Bayesian algorithm Decision trees 2 Given:
More informationAn Efficient Algorithm for Large-Scale Text Categorization
An Efficient Algorithm for Large-Scale Text Categorization CHANG-RUI YU 1, YAN LUO 2 1 School of Information Management and Engineering Shanghai University of Finance and Economics No.777, Guoding Rd.
More informationCS 188: Artificial Intelligence. Outline
CS 188: Artificial Intelligence Lecture 21: Perceptrons Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. Outline Generative vs. Discriminative Binary Linear Classifiers Perceptron Multi-class
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Clustering Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu May 2, 2017 Announcements Homework 2 due later today Due May 3 rd (11:59pm) Course project
More informationOnline Passive-Aggressive Algorithms. Tirgul 11
Online Passive-Aggressive Algorithms Tirgul 11 Multi-Label Classification 2 Multilabel Problem: Example Mapping Apps to smart folders: Assign an installed app to one or more folders Candy Crush Saga 3
More informationLearning Methods for Linear Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2011/2012 Lesson 20 27 April 2012 Contents Learning Methods for Linear Detectors Learning Linear Detectors...2
More informationPrediction of Citations for Academic Papers
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationSignificance Tests for Bizarre Measures in 2-Class Classification Tasks
R E S E A R C H R E P O R T I D I A P Significance Tests for Bizarre Measures in 2-Class Classification Tasks Mikaela Keller 1 Johnny Mariéthoz 2 Samy Bengio 3 IDIAP RR 04-34 October 4, 2004 D a l l e
More informationLanguage Models. CS6200: Information Retrieval. Slides by: Jesse Anderton
Language Models CS6200: Information Retrieval Slides by: Jesse Anderton What s wrong with VSMs? Vector Space Models work reasonably well, but have a few problems: They are based on bag-of-words, so they
More informationBayes Theorem & Naïve Bayes. (some slides adapted from slides by Massimo Poesio, adapted from slides by Chris Manning)
Bayes Theorem & Naïve Bayes (some slides adapted from slides by Massimo Poesio, adapted from slides by Chris Manning) Review: Bayes Theorem & Diagnosis P( a b) Posterior Likelihood Prior P( b a) P( a)
More informationA Neural Passage Model for Ad-hoc Document Retrieval
A Neural Passage Model for Ad-hoc Document Retrieval Qingyao Ai, Brendan O Connor, and W. Bruce Croft College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA,
More informationAn Analysis on Frequency of Terms for Text Categorization
An Analysis on Frequency of Terms for Text Categorization Edgar Moyotl-Hernández Fac. de Ciencias de la Computación B. Universidad Autónoma de Puebla C.U. 72570, Puebla, México emoyotl@mail.cs.buap.mx,
More informationRecap of the last lecture. CS276A Information Retrieval. This lecture. Documents as vectors. Intuition. Why turn docs into vectors?
CS276A Information Retrieval Recap of the last lecture Parametric and field searches Zones in documents Scoring documents: zone weighting Index support for scoring tf idf and vector spaces Lecture 7 This
More information18.6 Regression and Classification with Linear Models
18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight
More informationGeneralization bounds
Advanced Course in Machine Learning pring 200 Generalization bounds Handouts are jointly prepared by hie Mannor and hai halev-hwartz he problem of characterizing learnability is the most basic question
More informationOn a New Model for Automatic Text Categorization Based on Vector Space Model
On a New Model for Automatic Text Categorization Based on Vector Space Model Makoto Suzuki, Naohide Yamagishi, Takashi Ishida, Masayuki Goto and Shigeichi Hirasawa Faculty of Information Science, Shonan
More informationClassifying Chinese Texts in Two Steps
Classifying Chinese Texts in Two Steps Xinghua Fan,, 3, Maosong Sun, Key-sun Choi 3, and Qin Zhang State Key Laboratory of Intelligent Technoy and Systems, Tsinghua University, Beijing 00084, China fanxh@tsinghua.org.cn,
More informationKnowledge Discovery in Data: Overview. Naïve Bayesian Classification. .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..
Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar Knowledge Discovery in Data: Naïve Bayes Overview Naïve Bayes methodology refers to a probabilistic approach to information discovery
More informationVector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson
Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Querying Corpus-wide statistics Querying
More informationFeature Engineering, Model Evaluations
Feature Engineering, Model Evaluations Giri Iyengar Cornell University gi43@cornell.edu Feb 5, 2018 Giri Iyengar (Cornell Tech) Feature Engineering Feb 5, 2018 1 / 35 Overview 1 ETL 2 Feature Engineering
More informationComparing Relevance Feedback Techniques on German News Articles
B. Mitschang et al. (Hrsg.): BTW 2017 Workshopband, Lecture Notes in Informatics (LNI), Gesellschaft für Informatik, Bonn 2017 301 Comparing Relevance Feedback Techniques on German News Articles Julia
More informationPerformance evaluation of binary classifiers
Performance evaluation of binary classifiers Kevin P. Murphy Last updated October 10, 2007 1 ROC curves We frequently design systems to detect events of interest, such as diseases in patients, faces in
More informationBasic Probability and Information Theory: quick revision
Basic Probability and Information Theory: quick revision ML for NLP Lecturer: S Luz http://www.scss.tcd.ie/~luzs/t/cs4ll4/ February 17, 2015 In these notes we review the basics of probability theory and
More informationStochastic gradient descent; Classification
Stochastic gradient descent; Classification Steve Renals Machine Learning Practical MLP Lecture 2 28 September 2016 MLP Lecture 2 Stochastic gradient descent; Classification 1 Single Layer Networks MLP
More informationCSC 411: Lecture 03: Linear Classification
CSC 411: Lecture 03: Linear Classification Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto Zemel, Urtasun, Fidler (UofT) CSC 411: 03-Classification 1 / 24 Examples of Problems What
More informationRanked Retrieval (2)
Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF
More information