Text Analysis. Week 5
1 Week 5: Text Analysis. Reference and slide source: ChengXiang Zhai and Sean Massung. 2016. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. Association for Computing Machinery and Morgan & Claypool, New York, NY, USA.
2 What is Multimedia? Multiple Carriers of information
3 Multimedia spans: sensory data (video, audio, etc.), web data (text: OSN, news, etc.), and user data (user attributes, preferences, etc.)
4 Main Topics: Audio, Text, Video, Information Fusion, Case Studies
5 Text Mining in Multimedia: sensors (Sensor 1, Sensor 2, ..., Sensor k) observe the real world and produce non-text data (numerical, categorical, relational, video) as well as text data; data mining software (general data mining, video mining, text mining) turns both into actionable knowledge
6 Humans as Sensors: a thermometer senses the weather (3 C, 15 F), a geo sensor senses locations (41 N and 120 W), and a network sensor senses networks. In the same way, a human sensor perceives the world and expresses observations as text.
7 Knowledge of the observed world: events and activities, market trends, prediction of future events, product popularity, ratings/reviews
8 Knowledge about the observer: preferences, sentiments, mood
9 Text Analysis is also called Text Mining!
10 Text mining draws on: Information Retrieval (IR), Natural Language Processing (NLP), concept extraction, web mining, document clustering, document classification
11 How to Capture Text?
12 1. Offline documents 2. Online on the web 3. From speech signal
13 Capturing text from the Web: web crawling/web scraping, APIs, JSON/CSV
14 Challenges in Text Analysis: documents are described with many attributes, and the attributes are either characters, words, or phrases
15 How to Represent Text?
16 Digital Representation ASCII Characters American Standard Code for Information Interchange 7-bits EBCDIC Characters Extended Binary Coded Decimal Interchange Code 8-bits
17 How to Represent for Text Analysis?
18 Let's take a sentence: A dog is chasing a boy on the playground.
19 Idea 1: String of characters! The most general and simplest approach, but semantic analysis is difficult: words carry more meaning than characters.
20 Idea 2: Sequence of words! A dog is chasing a boy on the playground
21 Advantages: easily discover the most frequent words; the most frequent words point to topics; many other analyses become possible
22 Challenges: in some languages, e.g. Chinese, it is difficult to identify word boundaries; all words are treated equally (nouns, adjectives, and verbs alike), which may limit semantic analysis
23 Idea 3: Part of Speech (POS) Tagging! +Noun, Verb, Determiners, etc.
24 A dog is chasing a boy on the playground
String of characters: A dog is chasing a boy on the playground
Sequence of words: A | dog | is | chasing | a | boy | on | the | playground
+ POS tags: Det Noun Aux Verb Det Noun Prep Det Noun
+ Syntactic structures: Noun phrase (A dog) + Complex verb (is chasing) + Noun phrase (a boy) + Prep phrase (on the playground), combining into a verb phrase and finally a sentence
+ Entities and relations: "A dog" (Animal) CHASE "a boy" (Person) ON "the playground" (Location)
Deeper analysis requires more effort!
25 POS Tagging Challenges Even less robust than sequence of words Difficult to identify all entities with the right types Relations are even harder to find
26 How would you differentiate a love letter from a research paper? A love letter favors words such as love, like, feel and the pronouns I, you, this, that; a research paper favors words such as challenge, method, application and we, you, this, that.
27 Idea 4: Count the word frequencies for each document!
28 Example
29 Bag-of-Words (BoW) Consider the document as a bag/set of words!
30 What assumptions are we making? 1. Words are mutually independent 2. Word order in text is irrelevant
31 BoW is a Vector Space Model!
32 Dimensions Each term represents one dimension of the vector space The term can be a single word or a sequence of words (phrase) The number of terms determines the dimension of the vector space
33 Vector Elements The vector elements are weights associated with each term These weights reflect the relevance of the term If the corpus contains n terms, a document is represented as d = (w_1, w_2, ..., w_n)
34 Term-Document Matrix (TDM) An m × n matrix with the following features: rows (i = 1, ..., m) represent terms from the corpus; columns (j = 1, ..., n) represent documents from the corpus; cell (i, j) stores the weight of term i in the context of document j
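The TDM described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation; the helper name term_document_matrix and the toy documents are assumptions, with term frequency used as a simple choice of weight.

```python
from collections import Counter

def term_document_matrix(docs):
    """Build a term-document matrix as {term: [weight per document]}.

    Rows are terms, columns are documents, and cell (i, j) holds the
    raw frequency of term i in document j.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))          # the corpus vocabulary
    return {t: [c[t] for c in counts] for t in terms}

# Toy documents using the terms shown on the slide.
docs = ["entropy entropy algorithm", "traffic network", "network entropy"]
tdm = term_document_matrix(docs)
print(tdm["entropy"])  # [2, 0, 1]
```

Each dictionary entry is one row of the matrix; the list index is the document (column) index.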
35 VSM: Term-Document Matrix (Figure: five documents, D1-D5, each containing an excerpt about information theory and entropy, mapped into a term-document matrix whose rows are the terms complexity, algorithm, entropy, traffic, and network and whose columns are D1-D5.)
36 Text Preprocessing Objective: to choose the most relevant words for the corpus, also called a dictionary!
37 1. Transform various forms of the terms into a common normalized form: transform all words to lower case; use a dictionary to replace synonyms with a common, general term
38 Examples Apple, apple, APPLE -> apple Intelligent Systems, Intelligent systems, Intelligent-systems -> intelligent systems automobile, car -> vehicle
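The normalization step above can be sketched as a small function; the synonym dictionary here is a hypothetical example, and real systems would use a larger thesaurus.

```python
def normalize(tokens, synonyms):
    """Lower-case each token, then map synonyms to one general term."""
    return [synonyms.get(t.lower(), t.lower()) for t in tokens]

# Hypothetical synonym dictionary for illustration.
synonyms = {"automobile": "vehicle", "car": "vehicle"}
print(normalize(["Apple", "APPLE", "car"], synonyms))
# ['apple', 'apple', 'vehicle']
```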
39 2. Remove High-Frequency Terms Very high-frequency terms are semantically almost useless Examples: the, a, an, we, do, to
40 3. Remove Low-Frequency Terms Very low-frequency terms are semantically rich, but not representative of the class Example: dextrosinistral
41 The rest of the words are those that represent the corpus the best! (Figure: word frequency plotted against word rank; an upper cut-off removes high-frequency words that do not bear meaning, a lower cut-off removes highly infrequent words, and the resolving power of significant words peaks between the two cut-offs.)
42 4. Stop Words Stop words are those words that (on their own) do not bear any information/meaning It is estimated that they represent 20-30% of the words in any corpus There is no unique stop-word list Frequently used lists are available at:
43 Removing words causes loss of meaning and structure! this is not a good option -> option; to be or not to be -> null
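A toy stop-word filter in the spirit of the slide's examples; the tiny STOP_WORDS set here is an illustrative assumption (real lists contain hundreds of entries), so the first example keeps "good" where the slide's list apparently did not.

```python
# A tiny illustrative stop-word list; frequently used lists are far longer.
STOP_WORDS = {"this", "is", "not", "a", "to", "be", "or"}

def remove_stop_words(text):
    """Drop every token that appears in the stop-word list."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(remove_stop_words("this is not a good option"))  # ['good', 'option']
print(remove_stop_words("to be or not to be"))         # []
```

The second example shows the loss-of-meaning problem directly: the whole phrase disappears.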
44 5. Lemmatization & Stemming to reduce the variability of words
45 Stemming It is a crude heuristic process that chops off the ends of words to get to a basic form Does not consider linguistic features of the words Normalizes verb tenses
46 Examples walking, walks, walked, walker => walk argue, argued, argues, arguing => argu apply, applications, reapplied => apply denormalization => norm
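A deliberately naive suffix-stripping stemmer makes the "chop off the ends" idea concrete. This is only a sketch: real stemmers such as Porter's apply ordered rewrite rules with measure conditions, while this one merely strips a few assumed suffixes.

```python
# Suffixes to strip, longest first; a crude illustrative choice.
SUFFIXES = ("ing", "ed", "er", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

print([crude_stem(w) for w in ["walking", "walks", "walked", "walker"]])
# ['walk', 'walk', 'walk', 'walk']
print(crude_stem("arguing"))  # 'argu' (a non-word stem, as on the slide)
```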
47 Lemmatization Use vocabulary and morphological analysis of words to get the base or dictionary form of a word Base form is also known as the lemma E.g., argue, argued, argues, arguing -> argue
48 Summary of Preprocessing Steps 1. Transform various forms of the terms into a common normalized form 2. Remove high-frequency terms 3. Remove low-frequency terms 4. Remove stop words 5. Lemmatization & stemming
49 How do we compute term weights? (Figure: the same term-document matrix example as before, with documents D1-D5 and terms complexity, algorithm, entropy, traffic, network.)
50 Idea 1: take the value of 0 or 1, to reflect the presence (1) or absence (0) of the term in a particular document!
51 Example
Doc1: Text mining is to identify useful information.
Doc2: Useful information is mined from text.
Doc3: Apple is delicious.

       text  information  identify  mining  mined  is  useful  to  from  apple  delicious
Doc1:   1        1           1        1       0     1     1     1    0      0       0
Doc2:   1        1           0        0       1     1     1     0    1      0       0
Doc3:   0        0           0        0       0     1     0     0    0      1       1
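The presence/absence table above can be computed directly from the three documents; tokenize is a minimal assumed helper that lower-cases tokens and strips trailing periods.

```python
def tokenize(text):
    """Lower-case each token and strip trailing periods."""
    return [w.strip(".").lower() for w in text.split()]

docs = {
    "Doc1": "Text mining is to identify useful information.",
    "Doc2": "Useful information is mined from text.",
    "Doc3": "Apple is delicious.",
}
terms = ["text", "information", "identify", "mining", "mined",
         "is", "useful", "to", "from", "apple", "delicious"]

# 1 if the term is present in the document, 0 otherwise (Idea 1).
incidence = {name: [1 if t in tokenize(text) else 0 for t in terms]
             for name, text in docs.items()}
print(incidence["Doc1"])  # [1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
```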
52 Observation: a few terms occur more often in a specific class of documents!
53 Idea 2: measure the term frequency (TF) in a specific document! Assumption: if a term occurs more often in a doc, it indicates something important about that doc!
54 Term Frequency (TF) (Figure: the same term-document matrix example, with documents D1-D5 and terms complexity, algorithm, entropy, traffic, network.)
55 Example Task: separate technical and non-technical documents. job and engineer both occur 100 times in a corpus of 100 documents. Which term is more important? Most likely, the term job appears in most documents, while engineer occurs mainly in technical documents!
56 Observation: generally words not so common in the corpus are more important!
57 Idea 3: assign higher weights to terms occurring in fewer documents!
58 Inverse Document Frequency (IDF) IDF_t = 1 + log(N / N_t), where N = number of documents and N_t = number of documents containing term t. Note: IDF is computed at the corpus level for each term.
59 A rare term has a high IDF and a frequent term has a low IDF!
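The formula IDF_t = 1 + log(N / N_t) can be applied to the earlier job/engineer example. The document counts below (job in all 100 documents, engineer in 10) are illustrative assumptions consistent with the slide's "most likely" scenario.

```python
import math

def idf(n_docs, n_docs_with_term):
    """IDF_t = 1 + log(N / N_t): rare terms get high weights, common terms low."""
    return 1 + math.log(n_docs / n_docs_with_term)

N = 100
print(idf(N, 100))           # 1.0   -> a term like 'job' appearing in every document
print(round(idf(N, 10), 3))  # 3.303 -> a rarer term like 'engineer' (assumed 10 docs)
```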
60 Observation: terms that are not so common in the corpus, but still occur with a reasonable frequency, are important!
61 TF-IDF The most frequently used weighting scheme Generally computed as follows: TF-IDF_t = TF_t × IDF_t
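Combining the two ideas, a TF-IDF weight can be sketched as below; the mini-corpus is a made-up example using the slide's vocabulary, and raw counts are used for TF (other TF variants exist).

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus_tokens):
    """TF-IDF_t = TF_t(d) * IDF_t, with TF the raw count of the term in d
    and IDF_t = 1 + log(N / N_t) computed over the whole corpus."""
    tf = Counter(doc_tokens)[term]
    n_t = sum(1 for doc in corpus_tokens if term in doc)
    return tf * (1 + math.log(len(corpus_tokens) / n_t))

# Hypothetical mini-corpus of pre-tokenized documents.
corpus = [["entropy", "algorithm", "entropy"],
          ["traffic", "network"],
          ["network", "entropy"]]
print(round(tf_idf("entropy", corpus[0], corpus), 3))  # 2.811
```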
62 Text Retrieval Example
Query = news about presidential campaign
d1: ... news about ...
d2: ... news about organic food campaign ...
d3: ... news of presidential campaign ...
d4: ... news of presidential campaign ... presidential candidate ...
d5: ... news of organic food campaign ... campaign ... campaign ...
63 Text Retrieval Example 1. Define the dimensions 2. Define how the vectors are obtained 3. Define similarity between vectors
64 The goal is to find the closest document in the corpus! (Figure: the query q and documents d1, ..., dM as points in the vector space; which of d1, d2, d3 is closest to q?)
65 Dot product similarity with bit-vector representation
q = (x_1, ..., x_N), d = (y_1, ..., y_N), with x_i, y_i ∈ {0, 1}
1: word W_i is present; 0: word W_i is absent
Sim(q, d) = q · d = x_1 y_1 + ... + x_N y_N = Σ_{i=1}^{N} x_i y_i
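The bit-vector scoring above can be sketched for the running example. The document texts are reconstructed from the phrases shown on the earlier example slide, so treat them as an approximation of the original figure.

```python
def bit_vector(tokens, vocab):
    """1 if the vocabulary word occurs in the token set, 0 otherwise."""
    return [1 if w in tokens else 0 for w in vocab]

def dot(x, y):
    """Dot product: sum of element-wise products."""
    return sum(xi * yi for xi, yi in zip(x, y))

vocab = ["news", "about", "presidential", "campaign",
         "of", "organic", "food", "candidate"]
query = bit_vector({"news", "about", "presidential", "campaign"}, vocab)

docs = {
    "d1": "news about",
    "d2": "news about organic food campaign",
    "d3": "news of presidential campaign",
    "d4": "news of presidential campaign presidential candidate",
    "d5": "news of organic food campaign campaign campaign",
}
scores = {name: dot(query, bit_vector(set(text.split()), vocab))
          for name, text in docs.items()}
print(scores)  # {'d1': 2, 'd2': 3, 'd3': 3, 'd4': 3, 'd5': 2}
```

Note that d2, d3, and d4 tie, which is exactly the problem the next slide raises.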
66 Dot product similarity with bit-vector representation Query = news about presidential campaign Similarity scores: d1 = 2, d2 = 3, d3 = 3, d4 = 3, d5 = 2
67 Problems d2, d3, d4 are all scored the same d2 should be ranked lower than d3 and d4 d4 should score more than d3 because d4 mentions presidential more times
68 Dot product similarity with term-frequency representation
q = (x_1, ..., x_N), d = (y_1, ..., y_N)
x_i = count of word W_i in the query; y_i = count of word W_i in the document
Sim(q, d) = q · d = x_1 y_1 + ... + x_N y_N = Σ_{i=1}^{N} x_i y_i
69 Dot product similarity with term-frequency representation Query = news about presidential campaign Similarity scores: d1 = 2, d2 = 3, d3 = 3, d4 = 4, d5 = 4
70 Problems d2 and d3 have the same scores d2 should be ranked lower than d3 and d4
71 Dot product similarity with TF-IDF representation q = (x_1, ..., x_N), d = (y_1, ..., y_N) x_i = count of word W_i in the query y_i = c(W_i, d) × IDF(W_i)
72 Dot product similarity with TF-IDF representation Query = news about presidential campaign (The same five documents, now scored with TF-IDF weights.)
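Putting TF and IDF together for the same five documents gives a full (if tiny) retrieval sketch. The document texts are again reconstructed from the example slide, and the exact scores depend on the IDF variant; here 1 + log(N / N_t) is used as defined earlier.

```python
import math
from collections import Counter

docs = {
    "d1": "news about",
    "d2": "news about organic food campaign",
    "d3": "news of presidential campaign",
    "d4": "news of presidential campaign presidential candidate",
    "d5": "news of organic food campaign campaign campaign",
}
query = ["news", "about", "presidential", "campaign"]

N = len(docs)
tokens = {name: text.split() for name, text in docs.items()}
# IDF per query term, computed over the whole corpus.
idf = {t: 1 + math.log(N / sum(1 for tk in tokens.values() if t in tk))
       for t in query}

def score(doc_tokens):
    """Dot product of the query bit vector with the document's TF-IDF vector."""
    tf = Counter(doc_tokens)
    return sum(tf[t] * idf[t] for t in query)

ranking = sorted(docs, key=lambda d: score(tokens[d]), reverse=True)
print(ranking[0])  # d4
```

With these weights d4 ranks first, since it repeats the discriminative high-IDF term presidential; d5 still benefits from repeating the common word campaign, which is one motivation for sublinear TF scaling in more refined ranking functions.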
Alliance for Downtown New York 2008 Message from the President 2 Downtown: Location of Choice for Business 7 Downtown: Location of Choice for Living 13 Downtown: Location of Choice for Recreation 17 Downtown:
More informationEI MEL QUAS NULLAM CONSTITUTO, NAM TE TIMEAM MENTITUM. John D. Sanderson A DISSERTATION
EI MEL QUAS NULLAM CONSTITUTO, NAM TE TIMEAM MENTITUM By John D. Sanderson A DISSERTATION Submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY In Some Program MICHIGAN
More informationPutting Weirdness to Work: Quantum Information Science QIS. Quantum Computer
Putting Weirdness to Work: Quantum Information Science QIS Quantum Computer John Preskill, Caltech Biedenharn Lecture 6 September 2005 ??? 2005 2100 ??? 2005 Q Information is encoded in the state of a
More informationPart I: Web Structure Mining Chapter 1: Information Retrieval and Web Search
Part I: Web Structure Mining Chapter : Information Retrieval an Web Search The Web Challenges Crawling the Web Inexing an Keywor Search Evaluating Search Quality Similarity Search The Web Challenges Tim
More informationScoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology
Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)
More informationR ASTED UGLY MUG COFFEE: IT MAY BE BLUE, BUT IT WON T GIVE YOU THE BLUES COFFEE DRINKS TO GET YOU THROUGH THE GRIND WHAT S THE DEAL WITH MATCHA?
R ASTED APRIL 2018 WHAT S THE DEAL WITH MATCHA? page 6 UGLY MUG COFFEE: IT MAY BE BLUE, BUT IT WON T GIVE YOU THE BLUES page 4 COFFEE DRINKS TO GET YOU THROUGH THE GRIND page 8 HOMEMDAE BISCOTTI FOR BREAKFAST?
More informationUniversity of Illinois at Urbana-Champaign. Midterm Examination
University of Illinois at Urbana-Champaign Midterm Examination CS410 Introduction to Text Information Systems Professor ChengXiang Zhai TA: Azadeh Shakery Time: 2:00 3:15pm, Mar. 14, 2007 Place: Room 1105,
More informationLYFTER BRAND MANUAL. The Lyfter brand, including the logo, name, colors and identifying elements, are valuable company assets.
BRAND MANUAL Lyfter Brand Manual 2 // 16 LYFTER BRAND MANUAL This brand manual describes the visual and verbal elements that represent Lyfter s brand identity. This includes our name, logo and other elements
More informationA unique. Living Heritage
A unique Living Heritage Innovative Materials Rich Colour Palette PANTONE 2695 PANTONE 549 PANTONE 4515 PANTONE 532 PANTONE 201 PANTONE 159 PANTONE 623 PANTONE 577 PANTONE Warm Grey 1 Illustration Style
More informationNatural Language Processing. Topics in Information Retrieval. Updated 5/10
Natural Language Processing Topics in Information Retrieval Updated 5/10 Outline Introduction to IR Design features of IR systems Evaluation measures The vector space model Latent semantic indexing Background
More informationPutting Weirdness to Work: Quantum Information Science QIS. Quantum Computer. John Preskill, Caltech KITP Public Lecture 3 May 2006
Putting Weirdness to Work: Quantum Information Science QIS Quantum Computer John Preskill, Caltech KITP Public Lecture 3 May 2006 ??? 2006 2100 ??? 2006 Q Information is encoded in the state of a physical
More informationScoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology
Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)
More informationSemantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing
Semantics with Dense Vectors Reference: D. Jurafsky and J. Martin, Speech and Language Processing 1 Semantics with Dense Vectors We saw how to represent a word as a sparse vector with dimensions corresponding
More informationScoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology
Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2017 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)
More informationFROM QUERIES TO TOP-K RESULTS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
FROM QUERIES TO TOP-K RESULTS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Retrieval evaluation Link
More informationDISTRIBUTIONAL SEMANTICS
COMP90042 LECTURE 4 DISTRIBUTIONAL SEMANTICS LEXICAL DATABASES - PROBLEMS Manually constructed Expensive Human annotation can be biased and noisy Language is dynamic New words: slangs, terminology, etc.
More information3. Basics of Information Retrieval
Text Analysis and Retrieval 3. Basics of Information Retrieval Prof. Bojana Dalbelo Bašić Assoc. Prof. Jan Šnajder With contributions from dr. sc. Goran Glavaš Mladen Karan, mag. ing. University of Zagreb
More informationBrand Guidelines Preparation for winter << name here >> North East Lincolnshire Leeds
Brand Guidelines Contents Introduction.... 3 The logo................................. 4 Exclusion zone... 5 Do s & don ts.... 6 Cold weather plan levels.... 7 Creating sub brand logos... 9 The Winter
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationTerm Weighting and the Vector Space Model. borrowing from: Pandu Nayak and Prabhakar Raghavan
Term Weighting and the Vector Space Model borrowing from: Pandu Nayak and Prabhakar Raghavan IIR Sections 6.2 6.4.3 Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes
More informationNotes on Latent Semantic Analysis
Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationPart A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 )
Part A 1. A Markov chain is a discrete-time stochastic process, defined by a set of states, a set of transition probabilities (between states), and a set of initial state probabilities; the process proceeds
More informationVISUAL IDENTITY SYSTEM. Visual tools for creating and reinforcing the USC Viterbi Brand.
VISUAL IDENTITY SYSTEM Visual tools for creating and reinforcing the USC Viterbi Brand. The Importance of the Brand. Why Is the Identity System Important? A strong, coherent visual identity is a critical
More informationSemantic Similarity from Corpora - Latent Semantic Analysis
Semantic Similarity from Corpora - Latent Semantic Analysis Carlo Strapparava FBK-Irst Istituto per la ricerca scientifica e tecnologica I-385 Povo, Trento, ITALY strappa@fbk.eu Overview Latent Semantic
More informationLanguage Models. CS6200: Information Retrieval. Slides by: Jesse Anderton
Language Models CS6200: Information Retrieval Slides by: Jesse Anderton What s wrong with VSMs? Vector Space Models work reasonably well, but have a few problems: They are based on bag-of-words, so they
More informationLanguage as a Stochastic Process
CS769 Spring 2010 Advanced Natural Language Processing Language as a Stochastic Process Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu 1 Basic Statistics for NLP Pick an arbitrary letter x at random from any
More informationChap 2: Classical models for information retrieval
Chap 2: Classical models for information retrieval Jean-Pierre Chevallet & Philippe Mulhem LIG-MRIM Sept 2016 Jean-Pierre Chevallet & Philippe Mulhem Models of IR 1 / 81 Outline Basic IR Models 1 Basic
More informationRanked Retrieval (2)
Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 4: Probabilistic Retrieval Models April 29, 2010 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Retrieval Models and Implementation Ulf Leser Content of this Lecture Information Retrieval Models Boolean Model Vector Space Model Inverted Files Ulf Leser: Maschinelle
More informationLorem ipsum dolor? Abstract
Lorem ipsum dolor? Lorem Ipsum Reserch Santa Claus Abstract An abstract is a brief summary of a research article, thesis, review, conference proceeding or any in-depth analysis of a particular subject
More information10/17/04. Today s Main Points
Part-of-speech Tagging & Hidden Markov Model Intro Lecture #10 Introduction to Natural Language Processing CMPSCI 585, Fall 2004 University of Massachusetts Amherst Andrew McCallum Today s Main Points
More informationNatural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 18 Maximum Entropy Models I Welcome back for the 3rd module
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search IR models: Vector Space Model IR Models Set Theoretic Classic Models Fuzzy Extended Boolean U s e r T a s k Retrieval: Adhoc Filtering Brosing boolean vector probabilistic
More informationChairman s Statement 6. Board of Directors. Financial Highlights ANNUAL REPORT 2014
2 Chairman s Statement 6 Board of Directors 9 Financial Highlights ANNUAL REPORT 2014 Chairman s Statement Dear Shareholders, Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean quis ultrices
More informationVector Space Model. Yufei Tao KAIST. March 5, Y. Tao, March 5, 2013 Vector Space Model
Vector Space Model Yufei Tao KAIST March 5, 2013 In this lecture, we will study a problem that is (very) fundamental in information retrieval, and must be tackled by all search engines. Let S be a set
More informationVector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson
Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Querying Corpus-wide statistics Querying
More informationChicago Manual of Style
Sample Typeset Xulon Press will typeset the interior of your book according to the Chicago Manual of Style method of document formatting, which is the publishing industry standard. The sample attached
More informationMidterm Examination Practice
University of Illinois at Urbana-Champaign Midterm Examination Practice CS598CXZ Advanced Topics in Information Retrieval (Fall 2013) Professor ChengXiang Zhai 1. Basic IR evaluation measures: The following
More informationCAIM: Cerca i Anàlisi d Informació Massiva
1 / 21 CAIM: Cerca i Anàlisi d Informació Massiva FIB, Grau en Enginyeria Informàtica Slides by Marta Arias, José Balcázar, Ricard Gavaldá Department of Computer Science, UPC Fall 2016 http://www.cs.upc.edu/~caim
More informationInformation Retrieval
Introduction to Information Retrieval CS276: Information Retrieval and Web Search Pandu Nayak and Prabhakar Raghavan Lecture 6: Scoring, Term Weighting and the Vector Space Model This lecture; IIR Sections
More informationTALP at GeoQuery 2007: Linguistic and Geographical Analysis for Query Parsing
TALP at GeoQuery 2007: Linguistic and Geographical Analysis for Query Parsing Daniel Ferrés and Horacio Rodríguez TALP Research Center Software Department Universitat Politècnica de Catalunya {dferres,horacio}@lsi.upc.edu
More informationPageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged.
Web Data Mining PageRank, University of Szeged Why ranking web pages is useful? We are starving for knowledge It earns Google a bunch of money. How? How does the Web looks like? Big strongly connected
More informationPart-of-Speech Tagging + Neural Networks 3: Word Embeddings CS 287
Part-of-Speech Tagging + Neural Networks 3: Word Embeddings CS 287 Review: Neural Networks One-layer multi-layer perceptron architecture, NN MLP1 (x) = g(xw 1 + b 1 )W 2 + b 2 xw + b; perceptron x is the
More informationRetrieval by Content. Part 2: Text Retrieval Term Frequency and Inverse Document Frequency. Srihari: CSE 626 1
Retrieval by Content Part 2: Text Retrieval Term Frequency and Inverse Document Frequency Srihari: CSE 626 1 Text Retrieval Retrieval of text-based information is referred to as Information Retrieval (IR)
More informationLecture 1b: Text, terms, and bags of words
Lecture 1b: Text, terms, and bags of words Trevor Cohn (based on slides by William Webber) COMP90042, 2015, Semester 1 Corpus, document, term Body of text referred to as corpus Corpus regarded as a collection
More informationIntroduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 6 Scoring term weighting and the vector space model
Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 6 Scoring term weighting and the vector space model Ranked retrieval Thus far, our queries have all been Boolean. Documents either
More information2018 David Roisum, Ph.D. Finishing Technologies, Inc.
Web 101.89 SM What Coaters Need to Know About Winders Ba c k Defect Front Sup ply Proc ess Produc t 2018 David Roisum, Ph.D. Finishing Technologies, Inc. 89.1 #1 Thing Is that Lorem ipsum dolor sit amet,
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationLecture 3: Probabilistic Retrieval Models
Probabilistic Retrieval Models Information Retrieval and Web Search Engines Lecture 3: Probabilistic Retrieval Models November 5 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme
More informationCross-Lingual Language Modeling for Automatic Speech Recogntion
GBO Presentation Cross-Lingual Language Modeling for Automatic Speech Recogntion November 14, 2003 Woosung Kim woosung@cs.jhu.edu Center for Language and Speech Processing Dept. of Computer Science The
More informationWhat is Text mining? To discover the useful patterns/contents from the large amount of data that can be structured or unstructured.
What is Text mining? To discover the useful patterns/contents from the large amount of data that can be structured or unstructured. Text mining What can be used for text mining?? Classification/categorization
More informationPaper Template. Contents. Name
Paper Template Name Abstract Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus ut quam vel ipsum porta congue ac sit amet urna. Fusce non purus sit amet sem placerat ultrices. Nulla facilisi.
More informationA MESSAGE FROM THE EDITOR, HAPPY READING
intro A MESSAGE FROM THE EDITOR, HAPPY READING Donec laoreet nulla vitae lacus sodales pharetra. Nullam cursus convallis mattis. Nullam sagittis, odio a viverra tincidunt, risus eros sollicitudin ante,
More informationTerm Weighting and Vector Space Model. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze
Term Weighting and Vector Space Model Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 Ranked retrieval Thus far, our queries have all been Boolean. Documents either
More informationChicago Manual of Style
Sample Typeset Xulon Press will typeset the interior of your book according to the Chicago Manual of Style method of document formatting, which is the publishing industry standard. The sample attached
More informationModern Information Retrieval
Modern Information Retrieval Chapter 3 Modeling Introduction to IR Models Basic Concepts The Boolean Model Term Weighting The Vector Model Probabilistic Model Retrieval Evaluation, Modern Information Retrieval,
More informationSequences and Information
Sequences and Information Rahul Siddharthan The Institute of Mathematical Sciences, Chennai, India http://www.imsc.res.in/ rsidd/ Facets 16, 04/07/2016 This box says something By looking at the symbols
More informationFall CS646: Information Retrieval. Lecture 6 Boolean Search and Vector Space Model. Jiepu Jiang University of Massachusetts Amherst 2016/09/26
Fall 2016 CS646: Information Retrieval Lecture 6 Boolean Search and Vector Space Model Jiepu Jiang University of Massachusetts Amherst 2016/09/26 Outline Today Boolean Retrieval Vector Space Model Latent
More informationParamedic Academy. Welcome Jon, here are the latest updates. Your Progress. Your Bookmarks. Latest Discussions
Welcome Jon, here are the latest updates October 15, 2016 In eleifend vitae dui sit amet ornare Nunc ut luctus augue. Ut volutpat congue maximus. Nulla tris que nunc quis nulla malesuada malesuada. Pellentesque
More informationA Study for Evaluating the Importance of Various Parts of Speech (POS) for Information Retrieval (IR)
A Study for Evaluating the Importance of Various Parts of Speech (POS) for Information Retrieval (IR) Chirag Shah Dept. of CSE IIT Bombay, Powai Mumbai - 400 076, Maharashtra, India. Email: chirag@cse.iitb.ac.in
More informationSeptember Grand Fiesta American Coral Beach Cancún, Resort & Spa. Programa Académico. Instituto Nacional de. Cancerología
August 30th - 2nd September 2017 Grand Fiesta American Coral Beach Cancún, Resort & Spa Programa Académico Instituto Nacional de Cancerología WELCOME Estimado Doctor: Lorem ipsum dolor sit amet, consectetur
More informationComparative Summarization via Latent Dirichlet Allocation
Comparative Summarization via Latent Dirichlet Allocation Michal Campr and Karel Jezek Department of Computer Science and Engineering, FAV, University of West Bohemia, 11 February 2013, 301 00, Plzen,
More informationhow to *do* computationally assisted research
how to *do* computationally assisted research digital literacy @ comwell Kristoffer L Nielbo knielbo@sdu.dk knielbo.github.io/ March 22, 2018 1/30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 class Person(object):
More informationLatent Semantic Analysis. Hongning Wang
Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element
More informationCLRG Biocreative V
CLRG ChemTMiner @ Biocreative V Sobha Lalitha Devi., Sindhuja Gopalan., Vijay Sundar Ram R., Malarkodi C.S., Lakshmi S., Pattabhi RK Rao Computational Linguistics Research Group, AU-KBC Research Centre
More informationOutline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting
Outline for today Information Retrieval Efficient Scoring and Ranking Recap on ranked retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Efficient
More informationIR Models: The Probabilistic Model. Lecture 8
IR Models: The Probabilistic Model Lecture 8 ' * ) ( % $ $ +#! "#! '& & Probability of Relevance? ' ', IR is an uncertain process Information need to query Documents to index terms Query terms and index
More information