Determining Word Sense Dominance Using a Thesaurus


Slide 1: Determining Word Sense Dominance Using a Thesaurus

Saif Mohammad and Graeme Hirst
Department of Computer Science, University of Toronto
EACL, Trento, Italy (5 April 2006)
Copyright © 2006, Saif Mohammad and Graeme Hirst

Slide 2: Word Sense Dominance

Example: star (CELESTIAL BODY) vs. star (CELEBRITY)

The degree of dominance of a sense of a word is the proportion of occurrences of that sense in text.

Applications:
- sense disambiguation, document clustering, ...

Slide 3: McCarthy et al.'s Method

[Diagram: an auxiliary corpus is used to build Lin's thesaurus, which feeds the dominance method D; D produces dominance values for the target text. The auxiliary corpus must be similarly sense-distributed to the target text. D = dominance method.]

- Requires WordNet.
- Needs auxiliary text with a sense distribution similar to that of the target text.
- Requires retraining (Lin's thesaurus).

Slide 4: Our Method

[Diagram: an auxiliary corpus is used to build the WCCM, which feeds the dominance method D; given the target text and a published thesaurus, D produces dominance values. WCCM = word–category co-occurrence matrix.]

- We use a published thesaurus.
- The auxiliary text need not have a similar sense distribution.
- No retraining is needed (the WCCM is created just once).

Slide 5: Published Thesauri

- E.g., Roget's (English), Macquarie (English), Cilin (Chinese), Bunrui Goi Hyou (Japanese)
- Vocabulary divided into about 1000 categories
- Words in a category (category terms, or c-terms) are closely related.
- A category very roughly corresponds to a sense (Yarowsky, 1992).
- One word can be in more than one category: bark is in ANIMAL NOISES and MEMBRANE.

Slide 6: Why a Thesaurus?

- Coarse senses: WordNet is much too fine-grained.
- Computational ease: with just 1000 categories, the word–category co-occurrence matrix is of manageable size.
- Availability: thesauri are available in many languages.
- Words for a sense: each sense can be represented unambiguously with a set of (possibly ambiguous) words.

Slide 7: Word–Category Co-occurrence Matrix (WCCM)

$$
\begin{array}{c|cccc}
       & c_1    & c_2    & \cdots & c_j    \\ \hline
w_1    & m_{11} & m_{12} & \cdots & m_{1j} \\
w_2    & m_{21} & m_{22} & \cdots & m_{2j} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
w_i    & m_{i1} & m_{i2} & \cdots & m_{ij} \\
\end{array}
$$

- WCCM: categories (thesaurus) vs. words (vocabulary)
- Cell m_{ij}: the number of times word w_i co-occurs with a c-term listed in category c_j
- Text: most of the BNC
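
A minimal sketch of how such a matrix might be populated. The input format (a dict `category_of_terms` mapping each c-term to the thesaurus categories that list it) and the ±5-word co-occurrence window are assumptions, not details from the slides:

```python
from collections import defaultdict

def build_wccm(sentences, category_of_terms, window=5):
    """Build the base WCCM: wccm[word][category] = co-occurrence count.

    sentences: iterable of tokenized sentences (lists of words).
    category_of_terms: c-term -> set of thesaurus categories (assumed format).
    """
    wccm = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # An ambiguous c-term counts toward every category
                # that lists it; disambiguation comes later.
                for cat in category_of_terms.get(tokens[j], ()):
                    wccm[w][cat] += 1
    return wccm
```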

Slide 8: Example

star, a c-term in both CELESTIAL BODY and CELEBRITY, co-occurs with space, so:
- cell (space, CELESTIAL BODY) is incremented by 1
- cell (space, CELEBRITY) is incremented by 1

Slide 9: Example (continued)

[Diagram: space co-occurring with X increments cell (space, CELESTIAL BODY)]
X: star, nova, constellation, sun, empty, web

Slide 10: Word–Category Matrix (continued)

$$
\begin{array}{c|cccc}
             & c_1    & \textsc{celestial body} & \cdots & c_j \\ \hline
w_1          & m_{11} & m_{12} & \cdots & m_{1j} \\
w_2          & m_{21} & m_{22} & \cdots & m_{2j} \\
\vdots       & \vdots & \vdots & \ddots & \vdots \\
\textit{space} & m_{i1} & m_{i2} & \cdots & m_{ij} \\
\end{array}
$$

The row for space and the column for CELESTIAL BODY meet at the cell holding their co-occurrence count.

Slide 11: Contingency Table for w and c

$$
\begin{array}{c|cc}
       & c           & \neg c \\ \hline
w      & n_{wc}      & n_{w\neg c} \\
\neg w & n_{\neg wc} & n_{\neg w\neg c} \\
\end{array}
$$

Applying a statistic to this table gives the strength of association (SoA):
- cosine
- Dice
- odds ratio
- pointwise mutual information (pmi)
- Yule's coefficient of colligation
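
As a sketch (not from the paper), three of these statistics computed from the table; n is the grand total of co-occurrence events, and all cells are assumed nonzero:

```python
import math

def soa_stats(n_wc, n_w, n_c, n):
    """Strength-of-association statistics from the 2x2 table.

    n_wc: count of w with c; n_w, n_c: marginals; n: grand total.
    """
    a = n_wc                   # w and c
    b = n_w - n_wc             # w without c
    c = n_c - n_wc             # c without w
    d = n - n_w - n_c + n_wc   # neither

    pmi = math.log((a * n) / (n_w * n_c))          # observed vs. expected
    odds = (a * d) / (b * c)                       # odds ratio
    yule = ((math.sqrt(a * d) - math.sqrt(b * c))  # Yule's coefficient
            / (math.sqrt(a * d) + math.sqrt(b * c)))
    return {"pmi": pmi, "odds": odds, "yule": yule}
```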

Slide 12: Evidence for the Senses

[Diagram: the co-occurring word space has a strength of association with each sense of star: SoA(space, CELESTIAL BODY) and SoA(space, CELEBRITY)]

Slide 13: Base WCCM

- Matrix created in a first pass over unannotated text
  - noisy
  - but captures the strong associations
- Words that occur close to a target word
  - are good indicators of the intended sense
  - their co-occurrence frequency is used as evidence

Slide 14: Bootstrapping the WCCM

- Second pass over the auxiliary corpus
- Word sense disambiguation: using co-occurring words and evidence from the base WCCM
- Yields a new, more accurate WCCM
  - Cell m_{ij}: the number of times a word used in sense c_j co-occurs with w_i
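
A simplified sketch of one such pass, not the paper's exact procedure: it reuses the assumed inputs from the earlier sketch plus a hypothetical helper base_soa(word, category) backed by the base WCCM:

```python
from collections import defaultdict

def bootstrap_wccm(sentences, category_of_terms, base_soa, window=5):
    """Re-count co-occurrences after disambiguating each c-term
    occurrence with evidence from the base WCCM."""
    wccm = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for j, term in enumerate(tokens):
            cats = category_of_terms.get(term, ())
            if not cats:
                continue
            lo, hi = max(0, j - window), min(len(tokens), j + window + 1)
            context = [tokens[k] for k in range(lo, hi) if k != j]
            # Pick the sense of this occurrence that its context
            # words support most strongly under the base WCCM.
            sense = max(cats, key=lambda c: sum(base_soa(w, c) for w in context))
            # Each context word now co-occurs with that one sense only.
            for w in context:
                wccm[w][sense] += 1
    return wccm
```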

Slide 15: Four Methods

                    Implicit disambiguation   Explicit disambiguation
Weighted voting     D_{I,W}                   D_{E,W}
Unweighted voting   D_{I,U}                   D_{E,U}

- The stronger the association of a sense with its co-occurring words, the higher its dominance.
- Either a weighted vote (the SoA) to each sense, or an unweighted vote to the sense with the highest SoA
- Either explicit word sense disambiguation or not

Slide 16: Method D_{I,W}

Each word that co-occurs with the target word t gives a weighted vote (its SoA) to each sense. The dominance of a sense c is the proportion of votes it gets:

$$
D_{I,W}(t,c) = \frac{\sum_{w \in T} \mathrm{SoA}(w,c)}{\sum_{c' \in \mathrm{senses}(t)} \sum_{w \in T} \mathrm{SoA}(w,c')}
$$

T is the set of all words that co-occur with t.
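
A direct transcription of this formula into code, assuming the same helper soa(word, sense) as above:

```python
def dominance_implicit_weighted(T, senses, soa):
    """D_{I,W}: each co-occurring word in T gives a weighted (SoA)
    vote to every sense; dominance is each sense's share of the total."""
    votes = {c: sum(soa(w, c) for w in T) for c in senses}
    total = sum(votes.values())
    return {c: v / total for c, v in votes.items()}
```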

Slide 17: Method D_{I,U}

Each word that co-occurs with the target word gives an unweighted, equal vote to a single winner sense: the sense with the highest strength of association with the co-occurring word. The dominance of a sense is the proportion of votes it gets:

$$
D_{I,U}(t,c) = \frac{\left|\{\, w \in T : \arg\max_{c' \in \mathrm{senses}(t)} \mathrm{SoA}(w,c') = c \,\}\right|}{|T|}
$$
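
The same idea with hard votes, under the same assumed helper:

```python
def dominance_implicit_unweighted(T, senses, soa):
    """D_{I,U}: each co-occurring word casts one vote for its most
    strongly associated sense; dominance is the share of votes."""
    votes = {c: 0 for c in senses}
    for w in T:
        winner = max(senses, key=lambda c: soa(w, c))
        votes[winner] += 1
    return {c: v / len(T) for c, v in votes.items()}
```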

Slide 18: Methods D_{E,W} and D_{E,U}

- Explicit sense disambiguation
  - Each occurrence of the target word is disambiguated by the votes of its co-occurring words
  - Votes can be weighted or unweighted
- Dominance of a sense
  - The proportion of occurrences pertaining to that sense
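
A sketch of the two explicit variants under the same assumptions: disambiguate each occurrence from its context words, then count occurrences per sense (tie-breaking behavior here is an implementation choice, not from the slides):

```python
def dominance_explicit(occurrences, senses, soa, weighted=True):
    """D_{E,W} / D_{E,U}: explicitly disambiguate every occurrence of
    the target word, then report each sense's share of occurrences.

    occurrences: one list of co-occurring words per occurrence.
    """
    counts = {c: 0 for c in senses}
    for context in occurrences:
        if weighted:
            # D_{E,W}: weighted votes -- sum SoA evidence per sense.
            score = lambda c: sum(soa(w, c) for w in context)
        else:
            # D_{E,U}: unweighted votes -- one vote per context word.
            score = lambda c: sum(
                1 for w in context
                if max(senses, key=lambda s: soa(w, s)) == c)
        counts[max(senses, key=score)] += 1
    return {c: n / len(occurrences) for c, n in counts.items()}
```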

Slide 19: Experimental Setup

- A naïve sense disambiguation system
  - outputs the predominant sense for every instance of a target word
- Test datasets
  - vary the sense distribution of the two most dominant senses of each target word

Slide 20: Sense-Tagged Data

We created pseudo-thesaurus-sense-tagged data for the 27 head words in the SENSEVAL-1 English sample space, using a held-out subset of the BNC.

- Non-monosemous target word: brilliant
- Category: INTELLIGENCE
- Monosemous c-term: clever
- Sentence from auxiliary text: "Hermione had a clever plan."
- Sense-annotated sentence: "Hermione had a brilliant/INTELLIGENCE plan."
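
The recipe on this slide can be sketched as a simple substitution; tokenization and inflection handling are deliberately ignored, and all names are illustrative:

```python
def make_pseudo_tagged(sentences, target, category, mono_cterms):
    """Wherever a monosemous c-term of `category` occurs, substitute
    the ambiguous target word and record `category` as its sense."""
    tagged = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in mono_cterms:
                tagged.append(tokens[:i] + [f"{target}/{category}"] + tokens[i + 1:])
    return tagged

# The slide's example:
# make_pseudo_tagged([["Hermione", "had", "a", "clever", "plan"]],
#                    "brilliant", "INTELLIGENCE", {"clever"})
# -> [["Hermione", "had", "a", "brilliant/INTELLIGENCE", "plan"]]
```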

Slides 21–28: Best Results

[Plot: accuracy vs. sense distribution (alpha), built up over the slides to show the upper bound, random baseline, lower bound, and the four methods D_{E,W}, D_{I,W}, D_{I,U}, D_{E,U}]

Mean distance below the upper bound, with the best SoA measures for each method:
- D_{E,W}: .02 (pmi, odds, Yule)
- D_{I,W}: pmi
- D_{I,U}: .11 (phi, pmi, odds, Yule)
- D_{E,U}: .16 (phi, pmi, odds, Yule)

Slide 29: Observations

- The weighted methods are better.
- Whether disambiguation is explicit or implicit does not matter.
- Odds, pmi, and Yule are the better measures.

Slides 30–32: Effect of Bootstrapping

[Plot: accuracy vs. sense distribution (alpha) for D_{E,W} (odds), with and without bootstrapping]

- Most of the gain comes from the first iteration.
- The relative behavior of the measures stays more or less the same.

Slide 33: In Summary

- New methods for determining sense dominance
  - require only raw text and a published thesaurus
  - need no similarly-sense-distributed text and no retraining
- Extensive experiments
  - on synthetically created thesaurus-sense-tagged data
- Results are close to the upper bound.

Slide 34: Future Work

The WCCM has applications beyond sense dominance:
- Linguistic distances: distributional distance of concepts
- Word sense disambiguation: an unsupervised naïve Bayes classifier
- Machine translation: domain-specific translational dominance
- Document clustering: representing documents in concept space
