Unsupervised Rank Aggregation with Distance-Based Models


1 Unsupervised Rank Aggregation with Distance-Based Models
Kevin Small, Tufts University
Collaborators: Alex Klementiev (Johns Hopkins University), Ivan Titov (Saarland University), Dan Roth (University of Illinois)

2 Motivation
Consider a panel of judges, each of which independently generates (partial) rankings over objects (e.g. the results for a query) to the best of their ability.
- The need to meaningfully aggregate their output is a fundamental problem.
- Applications are plentiful in Information Retrieval and Natural Language Processing.

3 Multilingual Named Entity Discovery
Named Entity Discovery [Klementiev & Roth, ACL 06]: given a bilingual corpus, one side of which is annotated with Named Entities, find their counterparts in the other side.
(Figure: candidate rankings r1, r2, r3, … for the name "guimaraes".)
- NEs are often transliterated: rank according to a transliteration model score.
- NEs tend to co-occur across languages: rank according to temporal alignment.
- NEs tend to co-occur in similar contexts: rank according to contextual similarity.
- NEs tend to co-occur in similar topics: rank according to topic similarity.
- etc.

4 Overview of Our Approach
- We propose a formal framework for unsupervised structured label aggregation.
- Judges independently generate a (partial) labeling, attempting to reproduce the true underlying label based on their expertise in a given domain.
- We derive an EM-based algorithm treating the votes of individual judges and the true label as the observed and unobserved data, respectively.
- Intuition: experts in a given domain are better at generating votes close to the true ranking and will tend to agree with each other, while the non-experts will not.
- We instantiate the framework for combining permutations, combining top-k lists, and combining dependency parses.

5 Notation
- Permutation π over n objects x_1 … x_n; e = (1, 2, ..., n) is the identity permutation.
- S_n is the set of all n! permutations.
- Distance d : S_n × S_n → R+ between permutations, e.g. Kendall's tau distance d_K: the minimum number of adjacent transpositions needed to transform one permutation into the other.
- d is assumed to be invariant to arbitrary re-labeling of the n objects (right-invariance), so d(π, σ) = d(e, πσ⁻¹) = D(πσ⁻¹). If π is a random variable, so is D = D(π).
(The slide illustrates this with small worked examples, e.g. d_K between two example rankings equals 3.)
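To make the distance concrete, here is a small Python sketch (our illustration, not from the slides) that computes Kendall's tau distance by counting the pairs of objects the two permutations order differently, which equals the minimum number of adjacent transpositions:

```python
from itertools import combinations

def kendall_tau(pi, sigma):
    """Kendall's tau distance: the number of object pairs that pi and sigma
    order differently (equivalently, the minimum number of adjacent
    transpositions needed to turn one permutation into the other)."""
    pos_pi = {x: i for i, x in enumerate(pi)}        # rank of each object in pi
    pos_sigma = {x: i for i, x in enumerate(sigma)}  # rank of each object in sigma
    return sum(
        (pos_pi[x] < pos_pi[y]) != (pos_sigma[x] < pos_sigma[y])
        for x, y in combinations(pi, 2)
    )

assert kendall_tau((1, 2, 3, 4), (1, 2, 3, 4)) == 0   # identical rankings
assert kendall_tau((1, 2, 3), (3, 2, 1)) == 3         # complete reversal
```

Note the distance is right-invariant: relabeling the objects consistently in both permutations leaves it unchanged.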

6 Background: Mallows Models
p(π | θ, σ) = exp(θ · d(π, σ)) / Z(θ, σ),
where θ ∈ R, θ ≤ 0 is the dispersion parameter and σ ∈ S_n is the location parameter.
- The distribution is uniform when θ = 0 and peaky (concentrated around σ) when |θ| is large.
- Z(θ, σ) is expensive to compute in general; since d(·,·) is right-invariant, Z(θ, σ) does not depend on σ.
- If D can be decomposed as D(π) = Σ_{i=1}^{m} V_i(π) where the V_i are independent r.v.'s, then E_θ(D) may be efficient to compute [Fligner and Verducci 86].
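As a sanity check on the definition, a minimal sketch (ours, feasible only for small n) that evaluates the Mallows density with the normalizer Z computed by brute-force enumeration over S_n; right-invariance means Z can be computed with respect to the identity permutation:

```python
import math
from itertools import combinations, permutations

def kendall_tau(pi, sigma):
    pos_p = {x: i for i, x in enumerate(pi)}
    pos_s = {x: i for i, x in enumerate(sigma)}
    return sum((pos_p[x] < pos_p[y]) != (pos_s[x] < pos_s[y])
               for x, y in combinations(pi, 2))

def mallows_prob(pi, sigma, theta):
    """p(pi | theta, sigma) = exp(theta * d(pi, sigma)) / Z(theta), theta <= 0.
    Z is summed over all n! permutations, so this is only feasible for small n."""
    items = tuple(sorted(sigma))
    z = sum(math.exp(theta * kendall_tau(p, items)) for p in permutations(items))
    return math.exp(theta * kendall_tau(pi, sigma)) / z

print(mallows_prob((1, 2, 3, 4), (1, 2, 3, 4), theta=0.0))   # uniform: 1/24
print(mallows_prob((1, 2, 3, 4), (1, 2, 3, 4), theta=-2.0))  # mass concentrates on sigma
```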

7 Generative Story for Aggregation
- Generate the true ranking π according to a prior p(π).
- Draw σ_1, …, σ_K independently from K Mallows models p(σ_i | θ_i, π), all sharing the same location parameter π:
p(π, σ | θ) = p(π) ∏_{i=1}^{K} p(σ_i | θ_i, π)

8 Background: Extended Mallows Models
The associated conditional model (when the votes σ ∈ S_n^K of K judges are available) was proposed in [Lebanon and Lafferty 02]:
p(π | σ, θ) = p(π) · exp(Σ_{i=1}^{K} θ_i · d(π, σ_i)) / Z(θ, σ)
- The free parameters θ ∈ R^K, θ ≤ 0 represent the degree of expertise of the individual judges.
- It is straightforward to generalize both models to partial rankings by constructing appropriate distance functions.
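For small n the conditional can be handled by brute force, which is enough to see how the model aggregates votes. A hedged sketch (our illustration, assuming a uniform prior p(π)): score every candidate π and return the most likely aggregate ranking.

```python
import math
from itertools import combinations, permutations

def kendall_tau(pi, sigma):
    pos_p = {x: i for i, x in enumerate(pi)}
    pos_s = {x: i for i, x in enumerate(sigma)}
    return sum((pos_p[x] < pos_p[y]) != (pos_s[x] < pos_s[y])
               for x, y in combinations(pi, 2))

def extended_mallows_map(votes, thetas):
    """argmax_pi of p(pi | sigma, theta), which is proportional to
    exp(sum_i theta_i * d(pi, sigma_i)); uniform prior, brute force over S_n."""
    items = tuple(sorted(votes[0]))
    def log_score(pi):
        return sum(t * kendall_tau(pi, s) for t, s in zip(thetas, votes))
    return max(permutations(items), key=log_score)

# Two reliable judges (theta = -2) and one near-random judge (theta close to 0):
votes = [(1, 2, 3, 4), (1, 2, 4, 3), (4, 3, 2, 1)]
print(extended_mallows_map(votes, [-2.0, -2.0, -0.1]))  # follows the reliable judges
```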

9 Outline
Introduction and background:
- Motivation
- Problem statement
- Overview of our approach
- Background: Mallows models / extended Mallows models
Our contribution:
- Unsupervised learning and inference
- Incorporating domain-specific expertise
- Instantiations of the framework: combining permutations / top-k lists
- Experiments
- Dependency parsing
- Conclusions

10 Our Approach [ICML 2008]
- We propose a formal framework for unsupervised rank aggregation based on the extended Mallows model formalism.
- We derive an EM-based algorithm to estimate the model parameters.
- Observed data: the votes of the individual judges. Unobserved data: the true ranking.
(Figure: votes σ_i^(q) of judges i = 1, …, K over queries q = 1, …, Q.)

11 Learning
Denoting θ′ the value of the parameters from the previous iteration, the M-step for the i-th ranker is:
E_{θ_i}(D) = (1/Q) Σ_{j=1}^{Q} Σ_{π^(j) ∈ S_n} d(π^(j), σ_i^(j)) · p(π^(j) | σ^(j), θ′)
- LHS: the expected distance under the i-th ranker's Mallows model; in general > n! computations.
- RHS: the average distance between the votes of the i-th ranker and the marginal of the unobserved data π^(1..Q); naively this requires enumerating S_n for each of the Q queries.
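For intuition, a brute-force sketch of the RHS (our illustration, small n only, uniform prior over π): enumerate S_n to form the posterior for each query and average the posterior-expected distance to the i-th ranker's vote.

```python
import math
from itertools import combinations, permutations

def kendall_tau(pi, sigma):
    pos_p = {x: i for i, x in enumerate(pi)}
    pos_s = {x: i for i, x in enumerate(sigma)}
    return sum((pos_p[x] < pos_p[y]) != (pos_s[x] < pos_s[y])
               for x, y in combinations(pi, 2))

def mstep_rhs(all_votes, thetas, i):
    """(1/Q) * sum_j sum_pi d(pi, sigma_i^(j)) * p(pi | sigma^(j), theta'),
    with the posterior p(pi | sigma^(j), theta') computed by enumerating S_n."""
    total = 0.0
    for votes in all_votes:                        # votes = (sigma_1^(j), ..., sigma_K^(j))
        items = tuple(sorted(votes[0]))
        w = {p: math.exp(sum(t * kendall_tau(p, s) for t, s in zip(thetas, votes)))
             for p in permutations(items)}
        z = sum(w.values())
        total += sum(wp / z * kendall_tau(p, votes[i]) for p, wp in w.items())
    return total / len(all_votes)

all_votes = [((1, 2, 3), (3, 2, 1)), ((1, 3, 2), (2, 3, 1))]   # Q = 2, K = 2
print(mstep_rhs(all_votes, thetas=[-1.0, -0.5], i=0))
```

The M-step then sets θ_i so that the model's expected distance E_{θ_i}(D) matches this value (see slide 17 for the closed form).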

12 Learning and Inference
Learning (estimating θ): for the K constituent rankers, repeat:
- Estimate the RHS given the current parameter values: sample with Metropolis-Hastings (the sampler depends on the structure type) or use heuristics (more about this later).
- Solve the LHS to update θ: efficient estimation can be done for particular types of distance functions.
Inference (computing the most likely ranking): sample with Metropolis-Hastings or use heuristics.

13 Domain-specific expertise? [IJCAI 2009]
- Relative expertise may not stay the same: it may depend on the type of objects or on the type of query.
- Ranked supervised data for estimating the judges' expertise is typically very expensive to obtain, especially for multiple types.

14 Mallows Models with Domain-Specific Expertise
The associated conditional model (when the votes σ ∈ S_n^K of K judges are available) can be derived:
p(π, t | σ, θ, α) = α_t · exp(Σ_{i=1}^{K} θ_{t,i} · d(π, σ_i)) / Z(θ_t, σ)
- The free parameters θ ∈ R^{T×K}, θ ≤ 0 represent the degree of expertise of the individual judges for each of the T types; α ∈ R^T are the mixture weights.
- Note: it is straightforward to generalize these models to other structured labels (e.g. partial rankings) by constructing appropriate distance functions.

15 Learning
For each ranker i and type t:
(1) α_t = (1/Q) Σ_{j=1}^{Q} Σ_{π^(j) ∈ S_n} p(π^(j), t | σ^(j), θ′, α′)
(2) E_{θ_{t,i}}(D) = (1 / (α_t Q)) Σ_{j=1}^{Q} Σ_{π^(j) ∈ S_n} d(π^(j), σ_i^(j)) · p(π^(j), t | σ^(j), θ′, α′)
Estimate the sums on the right-hand sides given the current parameter values θ′ and α′; this updates α_t directly, and solving (2) updates θ_{t,i}. Repeat.

16 Instantiating the Framework
We have not yet committed to a particular type of structure. In order to instantiate the framework:
- Design a distance function appropriate for the setting; if the function is right-invariant and decomposable, the [LHS] estimation can be done quickly.
- Design a sampling procedure for learning [RHS] and inference.

17 Case 1: Combining Permutations [LHS]
Kendall's tau distance D_K is the minimum number of adjacent transpositions needed to transform one permutation into another. It can be decomposed into a sum of independent random variables:
D_K(π) = Σ_{i=1}^{n−1} V_i(π),  where V_i(π) = Σ_{j>i} I(π⁻¹(i) > π⁻¹(j)),
and the expected value can be shown to be:
E_θ(D_K) = n·e^θ / (1 − e^θ) − Σ_{j=1}^{n} j·e^{jθ} / (1 − e^{jθ})
E_θ(D_K) is monotonic (decreasing in |θ|), so the θ matching the RHS can be found quickly with a line search.
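Since the M-step reduces to matching E_θ(D_K) to the average distance computed on the RHS, a one-dimensional search suffices. A hedged sketch (ours, assuming the closed form above) using bisection:

```python
import math

def expected_kendall(theta, n):
    """E_theta(D_K) = n*e^t/(1-e^t) - sum_{j=1..n} j*e^{j*t}/(1-e^{j*t}), theta < 0.
    It increases from 0 (theta -> -inf) to n(n-1)/4 (theta -> 0, the uniform case)."""
    e = math.exp(theta)
    return (n * e / (1.0 - e)
            - sum(j * math.exp(j * theta) / (1.0 - math.exp(j * theta))
                  for j in range(1, n + 1)))

def solve_theta(target, n, lo=-50.0, hi=-1e-6, iters=80):
    """Line search (bisection) for the theta <= 0 whose expected distance
    matches the target average distance computed on the RHS of the M-step."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_kendall(mid, n) < target:
            lo = mid          # expected distance too small: move theta toward 0
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta = solve_theta(target=3.0, n=10)       # M-step update for one ranker
print(theta, expected_kendall(theta, 10))   # expected distance is ~3.0
```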

18 Case 1: Combining Permutations [RHS]
Sampling from the base chain of random transpositions:
- Start with a random permutation.
- If the chain is at π, randomly transpose two objects to form π′.
- If a = p(π′ | θ, σ) / p(π | θ, σ) ≥ 1, the chain moves to π′; otherwise, it moves to π′ with probability a.
- Note that the distance can be computed incrementally, i.e. by adding the change due to a single transposition.
- Convergence in O(n log n) steps if d is Cayley's distance [Diaconis 98], and likely similar for some other distances; there are no convergence results for the general case, but it works well in practice.
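A compact sketch of this sampler (our illustration; the normalizer cancels in the acceptance ratio, so only unnormalized probabilities are needed, and for simplicity the distance is recomputed from scratch rather than incrementally):

```python
import math
import random
from itertools import combinations

def kendall_tau(pi, sigma):
    pos_p = {x: i for i, x in enumerate(pi)}
    pos_s = {x: i for i, x in enumerate(sigma)}
    return sum((pos_p[x] < pos_p[y]) != (pos_s[x] < pos_s[y])
               for x, y in combinations(pi, 2))

def mh_sample(votes, thetas, steps=5000, seed=0):
    """Metropolis-Hastings over S_n with a random-transposition proposal,
    targeting p(pi | sigma, theta) proportional to exp(sum_i theta_i * d(pi, sigma_i))."""
    rng = random.Random(seed)
    pi = list(votes[0])
    rng.shuffle(pi)                                   # random starting permutation

    def log_p(p):
        return sum(t * kendall_tau(p, s) for t, s in zip(thetas, votes))

    cur = log_p(tuple(pi))
    for _ in range(steps):
        i, j = rng.sample(range(len(pi)), 2)          # propose transposing two objects
        pi[i], pi[j] = pi[j], pi[i]
        new = log_p(tuple(pi))
        if new >= cur or rng.random() < math.exp(new - cur):
            cur = new                                 # accept the move
        else:
            pi[i], pi[j] = pi[j], pi[i]               # reject: undo the transposition
    return tuple(pi)

votes = [(1, 2, 3, 4, 5), (1, 2, 3, 5, 4), (2, 1, 3, 4, 5)]
print(mh_sample(votes, thetas=[-1.5, -1.0, -1.0]))
```

During learning one would average d(π, σ_i) over many such samples to estimate the RHS; here only the final state is returned for brevity.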

19 Case 1: Combining Permutations [RHS]
An alternative heuristic: a weighted Borda count, i.e. linearly combine the ranks of each object and argsort. The model parameters represent relative expertise, so it makes sense to weight ranker i as w_i = e^(−θ_i):
score(x) = w_1·r_1(x) + w_2·r_2(x) + … + w_K·r_K(x), then argsort.
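A sketch of this heuristic (assuming the weighting w_i = e^(−θ_i) as read above, so that judges with larger |θ_i| count more):

```python
import math

def weighted_borda(votes, thetas):
    """Weighted Borda count: combine the rank each judge assigns to every object,
    scaling judge i by w_i = exp(-theta_i), then argsort the combined scores
    (smallest combined rank first)."""
    weights = [math.exp(-t) for t in thetas]
    combined = {}
    for w, vote in zip(weights, votes):
        for rank, obj in enumerate(vote):
            combined[obj] = combined.get(obj, 0.0) + w * rank
    return tuple(sorted(combined, key=combined.get))

# One expert judge and two near-random judges that happen to agree with each other:
votes = [(1, 2, 3, 4), (4, 3, 2, 1), (4, 3, 2, 1)]
print(weighted_borda(votes, thetas=[-2.0, -0.1, -0.1]))  # -> (1, 2, 3, 4), following the expert
```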

20 Case 2: Combining Top-k [LHS]
We extend Kendall's tau to top-k lists. Comparing two top-k lists, let Z be the set of elements appearing in both (the z white boxes) and let r be the number of elements appearing in only one of them (the grey boxes), with r + z = k:
D̃_K(π̃) = Σ_{i: π̃⁻¹(i) ∉ Z} Ũ_i(π̃) + r(r+1)/2 + Σ_{i: π̃⁻¹(i) ∈ Z} Ṽ_i(π̃)
Construction: bring the grey boxes to the bottom, switch them with the objects at positions (k+1) and beyond, and compute Kendall's tau for the k elements.

21 Case 2: Combining Top-k [LHS & RHS]
The r.v.'s Ũ_i and Ṽ_i are independent, so we can use the same trick to show that the [LHS] is:
E_θ(D̃_K) = k·e^θ / (1 − e^θ) − Σ_{j=r+1}^{k} j·e^{jθ} / (1 − e^{jθ}) + r(r+1)/2 − r(z+1)·e^{θ(z+1)} / (1 − e^{θ(z+1)})
- This is also monotonic (decreasing in |θ|), so a line search can again be used.
- Both D̃_K and E_θ(D̃_K) reduce to the Kendall tau results when the same elements are ranked in both lists, i.e. r = 0.
- Sampling / heuristics for the [RHS] and inference are similar to the permutation case.

22 Outline
Introduction and background:
- Motivation
- Problem statement
- Overview of our approach
- Background: Mallows models / extended Mallows models
Our contribution:
- Unsupervised learning and inference
- Incorporating domain-specific expertise
- Instantiations of the framework: combining permutations / top-k lists
- Experiments
- Dependency parsing
- Conclusions

23 Exp. 1 Combining permutations
- Judges: K = 10 (Mallows models); objects: n = 30; Q = 10 sets of votes.
- Three ways of estimating the RHS are compared: sampling, the weighted (e^(−θ_i)) Borda heuristic, and using the true rankings.
(Plot: average D_K to the true permutation vs. EM iteration, for the sampling, weighted, and true variants.)

24 Exp. 2 Meta-search dispersion parameters
- Judges: K = 4 search engines (S1, S2, S3, S4); documents: top k = 100; queries: Q = 50.
- Define Mean Reciprocal Page Rank (MRPR): the mean reciprocal rank of the results page containing the correct document. Our model achieves an MRPR of 0.92.
- The learned model parameters correspond to ranker quality.

25 Exp. 3 Top-k rankings: robustness to noise
- Judges: K = 38 TREC-3 ad-hoc retrieval shared task participants; documents: top k = 100; queries: Q = 50.
- We replaced K_r ∈ [0, K] randomly chosen participants with random rankers.
- Baseline: rank objects according to the score
CombMNZ_rank(x, q) = N_x · Σ_{i=1}^{K} (k − r_i(x, q)),
where r_i(x, q) is the rank of x returned by participant i for query q, and N_x is the number of participants with x in their top-k.
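A sketch of this baseline as written on the slide (our own code; ranks are taken 1-based within each top-k list):

```python
def comb_mnz_rank(result_lists, k):
    """CombMNZ_rank(x) = N_x * sum_i (k - r_i(x)): each participant returning x
    contributes (k - rank of x), and the sum is multiplied by N_x, the number
    of participants with x in their top-k. Documents are sorted by this score."""
    points, count = {}, {}
    for ranking in result_lists:
        for rank, doc in enumerate(ranking[:k], start=1):
            points[doc] = points.get(doc, 0) + (k - rank)
            count[doc] = count.get(doc, 0) + 1
    return sorted(points, key=lambda d: count[d] * points[d], reverse=True)

runs = [["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d2", "d5", "d1"]]
print(comb_mnz_rank(runs, k=3))  # d2 first: returned near the top by all three runs
```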

26 Exp. 3 Top-k rankings: robustness to noise
The model learns to discard the random rankers without supervision.
(Plot: precision vs. the number of random rankers K_r, comparing Aggregation and CombMNZ_rank at Top-10 and Top-30.)

27 Outline
Introduction and background:
- Motivation
- Problem statement
- Overview of our approach
- Background: Mallows models / extended Mallows models
Our contribution:
- Unsupervised learning and inference
- Incorporating domain-specific expertise
- Instantiations of the framework: combining permutations / top-k lists
- Experiments
- Dependency parsing
- Conclusions

28 Dependency Parses
(Figure: two dependency parses, v and y, of the sentence "Buyers stepped in to the futures pit", with labels such as SBJ, ROOT, ADV, AMOD, NMOD, PMOD.)
- Let v(i) denote a pair of head offset and label, i.e. a link.
- The labeled attachment score is link accuracy, trivially computed from the Hamming distance:
d_H(v, y) = Σ_{i=1}^{n} I(v(i) ≠ y(i))
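A small sketch of this distance (ours; the example links are illustrative, not the parses from the figure), with each parse given as a list of (head index, label) pairs, one per word:

```python
def hamming(v, y):
    """d_H(v, y): the number of words whose (head, label) link differs between
    the two parses; labeled attachment score is then 1 - d_H / n."""
    assert len(v) == len(y)
    return sum(a != b for a, b in zip(v, y))

# "Buyers stepped in to the futures pit" -- two hypothetical, partly disagreeing parses
v = [(2, "SBJ"), (0, "ROOT"), (2, "ADV"), (3, "AMOD"), (7, "NMOD"), (7, "NMOD"), (4, "PMOD")]
y = [(2, "SBJ"), (0, "ROOT"), (2, "ADV"), (2, "ADV"),  (7, "NMOD"), (7, "NMOD"), (4, "PMOD")]
print(hamming(v, y))               # 1 link disagrees
print(1 - hamming(v, y) / len(v))  # labeled attachment score ~ 0.857
```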

29 Dependency Parses: Parameter Estimation
Parameter estimation for the type-agnostic model can be done directly. Assume there are exactly S possibilities for each link, and that the j-th of the Q sentences has n^(j) words, with Σ_{j=1}^{Q} n^(j) = N. Each round of training for the type-agnostic model is then equivalent to computing, for each ranker i:
R_i = (1/N) Σ_{j=1}^{Q} Σ_{l=1}^{n^(j)} [ Σ_{v ∈ S, v ≠ y_i^(j)(l)} exp(Σ_{k=1}^{K} θ_k · I(v ≠ y_k^(j)(l))) ] / [ Σ_{v ∈ S} exp(Σ_{k=1}^{K} θ_k · I(v ≠ y_k^(j)(l))) ]
θ_i = log R_i − log(1 − R_i) − log(S − 1)
With small S, parameter estimation can be done quickly.
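A sketch of one such round (our reading of the update above; `votes[j][l][k]` is judge k's link, encoded as an integer in range(S), for word l of sentence j):

```python
import math

def update_thetas(votes, thetas, S):
    """One round of the type-agnostic update: R_i is judge i's posterior-expected
    disagreement rate over all N links, and theta_i = log R_i - log(1 - R_i) - log(S - 1),
    which is negative whenever judge i disagrees with the posterior less often than chance."""
    K = len(thetas)
    N = sum(len(sent) for sent in votes)
    R = [0.0] * K
    for sent in votes:
        for link_votes in sent:
            # unnormalized posterior over the S candidate links for this word
            w = [math.exp(sum(thetas[k] * (v != link_votes[k]) for k in range(K)))
                 for v in range(S)]
            z = sum(w)
            for k in range(K):
                R[k] += sum(wv for v, wv in enumerate(w) if v != link_votes[k]) / z
    R = [r / N for r in R]
    return [math.log(r) - math.log(1.0 - r) - math.log(S - 1) for r in R]

# Toy data: Q = 2 sentences, K = 3 judges, S = 4 candidate links per word
votes = [[(0, 0, 1), (2, 2, 2)], [(1, 1, 3), (0, 2, 0)]]
print(update_thetas(votes, thetas=[-1.0, -1.0, -1.0], S=4))
```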

30 Dependency Parses: Aggregation
- The dependency parsers are CoNLL-2007 shared task participants.
- 10 languages: Arabic, Basque, Catalan, Chinese, Czech, English, Greek, Hungarian, Italian, and Turkish.
- 131 to 690 sentences and 4513 to 5390 words, depending on the language.
- Between 20 and 23 systems, depending on the language.
- We varied the number of participants, attempting to represent expertise in the entire pool.
- Baseline: majority vote on each link (ties broken randomly).
(Figure: participants split into groups 1, …, K by accuracy.)

31 Predicting Relative Performance
The estimated relative expertise correlates with the true relative expertise. Results for Italian (labeled attachment):
(Table: participants listed with their true rank and true performance; rows include sagae@is.s.u-tokyo.ac.jp, nakagawa378@oki.com, johan.hall@vxu.se, carreras@csail.mit.edu, chenwl@nict.go.jp, attardi@di.unipi.it, xyduan@nlpr.ia.ac.cn, ivan.titov@cui.unige.ch, dasmith@jhu.edu, michael.schiehlen@ims.uni-stuttgart.de, bcbb@db.csie.ncu.edu.tw, prashanth@research.iiit.ac.in, richard@cs.lth.se, nguyenml@jaist.ac.jp, joyce840205@gmail.com, s.v.m.canisius@uvt.nl, francis.maes@lip6.fr, zeman@ufal.mff.cuni.cz, svetoslav.marinov@his.se.)

32 Dependency Parses: Aggregation
- Performance is measured as the average accuracy over the 10 languages.
- The largest improvement comes when there are fewer experts (higher practical significance); as the number of good experts grows, the voted baseline becomes harder to beat.
(Plot: average labeled attachment score, roughly 79 to 83, vs. the number of participants, for the voted baseline and the aggregation model.)

33 Conclusions
- We propose a formal mathematical and algorithmic framework for aggregating (partial) structured labels without supervision.
- We show that learning can be made efficient for decomposable distance functions.
- We instantiate the framework for combining permutations, combining top-k lists, and combining dependency parses.
- We introduce novel distance functions for top-k lists and dependency parses.

34 Thanks!
