Unsupervised Rank Aggregation with Distance-Based Models
Slide 1: Unsupervised Rank Aggregation with Distance-Based Models

Kevin Small, Tufts University
Collaborators: Alex Klementiev (Johns Hopkins University), Ivan Titov (Saarland University), Dan Roth (University of Illinois)
Slide 2: Motivation

- Consider a panel of judges, each independently generating (partial) rankings over objects, to the best of their ability, for a given query.
- The need to meaningfully aggregate their output is a fundamental problem.
- Applications are plentiful in Information Retrieval and Natural Language Processing.
Slide 3: Multilingual Named Entity Discovery

Named Entity Discovery [Klementiev & Roth, ACL 06]: given a bilingual corpus, one side of which is annotated with Named Entities, find their counterparts in the other side.

[Figure: candidate list for the NE "guimaraes" with per-judge ranks r1, r2, r3, ...]

- NEs are often transliterated: rank according to a transliteration model score.
- NEs tend to co-occur across languages: rank according to temporal alignment.
- NEs tend to co-occur in similar contexts: rank according to contextual similarity.
- NEs tend to co-occur in similar topics: rank according to topic similarity.
- etc.
Slide 4: Overview of Our Approach

- We propose a formal framework for unsupervised structured label aggregation.
- Judges independently generate a (partial) labeling, attempting to reproduce the true underlying label based on their expertise in a given domain.
- We derive an EM-based algorithm treating the votes of individual judges as the observed data and the true label as the unobserved data.
- Intuition: experts in a given domain are better at generating votes close to the true ranking and will tend to agree with each other, while the non-experts will not.
- We instantiate the framework for the cases of combining permutations, combining top-k lists, and combining dependency parses.
Slide 5: Notation

- A permutation σ over n objects x_1, ..., x_n; e = (1, 2, ..., n) is the identity permutation.
- S_n is the set of all n! permutations.
- A distance d : S_n × S_n → R⁺ between permutations, e.g. Kendall's tau distance: the minimum number of adjacent transpositions needed to transform one permutation into the other.
- d is assumed to be invariant to arbitrary re-labeling of the n objects (right-invariance), so d(π, σ) = d(e, σπ⁻¹) ≡ D(σπ⁻¹). If σ is a random variable, so is D = D(σ).

[Figure: worked example computing d_K between two small permutations, e.g. d_K = 3 for a reversed pair.]
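To make the distance concrete, here is a minimal sketch (not from the talk; it assumes permutations are represented as Python tuples listing objects in rank order) that computes Kendall's tau by counting discordant pairs:

```python
from itertools import combinations

def kendall_tau(pi, sigma):
    """Kendall's tau distance: number of object pairs ordered differently
    by the two permutations (equivalently, the minimum number of adjacent
    transpositions turning one into the other)."""
    # position[x] = rank of object x under the permutation
    pos_pi = {x: r for r, x in enumerate(pi)}
    pos_sigma = {x: r for r, x in enumerate(sigma)}
    return sum(
        1
        for x, y in combinations(pi, 2)
        if (pos_pi[x] - pos_pi[y]) * (pos_sigma[x] - pos_sigma[y]) < 0
    )

assert kendall_tau((1, 2, 3), (3, 2, 1)) == 3  # full reversal of n=3
```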
Slide 6: Background: Mallows Models

p(π | θ, σ) = exp(θ · d(π, σ)) / Z(θ, σ)

where θ ∈ R, θ ≤ 0 is the dispersion parameter and σ ∈ S_n is the location parameter.

- Uniform when θ = 0; peaky when |θ| is large.
- The normalizer Z(θ, σ) is expensive to compute in general, but d(·,·) is right-invariant, so Z(θ, σ) does not depend on σ.
- If D can be decomposed as D(π) = Σ_{i=1}^m V_i(π), where the V_i are independent random variables, then E_θ(D) may be efficient to compute [Fligner and Verducci 86].
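A brute-force sketch of this probability mass function (it reuses `kendall_tau` from the sketch above and enumerates all n! permutations for the normalizer, so it is only meant to illustrate the θ = 0 vs. large |θ| behavior on tiny n):

```python
import math
from itertools import permutations

def mallows_pmf(pi, sigma, theta, n):
    """p(pi | theta, sigma) = exp(theta * d(pi, sigma)) / Z(theta, sigma);
    brute-force normalizer, feasible only for small n."""
    Z = sum(math.exp(theta * kendall_tau(p, sigma)) for p in permutations(range(n)))
    return math.exp(theta * kendall_tau(pi, sigma)) / Z

sigma = (0, 1, 2, 3)
print(mallows_pmf(sigma, sigma, theta=0.0, n=4))   # uniform: 1/24
print(mallows_pmf(sigma, sigma, theta=-2.0, n=4))  # sharply peaked at sigma
```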
Slide 7: Generative Story for Aggregation

- Generate the true π according to the prior p(π).
- Draw σ_1, ..., σ_K independently from K Mallows models p(σ_i | θ_i, π), all with the same location parameter π.

p(π, σ | θ) = p(π) ∏_{i=1}^K p(σ_i | θ_i, π)
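A sketch of this generative story under the same small-n, exhaustive-enumeration assumptions as the earlier sketches (`kendall_tau` comes from the notation slide):

```python
import math
import random
from itertools import permutations

def sample_votes(pi_true, thetas, rng=random.Random(0)):
    """Draw one vote per judge from K Mallows models that share the
    location parameter pi_true; judge i uses dispersion thetas[i] <= 0."""
    n = len(pi_true)
    perms = list(permutations(range(n)))
    votes = []
    for theta in thetas:
        weights = [math.exp(theta * kendall_tau(p, pi_true)) for p in perms]
        votes.append(rng.choices(perms, weights=weights, k=1)[0])
    return votes

votes = sample_votes((0, 1, 2, 3, 4), thetas=[-2.0, -1.0, -0.1])
```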
Slide 8: Background: Extended Mallows Models

The associated conditional model (when the votes σ ∈ S_n^K of K judges are available) was proposed in [Lebanon and Lafferty 02]:

p(π | σ, θ) = p(π) exp(Σ_{i=1}^K θ_i · d(π, σ_i)) / Z(θ, σ)

- Free parameters θ ∈ R^K, θ ≤ 0, represent the degree of expertise of the individual judges.
- It is straightforward to generalize both models to partial rankings by constructing appropriate distance functions.
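Only the unnormalized conditional is needed by the samplers and heuristics that follow, so a sketch can stay tiny (a uniform prior p(π) is assumed here; `kendall_tau` is from the earlier sketch):

```python
def extended_mallows_log_score(pi, votes, thetas):
    """Log of the unnormalized extended Mallows conditional:
    sum_i theta_i * d(pi, sigma_i), assuming a uniform prior p(pi)."""
    return sum(theta * kendall_tau(pi, sigma) for theta, sigma in zip(thetas, votes))
```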
Slide 9: Outline

Introduction and background:
- Motivation
- Problem Statement
- Overview of our approach
- Background: Mallows models / Extended Mallows models

Our contribution:
- Unsupervised Learning and Inference
- Incorporating domain-specific expertise
- Instantiations of the framework: combining permutations / top-k lists
- Experiments
- Dependency Parsing
- Conclusions
Slide 10: Our Approach [ICML 2008]

- We propose a formal framework for unsupervised rank aggregation based on the extended Mallows model formalism.
- We derive an EM-based algorithm to estimate model parameters.
  - Observed data: the votes of the individual judges.
  - Unobserved data: the true ranking.

[Figure: Q query instances; for instance j, judges 1..K produce votes σ_1^{(j)}, ..., σ_K^{(j)}.]
Slide 11: Learning

Denoting by θ′ the value of the parameters from the previous iteration, the M-step for the i-th ranker solves (for θ_i):

E_{θ_i}(D) = (1/Q) Σ_{j=1}^Q Σ_{π^{(j)} ∈ S_n} d(π^{(j)}, σ_i^{(j)}) p(π^{(j)} | σ^{(j)}, θ′)

- LHS: more than n! computations in general.
- RHS: the average distance between the votes of the i-th ranker and the marginal of the unobserved data π^{(1..Q)}; on the order of Q · n! computations.
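A sketch of this M-step's RHS under the same small-n, exhaustive-enumeration assumptions as the earlier sketches (at realistic n, the inner sum over S_n is replaced by the Metropolis-Hastings sampler of the next slides; the LHS is then matched by the line search sketched under Case 1 below):

```python
import math
from itertools import permutations

def m_step_rhs(votes_per_query, thetas, n):
    """RHS of the M-step: for each ranker i, the average distance between
    its votes and the unobserved true ranking, under the posterior
    p(pi | sigma, theta') computed by exhaustive enumeration (small n)."""
    perms = list(permutations(range(n)))
    K, Q = len(thetas), len(votes_per_query)
    rhs = [0.0] * K
    for votes in votes_per_query:
        log_w = [extended_mallows_log_score(p, votes, thetas) for p in perms]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]  # shift for numerical stability
        Z = sum(w)
        for i in range(K):
            rhs[i] += sum(wp * kendall_tau(p, votes[i]) for wp, p in zip(w, perms)) / Z
    # each theta_i is then set so that E_theta(D_K) matches rhs[i] (see Case 1)
    return [r / Q for r in rhs]
```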
Slide 12: Learning and Inference

Learning (estimating θ): for the K constituent rankers, repeat:
- Estimate the RHS given the current parameter values:
  - sample with Metropolis-Hastings (depends on the structure type; more about this later), or
  - use heuristics.
- Solve the LHS to update θ: efficient estimation can be done for particular types of distance functions.

Inference (computing the most likely ranking): sample with Metropolis-Hastings or use heuristics.
Slide 13: Domain-specific expertise? [IJCAI 2009]

- Relative expertise may not stay the same: it may depend on the type of objects, or on the type of query.
- Typically, the ranked supervised data needed to estimate the judges' expertise is very expensive to obtain, especially for multiple types.
Slide 14: Mallows Models with Domain-Specific Expertise

The associated conditional model (when the votes σ ∈ S_n^K of K judges are available) can be derived:

p(π, t | σ, θ, α) = α_t exp(Σ_{i=1}^K θ_{t,i} · d(π, σ_i)) / Z(θ, σ)

- Free parameters θ ∈ R^{T×K}, θ ≤ 0, represent the degree of expertise of the individual judges; α ∈ R^T are the mixture weights.
- Note: it is straightforward to generalize these models to other structured labels (e.g. partial rankings) by constructing appropriate distance functions.
Slide 15: Learning

(1)  α_t = (1/Q) Σ_{j=1}^Q Σ_{π^{(j)} ∈ S_n} p(π^{(j)}, t | σ^{(j)}, θ′, α′)

(2)  E_{θ_{t,i}}(D) = (1/(α_t Q)) Σ_{j=1}^Q Σ_{π^{(j)} ∈ S_n} d(π^{(j)}, σ_i^{(j)}) p(π^{(j)}, t | σ^{(j)}, θ′, α′)

For each i-th ranker and t-th type:
- Estimate (1) and (2) given the current parameter values θ′ and α′.
- (3) Solve E_{θ_{t,i}}(D) to update θ_{t,i}.
- Repeat.
Slide 16: Instantiating the Framework

We have not committed to a particular type of structure. In order to instantiate the framework:
- Design a distance function appropriate for the setting: if the function is right-invariant and decomposable, [LHS] estimation can be done quickly.
- Design a sampling procedure for learning [RHS] and inference.
Slide 17: Case 1: Combining Permutations [LHS]

Kendall tau distance D_K is the minimum number of adjacent transpositions needed to transform one permutation into another. It can be decomposed into a sum of independent random variables:

D_K(π) = Σ_{i=1}^{n-1} V_i(π), where V_i(π) = Σ_{j>i} I(π⁻¹(i) > π⁻¹(j))

and the expected value can be shown to be:

E_θ(D_K) = n e^θ / (1 − e^θ) − Σ_{j=1}^n j e^{jθ} / (1 − e^{jθ})

The expectation is monotone in θ, so θ can be found quickly with a line search.
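A sketch of this [LHS] computation (closed-form expectation plus a bisection line search; as θ → 0 the expectation approaches the uniform limit n(n−1)/4):

```python
import math

def expected_kendall(theta, n):
    """E_theta(D_K) for the Mallows model under Kendall's tau
    [Fligner and Verducci 86]; theta < 0."""
    return n * math.exp(theta) / (1 - math.exp(theta)) - sum(
        j * math.exp(j * theta) / (1 - math.exp(j * theta)) for j in range(1, n + 1)
    )

def solve_theta(target, n, lo=-30.0, hi=-1e-9, iters=100):
    """Line search (bisection) for the theta with E_theta(D_K) = target;
    the expectation is monotone in theta, so bisection suffices."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_kendall(mid, n) < target:
            lo = mid  # the expectation grows as theta approaches 0
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In the M-step, `target` would be the per-ranker average distance returned by the RHS sketch above.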
Slide 18: Case 1: Combining Permutations [RHS]

Sampling from the base chain of random transpositions:
- Start with a random permutation.
- If the chain is at π, randomly transpose two objects, forming π′.
- If a = p(π′ | θ, σ) / p(π | θ, σ) ≥ 1, the chain moves to π′; else, the chain moves to π′ with probability a.
- Note that we can compute the distance incrementally, i.e. add only the change due to a single transposition.
- Convergence takes O(n log n) steps if d is Cayley's distance [Diaconis 98], and is likely similar for some others. There are no convergence results for the general case, but it works well in practice.
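A sketch of this sampler (it reuses `extended_mallows_log_score` from the earlier sketch; for clarity it recomputes the score from scratch, whereas the incremental update mentioned on the slide would only add the change due to the single transposition):

```python
import math
import random

def mh_sample(votes, thetas, n, steps=10000, rng=random.Random(0)):
    """Metropolis-Hastings over S_n with random-transposition proposals,
    targeting p(pi | sigma, theta); returns samples after burn-in."""
    pi = list(range(n))
    rng.shuffle(pi)
    score = extended_mallows_log_score(tuple(pi), votes, thetas)
    samples = []
    for step in range(steps):
        i, j = rng.sample(range(n), 2)
        pi[i], pi[j] = pi[j], pi[i]          # propose pi' by a transposition
        new_score = extended_mallows_log_score(tuple(pi), votes, thetas)
        # accept with probability min(1, p(pi') / p(pi))
        if new_score >= score or rng.random() < math.exp(new_score - score):
            score = new_score
        else:
            pi[i], pi[j] = pi[j], pi[i]      # reject: undo the transposition
        if step >= steps // 2:
            samples.append(tuple(pi))
    return samples
```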
Slide 19: Case 1: Combining Permutations [RHS]

An alternative heuristic: weighted Borda count, i.e. linearly combine the ranks of each object and argsort. The model parameters represent relative expertise, so it makes sense to weigh the rankers as w_i = e^{−θ_i}.
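A sketch of this heuristic (assuming the w_i = e^{−θ_i} weighting above, which gives larger weight to more expert judges since θ_i ≤ 0, and votes given as tuples of object indices in rank order):

```python
import math

def weighted_borda(votes, thetas):
    """Weighted Borda count: linearly combine each object's ranks with
    weights w_i = exp(-theta_i) and argsort the weighted rank sums
    (smaller total = ranked higher)."""
    n = len(votes[0])
    totals = [0.0] * n
    for theta, sigma in zip(thetas, votes):
        w = math.exp(-theta)
        for rank, obj in enumerate(sigma):
            totals[obj] += w * rank
    return tuple(sorted(range(n), key=lambda obj: totals[obj]))
```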
Slide 20: Case 2: Combining Top-k [LHS]

We extend Kendall tau to top-k lists. Consider the r elements that appear in only one of the two lists (grey boxes) and the z elements that appear in both (white boxes, the set Z), with r + z = k:

D̃_K(π̃) = Σ_{i=1, π̃⁻¹(i) ∉ Z}^{k} Ũ_i(π̃) + r(r+1)/2 + Σ_{i=1, π̃⁻¹(i) ∈ Z}^{k} Ṽ_i(π̃)

- Bring the grey boxes to the bottom.
- Switch with the objects in position (k+1).
- Kendall's tau for the k elements.
Slide 21: Case 2: Combining Top-k [LHS & RHS]

The random variables Ũ_i and Ṽ_i are independent, so we can use the same trick to show that the [LHS] is:

E_θ(D̃_K) = k e^θ / (1 − e^θ) − Σ_{j=r+1}^{k} j e^{jθ} / (1 − e^{jθ}) + r(r+1)/2 + r(z+1) e^{θ(z+1)} / (1 − e^{θ(z+1)})

- Also monotone, so we can again use a line search.
- Both D̃_K and E_θ(D̃_K) reduce to the Kendall tau results when the same elements are ranked in both lists, i.e. r = 0.
- Sampling / heuristics for the [RHS] and inference are similar to the permutation case.
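A sketch of the top-k expectation as reconstructed above (treat the exact form as a best-effort reading of the garbled slide); the r = 0 case collapses to `expected_kendall` with n = k:

```python
import math

def expected_topk_kendall(theta, k, r):
    """E_theta(D~_K) for top-k lists: r elements appear in only one list,
    z = k - r appear in both; reduces to the permutation formula when r = 0."""
    z = k - r
    e = math.exp
    return (
        k * e(theta) / (1 - e(theta))
        - sum(j * e(j * theta) / (1 - e(j * theta)) for j in range(r + 1, k + 1))
        + r * (r + 1) / 2
        + r * (z + 1) * e(theta * (z + 1)) / (1 - e(theta * (z + 1)))
    )
```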
Slide 22: Outline

Introduction and background:
- Motivation
- Problem Statement
- Overview of our approach
- Background: Mallows models / Extended Mallows models

Our contribution:
- Unsupervised Learning and Inference
- Incorporating domain-specific expertise
- Instantiations of the framework: combining permutations / top-k lists
- Experiments
- Dependency Parsing
- Conclusions
Slide 23: Exp. 1 Combining permutations

- Judges: K = 10 (Mallows models); objects: n = 30; Q = 10 sets of votes.
- The RHS is estimated with sampling, estimated with the weighted (e^{−θ_i}) Borda heuristic, and evaluated with the true rankings.

[Plot: average D_K to the true permutation vs. EM iteration, with curves for Sampling, Weighted Borda, and True.]
Slide 24: Exp. 2 Meta-search: dispersion parameters

- Judges: K = 4 search engines (S1, S2, S3, S4).
- Documents: top k = 100; queries: Q = 50.
- Define Mean Reciprocal Page Rank (MRPR): the mean reciprocal rank of the page containing the correct document. Our model gets 0.92.
- The learned model parameters correspond to ranker quality.
Slide 25: Exp. 3 Top-k rankings: robustness to noise

- Judges: K = 38 TREC-3 ad-hoc retrieval shared task participants.
- Documents: top k = 100 documents; queries: Q = 50.
- Replaced K_r ∈ [0, K] randomly chosen participants with random rankers.
- Baseline: rank objects according to the score

CombMNZ^rank(x, q) = N_x Σ_{i=1}^K (k − r_i(x, q))

where r_i(x, q) is the rank of x returned by participant i for query q, and N_x is the number of participants with x in their top-k.
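A sketch of this baseline (hypothetical representation: each vote is one judge's top-k list of document ids for a single query; documents absent from a list contribute nothing):

```python
from collections import defaultdict

def combmnz_rank(votes, k):
    """CombMNZ^rank: score(x) = N_x * sum_i (k - r_i(x)), where r_i(x) is
    x's rank in judge i's top-k list and N_x is the number of judges
    returning x; sort descending by score."""
    rank_sum, n_x = defaultdict(float), defaultdict(int)
    for sigma in votes:
        for rank, x in enumerate(sigma[:k]):
            rank_sum[x] += k - rank
            n_x[x] += 1
    return sorted(rank_sum, key=lambda x: -(n_x[x] * rank_sum[x]))
```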
Slide 26: Exp. 3 Top-k rankings: robustness to noise

We learn to discard the random rankers without supervision.

[Plot: precision vs. number of random rankers K_r, comparing Aggregation and CombMNZ^rank at Top-10 and Top-30.]
Slide 27: Outline

Introduction and background:
- Motivation
- Problem Statement
- Overview of our approach
- Background: Mallows models / Extended Mallows models

Our contribution:
- Unsupervised Learning and Inference
- Incorporating domain-specific expertise
- Instantiations of the framework: combining permutations / top-k lists
- Experiments
- Dependency Parsing
- Conclusions
Slide 28: Dependency Parses

[Figure: two dependency parses of "ROOT Buyers stepped in to the futures pit" with labels such as SBJ, ROOT, ADV, AMOD, NMOD, PMOD, P.]

- Let v(i) denote a pair of head offset and label, i.e. a link; a parse v assigns a link v(i) to each word i.
- The labeled attachment score is link accuracy, trivially computed from the Hamming distance:

d_H(v, y) = Σ_{i=1}^n I(v(i) ≠ y(i))
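A sketch of this distance under the link representation above (links as (head offset, label) pairs; the three-word example parses are illustrative, not from the corpus):

```python
def hamming_links(v, y):
    """Hamming distance between two parses of the same sentence; the
    labeled attachment score is then 1 - d_H(v, y) / n."""
    assert len(v) == len(y)
    return sum(1 for link_v, link_y in zip(v, y) if link_v != link_y)

parse_v = [(2, "SBJ"), (0, "ROOT"), (2, "ADV")]
parse_y = [(2, "SBJ"), (0, "ROOT"), (4, "ADV")]
assert hamming_links(parse_v, parse_y) == 1
```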
Slide 29: Dependency Parses: Parameter Estimation

Parameter estimation for the type-agnostic model can be done directly. Assume there are exactly S possibilities for each link, and that the j-th of Q sentences has n^{(j)} words, with Σ_{j=1}^Q n^{(j)} = N. Each round of the learning procedure for the type-agnostic model is then equivalent to:

R_i = (1/N) Σ_{j=1}^Q Σ_{l=1}^{n^{(j)}} [ Σ_{v ∈ S, v ≠ y_{i,(l)}^{(j)}} exp(Σ_{i′=1}^K θ_{i′} I(v ≠ y_{i′,(l)}^{(j)})) ] / [ Σ_{v ∈ S} exp(Σ_{i′=1}^K θ_{i′} I(v ≠ y_{i′,(l)}^{(j)})) ]

θ_i = log R_i − log(1 − R_i) − log(S − 1)

With small S, parameter estimation can be done quickly!
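A sketch of one round of this update as reconstructed above (treat it as a best-effort reading of the slide: R_i is parser i's expected disagreement with the per-link aggregate distribution; `parses[i][l]` is parser i's hashable (head, label) link for word l):

```python
import math

def update_parser_weights(parses_per_sentence, thetas, S):
    """One round: compute R_i under the current per-link distribution
    p(v) proportional to exp(sum_i theta_i * I(v != y_i)), then set
    theta_i = log R_i - log(1 - R_i) - log(S - 1)."""
    K = len(thetas)
    disagree = [0.0] * K
    N = 0
    for parses in parses_per_sentence:
        n_words = len(parses[0])
        N += n_words
        for l in range(n_words):
            links = [parses[i][l] for i in range(K)]
            proposed = set(links)
            w = {v: math.exp(sum(th for th, y in zip(thetas, links) if v != y))
                 for v in proposed}
            # the S - |proposed| unproposed link values disagree with everyone
            Z = sum(w.values()) + (S - len(proposed)) * math.exp(sum(thetas))
            for i in range(K):
                disagree[i] += 1.0 - w[links[i]] / Z
    R = [min(max(d / N, 1e-6), 1 - 1e-6) for d in disagree]  # clamp for the logs
    return [math.log(r) - math.log(1 - r) - math.log(S - 1) for r in R]
```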
Slide 30: Dependency Parses: Aggregation

- The dependency parsers are CoNLL-2007 shared task participants.
- 10 languages: Arabic, Basque, Catalan, Chinese, Czech, English, Greek, Hungarian, Italian, and Turkish.
- 131 to 690 sentences and 4,513 to 5,390 words, depending on the language.
- Between 20 and 23 systems, depending on the language.
- We varied the number of participants, attempting to represent expertise in the entire pool.
- Baseline: majority vote on each link (ties broken randomly).

[Figure: participants grouped by accuracy into Group 1 ... Group K.]
Slide 31: Predicting Relative Performance

The estimated relative expertise correlates with the true relative expertise. Results for Italian (labeled attachment):

[Table: participant vs. true rank and true performance; numeric values lost in transcription. Participants: sagae@is.s.u-tokyo.ac.jp, nakagawa378@oki.com, johan.hall@vxu.se, carreras@csail.mit.edu, chenwl@nict.go.jp, attardi@di.unipi.it, xyduan@nlpr.ia.ac.cn, ivan.titov@cui.unige.ch, dasmith@jhu.edu, michael.schiehlen@ims.uni-stuttgart.de, bcbb@db.csie.ncu.edu.tw, prashanth@research.iiit.ac.in, richard@cs.lth.se, nguyenml@jaist.ac.jp, joyce840205@gmail.com, s.v.m.canisius@uvt.nl, francis.maes@lip6.fr, zeman@ufal.mff.cuni.cz, svetoslav.marinov@his.se]
Slide 32: Dependency Parses: Aggregation

- Performance is measured as the average accuracy over the 10 languages.
- The largest improvement comes when there are fewer experts (higher practical significance); as the number of good experts grows, the voted baseline is harder to beat.

[Plot: average labeled attachment score (roughly 79 to 83) vs. number of participants, for the Aggregation Model and the Voted Baseline.]
Slide 33: Conclusions

- We propose a formal mathematical and algorithmic framework for aggregating (partial) structured labels without supervision.
- We show that learning can be made efficient for decomposable distance functions.
- We instantiate the framework for combining permutations, combining top-k lists, and combining dependency parses.
- We introduce a novel distance function for top-k lists and for dependency parses.
Slide 34: Thanks!
More information