Multi-Task Structured Prediction for Entity Analysis: Search-Based Learning Algorithms
1 Multi-Task Structured Prediction for Entity Analysis: Search-Based Learning Algorithms Chao Ma, Janardhan Rao Doppa, Prasad Tadepalli, Hamed Shahbazi, Xiaoli Fern Oregon State University, Washington State University Nov. 17th, 2017
2 Entity Analysis in Language Processing Many NLP tasks process mentions of entities: things, people, organizations, etc. Examples include Named Entity Recognition, Coreference Resolution, Entity Linking, Semantic Role Labeling, and Entity Relation Extraction.
3-14 Coreference Resolution Example (mention i = 1: [Columbia], mention i = 2: [Columbia University]): He left [Columbia] in 1983 with a BA degree, ... after graduating from [Columbia University], he worked as a community organizer in Chicago. Coreference: y_i ∈ {1, 2, ..., i} — each mention links to an earlier co-referent mention, or to itself to start a new cluster; here Columbia and Columbia University are co-referent. Left-linking tree formulation: scan the mentions m_1, ..., m_7 left to right and fill in one link at a time, ycoref = (?, ?, ?, ?, ?, ?, ?) → (1, ?, ?, ?, ?, ?, ?) → ... → (1, 1, 2, 4, 4, 6, 7), which induces the coreference clustering {m_1, m_2, m_3}, {m_4, m_5}, {m_6}, {m_7}.
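To make the left-linking formulation concrete, here is a minimal sketch (function name and conventions are ours, not the talk's) that decodes a left-link vector into clusters, treating a self-link as the start of a new cluster:

```python
def leftlinks_to_clusters(links):
    """Convert a 1-indexed left-link vector into coreference clusters.
    links[i-1] == i means mention i links to itself (a new cluster);
    otherwise it joins the cluster of the earlier mention it links to."""
    cluster_of = {}   # mention index -> cluster id
    clusters = []     # list of clusters, each a list of mention indices
    for i, link in enumerate(links, start=1):
        if link == i:                      # self-link: start a new cluster
            cluster_of[i] = len(clusters)
            clusters.append([i])
        else:                              # join an earlier mention's cluster
            cid = cluster_of[link]
            cluster_of[i] = cid
            clusters[cid].append(i)
    return clusters

print(leftlinks_to_clusters((1, 1, 2, 4, 4, 6, 7)))
# -> [[1, 2, 3], [4, 5], [6], [7]]
```

With the example's link vector this reproduces the clustering {m1, m2, m3}, {m4, m5}, {m6}, {m7}.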
15-16 Named Entity Recognition and Entity Linking For the same example — He left [Columbia] in 1983 with a BA degree, ... after graduating from [Columbia University], he worked as a community organizer in Chicago — the three tasks are: Coreference: ycoref = (Columbia ↔ Columbia University, ...). Named Entity Recognition: yner = (ORG, ORG, ...), with y_i ∈ {ORG, PER, GPE, LOC, FAC, VEH, WEA}. Entity Linking: ylink = (en.wikipedia.org/wiki/Columbia_University, ...), with y_i ranging over knowledge-base entries such as Columbia_University.
17 Single-Task Structured Prediction Typical (single-task) structured prediction: from input x, learn a scoring function f(x, y) = w · ϕ(x, y) with f : X × Y → ℝ, where ϕ(x, y) is a feature vector over the input and the structured output. Inference: ŷ = argmax_y f(x, y), which is intractable in most cases. Candidate learning methods: graphical models, Structured Perceptron, Structured SVM (this work). Candidate inference methods: Belief Propagation, Integer Linear Programming (ILP), Beam Search (this work).
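Since the exact argmax is intractable, the talk turns to beam search for inference. A hedged sketch of beam-search decoding for a linear scoring function f(x, y) = w · ϕ(x, y), with an illustrative toy feature function (everything here is an assumption, not the authors' code):

```python
import numpy as np

def beam_search(domains, phi, w, beam_width=4):
    """Fill output variables left to right, keeping only the top-scoring
    partial outputs under f = w . phi(partial) at each step."""
    beam = [()]  # partial assignments, as tuples
    for domain in domains:
        candidates = [p + (v,) for p in beam for v in domain]
        candidates.sort(key=lambda p: float(w @ phi(p)), reverse=True)
        beam = candidates[:beam_width]
    return beam[0]

# toy problem: three binary variables; the score just sums the values,
# so the best full output is all ones
phi = lambda p: np.array([sum(p), len(p)])
w = np.array([1.0, 0.0])
print(beam_search([[0, 1], [0, 1], [0, 1]], phi, w))  # -> (1, 1, 1)
```

Beam width trades accuracy for speed: width 1 is greedy decoding, larger widths explore more of the output space.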
18 Structured SVM Learning with Search-based Inference
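As a rough illustration of learning with search-based inference, the sketch below uses a structured-perceptron-style update (a common stand-in for the SSVM margin update): run inference with the current weights and, on a mistake, move the weights toward the gold output's features. All interfaces here are illustrative assumptions, not the talk's implementation:

```python
import numpy as np

def train_structured(examples, phi, infer, dim, epochs=5, lr=0.1):
    """examples: (x, y_gold) pairs; infer(x, w) -> predicted output
    (e.g. via beam search); phi(x, y) -> feature vector. On a mistake,
    move the weights toward the gold features, away from the prediction."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y_gold in examples:
            y_hat = infer(x, w)
            if y_hat != y_gold:
                w += lr * (phi(x, y_gold) - phi(x, y_hat))
    return w

# toy task: binary input, binary output, one-hot features over (x, y)
phi = lambda x, y: np.eye(4)[2 * x + y]
infer = lambda x, w: max((0, 1), key=lambda y: float(w @ phi(x, y)))
w = train_structured([(0, 0), (1, 1)], phi, infer, dim=4)
print(infer(0, w), infer(1, w))  # -> 0 1
```

The real SSVM additionally enforces a margin scaled by the task loss; the update direction is the same.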
19 Multi-Task Structured Prediction Multi-task structured prediction (MTSP): from input x, learn one model per task with intra-task features, f_k : X → Y_k scored by f_k(x, y) = w_k · ϕ_k(x, y) for k = 1, 2, 3. How do we exploit the interdependencies between tasks?
20 Multi-Task Structured Prediction Introduce inter-task features ϕ_(1,2)(x, y, y'), ϕ_(1,3)(x, y, y'), ϕ_(2,3)(x, y, y') that relate pairs of task outputs, alongside the intra-task features ϕ_1, ϕ_2, ϕ_3.
21-24 Pipeline Architecture Learn k (= 3) models, one after another. Before start: fix an order, Task 1 → Task 2 → Task 3; the weights to learn are w_1, w_2, w_(1,2), w_3, w_(1,3), w_(2,3). Task 1: run the SSVM learner on x with features ϕ_1(x, y) to train w_1, then predict. Task 2: train on ϕ_2(x, y) and ϕ_(1,2)(x, y, y'), using Task 1's predictions, to get w_2 and w_(1,2), then predict. Task 3: train on ϕ_3(x, y), ϕ_(1,3)(x, y, y'), and ϕ_(2,3)(x, y, y'), using the predictions of Tasks 1 and 2, to get w_3, w_(1,3), and w_(2,3).
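The pipeline stages can be sketched as a loop where each task's learner sees, as extra features, the predictions of the previously trained tasks. The toy majority-vote learner below is a stand-in for the SSVM learner; all names and interfaces are ours:

```python
from collections import Counter, defaultdict

class MajorityLearner:
    """Toy stand-in for the SSVM learner: memorizes the majority label
    seen for each feature tuple at training time."""
    def fit(self, feats, labels):
        by_feat = defaultdict(Counter)
        for f, y in zip(feats, labels):
            by_feat[tuple(f)][y] += 1
        self.table = {f: c.most_common(1)[0][0] for f, c in by_feat.items()}
        return self

    def predict(self, f):
        return self.table.get(tuple(f), 0)

def train_pipeline(xs, gold_per_task, k):
    """Train k tasks in a fixed order; task t's features are the raw input
    plus the predictions of tasks 0..t-1 (the inter-task features)."""
    models, preds = [], []
    for t in range(k):
        feats = [[x] + [preds[s][i] for s in range(t)] for i, x in enumerate(xs)]
        m = MajorityLearner().fit(feats, gold_per_task[t])
        models.append(m)
        preds.append([m.predict(f) for f in feats])
    return models, preds

# toy data: task 0 echoes the input, task 1 flips it
models, preds = train_pipeline([0, 1, 0, 1], [[0, 1, 0, 1], [1, 0, 1, 0]], k=2)
print(preds[1])  # -> [1, 0, 1, 0]
```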
25 Pipeline Performance Depends on Task Order Each group of bars represents one task; within a group, the bars show the task's accuracy when it is placed first (1st bar) versus last (2nd and 3rd bars) in the pipeline order. Every task performs better when placed last, and no single ordering lets the pipeline reach peak performance on all three tasks.
26 Joint Architecture Tasks 1 & 2 & 3 together: one SSVM learner trains on all features — ϕ_1(x, y), ϕ_2(x, y), ϕ_3(x, y), ϕ_(1,2)(x, y, y'), ϕ_(1,3)(x, y, y'), ϕ_(2,3)(x, y, y') — learning w_1, w_2, w_3, w_(1,2), w_(1,3), w_(2,3) jointly, then predicts all outputs at once. The joint feature vector is the concatenation ϕ = ϕ_1 ⊕ ϕ_2 ⊕ ϕ_3 ⊕ ϕ_(1,2) ⊕ ϕ_(1,3) ⊕ ϕ_(2,3). Big problem: huge branching factor for search.
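The joint feature vector here is plain vector concatenation, e.g. with numpy (block sizes are illustrative, not the real feature dimensions):

```python
import numpy as np

def joint_features(intra, inter):
    """Concatenate the intra-task blocks phi_t and the inter-task blocks
    phi_(s,t) into one joint vector, scored by one joint weight vector."""
    return np.concatenate(list(intra) + list(inter))

phi1, phi2, phi3 = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0])
phi12, phi13, phi23 = np.array([3.0]), np.array([4.0]), np.array([5.0])
phi = joint_features([phi1, phi2, phi3], [phi12, phi13, phi23])
print(phi.tolist())  # -> [1.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```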
27 Pruning A pruner is a classifier that prunes the domain of each variable using state features. Score-agnostic pruning: train the pruner on the training data first, then train the SSVM cost function on the pruned data. It speeds up training, but may or may not improve test accuracy. Score-sensitive pruning: train the SSVM cost function first, then train the pruner against it. It can improve test accuracy; there is no training speedup, but evaluation does speed up.
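A pruner in this spirit can be sketched as a cheap per-candidate filter that shrinks each variable's domain before the more expensive structured search; the scoring function and the keep-top-k rule below are illustrative assumptions:

```python
def prune_domains(domains, pruner_score, keep=3):
    """For each variable, keep only the top-'keep' candidate values under a
    cheap pruner score, shrinking the space the structured search explores."""
    pruned = []
    for candidates in domains:
        ranked = sorted(candidates, key=pruner_score, reverse=True)
        pruned.append(ranked[:keep])
    return pruned

# toy pruner: prefer values near 3 (a stand-in for a learned classifier score)
pruned = prune_domains([list(range(10)), list(range(6))],
                       pruner_score=lambda v: -abs(v - 3))
print(pruned)  # -> [[3, 2, 4], [3, 2, 4]]
```

The branching factor of the search drops from the full domain size to `keep`, which is where the training/evaluation speedups on the slide come from.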
28 Cyclic Architecture The pipeline architecture runs Task 1 → Task 2 → Task 3. What if we connect the tail of the pipeline back to the head?
29-32 Cyclic Architecture Unshared-Weight Cyclic Training Step 1: fix an order, Task 1 → Task 2 → Task 3. Step 2: predict initial outputs for every task. Then the tasks take turns being retrained on x: Task 1 uses features ϕ_1(x, y), ϕ_(1,2)(x, y, y'), ϕ_(1,3)(x, y, y'); Task 2 uses ϕ_2(x, y), ϕ_(1,2)(x, y, y'), ϕ_(2,3)(x, y, y') (training w_2, w_(1,2), w_(2,3)); Task 3 uses ϕ_3(x, y), ϕ_(1,3)(x, y, y'), ϕ_(2,3)(x, y, y'). Each task keeps its own copy of the inter-task weights: the weights are independent across tasks.
33-36 Cyclic Architecture Shared-Weight Cyclic Training Same procedure: fix an order, Task 1 → Task 2 → Task 3; predict initial outputs; then give each task its turn — Task 1 uses ϕ_1(x, y), ϕ_(1,2)(x, y, y'), ϕ_(1,3)(x, y, y'); Task 2 uses ϕ_2(x, y), ϕ_(1,2)(x, y, y'), ϕ_(2,3)(x, y, y'); Task 3 uses ϕ_3(x, y), ϕ_(1,3)(x, y, y'), ϕ_(2,3)(x, y, y') — but now all tasks update one shared set of weights w_1, w_2, w_3, w_(1,2), w_(1,3), w_(2,3).
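The cyclic rounds can be sketched as repeated re-prediction, where each task's predictor sees the other tasks' latest outputs. This is a toy sketch: the talk trains SSVM models, while the predictors below are simple stand-in callables:

```python
def cyclic_predict(x, predictors, rounds=2):
    """predictors[t](x, others) -> output for task t given the current
    outputs of all tasks ('others'); None marks a not-yet-predicted task.
    First make an independent initial pass, then let the tasks take turns."""
    k = len(predictors)
    ys = [predictors[t](x, [None] * k) for t in range(k)]  # initial outputs
    for _ in range(rounds):           # tail of the pipeline feeds the head
        for t in range(k):
            ys[t] = predictors[t](x, list(ys))
    return ys

# toy tasks: task 0 echoes the input; task 1 depends on task 0's output,
# so it only becomes correct once the cycle feeds task 0's prediction back
predictors = [lambda x, o: x,
              lambda x, o: 0 if o[0] is None else o[0] + 1]
print(cyclic_predict(5, predictors))  # -> [5, 6]
```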
37 Experimental Setup Datasets: ACE2005 — Train/Dev/Test: 338/144/117 documents; entity-linking annotation: ACE-to-Wiki; knowledge base: Wikipedia (2015 dump). TAC-KBP2015 — Train/Dev/Test: 132/36/167 documents; knowledge base: Freebase (2014 dump). Evaluation: Coreference — MUC, BCube, CEAFe, and their average (CoNLL); NER — Hamming accuracy; Linking — Hamming accuracy; within-document coreference — CoNLL; cross-document coreference — CEAFm; NER & Linking — NERLC (combined accuracy of NER and Linking). All metrics are accuracies (larger is better).
38-39 Results Joint Architecture Performance [Results tables for the ACE05 and TAC15 test sets — comparing Berkeley (the rank-1st prior system), STSP, the joint architecture with random vs. good initialization, score-agnostic vs. score-sensitive pruning, and the cyclic variants, with training times — are not recoverable from this transcription.] Takeaways: 1. Joint-Good-Init > STSP: interdependency, captured by inter-task features, does benefit the system. 2. Joint-Good-Init > Joint-Rand-Init: search-based inference for large structured prediction problems suffers from local optima, which a good initialization mitigates. 3. Search-based MTSP is competitive with or better than the state-of-the-art system. 4. Score-sensitive pruning of joint MTSP performs best, and takes the most time.
40 Results Cyclic Architecture Performance [Results tables for the ACE05 and TAC15 test sets are likewise not recoverable from this transcription.] Takeaways: the cyclic architecture achieves competitive accuracy with much faster training, and unshared weights perform better than shared weights.
41 Summary 1. Search-based multi-task structured prediction outperforms prior work based on graphical models on all 3 entity analysis tasks. 2. Studied three learning and inference architectures: pipeline, cyclic, and joint, with trade-offs between accuracy and speed. 3. The joint architecture with score-sensitive pruning performs the best. 4. The cyclic architecture with unshared weights is competitive in accuracy and faster to train.
42 Thank You!
More informationSum-Product Networks: A New Deep Architecture
Sum-Product Networks: A New Deep Architecture Pedro Domingos Dept. Computer Science & Eng. University of Washington Joint work with Hoifung Poon 1 Graphical Models: Challenges Bayesian Network Markov Network
More informationMedical Question Answering for Clinical Decision Support
Medical Question Answering for Clinical Decision Support Travis R. Goodwin and Sanda M. Harabagiu The University of Texas at Dallas Human Language Technology Research Institute http://www.hlt.utdallas.edu
More informationCS395T: Structured Models for NLP Lecture 19: Advanced NNs I. Greg Durrett
CS395T: Structured Models for NLP Lecture 19: Advanced NNs I Greg Durrett Administrivia Kyunghyun Cho (NYU) talk Friday 11am GDC 6.302 Project 3 due today! Final project out today! Proposal due in 1 week
More informationBe able to define the following terms and answer basic questions about them:
CS440/ECE448 Section Q Fall 2017 Final Review Be able to define the following terms and answer basic questions about them: Probability o Random variables, axioms of probability o Joint, marginal, conditional
More informationMulticlass Boosting with Repartitioning
Multiclass Boosting with Repartitioning Ling Li Learning Systems Group, Caltech ICML 2006 Binary and Multiclass Problems Binary classification problems Y = { 1, 1} Multiclass classification problems Y
More informationarxiv: v3 [cs.cl] 22 Jun 2017
Optimizing Differentiable Relaxations of Coreference Evaluation Metrics Phong Le 1 Ivan Titov 1,2 1 ILLC, University of Amsterdam 2 ILCC, School of Informatics, University of Edinburgh p.le@uva.nl ititov@inf.ed.ac.uk
More informationArtificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen
Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen 2.04.2018 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition
More informationMachine Learning (CS 567) Lecture 2
Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationErrors, and What to Do. CS 188: Artificial Intelligence Fall What to Do About Errors. Later On. Some (Simplified) Biology
CS 188: Artificial Intelligence Fall 2011 Lecture 22: Perceptrons and More! 11/15/2011 Dan Klein UC Berkeley Errors, and What to Do Examples of errors Dear GlobalSCAPE Customer, GlobalSCAPE has partnered
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationCS 188: Artificial Intelligence Fall 2011
CS 188: Artificial Intelligence Fall 2011 Lecture 22: Perceptrons and More! 11/15/2011 Dan Klein UC Berkeley Errors, and What to Do Examples of errors Dear GlobalSCAPE Customer, GlobalSCAPE has partnered
More informationComputational Learning Theory
Computational Learning Theory Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Ordibehesht 1390 Introduction For the analysis of data structures and algorithms
More informationAPPLIED DEEP LEARNING PROF ALEXIEI DINGLI
APPLIED DEEP LEARNING PROF ALEXIEI DINGLI TECH NEWS TECH NEWS HOW TO DO IT? TECH NEWS APPLICATIONS TECH NEWS TECH NEWS NEURAL NETWORKS Interconnected set of nodes and edges Designed to perform complex
More informationOnline Passive-Aggressive Algorithms. Tirgul 11
Online Passive-Aggressive Algorithms Tirgul 11 Multi-Label Classification 2 Multilabel Problem: Example Mapping Apps to smart folders: Assign an installed app to one or more folders Candy Crush Saga 3
More informationQualifier: CS 6375 Machine Learning Spring 2015
Qualifier: CS 6375 Machine Learning Spring 2015 The exam is closed book. You are allowed to use two double-sided cheat sheets and a calculator. If you run out of room for an answer, use an additional sheet
More informationCHAPTER 6 CONCLUSION AND FUTURE SCOPE
CHAPTER 6 CONCLUSION AND FUTURE SCOPE 146 CHAPTER 6 CONCLUSION AND FUTURE SCOPE 6.1 SUMMARY The first chapter of the thesis highlighted the need of accurate wind forecasting models in order to transform
More informationConditional Random Field
Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationCutting Plane Training of Structural SVM
Cutting Plane Training of Structural SVM Seth Neel University of Pennsylvania sethneel@wharton.upenn.edu September 28, 2017 Seth Neel (Penn) Short title September 28, 2017 1 / 33 Overview Structural SVMs
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest
More informationCORE: Context-Aware Open Relation Extraction with Factorization Machines. Fabio Petroni
CORE: Context-Aware Open Relation Extraction with Factorization Machines Fabio Petroni Luciano Del Corro Rainer Gemulla Open relation extraction Open relation extraction is the task of extracting new facts
More informationFace detection and recognition. Detection Recognition Sally
Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification
More informationLearning From Data Lecture 2 The Perceptron
Learning From Data Lecture 2 The Perceptron The Learning Setup A Simple Learning Algorithm: PLA Other Views of Learning Is Learning Feasible: A Puzzle M. Magdon-Ismail CSCI 4100/6100 recap: The Plan 1.
More informationNon-Linearity. CS 188: Artificial Intelligence. Non-Linear Separators. Non-Linear Separators. Deep Learning I
Non-Linearity CS 188: Artificial Intelligence Deep Learning I Instructors: Pieter Abbeel & Anca Dragan --- University of California, Berkeley [These slides were created by Dan Klein, Pieter Abbeel, Anca
More informationWhat is Optimization? Optimization is the process of minimizing or maximizing a function subject to several constraints on its variables
Ø Ø Ø Ø Ø Ø Ø Ø Ø Ø Ø Ø Ø Ø Ø Ø Ø What is Optimization? Optimization is the process of minimizing or maximizing a function subject to several constraints on its variables Optimization In our every day
More informationNamed Entity Recognition using Maximum Entropy Model SEEM5680
Named Entity Recognition using Maximum Entroy Model SEEM5680 Named Entity Recognition System Named Entity Recognition (NER): Identifying certain hrases/word sequences in a free text. Generally it involves
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationA Linear Programming Formulation for Global Inference in Natural Language Tasks
A Linear Programming Formulation for Global Inference in Natural Language Tasks Dan Roth Wen-tau Yih Department of Computer Science University of Illinois at Urbana-Champaign {danr, yih}@uiuc.edu Abstract
More informationFoundations of Machine Learning
Introduction to ML Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu page 1 Logistics Prerequisites: basics in linear algebra, probability, and analysis of algorithms. Workload: about
More informationAutomatic Feature Decomposition for Single View Co-training
Automatic Feature Decomposition for Single View Co-training Minmin Chen, Kilian Weinberger, Yixin Chen Computer Science and Engineering Washington University in Saint Louis Minmin Chen, Kilian Weinberger,
More informationGlobal Machine Learning for Spatial Ontology Population
Global Machine Learning for Spatial Ontology Population Parisa Kordjamshidi, Marie-Francine Moens KU Leuven, Belgium Abstract Understanding spatial language is important in many applications such as geographical
More information2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media,
2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising
More informationOnline Passive- Aggressive Algorithms
Online Passive- Aggressive Algorithms Jean-Baptiste Behuet 28/11/2007 Tutor: Eneldo Loza Mencía Seminar aus Maschinellem Lernen WS07/08 Overview Online algorithms Online Binary Classification Problem Perceptron
More information