# Chunking with Support Vector Machines

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 NAACL2001 Chunking with Support Vector Machines Graduate School of Information Science, Nara Institute of Science and Technology, JAPAN Taku Kudo, Yuji Matsumoto

2 Chunking (1/2) Dividing sentences into syntactically related non-overlapping groups - Example of BaseNP chunking: In [ early trading ] in [ Hong Kong ] [ Monday ], [ gold ] was... - Example of Base Phrase chunking: [ In ]/PP [ early trading ]/NP [ in ]/PP [ Hong Kong ]/NP [ Monday ]/NP, [ gold ]/NP [ was ]/VP...

3 Chunking (2/2) Other Chunking Tasks: Named Entity extraction Japanese bunsetsu identification Tokenization Part-of-speech tagging

4 Our approaches Propose a general framework for chunking based on SVMs Apply the weighted voting from 8 SVMs-based systems trained with distinct chunk representations

5 Outline Brief introduction to Support Vector Machines How do we apply SVMs to Chunking? Weighted Voting from 8 SVM-based systems Experiments and Evaluation Summary and future work

6 Support Vector Machines (1/3) V.Vapnik 1995 Two strong properties High generalization performance independent of feature dimension Training with combinations of multiple features by using a Kernel Function.

7 Support Vector Machines (2/3) Separate positive and negative (binary) examples with a Linear Hyperplane: (w x + b, w, x R n, b R) Find an optimal hyperplane (parameter w, b) with the Maximal Margin Strategy

8 Support Vector Machines (3/3) Potential to carry out non-linear classification. Replace every dot product in optimization formula with some Kernel Function Build a linear classifier in a higher-dimensional feature space d-th polynomial kernel K(x i, x j ) = (x i x j + 1) d considering combinations of up to d features

9 Chunk representation (1/2) Regard Chunking as a Tagging task Inside/Outside (IOB1) representation I O Current token is inside a chunk. Current token is outside any given chunk. B Current token is the beginning of a chunk which immediately follows another chunk. Tjong Kim Sang (1997) introduces three alternative versions IOB2, IOE1 and IOE2

10 Chunk representation (2/2) Base NP Chunking Base Phrase Chunking IOB1 IOB2 IOE1 IOE2 IOB2 In O O O O B-PP early I B I I B-NP trading I I I E I-NP in O O O O B-PP Hong I I I I B-NP Kong I I E E I-NP Monday B B I E B-NP

11 Applying SVMs to Chunking Chunking as a classification task of the IOB tags We use the pair-wise method to extend a binary classifier (SVMs) to a multi-class classifier

12 Feature Sets for Learning Parsing from left to right, i 1, i 2 IOB tags are added dynamically(forward Parsing) Parsing from right to left, i + 1, i + 2 IOB tags are added dynamically(backward Parsing)

13 Chunking with Weighted Voting (1/3) 8 SVM-based classifiers can be built: {IOB1/IOB2/IOE1/IOE2} {Forward, Backward} Final IOB tag is obtained from the weighted voting How can we assign voting weights to individual classifiers? Uniform weights (baseline) 5-fold Cross Validation VC bound Leave-One-Out bound

14 Chunking with Weighted Voting (2/3) Estimate the accuracy of test data (not training data) From the theoretical background of SVMs. Only using the training data Without re-sampling: training and estimation simultaneously VC bound Estimate the accuracy from the size of the margin Leave-One-Out bound Estimate the accuracy from the number of support vectors

15 Experiments basenp-s: Penn Tree Bank/WSJ A standard data set for basenp chunking basenp-l: Penn Tree Bank/WSJ base Phrase chunking: Penn Tree Bank/WSJ Total 10 types of base phrase classes VP, PP, ADJP.. Data set for CoNLL-2000 Shared Task Evaluation measure: F-measure Kernel Function: 2nd-polynomial kernel

16 Results of Weighted Voting A B C D basenp-s basenp-l base Phrase chunking A:Uniform B:Cross Validation C:VC bound D:L-O-O bound

17 Results of individual representations basenp-s: F β=1 Cross Validation VC bound L-O-O bound IOB1-F IOB1-B IOB2-F IOB2-B IOE1-F IOE1-B IOE2-F IOE2-B

18 Results of individual representations basenp-l: F β=1 VC bound L-O-O bound IOB2-F IOB2-B IOE2-F IOE2-B

19 Results of individual representations base Phrase Chunking: F β=1 Cross Validation VC bound L-O-O bound IOB1-F IOB1-B IOB2-F IOB2-B IOE1-F IOE1-B IOE2-F IOE2-B

20 Discussion Accuracy improved regardless of the voting scheme used Cross-Validation and VC bound outperform Leave-One-Out bound and Uniform in almost all cases Comparing VC bound to Cross Validation comparable accuracy both provide good criteria for classifier selection but, Cross Validation requires a larger amount of computational resources

21 Comparison with related work Tjong Kim Sang 2000 Proposed method Outline of System Weighted voting of different Machine Learning algorithms (MBL, ME, IGTree) and distinct chunk representations (IOB1/IOB2/IOE1/IOE2) Weighted voting of 8-SVMs based systems trained with distinct chunk representations (IOB1/IOB2/IOE1/IOE2) F-measure basenp-s basenp-l base Phrase basenp-s basenp-l base Phrase 93.91

22 Summary We proposed a general framework for chunking based on SVMs. We can achieve higher accuracy compared to previous methods We can also improve the accuracy by applying weighted voting from 8 SVMs-based classifiers trained with distinct chunk representations For the weights assigned to the individual classifiers, we applied methods stemming from the theoretical background of SVMs (VC bound and Leave-One-Out bound)

23 Future Work Application to other chunking tasks (NE, POS tagging, bunsetsu identification) Consider more predictable bounds such as Span SVM [Chapelle,Vapnik 2000] Incorporate variable length models The context length features were selected ad-hoc But, the optimal context length depends on the task

### Transition-Based Parsing

Transition-Based Parsing Based on atutorial at COLING-ACL, Sydney 2006 with Joakim Nivre Sandra Kübler, Markus Dickinson Indiana University E-mail: skuebler,md7@indiana.edu Transition-Based Parsing 1(11)

### Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM

1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University

### Support Vector and Kernel Methods

SIGIR 2003 Tutorial Support Vector and Kernel Methods Thorsten Joachims Cornell University Computer Science Department tj@cs.cornell.edu http://www.joachims.org 0 Linear Classifiers Rules of the Form:

### Perceptron Revisited: Linear Separators. Support Vector Machines

Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department

### Learning with kernels and SVM

Learning with kernels and SVM Šámalova chata, 23. května, 2006 Petra Kudová Outline Introduction Binary classification Learning with Kernels Support Vector Machines Demo Conclusion Learning from data find

### Linear Classifiers IV

Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers IV Blaine Nelson, Tobias Scheffer Contents Classification Problem Bayesian Classifier Decision Linear Classifiers, MAP Models Logistic

### 13A. Computational Linguistics. 13A. Log-Likelihood Dependency Parsing. CSC 2501 / 485 Fall 2017

Computational Linguistics CSC 2501 / 485 Fall 2017 13A 13A. Log-Likelihood Dependency Parsing Gerald Penn Department of Computer Science, University of Toronto Based on slides by Yuji Matsumoto, Dragomir

### Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated

Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian

### CSE 151 Machine Learning. Instructor: Kamalika Chaudhuri

CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Linear Classification Given labeled data: (xi, feature vector yi) label i=1,..,n where y is 1 or 1, find a hyperplane to separate from Linear Classification

### Support vector machines Lecture 4

Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The

### MIRA, SVM, k-nn. Lirong Xia

MIRA, SVM, k-nn Lirong Xia Linear Classifiers (perceptrons) Inputs are feature values Each feature has a weight Sum is the activation activation w If the activation is: Positive: output +1 Negative, output

### Machine Learning for Structured Prediction

Machine Learning for Structured Prediction Grzegorz Chrupa la National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Grzegorz Chrupa la (DCU) Machine Learning for

### Regularization Introduction to Machine Learning. Matt Gormley Lecture 10 Feb. 19, 2018

1-61 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Regularization Matt Gormley Lecture 1 Feb. 19, 218 1 Reminders Homework 4: Logistic

### NLP Programming Tutorial 11 - The Structured Perceptron

NLP Programming Tutorial 11 - The Structured Perceptron Graham Neubig Nara Institute of Science and Technology (NAIST) 1 Prediction Problems Given x, A book review Oh, man I love this book! This book is

### Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

### Lecture 10: A brief introduction to Support Vector Machine

Lecture 10: A brief introduction to Support Vector Machine Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department

### Penn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark

Penn Treebank Parsing Advanced Topics in Language Processing Stephen Clark 1 The Penn Treebank 40,000 sentences of WSJ newspaper text annotated with phrasestructure trees The trees contain some predicate-argument

### EXPERIMENTS ON PHRASAL CHUNKING IN NLP USING EXPONENTIATED GRADIENT FOR STRUCTURED PREDICTION

EXPERIMENTS ON PHRASAL CHUNKING IN NLP USING EXPONENTIATED GRADIENT FOR STRUCTURED PREDICTION by Porus Patell Bachelor of Engineering, University of Mumbai, 2009 a project report submitted in partial fulfillment

### Multiclass Classification-1

CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass

### Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete

### CS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines

CS4495/6495 Introduction to Computer Vision 8C-L3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several

### Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

### W vs. QCD Jet Tagging at the Large Hadron Collider

W vs. QCD Jet Tagging at the Large Hadron Collider Bryan Anenberg: anenberg@stanford.edu; CS229 December 13, 2013 Problem Statement High energy collisions of protons at the Large Hadron Collider (LHC)

### TDT4173 Machine Learning

TDT4173 Machine Learning Lecture 3 Bagging & Boosting + SVMs Norwegian University of Science and Technology Helge Langseth IT-VEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline 1 Ensemble-methods

### c 4, < y 2, 1 0, otherwise,

Fundamentals of Big Data Analytics Univ.-Prof. Dr. rer. nat. Rudolf Mathar Problem. Probability theory: The outcome of an experiment is described by three events A, B and C. The probabilities Pr(A) =,

### Statistical Methods for NLP

Statistical Methods for NLP Stochastic Grammars Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(22) Structured Classification

### Support Vector Machines

Wien, June, 2010 Paul Hofmarcher, Stefan Theussl, WU Wien Hofmarcher/Theussl SVM 1/21 Linear Separable Separating Hyperplanes Non-Linear Separable Soft-Margin Hyperplanes Hofmarcher/Theussl SVM 2/21 (SVM)

### Conditional Random Field

Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions

### MACHINE LEARNING. Support Vector Machines. Alessandro Moschitti

MACHINE LEARNING Support Vector Machines Alessandro Moschitti Department of information and communication technology University of Trento Email: moschitti@dit.unitn.it Summary Support Vector Machines

### Statistical Pattern Recognition

Statistical Pattern Recognition Support Vector Machine (SVM) Hamid R. Rabiee Hadi Asheri, Jafar Muhammadi, Nima Pourdamghani Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Introduction

### Prenominal Modifier Ordering via MSA. Alignment

Introduction Prenominal Modifier Ordering via Multiple Sequence Alignment Aaron Dunlop Margaret Mitchell 2 Brian Roark Oregon Health & Science University Portland, OR 2 University of Aberdeen Aberdeen,

### Machine learning for pervasive systems Classification in high-dimensional spaces

Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version

### Final Exam, Machine Learning, Spring 2009

Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

### Classification and Support Vector Machine

Classification and Support Vector Machine Yiyong Feng and Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) ELEC 5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline

### Support Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs

E0 270 Machine Learning Lecture 5 (Jan 22, 203) Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in

### TDT 4173 Machine Learning and Case Based Reasoning. Helge Langseth og Agnar Aamodt. NTNU IDI Seksjon for intelligente systemer

TDT 4173 Machine Learning and Case Based Reasoning Lecture 6 Support Vector Machines. Ensemble Methods Helge Langseth og Agnar Aamodt NTNU IDI Seksjon for intelligente systemer Outline 1 Wrap-up from last

### Lecture Support Vector Machine (SVM) Classifiers

Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in

### Support Vector Machines

EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable

### UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

### Low-Dimensional Discriminative Reranking. Jagadeesh Jagarlamudi and Hal Daume III University of Maryland, College Park

Low-Dimensional Discriminative Reranking Jagadeesh Jagarlamudi and Hal Daume III University of Maryland, College Park Discriminative Reranking Useful for many NLP tasks Enables us to use arbitrary features

### Brief Introduction to Machine Learning

Brief Introduction to Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU August 29, 2016 1 / 49 1 Introduction 2 Binary Classification 3 Support Vector

### Statistical NLP for the Web Log Linear Models, MEMM, Conditional Random Fields

Statistical NLP for the Web Log Linear Models, MEMM, Conditional Random Fields Sameer Maskey Week 13, Nov 28, 2012 1 Announcements Next lecture is the last lecture Wrap up of the semester 2 Final Project

### Midterms. PAC Learning SVM Kernels+Boost Decision Trees. MultiClass CS446 Spring 17

Midterms PAC Learning SVM Kernels+Boost Decision Trees 1 Grades are on a curve Midterms Will be available at the TA sessions this week Projects feedback has been sent. Recall that this is 25% of your grade!

### CS 590D Lecture Notes

CS 590D Lecture Notes David Wemhoener December 017 1 Introduction Figure 1: Various Separators We previously looked at the perceptron learning algorithm, which gives us a separator for linearly separable

### Probabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov

Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly

### Infinite Ensemble Learning with Support Vector Machinery

Infinite Ensemble Learning with Support Vector Machinery Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology ECML/PKDD, October 4, 2005 H.-T. Lin and L. Li (Learning Systems

### CIS 520: Machine Learning Oct 09, Kernel Methods

CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed

### Spectral Learning for Non-Deterministic Dependency Parsing

Spectral Learning for Non-Deterministic Dependency Parsing Franco M. Luque 1 Ariadna Quattoni 2 Borja Balle 2 Xavier Carreras 2 1 Universidad Nacional de Córdoba 2 Universitat Politècnica de Catalunya

### Better! Faster! Stronger*!

Jason Eisner Jiarong Jiang He He Better! Faster! Stronger*! Learning to balance accuracy and efficiency when predicting linguistic structures (*theorems) Hal Daumé III UMD CS, UMIACS, Linguistics me@hal3.name

### CS 231A Section 1: Linear Algebra & Probability Review

CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting Viola-Jones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability

### Probabilistic Context Free Grammars. Many slides from Michael Collins

Probabilistic Context Free Grammars Many slides from Michael Collins Overview I Probabilistic Context-Free Grammars (PCFGs) I The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar

### Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

### CS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang

CS 231A Section 1: Linear Algebra & Probability Review Kevin Tang Kevin Tang Section 1-1 9/30/2011 Topics Support Vector Machines Boosting Viola Jones face detector Linear Algebra Review Notation Operations

### Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

### Support vector networks

Support vector networks Séance «svn» de l'ue «apprentissage automatique» Bruno Bouzy bruno.bouzy@parisdescartes.fr www.mi.parisdescartes.fr/~bouzy Reference These slides present the following paper: C.Cortes,

### Kernels for Structured Natural Language Data

Kernels for Structured atural Language Data Jun Suzuki, Yutaka Sasaki, and Eisaku Maeda TT Communication Science Laboratories, TT Corp -4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 69-037 Japan {jun, sasaki,

### Task-Oriented Dialogue System (Young, 2000)

2 Review Task-Oriented Dialogue System (Young, 2000) 3 http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Speech Signal Speech Recognition Hypothesis are there any action movies to see

### PAC-learning, VC Dimension and Margin-based Bounds

More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based

### The definitions and notation are those introduced in the lectures slides. R Ex D [h

Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 2 October 04, 2016 Due: October 18, 2016 A. Rademacher complexity The definitions and notation

### Introduction to Classification and Sequence Labeling

Introduction to Classification and Sequence Labeling Grzegorz Chrupa la Spoken Language Systems Saarland University Annual IRTG Meeting 2009 Grzegorz Chrupa la (UdS) Classification and Sequence Labeling

### Relevance Vector Machines

LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian

### A gentle introduction to Hidden Markov Models

A gentle introduction to Hidden Markov Models Mark Johnson Brown University November 2009 1 / 27 Outline What is sequence labeling? Markov models Hidden Markov models Finding the most likely state sequence

### Augmented Statistical Models for Speech Recognition

Augmented Statistical Models for Speech Recognition Mark Gales & Martin Layton 31 August 2005 Trajectory Models For Speech Processing Workshop Overview Dependency Modelling in Speech Recognition: latent

### UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

### Structured Prediction Theory and Algorithms

Structured Prediction Theory and Algorithms Joint work with Corinna Cortes (Google Research) Vitaly Kuznetsov (Google Research) Scott Yang (Courant Institute) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE

### Effectiveness of complex index terms in information retrieval

Effectiveness of complex index terms in information retrieval Tokunaga Takenobu, Ogibayasi Hironori and Tanaka Hozumi Department of Computer Science Tokyo Institute of Technology Abstract This paper explores

### SVMs, Duality and the Kernel Trick

SVMs, Duality and the Kernel Trick Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 26 th, 2007 2005-2007 Carlos Guestrin 1 SVMs reminder 2005-2007 Carlos Guestrin 2 Today

### Extracting Information from Text

Extracting Information from Text Research Seminar Statistical Natural Language Processing Angela Bohn, Mathias Frey, November 25, 2010 Main goals Extract structured data from unstructured text Training

### CSE 546 Final Exam, Autumn 2013

CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,

### Learning Features from Co-occurrences: A Theoretical Analysis

Learning Features from Co-occurrences: A Theoretical Analysis Yanpeng Li IBM T. J. Watson Research Center Yorktown Heights, New York 10598 liyanpeng.lyp@gmail.com Abstract Representing a word by its co-occurrences

### Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights

Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University

### Feature-Frequency Adaptive On-line Training for Fast and Accurate Natural Language Processing

Feature-Frequency Adaptive On-line Training for Fast and Accurate Natural Language Processing Xu Sun Peking University Wenjie Li Hong Kong Polytechnic University Houfeng Wang Peking University Qin Lu Hong

### Conditional Random Fields

Conditional Random Fields Micha Elsner February 14, 2013 2 Sums of logs Issue: computing α forward probabilities can undeflow Normally we d fix this using logs But α requires a sum of probabilities Not

### Linear Classification and SVM. Dr. Xin Zhang

Linear Classification and SVM Dr. Xin Zhang Email: eexinzhang@scut.edu.cn What is linear classification? Classification is intrinsically non-linear It puts non-identical things in the same class, so a

### Dynamic Time-Alignment Kernel in Support Vector Machine

Dynamic Time-Alignment Kernel in Support Vector Machine Hiroshi Shimodaira School of Information Science, Japan Advanced Institute of Science and Technology sim@jaist.ac.jp Mitsuru Nakai School of Information

### Evaluation requires to define performance measures to be optimized

Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

### Kernel Methods. Machine Learning A W VO

Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

### Lecture 7: Kernels for Classification and Regression

Lecture 7: Kernels for Classification and Regression CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 15, 2011 Outline Outline A linear regression problem Linear auto-regressive

### Hidden Markov Models

CS 2750: Machine Learning Hidden Markov Models Prof. Adriana Kovashka University of Pittsburgh March 21, 2016 All slides are from Ray Mooney Motivating Example: Part Of Speech Tagging Annotate each word

### An Introduction to Machine Learning

An Introduction to Machine Learning L6: Structured Estimation Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune, January

### Logistic Regression: Online, Lazy, Kernelized, Sequential, etc.

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010

### Sequence Labelling SVMs Trained in One Pass

Sequence Labelling SVMs Trained in One Pass Antoine Bordes 12, Nicolas Usunier 1, and Léon Bottou 2 1 LIP6, Université Paris 6, 104 Avenue du Pdt Kennedy, 75016 Paris, France 2 NEC Laboratories America,

### Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

### Introduction to Machine Learning

1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer

### Parsing with Context-Free Grammars

Parsing with Context-Free Grammars Berlin Chen 2005 References: 1. Natural Language Understanding, chapter 3 (3.1~3.4, 3.6) 2. Speech and Language Processing, chapters 9, 10 NLP-Berlin Chen 1 Grammars

### A Tree-Position Kernel for Document Compression

A Tree-Position Kernel for Document Compression Hal Daumé III University of Southern California Department of Computer Science Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina del Rey,

### Midterm Exam, Spring 2005

10-701 Midterm Exam, Spring 2005 1. Write your name and your email address below. Name: Email address: 2. There should be 15 numbered pages in this exam (including this cover sheet). 3. Write your name

### Incorporating Invariances in Nonlinear Support Vector Machines

Incorporating Invariances in Nonlinear Support Vector Machines Olivier Chapelle olivier.chapelle@lip6.fr LIP6, Paris, France Biowulf Technologies Bernhard Scholkopf bernhard.schoelkopf@tuebingen.mpg.de

### Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing

L445 / L545 / B659 Dept. of Linguistics, Indiana University Spring 2016 1 / 46 : Overview Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the

### Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46.

: Overview L545 Dept. of Linguistics, Indiana University Spring 2013 Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the problem as searching

### Nearest Neighbors Methods for Support Vector Machines

Nearest Neighbors Methods for Support Vector Machines A. J. Quiroz, Dpto. de Matemáticas. Universidad de Los Andes joint work with María González-Lima, Universidad Simón Boĺıvar and Sergio A. Camelo, Universidad

### Machine Learning, Midterm Exam

10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

### Holdout and Cross-Validation Methods Overfitting Avoidance

Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest

### Part I Week 7 Based in part on slides from textbook, slides of Susan Holmes

Part I Week 7 Based in part on slides from textbook, slides of Susan Holmes Support Vector Machine, Random Forests, Boosting December 2, 2012 1 / 1 2 / 1 Neural networks Artificial Neural networks: Networks

### NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

### Bio-molecular Event Extraction using A GA based Classifier Ensemble Technique

International Journal of Computer Information Systems and Industrial Management Applications. ISSN 2150-7988 Volume 5(2013) pp. 631-641 c MIR Labs, www.mirlabs.net/ijcisim/index.html Bio-molecular Event

### Andrew McCallum Department of Computer Science University of Massachusetts Amherst, MA

University of Massachusetts TR 04-49; July 2004 1 Collective Segmentation and Labeling of Distant Entities in Information Extraction Charles Sutton Department of Computer Science University of Massachusetts

### Learning and Inference over Constrained Output

Learning and Inference over Constrained Output Vasin Punyakanok Dan Roth Wen-tau Yih Dav Zimak Department of Computer Science University of Illinois at Urbana-Champaign {punyakan, danr, yih, davzimak}@uiuc.edu