The Geometry of Statistical Machine Translation

1 The Geometry of Statistical Machine Translation
Presented by Rory Waite, 16th of December 2015

2 Introduction
Outline: Introduction, Linear Models, Convex Geometry, The Minkowski Sum, Projected MERT, Conclusions
We provide a novel description of MERT using convex geometry [1].
Convex geometry is a description of linear models.
[1] Ziegler. Lectures on Polytopes.

3 Statistical Machine Translation
We have a source string $f$.
We have a very large number of possible translations, called hypotheses, $\{e \ldots\}$.
A feature function $h(e, f) \in \mathbb{R}^D$ yields a $D \times 1$ column vector.
Features include language models, word alignment models, and hierarchical phrase-based models.

4 The Linear Model
The dual vector space $(\mathbb{R}^D)^*$ consists of $1 \times D$ weight vectors representing linear maps $\mathbb{R}^D \mapsto \mathbb{R}$.
For a given weight vector $w \in (\mathbb{R}^D)^*$ we can compute a score $w h(e, f)$.
A decoder can be represented by an argmax [2]:
$\hat{e}(f; w) = \operatorname{argmax}_e \, w h(e, f)$
Decoding is hard, but not the focus of this talk.
[2] Och and Ney. Discriminative training and maximum entropy models for statistical machine translation. ACL.
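
The scoring and argmax steps of the linear model can be sketched in a few lines of Python. This is only an illustration over a finite list of hypotheses; the function names and the toy feature values are assumptions, and a real decoder searches a far larger space.

```python
def score(w, h):
    """Model score w h: a 1 x D weight vector applied to a D x 1 feature vector."""
    return sum(wi * hi for wi, hi in zip(w, h))

def decode(w, features):
    """0-based index of the highest-scoring hypothesis in a finite list."""
    return max(range(len(features)), key=lambda i: score(w, features[i]))

# Toy list of three hypotheses with D = 2 features (made-up values).
kbest = [[-1.0, -4.0], [-2.0, -1.0], [-3.0, -3.0]]
print(decode([1.0, 0.0], kbest))  # 0
print(decode([0.0, 1.0], kbest))  # 1
```

Changing the weight vector changes which hypothesis wins, which is the behaviour that the training methods discussed later exploit.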

5 Current Trends in SMT
The trend is to use linear models with thousands or millions of features [3].
Training methods include MERT, MIRA, PRO, and expected BLEU.
Published results show that these methods give equal performance for a large number of features.
Systems with a large number of features are not used in evaluations.
Published results exhibit overtraining.
[3] Chiang, David. 11,001 new features for statistical machine translation.

6 Minimum Error Rate Training (1)
Consider two K-best lists ranked by $w^{(0)}$.
List for $f_1$, best first: $e_1$, $e_2$, $e_3$, $e_4$, with scores $w^{(0)} h(e_k, f_1)$.
List for $f_2$, best first: $e_1$, $e_2$, $e_3$, $e_4$, with scores $w^{(0)} h(e_k, f_2)$.
$i = (1, 1)^\top$ is the vector of indices of the 1-bests.
$E(f_1, f_2; w^{(0)}) = E(e_{1,i_1}, e_{2,i_2}) = E(e_{1,1}, e_{2,1})$

7 Minimum Error Rate Training (2)
Consider the K-best lists reranked by $w'$.
List for $f_1$, best first: $e_4$, $e_2$, $e_1$, $e_3$, with scores $w' h(e_k, f_1)$.
List for $f_2$, best first: $e_2$, $e_3$, $e_4$, $e_1$, with scores $w' h(e_k, f_2)$.
$i' = (4, 2)^\top$ is the vector of indices of the 1-bests.
$E(f_1, f_2; w') = E(e_{1,i'_1}, e_{2,i'_2}) = E(e_{1,4}, e_{2,2})$
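
The two MERT slides can be mimicked with a small sketch: rerank each K-best list under a weight vector, read off the 1-best indices, and evaluate the error of that selection. The feature values and error counts below are invented, and summing per-sentence error counts is a simplifying assumption (a corpus-level metric such as BLEU does not decompose this way). Indices are 0-based, so the slide's $(4, 2)$ appears as `[3, 1]`.

```python
def dot(w, h):
    return sum(wi * hi for wi, hi in zip(w, h))

def one_best_indices(w, kbest_lists):
    """0-based index of the top-scoring hypothesis in each K-best list."""
    return [max(range(len(hs)), key=lambda i: dot(w, hs[i])) for hs in kbest_lists]

def corpus_error(w, kbest_lists, errors):
    """Sum of the error counts of the 1-bests selected by w (an assumed, additive error)."""
    return sum(errors[s][i] for s, i in enumerate(one_best_indices(w, kbest_lists)))

# Two sentences, K = 4 hypotheses each, D = 2 features (hypothetical values).
kbest_lists = [
    [[4, 2], [3, 3], [2, 1], [1, 4]],
    [[4, 1], [3, 5], [2, 4], [1, 3]],
]
errors = [[3, 2, 4, 1], [2, 0, 1, 3]]
w0, w_new = [1, 0], [0, 1]
print(one_best_indices(w0, kbest_lists), corpus_error(w0, kbest_lists, errors))        # [0, 0] 5
print(one_best_indices(w_new, kbest_lists), corpus_error(w_new, kbest_lists, errors))  # [3, 1] 1
```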

8 MERT using Line Optimisation
Define a line $w^{(0)} + \gamma d$ in parameter space $(\mathbb{R}^D)^*$.
Compute the model score with respect to $\gamma$:
$(w^{(0)} + \gamma d) h(e, f_s) = w^{(0)} h(e, f_s) + \gamma\, d h(e, f_s) = \ell_e(\gamma)$
Define the upper envelope of model scores:
$\mathrm{Env}(f_s; \gamma) = \max_e \{\ell_e(\gamma) : \gamma \in \mathbb{R}\}$
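
A minimal sketch of the upper envelope computation follows. Each hypothesis becomes a line whose intercept is its score under $w^{(0)}$ and whose slope is its score under the direction $d$; the scalar line parameter is written `gamma` here. The sweep over lines sorted by slope is a standard technique and is not claimed to be the exact procedure used in the talk; all names and values are illustrative.

```python
def dot(w, h):
    return sum(wi * hi for wi, hi in zip(w, h))

def upper_envelope(w0, d, feats):
    """Return [(gamma_from, hypothesis_index), ...] from left to right.

    Hypothesis e contributes the line l_e(gamma) = w0.h_e + gamma * d.h_e.
    Each entry names the hypothesis that maximises the score from gamma_from
    up to the start of the next entry.
    """
    lines = [(dot(d, h), dot(w0, h)) for h in feats]  # (slope, intercept)
    order = sorted(range(len(lines)), key=lambda i: lines[i])
    hull = []
    for i in order:
        m, c = lines[i]
        while hull:
            g0, j = hull[-1]
            mj, cj = lines[j]
            if mj == m:            # same slope, smaller or equal intercept
                hull.pop()
                continue
            x = (cj - c) / (m - mj)  # gamma at which line i overtakes line j
            if x <= g0:
                hull.pop()         # line j is never the maximum
            else:
                hull.append((x, i))
                break
        if not hull:
            hull.append((float("-inf"), i))
    return hull

# Four hypotheses, D = 2 (hypothetical values); the last one never wins.
feats = [[2.0, -1.0], [1.5, 0.0], [0.0, 1.0], [-5.0, 0.0]]
print(upper_envelope([1.0, 0.0], [0.0, 1.0], feats))
# [(-inf, 0), (0.5, 1), (1.5, 2)]
```

The interval boundaries are the only values of `gamma` at which the 1-best, and hence the error, can change.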

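9 The Upper Envelope
[Figure: the upper envelope $\mathrm{Env}(f_s; \gamma)$ over the lines $\ell_{e_1}$, $\ell_{e_2}$, $\ell_{e_3}$, $\ell_{e_4}$, shown with the error $E(\hat{e}(f_s; \gamma))$ of the maximising hypotheses $e_4$, $e_3$, $e_1$ on the intervals delimited by $\gamma_1$ and $\gamma_2$.]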

10 Linear Programming MERT
Can $h_{s,i} = h(e_{s,i}, f_s)$ maximise the model?
For the $s$th sentence: $w(h_{s,j} - h_{s,i}) \le 0$ for $1 \le j \le K$.
If a solution $w \ne 0$ exists, then the constraint set is said to be feasible.
If the constraints are infeasible then $h_{s,i}$ cannot maximise the model.
Feasibility can be tested with a linear program in $O(D^{3.5} K)$.
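
The feasibility question can be illustrated without an LP solver when $D = 2$. The sketch below is a brute-force stand-in for the linear program, not LP-MERT itself: in two dimensions the feasible weight vectors form a cone, and if that cone is non-trivial one of its boundary rays is perpendicular to some difference vector $h_{s,j} - h_{s,i}$, so testing those rays is enough. The function name and data are assumptions.

```python
def can_be_one_best_2d(i, feats, eps=1e-12):
    """True if some w != 0 satisfies w (h_j - h_i) <= 0 for every j (D = 2 only)."""
    hi = feats[i]
    diffs = [(h[0] - hi[0], h[1] - hi[1]) for h in feats]
    diffs = [d for d in diffs if d != (0, 0)]
    if not diffs:
        return True  # no constraints left: any non-zero w works
    # Candidate rays: both perpendiculars of every difference vector.
    candidates = []
    for dx, dy in diffs:
        candidates += [(-dy, dx), (dy, -dx)]
    return any(all(wx * dx + wy * dy <= eps for dx, dy in diffs)
               for wx, wy in candidates)

# Four corners of a square plus one interior point (hypothetical features).
feats = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print([can_be_one_best_2d(i, feats) for i in range(len(feats))])
# [True, True, True, True, False]
```

The interior point can never be the 1-best under any non-zero weight vector, which is the infeasible case on the slide.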

11 Multi-Sentence LP-MERT
Define $i$ as a vector that contains $S$ elements, $1 \le i_s \le K$.
Define feature vectors of the form $h_i = h_{1,i_1} + \cdots + h_{S,i_S}$.
Test for feasibility.
Worst case runtime is exponential at $O(K^S)$.

12 Large Margin Training
Define the oracle index vector $\hat{i}$:
$\hat{i}_s = \operatorname{argmin}_{i_s} E(e_{s,i_s})$, for all $1 \le s \le S$
The primal form of the structured SVM is:
maximise $y$ subject to $w(h_{s,j} - h_{s,\hat{i}_s}) + y \le 0$, $1 \le j \le K$, $1 \le s \le S$, $\hat{i}_s \ne j$, $\|w\| = 1$
Relax constraints until feasible.
MIRA is an online variant of this approach.
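
For a fixed unit-norm weight vector, the largest margin $y$ that satisfies the constraints above is simply the smallest score gap between the oracle and any competitor. The sketch below computes the oracle indices and that margin; it does not solve the optimisation over $w$, and the data and function names are illustrative assumptions.

```python
def dot(w, h):
    return sum(wi * hi for wi, hi in zip(w, h))

def oracle_indices(errors):
    """0-based index of the lowest-error hypothesis in each K-best list."""
    return [min(range(len(e)), key=e.__getitem__) for e in errors]

def margin(w, kbest_lists, oracle):
    """Largest y with w(h_j - h_oracle) + y <= 0 for all sentences and j != oracle.

    Assumes ||w|| = 1. A negative value means w does not separate the oracles.
    """
    return min(dot(w, hs[o]) - dot(w, hs[j])
               for hs, o in zip(kbest_lists, oracle)
               for j in range(len(hs)) if j != o)

kbest_lists = [
    [[4, 2], [3, 3], [2, 1], [1, 4]],
    [[4, 1], [3, 5], [2, 4], [1, 3]],
]
errors = [[3, 2, 4, 1], [2, 0, 1, 3]]
oracle = oracle_indices(errors)
print(oracle)                               # [3, 1]
print(margin([0, 1], kbest_lists, oracle))  # 1
print(margin([1, 0], kbest_lists, oracle))  # -3
```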

13 Pairwise Ranking Optimisation
Find a parameter vector $w$ such that the ranking of a K-best list by model score, $w h_{s,i} \ge w h_{s,j} \ge \ldots$, follows the ranking implied by error counts, $E(e_{s,i}) \le E(e_{s,j}) \le \ldots$
For any pair of hypotheses $1 \le i, j \le K$:
$w(h_{s,j} - h_{s,i}) \le 0$ if $E(e_{s,i}) - E(e_{s,j}) \le 0$
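
The pairwise constraints can be checked directly for a given weight vector. The sketch below lists the pairs whose model-score order contradicts their error order. It only considers pairs with a strict error difference, which is an assumption made here so that ties do not force equal scores; it is a constraint checker, not the PRO training procedure.

```python
def dot(w, h):
    return sum(wi * hi for wi, hi in zip(w, h))

def violated_pairs(w, feats, errors):
    """Pairs (i, j), 0-based, where e_i has lower error than e_j but a lower model score."""
    scores = [dot(w, h) for h in feats]
    return [(i, j)
            for i in range(len(feats))
            for j in range(len(feats))
            if errors[i] < errors[j] and scores[i] < scores[j]]

# One K-best list with K = 4 and D = 2 (hypothetical values).
feats = [[4, 2], [3, 3], [2, 1], [1, 4]]
errors = [3, 2, 4, 1]
print(violated_pairs([0, 1], feats, errors))  # []
print(violated_pairs([1, 0], feats, errors))  # [(1, 0), (3, 0), (3, 1), (3, 2)]
```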

14 Convex Geometry
Convex geometry is the study of polytopes and their faces.
A polytope $H_s$ is a convex hull of feature vectors.
Consider a decision boundary where the polytope is fully contained in one half space.
A face is the intersection of the polytope with the decision boundary.
The parameter vector $w$ defines the decision boundary, and thus the face.

15 Convex Geometry
Two classes of faces are of interest.
Vertex: a face consisting of a single point is called a vertex.
Edge: a face in the form of a line segment between two vertices, $[h_{s,i}, h_{s,j}]$.
LP-MERT finds vertices.

16 Vertex Example
[Figure: feature vectors $h_1$ to $h_5$ in $\mathbb{R}^D$, plotted on axes $h_{LM}$ and $h_{TM}$.]

17 Edge Example
[Figure: feature vectors $h_1$ to $h_5$ in $\mathbb{R}^D$, plotted on axes $h_{LM}$ and $h_{TM}$.]

18 Geometry in $(\mathbb{R}^D)^*$
For a face $F$, the normal cone $N_F$ is the set of parameter vectors that yield the face.
The normal cone for a vertex is the set of feasible parameter vectors that satisfy LP-MERT.
The normal cone for an edge is a decision boundary in $(\mathbb{R}^D)^*$.
The set of all normal cones is called the normal fan $\mathcal{N}(H_s)$.
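
Membership of a normal cone can be tested by asking which face a parameter vector yields. The sketch below follows the wording of the slide (a vector belongs to the cone of a face when it yields exactly that face); the function names, tolerance, and data are assumptions.

```python
def face(w, feats, tol=1e-9):
    """Sorted 0-based indices of the feature vectors that maximise w h."""
    scores = [sum(a * b for a, b in zip(w, h)) for h in feats]
    best = max(scores)
    return [i for i, s in enumerate(scores) if best - s <= tol]

def in_normal_cone(w, face_indices, feats, tol=1e-9):
    """True if w yields exactly the face spanned by face_indices."""
    return face(w, feats, tol) == sorted(face_indices)

# A square in R^2 (hypothetical feature vectors).
feats = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(face((1, 1), feats))                 # [2]     a vertex
print(face((1, 0), feats))                 # [1, 2]  an edge
print(in_normal_cone((2, 1), [2], feats))  # True
print(in_normal_cone((1, 0), [2], feats))  # False
```

Every vector strictly between the two edge normals `(1, 0)` and `(0, 1)` yields the same vertex, which is why the vertex cone is a full-dimensional region while the edge cone is only a boundary ray.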

19 Normal Cone for Edge 1
[Figure: left, feature vectors $h_1$ to $h_5$ in $\mathbb{R}^D$ with the edge between $h_4$ and $h_1$ and a normal vector $w$; right, the normal cone $N_{[h_4, h_1]}$ in $(\mathbb{R}^D)^*$ on axes $w_{LM}$ and $w_{TM}$.]

20 Normal Cone for Edge 2
[Figure: left, feature vectors $h_1$ to $h_5$ in $\mathbb{R}^D$; right, the normal cones $N_{[h_4, h_1]}$ and $N_{[h_3, h_1]}$ in $(\mathbb{R}^D)^*$ on axes $w_{LM}$ and $w_{TM}$.]

21 Normal Cone for a Vertex
[Figure: left, feature vectors $h_1$ to $h_5$ in $\mathbb{R}^D$; right, the vertex cone $N_{\{h_1\}}$ bounded by the edge cones $N_{[h_4, h_1]}$ and $N_{[h_3, h_1]}$ in $(\mathbb{R}^D)^*$ on axes $w_{LM}$ and $w_{TM}$.]

22 The Full Normal Fan
[Figure: left, feature vectors $h_1$ to $h_5$ in $\mathbb{R}^D$; right, the full normal fan in $(\mathbb{R}^D)^*$ with vertex cones $N_{\{h_1\}}$, $N_{\{h_2\}}$, $N_{\{h_3\}}$, $N_{\{h_4\}}$ and edge cones $N_{[h_4, h_2]}$, $N_{[h_4, h_1]}$, $N_{[h_3, h_2]}$, $N_{[h_3, h_1]}$.]

23 The Common Refinement
The common refinement for two polytopes is:
$\mathcal{N}(H_s) \wedge \mathcal{N}(H_t) := \{N \cap N' : N \in \mathcal{N}(H_s), N' \in \mathcal{N}(H_t)\}$
The common refinement can be extended to more than two polytopes.

24 The Minkowski Sum
The Minkowski sum for two polytopes is:
$H_s + H_t := \{h + h' : h \in H_s, h' \in H_t\}$
The Minkowski sum is the quantity computed in LP-MERT.
The common refinement and Minkowski sum have the relationship:
$\mathcal{N}(H_s + H_t) = \mathcal{N}(H_s) \wedge \mathcal{N}(H_t)$
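
For two small point sets in the plane, the Minkowski sum and its vertices can be computed by brute force: form every pairwise sum, then take the convex hull. This is only a sketch for $D = 2$ using Andrew's monotone chain hull; it is not the algorithm used in the talk, and the polygons are invented.

```python
from itertools import product

def convex_hull(points):
    """Vertices of the 2-D convex hull, counter-clockwise (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        out = []
        for p in seq:
            # Drop points that make a clockwise or collinear turn.
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out[:-1]

    return half(pts) + half(reversed(pts))

def minkowski_sum(P, Q):
    """All pairwise sums h + h' with h in P and h' in Q."""
    return [(p[0] + q[0], p[1] + q[1]) for p, q in product(P, Q)]

H1 = [(0, 0), (2, 0), (2, 2), (0, 2)]  # a square
H2 = [(0, 0), (1, 0), (0, 1)]          # a triangle
sums = minkowski_sum(H1, H2)
print(len(sums))          # 12
print(convex_hull(sums))  # [(0, 0), (3, 0), (3, 2), (2, 3), (0, 3)]
```

Of the 12 index pairs only 5 appear as vertices of the sum, so only those 5 joint selections of 1-bests can be reached by a linear model in this toy case.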

25 Minkowski Sum Example
[Figure: polytopes $H_1$ (vertices $h_{1,1}$ to $h_{1,4}$), $H_2$ (vertices $h_{2,1}$ to $h_{2,3}$) and their Minkowski sum $H_1 + H_2$ in $\mathbb{R}^2$, with sum points labelled $h_{1,i} + h_{2,j}$, including the coincident pairs $\{h_{1,3} + h_{2,1}, h_{1,2} + h_{2,3}\}$ and $\{h_{1,4} + h_{2,1}, h_{1,1} + h_{2,3}\}$; below, the normal fans $\mathcal{N}(H_1)$, $\mathcal{N}(H_2)$ and $\mathcal{N}(H_1) \wedge \mathcal{N}(H_2)$ in $(\mathbb{R}^2)^*$.]

26 A Minkowski Sum Algorithm
Treat the Minkowski sum polytope as a graph $G(V, E)$.
$V$ and $E$ are the sets of vertices and edges in the polytope.
Start from the normal cone that contains $w^{(0)}$.
Enumerate through the adjacent vertices and repeat.
Total runtime is $O(\delta\, D^{3.5}\, |V|)$ [4], where $\delta$ is the max degree of $G$.
[4] Fukuda, Komei. From the zonotope construction to the Minkowski addition of convex polytopes. Journal of Symbolic Computation.

27 Upper Bounds Theorem
The upper bound on $|V|$ is $O(S^{D-1} K^{2(D-1)})$ [5].
Each vertex maps to a feasible index vector $i$.
For low dimension features, i.e. for $D$ such that $S^{D-1} K^{2(D-1)} \ll K^S$, the optimiser is tightly constrained.
The optimiser cannot freely select hypotheses as the 1-best for sentences.
We believe this acts as an inherent form of regularisation.
[5] Gritzmann and Sturmfels. Minkowski addition of polytopes:.... SIAM Journal on Discrete Mathematics.

28 Upper Bounds Theorem
The upper bound on $|V|$ is $O(S^{D-1} K^{2(D-1)})$.
Recall that PRO enforces pairwise ranking.
Each pairwise ranking corresponds to an edge.
The upper bound on edges is $|V|^2$.
We believe that ranking methods also benefit from this regularisation.

29 Upper Bounds Example
The upper bound on $|V|$ is $O(S^{D-1} K^{2(D-1)})$.
For the CUED WMT 13 Russian-to-English system: $S = 1502$, $K = 1000$, $D = 12$.
Only a tiny percentage of the $K^S$ index vectors are feasible.
If $D = 493$ then $S^{D-1} K^{2(D-1)} > K^S$.
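
The sizes involved are easier to compare on a logarithmic scale. The sketch below evaluates the bound $S^{D-1} K^{2(D-1)}$ against $K^S$ in base-10 logarithms, treating the constant hidden by the $O(\cdot)$ notation as 1, which is an assumption made purely for illustration.

```python
import math

def log10_vertex_bound(S, K, D):
    """log10 of S^(D-1) * K^(2(D-1)), ignoring the constant hidden by O()."""
    return (D - 1) * (math.log10(S) + 2 * math.log10(K))

def log10_index_vectors(S, K):
    """log10 of K^S, the number of index vectors."""
    return S * math.log10(K)

S, K = 1502, 1000
total = log10_index_vectors(S, K)
bound_12 = log10_vertex_bound(S, K, 12)
print(round(total), round(bound_12, 1))  # 4506 100.9

# Smallest D at which the bound is no longer below the number of index vectors.
D = 2
while log10_vertex_bound(S, K, D) < total:
    D += 1
print(D)  # 493
```

With $D = 12$ the bound is roughly $10^{101}$ against roughly $10^{4506}$ index vectors, and under the stated assumption the bound only catches up at $D = 493$, the value quoted on the slide.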

30 Projected MERT
MERT line optimisation can be represented as an affine projection to an $(M + 1)$-dimensional space, $M < D$:
$A_{M+1,D}\, h_{s,i} = \begin{bmatrix} d_1 \\ \vdots \\ d_M \\ w^{(0)} \end{bmatrix} h_{s,i} = \tilde{h}_{s,i}$
This searches over all parameter vectors embedded in the hyperplane given by the directions.
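
The projection can be checked numerically. Stacking the $M$ direction vectors and $w^{(0)}$ as the rows of a matrix maps each $D$-dimensional feature vector to $M + 1$ dimensions, and scoring the projected vector with the coefficients followed by a trailing 1 gives the same value as scoring the original vector with $w^{(0)}$ plus the weighted directions. The coefficient name `gammas` and all values below are assumptions for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(directions, w0, h):
    """Apply the (M+1) x D matrix with rows d_1..d_M, w0 to the feature vector h."""
    return [dot(row, h) for row in directions + [w0]]

# D = 3 features projected to M + 1 = 3 dimensions (hypothetical values).
directions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w0 = [0.5, 0.5, 2.0]
h = [1.0, 2.0, 3.0]
gammas = [0.25, -1.0]

# Score in the original space with w = w0 + sum_m gamma_m d_m.
w = [w0[k] + sum(g * d[k] for g, d in zip(gammas, directions)) for k in range(len(w0))]
full_score = dot(w, h)

# Same score in the projected space with weights (gamma_1, ..., gamma_M, 1).
projected_score = dot(gammas + [1.0], project(directions, w0, h))
print(full_score, projected_score)  # 5.75 5.75
```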

31 Projected MERT Example
CUED Russian-to-English entry to WMT 13.
Projected to a 3-D affine subspace containing the source-to-target (UtoV) log-probability and the word insertion penalty (WP).
Used the Minkowski sum implementation of Weibel (2010).
Left it running for several weeks.
First application of multi-dimensional MERT in polynomial time.

32 Error Surface in 2-Dimensions
[Figure: BLEU plotted over the UtoV parameter and the WP parameter, with $w^{(0)}$ marked.]

33 Conclusions
Linear models can be represented by convex geometry.
Overtraining is potentially a problem.
We could try non-linear models, such as neural networks and random decision forests.


Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall

More information

Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 18 Maximum Entropy Models I Welcome back for the 3rd module

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 477 Instructor: Tony Jebara Topic 5 Generalization Guarantees VC-Dimension Nearest Neighbor Classification (infinite VC dimension) Structural Risk Minimization Support Vector Machines

More information

CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 17: Combinatorial Problems as Linear Programs III. Instructor: Shaddin Dughmi

CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 17: Combinatorial Problems as Linear Programs III. Instructor: Shaddin Dughmi CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 17: Combinatorial Problems as Linear Programs III Instructor: Shaddin Dughmi Announcements Today: Spanning Trees and Flows Flexibility awarded

More information

MAT-INF4110/MAT-INF9110 Mathematical optimization

MAT-INF4110/MAT-INF9110 Mathematical optimization MAT-INF4110/MAT-INF9110 Mathematical optimization Geir Dahl August 20, 2013 Convexity Part IV Chapter 4 Representation of convex sets different representations of convex sets, boundary polyhedra and polytopes:

More information

Chapter 1 Linear Programming. Paragraph 5 Duality

Chapter 1 Linear Programming. Paragraph 5 Duality Chapter 1 Linear Programming Paragraph 5 Duality What we did so far We developed the 2-Phase Simplex Algorithm: Hop (reasonably) from basic solution (bs) to bs until you find a basic feasible solution

More information

Out of GIZA Efficient Word Alignment Models for SMT

Out of GIZA Efficient Word Alignment Models for SMT Out of GIZA Efficient Word Alignment Models for SMT Yanjun Ma National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Series March 4, 2009 Y. Ma (DCU) Out of Giza

More information

Support vector machines Lecture 4

Support vector machines Lecture 4 Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The

More information

Submodular Functions Properties Algorithms Machine Learning

Submodular Functions Properties Algorithms Machine Learning Submodular Functions Properties Algorithms Machine Learning Rémi Gilleron Inria Lille - Nord Europe & LIFL & Univ Lille Jan. 12 revised Aug. 14 Rémi Gilleron (Mostrare) Submodular Functions Jan. 12 revised

More information

The Ellipsoid Algorithm

The Ellipsoid Algorithm The Ellipsoid Algorithm John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 12180 USA 9 February 2018 Mitchell The Ellipsoid Algorithm 1 / 28 Introduction Outline 1 Introduction 2 Assumptions

More information

Aspects of Tree-Based Statistical Machine Translation

Aspects of Tree-Based Statistical Machine Translation Aspects of Tree-Based Statistical Machine Translation Marcello Federico Human Language Technology FBK 2014 Outline Tree-based translation models: Synchronous context free grammars Hierarchical phrase-based

More information

Decoding and Inference with Syntactic Translation Models

Decoding and Inference with Syntactic Translation Models Decoding and Inference with Syntactic Translation Models March 5, 2013 CFGs S NP VP VP NP V V NP NP CFGs S NP VP S VP NP V V NP NP CFGs S NP VP S VP NP V NP VP V NP NP CFGs S NP VP S VP NP V NP VP V NP

More information

Using the Johnson-Lindenstrauss lemma in linear and integer programming

Using the Johnson-Lindenstrauss lemma in linear and integer programming Using the Johnson-Lindenstrauss lemma in linear and integer programming Vu Khac Ky 1, Pierre-Louis Poirion, Leo Liberti LIX, École Polytechnique, F-91128 Palaiseau, France Email:{vu,poirion,liberti}@lix.polytechnique.fr

More information

Convex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)

Convex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs) ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality

More information

Contents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016

Contents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 LECTURERS: NAMAN AGARWAL AND BRIAN BULLINS SCRIBE: KIRAN VODRAHALLI Contents 1 Introduction 1 1.1 History of Optimization.....................................

More information

CO 250 Final Exam Guide

CO 250 Final Exam Guide Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,

More information

1 Matroid intersection

1 Matroid intersection CS 369P: Polyhedral techniques in combinatorial optimization Instructor: Jan Vondrák Lecture date: October 21st, 2010 Scribe: Bernd Bandemer 1 Matroid intersection Given two matroids M 1 = (E, I 1 ) and

More information

Support Vector Machines: Maximum Margin Classifiers

Support Vector Machines: Maximum Margin Classifiers Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind

More information

Lecture 25: Cook s Theorem (1997) Steven Skiena. skiena

Lecture 25: Cook s Theorem (1997) Steven Skiena.   skiena Lecture 25: Cook s Theorem (1997) Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.sunysb.edu/ skiena Prove that Hamiltonian Path is NP

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 18 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 31, 2012 Andre Tkacenko

More information

3. Linear Programming and Polyhedral Combinatorics

3. Linear Programming and Polyhedral Combinatorics Massachusetts Institute of Technology 18.433: Combinatorial Optimization Michel X. Goemans February 28th, 2013 3. Linear Programming and Polyhedral Combinatorics Summary of what was seen in the introductory

More information

Lecture 8: Complete Problems for Other Complexity Classes

Lecture 8: Complete Problems for Other Complexity Classes IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Basic Course on Computational Complexity Lecture 8: Complete Problems for Other Complexity Classes David Mix Barrington and Alexis Maciel

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models David Sontag New York University Lecture 6, March 7, 2013 David Sontag (NYU) Graphical Models Lecture 6, March 7, 2013 1 / 25 Today s lecture 1 Dual decomposition 2 MAP inference

More information

Introduction to Integer Programming

Introduction to Integer Programming Lecture 3/3/2006 p. /27 Introduction to Integer Programming Leo Liberti LIX, École Polytechnique liberti@lix.polytechnique.fr Lecture 3/3/2006 p. 2/27 Contents IP formulations and examples Total unimodularity

More information

Theory of Alignment Generators and Applications to Statistical Machine Translation

Theory of Alignment Generators and Applications to Statistical Machine Translation Theory of Alignment Generators and Applications to Statistical Machine Translation Hemanta K Maji Raghavendra Udupa U IBM India Research Laboratory, New Delhi {hemantkm, uraghave}@inibmcom Abstract Viterbi

More information

Solving the MWT. Recall the ILP for the MWT. We can obtain a solution to the MWT problem by solving the following ILP:

Solving the MWT. Recall the ILP for the MWT. We can obtain a solution to the MWT problem by solving the following ILP: Solving the MWT Recall the ILP for the MWT. We can obtain a solution to the MWT problem by solving the following ILP: max subject to e i E ω i x i e i C E x i {0, 1} x i C E 1 for all critical mixed cycles

More information

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:

More information

ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation

ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation Chin-Yew Lin and Franz Josef Och Information Sciences Institute University of Southern California 4676 Admiralty Way

More information

Decomposition Methods for Large Scale LP Decoding

Decomposition Methods for Large Scale LP Decoding Decomposition Methods for Large Scale LP Decoding Siddharth Barman Joint work with Xishuo Liu, Stark Draper, and Ben Recht Outline Background and Problem Setup LP Decoding Formulation Optimization Framework

More information

Linear, Binary SVM Classifiers

Linear, Binary SVM Classifiers Linear, Binary SVM Classifiers COMPSCI 37D Machine Learning COMPSCI 37D Machine Learning Linear, Binary SVM Classifiers / 6 Outline What Linear, Binary SVM Classifiers Do 2 Margin I 3 Loss and Regularized

More information

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this. Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one

More information

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =

More information

15-850: Advanced Algorithms CMU, Fall 2018 HW #4 (out October 17, 2018) Due: October 28, 2018

15-850: Advanced Algorithms CMU, Fall 2018 HW #4 (out October 17, 2018) Due: October 28, 2018 15-850: Advanced Algorithms CMU, Fall 2018 HW #4 (out October 17, 2018) Due: October 28, 2018 Usual rules. :) Exercises 1. Lots of Flows. Suppose you wanted to find an approximate solution to the following

More information

3. Linear Programming and Polyhedral Combinatorics

3. Linear Programming and Polyhedral Combinatorics Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans April 5, 2017 3. Linear Programming and Polyhedral Combinatorics Summary of what was seen in the introductory

More information