The Geometry of Statistical Machine Translation
Slide 1: The Geometry of Statistical Machine Translation
Presented by Rory Waite, 16th of December 2015
Slide 2: Introduction
Outline: Introduction, Linear Models, Convex Geometry, The Minkowski Sum, Projected MERT, Conclusions.
We provide a novel description of MERT using convex geometry [1]. Convex geometry is a description of linear models.
[1] Ziegler. Lectures on Polytopes.
Slide 3: Statistical Machine Translation
We have a source string f, and a very large number of possible translations called hypotheses {e...}.
A feature function h(e, f) ∈ R^D yields a D × 1 column vector.
Features include: language models, word alignment models, hierarchical phrase-based models.
Slide 4: The Linear Model
The dual vector space (R^D)* of 1 × D weight vectors represents linear maps R^D → R.
For a given weight vector w ∈ (R^D)* we can compute a score wh(e, f).
A decoder can be represented by an argmax [2]: ê(f; w) = argmax_e wh(e, f).
Decoding is hard, but not the focus of this talk.
[2] Och and Ney. Discriminative training and maximum entropy models for statistical machine translation. ACL 2002.
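The argmax decoder on this slide can be sketched in a few lines. The hypotheses, feature values, and weights below are invented for illustration; a real decoder searches a huge hypothesis space rather than a short list.

```python
# A minimal sketch of the linear-model decoder ê(f; w) = argmax_e w·h(e, f).
# All hypotheses and feature values are invented for illustration.

def score(w, h):
    """Model score w·h(e, f) for a 1×D weight vector and a D×1 feature vector."""
    return sum(wi * hi for wi, hi in zip(w, h))

def decode(w, hypotheses):
    """Return the (hypothesis, features) pair with the highest model score."""
    return max(hypotheses, key=lambda eh: score(w, eh[1]))

# Two features, e.g. a language-model score and a translation-model score.
hypotheses = [
    ("the house is small", [-2.0, -1.5]),
    ("the home is little", [-2.5, -1.0]),
]
w = [0.7, 0.3]
best, _ = decode(w, hypotheses)
print(best)  # the hypothesis maximising w·h
```

The key point of the talk is that the decoder's output depends on w only through which half-spaces w falls into, which is what the convex-geometry view makes precise.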
Slide 5: Current Trends in SMT
The trend is to use linear models with thousands or millions of features [3].
Training methods include MERT, MIRA, PRO, and expected BLEU.
Published results show that these methods give equal performance for a large number of features.
Systems with a large number of features are not used in evaluations.
Published results exhibit overtraining.
[3] Chiang, David. 11,001 new features for statistical machine translation.
Slide 6: Minimum Error Rate Training (1)
Consider two K-best lists ranked by w^(0): for sentence f_1 the hypotheses e_1, e_2, e_3, e_4 are ordered by their scores w^(0)h(e_k, f_1), and likewise for f_2 by w^(0)h(e_k, f_2).
The 1-best indices are i_1 = i_2 = 1, so the error is E(f_1, f_2; w^(0)) = E(e_{1,i_1}, e_{2,i_2}) = E(e_{1,1}, e_{2,1}).
Slide 7: Minimum Error Rate Training (2)
Consider the K-best lists reranked by w': for f_1 the new order is e_4, e_2, e_1, e_3; for f_2 it is e_2, e_3, e_4, e_1.
The new 1-best indices are i'_1 = 4 and i'_2 = 2, so the error is E(f_1, f_2; w') = E(e_{1,i'_1}, e_{2,i'_2}) = E(e_{1,4}, e_{2,2}).
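The quantity MERT tracks on these two slides, the corpus error read off the 1-best entry of each K-best list, can be sketched directly. The feature vectors and per-hypothesis error counts below are invented.

```python
# Sketch: how the corpus error E(f_1, f_2; w) is read off the 1-best entries
# of each K-best list under a weight vector w. Features and error counts
# are invented for illustration.

def one_best(w, kbest):
    """Index of the highest-scoring hypothesis in one K-best list."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h, _ in kbest]
    return max(range(len(kbest)), key=scores.__getitem__)

def corpus_error(w, kbests):
    """Sum the error counts of the 1-best hypothesis of every sentence."""
    return sum(kbest[one_best(w, kbest)][1] for kbest in kbests)

# Each entry is (feature vector, error count E(e_{s,i})).
kbest_1 = [([1.0, 0.0], 3), ([0.0, 1.0], 1)]
kbest_2 = [([0.5, 0.5], 2), ([0.0, 2.0], 0)]

print(corpus_error([1.0, 0.0], [kbest_1, kbest_2]))  # 1-bests under one weighting
print(corpus_error([0.0, 1.0], [kbest_1, kbest_2]))  # reranked: different 1-bests
```

Reranking with a different w changes which hypotheses are 1-best, and hence the error, exactly as the two slides illustrate.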
Slide 8: MERT using Line Optimisation
Define a line w^(0) + γd in parameter space (R^D)*.
Compute the model score with respect to γ: (w^(0) + γd)h(e, f_s) = w^(0)h(e, f_s) + γ dh(e, f_s) = ℓ_e(γ).
Define the upper envelope of model scores: Env(f_s; γ) = max_e {ℓ_e(γ) : γ ∈ R}.
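Each hypothesis contributes a line ℓ_e(γ) with intercept w^(0)·h and slope d·h, and the envelope is their pointwise maximum. The sketch below simply samples γ to see which line wins where; exact line-optimisation MERT instead sweeps the intersection points of the envelope. The feature vectors are invented.

```python
# Sampled sketch of MERT line optimisation: the winning hypothesis at each γ
# is the line on the upper envelope Env(f_s; γ) = max_e ℓ_e(γ).

def line(w0, d, h):
    """Return (intercept, slope) of ℓ_e(γ) = w0·h + γ d·h."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return dot(w0, h), dot(d, h)

def envelope_winner(lines, gamma):
    """Index of the line on the upper envelope at this γ."""
    return max(range(len(lines)),
               key=lambda i: lines[i][0] + gamma * lines[i][1])

w0, d = [1.0, 0.0], [0.0, 1.0]
feats = [[2.0, -1.0], [0.0, 1.0], [1.0, 0.0]]   # invented feature vectors
lines = [line(w0, d, h) for h in feats]
winners = {envelope_winner(lines, g) for g in [-2.0, 0.0, 2.0]}
print(winners)  # more than one hypothesis wins as γ varies
```

The third line never reaches the envelope here, which is the picture the next slide draws: only some hypotheses are ever 1-best along the search line.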
Slide 9: The Upper Envelope
[Figure: the upper envelope Env(f_s; γ) of the lines ℓ_{e_1}, ..., ℓ_{e_4}; the error count E(ê(f_s; γ)) changes at the intersection points γ_1 and γ_2, where the maximising hypothesis switches between e_4, e_3, and e_1.]
Slide 10: Linear Programming MERT
Can h_{s,i} = h(f_s, e_{s,i}) maximise the model? For the sth sentence: w(h_{s,j} − h_{s,i}) ≤ 0 for 1 ≤ j ≤ K.
If a solution w ≠ 0 exists, then the constraint set is said to be feasible.
If the constraints are infeasible, then h_{s,i} cannot maximise the model.
Feasibility can be tested with a linear program in O(D^{3.5} K).
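The feasibility question is: does any non-zero w score h_{s,i} at least as high as every competitor? The slide's O(D^{3.5} K) bound assumes a real LP solver; as a stand-in, the 2-D sketch below brute-forces directions on the unit circle. The point coordinates are invented.

```python
# Brute-force 2-D stand-in for the LP-MERT feasibility test: is there a
# unit vector w with w·(h_j − h_i) <= 0 for every competing h_j?
import math

def feasible_2d(h_i, others, steps=3600):
    """Scan candidate directions on the unit circle (an LP would do this exactly)."""
    diffs = [(hj[0] - h_i[0], hj[1] - h_i[1]) for hj in others]
    for k in range(steps):
        a = 2 * math.pi * k / steps
        w = (math.cos(a), math.sin(a))
        if all(w[0] * dx + w[1] * dy <= 1e-12 for dx, dy in diffs):
            return True
    return False

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.3, 0.3)]
print(feasible_2d(points[0], points[1:]))  # (0,0) is a hull vertex: feasible
print(feasible_2d(points[3], points[:3]))  # (0.3,0.3) is interior: infeasible
```

This is exactly the geometric fact the later slides exploit: a feature vector can be made 1-best if and only if it is a vertex of the convex hull.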
Slide 11: Multi-Sentence LP-MERT
Define i as a vector of S elements with 1 ≤ i_s ≤ K.
Define feature vectors of the form h_i = h_{1,i_1} + ... + h_{S,i_S}.
Test for feasibility as before.
The worst-case runtime is exponential at O(K^S).
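The multi-sentence construction picks one hypothesis per sentence via an index vector i and sums the chosen feature vectors. Enumerating every index vector, as the tiny invented example below does, makes the K^S blow-up concrete.

```python
# Sketch of the multi-sentence construction h_i = h_{1,i_1} + ... + h_{S,i_S}.
# Enumerating all index vectors shows why the worst case is O(K^S).
from itertools import product

def combined_vectors(per_sentence):
    """Map every index vector i to its summed feature vector h_i."""
    out = {}
    for idx in product(*[range(len(k)) for k in per_sentence]):
        h = [0.0] * len(per_sentence[0][0])
        for s, i_s in enumerate(idx):
            h = [a + b for a, b in zip(h, per_sentence[s][i_s])]
        out[idx] = h
    return out

# S = 2 sentences, K = 2 hypotheses each (invented features).
sents = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
hs = combined_vectors(sents)
print(len(hs))     # K^S = 4 index vectors
print(hs[(0, 1)])  # h_{1,1} + h_{2,2}
```

The Minkowski-sum machinery later in the talk is precisely a way to avoid this explicit enumeration.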
Slide 12: Large Margin Training
Define the oracle index vector î by î_s = argmin_{i_s} E(e_{s,i_s}) for all 1 ≤ s ≤ S.
The primal form of the structured SVM is: maximise γ subject to w(h_{s,j} − h_{s,î_s}) + γ ≤ 0 for 1 ≤ j ≤ K, 1 ≤ s ≤ S, î_s ≠ j, and ||w|| = 1.
Relax constraints until feasible.
MIRA is an online variant of this approach.
Slide 13: Pairwise Ranking Optimisation
Find a parameter vector w such that the ranking of a K-best list by model score, wh_{s,i} ≥ wh_{s,j} ≥ ..., follows the ranking implied by the error counts E(e_{s,i}) ≤ E(e_{s,j}) ≤ ....
For any pair of hypotheses 1 ≤ i, j ≤ K: w(h_{s,j} − h_{s,i}) ≤ 0 if E(e_{s,i}) − E(e_{s,j}) ≤ 0.
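PRO turns each pairwise ranking constraint into a binary classification example on the difference vector h_i − h_j. As a stand-in for PRO's sampled logistic regression, the sketch below runs simple perceptron updates on violated pairs; all numbers are invented.

```python
# Perceptron-style sketch of pairwise ranking: if E(e_i) < E(e_j), update w
# whenever w·(h_i − h_j) fails to be positive. A stand-in for PRO's
# sampled binary classifier, not the method itself.

def pro_perceptron(kbest, epochs=10, lr=0.1):
    """kbest: list of (feature vector, error count). Returns a learnt w."""
    w = [0.0] * len(kbest[0][0])
    for _ in range(epochs):
        for hi, ei in kbest:
            for hj, ej in kbest:
                if ei < ej:  # i should outrank j
                    margin = sum(wk * (a - b) for wk, a, b in zip(w, hi, hj))
                    if margin <= 0:
                        w = [wk + lr * (a - b) for wk, a, b in zip(w, hi, hj)]
    return w

kbest = [([1.0, 0.0], 0), ([0.0, 1.0], 2), ([0.5, 0.5], 1)]
w = pro_perceptron(kbest)
scores = [sum(wk * hk for wk, hk in zip(w, h)) for h, _ in kbest]
print(scores[0] > scores[2] > scores[1])  # model ranking follows error counts
```

After training, the model-score ordering of the three hypotheses matches the ordering of their error counts, which is exactly the constraint the slide states.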
Slide 14: Convex Geometry
Convex geometry is the study of polytopes and their faces.
A polytope H_s is the convex hull of a set of feature vectors.
Consider a decision boundary such that the polytope is fully contained in one half-space; a face is the intersection of the polytope with this decision boundary.
The parameter vector w defines the decision boundary, and thus the face.
Slide 15: Convex Geometry
Two classes of faces are of interest:
Vertex: a face consisting of a single point.
Edge: a face in the form of a line segment between two vertices, [h_{s,i}, h_{s,j}].
LP-MERT finds vertices.
Slide 16: Vertex Example
[Figure: a polytope in R^D with vertices h_1, ..., h_5 plotted against the features h_TM and h_LM; a single vertex is selected as a face.]
Slide 17: Edge Example
[Figure: the same polytope in R^D, with an edge between two vertices selected as a face.]
Slide 18: Geometry in (R^D)*
For a face F, the normal cone N_F is the set of parameter vectors that yield the face.
The normal cone of a vertex is the set of feasible parameter vectors that satisfy LP-MERT.
The normal cone of an edge is a decision boundary in (R^D)*.
The set of all normal cones is called the normal fan N(H_s).
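The normal-cone picture can be made concrete: the face a weight vector w selects is the set of polytope vertices attaining max_h w·h. A single maximiser means w lies in the interior of that vertex's normal cone; a tie between two vertices means w sits on the decision boundary between them. The square below is an invented example.

```python
# Sketch of faces and normal cones: which vertices of conv(vertices) does
# a given weight vector w select? Coordinates are invented.

def face(w, vertices, tol=1e-9):
    """Vertices attaining the maximum score under w (the selected face)."""
    dot = lambda h: sum(wi * hi for wi, hi in zip(w, h))
    best = max(dot(h) for h in vertices)
    return [h for h in vertices if dot(h) >= best - tol]

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(face((1.0, 2.0), square))  # a single vertex: w is inside its normal cone
print(face((1.0, 0.0), square))  # an edge: two vertices tie, w is on a boundary
```

Sweeping w around the circle and recording the selected face would enumerate the full normal fan of the polytope.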
Slide 19: Normal Cone for an Edge (1)
[Figure: the polytope in R^D alongside parameter space (R^D)*; a parameter vector w in the normal cone N_[h_4, h_1] selects the edge between h_4 and h_1.]
Slide 20: Normal Cone for an Edge (2)
[Figure: the normal cones N_[h_4, h_1] and N_[h_3, h_1] shown as decision boundaries in (R^D)*.]
Slide 21: Normal Cone for a Vertex
[Figure: the normal cone N_{h_1} of the vertex h_1 in (R^D)*, bounded by the edge cones N_[h_4, h_1] and N_[h_3, h_1].]
Slide 22: The Full Normal Fan
[Figure: the polytope in R^D and the complete normal fan in (R^D)*, with a vertex cone N_{h_i} for each vertex and an edge cone N_[h_i, h_j] for each edge.]
Slide 23: The Common Refinement
The common refinement of two normal fans is: N(H_s) ∧ N(H_t) := {N ∩ N' : N ∈ N(H_s), N' ∈ N(H_t)}.
The common refinement can be extended to more than two polytopes.
Slide 24: The Minkowski Sum
The Minkowski sum of two polytopes is: H_s + H_t := {h + h' : h ∈ H_s, h' ∈ H_t}.
The Minkowski sum is the quantity computed in LP-MERT.
The common refinement and the Minkowski sum are related by: N(H_s + H_t) = N(H_s) ∧ N(H_t).
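For small 2-D polygons the Minkowski sum can be computed exactly as the definition says: sum every pair of vertices and keep the convex hull. The hull routine is the standard monotone-chain construction; the polygons are invented.

```python
# Direct sketch of H_s + H_t = {h + h' : h in H_s, h' in H_t} for 2-D polygons.

def hull(points):
    """Convex hull via Andrew's monotone chain; returns vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def half(seq):
        out = []
        for p in seq:
            while len(out) >= 2:
                (ox, oy), (ax, ay) = out[-2], out[-1]
                if (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox) > 0:
                    break  # strict left turn: keep
                out.pop()
            out.append(p)
        return out[:-1]
    return half(pts) + half(pts[::-1])

def minkowski_sum(P, Q):
    """Sum every vertex pair, then reduce to the hull vertices."""
    return hull([(p[0] + q[0], p[1] + q[1]) for p in P for q in Q])

tri = [(0, 0), (1, 0), (0, 1)]
seg = [(0, 0), (1, 1)]
print(minkowski_sum(tri, seg))  # 5 hull vertices; one pairwise sum is interior
```

In LP-MERT terms, only the hull vertices of the sum correspond to feasible index vectors, which is why the sum can have far fewer vertices than K^S.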
Slide 25: Minkowski Sum Example
[Figure: two polytopes H_1 and H_2 in R^2 with vertices h_{1,i} and h_{2,j}; their Minkowski sum H_1 + H_2, whose vertices are sums of the form h_{1,i} + h_{2,j}; and, in (R^2)*, the normal fans N(H_1) and N(H_2) together with their common refinement N(H_1) ∧ N(H_2).]
Slide 26: A Minkowski Sum Algorithm
Treat the Minkowski sum polytope as a graph G(V, E), where V and E are the sets of vertices and edges of the polytope.
Start from the normal cone that contains w^(0); enumerate the adjacent vertices and repeat.
The total runtime is O(δ D^{3.5} |V|), where δ is the maximum degree of G [4].
[4] Fukuda, Komei. From the zonotope construction to the Minkowski addition of convex polytopes. Journal of Symbolic Computation.
Slide 27: Upper Bounds
Theorem: the upper bound on |V| is O(S^{D−1} K^{2(D−1)}) [5].
Each vertex maps to a feasible index vector i.
For low-dimensional feature sets, i.e. for D such that S^{D−1} K^{2(D−1)} ≪ K^S, the optimiser is tightly constrained: it cannot freely select hypotheses as the 1-best for sentences.
We believe this acts as an inherent form of regularisation.
[5] Gritzmann and Sturmfels. Minkowski addition of polytopes: .... SIAM Journal on Discrete Mathematics.
Slide 28: Upper Bounds
The upper bound on |V| is O(S^{D−1} K^{2(D−1)}).
Recall that PRO enforces pairwise rankings; each pairwise ranking corresponds to an edge, and the upper bound on the number of edges is |V|^2.
We believe that ranking methods also benefit from this regularisation.
Slide 29: Upper Bounds Example
The upper bound on |V| is O(S^{D−1} K^{2(D−1)}).
For the CUED WMT'13 Russian-to-English system: S = 1502, K = 1000, D = 12.
Only a vanishingly small percentage of the K^S index vectors are feasible.
If D = 493, then S^{D−1} K^{2(D−1)} ≥ K^S.
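The numbers on this slide are easiest to check in log scale, since the raw quantities are astronomically large. Plugging in the stated S, K, and D:

```python
# Numeric check of the slide's bound with S = 1502, K = 1000, D = 12:
# compare S^(D-1) K^(2(D-1)) against the K^S index vectors, in log10.
import math

def log10_bound(S, K, D):
    """log10 of the vertex upper bound S^(D-1) K^(2(D-1))."""
    return (D - 1) * math.log10(S) + 2 * (D - 1) * math.log10(K)

def log10_index_vectors(S, K):
    """log10 of the total number of index vectors, K^S."""
    return S * math.log10(K)

S, K, D = 1502, 1000, 12
print(round(log10_bound(S, K, D)))       # vertex bound: roughly 10^101
print(round(log10_index_vectors(S, K)))  # index vectors: 10^4506
```

At D = 12 the bound (about 10^101) is a vanishing fraction of the 10^4506 index vectors; raising D in `log10_bound` shows the two quantities crossing near D = 493, consistent with the slide.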
Slide 30: Projected MERT
MERT line optimisation can be represented as an affine projection to an (M+1)-dimensional space, with M < D:
A_{M+1,D} h_{s,i} = [d_1; ...; d_M; w^(0)] h_{s,i} = h̄_{s,i},
where the rows of A are the M search directions and w^(0).
This searches over all parameter vectors embedded in the hyperplane given by the directions.
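The projection on this slide is a plain matrix-vector product: stack the search directions and w^(0) as rows of A, then map each D-dimensional feature vector to an (M+1)-dimensional one. The vectors below are invented.

```python
# Sketch of the projection A_{M+1,D} h_{s,i} = h̄_{s,i}: rows of A are the
# search directions d_1..d_M plus w^(0). All vectors are invented.

def project(rows, h):
    """Apply the (M+1) × D matrix A, given as a list of rows, to h."""
    return [sum(a * x for a, x in zip(row, h)) for row in rows]

w0 = [0.1, 0.2, 0.3, 0.4]   # starting parameters, D = 4
d1 = [1.0, 0.0, 0.0, 0.0]   # M = 2 search directions
d2 = [0.0, 1.0, 0.0, 0.0]
A = [d1, d2, w0]            # (M+1) × D

h = [1.0, 2.0, 3.0, 4.0]
print(project(A, h))        # a 3-dimensional projected feature vector
```

Running MERT in the projected space searches the affine subspace spanned by the chosen directions, rather than a single line, which is what makes the multi-dimensional experiment on the next slide possible.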
Slide 31: Projected MERT Example
CUED Russian-to-English entry to WMT'13, projected to a 3-D affine subspace containing the source-to-target (UtoV) log-probability and the word insertion penalty (WP).
We used the Minkowski sum implementation of Weibel (2010) and left it running for several weeks.
This is the first application of multi-dimensional MERT in polynomial time.
Slide 32: Error Surface in 2-Dimensions
[Figure: the BLEU error surface plotted over the WP and UtoV parameters, with the starting point w^(0) marked.]
Slide 33: Conclusions
Linear models can be represented by convex geometry.
Overtraining is potentially a problem.
We could try non-linear models: neural networks, random decision forests.
More informationCO 250 Final Exam Guide
Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,
More information1 Matroid intersection
CS 369P: Polyhedral techniques in combinatorial optimization Instructor: Jan Vondrák Lecture date: October 21st, 2010 Scribe: Bernd Bandemer 1 Matroid intersection Given two matroids M 1 = (E, I 1 ) and
More informationSupport Vector Machines: Maximum Margin Classifiers
Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind
More informationLecture 25: Cook s Theorem (1997) Steven Skiena. skiena
Lecture 25: Cook s Theorem (1997) Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.sunysb.edu/ skiena Prove that Hamiltonian Path is NP
More informationEE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18
EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 18 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 31, 2012 Andre Tkacenko
More information3. Linear Programming and Polyhedral Combinatorics
Massachusetts Institute of Technology 18.433: Combinatorial Optimization Michel X. Goemans February 28th, 2013 3. Linear Programming and Polyhedral Combinatorics Summary of what was seen in the introductory
More informationLecture 8: Complete Problems for Other Complexity Classes
IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Basic Course on Computational Complexity Lecture 8: Complete Problems for Other Complexity Classes David Mix Barrington and Alexis Maciel
More informationProbabilistic Graphical Models
Probabilistic Graphical Models David Sontag New York University Lecture 6, March 7, 2013 David Sontag (NYU) Graphical Models Lecture 6, March 7, 2013 1 / 25 Today s lecture 1 Dual decomposition 2 MAP inference
More informationIntroduction to Integer Programming
Lecture 3/3/2006 p. /27 Introduction to Integer Programming Leo Liberti LIX, École Polytechnique liberti@lix.polytechnique.fr Lecture 3/3/2006 p. 2/27 Contents IP formulations and examples Total unimodularity
More informationTheory of Alignment Generators and Applications to Statistical Machine Translation
Theory of Alignment Generators and Applications to Statistical Machine Translation Hemanta K Maji Raghavendra Udupa U IBM India Research Laboratory, New Delhi {hemantkm, uraghave}@inibmcom Abstract Viterbi
More informationSolving the MWT. Recall the ILP for the MWT. We can obtain a solution to the MWT problem by solving the following ILP:
Solving the MWT Recall the ILP for the MWT. We can obtain a solution to the MWT problem by solving the following ILP: max subject to e i E ω i x i e i C E x i {0, 1} x i C E 1 for all critical mixed cycles
More informationNatural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley
Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:
More informationORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation
ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation Chin-Yew Lin and Franz Josef Och Information Sciences Institute University of Southern California 4676 Admiralty Way
More informationDecomposition Methods for Large Scale LP Decoding
Decomposition Methods for Large Scale LP Decoding Siddharth Barman Joint work with Xishuo Liu, Stark Draper, and Ben Recht Outline Background and Problem Setup LP Decoding Formulation Optimization Framework
More informationLinear, Binary SVM Classifiers
Linear, Binary SVM Classifiers COMPSCI 37D Machine Learning COMPSCI 37D Machine Learning Linear, Binary SVM Classifiers / 6 Outline What Linear, Binary SVM Classifiers Do 2 Margin I 3 Loss and Regularized
More informationThis kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.
Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one
More informationKarush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =
More information15-850: Advanced Algorithms CMU, Fall 2018 HW #4 (out October 17, 2018) Due: October 28, 2018
15-850: Advanced Algorithms CMU, Fall 2018 HW #4 (out October 17, 2018) Due: October 28, 2018 Usual rules. :) Exercises 1. Lots of Flows. Suppose you wanted to find an approximate solution to the following
More information3. Linear Programming and Polyhedral Combinatorics
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans April 5, 2017 3. Linear Programming and Polyhedral Combinatorics Summary of what was seen in the introductory
More information