Learning Weighted Automata
1 Learning Weighted Automata
Joint work with Borja Balle (Amazon Research).
Mehryar Mohri, Courant Institute & Google Research.
2 Weighted Automata (WFAs)
3 Motivation
Weighted automata (WFAs) are used in:
- image processing (Kari, 1993).
- automatic speech recognition (MM, Pereira, Riley, 1996, 2008).
- speech synthesis (Sproat, 1995; Allauzen, MM, Riley, 2004).
- machine translation (e.g., Iglesias et al., 2011).
- many other NLP tasks (very long list of references).
- bioinformatics (Durbin et al., 1998).
- optical character recognition (Breuel, 2008).
- model checking (Baier et al., 2009; Aminof et al., 2011).
- machine learning (Cortes, Kuznetsov, MM, Warmuth, 2015).
4 Motivation
Theory: rational power series, extensively studied (Eilenberg, 1993; Salomaa and Soittola, 1978; Kuich and Salomaa, 1986; Berstel and Reutenauer, 1988).
Algorithms (see survey chapter: MM, 2009): rational operations, intersection or composition, epsilon-removal, determinization, minimization, disambiguation.
5 Learning Automata
Classical results:
- passive learning (Gold, 1978; Angluin, 1978; Pitt and Warmuth, 1993).
- active learning (model with membership and equivalence queries) (Angluin, 1987; Bergadano and Varricchio, 1994, 1996, 2000).
Spectral learning:
- algorithms (Hsu et al., 2009; Bailly et al., 2009; Balle and MM, 2012).
- natural language processing (Balle et al., 2014).
- reinforcement learning (Boots et al., 2009; Hamilton et al., 2013).
6 Learning Guarantees
Existing analyses:
- (Hsu et al., 2009; Denis et al., 2016): statistical consistency, finite-sample guarantees in the realizable case.
- (Balle and MM, 2012): algorithm-dependent finite-sample guarantee based on a stability analysis.
- (Kulesza et al., 2014): algorithm-dependent guarantees with a distributional assumption (data drawn from some WFA).
Question: can we derive general theoretical guarantees for learning WFAs?
7 This Talk
- Learning scenario, complexity tools.
- Hypothesis sets.
- Learning guarantees.
8 Learning Scenario
Training data: sample $S = ((x_1, y_1), \ldots, (x_m, y_m)) \in (X \times Y)^m$ drawn i.i.d. according to some distribution $D$.
Problem: find a WFA $A$ in a hypothesis set $H$ with small expected loss
$$L(A) = \mathbb{E}_{(x,y) \sim D}\big[L(A(x), y)\big].$$
Note: the problem is not assumed realizable (the data is not assumed to be generated by a probabilistic WFA).
9 Emp. Rademacher Complexity
Definition: let $G$ be a family of functions mapping from a set $Z$ to $[a, b]$, let $S = (z_1, \ldots, z_m)$ be a sample, and let the $\sigma_i$'s (Rademacher variables) be independent uniform random variables taking values in $\{-1, +1\}$. The empirical Rademacher complexity of $G$ is
$$\widehat{\mathfrak{R}}_S(G) = \mathbb{E}_{\sigma}\bigg[\sup_{g \in G} \frac{1}{m} \sum_{i=1}^{m} \sigma_i\, g(z_i)\bigg],$$
which measures the correlation of $G$ with random noise.

10 Emp. Rademacher Complexity
Rademacher complexity of $G$: $\mathfrak{R}_m(G) = \mathbb{E}_{S \sim D^m}\big[\widehat{\mathfrak{R}}_S(G)\big]$.
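The supremum makes $\widehat{\mathfrak{R}}_S(G)$ hard to compute in general, but for a small finite class it can be estimated by sampling $\sigma$. A minimal Monte Carlo sketch (the class $G$, tabulated as an array of its values on the sample, is a made-up example):

```python
import numpy as np

def empirical_rademacher(G_values, n_trials=10_000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    G_values: array of shape (k, m); row j holds (g_j(z_1), ..., g_j(z_m)),
    the j-th function of a finite class G evaluated on the sample S.
    """
    rng = np.random.default_rng(seed)
    _, m = G_values.shape
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=m)   # Rademacher variables
        total += np.max(G_values @ sigma) / m     # sup_g (1/m) sum_i sigma_i g(z_i)
    return total / n_trials

# Three made-up functions with values in [0, 1], sample of size m = 50.
rng = np.random.default_rng(1)
print(empirical_rademacher(rng.uniform(0.0, 1.0, size=(3, 50))))
```

The estimate decays roughly like $1/\sqrt{m}$ for a fixed finite class, matching the role this quantity plays in the bound on the next slide.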
11 Rademacher Complexity Bound
Theorem: let $G$ be a family of functions mapping from $Z$ to $[0, 1]$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $g \in G$:
$$\mathbb{E}[g(z)] \le \frac{1}{m} \sum_{i=1}^{m} g(z_i) + 2\,\mathfrak{R}_m(G) + \sqrt{\frac{\log \frac{1}{\delta}}{2m}},$$
$$\mathbb{E}[g(z)] \le \frac{1}{m} \sum_{i=1}^{m} g(z_i) + 2\,\widehat{\mathfrak{R}}_S(G) + 3\sqrt{\frac{\log \frac{2}{\delta}}{2m}}.$$
Proof: apply McDiarmid's inequality to $\Phi(S) = \sup_{g \in G} \big(\mathbb{E}[g] - \widehat{\mathbb{E}}_S[g]\big)$ (Koltchinskii and Panchenko, 2002; MM et al., 2012).
12 This Talk
- Learning scenario, complexity tools.
- Hypothesis sets.
- Learning guarantees.
13 Learning Automata
Classical formulation: given a sample $S = ((x_1, y_1), \ldots, (x_m, y_m)) \in (\Sigma^* \times \{0, 1\})^m$, find the smallest automaton consistent with the sample:
$$\min_{A} \|A\|_0 \quad \text{s.t.} \quad \forall i \in [m],\ A(x_i) = y_i.$$
NP-complete problem (Gold, 1978; Angluin, 1978); even polynomial approximation is NP-hard (Pitt and Warmuth, 1993).
Not the right formulation.
14 Analogy: Linear Classifiers
Sparse learning formulation:
$$\min_{w \in \mathbb{R}^N} \|w\|_0 \quad \text{s.t.} \quad Aw = b.$$
- non-convex optimization problem.
- NP-hard problem.
- not the right formulation.
Alternative: use a different norm (e.g., norm-1).
15 Questions
What is the appropriate norm to use for learning WFAs? Which hypothesis sets should we consider?
- description in terms of the Hankel matrix.
- description in terms of transition matrices.
- description in terms of a function norm.
16 WFA - Definition
A WFA $A$ over a semiring $(\mathbb{S}, \oplus, \otimes, \bar{0}, \bar{1})$ and alphabet $\Sigma$, with a finite set of states $Q_A$, is defined by:
- an initial weight vector $\boldsymbol{\alpha}_A \in \mathbb{S}^{Q_A}$;
- a final weight vector $\boldsymbol{\beta}_A \in \mathbb{S}^{Q_A}$;
- transition weight matrices $\mathbf{A}_a \in \mathbb{S}^{Q_A \times Q_A}$, $a \in \Sigma$.
Function defined: for any $x = x_1 \cdots x_k \in \Sigma^*$,
$$A(x) = \boldsymbol{\alpha}_A^\top \mathbf{A}_{x_1} \cdots \mathbf{A}_{x_k} \boldsymbol{\beta}_A.$$
Notation: $\mathbf{A}_x = \mathbf{A}_{x_1} \cdots \mathbf{A}_{x_k}$.
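For the common case $\mathbb{S} = \mathbb{R}$ with ordinary addition and multiplication, evaluating a WFA is just a chain of vector-matrix products. A minimal sketch, using a hypothetical two-state automaton (unrelated to the illustration on the next slide):

```python
import numpy as np

# A hypothetical 2-state WFA over the alphabet {a, b}.
alpha = np.array([1.0, 0.0])          # initial weight vector
beta = np.array([0.0, 1.0])           # final weight vector
A = {                                 # one transition matrix per symbol
    "a": np.array([[0.5, 0.5],
                   [0.0, 1.0]]),
    "b": np.array([[1.0, 0.0],
                   [0.5, 0.5]]),
}

def wfa_value(x):
    """Compute A(x) = alpha^T A_{x_1} ... A_{x_k} beta."""
    v = alpha
    for symbol in x:                  # left-to-right product A_x = A_{x_1} ... A_{x_k}
        v = v @ A[symbol]
    return float(v @ beta)

print(wfa_value("abba"))
```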
17 WFA - Illustration
[Figure: an example WFA with its vectors $\boldsymbol{\alpha}_A$, $\boldsymbol{\beta}_A$ and transition matrices $\mathbf{A}_a$, $\mathbf{A}_b$; annotations indicate state number, initial weight, and final weight.]
18 Hankel Matrix
Definition: the Hankel matrix $\mathbf{H}_f$ of a function $f: \Sigma^* \to \mathbb{R}$ is the infinite matrix, with rows indexed by prefixes $u$ and columns by suffixes $v$, defined by
$$\forall u, v \in \Sigma^*, \quad \mathbf{H}_f(u, v) = f(uv).$$
Redundancy: $f(x)$ appears in all entries $(u, v)$ with $x = uv$.
19 Theorem of Fliess
Theorem (Fliess, 1974): $\mathrm{rank}(\mathbf{H}_f) < +\infty$ iff $f$ is rational. In that case, there exists a (minimal) WFA $A$ representing $f$ with $\mathrm{rank}(\mathbf{H}_f)$ states.

20 Theorem of Fliess
Proof: for any $u, v \in \Sigma^*$, if $\mathbf{H}$ is the Hankel matrix of $A$, then
$$\mathbf{H}(u, v) = A(uv) = (\boldsymbol{\alpha}_A^\top \mathbf{A}_u)(\mathbf{A}_v \boldsymbol{\beta}_A).$$
Thus, $\mathbf{H} = \mathbf{P}_A \mathbf{S}_A^\top$, with
$$\mathbf{P}_A = \big[\boldsymbol{\alpha}_A^\top \mathbf{A}_u\big]_{u \in \Sigma^*} \in \mathbb{R}^{\Sigma^* \times Q_A}, \qquad \mathbf{S}_A = \big[(\mathbf{A}_v \boldsymbol{\beta}_A)^\top\big]_{v \in \Sigma^*} \in \mathbb{R}^{\Sigma^* \times Q_A}.$$
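The factorization $\mathbf{H} = \mathbf{P}_A \mathbf{S}_A^\top$ implies that every finite block of the Hankel matrix has rank at most $|Q_A|$, which is easy to check numerically. The sketch below reuses `wfa_value` and the hypothetical two-state automaton from the earlier example:

```python
from itertools import product

import numpy as np

def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length <= max_len, including the empty string."""
    out = [""]
    for k in range(1, max_len + 1):
        out += ["".join(t) for t in product(alphabet, repeat=k)]
    return out

# Finite Hankel block H(u, v) = A(uv), rows/columns indexed by strings of length <= 2.
prefixes = suffixes = strings_up_to(["a", "b"], 2)
H = np.array([[wfa_value(u + v) for v in suffixes] for u in prefixes])

print(np.linalg.matrix_rank(H))   # at most 2, the number of states (theorem of Fliess)
```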
21 Standardization
(Schützenberger, 1961; Cardon and Crochemore, 1980)
22 Hypothesis Sets
In view of the theorem of Fliess, a natural choice is
$$H_0 = \big\{A : \mathrm{rank}(\mathbf{H}_A) < r\big\}$$
for some $r < +\infty$. But rank does not define a convex function (it is the equivalent of norm-0 for column vectors). Instead, we adopt a definition based on the nuclear norm and, more generally, Schatten $p$-norms:
$$H_p = \big\{A : \|\mathbf{H}_A\|_p \le r\big\}, \quad \text{with } \|\mathbf{H}_A\|_p = \bigg[\sum_i \sigma_i^p(\mathbf{H}_A)\bigg]^{\frac{1}{p}}.$$
23 This Talk
- Learning scenario, complexity tools.
- Hypothesis sets.
- Learning guarantees.
24 Schatten Norms
Common choices for $p$:
- $p = 1$: nuclear norm (or trace norm) $\|\mathbf{A}\|_1 = \mathrm{Tr}\big[\sqrt{\mathbf{A}^\top \mathbf{A}}\big]$.
- $p = 2$: Frobenius norm $\|\mathbf{A}\|_2 = \sqrt{\mathrm{Tr}[\mathbf{A}^\top \mathbf{A}]}$.
- $p = +\infty$: spectral norm $\|\mathbf{A}\|_{+\infty} = \sqrt{\lambda_{\max}(\mathbf{A}^\top \mathbf{A})} = \sigma_{\max}(\mathbf{A})$.
Properties:
- Hölder's inequality: for $p, p^* \ge 1$ with $\frac{1}{p} + \frac{1}{p^*} = 1$, $\langle \mathbf{A}, \mathbf{B} \rangle \le \|\mathbf{A}\|_p \|\mathbf{B}\|_{p^*}$.
- von Neumann's trace inequality: $\langle \mathbf{A}, \mathbf{B} \rangle \le \sum_i \sigma_i(\mathbf{A})\, \sigma_i(\mathbf{B})$.
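All three norms are functions of the singular values and can therefore be computed from a single SVD. A small sketch (the matrices are random placeholders):

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten p-norm of A: the l_p norm of its singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    if np.isinf(p):
        return float(s.max())                    # spectral norm
    return float((s ** p).sum() ** (1.0 / p))

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

print(schatten_norm(A, 1))                       # nuclear (trace) norm
print(schatten_norm(A, 2), np.linalg.norm(A))    # Frobenius norm, two ways
# Hoelder's inequality with p = 1 and p* = infinity:
print(np.trace(A.T @ B) <= schatten_norm(A, 1) * schatten_norm(B, np.inf))
```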
25 Emp. Rademacher Complexity
By definition of the dual norm (or Hölder's inequality), for a sample $S = (x_1, \ldots, x_m)$ and any decomposition $x_i = u_i v_i$ (using $A(x_i) = \mathbf{e}_{u_i}^\top \mathbf{H}_A \mathbf{e}_{v_i}$):
$$\widehat{\mathfrak{R}}_S(H_p) = \frac{1}{m} \mathbb{E}_{\sigma}\bigg[\sup_{A \in H_p} \sum_{i=1}^{m} \sigma_i\, \mathbf{e}_{u_i}^\top \mathbf{H}_A \mathbf{e}_{v_i}\bigg] = \frac{1}{m} \mathbb{E}_{\sigma}\bigg[\sup_{\|\mathbf{H}_A\|_p \le r} \Big\langle \sum_{i=1}^{m} \sigma_i \mathbf{e}_{v_i} \mathbf{e}_{u_i}^\top, \mathbf{H}_A \Big\rangle\bigg] \le \frac{r}{m} \mathbb{E}_{\sigma}\bigg[\Big\| \sum_{i=1}^{m} \sigma_i \mathbf{e}_{v_i} \mathbf{e}_{u_i}^\top \Big\|_{p^*}\bigg].$$
26 Rad. Complexity for p = 2
Lemma: $\widehat{\mathfrak{R}}_S(H_2) \le \frac{r}{\sqrt{m}}$.
Proof: since $p^* = 2$ for $p = 2$,
$$\widehat{\mathfrak{R}}_S(H_2) \le \frac{r}{m} \mathbb{E}_{\sigma}\bigg[\Big\|\sum_{i=1}^{m} \sigma_i \mathbf{e}_{v_i} \mathbf{e}_{u_i}^\top\Big\|_2\bigg] \le \frac{r}{m} \sqrt{\mathbb{E}_{\sigma}\bigg[\Big\|\sum_{i=1}^{m} \sigma_i \mathbf{e}_{v_i} \mathbf{e}_{u_i}^\top\Big\|_2^2\bigg]} = \frac{r}{m} \sqrt{\sum_{i,j=1}^{m} \mathbb{E}_{\sigma}\big[\sigma_i \sigma_j \big\langle \mathbf{e}_{u_i} \mathbf{e}_{v_i}^\top, \mathbf{e}_{u_j} \mathbf{e}_{v_j}^\top \big\rangle\big]} = \frac{r}{m} \sqrt{\sum_{i=1}^{m} \big\langle \mathbf{e}_{u_i} \mathbf{e}_{v_i}^\top, \mathbf{e}_{u_i} \mathbf{e}_{v_i}^\top \big\rangle} = \frac{r}{\sqrt{m}}.$$
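The key steps of this chain (Jensen's inequality plus the vanishing cross terms) are easy to check by simulation. The sketch below estimates $\mathbb{E}_{\sigma}\big[\|\sum_i \sigma_i \mathbf{e}_{v_i} \mathbf{e}_{u_i}^\top\|_2\big]$ for made-up index pairs and compares it to $\sqrt{m}$:

```python
import numpy as np

def rademacher_norm_estimate(pairs, dim, n_trials=5_000, seed=0):
    """Monte Carlo estimate of E_sigma || sum_i sigma_i e_{v_i} e_{u_i}^T ||_2."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_trials):
        M = np.zeros((dim, dim))
        for s, (u, v) in zip(rng.choice([-1.0, 1.0], size=len(pairs)), pairs):
            M[v, u] += s                         # sigma_i e_{v_i} e_{u_i}^T
        total += np.linalg.norm(M)               # Frobenius (Schatten-2) norm
    return total / n_trials

m = 30
rng = np.random.default_rng(1)
pairs = [tuple(rng.integers(0, 10, size=2)) for _ in range(m)]  # (u_i, v_i) indices
print(rademacher_norm_estimate(pairs, dim=10), np.sqrt(m))      # estimate <= sqrt(m)
```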
27 Lower Bound
By the Khintchine-Kahane inequality,
$$\frac{r}{m} \mathbb{E}_{\sigma}\bigg[\Big\|\sum_{i=1}^{m} \sigma_i \mathbf{e}_{v_i} \mathbf{e}_{u_i}^\top\Big\|_{p^*}\bigg] \ge \frac{r}{\sqrt{2}\, m} \sqrt{\mathbb{E}_{\sigma}\bigg[\Big\|\sum_{i=1}^{m} \sigma_i \mathbf{e}_{v_i} \mathbf{e}_{u_i}^\top\Big\|_2^2\bigg]} = \frac{r}{\sqrt{2}\, m} \sqrt{\sum_{i=1}^{m} \big\langle \mathbf{e}_{v_i} \mathbf{e}_{u_i}^\top, \mathbf{e}_{v_i} \mathbf{e}_{u_i}^\top \big\rangle} = \frac{1}{\sqrt{2}} \frac{r}{\sqrt{m}}.$$
28 Generalization Bound
Theorem: assume that the loss $L$ is the $L_p$ loss and is bounded by $M$. Then, for any $\delta > 0$, with probability at least $1 - \delta$ over the draw of a sample $S$ of size $m$, for all $A \in H_2$:
$$L(A) \le \widehat{L}_S(A) + \frac{2 \mu_p r}{\sqrt{m}} + M \sqrt{\frac{\log \frac{1}{\delta}}{2m}},$$
where $\mu_p = p M^{p-1}$.
29 Proof
By Talagrand's contraction lemma,
$$\widehat{\mathfrak{R}}_S\big(\{(x, y) \mapsto L(A(x), y) : A \in H_2\}\big) = \frac{1}{m} \mathbb{E}_{\sigma}\bigg[\sup_{A \in H_2} \sum_{i=1}^{m} \sigma_i \big|A(x_i) - y_i\big|^p\bigg] \le \frac{\mu_p}{m} \mathbb{E}_{\sigma}\bigg[\sup_{A \in H_2} \sum_{i=1}^{m} \sigma_i \big(A(x_i) - y_i\big)\bigg] \quad (x \mapsto |x|^p \text{ is } \mu_p\text{-Lipschitz})$$
$$\le \frac{\mu_p}{m} \mathbb{E}_{\sigma}\bigg[\sup_{A \in H_2} \sum_{i=1}^{m} \sigma_i A(x_i)\bigg] + \frac{\mu_p}{m} \mathbb{E}_{\sigma}\bigg[\sum_{i=1}^{m} \sigma_i y_i\bigg] = \mu_p\, \widehat{\mathfrak{R}}_S(H_2).$$
30 Rad. Complexity for p = 1
Lemma:
$$\widehat{\mathfrak{R}}_S(H_1) \le \frac{r}{m} \Big[c_1 \log(2m + 1) + c_2 \sqrt{W_S \log(2m + 1)}\Big],$$
where $W_S = \min_{\text{decomp.}} \max\{U_S, V_S\}$, with
$$U_S = \max_{u \in \Sigma^*} \big|\{i : u_i = u\}\big|, \qquad V_S = \max_{v \in \Sigma^*} \big|\{i : v_i = v\}\big|.$$
Proof: apply the Matrix Bernstein bound with $M = 1$, $d \le 2m$, and
$$\mathbf{V}_1 = \sum_i \mathbf{e}_{u_i} \mathbf{e}_{u_i}^\top, \quad \mathbf{V}_2 = \sum_i \mathbf{e}_{v_i} \mathbf{e}_{v_i}^\top, \quad \|\mathbf{V}_1\|_{\mathrm{op}} = U_S, \quad \|\mathbf{V}_2\|_{\mathrm{op}} = V_S.$$
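For small samples, $W_S$ can be computed exactly by brute force: each string $x_i$ admits $|x_i| + 1$ prefix/suffix splits $x_i = u_i v_i$, and $W_S$ minimizes $\max\{U_S, V_S\}$ over all joint choices of splits. A sketch (exponential in $m$, for illustration only):

```python
from collections import Counter
from itertools import product

def w_s(strings):
    """Brute-force W_S: min over decompositions x_i = u_i v_i of max{U_S, V_S}."""
    split_options = [[(x[:k], x[k:]) for k in range(len(x) + 1)] for x in strings]
    best = len(strings)                      # trivial upper bound: U_S, V_S <= m
    for decomp in product(*split_options):   # one (u_i, v_i) choice per string
        U = max(Counter(u for u, _ in decomp).values())
        V = max(Counter(v for _, v in decomp).values())
        best = min(best, max(U, V))
    return best

print(w_s(["ab", "ab", "ba", "aab"]))        # 1: all prefixes and suffixes distinct
```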
31 Matrix Bernstein Bound
Corollary: let $\mathbf{M} = \sum_i \mathbf{M}_i$ be a finite sum of i.i.d. random matrices with $\mathbb{E}[\mathbf{M}] = 0$ and $\|\mathbf{M}_i\|_{\mathrm{op}} \le M$ for all $i$; $\sum_i \mathbb{E}[\mathbf{M}_i \mathbf{M}_i^\top] \preceq \mathbf{V}_1$ and $\sum_i \mathbb{E}[\mathbf{M}_i^\top \mathbf{M}_i] \preceq \mathbf{V}_2$. Then,
$$\mathbb{E}\big[\|\mathbf{M}\|_{\mathrm{op}}\big] \le c_1 M \log(d + 1) + c_2\, \sigma \sqrt{\log(d + 1)},$$
where $c_1 = \frac{2 + 8/\log(2)}{3}$, $c_2 = \sqrt{2} + \frac{4}{\sqrt{\log(2)}}$, $\mathbf{V} = \mathrm{diag}(\mathbf{V}_1, \mathbf{V}_2)$, $\sigma^2 = \|\mathbf{V}\|_{\mathrm{op}}$, and $d = \frac{\mathrm{Tr}(\mathbf{V})}{\|\mathbf{V}\|_{\mathrm{op}}}$.
(Minsker, 2011; Tropp, 2015)
32 Generalization Bound
Theorem: assume that the loss $L$ is the $L_p$ loss and is bounded by $M$. Then, for any $\delta > 0$, with probability at least $1 - \delta$ over the draw of a sample $S$ of size $m$, for all $A \in H_1$:
$$L(A) \le \widehat{L}_S(A) + \frac{2 \mu_p c_1 r \log(2m + 1)}{m} + \frac{2 \mu_p c_2 r \sqrt{W_S \log(2m + 1)}}{m} + 3M \sqrt{\frac{\log \frac{2}{\delta}}{2m}},$$
where $c_1 = \frac{2 + 8/\log(2)}{3}$, $c_2 = \sqrt{2} + \frac{4}{\sqrt{\log(2)}}$, and $\mu_p = p M^{p-1}$.
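The slack terms of this bound are straightforward to evaluate numerically. The sketch below plugs in made-up values of $m$, $r$, $W_S$, $M$, $p$, and $\delta$ to show how the bound decays with the sample size:

```python
import numpy as np

def h1_bound_slack(m, r, W_S, M=1.0, p=1, delta=0.05):
    """Slack added to the empirical loss in the H_1 generalization bound."""
    c1 = (2 + 8 / np.log(2)) / 3
    c2 = np.sqrt(2) + 4 / np.sqrt(np.log(2))
    mu_p = p * M ** (p - 1)                  # Lipschitz constant of |t|^p on [-M, M]
    log_term = np.log(2 * m + 1)
    return (2 * mu_p * c1 * r * log_term / m
            + 2 * mu_p * c2 * r * np.sqrt(W_S * log_term) / m
            + 3 * M * np.sqrt(np.log(2 / delta) / (2 * m)))

for m in [10**3, 10**4, 10**5]:
    # W_S = sqrt(m) is an arbitrary illustrative choice for the sample quantity.
    print(m, h1_bound_slack(m=m, r=1.0, W_S=np.sqrt(m)))
```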
33 Conclusion
Theory of learning WFAs:
- data-dependent learning guarantees.
- can help guide the design of algorithms.
- key role of the notion of Hankel matrix (spectral methods, (Hsu et al., 2009; Balle and MM, 2012)).
- data-dependent combinatorial quantities (e.g., $W_S$).
34 Questions
Questions:
- can we use the learning bounds (e.g., $W_S$) to select the prefixes/suffixes defining sub-blocks of the Hankel matrix?
- can we derive learning guarantees for more general algorithms than (Balle and MM, 2012)?
- computational challenges.
35 Hypothesis Sets
Definition based on the matrix representation, with $r = (r_\alpha, r_\beta, r_A)$:
$$\mathcal{A}_{n,p,r} = \Big\{A : |Q_A| = n,\ \|\boldsymbol{\alpha}_A\|_p \le r_\alpha,\ \|\boldsymbol{\beta}_A\|_p \le r_\beta,\ \max_{a \in \Sigma} \|\mathbf{A}_a\|_p \le r_A\Big\}.$$
36 Rademacher Complexities
Corollary: let $L_S = \max_i |x_i|$ and $L_m = \mathbb{E}_{S \sim D^m}[L_S]$. Then,
$$\widehat{\mathfrak{R}}_S(\mathcal{A}_{n,p,r}) \le 6\Big(C + \sqrt{\log(L_S + 2)}\Big) \sqrt{\frac{n(n + 2)\, r_A \tilde{\rho}}{m}},$$
$$\mathfrak{R}_m(\mathcal{A}_{n,p,r}) \le 6\Big(C + \sqrt{\log(L_m + 2)}\Big) \sqrt{\frac{n(n + 2)\, r_A \tilde{\rho}}{m}},$$
where
$$\tilde{\rho} = \max\bigg\{\frac{r_\alpha}{r_\beta},\ \frac{r_\beta}{r_\alpha},\ \frac{\sqrt{r_\alpha r_\beta}}{r_A}\bigg\}, \qquad C = \sqrt{\frac{\big(\log(r_A \tilde{\rho})\big)_+}{n + 2}} + \sqrt{\log_+(r_A)} + \sqrt{\log_+(\tilde{\rho})} + 3\sqrt{\log(2)}.$$